Deep Analysis of GCleaner
Howdy! I’m finally back with another malware deep dive report. This time we are digging into GCleaner.
GCleaner is a Pay-Per-Install (PPI) loader first discovered in early 2019, it has been used to deploy other malicious families like Smokeloader, Amadey, Redline and Raccoon.
We will be working on this sample:
(SHA256: 020d370b51711b0814901d7cc32d8251affcc3506b9b4c15db659f3dbb6a2e6b
)
Initial Triage
Let’s start by running the sample in Triage sandbox to get an overview of what it does.
We can see from the process tree that it drops and runs another binary out of "%APPDATA%"
folder with a seemingly random name then it kills itself using "taskkill"
and deletes the sample binary from disk.
The network tab shows communications to different IP addresses which are considered as C2 servers in Triage’s malware config tab. Each C2 has a different URL path, we will dig deeper to find out what each of them is responsible for.
Right when we open the sample in IDA we don’t have much to look at, there are some interesting strings and API imports but not very helpful to start with.
We can see a repeated pattern across the code where some values are pushed into the stack then xored with 0x2E
, so we first need to decrypt these values.
String Decryption
Automating the decryption for stack strings in this sample can be a bit tricky, luckily I noticed a specific instruction that occurs after loading the encrypted strings into stack (cmp eax, [reg+4]
).
So we can find all occurrences of this instruction then walk back to find the mov
instructions and get the encrypted values. Let’s apply this to an IDA python script.
# Lowest address used in the program
addr = idc.get_inf_attr(INF_MIN_EA)
while True:
# Search for "cmp eax, [reg+4]"
addr = ida_search.find_binary(addr, idc.BADADDR, "3B ?? 04 00 00 00", 16, ida_search.SEARCH_NEXT | ida_search.SEARCH_DOWN)
if addr == idc.BADADDR:
break
enc_bytes = b''
# Search for possible stack strings in the previous 12 instructions
for i in range(12):
ea = idc.prev_head(ea)
if (idc.print_insn_mnem(ea) == "mov" and
idc.get_operand_type(ea, 0) == idc.o_displ and
idc.get_operand_type(ea, 1) == idc.o_imm):
# Get the value of the second operand
operand_value = idc.get_operand_value(ea, 1)
The returned operand value is an integer but we need to store it as a byte array, so we first need to figure out the size of that operand to store it correctly.
# Get the size of the second operand
insn = ida_ua.insn_t()
ida_ua.decode_insn(insn, ea)
operand_size = ida_ua.get_dtype_size(insn.Op2.dtype)
# Specify the correct data type
if operand_size == 4:
operand_bytes = struct.pack("<I", operand_value)
elif operand_size == 2:
operand_bytes = struct.pack("<H", operand_value)
else:
operand_bytes = struct.pack("<B", operand_value)
enc_bytes = operand_bytes + enc_bytes
One more thing I noticed is that some strings use a combination of stack values and other values stored in the ".rdata"
section (retrieved using the XMM instruction "movaps"
).
So we can search for this "movaps"
instruction after the "cmp"
instruction, if found we can read the values stored at its operand address and append it to the encrypted bytes.
# Find possible xmmword movaps
xmmword_addr = ida_search.find_binary(addr, addr+50, pattern2, 16, ida_search.SEARCH_NEXT | ida_search.SEARCH_DOWN)
if xmmword_addr != idc.BADADDR:
# Read the xmmword value
xmmword_value = idc.get_bytes(get_operand_value(xmmword_addr, 1), 16)
enc_bytes = xmmword_value + enc_bytes
Finally we can xor the encrypted values with 0x2E
(this key has been the same for all GCleaner samples I looked at).
# Decrypt and strip encrypted bytes
dec_bytes = bytes(c ^ 0x2E for c in enc_bytes)
dec_str = dec_bytes.strip(b'\x00').decode('utf-8')
if len(dec_str) != 0:
print(f"{hex(addr)} --> {dec_str}")
# Set a comment with the decrypted string
if dec_str and comment_addr != idc.BADADDR:
set_comment(comment_addr, dec_str)
Here is the list of decrypted strings:
Expand to see more
45.12.253.56
45.12.253.72
45.12.253.98
45.12.253.75/dll.php
mixinte
mixtwo
BUSERPROFILE
CCleaner
VLC media player
Acrobat Reader DC
Russian
admin
Shah
testBench
taskmgr
Taskmgr
wireshark
Process Hacker
Wireshark
C:\Program Files
C:\ProgramData
C:\Temp
C:\Program Files
C:\ProgramData
C:\Temp
/advertisting/plus.php?s=
&str=mixtwo
&substr=
/default/stuk.php
/default/puk.php
NOSUB
chk
/chk
test
We can now see the C2 IPs, URL paths and some other interesting strings. Let’s keep going.
Anti Checks (or is it..?)
GCleaner is filled with host checks but weirdly enough it doesn’t do anything them, maybe they were like test features? copy-paste code? not really sure but let’s quickly go though them.
Checking username
Get the current username using "GetUserNameA()"
and compare it to hardcoded names ("admin"
, "Shah"
, "testBench"
).
Checking foreground window
Get the title of the foreground window using "GetWindowTextA()"
and compare it to hardcoded strings.
Checking desktop files
Search for Desktop files with specific strings in their name ("CCleaner"
, "VLC media player"
, "Acrobat Reader DC"
).
Checking locale and keyboard layout
Check if the computer locale is Russian and compare the keyboard layout against specific values (CIS countries).
Dropped Binary
Looking back at the process tree we need to figure out where does that child binary with random name comes from.
"%APPDATA%\{846ee340-7039-11de-9d20-806e6f6e6963}\34LMAylZs6FixF.exe"
We can see below that the sample reads the "%APPDATA%"
path using "getenv()"
then creates a random directory using the GUID of the current hardware profile, if retrieving the hardware profile failed it will fall back to generating a random folder name. Other possible locations for creating the random directory are "C:\Program Files"
, "C:\Temp"
, "C:\ProgramData"
(fallback locations).
Next it generates a random file name, appends ".exe"
extension to it then drops it to the newly created directory and runs it from there.
The binary file is hardcoded into the parent sample.
All that binary child does is…well…sleep for 10 seconds, that’s it :|
C2 Communications
The actors behind GCleaner have been known to use BraZZZers fast flux service to hide their infrastructure, it works more like a proxy system between the victims and the real C2 server.
Before reaching out to the C2 servers, GCleaner adds hardcoded HTTP headers (could be used for a network sig) an a custom user-agent to each C2 request.
Now to figure out what each C2 request is responsible for.
First C2
- IP:
45[.]12.253.56
- UA:
OK
- PCAP:
This C2 is likely responsible for bot registration. The sample will only continue execution if the server response is "0"
or "1"
, otherwise it goes to sleep and tries again.
The "str"
and "substr"
parameters in the C2 request above are possibly referring to the campaign ID, GCleaner has been known to use similar values in the past like "usone"
, "ustwo"
, "euthree"
, "cafive"
, "mixshop"
, …
Second C2
- IP:
45[.]12.253.72
- UA:
OK
- PCAP:
The first request to this C2 is responsible for getting an AES key.
The key length must be between 10 and 100 bytes, otherwise it breaks the execution.
The second request is responsible for getting an AES encrypted PE file (notice the filename in the response headers!), That PE file is decrypted using the key from the previous request.
The decryption routine is pretty trivial, the sample first calculates the SHA256 hash of the server key then derives the session key used for decryption (AES_128).
After that it loads the decrypted PE file into memory (without touching disk) to get the address of an export function called "GetLicInfo"
which is used in the next stage.
Downloaded DLL
Before going further we first need to take a look at the downloaded PE file. To be able to analyze it we can either use the debugger to dump the decrypted file or get the encrypted response from the PCAP and decrypt it manually.
We can easily implement the decryption code in Python as follow:
import hashlib
from Crypto.Cipher import AES
enc = open("puk.php.bin", "rb").read()
key = "kvQoRqtcCyMtHmQyQXOUu".encode("utf-16le") # Important to encode!!
sha256_hash = hashlib.sha256(key)
aes_key = sha256_hash.digest()[:16]
cipher = AES.new(aes_key, mode=AES.MODE_CBC, IV=b"\x00"*16)
dec = cipher.decrypt(enc)
open("out.bin", "wb").write(dec)
Now let’s see what this export function "GetLicInfo"
does.
Basically it sends an http request to the supplied C2 server then checks the response length, if the length is greater than 2048 bytes it creates a a new directory with a random name under "%APPDATA%"
or "%TEMP%"
folder then generates a random filename and appends ".exe"
extension to it.
Finally it writes the server response to a disk file with the generated random filename and executes that file.
Third C2
- IP:
45[.]12.253.75
- UA:
B
- PCAP:
This C2 is responsible for downloading further payloads, notice the user-agent used here is the one from the decrypted strings list unlike the previous 2 C2s.
The address is supplied to the external function "GetLicInfo"
which downloads and executes the payload as we stated above. GCleaner tries to get a payload from the server for 10 iterations with a sleep period of 2 seconds between every try.
If no further payload is received from the server the samples kills its process and deletes the parent file from disk.
Forth C2
- IP:
45[.]12.253.98
This C2 wasn’t used in the sample we are looking at.
Config Extraction
We can use the IDA python script we used for string decryption to build a standalone config extractor as most of the interesting stuff are in the decrypted strings list.
Here’s the output of the code after extracting the useful information:
The code can be found here.
(this script is not optimized for production, it’s just for research purposes)
Hunting
Urlscan
The URL path of the first C2 request can be a good candidate to hunt for more C2s on urlscan.
I looked at more samples and found these two URL patterns:
s=NOSUB&str=...&substr=...
sub=NOSUB&stream=...&substream=...
So we can use the "page.url"
field to search for the first part of these patterns.
Yara
We saw that many strings were encrypted but we can use some of the hardcoded ones to create a simple yara rule for hunting more samples.
rule GCleaner {
meta:
description = "Detects GCleaner payload"
author = "Abdallah Elshinbary (@_n1ghtw0lf)"
hash1 = "020d370b51711b0814901d7cc32d8251affcc3506b9b4c15db659f3dbb6a2e6b"
hash2 = "73ed1926e850a9a076a8078932e76e1ac5f109581996dd007f00681ae4024baa"
strings:
// Kill self
$s1 = "\" & exit" ascii fullword
$s2 = "\" /f & erase " ascii fullword
$s3 = "/c taskkill /im \"" ascii fullword
// Anti checks
$s4 = " Far " ascii fullword
$s5 = "roxifier" ascii fullword
$s6 = "HTTP Analyzer" ascii fullword
$s7 = "Wireshark" ascii fullword
$s8 = "NetworkMiner" ascii fullword
// HTTP headers
$s9 = "Accept-Language: ru-RU,ru;q=0.9,en;q=0.8" ascii fullword
$s10 = "Accept-Charset: iso-8859-1, utf-8, utf-16, *;q=0.1" ascii fullword
$s11 = "Accept-Encoding: deflate, gzip, x-gzip, identity, *;q=0" ascii fullword
$s12 = "Accept: text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1" ascii fullword
condition:
uint16(0) == 0x5a4d and
10 of them
}