ACERGION

>Domain

>Summary

This writeup documents the full analysis of report.pdf, the malicious PDF provided with this challenge. The PDF contained embedded JavaScript, which in turn carried a base64-encoded .NET BinaryFormatter payload that attempted to deploy cryptocurrency-mining malware. The CTF flag was embedded in the miner's command-line parameters.

Flag: MetaCTF{I_4m_n0t_@_m1n3r_1_@m_a_b4nk5m4n}

>Goals

Extract and inspect any embedded JavaScript in report.pdf.
Decode any embedded payloads (base64, hex, UTF-16) and search for indicators and the flag.
Provide extracted scripts and commands used during analysis so the results are reproducible.

>Tools used

poppler-utils (pdfinfo)
strings
grep
Python 3 (with small helper scripts)

>High-level steps

Identify suspicious PDF features (JavaScript present).
Extract JavaScript with pdfinfo -js report.pdf.
Locate var serialized_obj = "..." inside the JavaScript.
Concatenate multi-line quoted base64 strings into a single base64 blob.
Decode the base64 blob (it was a .NET BinaryFormatter serialized object).
Search decoded binary for human-readable strings and UTF-16-encoded text.
Locate the miner command-line string that contains the flag.
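
Steps 3 to 5 can be sketched on synthetic input as follows. This is a minimal illustration, not the analysis scripts themselves: the function names extract_serialized_blob and looks_like_binaryformatter are hypothetical, and the header check assumes the stream opens with the serialization header record from the MS-NRBF format (record type 0, two 32-bit IDs, then major version 1 and minor version 0).

```python
import base64
import re
import struct


def extract_serialized_blob(js_source: str) -> bytes:
    """Join the quoted pieces of `var serialized_obj = "..."+"...";` and decode them."""
    match = re.search(r'var serialized_obj = (.*?);', js_source, re.DOTALL)
    if match is None:
        raise ValueError("serialized_obj not found")
    b64 = "".join(re.findall(r'"([^"]*)"', match.group(1)))
    b64 += "=" * (-len(b64) % 4)  # pad to a multiple of four characters
    return base64.b64decode(b64)


def looks_like_binaryformatter(data: bytes) -> bool:
    """Heuristic: 1-byte record type 0, then RootId, HeaderId, Major=1, Minor=0."""
    if len(data) < 17 or data[0] != 0:
        return False
    _root_id, _header_id, major, minor = struct.unpack_from("<iiii", data, 1)
    return major == 1 and minor == 0


if __name__ == "__main__":
    # Synthetic stand-in for the real JavaScript (not the challenge payload).
    fake_payload = bytes.fromhex("0001000000ffffffff0100000000000000") + b"demo"
    encoded = base64.b64encode(fake_payload).decode("ascii")
    js = 'var serialized_obj = "' + encoded[:12] + '"+\n"' + encoded[12:] + '";'
    blob = extract_serialized_blob(js)
    print(looks_like_binaryformatter(blob), len(blob))
```

A passing header check only suggests a BinaryFormatter stream; it does not validate the records that follow.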

>Key commands and why

Run these from the challenge directory (/home/noigel/Desktop/RSTCONctf/Forensics/Domain):

bash

# Check the PDF metadata; pdfinfo reports embedded scripts on its "JavaScript:" line
pdfinfo report.pdf

# Extract JavaScript embedded in the PDF
pdfinfo -js report.pdf > javascript_output.txt

# Quick scan for serialized object variable
grep -n "serialized_obj" javascript_output.txt

# Decode the base64 payload using provided Python scripts (see below)
python3 decode_payload.py
python3 extract_strings.py
python3 final_decode.py
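
If pdfinfo is not available, a rough first-pass triage can be done by counting name objects that tend to indicate active content. The sketch below is an assumption-laden stand-in, not part of the original analysis: count_pdf_keywords and the keyword list are hypothetical, and a raw byte scan will miss names hidden inside compressed object streams or obfuscated with #xx escapes.

```python
import re

# Name objects that commonly indicate active content in a PDF (illustrative list).
SUSPICIOUS_NAMES = (b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile")


def count_pdf_keywords(raw: bytes) -> dict:
    """Count occurrences of each suspicious name in the raw PDF bytes."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # Require a non-alphanumeric byte (or end of data) after the name,
        # so that /JS does not match a longer name such as /JSON.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode("ascii")] = len(re.findall(pattern, raw))
    return counts


if __name__ == "__main__":
    # Synthetic fragment standing in for a real file.
    sample = b"<< /Type /Action /S /JavaScript /JS (app.alert(1)) >>\n<< /OpenAction 1 0 R >>"
    for name, count in count_pdf_keywords(sample).items():
        print(name, count)
```

For a real file, the input would be the bytes of report.pdf, for example via pathlib.Path("report.pdf").read_bytes().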

>Scripts used (full source)

Below are the Python helper scripts used during the analysis. Each script is included verbatim.

decode_payload.py

python

#!/usr/bin/env python3
import re
import base64

# Read the JavaScript file
with open('javascript_output.txt', 'r') as f:
    content = f.read()

# Extract the serialized_obj variable value
# It spans multiple lines with "+" concatenation
# Format is: var serialized_obj = "string1"+\n"string2"+\n"string3"+...;
match = re.search(r'var serialized_obj = (.*?);', content, re.DOTALL)
if match:
    raw_value = match.group(1)
    # Extract all quoted strings and concatenate them
    strings = re.findall(r'"([^\"]*)"', raw_value)
    base64_data = ''.join(strings)
    
    print(f"Base64 data length: {len(base64_data)}")
    
    try:
        # Add padding if needed
        missing_padding = len(base64_data) % 4
        if missing_padding:
            base64_data += '=' * (4 - missing_padding)
        
        # Decode the base64
        decoded = base64.b64decode(base64_data)
        print(f"Decoded binary length: {len(decoded)} bytes")
        
        # Convert to string and search for the flag
        decoded_str = decoded.decode('latin-1')  # Use latin-1 to handle binary data
        
        # Search for MetaCTF flag pattern
        flag_match = re.search(r'MetaCTF\{[^}]+\}', decoded_str)
        if flag_match:
            print(f"\n🚩 FLAG FOUND: {flag_match.group(0)}")
        else:
            print("\n❌ Flag not found in decoded payload")
            print("\n🔍 Looking for interesting strings...")
            if 'miner' in decoded_str.lower():
                print("✓ Found 'miner' references")
            if 'metactf' in decoded_str.lower():
                print("✓ Found 'metactf' references")
                idx = decoded_str.lower().find('metactf')
                if idx != -1:
                    context = decoded_str[max(0, idx-50):min(len(decoded_str), idx+100)]
                    print(f"Context: {repr(context)}")
    except Exception as e:
        print(f"Error decoding base64: {e}")
else:
    print("❌ Could not find serialized_obj variable")

extract_strings.py

python

#!/usr/bin/env python3
import re
import base64

# Read the JavaScript file
with open('javascript_output.txt', 'r') as f:
    content = f.read()

# Extract the serialized_obj variable value
match = re.search(r'var serialized_obj = (.*?);', content, re.DOTALL)
if match:
    raw_value = match.group(1)
    strings = re.findall(r'"([^\"]*)"', raw_value)
    base64_data = ''.join(strings)
    
    # Add padding if needed
    missing_padding = len(base64_data) % 4
    if missing_padding:
        base64_data += '=' * (4 - missing_padding)
    
    # Decode the base64
    decoded = base64.b64decode(base64_data)
    
    # Save the raw binary
    with open('decoded_payload.bin', 'wb') as f:
        f.write(decoded)
    print(f"✓ Saved decoded payload to decoded_payload.bin ({len(decoded)} bytes)")
    
    # Now search for strings in it (simple printable extraction)
    import string
    printable = set(bytes(string.printable, 'ascii'))
    
    min_length = 10
    current_string = []
    found_strings = []
    
    for byte in decoded:
        if byte in printable:
            current_string.append(chr(byte))
        else:
            if len(current_string) >= min_length:
                s = ''.join(current_string)
                found_strings.append(s)
            current_string = []
    
    if len(current_string) >= min_length:
        found_strings.append(''.join(current_string))
    
    print(f"Found {len(found_strings)} strings")
    
    for s in found_strings:
        if 'MetaCTF' in s or 'm1n3r' in s or 'b4nk' in s:
            print(f"🚩 Interesting: {s[:200]}")

final_decode.py

python

#!/usr/bin/env python3
import base64
import re

# The flag is in these base64 chunks (extracted from the tail section)
base64_chunks = [
    "AG0AcwAuAGMAbwBtACAALQB1ACAAbQBpAG4AZQByACAALQBwACAATQBlAHQAYQBDAFQARgB7AEkA",
    "XwA0AG0AXwBuADAAdABfAEAAXwBtADEAbgAzAHIAXwAxAF8AQABtAF8AYQBfAGIANABuAGsANQBt",
    "ADQAbgB9ACAALQBSACAALQAtAHYAYQByAGkAYQBuAHQAPQAtADEAIAAB",
]

# Join the chunks, then decode them in one pass
full_base64 = ''.join(base64_chunks)
decoded_bytes = base64.b64decode(full_base64)

# The chunks start one byte into a UTF-16-LE code unit, so the decoded data
# begins with a stray 0x00 and the ASCII byte of each character sits at the
# odd offsets (1, 3, 5, ...).
ascii_chars = []
for i in range(1, len(decoded_bytes), 2):
    char = decoded_bytes[i]
    if 32 <= char <= 126:
        ascii_chars.append(chr(char))
    else:
        ascii_chars.append('?')  # placeholder for non-printable bytes

text = ''.join(ascii_chars)
print(f"Decoded text: {text}")

# Extract the flag
flag_match = re.search(r'MetaCTF\{[^}]+\}', text)
if flag_match:
    flag = flag_match.group(0)
    print("\nFLAG FOUND:\n")
    print(flag)

>Findings / Analysis

The key artifact was serialized_obj inside the JavaScript. It is a base64-encoded .NET BinaryFormatter payload.
The BinaryFormatter contents include UTF-16-LE strings (typical of .NET) which, when decoded, revealed a miner command line that included the CTF flag as the -p parameter.
The final extraction decoded three base64 chunks taken from the tail section and read the result as UTF-16-LE text. Because the chunks begin mid-character, the ASCII byte of each character falls at the odd offsets of the decoded data.
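
The UTF-16-LE strings described above can also be located without picking offsets by hand. The following is a minimal sketch, assuming the text of interest is printable ASCII stored as UTF-16-LE (each character byte followed by 0x00); utf16le_strings is a hypothetical helper, and the demo input is the first chunk from final_decode.py.

```python
import base64
import re


def utf16le_strings(data: bytes, min_chars: int = 6) -> list:
    """Return runs of printable ASCII stored as UTF-16-LE (char byte, then 0x00)."""
    # The regex scans byte by byte, so runs are found at odd or even offsets alike.
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_chars)
    return [m.group(0).decode("utf-16-le") for m in pattern.finditer(data)]


if __name__ == "__main__":
    # First base64 chunk listed in final_decode.py.
    chunk = "AG0AcwAuAGMAbwBtACAALQB1ACAAbQBpAG4AZQByACAALQBwACAATQBlAHQAYQBDAFQARgB7AEkA"
    for text in utf16le_strings(base64.b64decode(chunk)):
        print(text)
```

On that chunk the helper returns the single string "ms.com -u miner -p MetaCTF{I", the start of the miner command line. Text containing non-ASCII characters would need a broader pattern.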

>Reproducibility

Ensure you have poppler-utils installed (for pdfinfo).
Run pdfinfo -js report.pdf > javascript_output.txt to extract the JavaScript.
Run the provided Python scripts in order: decode_payload.py, extract_strings.py, then final_decode.py.
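
The order above could be wrapped in a small driver. This is only a sketch under stated assumptions: build_steps and run_all are hypothetical names, the three scripts are assumed to sit in the current directory, and the PDF path is passed as the first command-line argument.

```python
import shutil
import subprocess
import sys

SCRIPTS = ["decode_payload.py", "extract_strings.py", "final_decode.py"]


def build_steps(pdf_path: str = "report.pdf") -> list:
    """Return the argv list for each step, in the order the writeup runs them."""
    steps = [["pdfinfo", "-js", pdf_path]]
    steps += [[sys.executable, script] for script in SCRIPTS]
    return steps


def run_all(pdf_path: str = "report.pdf", js_out: str = "javascript_output.txt") -> None:
    if shutil.which("pdfinfo") is None:
        raise RuntimeError("pdfinfo not found; install poppler-utils")
    steps = build_steps(pdf_path)
    # pdfinfo prints the JavaScript to stdout; capture it in the file the scripts read.
    with open(js_out, "w") as out:
        subprocess.run(steps[0], stdout=out, check=True)
    for argv in steps[1:]:
        subprocess.run(argv, check=True)  # stop at the first failing script


if __name__ == "__main__" and len(sys.argv) > 1:
    run_all(sys.argv[1])
```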

>Notes & Mitigations

Malicious PDFs can contain JavaScript that leverages Windows COM/ActiveX to spin up .NET components and deserialize crafted payloads. Treat PDFs with embedded JavaScript as potentially dangerous. Analyze in an isolated VM.
BinaryFormatter deserialization is unsafe; never deserialize untrusted data in .NET.
Keep software and PDF readers patched and disable JavaScript in PDF viewers when possible.

>Appendix — Useful commands recap

bash

pdfinfo -js report.pdf > javascript_output.txt
grep -n "serialized_obj" javascript_output.txt
python3 decode_payload.py
python3 extract_strings.py
python3 final_decode.py