>Domain
>Summary
This writeup documents the full analysis of the malicious PDF report.pdf provided with this challenge. The PDF contained embedded JavaScript, which in turn carried a base64-encoded .NET BinaryFormatter payload that attempted to deploy cryptocurrency-mining malware. The CTF flag was embedded in the miner's command-line parameters.
Flag: MetaCTF{I_4m_n0t_@_m1n3r_1_@m_a_b4nk5m4n}
>Goals
- Extract and inspect any embedded JavaScript in report.pdf.
- Decode any embedded payloads (base64, hex, UTF-16) and search for indicators and the flag.
- Provide extracted scripts and commands used during analysis so the results are reproducible.
>Tools used
- poppler-utils (pdfinfo)
- strings
- grep
- Python 3 (with small helper scripts)
>High-level steps
- Identify suspicious PDF features (JavaScript present).
- Extract JavaScript with pdfinfo -js report.pdf.
- Locate var serialized_obj = "..." inside the JavaScript.
- Concatenate multi-line quoted base64 strings into a single base64 blob.
- Decode the base64 blob (it was a .NET BinaryFormatter serialized object).
- Search decoded binary for human-readable strings and UTF-16-encoded text.
- Locate the miner command-line string that contains the flag.
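The UTF-16 search step can be sketched in a few lines of Python. This is an illustrative helper rather than one of the scripts used in the analysis: the name utf16le_strings and the min_chars default are assumptions, and it only recovers ASCII-range text stored as UTF-16 LE (each character followed by a 0x00 byte). Because the regex scans byte by byte, it also finds runs that start at an odd offset.

```python
import re


def utf16le_strings(data: bytes, min_chars: int = 6) -> list:
    """Return runs of printable ASCII that are stored as UTF-16 LE."""
    # One printable ASCII byte followed by 0x00, repeated at least min_chars times.
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_chars)
    return [m.group(0).decode("utf-16-le") for m in pattern.finditer(data)]


# Synthetic demo buffer (not the challenge payload): binary noise around a string.
sample = b"\x01\x02" + "miner -p secret".encode("utf-16-le") + b"\xff\xfe\x00"
print(utf16le_strings(sample))
```

On a real payload this would be called as utf16le_strings(open('decoded_payload.bin', 'rb').read()), which is roughly what strings -el does on the command line.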
>Key commands and why
Run these from the challenge directory (/home/noigel/Desktop/RSTCONctf/Forensics/Domain):
bash
# Check PDF for JavaScript
pdfinfo report.pdf
# Extract JavaScript embedded in the PDF
pdfinfo -js report.pdf > javascript_output.txt
# Quick scan for serialized object variable
grep -n "serialized_obj" javascript_output.txt
# Decode the base64 payload using provided Python scripts (see below)
python3 decode_payload.py
python3 extract_strings.py
python3 final_decode.py
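If poppler is not available, a raw byte scan can give a similar first impression of whether a PDF carries active content. The sketch below is not part of the original analysis: count_pdf_markers and the marker list are hypothetical, and names hidden inside compressed object streams or written with #xx escapes will be missed, so a zero count is not proof that a file is clean.

```python
import re

# PDF name tokens commonly associated with active content (illustrative list).
MARKERS = (b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch")


def count_pdf_markers(data: bytes) -> dict:
    """Count whole-token occurrences of each marker in raw PDF bytes."""
    counts = {}
    for marker in MARKERS:
        # Require a non-alphanumeric byte (or end of data) after the name so
        # that a longer name sharing the same prefix is not counted.
        pattern = re.escape(marker) + rb"(?![A-Za-z0-9])"
        counts[marker.decode("ascii")] = len(re.findall(pattern, data))
    return counts


# Synthetic demo bytes (not taken from report.pdf).
demo = b"<< /Type /Action /S /JavaScript /JS (app.alert(1)) >> /OpenAction 5 0 R"
print(count_pdf_markers(demo))
```

For the challenge file the call would be count_pdf_markers(open('report.pdf', 'rb').read()).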
>Scripts used (full source)
Below are the Python helper scripts used during the analysis. Each script is included verbatim.
decode_payload.py
python
#!/usr/bin/env python3
import re
import base64

# Read the JavaScript file
with open('javascript_output.txt', 'r') as f:
    content = f.read()

# Extract the serialized_obj variable value
# It spans multiple lines with "+" concatenation
# Format is: var serialized_obj = "string1"+\n"string2"+\n"string3"+...;
match = re.search(r'var serialized_obj = (.*?);', content, re.DOTALL)
if match:
    raw_value = match.group(1)
    # Extract all quoted strings and concatenate them
    strings = re.findall(r'"([^\"]*)"', raw_value)
    base64_data = ''.join(strings)
    print(f"Base64 data length: {len(base64_data)}")
    try:
        # Add padding if needed
        missing_padding = len(base64_data) % 4
        if missing_padding:
            base64_data += '=' * (4 - missing_padding)
        # Decode the base64
        decoded = base64.b64decode(base64_data)
        print(f"Decoded binary length: {len(decoded)} bytes")
        # Convert to string and search for the flag
        decoded_str = decoded.decode('latin-1')  # Use latin-1 to handle binary data
        # Search for MetaCTF flag pattern
        flag_match = re.search(r'MetaCTF\{[^}]+\}', decoded_str)
        if flag_match:
            print(f"\n[+] FLAG FOUND: {flag_match.group(0)}")
        else:
            print("\n[-] Flag not found in decoded payload")
            print("\n[*] Looking for interesting strings...")
            if 'miner' in decoded_str.lower():
                print("[+] Found 'miner' references")
            if 'metactf' in decoded_str.lower():
                print("[+] Found 'metactf' references")
                idx = decoded_str.lower().find('metactf')
                if idx != -1:
                    context = decoded_str[max(0, idx-50):min(len(decoded_str), idx+100)]
                    print(f"Context: {repr(context)}")
    except Exception as e:
        print(f"Error decoding base64: {e}")
else:
    print("[-] Could not find serialized_obj variable")
extract_strings.py
python
#!/usr/bin/env python3
import re
import base64
import string

# Read the JavaScript file
with open('javascript_output.txt', 'r') as f:
    content = f.read()

# Extract the serialized_obj variable value
match = re.search(r'var serialized_obj = (.*?);', content, re.DOTALL)
if match:
    raw_value = match.group(1)
    strings = re.findall(r'"([^\"]*)"', raw_value)
    base64_data = ''.join(strings)
    # Add padding if needed
    missing_padding = len(base64_data) % 4
    if missing_padding:
        base64_data += '=' * (4 - missing_padding)
    # Decode the base64
    decoded = base64.b64decode(base64_data)
    # Save the raw binary
    with open('decoded_payload.bin', 'wb') as f:
        f.write(decoded)
    print(f"[+] Saved decoded payload to decoded_payload.bin ({len(decoded)} bytes)")
    # Now search for strings in it (simple printable extraction)
    printable = set(bytes(string.printable, 'ascii'))
    min_length = 10
    current_string = []
    found_strings = []
    for byte in decoded:
        if byte in printable:
            current_string.append(chr(byte))
        else:
            if len(current_string) >= min_length:
                s = ''.join(current_string)
                found_strings.append(s)
            current_string = []
    # Flush a run that reaches the end of the payload
    if len(current_string) >= min_length:
        found_strings.append(''.join(current_string))
    print(f"Found {len(found_strings)} strings")
    for s in found_strings:
        if 'MetaCTF' in s or 'm1n3r' in s or 'b4nk' in s:
            print(f"[+] Interesting: {s[:200]}")
final_decode.py
python
#!/usr/bin/env python3
import base64
import re

# The flag is in these base64 chunks (extracted from the tail section)
base64_chunks = [
    "AG0AcwAuAGMAbwBtACAALQB1ACAAbQBpAG4AZQByACAALQBwACAATQBlAHQAYQBDAFQARgB7AEkA",
    "XwA0AG0AXwBuADAAdABfAEAAXwBtADEAbgAzAHIAXwAxAF8AQABtAF8AYQBfAGIANABuAGsANQBt",
    "ADQAbgB9ACAALQBSACAALQAtAHYAYQByAGkAYQBuAHQAPQAtADEAIAAB",
]

# Join the chunks, then decode them as one base64 string
full_base64 = ''.join(base64_chunks)
decoded_bytes = base64.b64decode(full_base64)

# Extract printable ASCII characters from the UTF-16 LE encoded data.
# These chunks start on the 0x00 high byte of the preceding character,
# so the ASCII bytes sit at the odd offsets of this slice.
ascii_chars = []
for i in range(1, len(decoded_bytes), 2):
    char = decoded_bytes[i]
    if 32 <= char <= 126:
        ascii_chars.append(chr(char))
    else:
        ascii_chars.append('?')
text = ''.join(ascii_chars)
print(f"Decoded text: {text}")

# Extract the flag
flag_match = re.search(r'MetaCTF\{[^}]+\}', text)
if flag_match:
    flag = flag_match.group(0)
    print("\nFLAG FOUND:\n")
    print(flag)
>Findings / Analysis
- The key artifact was serialized_obj inside the JavaScript. It is a base64-encoded .NET BinaryFormatter payload.
- The BinaryFormatter contents include UTF-16-LE strings (typical of .NET) which, when decoded, revealed a miner command line that included the CTF flag as the -p parameter.
- The final extraction required decoding specific base64 chunks and interpreting them as UTF-16-LE encoded text, then extracting ASCII characters.
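The same UTF-16-LE interpretation can also be done with Python's built-in codec instead of picking out bytes by hand. The following is a minimal sketch, assuming the three chunks listed in final_decode.py; the one-byte realignment reflects how this particular slice happens to start (on the 0x00 high byte of the preceding character) and is not a general rule for such payloads.

```python
import base64
import re

# Chunks copied from final_decode.py.
chunks = [
    "AG0AcwAuAGMAbwBtACAALQB1ACAAbQBpAG4AZQByACAALQBwACAATQBlAHQAYQBDAFQARgB7AEkA",
    "XwA0AG0AXwBuADAAdABfAEAAXwBtADEAbgAzAHIAXwAxAF8AQABtAF8AYQBfAGIANABuAGsANQBt",
    "ADQAbgB9ACAALQBSACAALQAtAHYAYQByAGkAYQBuAHQAPQAtADEAIAAB",
]
raw = base64.b64decode("".join(chunks))

# Drop the leading 0x00 to realign on a character boundary, then trim any
# trailing odd byte so the buffer holds whole 2-byte code units.
aligned = raw[1:]
aligned = aligned[: len(aligned) - (len(aligned) % 2)]
text = aligned.decode("utf-16-le", errors="replace")

flag_match = re.search(r"MetaCTF\{[^}]+\}", text)
flag = flag_match.group(0) if flag_match else None
print(text)
print(flag)
```

Using the codec avoids hard-coding which byte of each pair holds the character, at the cost of having to get the alignment right first.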
>Reproducibility
- Ensure you have poppler-utils installed (for pdfinfo).
- Run pdfinfo -js report.pdf > javascript_output.txt to extract the JavaScript.
- Run the provided Python scripts in order: decode_payload.py, extract_strings.py, then final_decode.py.
>Notes & Mitigations
- Malicious PDFs can contain JavaScript that leverages Windows COM/ActiveX to spin up .NET components and deserialize crafted payloads. Treat PDFs with embedded JavaScript as potentially dangerous. Analyze in an isolated VM.
- BinaryFormatter deserialization is unsafe; never deserialize untrusted data in .NET.
- Keep software and PDF readers patched and disable JavaScript in PDF viewers when possible.
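When triaging, it can help to flag blobs that look like BinaryFormatter streams before they get anywhere near a deserializer. The sketch below is an assumption-based heuristic, not something run in the original analysis: the function names are hypothetical, and the check relies on the MS-NRBF SerializationHeaderRecord layout (record type 0, two int32 IDs, then MajorVersion 1 and MinorVersion 0, little-endian). A match only suggests the format. It says nothing about whether the content is malicious.

```python
import base64
import struct


def looks_like_nrbf(blob: bytes) -> bool:
    """Heuristic: does blob start with an MS-NRBF serialization header?"""
    if len(blob) < 17:
        return False
    # 1-byte record type followed by four little-endian int32 fields.
    record_type, _root_id, _header_id, major, minor = struct.unpack_from("<Biiii", blob)
    return record_type == 0 and major == 1 and minor == 0


def base64_looks_like_nrbf(text: str) -> bool:
    """Apply the same check to the first 18 bytes of a base64 string."""
    try:
        head = base64.b64decode(text[:24])
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return looks_like_nrbf(head)


# Typical header bytes, built by hand for the demo (not taken from the challenge).
demo_header = bytes.fromhex("0001000000ffffffff0100000000000000")
print(looks_like_nrbf(demo_header + b"\x0c\x02"))
print(base64_looks_like_nrbf("AAEAAAD/////AQAAAAAAAAAMAgAA"))
```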
>Appendix: Useful commands recap
bash
pdfinfo -js report.pdf > javascript_output.txt
grep -n "serialized_obj" javascript_output.txt
python3 decode_payload.py
python3 extract_strings.py
python3 final_decode.py