Forensics / Medium

Excavator

CTF writeup for Excavator from BSides

//Excavator

>TL;DR

We’re given a partially corrupted Linux core dump from a crashed stager. The stager loaded a ZIP into memory, then crashed before or while decrypting config.bin with a saved stream-cipher state.

The trick is:

  1. Carve the embedded ZIP out of the core using the ZIP EOCD signature.
  2. Repair the ZIP because the malware intentionally zeroed the first 30 bytes of the ZIP local header (memset(zip_buf, 0, 30)), making the archive look “corrupt”.
  3. Extract config.bin from the repaired ZIP.
  4. Recover the stream cipher state from memory: the struct state { s[256], i, j } is effectively RC4 state.
  5. Decrypt config.bin using the recovered RC4 state.

Recovered plaintext config:

{!!!Y0u_D1d_I7_l1k3_4n_3xC4vat0r!_w3eeell_d0ooone!!!}

Flag:

shellmates{!!!Y0u_D1d_I7_l1k3_4n_3xC4vat0r!_w3eeell_d0ooone!!!}

>Files Provided

The challenge ZIP contained:

  • core.522044 — 64-bit Linux ELF core dump (x86_64)
  • stager.c — pseudo-code / reconstructed C showing what the malware did

>Step 0 — Read the clue: stager.c

The most important lines are:

  • It loads a persistent cipher state:
c

struct state {

    uint8_t s[256];

    uint8_t i;

    uint8_t j;

};

load_state("state", &ctx);
  • It loads a ZIP into memory (zip_buf), then runs unzip on /tmp/config.zip (downloaded earlier) and wipes the first 30 bytes of zip_buf:
c

uint8_t *zip_buf = load("config.zip", &zip_len);

...

memset(zip_buf, 0, 30);
  • It loads config.bin, deletes the files, then crashes before/while decrypting:
c

uint8_t *enc = load("config.bin", &enc_len);

...

if (decrypt(&ctx, enc, enc_len) != 0) { ... }

startInfoStealer(enc, enc_len);

Why this matters

This tells us exactly what should be inside the core dump:

  • A heap buffer containing config.zip (or parts of it)
  • A heap buffer containing config.bin
  • A ctx structure with a 256-byte permutation and two indices

Also, the memset(zip_buf, 0, 30) is a massive hint: 30 bytes is the fixed-size portion of a ZIP Local File Header.

That clue is what made me go after ZIP header repair instead of “generic carving”.
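As a quick sanity check on that 30-byte figure, the fixed part of a local file header can be written as a `struct` format string and measured. This is only a sketch: the field labels are my own descriptive names following the ZIP AppNote layout, not identifiers from the challenge.

```python
import struct

# Fixed-size part of a ZIP local file header (little-endian, no padding).
# The labels are descriptive only; they are not official identifiers.
LFH_FIELDS = [
    ("signature", "4s"),           # b"PK\x03\x04"
    ("version_needed", "H"),
    ("flags", "H"),
    ("method", "H"),
    ("mod_time", "H"),
    ("mod_date", "H"),
    ("crc32", "I"),
    ("compressed_size", "I"),
    ("uncompressed_size", "I"),
    ("filename_len", "H"),
    ("extra_len", "H"),
]

LFH_FORMAT = "<" + "".join(code for _, code in LFH_FIELDS)
LFH_FIXED_SIZE = struct.calcsize(LFH_FORMAT)

print(LFH_FORMAT, LFH_FIXED_SIZE)  # <4sHHHHHIIIHH 30
```

The filename and extra field follow those 30 bytes, which is why the memset leaves the string config.bin readable in the dump.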


>Step 1 — Quick recon: strings + ZIP signatures

A core dump is just a big memory snapshot. A fast first pass is:

  • strings -a core.522044 to see if filenames, magic bytes, or fragments exist

In the output, we saw:

  • config.binPK
  • config.bin

Every ZIP record signature starts with the bytes PK (0x50 0x4B, or 0x4B50 when read as a little-endian 16-bit value), so I pivoted to carving ZIP structures.
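Beyond strings, it can help to count which ZIP record types actually appear in the dump. The sketch below is not one of the scripts I used; `count_zip_signatures` and the synthetic `sample` buffer are hypothetical names made up for illustration.

```python
from collections import Counter

# The three ZIP record signatures that matter for this challenge.
ZIP_SIGNATURES = {
    b"PK\x03\x04": "local file header",
    b"PK\x01\x02": "central directory file header",
    b"PK\x05\x06": "end of central directory",
}

def count_zip_signatures(buf: bytes) -> Counter:
    """Count occurrences of each ZIP record signature in a byte buffer."""
    counts = Counter()
    pos = buf.find(b"PK")
    while pos != -1:
        label = ZIP_SIGNATURES.get(buf[pos:pos + 4])
        if label is not None:
            counts[label] += 1
        pos = buf.find(b"PK", pos + 1)
    return counts

# Synthetic stand-in for a core dump: filler around two ZIP records.
sample = (
    b"\x00" * 16
    + b"config.binPK\x01\x02"
    + b"\xff" * 8
    + b"PK\x05\x06"
    + b"\x00" * 18
)
print(count_zip_signatures(sample))
```

On the real file the call would be `count_zip_signatures(open("core.522044", "rb").read())`. Seeing central directory and EOCD records without a matching local file header would be consistent with a wiped archive start.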


>Step 2 — Carve ZIPs reliably using EOCD

Why EOCD?

ZIP files are easiest to recover from the end, because the End Of Central Directory record contains:

  • central directory size
  • central directory offset

So even if the beginning is damaged, EOCD often survives.

The EOCD signature is:

  • PK\x05\x06

Strategy:

  1. Scan the core for PK\x05\x06
  2. Parse the EOCD fields
  3. Compute:

$$\text{zip\_start} = \text{eocd\_offset} - (\text{cd\_offset} + \text{cd\_size})$$

  4. Extract [zip_start : eocd_end] as a candidate ZIP blob, where eocd_end = eocd_offset + 22 + comment_len
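The strategy above can be tried on a synthetic buffer before touching the real core. This is a stripped-down sketch, not the full helper: `carve_by_eocd` and `fake_core` are hypothetical names, and the archive is built on the spot with Python's `zipfile`.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"

def carve_by_eocd(buf: bytes) -> list[bytes]:
    """Return candidate ZIP blobs located via their EOCD records."""
    blobs = []
    pos = buf.find(EOCD_SIG)
    while pos != -1:
        if pos + 22 <= len(buf):
            # cd_size, cd_offset and comment_len start 12 bytes into the EOCD.
            cd_size, cd_offset, comment_len = struct.unpack_from("<IIH", buf, pos + 12)
            zip_start = pos - (cd_offset + cd_size)
            zip_end = pos + 22 + comment_len
            if zip_start >= 0 and zip_end <= len(buf):
                blobs.append(buf[zip_start:zip_end])
        pos = buf.find(EOCD_SIG, pos + 1)
    return blobs

# Synthetic test: a small ZIP buried in filler bytes.
mem = io.BytesIO()
with zipfile.ZipFile(mem, "w") as zf:
    zf.writestr("config.bin", b"not the real ciphertext")
original = mem.getvalue()
fake_core = b"\xaa" * 100 + original + b"\xbb" * 100

carved = carve_by_eocd(fake_core)
print(len(carved), carved[0] == original)  # 1 True
```

The carve works without ever reading the start of the archive, which is what makes it robust against the wiped header.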

I implemented this in a small helper:

  • carve_zip_from_core.py

Running it produced small candidate ZIP blobs.


>Step 3 — Why the carved ZIP “looks corrupt”

When testing the carved ZIP with unzip -t, it failed with:

  • bad zipfile offset (local header sig)

This is exactly what you’d expect if the ZIP central directory is present (end of file) but the local file header at the start is broken.

Now re-check the malware clue:

  • memset(zip_buf, 0, 30)

And from the ZIP spec:

  • the fixed part of the local file header is 30 bytes

So the archive isn’t “randomly corrupted”; it’s intentionally sabotaged in-memory.
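The same failure mode can be reproduced in a few lines. The sketch below uses Python's `zipfile` rather than unzip -t, and the payload is a placeholder, so treat it as an illustration of the effect rather than a replay of the challenge.

```python
import io
import zipfile

# Build a one-entry ZIP in memory (the payload is a placeholder).
mem = io.BytesIO()
with zipfile.ZipFile(mem, "w") as zf:
    zf.writestr("config.bin", b"placeholder payload")

# Mimic memset(zip_buf, 0, 30): wipe the fixed part of the local header.
sabotaged = bytearray(mem.getvalue())
sabotaged[0:30] = bytes(30)

with zipfile.ZipFile(io.BytesIO(bytes(sabotaged))) as zf:
    # The listing still works because it comes from the central directory.
    names = zf.namelist()
    try:
        zf.read("config.bin")
        read_error = None
    except zipfile.BadZipFile as exc:
        # Reading the member fails because the local header signature is gone.
        read_error = str(exc)

print(names, read_error)
```

The directory listing survives while reading the member fails, which matches the "central directory intact, local header broken" diagnosis.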


>Step 4 — Repair the zeroed local header from the central directory

ZIP has redundancy:

  • The Central Directory File Header (CDFH) repeats most important metadata.

So we can reconstruct the local header fields (version needed, flags, method, timestamps, CRC, compressed/uncompressed size, filename length, extra length) from the CDFH.
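To see that redundancy concretely, the sketch below builds a small archive with Python's `zipfile` and compares the two headers field by field. One caveat I'm adding here: the match is not guaranteed for every ZIP writer. Some tools store a different extra-field length in the local header than in the central directory, and archives that use data descriptors keep zeroed CRC and sizes in the local header.

```python
import io
import struct
import zipfile

mem = io.BytesIO()
with zipfile.ZipFile(mem, "w") as zf:
    zf.writestr("config.bin", b"A" * 64)
blob = mem.getvalue()

# The archive has no comment, so the EOCD is the last 22 bytes;
# the central directory offset sits 16 bytes into it.
(cd_offset,) = struct.unpack_from("<I", blob, len(blob) - 22 + 16)

# Local header: ten fixed fields after the 4-byte signature.
lfh = struct.unpack_from("<HHHHHIIIHH", blob, 4)

# Central directory header: the same ten fields, preceded by "version made by".
cdfh = struct.unpack_from("<HHHHHHIIIHH", blob, cd_offset + 4)

# Drop "version made by" and the two headers line up field by field.
print(lfh == cdfh[1:])  # True
```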

In this challenge the ZIP is tiny and contains only config.bin, so a narrow repair is sufficient:

  1. Find EOCD and central directory offset
  2. Parse the first CDFH (PK\x01\x02)
  3. Build the 30-byte local header fixed structure:
     • signature PK\x03\x04
     • fields copied from central directory
  4. Replace bytes zip[0:30]

I wrote this as:

  • repair_zeroed_local_header.py

After repair, unzip -t succeeded and we extracted config.bin.
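A compact round trip shows the repair idea end to end on a synthetic archive. This is a minimal sketch that assumes a single entry whose local header sits at offset 0; `rebuild_first_local_header` is a hypothetical name and the payload is a placeholder.

```python
import io
import struct
import zipfile

def rebuild_first_local_header(blob: bytes) -> bytes:
    """Rewrite bytes 0..29 of a single-entry ZIP from its central directory."""
    eocd = blob.rfind(b"PK\x05\x06")
    if eocd == -1:
        raise ValueError("EOCD not found")
    (cd_offset,) = struct.unpack_from("<I", blob, eocd + 16)
    if blob[cd_offset:cd_offset + 4] != b"PK\x01\x02":
        raise ValueError("central directory header not found")
    # Skip the signature (4 bytes) and "version made by" (2 bytes); the next
    # ten fields have the same order and widths as the local header.
    fields = struct.unpack_from("<HHHHHIIIHH", blob, cd_offset + 6)
    header = struct.pack("<4sHHHHHIIIHH", b"PK\x03\x04", *fields)
    return header + blob[30:]

# Round trip: build, sabotage, repair, read back.
mem = io.BytesIO()
with zipfile.ZipFile(mem, "w") as zf:
    zf.writestr("config.bin", b"placeholder payload")
damaged = bytes(30) + mem.getvalue()[30:]

repaired = rebuild_first_local_header(damaged)
with zipfile.ZipFile(io.BytesIO(repaired)) as zf:
    recovered = zf.read("config.bin")
print(recovered)  # b'placeholder payload'
```

Because `zipfile` verifies the CRC on read, getting the payload back also confirms that the rebuilt header is consistent with the data.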


>Step 5 — Identify the cipher: struct state screams RC4

The stager’s struct state is:

  • s[256] — a permutation of bytes 0..255
  • i, j — two 8-bit counters

This matches the classic RC4 internal state (array S and indices i,j).

This matters because the core dump likely contains the live state used for decryption.
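The reason a dumped state is enough: RC4's keystream depends only on S, i and j, so the key is never needed once those 258 bytes are known. The sketch below is textbook RC4 with an arbitrary example key, not code from stager.c. It snapshots the state mid-stream and resumes from the copy.

```python
def rc4_ksa(key: bytes) -> bytearray:
    """Key scheduling: turn a key into the initial permutation S."""
    s = bytearray(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    return s

def rc4_prga(s: bytearray, i: int, j: int, n: int) -> tuple[bytes, int, int]:
    """Generate n keystream bytes, mutating s; return (keystream, i, j)."""
    out = bytearray()
    for _ in range(n):
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) & 0xFF])
    return bytes(out), i, j

# Reference keystream from a fresh state (the key is an arbitrary example).
full, _, _ = rc4_prga(rc4_ksa(b"example key"), 0, 0, 32)

# Same stream, but snapshot the state after 10 bytes and resume from the copy.
s = rc4_ksa(b"example key")
head, i, j = rc4_prga(s, 0, 0, 10)
saved = bytes(s) + bytes([i, j])  # 258 bytes, laid out like struct state
tail, _, _ = rc4_prga(bytearray(saved[:256]), saved[256], saved[257], 22)

print(head + tail == full)  # True
```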

Why brute-force scanning for the state works

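A random 256-byte window almost never forms a full permutation of 0..255: for uniformly random bytes the chance is 256!/256^256, on the order of 10^-110, so a window that passes the test is very unlikely to be random noise.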

So a practical technique is:

  1. Slide across the core bytes
  2. For each offset, test if the next 256 bytes are a permutation
  3. If yes, read the next 2 bytes as i and j
  4. Use RC4 PRGA with that state to decrypt config.bin
  5. Look for something that looks like a config/flag

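The solver accepts a candidate when the decrypted output contains the shellmates{ prefix, or when more than 90% of its bytes are printable and it contains a config-like marker such as { or http.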
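Building a set at every offset is fine for a small core but slow on a large one. As an alternative (not what my solver does), a sliding window that tracks how many distinct byte values are present finds the same offsets in one pass. `find_permutation_windows` and the planted test data below are hypothetical.

```python
import random

def find_permutation_windows(buf: bytes) -> list[int]:
    """Return offsets where buf[off:off+256] contains every byte value once."""
    hits = []
    counts = [0] * 256
    distinct = 0                      # byte values present in the window
    for pos, b in enumerate(buf):
        counts[b] += 1
        if counts[b] == 1:
            distinct += 1
        if pos >= 256:                # drop the byte leaving the window
            old = buf[pos - 256]
            counts[old] -= 1
            if counts[old] == 0:
                distinct -= 1
        # 256 distinct values in a 256-byte window means a permutation.
        if pos >= 255 and distinct == 256:
            hits.append(pos - 255)
    return hits

# Synthetic check: hide a shuffled 0..255 block plus two index bytes.
rng = random.Random(1337)             # fixed seed, arbitrary value
perm = list(range(256))
rng.shuffle(perm)
junk = bytes(rng.randrange(256) for _ in range(5000))
planted = junk[:1234] + bytes(perm) + bytes([7, 42]) + junk[1234:]

print(find_permutation_windows(planted))
```

Offset 1234 is always among the results. A neighbouring offset can also qualify when the byte entering the window happens to equal the one leaving it, which is one reason to test-decrypt every candidate instead of trusting the first hit.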


>Step 6 — End-to-end solver

After extracting config.bin, the combined solver (solve_excavator.py) found a valid RC4 state at a specific core offset and produced plaintext:

{!!!Y0u_D1d_I7_l1k3_4n_3xC4vat0r!_w3eeell_d0ooone!!!}

That becomes the flag by wrapping in the required format:

shellmates{!!!Y0u_D1d_I7_l1k3_4n_3xC4vat0r!_w3eeell_d0ooone!!!}

>How to reproduce (commands)

From the challenge directory:

bash

# unzip challenge

unzip -q excavator.zip -d extracted

# run the full solver

python3 solve_excavator.py extracted/core.522044

# if you want to see intermediate artifacts:

python3 carve_zip_from_core.py extracted/core.522044 carved_zips

python3 repair_zeroed_local_header.py carved_zips/<one>.zip repaired.zip

unzip -o repaired.zip -d recovered

xxd recovered/config.bin

>Solver Code (all scripts)

Below are the exact scripts used.

1) carve_zip_from_core.py

python

#!/usr/bin/env python3

"""Carve ZIP blobs from an ELF core by locating EOCD records.

This is a small CTF helper: given a core file, find all occurrences of the

ZIP End-Of-Central-Directory signature (PK\\x05\\x06), parse the EOCD, and

attempt to carve the full ZIP range based on central directory offset/size.

Usage:

  python3 carve_zip_from_core.py extracted/core.522044 out_dir

"""

from __future__ import annotations

import os

import struct

import sys

from dataclasses import dataclass

EOCD_SIG = b"PK\x05\x06"

@dataclass(frozen=True)

class Eocd:

    offset: int

    disk_no: int

    cd_start_disk: int

    disk_entries: int

    total_entries: int

    cd_size: int

    cd_offset: int

    comment_len: int

def parse_eocd(buf: bytes, off: int) -> Eocd | None:

    # EOCD fixed header is 22 bytes.

    if off < 0 or off + 22 > len(buf):

        return None

    if buf[off : off + 4] != EOCD_SIG:

        return None

    (disk_no, cd_start_disk, disk_entries, total_entries, cd_size, cd_offset, comment_len) = struct.unpack_from(

        "<HHHHIIH", buf, off + 4

    )

    if off + 22 + comment_len > len(buf):

        return None

    # sanity bounds

    if cd_size > len(buf) or cd_offset > len(buf):

        return None

    return Eocd(

        offset=off,

        disk_no=disk_no,

        cd_start_disk=cd_start_disk,

        disk_entries=disk_entries,

        total_entries=total_entries,

        cd_size=cd_size,

        cd_offset=cd_offset,

        comment_len=comment_len,

    )

def carve_zip(buf: bytes, eocd: Eocd) -> tuple[int, int] | None:

    # EOCD begins at zip_start + cd_offset + cd_size

    zip_start = eocd.offset - (eocd.cd_offset + eocd.cd_size)

    zip_end = eocd.offset + 22 + eocd.comment_len

    if zip_start < 0 or zip_end > len(buf) or zip_start >= zip_end:

        return None

    # Basic sanity: central directory signature should be present.

    cd_start = zip_start + eocd.cd_offset

    if cd_start < 0 or cd_start + 4 > len(buf):

        return None

    if buf[cd_start : cd_start + 2] != b"PK":

        return None

    return zip_start, zip_end

def main() -> int:

    if len(sys.argv) != 3:

        print(__doc__.strip())

        return 2

    core_path = sys.argv[1]

    out_dir = sys.argv[2]

    with open(core_path, "rb") as f:

        buf = f.read()

    os.makedirs(out_dir, exist_ok=True)

    hits = []

    start = 0

    while True:

        idx = buf.find(EOCD_SIG, start)

        if idx == -1:

            break

        e = parse_eocd(buf, idx)

        if e is not None:

            rng = carve_zip(buf, e)

            if rng is not None:

                hits.append((e, rng))

        start = idx + 1

    if not hits:

        print("No carveable EOCD records found.")

        return 1

    for n, (e, (zs, ze)) in enumerate(hits, 1):

        out_path = os.path.join(out_dir, f"carved_{n}_start{zs}_end{ze}.zip")

        with open(out_path, "wb") as f:

            f.write(buf[zs:ze])

        print(

            f"[{n}] wrote {out_path} (len={ze-zs}) eocd_off={e.offset} cd_off={e.cd_offset} cd_size={e.cd_size} entries={e.total_entries}"

        )

    return 0

if __name__ == "__main__":

    raise SystemExit(main())

2) repair_zeroed_local_header.py

python

#!/usr/bin/env python3

"""Repair a ZIP whose first 30 bytes (local header fixed fields) were zeroed.

This matches the common CTF pattern where `memset(buf, 0, 30)` clobbers the

local-file-header fixed portion but leaves filename/extra/data intact.

We reconstruct the local header fixed fields from the first central directory

record and write a repaired zip.

Usage:

  python3 repair_zeroed_local_header.py in.zip out.zip

"""

from __future__ import annotations

import struct

import sys

CD_SIG = b"PK\x01\x02"

LFH_SIG = b"PK\x03\x04"

EOCD_SIG = b"PK\x05\x06"

def main() -> int:

    if len(sys.argv) != 3:

        print(__doc__.strip())

        return 2

    in_path, out_path = sys.argv[1], sys.argv[2]

    with open(in_path, "rb") as f:

        data = bytearray(f.read())

    eocd = data.rfind(EOCD_SIG)

    if eocd == -1:

        raise SystemExit("EOCD not found")

    # EOCD: sig(4) + H H H H I I H

    disk_no, cd_start_disk, disk_entries, total_entries, cd_size, cd_offset, comment_len = struct.unpack_from(

        "<HHHHIIH", data, eocd + 4

    )

    if total_entries < 1:

        raise SystemExit("No central directory entries")

    cd = cd_offset

    if data[cd : cd + 4] != CD_SIG:

        raise SystemExit("Central directory signature not found at cd_offset")

    # Central directory fixed fields: 42 bytes after the 4-byte signature (46 in total)

    (

        ver_made,

        ver_needed,

        flags,

        method,

        mtime,

        mdate,

        crc32,

        comp_size,

        uncomp_size,

        fname_len,

        extra_len,

        comment_len2,

        disk_start,

        int_attr,

        ext_attr,

        lfh_offset,

    ) = struct.unpack_from("<HHHHHHIIIHHHHHII", data, cd + 4)

    if lfh_offset != 0:

        # This script is intentionally narrow: the memset in the challenge hits the start.

        raise SystemExit(f"Unexpected local header offset {lfh_offset}; expected 0")

    # Build local file header fixed part (30 bytes)

    lfh_fixed = struct.pack(

        "<4sHHHHHIIIHH",

        LFH_SIG,

        ver_needed,

        flags,

        method,

        mtime,

        mdate,

        crc32,

        comp_size,

        uncomp_size,

        fname_len,

        extra_len,

    )

    if len(lfh_fixed) != 30:

        raise AssertionError("LFH fixed size mismatch")

    data[0:30] = lfh_fixed

    with open(out_path, "wb") as f:

        f.write(data)

    return 0

if __name__ == "__main__":

    raise SystemExit(main())

3) solve_excavator.py

python

#!/usr/bin/env python3

"""Solve the 'Excavator' challenge by recovering config from a damaged core.

Approach:

1) Extract a ZIP blob from the core whose local header was zeroed.

2) Repair the ZIP local header using central directory metadata.

3) Extract config.bin (ciphertext).

4) Scan the core for RC4 state candidates (a 256-byte permutation + i/j).

5) Decrypt config.bin with each candidate; stop when flag/config is found.

This script is intentionally self-contained and conservative.

Usage:

  python3 solve_excavator.py extracted/core.522044

Outputs recovered plaintext to ./recovered_config.txt when found.

"""

from __future__ import annotations

import os

import struct

import sys

from dataclasses import dataclass

from typing import Iterable

EOCD_SIG = b"PK\x05\x06"

CD_SIG = b"PK\x01\x02"

LFH_SIG = b"PK\x03\x04"

@dataclass(frozen=True)

class ZipCarve:

    zip_start: int

    zip_end: int

    eocd_off: int

    cd_off: int

    cd_size: int

def find_all(buf: bytes, needle: bytes) -> Iterable[int]:

    start = 0

    while True:

        idx = buf.find(needle, start)

        if idx == -1:

            return

        yield idx

        start = idx + 1

def parse_eocd(buf: bytes, off: int):

    if off + 22 > len(buf):

        return None

    if buf[off : off + 4] != EOCD_SIG:

        return None

    disk_no, cd_start_disk, disk_entries, total_entries, cd_size, cd_off, comment_len = struct.unpack_from(

        "<HHHHIIH", buf, off + 4

    )

    if off + 22 + comment_len > len(buf):

        return None

    if cd_off + cd_size > len(buf):

        return None

    return (cd_off, cd_size, comment_len, total_entries)

def carve_zips(core: bytes) -> list[ZipCarve]:

    out: list[ZipCarve] = []

    for eocd in find_all(core, EOCD_SIG):

        parsed = parse_eocd(core, eocd)

        if not parsed:

            continue

        cd_off, cd_size, comment_len, total_entries = parsed

        if total_entries < 1:

            continue

        zip_start = eocd - (cd_off + cd_size)

        zip_end = eocd + 22 + comment_len

        if zip_start < 0 or zip_end > len(core):

            continue

        cd_start = zip_start + cd_off

        if core[cd_start : cd_start + 4] != CD_SIG:

            continue

        out.append(ZipCarve(zip_start, zip_end, eocd, cd_off, cd_size))

    # Deduplicate (same start/end)

    uniq = {(z.zip_start, z.zip_end): z for z in out}

    return list(uniq.values())

def repair_zeroed_local_header(zip_bytes: bytes) -> bytes:

    data = bytearray(zip_bytes)

    eocd = data.rfind(EOCD_SIG)

    if eocd == -1:

        raise ValueError("EOCD not found")

    disk_no, cd_start_disk, disk_entries, total_entries, cd_size, cd_offset, comment_len = struct.unpack_from(

        "<HHHHIIH", data, eocd + 4

    )

    if total_entries < 1:

        raise ValueError("No central directory entries")

    cd = cd_offset

    if data[cd : cd + 4] != CD_SIG:

        raise ValueError("Central directory signature not found")

    (

        ver_made,

        ver_needed,

        flags,

        method,

        mtime,

        mdate,

        crc32,

        comp_size,

        uncomp_size,

        fname_len,

        extra_len,

        comment_len2,

        disk_start,

        int_attr,

        ext_attr,

        lfh_offset,

    ) = struct.unpack_from("<HHHHHHIIIHHHHHII", data, cd + 4)

    if lfh_offset != 0:

        raise ValueError(f"Unexpected local header offset {lfh_offset}")

    lfh_fixed = struct.pack(

        "<4sHHHHHIIIHH",

        LFH_SIG,

        ver_needed,

        flags,

        method,

        mtime,

        mdate,

        crc32,

        comp_size,

        uncomp_size,

        fname_len,

        extra_len,

    )

    data[0:30] = lfh_fixed

    return bytes(data)

def extract_stored_file_from_single_entry_zip(zip_bytes: bytes) -> tuple[str, bytes]:

    """Parse a single-entry ZIP (stored or deflated) without relying on external tools."""

    # Local file header

    if zip_bytes[:4] != LFH_SIG:

        raise ValueError("Bad local header signature")

    (

        sig,

        ver_needed,

        flags,

        method,

        mtime,

        mdate,

        crc32,

        comp_size,

        uncomp_size,

        fname_len,

        extra_len,

    ) = struct.unpack_from("<4sHHHHHIIIHH", zip_bytes, 0)

    header_len = 30 + fname_len + extra_len

    name = zip_bytes[30 : 30 + fname_len].decode("utf-8", errors="replace")

    data_start = header_len

    data_end = data_start + comp_size

    payload = zip_bytes[data_start:data_end]

    if method == 0:  # stored

        return name, payload

    if method == 8:  # deflate

        import zlib

        # raw DEFLATE stream

        decompressed = zlib.decompress(payload, -zlib.MAX_WBITS)

        return name, decompressed

    raise ValueError(f"Unsupported compression method {method}")

def is_permutation_0_255(block: bytes) -> bool:

    if len(block) != 256:

        return False

    # Fast-ish check: all bytes distinct and cover 0..255

    return len(set(block)) == 256

def rc4_decrypt_from_state(s: bytes, i0: int, j0: int, data: bytes) -> bytes:

    S = bytearray(s)

    i = i0

    j = j0

    out = bytearray(len(data))

    for n, b in enumerate(data):

        i = (i + 1) & 0xFF

        j = (j + S[i]) & 0xFF

        S[i], S[j] = S[j], S[i]

        k = S[(S[i] + S[j]) & 0xFF]

        out[n] = b ^ k

    return bytes(out)

def score_printable(data: bytes) -> float:

    if not data:

        return 0.0

    printable = 0

    for b in data:

        if b in (9, 10, 13) or 32 <= b <= 126:

            printable += 1

    return printable / len(data)

def main() -> int:

    if len(sys.argv) != 2:

        print(__doc__.strip())

        return 2

    core_path = sys.argv[1]

    with open(core_path, "rb") as f:

        core = f.read()

    # Step 1-3: carve/repair/extract config.bin

    zips = carve_zips(core)

    if not zips:

        print("No candidate ZIP blobs found in core")

        return 1

    config_bin = None

    config_zip_bytes = None

    for z in sorted(zips, key=lambda x: x.zip_end - x.zip_start):

        raw = core[z.zip_start : z.zip_end]

        try:

            repaired = repair_zeroed_local_header(raw)

            name, payload = extract_stored_file_from_single_entry_zip(repaired)

            if name.endswith("config.bin") and payload:

                config_bin = payload

                config_zip_bytes = repaired

                break

        except Exception:

            continue

    if config_bin is None:

        print("Failed to extract config.bin from any carved ZIP")

        return 1

    print(f"Extracted config.bin ({len(config_bin)} bytes)")

    # Step 4-5: scan for RC4 state and decrypt

    targets = [b"shellmates{", b"{", b"[", b"C2", b"http", b"https"]

    best = (0.0, None, None)

    # The last valid window starts at len(core) - 258 (256 bytes of S plus i and j).

    for off in range(len(core) - 257):

        s = core[off : off + 256]

        if not is_permutation_0_255(s):

            continue

        i0 = core[off + 256]

        j0 = core[off + 257]

        pt = rc4_decrypt_from_state(s, i0, j0, config_bin)

        if b"shellmates{" in pt:

            text = pt.decode("utf-8", errors="replace")

            with open("recovered_config.txt", "w", encoding="utf-8") as f:

                f.write(text)

            print(f"FOUND FLAG using state at core offset {off} (i={i0}, j={j0})")

            print(text)

            return 0

        sc = score_printable(pt)

        if sc > best[0]:

            best = (sc, off, pt)

        # Early accept: looks like JSON-ish and pretty printable

        if sc > 0.9 and any(t in pt for t in targets):

            text = pt.decode("utf-8", errors="replace")

            with open("recovered_config.txt", "w", encoding="utf-8") as f:

                f.write(text)

            print(f"Likely config using state at core offset {off} (i={i0}, j={j0})")

            print(text)

            return 0

    if best[1] is not None:

        sc, off, pt = best

        print(f"No flag found. Best printable candidate: score={sc:.3f} at offset={off}")

        print(pt)

    return 1

if __name__ == "__main__":

    raise SystemExit(main())

>References (format + crypto)

These are the references I leaned on for the “why this works” parts:

  1. PKWARE ZIP AppNote (official format details, including Local File Header = 30 bytes, Central Directory, EOCD)
  2. ZIP structure overview (EOCD/CDFH/Local header layout; helpful for quick recall)
  3. RC4 state and PRGA description (why S[256] + i,j is RC4)