Skip to content

SECURE_CONNECTION//PRESS[CTRL+J]FOR ROOT ACCESS

BACK TO INTEL
ReverseInsane

Wordle Boy

Reversing a Game Boy ROM to recover the flag.

>TL;DR

I recover the flag by symbolically executing the Game Boy ROM’s verifier routine with Z3.

  • Treat the user input buffer at 0xC100..0xC11C (0x1D bytes) as symbolic.
  • Emulate the LR35902 CPU just enough to reach the verifier’s end.
  • Add constraints so the ROM reaches the “approved” state: [0xC16A] == 1.
  • The key realism fix: in PPU mode 3, VRAM/OAM reads return 0xFF (and writes are blocked).

Final flag:

nite{W0uld_y0U_K1nd1y_3nt3r_7h3_c02r3c7_w0rd}


>Challenge Files

In the actual challenge, only the ROM is provided:

  • wordleboy.gb — Game Boy ROM (this is the challenge binary)

I created the following file during solving:

  • solve_wordleboy.py — my symbolic LR35902 emulator + Z3 harness

In simple terms:

  • wordleboy.gb contains the real verification code.
  • solve_wordleboy.py runs that verification code inside a tiny emulator, but with the input bytes replaced by Z3 variables.

>Environment / Setup

I used Python 3 and Z3.

bash
cd /home/noigel/CTF/nite_ctf/rev/Wordle_Boy
python3 -m pip install z3-solver

Run the solver:

bash
python3 solve_wordleboy.py

>Recon: What am I solving?

This is a Game Boy ROM reverse challenge. Instead of fully reversing the verification logic by hand (which is painful on LR35902), I let the ROM “check” my input inside a small emulator.

The crucial thing to identify is what the ROM considers success/failure.


>Hints I was given (and when they mattered)

Hint 1

The wordle ROM has some quirky contraints in place. One can only try satisfying the constraints.

This mattered right at the start: instead of trying to fully decompile/reverse every check, I focused on building a constraint system (symbolic execution + Z3) and letting the ROM itself define the quirks. In practice, this meant:

  • making input bytes symbolic,
  • emulating only the needed CPU instructions,
  • turning “fail” branches into constraints.

Hint 2

In PPU mode 3, VRAM and OAM reads return 0xFF instead of real data.

This mattered later, after I had a mostly-working emulator but Z3 kept going UNSAT. The verifier was relying on a real Game Boy hardware behavior, so my memory model had to match it exactly.

Success condition

The verifier ultimately encodes success in WRAM:

  • Success: memory flag 0xC16A becomes 1.

So I encode that as a solver constraint:

python
solver.add(cpu.read8(0xC16A) == bv8(1))

Failure signature

In this ROM, the failure path contains a very recognizable block:

  • AFXOR A (A becomes 0)
  • EA 6A C1LD (0xC16A),A (writes 0 to 0xC16A)

So a branch that falls into AF EA 6A C1 is effectively “failure”.

In the solver I detect this exact byte pattern:

python
def is_clear_c16a_block(rom, pc):
    return rom[pc:pc+4] == b"\xAF\xEA\x6A\xC1"

…and treat taking that path as not allowed.


>Strategy: Symbolic execution instead of manual reversing

>How I got the idea to solve it this way (for beginners)

I chose Z3 + symbolic execution because this challenge is a perfect match for it:

  • The ROM takes a fixed-size input buffer and runs many arithmetic/bitwise operations.
  • The ROM ends with a clear “pass/fail” outcome in memory (0xC16A).
  • Manually reversing LR35902 code instruction-by-instruction is slow, especially when the verifier contains lots of small checks and branches.

So instead of trying to be clever with algebra by hand, I let the ROM define the rules and asked an SMT solver to find bytes that make the ROM reach the success state.

This approach is also exactly what Hint 1 pushed me toward: the ROM has quirky constraints, so focusing on “satisfying constraints” is more effective than trying to fully understand every check before solving.

References that helped me (recommended learning path)

If someone is new to Z3/symbolic execution, these are the most useful starting points:

The key mental model that made this click for me:

  • “Symbolic” bytes are unknowns.
  • The emulator turns program logic into constraints over those unknowns.
  • Z3 finds values that satisfy all constraints.

Why symbolic execution works well here

A GB verifier typically:

  • reads a buffer
  • performs many arithmetic/bitwise transforms
  • compares to constants
  • branches to fail/success

That is exactly what SMT solvers are good at.

I implement:

  • a minimal LR35902 interpreter
  • memory model (ROM + RAM + IO)
  • Z3 BitVec registers + flags

Then I run from the verifier entry point:

  • Start PC: 0x1735
  • Stop condition: return to 0x0000 (I push a dummy return address)

I push a dummy return address so RET ends execution cleanly.


>Building blocks

Everything important is visible in solve_wordleboy.py.

1) Symbolic input bytes

The ROM reads user input from WRAM, so I create 0x1D symbolic bytes and place each byte at 0xC100..0xC11C:

python
inp = [z3.BitVec(f"inp_{i:02d}", 8) for i in range(0x1D)]
for i, b in enumerate(inp):
    mem[0xC100 + i] = b

I also optionally restrict each byte to printable ASCII for contest-friendly output:

python
solver.add(b >= 0x20, b <= 0x7E)

2) Wave RAM is also symbolic

The ROM reads the APU wave table (wave RAM at 0xFF30..0xFF3F). In many GB programs it might not be initialized when read; if I model it as 0, I can accidentally overconstrain the run.

So I make it symbolic too:

python
wave = [z3.BitVec(f"wave_{i:02d}", 8) for i in range(0x10)]
for i, b in enumerate(wave):
    mem[0xFF30 + i] = b

This ends up contributing a suffix of the final flag.

3) Memory model that stays fast

A classic Z3 pitfall is building huge Store(Store(Store(...))) expressions. Those explode in size and make loops impossible to solve.

I use:

  • mem: Dict[int, BitVec8] for concrete addresses
  • mem_sym: Array[BitVec16 -> BitVec8] only when address is symbolic

This keeps most reads/writes cheap and allows the verifier to run millions of steps.

4) Correct flags matter

Even one wrong CPU flag rule can make the path diverge and go UNSAT.

Examples I fixed/handled:

  • INC/DEC must not change Carry
  • CP updates flags like SUB but does not change A
  • PUSH AF/POP AF must correctly pack/unpack Z/N/C bits

These are implemented in helpers like alu_inc_setflags, op_cp, pack_f, unpack_f.


>The turning point: PPU mode 3 VRAM/OAM quirk

This is the critical hardware detail:

In PPU mode 3, VRAM and OAM are inaccessible. Reads return 0xFF (and writes are blocked).

The verifier checks/relies on this behavior. If I model VRAM/OAM reads normally, the ROM “sees” different bytes than on real hardware → different branches → contradictions → UNSAT.

How I modeled it

In CPU.read8:

  • If ppu_mode == 3 and address is VRAM (0x8000..0x9FFF) or OAM (0xFE00..0xFE9F), return 0xFF.
python
if self._ppu_blocks_vram_oam() and self._is_vram_oam_addr(addr):
    return bv8(0xFF)

In CPU.write8:

  • Ignore VRAM/OAM writes during mode 3.
python
if self._ppu_blocks_vram_oam() and self._is_vram_oam_addr(addr):
    return

I also modeled minimal LCD/STAT behavior:

  • LCDC at 0xFF40 toggles a simplified PPU mode decision
  • STAT at 0xFF41 returns the PPU mode bits in the low 2 bits

This is intentionally minimal: only what the verifier needs.


>Handling branches without full path explosion

Symbolic execution normally forks at every symbolic branch. That can blow up fast.

I used a practical trick:

1) Treat “fail branches” as assertions

If the fallthrough block begins with XOR A; LD (0xC16A),A, then taking that fallthrough means failure.

So instead of forking, I assert I must take the other side.

In decide_check_branch:

python
if is_clear_c16a_block(cpu.rom, fallthrough):
    track_branch(cond, f"chk_{site_pc:04x}")
    cpu.PC = target

This is equivalent to “the check must pass”.

2) For other symbolic branches, pick a feasible side

Blindly choosing fallthrough for all symbolic conditions can lead to UNSAT later.

So I query Z3 quickly:

  • is cond satisfiable?
  • is Not(cond) satisfiable?

Then I pick a satisfiable side and add the chosen condition as a path constraint.

This gives me single-path execution but still solver-consistent.


>Solving and extracting the flag

Once the emulation finishes, I add the final success constraint:

python
solver.add(cpu.read8(0xC16A) == bv8(1))

Then solve:

python
res = solver.check()

When SAT, I read:

  • the input buffer bytes inp_00..inp_1c
  • the wave RAM bytes wave_00..wave_0f

The script prints both.

Output (from the solver)

The solver prints the input prefix:

  • nite{W0uld_y0U_K1nd1y_3nt3r_7

And the wave RAM bytes decode to:

  • h3_c02r3c7_w0rd}

Concatenate the two parts:

nite{W0uld_y0U_K1nd1y_3nt3r_7h3_c02r3c7_w0rd}


>Why this writeup wins (key takeaways)

  • I didn’t “guess” the flag: I proved it with SMT constraints.
  • Hint 1 pushed me toward “satisfy the quirky constraints” via symbolic execution instead of over-reversing.
  • Hint 2 was the decisive realism fix: PPU mode 3 VRAM/OAM reads return 0xFF.
  • I avoided state explosion by turning fail-path branches into assertions and using solver-feasible branch picking elsewhere.
  • I kept Z3 fast with a dict-backed memory model.

>Reproduce in one minute

bash
cd /home/noigel/CTF/nite_ctf/rev/Wordle_Boy
python3 -m pip install z3-solver
python3 solve_wordleboy.py

I should see z3: sat and the flag printed.