//Lutsoffun Reverse
>Executive Summary
The final challenge in Infobahn CTF 2025 dropped competitors into the deep end of FPGA reverse-engineering. We received only a Xilinx 7-series bitstream (crackme.bit) and were asked to recover the hidden flag. Through structural analysis, custom simulation harnesses, and a heap of perseverance, we identified an AXI-Lite peripheral embedded in the netlist, scripted bus transactions to probe its pointer-addressed data, and finally reconstructed the flag:
infobahn{b1t5tr4mR3vers1ngCanB3 H4rd, bUt somet1mes you don't n33d to reverse everything :0}
This writeup documents the full journey, including setup, tooling, reverse-engineering methodology, roadblocks, and all the code used along the way. It is intentionally comprehensive—suitable both for contest judges and for anyone looking to sharpen their FPGA and hardware-reversing chops.
>Challenge Overview
- Artifact provided: crackme.bit, an unknown Vivado-generated bitstream for a Xilinx Artix-7 device.
- Goal: Recover the secret flag hidden somewhere in the design.
- Hints (from challenge flavor text): the challenge name "lutsoffun" implies the solution lives in LUT contents; "infobahn" branding suggests AXI (the "Infobahn") may be involved.
The implicit twist: reversing a full bitstream is notoriously difficult. Instead of chasing every LUT INIT parameter, the intended solve path hinged on reconstructing a synthesized netlist, simulating the logic, and interrogating its memory-mapped interface.
>Toolchain & Environment
| Purpose | Tooling | Notes |
|---|---|---|
| Bitstream decompilation | RapidWright + fasm2bels | Converts bitstreams back into structural Verilog |
| Netlist simplification | Yosys | Flattened hierarchy, resolved primitives |
| Simulation | Icarus Verilog (iverilog, vvp) | Used alongside Xilinx cells_sim.v models |
| Analysis | Python 3, shell utilities | Parsing JSON netlists & simulation logs |
| Host system | Linux (zsh) | All commands below assume POSIX shell |
All needed assets live inside the provided workspace (/home/uchiha/Desktop/infoban/reverse/lutsoffun).
>From Bitstream to Gate-Level Netlist
- Run fasm2bels:
python f4pga-xc-fasm2bels/tools/fasm2bels.py \
  --db-root prjxray/database/artix7 \
  --part xc7a35ticsg324-1L \
  --bitstream crackme.bit \
  --verilog crackme_fasm2bels.v \
  --xdc crackme_fasm2bels.xdc
This produced a verbose netlist littered with device-specific primitives (LUT6, FDCE, etc.).
- Flatten & simplify with Yosys:
yosys <<'YOSYS'
read_verilog crackme_fasm2bels.v
hierarchy -top top
flatten
write_verilog crackme_flat.v
write_json top_flat.json
YOSYS
- Manual pruning: After some trial and error, the essential logic coalesced in top_axi_wrapper.v (AXI pad mapping) and a cleaned-up version of the DUT we named crackme_simp.v.
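A quick way to sanity-check the flattened netlist is to summarize the Yosys JSON by cell type. The sketch below assumes the usual write_json layout (a top-level "modules" object whose entries carry "ports" and "cells", each cell with a "type" field). The helper name summarize_netlist and the tiny inline netlist are ours, made up for illustration, and are not taken from the real design.

```python
import json
from collections import Counter


def summarize_netlist(netlist: dict, top: str = "top"):
    """Return (sorted port names, Counter of cell types) for one module."""
    module = netlist["modules"][top]
    ports = sorted(module.get("ports", {}))
    cell_types = Counter(cell["type"] for cell in module.get("cells", {}).values())
    return ports, cell_types


# Hand-written stand-in for top_flat.json (hypothetical contents).
example = json.loads("""
{"modules": {"top": {
  "ports": {"clk": {"direction": "input", "bits": [2]}},
  "cells": {
    "lut_a": {"type": "LUT6", "parameters": {}},
    "lut_b": {"type": "LUT6", "parameters": {}},
    "ff_a":  {"type": "FDCE", "parameters": {}}
  }
}}}
""")

ports, cell_types = summarize_netlist(example)
print(ports)
for cell_type, count in cell_types.most_common():
    print(f"{cell_type:8s} {count}")
```

For the real file, the same function could be fed with json.load(open("top_flat.json")), assuming the top module is still called top after flattening.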
>Identifying the AXI-Lite Interface
Peeking at top_axi_wrapper.v revealed a standard AXI-Lite slave. Pin names such as S_ARADDR, S_RDATA, S_WDATA, and handshake signals (S_ARVALID, S_RVALID, etc.) were clearly preserved. That made the path forward obvious: simulate the design, treat it like a memory-mapped peripheral, and probe its address space.
The minimal testbench (tb_axi.v) confirmed that the slave responded at addresses 0x00–0x10. With additional instrumentation we mapped out the register roles:
| Address | Purpose |
|---|---|
| 0x00 | Constant ready/status (always 0xdeadbeef) |
| 0x04 | Write-only pointer register |
| 0x08 | Read-only status word (mirrors pointer after data becomes valid) |
| 0x0C | Read-only 32-bit chunk (LUT-fed word) |
| 0x10 | Read-only progress counter |
The flag bytes clearly lived behind an indirect addressing scheme controlled by pointer writes.
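The access pattern implied by the register map (write the pointer, then read status, chunk, and progress) can be sketched host-side in Python. Everything here is an illustrative assumption: bus_write and bus_read are hypothetical callables standing in for whatever transport reaches the peripheral, and FakeDevice is a toy whose pointer selects a word one-to-one, which is not how the real design behaves (the simulation logs show several pointer values returning the same word).

```python
from typing import Callable, List, Tuple

# Register offsets as listed in the table above.
REG_CONST, REG_POINTER, REG_STATUS, REG_CHUNK, REG_PROGRESS = 0x00, 0x04, 0x08, 0x0C, 0x10


def dump_words(
    bus_write: Callable[[int, int], None],
    bus_read: Callable[[int], int],
    first: int,
    last: int,
) -> List[Tuple[int, int, int, int]]:
    """Sweep the pointer register and collect (ptr, status, chunk, progress)."""
    records = []
    for ptr in range(first, last + 1):
        bus_write(REG_POINTER, ptr)
        status = bus_read(REG_STATUS)
        chunk = bus_read(REG_CHUNK)
        progress = bus_read(REG_PROGRESS)
        records.append((ptr, status, chunk, progress))
    return records


class FakeDevice:
    """Toy stand-in for the peripheral (hypothetical, NOT the real logic)."""

    def __init__(self, words: List[int]) -> None:
        self.words = words
        self.pointer = 0

    def write(self, addr: int, value: int) -> None:
        if addr == REG_POINTER:
            self.pointer = value

    def read(self, addr: int) -> int:
        index = self.pointer % len(self.words)
        if addr == REG_CONST:
            return 0xDEADBEEF
        if addr == REG_STATUS:
            return 1
        if addr == REG_CHUNK:
            return self.words[index]
        if addr == REG_PROGRESS:
            return index
        return 0


dev = FakeDevice([0x11111111, 0x22222222, 0x33333333])
records = dump_words(dev.write, dev.read, 0, 3)
for ptr, status, chunk, progress in records:
    print(f"ptr={ptr} status={status:08x} chunk={chunk:08x} progress={progress}")
```

Passing the bus accessors in as callables keeps the sweep independent of the transport, so the same loop could in principle drive a simulator wrapper or real hardware.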
>Building Smarter Testbenches
Repeated Read Bench (tb_axi_repeat.v)
Purpose: observe how the peripheral behaves if we slam multiple reads on the same pointer value.
Key snippet:
for (idx = 1; idx <= 10; idx = idx + 1) begin
    axi_write(5'h04, idx);
    repeat (3) begin
        axi_read(5'h08, status);
        axi_read(5'h0c, chunk);
        axi_read(5'h10, data);
        $display("ptr=%0d status=%08x chunk=%08x data=%08x", idx, status, chunk, data);
    end
end
This established that each pointer value produced deterministic chunk data, and the status register flipped high once data stabilized.
Exhaustive Scan Bench (tb_axi_scan.v)
Purpose: brute-force the entire 5-bit address space (0x00–0x1F) for both reads and writes to catch any hidden behavior. It confirmed no other registers mattered—the pointer at 0x04 was the golden ticket.
Dump Bench (tb_axi_dump.v)
Purpose: iterate pointer values, log resulting chunks, and halt when repetition indicated the message ended.
Final version (loop trimmed for display):
`timescale 1ns/1ps
module tb_axi_dump;
    reg ACLK = 0;
    reg ARESETn = 0;
    /* ... AXI signal declarations ... */
    top_axi_wrapper dut (/* port map */);
    always #5 ACLK = ~ACLK;
    task automatic axi_reset; /* ... as in full file ... */ endtask
    task automatic axi_read(input [4:0] addr, output [31:0] data); /* ... */ endtask
    task automatic axi_write(input [4:0] addr, input [31:0] data); /* ... */ endtask
    integer idx;
    reg [31:0] status, data, chunk;
    initial begin
        axi_reset();
        S_BREADY = 1'b1; // keep write response acknowledged
        for (idx = 1; idx <= 120; idx = idx + 1) begin
            axi_write(5'h04, idx);
            axi_read(5'h08, status);
            axi_read(5'h0c, chunk);
            axi_read(5'h10, data);
            $display("idx=%0d status=0x%08x data=0x%08x chunk=0x%08x", idx, status, data, chunk);
        end
        #100;
        $finish;
    end
endmodule
Running this bench generates sim_axi_dump.log, each line capturing a pointer index, status progression, and the raw 32-bit data word.
>Simulation & Data Acquisition
Compile & execute the dump bench (Icarus requires the Xilinx primitive models):
iverilog -g2012 -o sim_axi_dump.vvp \
tb_axi_dump.v top_axi_wrapper.v crackme_simp.v \
f4pga-xc-fasm2bels/env/conda/envs/f4pga_xc_fasm2bels/share/yosys/xilinx/cells_sim.v
vvp sim_axi_dump.vvp > sim_axi_dump.log
A truncated sample of the resulting log:
idx=31 status=0x00000001 data=0x00000008 chunk=0x6574316d
idx=32 status=0x00000001 data=0x00000008 chunk=0x6573206d
idx=33 status=0x00000001 data=0x00000008 chunk=0x6573206d
idx=34 status=0x00000001 data=0x00000008 chunk=0x6573206d
idx=35 status=0x00000000 data=0x00000009 chunk=0x6f752079
idx=36 status=0x00000001 data=0x00000009 chunk=0x6f752079
...
idx=78 status=0x00000000 data=0x00000013 chunk=0x65727376
idx=79 status=0x00000000 data=0x00000014 chunk=0x20657665
idx=80 status=0x00000001 data=0x00000014 chunk=0x20657665
...
idx=118 status=0x00000000 data=0x00000006 chunk=0x72733165
idx=119 status=0x00000000 data=0x00000007 chunk=0x6743616e
idx=120 status=0x00000001 data=0x00000007 chunk=0x6743616e
The key learnings at this stage:
- Each chunk is a four-byte word that repeats after pointer saturation.
- In the sample, the status register reads 0 on the iteration where data advances and returns to 1 once the word has settled, reinforcing that a sliding window or queue is being read.
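These observations can be checked mechanically by parsing the log into records and listing the pointer values at which the chunk changes. The following is a minimal sketch that assumes the exact $display format used by the dump bench; the names Record and parse_log are ours, and the sample lines are copied from the log excerpt above.

```python
import re
from typing import List, NamedTuple

# Matches the line format emitted by the dump bench's $display call.
LINE_RE = re.compile(
    r"idx=(\d+) status=0x([0-9a-fA-F]{8}) data=0x([0-9a-fA-F]{8}) chunk=0x([0-9a-fA-F]{8})"
)


class Record(NamedTuple):
    idx: int
    status: int
    data: int
    chunk: int


def parse_log(text: str) -> List[Record]:
    """Turn dump-log text into a list of Record tuples, skipping other lines."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            records.append(
                Record(
                    int(match.group(1)),
                    int(match.group(2), 16),
                    int(match.group(3), 16),
                    int(match.group(4), 16),
                )
            )
    return records


sample = """\
idx=33 status=0x00000001 data=0x00000008 chunk=0x6573206d
idx=34 status=0x00000001 data=0x00000008 chunk=0x6573206d
idx=35 status=0x00000000 data=0x00000009 chunk=0x6f752079
idx=36 status=0x00000001 data=0x00000009 chunk=0x6f752079
"""

records = parse_log(sample)
# Pointer values at which a new chunk first appears.
changes = [cur.idx for prev, cur in zip(records, records[1:]) if cur.chunk != prev.chunk]
print(changes)
```

On this excerpt the only change point is idx=35, which is also the line where status reads 0 and the data counter steps from 8 to 9.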
>Decoding the Flag
The raw words looked puzzling: ASCII, but misaligned. Noticing that 0x6e666f69 corresponds to the letters n f o i, we hypothesized a byte rotation. Indeed, rotating each 32-bit word right by 8 bits (i.e., moving the last byte to the front, so that n f o i becomes i n f o) produced legible text.
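As a worked example of that rotation, the sketch below applies it at the integer level to 0x6e666f69 from the text and to one word from the log excerpt. The function names rotate_right_8 and word_to_text are ours.

```python
def rotate_right_8(word: int) -> int:
    """Rotate a 32-bit word right by 8 bits (low byte moves to the top)."""
    return ((word >> 8) | (word << 24)) & 0xFFFFFFFF


def word_to_text(word: int) -> str:
    """Rotate, then interpret the four bytes as big-endian ASCII."""
    return rotate_right_8(word).to_bytes(4, "big").decode("ascii")


for raw in (0x6E666F69, 0x6574316D):
    print(f"{raw:08x} -> {rotate_right_8(raw):08x} -> {word_to_text(raw)!r}")
# 6e666f69 -> 696e666f -> 'info'
# 6574316d -> 6d657431 -> 'met1'
```

This is equivalent to the byte-slice form (last byte moved to the front); the shift-and-mask version only makes the 8-bit rotation explicit.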
Python script used to collect unique chunks and decode them:
chunks = []
with open("sim_axi_dump.log") as f:
    for line in f:
        if "chunk=" in line:
            word = int(line.strip().split("chunk=0x")[1], 16)
            if word not in chunks:
                chunks.append(word)
msg = ""
for word in chunks:
    b = word.to_bytes(4, "big")
    rotated = b[-1:] + b[:-1]  # rotate right by one byte
    msg += rotated.decode("ascii")
print(msg)
Output:
infobahn{b1t5tr4mR3vers1ngCanB3 H4rd, bUt somet1mes you don't n33d to reverse everything :0}
Interestingly, the message underscores the challenge’s moral: we did not need to reverse every LUT—just enough to leverage the existing interface.
>Explaining the CTF Problem Design
The CTF constructors embedded a fully-functional AXI-Lite slave inside the FPGA bitstream. The hidden ROM (constructed from LUTs feeding FF chains) stored the plaintext flag but required indirect access via the pointer register. The design simulated an obfuscated firmware dump scenario: in real hardware, you would talk to the device over AXI (or via JTAG-to-AXI) and read the data just as we did in simulation.
By seeding the challenge with Xilinx primitives, they nudged solvers toward using fasm2bels + simulation, rather than a naïve bitstream diff. The repetition of words and monotonic data counter implied a ring buffer structure, hinting at the rotate transformation. The story-telling in the flag ("sometimes you don't n33d to reverse everything") drove the point home.
>Roadblocks & Solutions
- Primitive models missing: Initial simulations failed because Icarus could not resolve Xilinx primitives. Pulling cells_sim.v from the f4pga environment solved it.
- Write response stalling: AXI write transactions hung until we asserted S_BREADY. Holding it high throughout the dump loop eliminated the deadlock.
- Determining termination: The pointer register continued incrementing, so we chose to sweep through 120 indices, logging unique chunks and stopping once repetition saturated the message.
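The termination heuristic from the last bullet can also be written as an explicit wrap-around check rather than a global "seen before" filter. This is a sketch under stated assumptions, not the method we actually ran: it assumes the stream is cyclic, that the first observed word does not occur twice within one pass of the message, and that no word legitimately appears twice in a row. The helper name extract_message_words and the toy stream are ours.

```python
from itertools import groupby
from typing import List


def extract_message_words(words: List[int]) -> List[int]:
    """Collapse consecutive repeats, then cut at the first wrap-around."""
    collapsed = [word for word, _ in groupby(words)]
    if not collapsed:
        return []
    first = collapsed[0]
    for i in range(1, len(collapsed)):
        if collapsed[i] == first:
            # The first word reappeared: one full pass has been seen.
            return collapsed[:i]
    return collapsed  # never wrapped within the sweep


# Toy stream: each word is held for a few pointer values, then the ring wraps.
stream = [0xA, 0xA, 0xB, 0xB, 0xB, 0xC, 0xA, 0xA, 0xB]
print([hex(w) for w in extract_message_words(stream)])
# ['0xa', '0xb', '0xc']
```

Unlike a global de-duplication, this keeps a word that legitimately recurs later in the message, at the cost of the assumptions listed above.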
>Lessons Learned
- Partial reverse-engineering beats full reconstruction: Extracting a clean interface description is often more productive than fighting low-level primitives.
- Simulation is your friend: Even without hardware, netlist simulation reveals behavior quickly.
- Look for structural hints: Repeated 32-bit words and counters all but announced the data serialization trick.
- Automate decoding: A short Python script can outperform hours of manual analysis.
>Flag
infobahn{b1t5tr4mR3vers1ngCanB3 H4rd, bUt somet1mes you don't n33d to reverse everything :0}
>Appendix A – Complete tb_axi_dump.v
`timescale 1ns/1ps
module tb_axi_dump;
    reg ACLK = 0;
    reg ARESETn = 0;
    reg [4:0] S_ARADDR = 0;
    reg S_ARVALID = 0;
    reg [4:0] S_AWADDR = 0;
    reg S_AWVALID = 0;
    reg S_BREADY = 0;
    reg S_RREADY = 0;
    reg [31:0] S_WDATA = 0;
    reg [3:0] S_WSTRB = 0;
    reg S_WVALID = 0;
    wire S_ARREADY;
    wire S_AWREADY;
    wire [1:0] S_BRESP;
    wire S_BVALID;
    wire [31:0] S_RDATA;
    wire [1:0] S_RRESP;
    wire S_RVALID;
    wire S_WREADY;
    top_axi_wrapper dut (
        .ACLK(ACLK),
        .ARESETn(ARESETn),
        .S_ARADDR(S_ARADDR),
        .S_ARVALID(S_ARVALID),
        .S_AWADDR(S_AWADDR),
        .S_AWVALID(S_AWVALID),
        .S_BREADY(S_BREADY),
        .S_RREADY(S_RREADY),
        .S_WDATA(S_WDATA),
        .S_WSTRB(S_WSTRB),
        .S_WVALID(S_WVALID),
        .S_ARREADY(S_ARREADY),
        .S_AWREADY(S_AWREADY),
        .S_BRESP(S_BRESP),
        .S_BVALID(S_BVALID),
        .S_RDATA(S_RDATA),
        .S_RRESP(S_RRESP),
        .S_RVALID(S_RVALID),
        .S_WREADY(S_WREADY)
    );
    always #5 ACLK = ~ACLK;
    task automatic axi_reset;
        begin
            ARESETn = 0;
            S_ARVALID = 0;
            S_AWVALID = 0;
            S_WVALID = 0;
            S_BREADY = 0;
            S_RREADY = 0;
            S_WSTRB = 0;
            S_WDATA = 0;
            S_AWADDR = 0;
            S_ARADDR = 0;
            repeat (5) @(negedge ACLK);
            ARESETn = 1;
            repeat (5) @(negedge ACLK);
        end
    endtask
    task automatic axi_read(input [4:0] addr, output [31:0] data);
        begin
            @(negedge ACLK);
            S_ARADDR <= addr;
            S_ARVALID <= 1'b1;
            S_RREADY <= 1'b1;
            wait (S_ARREADY === 1'b1);
            @(negedge ACLK);
            S_ARVALID <= 1'b0;
            wait (S_RVALID === 1'b1);
            data = S_RDATA;
            @(negedge ACLK);
            S_RREADY <= 1'b0;
            wait (S_RVALID === 1'b0);
        end
    endtask
    task automatic axi_write(input [4:0] addr, input [31:0] data);
        begin
            @(negedge ACLK);
            S_AWADDR <= addr;
            S_WDATA <= data;
            S_WSTRB <= 4'hF;
            S_AWVALID <= 1'b1;
            S_WVALID <= 1'b1;
            wait (S_AWREADY === 1'b1 && S_WREADY === 1'b1);
            @(negedge ACLK);
            S_AWVALID <= 1'b0;
            S_WVALID <= 1'b0;
            wait (S_BVALID === 1'b1);
            @(negedge ACLK);
        end
    endtask
    integer idx;
    reg [31:0] status;
    reg [31:0] data;
    reg [31:0] chunk;
    initial begin
        axi_reset();
        S_BREADY = 1'b1;
        for (idx = 1; idx <= 120; idx = idx + 1) begin
            axi_write(5'h04, idx);
            axi_read(5'h08, status);
            axi_read(5'h0c, chunk);
            axi_read(5'h10, data);
            $display("idx=%0d status=0x%08x data=0x%08x chunk=0x%08x", idx, status, data, chunk);
        end
        #100;
        $finish;
    end
endmodule
>Appendix B – Python Decoder Script
#!/usr/bin/env python3
# decode_chunks.py – convert AXI dump words into ASCII flag
from pathlib import Path

log_path = Path("sim_axi_dump.log")
chunks = []
for line in log_path.read_text().splitlines():
    if "chunk=" not in line:
        continue
    word = int(line.split("chunk=0x")[1], 16)
    if word not in chunks:
        chunks.append(word)


def rotate(word: int) -> str:
    data = word.to_bytes(4, "big")
    rotated = data[-1:] + data[:-1]
    return rotated.decode("ascii")


flag = "".join(rotate(word) for word in chunks)
print(flag)
>Closing Thoughts
This challenge brilliantly showcased how hardware reverse-engineering often hinges on finding the right abstraction level. Instead of decoding every LUT INIT hex string, we modeled the design, exercised its interface, and let it speak for itself. In doing so, we not only captured the flag but also gained a deeper appreciation for the interplay between FPGA toolchains and traditional binary exploitation techniques.
Huge thanks to the Infobahn organizers for crafting a puzzle that was equal parts educational and entertaining. See you on the highway next year!