//Allo! (PCAP)
Category: Misc
Author: Houssem0x1
Flag format: nexus{}
>TL;DR
This PCAP is mostly VoIP traffic. The flag is spoken over two RTP audio streams (G.711 μ-law). Extract the RTP payloads, convert to WAV, listen, and you’ll hear:
Flag: nexus{1337483127*$}
>1) Initial triage (what’s inside the PCAP?)
Start by identifying which protocols are present. The fastest approach is tshark's protocol hierarchy statistics:
tshark -r capture.pcapng -q -z io,phs
What we learn from the hierarchy:
- There is a small amount of DNS/HTTP/TLS.
- There is SIP signaling.
- There is a huge amount of RTP (real-time media).
That combination (SIP + RTP) strongly suggests a VoIP phone call capture, where:
- SIP sets up the call (INVITE/OK/BYE)
- RTP carries the actual audio
>2) Confirm VoIP streams (RTP streams list)
Next, list RTP streams:
tshark -r capture.pcapng -q -z rtp,streams
This prints every RTP stream along with its SSRC and codec. In this challenge, the payload is g711U (G.711 μ-law), which is 8000 Hz mono.
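To make the codec column less of a black box, here is a minimal sketch of how the fixed 12-byte RTP header (RFC 3550) is decoded, and how payload type 0 maps to G.711 μ-law (RFC 3551). The function name `parse_rtp_header` and the example packet are hypothetical and built only for illustration; tshark already does this for you.

```python
import struct

# Static payload types from RFC 3551 that are relevant to G.711.
STATIC_PAYLOAD_TYPES = {
    0: "PCMU (G.711 mu-law, 8000 Hz)",
    8: "PCMA (G.711 A-law, 8000 Hz)",
}

def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    return {
        "version": b0 >> 6,
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": f"0x{ssrc:08x}",
        # Payload starts after the fixed header and any CSRC entries
        # (header extensions, if present, are ignored in this sketch).
        "payload": packet[12 + 4 * csrc_count:],
    }

# Hypothetical packet built for illustration (not copied from the capture).
example = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x6E9C0AA7) + b"\xff" * 160
hdr = parse_rtp_header(example)
print(hdr["ssrc"], STATIC_PAYLOAD_TYPES.get(hdr["payload_type"], "dynamic/other"))
```

The SSRC field is what lets us separate the streams later: every packet of one audio source carries the same 32-bit SSRC.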
At this point you can solve it in two different ways:
- Wireshark GUI (easiest for humans)
- Command-line extraction (repeatable, good for writeups)
I’ll show both.
>3A) Wireshark GUI method (quickest)
- Open the PCAP in Wireshark.
- Go to Telephony → VoIP Calls.
- Select the detected call, then click Player or Prepare Filter / Play Streams (depends on Wireshark version).
- In RTP Streams, you can:
  - Select a stream and Analyze / Play
  - Or Save the audio
In this capture there are 5 RTP streams, but only two contain clear speech:
- stream_0x6e9c0aa7.wav
- stream_0x2839728f.wav
The other streams are silent/empty (RMS near 0 when decoded), which is common in VoIP captures due to:
- one-way audio in some directions,
- comfort noise / muted direction,
- unused negotiated streams.
Listening to the two speech streams reveals the flag read out loud.
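If you would rather not listen to every stream, the "RMS near 0" check can be automated. The sketch below decodes G.711 μ-law bytes to 16-bit linear samples with the standard expansion formula and computes the RMS level. The helper names `ulaw_to_pcm16` and `rms` are my own, and the two test buffers are synthetic, not taken from the capture.

```python
import math

def ulaw_to_pcm16(byte: int) -> int:
    """Expand one G.711 mu-law byte to a signed linear sample."""
    u = ~byte & 0xFF                      # mu-law bytes are stored inverted
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if u & 0x80 else magnitude

def rms(ulaw_bytes: bytes) -> float:
    """Root-mean-square level of a mu-law buffer after decoding."""
    if not ulaw_bytes:
        return 0.0
    total = sum(ulaw_to_pcm16(b) ** 2 for b in ulaw_bytes)
    return math.sqrt(total / len(ulaw_bytes))

# Synthetic buffers: 0xFF decodes to 0 (digital silence),
# 0x80 decodes to the largest positive sample.
silence = b"\xff" * 8000
loud = b"\x80" * 8000
print(rms(silence), rms(loud))
```

Running `rms()` over each extracted payload file and sorting by the result should put the speech streams at the top; anything close to zero can be skipped.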
>3B) Command-line method (repeatable and contest-friendly)
Step 1 — Export RTP payloads (by SSRC)
First, extract each RTP stream’s payload into raw μ-law (.ulaw) files.
Because tshark can print the RTP payload as hex, we can reassemble it safely.
Example script (extracting the SSRCs reported by tshark -z rtp,streams):
python3 - <<'PY'
import subprocess

# SSRC values taken from: tshark -r capture.pcapng -q -z rtp,streams
ssrcs = [
    "0x2839728f",
    "0x66556019",
    "0x644366bb",
    "0x6e9c0aa7",
    "0x69cc8abd",
]

for ssrc in ssrcs:
    # Print only the RTP payload (as hex) of the packets belonging to this SSRC
    cmd = [
        "tshark", "-r", "capture.pcapng",
        "-Y", f"rtp.ssrc=={ssrc}",
        "-T", "fields",
        "-e", "rtp.payload",
    ]
    out = subprocess.check_output(cmd)
    lines = out.decode().splitlines()
    # Each line is hex bytes (sometimes with ':' separators depending on tshark)
    data = b"".join(bytes.fromhex(line.replace(":", "")) for line in lines if line)
    fname = f"stream_{ssrc}.ulaw"
    with open(fname, "wb") as f:
        f.write(data)
    print(ssrc, len(lines), "packets", len(data), "bytes")
PY
After this, you’ll have five .ulaw files.
Step 2 — Convert μ-law to WAV
G.711 μ-law is standard telephony audio: 8 kHz, mono.
Convert each stream with ffmpeg:
for f in stream_*.ulaw; do
    ffmpeg -y -f mulaw -ar 8000 -ac 1 -i "$f" "${f%.ulaw}.wav"
done
Now you’ll have five WAVs.
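As a quick sanity check on the conversion, the sketch below uses Python's standard `wave` module to report each file's channel count, sample rate, and duration. It assumes ffmpeg wrote its default 16-bit PCM WAV output; `describe_wav` is a helper name I made up for this writeup.

```python
import glob
import wave

def describe_wav(path: str) -> dict:
    """Read basic parameters from a PCM WAV file."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": w.getsampwidth(),
            "duration_s": frames / rate,
        }

# Report on every converted stream in the current directory (if any).
for path in sorted(glob.glob("stream_*.wav")):
    print(path, describe_wav(path))
```

Each file should report 1 channel at 8000 Hz. A different sample rate usually means the `-ar 8000` input option was dropped, which makes the speech play back at the wrong speed.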
Step 3 — Listen and identify the spoken flag
Open the WAV files in any audio player.
Only these two contain the spoken message:
- stream_0x6e9c0aa7.wav
- stream_0x2839728f.wav
The other WAVs decode to silence (no meaningful audio).
The voice in the audio gives the flag:
nexus{1337483127*$}
>4) Why there were “extra” streams
It’s normal to see multiple RTP streams in SIP/RTP captures:
- One stream per direction (caller → callee, callee → caller)
- Sometimes extra negotiated streams that never carry audio
- Sometimes NAT/phone behavior causes one direction to be silent
So the key is not “how many streams exist”, but “which streams actually contain speech”.
>Final Flag
nexus{1337483127*$}