//Allo! (PCAP)

Category: Misc

Author: Houssem0x1

Flag format: nexus{}

>TL;DR

This PCAP is mostly VoIP traffic. The flag is spoken over two RTP audio streams (G.711 μ-law). Extract the RTP payloads, convert to WAV, listen, and you’ll hear:

Flag: nexus{1337483127*$}

>1) Initial triage (what’s inside the PCAP?)

Start by identifying which protocols the capture contains. The fastest approach is tshark's protocol hierarchy statistics:

bash


tshark -r capture.pcapng -q -z io,phs

What we learn from the hierarchy:

- There is a small amount of DNS/HTTP/TLS.
- There is SIP signaling.
- There is a huge amount of RTP (real-time media).

That combination (SIP + RTP) strongly suggests a VoIP phone call capture, where:

- SIP sets up the call (INVITE/OK/BYE)
- RTP carries the actual audio
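
To see the SIP side of that split, the signaling can be tallied by request method and response code. The following is a minimal sketch, not part of the original solve: the helper names `sip_fields` and `summarize_sip` are hypothetical, and it assumes tshark's `sip.Method` and `sip.Status-Code` fields, printed as two tab-separated columns.

```python
import subprocess
from collections import Counter


def sip_fields(pcap_path):
    """Return one 'method<TAB>status' line per SIP packet (requires tshark)."""
    cmd = [
        "tshark", "-r", pcap_path,
        "-Y", "sip",
        "-T", "fields",
        "-e", "sip.Method",       # filled for requests (INVITE, ACK, BYE, ...)
        "-e", "sip.Status-Code",  # filled for responses (100, 180, 200, ...)
    ]
    return subprocess.check_output(cmd).decode().splitlines()


def summarize_sip(lines):
    """Count SIP requests by method and responses by status code."""
    counts = Counter()
    for line in lines:
        method, _, status = line.partition("\t")
        label = method or status  # exactly one column is expected to be set
        if label:
            counts[label] += 1
    return counts
```

Calling `summarize_sip(sip_fields("capture.pcapng"))` should show the INVITE / 200 / BYE pattern of a call setup and teardown, if the capture contains one.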

>2) Confirm VoIP streams (RTP streams list)

Next, list RTP streams:

bash


tshark -r capture.pcapng -q -z rtp,streams

This prints every RTP stream together with its SSRC and payload type. In this challenge, the payload is:

g711U (G.711 μ-law), which is sampled at 8000 Hz, mono, one byte per sample.
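
Because G.711 uses one byte per sample at 8000 samples per second, stream length follows directly from the payload size. A small sketch of that arithmetic is below; the function names are illustrative, and the 20 ms packet time is an assumption (a common default) rather than something read from this capture.

```python
SAMPLE_RATE_HZ = 8000   # G.711 sampling rate
BYTES_PER_SAMPLE = 1    # each mu-law sample is a single byte


def payload_duration_seconds(payload_bytes):
    """Audio duration represented by a number of G.711 payload bytes."""
    return payload_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)


def packets_to_seconds(packet_count, ptime_ms=20):
    """Approximate duration from a packet count.

    ptime_ms=20 is an assumed packetization interval; check the SDP or the
    payload size per packet (160 bytes at 20 ms) before relying on it.
    """
    return packet_count * ptime_ms / 1000
```

For example, a stream carrying 80000 payload bytes holds 10 seconds of audio, which helps to judge quickly whether a stream is long enough to contain a spoken flag.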

At this point you can solve it in two different ways:

- Wireshark GUI (easiest for humans)
- Command-line extraction (repeatable, good for writeups)

I’ll show both.

>3A) Wireshark GUI method (quickest)

Open the PCAP in Wireshark.
Go to:

- Telephony → VoIP Calls

Select the detected call, then click:

- Play Streams (labelled Player in older Wireshark versions). Prepare Filter only builds a display filter for the selected call.

Alternatively, in Telephony → RTP → RTP Streams, you can:

- Select a stream and Analyze / Play

- Or Save the audio

In this capture there are five RTP streams, but only two contain clear speech (named here after their SSRCs):

stream_0x6e9c0aa7.wav
stream_0x2839728f.wav

The other streams are silent/empty (RMS near 0 when decoded), which is common in VoIP captures due to:

- one-way audio in some directions,
- comfort noise / muted direction,
- unused negotiated streams.

Listening to the two speech streams reveals the flag read out loud.

>3B) Command-line method (repeatable and contest-friendly)

Step 1 — Export RTP payloads (by SSRC)

First, extract each RTP stream’s payload into raw μ-law (.ulaw) files.

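Because tshark can print each packet's RTP payload as hex, we can concatenate the payloads in capture order to rebuild the audio. This assumes no significant packet loss or reordering.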

Example script (extracting the SSRCs reported by tshark -z rtp,streams):

bash


python3 - <<'PY'
import subprocess

# SSRC values taken from: tshark -r capture.pcapng -q -z rtp,streams
ssrcs = [
    "0x2839728f",
    "0x66556019",
    "0x644366bb",
    "0x6e9c0aa7",
    "0x69cc8abd",
]

for ssrc in ssrcs:
    # Print only the RTP payload (as hex) of every packet in this stream
    cmd = [
        "tshark", "-r", "capture.pcapng",
        "-Y", f"rtp.ssrc=={ssrc}",
        "-T", "fields",
        "-e", "rtp.payload",
    ]
    out = subprocess.check_output(cmd)
    lines = out.decode().splitlines()

    # Each line is hex bytes (sometimes with ':' separators depending on tshark)
    data = b''.join(bytes.fromhex(line.replace(':', '')) for line in lines if line)

    fname = f"stream_{ssrc}.ulaw"
    with open(fname, "wb") as f:
        f.write(data)

    print(ssrc, len(lines), "packets", len(data), "bytes")
PY

After this, you’ll have five .ulaw files.
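
For context on what tshark strips away here, the sketch below parses the RTP fixed header by hand, following the RFC 3550 layout. It is illustrative only and not part of the solve: `parse_rtp` is a hypothetical helper, and it expects the raw UDP payload of a single packet.

```python
import struct


def parse_rtp(packet):
    """Split one RTP packet (raw UDP payload) into header fields and payload."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header is 12 bytes")
    # Fixed header: V/P/X/CC byte, M/PT byte, sequence, timestamp, SSRC
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    offset = 12 + 4 * csrc_count          # skip optional CSRC identifiers
    if b0 & 0x10:                         # X bit: header extension present
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    payload = packet[offset:]
    if b0 & 0x20 and payload:             # P bit: last byte is padding count
        pad = payload[-1]
        if pad:
            payload = payload[:-pad]
    return {
        "version": b0 >> 6,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,        # 0 is PCMU (G.711 mu-law)
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": payload,
    }
```

The `ssrc` value is the same identifier used in the `rtp.ssrc==...` filter above, and `payload` corresponds to what `rtp.payload` prints as hex.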

Step 2 — Convert μ-law to WAV

G.711 μ-law is standard telephony audio: 8 kHz, mono.

Convert each stream with ffmpeg:

bash


for f in stream_*.ulaw; do
  ffmpeg -y -f mulaw -ar 8000 -ac 1 -i "$f" "${f%.ulaw}.wav"
done

Now you’ll have five WAVs.
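
If ffmpeg is not available, the same conversion can be sketched in pure Python. This is an alternative to the ffmpeg step, not the method used above: it expands each μ-law byte to a 16-bit sample with the standard G.711 formula and writes the result with the standard-library `wave` module. The function names are illustrative.

```python
import struct
import wave


def ulaw_byte_to_pcm16(u):
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~u & 0xFF                      # mu-law bytes are stored inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


# Precompute all 256 possible values once
ULAW_TABLE = [ulaw_byte_to_pcm16(b) for b in range(256)]


def ulaw_to_pcm16(data):
    """Convert raw mu-law bytes to little-endian 16-bit PCM bytes."""
    return struct.pack(f"<{len(data)}h", *(ULAW_TABLE[b] for b in data))


def ulaw_file_to_wav(src, dst, rate=8000):
    """Read a raw .ulaw file and write an 8 kHz mono 16-bit WAV."""
    with open(src, "rb") as f:
        data = f.read()
    with wave.open(dst, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(ulaw_to_pcm16(data))
```

For instance, `ulaw_file_to_wav("stream_0x6e9c0aa7.ulaw", "stream_0x6e9c0aa7.wav")` should produce a WAV comparable to the ffmpeg output for that stream.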

Step 3 — Listen and identify the spoken flag

Open the WAV files in any audio player.

Only these two contain the spoken message:

stream_0x6e9c0aa7.wav
stream_0x2839728f.wav

The other WAVs decode to silence (no meaningful audio).
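
Instead of listening to every file, silence can be checked numerically with the RMS level of each WAV. The sketch below assumes 16-bit PCM WAVs such as the ones produced in Step 2; the helper names and the threshold of 50 are arbitrary illustrative choices, not values measured from this capture.

```python
import array
import math
import sys
import wave


def rms(samples):
    """Root-mean-square level of a sequence of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def wav_rms(path):
    """RMS level of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        frames = w.readframes(w.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return rms(samples)


def looks_silent(path, threshold=50.0):
    """Heuristic: treat a stream as silent when its RMS is below threshold."""
    return wav_rms(path) < threshold
```

Running `wav_rms` over the five WAVs and sorting by the result should put the two speech streams clearly ahead of the silent ones.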

The voice in the audio gives the flag:

nexus{1337483127*$}

>4) Why there were “extra” streams

It’s normal to see multiple RTP streams in SIP/RTP captures:

- One stream per direction (caller → callee, callee → caller)
- Sometimes extra negotiated streams that never carry audio
- Sometimes NAT/phone behavior causes one direction to be silent

So the key is not “how many streams exist”, but “which streams actually contain speech”.

>Final Flag

nexus{1337483127*$}