Misc · Easy

Allo! (Pcap)

CTF writeup for Allo! (Pcap) from Next Hunt

//Allo! (PCAP)

Category: Misc

Author: Houssem0x1

Flag format: nexus{}

>TL;DR

This PCAP is mostly VoIP traffic. The flag is spoken over two RTP audio streams (G.711 μ-law). Extract the RTP payloads, convert to WAV, listen, and you’ll hear:

Flag: nexus{1337483127*$}


>1) Initial triage (what’s inside the PCAP?)

Start by identifying which protocols the capture contains. The fastest way is tshark's protocol hierarchy statistics:

bash

tshark -r capture.pcapng -q -z io,phs

What we learn from the hierarchy:

  • There is a small amount of DNS/HTTP/TLS.

  • There is SIP signaling.

  • There is a huge amount of RTP (real-time media).

That combination (SIP + RTP) strongly suggests a VoIP phone call capture, where:

  • SIP sets up the call (INVITE/OK/BYE)

  • RTP carries the actual audio


>2) Confirm VoIP streams (RTP streams list)

Next, list RTP streams:

bash

tshark -r capture.pcapng -q -z rtp,streams

This lists each RTP stream with its endpoints, SSRC, and payload type (the codec). In this challenge, the payload type is:

  • g711U (G.711 μ-law), typically 8000 Hz mono.
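For context, the SSRC and payload type that tshark reports come from the fixed 12-byte RTP header defined in RFC 3550. The sketch below shows how those fields are laid out. The function name `parse_rtp_header` and the sample packet are assumptions made for illustration; the packet is hand-built, not copied from the capture.

```python
import struct

# Static payload types from RFC 3551 that are relevant to G.711.
STATIC_PAYLOAD_TYPES = {0: "PCMU (G.711 mu-law)", 8: "PCMA (G.711 A-law)"}


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the fixed 12-byte RTP header (hypothetical helper)."""
    if len(packet) < 12:
        raise ValueError("RTP header needs at least 12 bytes")
    # Network byte order: 2 flag bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": f"0x{ssrc:08x}",
        # Offset of the audio bytes, ignoring any header extension.
        "payload_offset": 12 + 4 * csrc_count,
    }


if __name__ == "__main__":
    # Hand-built example packet: version 2, payload type 0, 160 bytes of mu-law silence.
    sample = bytes.fromhex("80000001000000a06e9c0aa7") + b"\xff" * 160
    header = parse_rtp_header(sample)
    print(header)
    print(STATIC_PAYLOAD_TYPES.get(header["payload_type"], "dynamic/unknown"))
```

Payload type 0 is the static assignment for PCMU, which is what tshark displays as g711U.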

At this point you can solve it in two different ways:

  • Wireshark GUI (easiest for humans)

  • Command-line extraction (repeatable, good for writeups)

I’ll show both.


>3A) Wireshark GUI method (quickest)

  1. Open the PCAP in Wireshark.

  2. Go to:

   - Telephony → VoIP Calls

  3. Select the detected call, then click:

   - Player or Prepare Filter / Play Streams (depends on Wireshark version)

  4. In RTP Streams, you can:

   - Select a stream and Analyze / Play

   - Or Save the audio

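In this capture there are five RTP streams, but only two contain clear speech. They are identified by their SSRC (and by the matching file names in the command-line method):

  • 0x6e9c0aa7 (stream_0x6e9c0aa7.wav)

  • 0x2839728f (stream_0x2839728f.wav)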

The other streams are silent/empty (RMS near 0 when decoded), which is common in VoIP captures due to:

  • one-way audio in some directions,

  • comfort noise / muted direction,

  • unused negotiated streams.

Listening to the two speech streams reveals the flag read out loud.


>3B) Command-line method (repeatable and contest-friendly)

Step 1 — Export RTP payloads (by SSRC)

First, extract each RTP stream’s payload into raw μ-law (.ulaw) files.

tshark can print each packet's RTP payload as hex, so we can decode the hex and concatenate the bytes, in capture order, to rebuild the raw audio.

Example script (extracting the SSRCs reported by tshark -z rtp,streams):

bash

python3 - <<'PY'
import subprocess

# SSRC values taken from: tshark -r capture.pcapng -q -z rtp,streams
ssrcs = [
    "0x2839728f",
    "0x66556019",
    "0x644366bb",
    "0x6e9c0aa7",
    "0x69cc8abd",
]

for ssrc in ssrcs:
    cmd = [
        "tshark", "-r", "capture.pcapng",
        "-Y", f"rtp.ssrc=={ssrc}",
        "-T", "fields",
        "-e", "rtp.payload",
    ]
    out = subprocess.check_output(cmd)
    lines = out.decode().splitlines()

    # Each line is hex bytes (sometimes with ':' separators depending on tshark)
    data = b''.join(bytes.fromhex(line.replace(':', '')) for line in lines if line)

    fname = f"stream_{ssrc}.ulaw"
    with open(fname, "wb") as f:
        f.write(data)

    print(ssrc, len(lines), "packets", len(data), "bytes")
PY

After this, you’ll have five .ulaw files.
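The script above concatenates payloads in capture order, which was enough here. If a capture has reordered or duplicated packets, the audio can be rebuilt by RTP sequence number instead. The following is a minimal sketch under the assumption that tshark is run with `-e rtp.seq -e rtp.payload`, so each output line is a decimal sequence number, a tab, and the hex payload. The helper names `parse_fields` and `reorder_payloads` are hypothetical.

```python
def parse_fields(lines):
    """Yield (sequence, payload_bytes) from 'seq<TAB>hex' lines (assumed tshark layout)."""
    for line in lines:
        seq, _, payload_hex = line.partition("\t")
        if seq and payload_hex:
            yield int(seq), bytes.fromhex(payload_hex.replace(":", ""))


def reorder_payloads(packets):
    """Join payloads in sequence order, handling 16-bit wraparound and duplicates."""
    ordered = {}
    last_ext = None  # extended (unwrapped) sequence number of the previous packet
    for seq, payload in packets:
        if last_ext is None:
            ext = seq
        else:
            # Signed 16-bit distance from the previous packet's sequence number.
            delta = (seq - last_ext) & 0xFFFF
            if delta >= 0x8000:
                delta -= 0x10000
            ext = last_ext + delta
        ordered.setdefault(ext, payload)  # keep the first copy of a duplicate
        last_ext = ext
    return b"".join(ordered[key] for key in sorted(ordered))


if __name__ == "__main__":
    # Made-up packets: sequence wraps from 65535 to 0, with one swap and one duplicate.
    demo = [(65534, b"a"), (65535, b"b"), (1, b"d"), (0, b"c"), (0, b"c")]
    print(reorder_payloads(demo))
```

This does not fill gaps left by lost packets; for a short spoken flag that is usually acceptable.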

Step 2 — Convert μ-law to WAV

G.711 μ-law is standard telephony audio: 8 kHz, mono.

Convert each stream with ffmpeg:

bash

for f in stream_*.ulaw; do
  ffmpeg -y -f mulaw -ar 8000 -ac 1 -i "$f" "${f%.ulaw}.wav"
done

Now you’ll have five WAVs.
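If ffmpeg is not available, the same conversion can be sketched in pure Python. This is an alternative to the ffmpeg command, not the method used for the original solve. It applies the standard G.711 μ-law expansion and writes 16-bit PCM with the standard-library `wave` module; the function names are hypothetical.

```python
import struct
import wave


def ulaw_byte_to_pcm16(u: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit linear sample."""
    u = ~u & 0xFF  # mu-law bytes are stored bit-inverted
    magnitude = (((u & 0x0F) << 3) + 0x84) << ((u >> 4) & 0x07)
    magnitude -= 0x84  # remove the encoder bias
    return -magnitude if u & 0x80 else magnitude


# Precompute all 256 values so decoding is a table lookup.
ULAW_TABLE = [ulaw_byte_to_pcm16(i) for i in range(256)]


def ulaw_to_wav(ulaw_path: str, wav_path: str, rate: int = 8000) -> int:
    """Convert a raw mu-law file to a mono 16-bit WAV; returns the sample count."""
    with open(ulaw_path, "rb") as f:
        raw = f.read()
    samples = [ULAW_TABLE[b] for b in raw]
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit PCM
        w.setframerate(rate)
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return len(samples)


if __name__ == "__main__":
    import glob
    for path in glob.glob("stream_*.ulaw"):
        out = path[: -len(".ulaw")] + ".wav"
        print(path, "->", out, ulaw_to_wav(path, out), "samples")
```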

Step 3 — Listen and identify the spoken flag

Open the WAV files in any audio player.

Only these two contain the spoken message:

  • stream_0x6e9c0aa7.wav

  • stream_0x2839728f.wav

The other WAVs decode to silence (no meaningful audio).
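To decide which files are worth listening to without opening each one, the RMS level of every WAV can be computed and ranked. The sketch below assumes the 16-bit PCM WAVs produced in Step 2 and the `stream_*.wav` naming used above; `wav_rms` and `rank_streams` are hypothetical helper names.

```python
import glob
import math
import struct
import wave


def wav_rms(path: str) -> float:
    """Return the RMS level of a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        frames = w.readframes(w.getnframes())
    count = len(frames) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frames[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def rank_streams(pattern: str = "stream_*.wav"):
    """Return (rms, path) pairs, loudest first."""
    return sorted(((wav_rms(p), p) for p in glob.glob(pattern)), reverse=True)


if __name__ == "__main__":
    for rms, path in rank_streams():
        print(f"{rms:10.1f}  {path}")
```

Streams with an RMS near 0 are the silent ones; the files with clearly higher values are the candidates to play.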

The voice in the audio gives the flag:

nexus{1337483127*$}


>4) Why there were “extra” streams

It’s normal to see multiple RTP streams in SIP/RTP captures:

  • One stream per direction (caller → callee, callee → caller)

  • Sometimes extra negotiated streams that never carry audio

  • Sometimes NAT/phone behavior causes one direction to be silent

So the key is not “how many streams exist”, but “which streams actually contain speech”.


>Final Flag

nexus{1337483127*$}