>Skillia Voting Records
>Objective
Investigate skillia_voting_records.zip (extracted to 002.records/skillia_voting_records.csv) to identify the six voting stations that were manipulated. Produce a reproducible, fully documented analysis, including the code used, the commands run, and the final flag.
>Environment & Setup
- Workspace: /home/noigel/Desktop/RSTCONctf/Forensics/Skilla
- Evidence archive: skillia_voting_records.zip → contains skillia_voting_records.csv (~217 MB)
- Virtual environment (isolated): 001.venv (created with python3 -m venv)
- Python packages installed (within the venv): python-magic, exifread, Pillow, pandas, openpyxl
Commands used to set up the environment (already run):
cd /home/noigel/Desktop/RSTCONctf/Forensics/Skilla
python3 -m venv 001.venv
source 001.venv/bin/activate
python -m pip install --upgrade pip
pip install python-magic exifread Pillow pandas openpyxl
unzip skillia_voting_records.zip -d 002.records

All further scripts in this writeup were executed with the venv Python to ensure reproducibility, e.g.:

./001.venv/bin/python 003.initial_triage.py

>Investigative Approach (high level)
- Initial triage (file inspection, sample rows, header analysis).
- Produce per-station summaries (ballot counts, absentee fractions, most-common ballot patterns).
- Detect anomalies using multiple heuristics:
- Repeated identical ballot pattern fraction (indicative of automated/duplicated ballots).
- Per-station deviation from global vote distribution (presidential A% deviation).
- Absentee vs in-person distribution divergence (L1 distance) per station.
- Absentee fraction outliers within counties (z-score by county).
- Duplicate aggregate profiles across stations.
- Triangulate results to form a final candidate list of 6 manipulated stations.
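The county z-score heuristic can be illustrated with toy numbers (not from the real data); the computation uses the same population-variance formula as 007.absentee_outliers.py below:

```python
import math

def county_zscores(abs_fracs):
    """z-score of each station's absentee fraction within its county
    (population variance, as in 007.absentee_outliers.py)."""
    mean = sum(abs_fracs.values()) / len(abs_fracs)
    var = sum((f - mean) ** 2 for f in abs_fracs.values()) / len(abs_fracs)
    sd = math.sqrt(var)
    return {sid: (f - mean) / sd if sd > 0 else 0.0
            for sid, f in abs_fracs.items()}

# Three stations near 11% absentee and one at 55% -- the outlier
# dominates the z-score ranking.
z = county_zscores({'s1': 0.10, 's2': 0.12, 's3': 0.11, 's4': 0.55})
```

Ranking stations by |z| within each county then surfaces the stations whose absentee share is implausible for their area.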
Rationale: manipulations often leave statistical fingerprints — large internal mismatches between absentee and in-person vote distributions at the same station, unusual repeated identical ballot patterns, or substantial deviation from the global distribution.
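The repeated-pattern heuristic can be sketched on toy two-race ballots (the real data has five races per ballot):

```python
from collections import Counter

def top_pattern_fraction(ballots):
    """Fraction of a station's ballots that share the single most
    common vote pattern across all races."""
    counts = Counter(ballots)
    return counts.most_common(1)[0][1] / len(ballots)

# A "healthy" station shows many distinct patterns...
organic = [('A', 'B'), ('B', 'A'), ('A', 'A'), ('B', 'B'), ('A', 'B')]
# ...while a station with injected duplicate ballots does not.
stuffed = [('A', 'A')] * 8 + [('B', 'B'), ('A', 'B')]

assert top_pattern_fraction(organic) == 0.4
assert top_pattern_fraction(stuffed) == 0.8
```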
>Files produced during analysis
- 002.initial_triage_summary.csv — per-station summary: total ballots, absentee fraction, unique patterns, most-common pattern count, president counts.
- 003.patterns_ranked.csv — stations sorted by the fraction of ballots matching the station's most common ballot pattern.
- 004.pres_deviation_top30.csv — stations whose presidential A% deviates most from the global A%.
- 005.duplicate_profiles.txt — groups of stations sharing identical aggregate count profiles.
- 006.absentee_outliers_top20.csv — absentee-fraction z-score outliers by county.
- 007.station_id_anomalies.txt — station IDs that do not match the expected 8-hex-character format.
- 008.abs_vs_inperson_pres_top30.csv — stations with the largest L1 distance between absentee and in-person presidential vote distributions.
These were generated by scripts included below.
>Key results and reasoning
I used multiple signals to identify manipulated stations. The strongest single signal for the final selection was the divergence between absentee and in-person presidential vote distributions (L1 distance). This directly compares the mix of votes (A/B/undervote) between in-person and absentee ballots at the same station; a large divergence often indicates differential tampering or ballots injected into one of the two streams.
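A minimal sketch of the L1 computation, using made-up vote counts rather than real station data:

```python
def l1_divergence(in_person, absentee):
    """L1 distance between the normalized vote distributions of the
    in-person and absentee streams at one station."""
    ip_total = sum(in_person.values())
    ab_total = sum(absentee.values())
    candidates = set(in_person) | set(absentee)
    return sum(abs(in_person.get(c, 0) / ip_total - absentee.get(c, 0) / ab_total)
               for c in candidates)

# Similar mixes -> small distance; a skewed absentee stream -> large distance.
benign = l1_divergence({'A': 480, 'B': 500, 'undervote': 20},
                       {'A': 48, 'B': 50, 'undervote': 2})
tampered = l1_divergence({'A': 480, 'B': 500, 'undervote': 20},
                         {'A': 95, 'B': 5, 'undervote': 0})
assert benign < 0.01
assert tampered > 0.8
```

The L1 distance ranges from 0 (identical distributions) to 2 (disjoint support), so even a threshold well below 1 flags stations worth a closer look.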
The top 6 stations by L1 divergence (and corroborating signals from other heuristics) were:
MetaCTF{81b7bc0c,1251749b,4b45c819,1fe457e4,a4defb9c,329071ee}
Why these six?
- All six appear in the top results for L1 divergence between absentee and in-person presidential vote distributions (008.abs_vs_inperson_pres_top30.csv).
- Several of them also show up as outliers in other tests: a4defb9c and 329071ee rank high in the pattern-based rankings (003.patterns_ranked.csv) and in deviation from the global presidential percentages; 81b7bc0c, 1251749b, 4b45c819, and 1fe457e4 occupy top L1-divergence positions and show additional statistical anomalies (counts or deviations).
Taken together, these signals strongly indicate tampering targeted at these stations.
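The triangulation step amounts to a multi-signal tally. A sketch (the per-heuristic sets here are illustrative, not the actual rankings; 'deadbeef' is a hypothetical station flagged by only one signal):

```python
from collections import Counter

# Hypothetical candidate sets produced by each heuristic.
signals = {
    'l1_divergence':  {'81b7bc0c', '1251749b', '4b45c819',
                       '1fe457e4', 'a4defb9c', '329071ee'},
    'pattern_frac':   {'a4defb9c', '329071ee', 'deadbeef'},
    'pres_deviation': {'a4defb9c', '329071ee', '81b7bc0c'},
}

# Count how many independent signals flag each station, then rank.
hits = Counter(sid for flagged in signals.values() for sid in flagged)
ranked = sorted(hits.items(), key=lambda kv: kv[1], reverse=True)
```

Stations flagged by multiple independent heuristics are far less likely to be statistical noise than stations flagged by only one.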
>Commands & Execution Log (representative)
All commands were executed in /home/noigel/Desktop/RSTCONctf/Forensics/Skilla using the isolated venv as shown earlier. Representative commands executed to produce artifacts:
# 1) Generate per-station triage summary
./001.venv/bin/python 003.initial_triage.py
# 2) Rank repeated ballot patterns per station
./001.venv/bin/python 004.analyze_patterns.py
# 3) Compute stations deviating from global presidential distribution
./001.venv/bin/python 005.find_outliers.py
# 4) Find county-level absentee outliers
./001.venv/bin/python 007.absentee_outliers.py
# 5) Compare absentee vs in-person presidential distributions (L1)
./001.venv/bin/python 009.abs_vs_inperson_pres.py

You can open the CSV/text output files to inspect the numbers directly.
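For reference, the scripts below stream the CSV row by row with csv.DictReader; the same per-station aggregation could also be done with pandas (installed in the venv) using chunked reads to keep memory bounded on the ~217 MB file. A sketch on a tiny in-memory CSV; substitute the real file path and a larger chunksize for the actual data:

```python
import io
import pandas as pd

# Tiny stand-in for 002.records/skillia_voting_records.csv.
csv_text = io.StringIO(
    "voting_station_id,vote_president\n"
    "s1,A\ns1,B\ns1,A\ns2,B\ns2,B\n"
)

# Accumulate per-station presidential vote counts chunk by chunk.
totals = None
for chunk in pd.read_csv(csv_text, chunksize=2):
    part = chunk.groupby('voting_station_id')['vote_president'].value_counts()
    totals = part if totals is None else totals.add(part, fill_value=0)

# Pivot to a stations x candidates table of counts.
per_station = totals.unstack(fill_value=0).astype(int)
```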
>Full analysis scripts (verbatim)
Below are the exact Python scripts used during the investigation. They are included so the analysis is fully reproducible. Each script was executed with ./001.venv/bin/python <script>.
003.initial_triage.py
#!/usr/bin/env python3
import csv
from collections import defaultdict, Counter
from pathlib import Path

csv_path = Path('002.records/skillia_voting_records.csv')

# Aggregates per station
station_counts = Counter()
station_absentee = Counter()
station_patterns = defaultdict(Counter)  # per station: pattern -> count
station_pres = defaultdict(Counter)
station_sen = defaultdict(Counter)
station_gov = defaultdict(Counter)

with csv_path.open('r', newline='') as fh:
    reader = csv.DictReader(fh)
    for row in reader:
        sid = row['voting_station_id']
        station_counts[sid] += 1
        if row.get('voter_type', '').strip().lower() == 'absentee':
            station_absentee[sid] += 1
        # pattern: tuple of votes across all five races
        pattern = (row.get('vote_president', ''), row.get('vote_senate', ''),
                   row.get('vote_governor', ''), row.get('vote_marijuana', ''),
                   row.get('vote_tax_cap', ''))
        station_patterns[sid][pattern] += 1
        station_pres[sid][row.get('vote_president', '')] += 1
        station_sen[sid][row.get('vote_senate', '')] += 1
        station_gov[sid][row.get('vote_governor', '')] += 1

# Prepare summary
out_lines = []
out_lines.append('station_id,total_ballots,absentee_fraction,unique_patterns,top_pattern_count,top_pres_A,top_pres_B,top_pres_undervote')
for sid in station_counts:
    total = station_counts[sid]
    absentee_frac = station_absentee[sid] / total if total else 0
    unique_patterns = len(station_patterns[sid])
    top_pattern_count = station_patterns[sid].most_common(1)[0][1]
    presA = station_pres[sid].get('A', 0)
    presB = station_pres[sid].get('B', 0)
    presUnd = station_pres[sid].get('undervote', 0)
    out_lines.append(f"{sid},{total},{absentee_frac:.4f},{unique_patterns},{top_pattern_count},{presA},{presB},{presUnd}")

# Write summary to file
with open('002.initial_triage_summary.csv', 'w') as outfh:
    outfh.write('\n'.join(out_lines))
print('Wrote 002.initial_triage_summary.csv with per-station summaries')
004.analyze_patterns.py
#!/usr/bin/env python3
import csv
from collections import defaultdict, Counter
from pathlib import Path

csv_path = Path('002.records/skillia_voting_records.csv')

# We'll compute per-station stats focusing on repeated patterns
station_counts = Counter()
station_patterns = defaultdict(Counter)

with csv_path.open('r', newline='') as fh:
    reader = csv.DictReader(fh)
    for row in reader:
        sid = row['voting_station_id']
        station_counts[sid] += 1
        pattern = (row.get('vote_president', ''), row.get('vote_senate', ''),
                   row.get('vote_governor', ''), row.get('vote_marijuana', ''),
                   row.get('vote_tax_cap', ''))
        station_patterns[sid][pattern] += 1

rows = []
for sid, total in station_counts.items():
    unique_patterns = len(station_patterns[sid])
    top_count = station_patterns[sid].most_common(1)[0][1]
    top_frac = top_count / total
    unique_frac = unique_patterns / total
    rows.append((sid, total, unique_patterns, top_count, top_frac, unique_frac))

# Sort by top_frac descending
rows_sorted = sorted(rows, key=lambda x: x[4], reverse=True)
out_lines = ['station_id,total,unique_patterns,top_count,top_frac,unique_frac']
for sid, total, unique, topc, topf, uniqf in rows_sorted:
    out_lines.append(f"{sid},{total},{unique},{topc},{topf:.6f},{uniqf:.6f}")

with open('003.patterns_ranked.csv', 'w') as outfh:
    outfh.write('\n'.join(out_lines))
print('Wrote 003.patterns_ranked.csv')
005.find_outliers.py
#!/usr/bin/env python3
import csv
from collections import defaultdict, Counter
from pathlib import Path
import math

csv_path = Path('002.records/skillia_voting_records.csv')

station_counts = Counter()
station_pres = defaultdict(Counter)
total_ballots = 0
aggregate_pres = Counter()

with csv_path.open('r', newline='') as fh:
    reader = csv.DictReader(fh)
    for row in reader:
        sid = row['voting_station_id']
        station_counts[sid] += 1
        v = row.get('vote_president', '')
        station_pres[sid][v] += 1
        aggregate_pres[v] += 1
        total_ballots += 1

# Compute global fractions for A and B
global_A = aggregate_pres.get('A', 0) / total_ballots
global_B = aggregate_pres.get('B', 0) / total_ballots

rows = []
for sid, total in station_counts.items():
    a = station_pres[sid].get('A', 0)
    b = station_pres[sid].get('B', 0)
    und = station_pres[sid].get('undervote', 0)
    a_pct = a / total if total else 0
    b_pct = b / total if total else 0
    rows.append((sid, total, a_pct, b_pct, abs(a_pct - global_A)))

# Sort by deviation from global A fraction
rows_sorted = sorted(rows, key=lambda x: x[4], reverse=True)
out_lines = ['station_id,total,a_pct,b_pct,dev_from_global_A']
for sid, total, a_pct, b_pct, dev in rows_sorted[:30]:
    out_lines.append(f"{sid},{total},{a_pct:.6f},{b_pct:.6f},{dev:.6f}")

with open('004.pres_deviation_top30.csv', 'w') as outfh:
    outfh.write('\n'.join(out_lines))
print('Wrote 004.pres_deviation_top30.csv')
006.duplicate_profiles.py
#!/usr/bin/env python3
import csv
from collections import defaultdict, Counter
from pathlib import Path

csv_path = Path('002.records/skillia_voting_records.csv')

station_counts = Counter()
# For each station, counters per race
station_pres = defaultdict(Counter)
station_sen = defaultdict(Counter)
station_gov = defaultdict(Counter)
station_mar = defaultdict(Counter)
station_tax = defaultdict(Counter)

with csv_path.open('r', newline='') as fh:
    reader = csv.DictReader(fh)
    for row in reader:
        sid = row['voting_station_id']
        station_counts[sid] += 1
        station_pres[sid][row.get('vote_president', '')] += 1
        station_sen[sid][row.get('vote_senate', '')] += 1
        station_gov[sid][row.get('vote_governor', '')] += 1
        station_mar[sid][row.get('vote_marijuana', '')] += 1
        station_tax[sid][row.get('vote_tax_cap', '')] += 1

# Build a profile vector for each station: counts for A/B/undervote per race
profiles = defaultdict(list)
for sid in station_counts:
    total = station_counts[sid]
    pA = station_pres[sid].get('A', 0)
    pB = station_pres[sid].get('B', 0)
    pU = station_pres[sid].get('undervote', 0)
    sA = station_sen[sid].get('A', 0)
    sB = station_sen[sid].get('B', 0)
    sU = station_sen[sid].get('undervote', 0)
    gA = station_gov[sid].get('A', 0)
    gB = station_gov[sid].get('B', 0)
    gU = station_gov[sid].get('undervote', 0)
    mYes = station_mar[sid].get('Yes', 0)
    mNo = station_mar[sid].get('No', 0)
    tYes = station_tax[sid].get('Yes', 0)
    tNo = station_tax[sid].get('No', 0)
    vec = (total, pA, pB, pU, sA, sB, sU, gA, gB, gU, mYes, mNo, tYes, tNo)
    profiles[vec].append(sid)

# Find duplicate profiles
dupes = [(vec, sids) for vec, sids in profiles.items() if len(sids) > 1]
with open('005.duplicate_profiles.txt', 'w') as outfh:
    for vec, sids in sorted(dupes, key=lambda x: len(x[1]), reverse=True):
        outfh.write(f"Group of {len(sids)} stations: {', '.join(sids)}\n")
        outfh.write(str(vec) + "\n\n")
print('Wrote 005.duplicate_profiles.txt')
007.absentee_outliers.py
#!/usr/bin/env python3
import csv
from collections import defaultdict, Counter
from pathlib import Path
import math

csv_path = Path('002.records/skillia_voting_records.csv')

# We need county info
station_total = Counter()
station_abs = Counter()
station_county = {}

with csv_path.open('r', newline='') as fh:
    reader = csv.DictReader(fh)
    for row in reader:
        sid = row['voting_station_id']
        station_total[sid] += 1
        if row.get('voter_type', '').strip().lower() == 'absentee':
            station_abs[sid] += 1
        if sid not in station_county:
            station_county[sid] = row.get('county', '')

# Group by county
county_stations = defaultdict(list)
for sid, cnt in station_total.items():
    county_stations[station_county.get(sid, '')].append(sid)

out = []
for county, sids in county_stations.items():
    fracs = []
    for sid in sids:
        fr = station_abs[sid] / station_total[sid]
        fracs.append(fr)
    mean = sum(fracs) / len(fracs)
    var = sum((x - mean) ** 2 for x in fracs) / len(fracs)
    sd = math.sqrt(var)
    for sid in sids:
        fr = station_abs[sid] / station_total[sid]
        z = (fr - mean) / sd if sd > 0 else 0
        out.append((abs(z), sid, county, fr, mean, sd))

out_sorted = sorted(out, reverse=True)[:20]
with open('006.absentee_outliers_top20.csv', 'w') as fh:
    fh.write('abs_z,station_id,county,abs_frac,county_mean,county_sd\n')
    for z, sid, county, fr, mean, sd in out_sorted:
        fh.write(f"{z:.3f},{sid},{county},{fr:.4f},{mean:.4f},{sd:.4f}\n")
print('Wrote 006.absentee_outliers_top20.csv')
008.station_id_format.py
#!/usr/bin/env python3
import csv
from collections import Counter
from pathlib import Path

csv_path = Path('002.records/skillia_voting_records.csv')

ids = Counter()
with csv_path.open('r', newline='') as fh:
    reader = csv.DictReader(fh)
    for row in reader:
        sid = row['voting_station_id']
        ids[sid] += 1

# Flag any station id that is not exactly 8 lowercase hex characters
bad = []
for sid in ids:
    if len(sid) != 8 or any(c not in '0123456789abcdef' for c in sid.lower()):
        bad.append((sid, ids[sid]))

with open('007.station_id_anomalies.txt', 'w') as fh:
    if not bad:
        fh.write('No anomalous station ids found\n')
    else:
        for s, c in bad:
            fh.write(f"{s},{c}\n")
print('Wrote 007.station_id_anomalies.txt')
009.abs_vs_inperson_pres.py
#!/usr/bin/env python3
import csv
from collections import defaultdict, Counter
from pathlib import Path

csv_path = Path('002.records/skillia_voting_records.csv')

# Counts per station per voter_type
counts = defaultdict(lambda: defaultdict(Counter))
station_total = Counter()

with csv_path.open('r', newline='') as fh:
    reader = csv.DictReader(fh)
    for row in reader:
        sid = row['voting_station_id']
        vt = row.get('voter_type', '').strip().lower()
        vote = row.get('vote_president', '')
        counts[sid][vt][vote] += 1
        station_total[sid] += 1

out = []
for sid, types in counts.items():
    in_person = types.get('in_person', {})
    absentee = types.get('absentee', {})
    ip_total = sum(in_person.values())
    ab_total = sum(absentee.values())
    if ip_total < 50 or ab_total < 50:
        continue  # skip stations with too few ballots in either stream
    # Compute normalized distributions
    candidates = set(list(in_person.keys()) + list(absentee.keys()))
    ip_dist = {c: in_person.get(c, 0) / ip_total for c in candidates}
    ab_dist = {c: absentee.get(c, 0) / ab_total for c in candidates}
    # L1 distance
    l1 = sum(abs(ip_dist[c] - ab_dist[c]) for c in candidates)
    out.append((l1, sid, ip_total, ab_total))

out_sorted = sorted(out, reverse=True)[:30]
with open('008.abs_vs_inperson_pres_top30.csv', 'w') as fh:
    fh.write('l1,station_id,in_person_count,absentee_count\n')
    for l1, sid, ip, ab in out_sorted:
        fh.write(f"{l1:.6f},{sid},{ip},{ab}\n")
print('Wrote 008.abs_vs_inperson_pres_top30.csv')
>Final Flag
MetaCTF{81b7bc0c,1251749b,4b45c819,1fe457e4,a4defb9c,329071ee}