SOC-Bench: Task GOAT
File-system forensics benchmark for evaluating autonomous SOC systems on Colonial Pipeline-style ransomware incidents
Python · Forensics · Windows · NTFS · EDR/XDR · Security Research
The Problem
Security Operations Centers lack standardized benchmarks to evaluate autonomous AI systems on real-world ransomware forensics. Existing evaluations focus on detection, not the full forensic workflow SOC analysts perform.
The Approach
Designed a comprehensive benchmark evaluating five key outcomes:
• O1: Encryption-state labels at file and directory levels
• O2: Host/share impact aggregations (encrypted bytes, fractions, first-seen timestamps)
• O3: VSS tamper detection (snapshot delete/disable events with timing)
• O4: Attribution of primary encryptor process trees from EDR telemetry
• O5: One-page executive summary referencing O1-O4 claims
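As an illustration of how O1 labels could feed O2, here is a minimal Python sketch that rolls file-level encryption labels up into per-host/share impact figures. The `FileLabel` record, its field names, and the `aggregate_impact` helper are hypothetical and chosen for this example; the benchmark's actual submission schema is not specified here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class FileLabel:
    """Hypothetical O1-style record: one encryption-state label per file."""
    host: str
    share: str
    path: str
    size_bytes: int
    encrypted: bool
    first_seen: Optional[datetime] = None  # when encryption was first observed


def aggregate_impact(labels):
    """Roll file labels up to per-(host, share) totals (an O2-style view)."""
    totals = {}
    for lab in labels:
        agg = totals.setdefault(
            (lab.host, lab.share),
            {"total_bytes": 0, "encrypted_bytes": 0, "first_seen": None},
        )
        agg["total_bytes"] += lab.size_bytes
        if lab.encrypted:
            agg["encrypted_bytes"] += lab.size_bytes
            # Keep the earliest observed encryption timestamp.
            if lab.first_seen is not None and (
                agg["first_seen"] is None or lab.first_seen < agg["first_seen"]
            ):
                agg["first_seen"] = lab.first_seen
    for agg in totals.values():
        total = agg["total_bytes"]
        agg["encrypted_fraction"] = agg["encrypted_bytes"] / total if total else 0.0
    return totals
```

Keying the roll-up on the (host, share) pair keeps the aggregation independent of how the file labels were produced, which fits the outcome-only framing.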
Data sources include file-system metadata and change journals, EDR process trees, VSS logs, SIEM alerts, and help-desk reports. Scoring is ring-based (Exact, Directory, Host-Share, Miss), with penalties for wrong assertions, missing evidence, contradictions, and spam.
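The ring idea can be sketched as a function that places a claimed file location into the closest matching ring relative to ground truth. This is a sketch under assumptions: the `(host, share, path)` tuple layout, the `ring_for` helper, and the credit values in `RING_CREDIT` are hypothetical placeholders, not the benchmark's published numbers, and the penalty terms are omitted.

```python
from pathlib import PureWindowsPath

# Assumed partial-credit values per ring; illustrative only.
RING_CREDIT = {"exact": 1.0, "directory": 0.5, "host_share": 0.25, "miss": 0.0}


def ring_for(claim, truth):
    """Return the ring a claim lands in relative to one ground-truth entry.

    Both arguments are hypothetical (host, share, relative_path) tuples.
    """
    c_host, c_share, c_path = claim
    t_host, t_share, t_path = truth
    if (c_host.lower(), c_share.lower()) != (t_host.lower(), t_share.lower()):
        return "miss"
    # PureWindowsPath compares case-insensitively, matching NTFS defaults.
    c, t = PureWindowsPath(c_path), PureWindowsPath(t_path)
    if c == t:
        return "exact"
    if c.parent == t.parent:
        return "directory"
    return "host_share"


def credit_for(claim, truth):
    """Map a claim to its assumed partial credit."""
    return RING_CREDIT[ring_for(claim, truth)]
```

Graded rings reward a system that localizes impact to the right directory or share even when it misses the exact file, which is closer to how an analyst's partial findings are still useful during triage.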
The Impact
The benchmark is being prepared for arXiv publication. It follows SOC-first, outcome-only, and durability principles, and it is designed to remain valid for years by relying on stable OS and forensic constructs. It is part of the SOC-Bench suite (GOAT, PANDA, FOX, TIGER, MOUSE) for comprehensive SOC evaluation.
Build Notes
Key design principles:
- SOC-first ordering: Reflects what SOC observes, not attacker sequence
- Outcome-only: Judged by claims against ground truth, no methods mandated
- Intentional incompleteness: Some signals withheld to prevent shortcutting
- Durability: Relies on stable OS/forensic constructs
Scoring: 40 pts (O1) + 25 pts (O2) + 15 pts (O3) + 10 pts (O4) + 10 pts (O5) = 100 pts total
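The point split can be expressed as a small weighted sum. The weights below come from the scoring line above; the assumption that each outcome is first reduced to a fraction in [0, 1], and the `total_score` helper itself, are illustrative rather than part of the published benchmark.

```python
# Point weights per outcome, as stated in the scoring breakdown.
WEIGHTS = {"O1": 40, "O2": 25, "O3": 15, "O4": 10, "O5": 10}


def total_score(fractions):
    """Combine per-outcome fractions (assumed to lie in [0, 1]) into a 0-100 score.

    Outcomes missing from `fractions` contribute zero points.
    """
    score = 0.0
    for outcome, weight in WEIGHTS.items():
        frac = fractions.get(outcome, 0.0)
        frac = min(max(frac, 0.0), 1.0)  # clamp so penalties cannot go below zero here
        score += weight * frac
    return score
```

For example, a system that earns half credit on O1 and full credit on O2, with nothing else submitted, would receive 40 × 0.5 + 25 × 1.0 = 45 points under this sketch.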
Key Tradeoffs
- Colonial Pipeline focus limits generalization to other ransomware families
- Windows/NTFS only: no Linux or macOS coverage
- Read-only analysis: no active response evaluation
- Ground truth requires manual curation of reference file pairs
What I'd Improve Next
- Expand to other ransomware families beyond DarkSide
- Add cross-platform support (Linux, macOS)
- Include active response evaluation tasks
- Automate ground truth generation from malware samples