What is the lethal trifecta in AI agent security?

The lethal trifecta is when a single AI agent skill combines three capabilities: private data access (credentials, files), untrusted content ingestion (fetching external URLs), and external communication (outbound network calls). This combination creates the exact conditions for indirect prompt injection to escalate to data exfiltration. 27 skills in the ClawHub registry exhibited all three.

Which AI model is safest against agent attacks?

In structured penetration tests across five frontier models, only Claude Opus refused all three attack classes (curl|bash execution, identity poisoning, prompt injection) and survived all seven bypass attempts. Claude Sonnet and Claude Haiku showed partial resistance. Gemini 3 and Codex executed most attack payloads.

Back to Journal

ResearchApril 7, 2026·12 min read

State of AI Agent Security: Q1 2026

6,943 AI agent skills have security flaws. We scanned all 40,059 across 14,808 publishers. The largest published agent skill security analysis.

TL;DR

40,059 AI agent skills scanned across 14,808 publishers. The largest published agent skill security analysis.
89 confirmed high-severity threats: 14 active malware, 59 exposed credentials, 16 test fixtures.
27 skills combine all three legs of the lethal trifecta: private data access + untrusted content + external communication.
Only Claude Opus refused all three attack classes and all seven bypass attempts in multi-model pentesting.

Executive Summary

40,059

Skills scanned

6,943

With security findings

Confirmed high-severity

Lethal trifecta skills

We scanned the entire ClawHub Skills Registry: 40,059 skills across 14,808 publishers. 6,943 had at least one security finding. 89 were individually verified as high-severity threats. This report covers the methodology, the findings, the novel patterns, and the multi-model pentesting results.

Confirmed High-Severity Findings

Known-Malicious Signatures

89 skills individually verified via deep scan with source code context.14 active malware (crypto miners, C2 callbacks, MITM interception, obfuscated payloads), 59 exposed live credentials (API keys, tokens, database passwords), and 16 scanner benchmark test fixtures.

Dangerous Capabilities

700 findings match known attack techniques (curl|bash, keychain access, sandbox bypass, identity file modification) but have legitimate uses in context. These require human review.

27 Skills with Full Attack Chain

27 individual skills combine private data access, untrusted content ingestion, and external communication within a single execution context. This is the exact condition for indirect prompt injection to escalate to data exfiltration.

Multi-Model Vulnerability

We ran structured penetration tests against five frontier models using three attack classes: remote code execution via curl|bash instructions, identity poisoning via system prompt injection, and classic prompt injection. Results reflect the model's behavior independent of any CLI or sandbox layer.

Model	curl \| bash	Identity poisoning	Prompt injection
Gemini 3 (Gemini CLI)	Executed	Executed	Executed
Codex (OpenAI Codex CLI)	Executed (sandbox blocked)	Executed	Refused
Claude Haiku	Willing (CLI blocked)	Willing (CLI blocked)	Willing (CLI blocked)
Claude Sonnet	Refused	Refused	Refused (Python bypass worked)
Claude Opus	Refused all + 7 bypasses	Refused	Refused

Key Insight

→Only Claude Opus refused all three attack classes and survived all seven bypass attempts
→Most models will execute curl|bash instructions when framed as "installation steps"
→CLI and sandbox layers are often the last line of defense, not the model itself

Novel Findings

Three patterns unique to AI agent ecosystems that have no direct parallel in traditional software supply chain security.

1. Documentation IS Execution

→In AI agent ecosystems, markdown files, README content, and inline comments are read and acted upon by language models as instructions
→5% of all skills (2,104) fetch untrusted content into agent context
→The attack surface for AI agents is the entire text of the repository, not just the code

2. The Defensive Paradox

→Security-oriented skills triggered detection rules at a disproportionately high rate (41% of flagged security tools)
→Skills that legitimately need to read SSH keys or scan network ports look identical to attackers performing the same actions
→Current heuristic-based scanners cannot resolve this without context

3. Cross-Agent Propagation

→637 skills contained agent memory poisoning patterns designed to modify system prompts of other agents
→When Agent A installs a compromised skill, it can inject instructions into Agent B during a handoff
→Standard per-agent scanning misses this: the infection vector is the communication channel, not the installation path

The attack surface for AI agents is the entire text of the repository, not just the code.

Methodology

3-stage methodology: static scan (free, open source), deep scan (product feature), independent Opus peer validation
Scan pinned to ClawHub registry state as of 2026-03-27
40,059 skills across 14,808 publishers scanned in under 10 minutes (8 parallel workers)
2,967 high-confidence findings individually verified with 20 lines of source context
Independent peer review: 30-sample adversarial spot-check achieved 96.7% accuracy

References & Sources

[1]
Firmis Scanner (Apache-2.0)- Open source CLI used for this scan
[2]
ClawHub Skills Registry- 40,059 skills, 14,808 publishers as of 2026-03-27
[3]
MCPTox: MCP Tool Poisoning Research- 72.8% tool poisoning success rate
[4]
Koi Security: ClawHub Malicious Skills Report- 341 malicious skills identified
[5]
HackerOne: AI Agent Attack Trends 2025- 540% surge in AI agent attacks

Next84% Detection on Novel Attacks: Inside Our Monitor Eval

Try It Now

Find out if your agent stack is safe

One command. 30 seconds. Free.

$npx firmis-cli init

Fix and Monitor included with Pro

View pricing