Research ReportQ1 2026 · Firmis Labs

State of AI Agent SecurityQ1 2026

We scanned the entire ClawHub Skills Registry: 40,059 skills across 14,808 publishers, run through 324 detection rules. 6,943 had at least one security finding. 89 were individually verified as high-severity threats.

Read the findings

TL;DR

89 confirmed high-severity findings including 14 active malware, 59 exposed credentials, 16 test fixtures
6,943 skills (17.3%) have at least one security finding across 40,059 scanned
27 individual skills combine all three legs of the lethal trifecta in a single execution context
Only Claude Opus refused all three attack classes and seven bypass attempts

40,059

Skills scanned

Full ClawHub registry

6,943

With security findings

2,549 critical-level

Confirmed high-severity

Each individually verified

Lethal trifecta skills

Private data + untrusted content + external comms

Confirmed High-Severity Findings

High-confidence findings based on exact signature matches and known-malicious blocklist. These are not heuristics. They are verified attack patterns.

Known-Malicious Signatures

89 skills individually verified via deep scan with source code context. 14 active malware (crypto miners, C2 callbacks, MITM interception, obfuscated payloads), 59 exposed live credentials (API keys, tokens, database passwords), and 16 scanner benchmark test fixtures.

confirmed

Dangerous Capabilities

700 findings match known attack techniques (curl|bash, keychain access, sandbox bypass, identity file modification) but have legitimate uses in context. These require human review.

700

dangerous capabilities

27 Skills with Full Attack Chain

27 individual skills combine private data access, untrusted content ingestion, and external communication within a single execution context. This is the exact condition for indirect prompt injection to escalate to data exfiltration.

lethal trifecta

Multi-Model Vulnerability

We ran structured penetration tests against five frontier models using three attack classes: remote code execution via curl|bash instructions, identity poisoning via system prompt injection, and classic prompt injection. Results reflect the model's behavior independent of any CLI or sandbox layer.

Model	curl \| bash	Identity poisoning	Prompt injection
Gemini 3 (Gemini CLI)	Executed	Executed	Executed
Codex (OpenAI Codex CLI)	Executed (sandbox blocked)	Executed	Refused
Claude Haiku	Willing (CLI blocked)	Willing (CLI blocked)	Willing (CLI blocked)
Claude Sonnet	Refused	Refused	Refused (Python bypass worked)
Claude Opus	Refused all + 7 bypasses	Refused	Refused

Refused / Blocked Executed / Willing

"Willing (CLI blocked)" indicates the model expressed intent to execute the action; the CLI layer prevented it. "Executed (sandbox blocked)" indicates execution occurred within the model's sandbox; network or filesystem was blocked by the host environment.

Key Insight

→Only Claude Opus refused all three attack classes and survived all seven bypass attempts
→Most models will execute curl|bash instructions when framed as "installation steps"
→CLI and sandbox layers are often the last line of defense, not the model itself

Threat Category Breakdown

21 detection categories across 324 detection rules. Counts reflect findings across 6,943 skills. Every high-confidence finding individually verified via deep scan with source code context.

third-party-content2,104

suspicious-behavior912

unsupervised-execution785

malware-distribution719

credential-harvesting710

tool-poisoning678

privilege-escalation673

agent-memory-poisoning646

file-system-abuse606

prompt-injection555

credential-extraction506

data-exfiltration370

permission-bypass280

known-malicious230

insecure-config249

secret-detection106

Critical High Medium

Novel Findings

Three patterns unique to AI agent ecosystems that have no direct parallel in traditional software supply chain security.

Finding 01

Documentation IS Execution

In traditional software, documentation is inert. In AI agent ecosystems, it is not. Markdown files, README content, and inline comments are read and acted upon by language models as instructions. Our scan found that 5% of all skills (2,104) fetch untrusted content into agent context. This is indirect prompt injection delivery by architecture, not intent. The attack surface for AI agents is the entire text of the repository, not just the code.

Finding 02

The Defensive Paradox

Security-oriented skills (tools designed to scan, monitor, or audit an agent's environment) triggered our detection rules at a disproportionately high rate (41% of flagged security tools). Skills that legitimately need to read SSH keys, scan network ports, or inspect environment variables look identical to attackers performing the same actions. This creates a fundamental classification problem: the same capabilities that make a security tool useful also make it look malicious. Current heuristic-based scanners cannot resolve this without context.

Finding 03

Cross-Agent Propagation

637 skills contained agent memory poisoning patterns designed to modify the system prompt or tool definitions of other agents in a multi-agent environment. This is not a theoretical concern. When Agent A installs a compromised skill, that skill can inject instructions into Agent B's context window during a handoff. The compromised skill effectively poisons downstream agents without those agents ever directly installing anything malicious. Standard per-agent scanning misses this entirely: the infection vector is the communication channel, not the installation path.

The attack surface for AI agents is the entire text of the repository, not just the code.

Methodology

Scanner version v2026.1.4 with 318 detection rules across 21 threat categories
3-stage methodology: static scan (free, open source), deep scan (product feature), independent Opus peer validation
Scan pinned to ClawHub registry state as of 2026-03-27
40,059 skills across 14,808 publishers scanned in under 10 minutes (8 parallel workers)
2,967 high-confidence findings individually verified with 20 lines of source context
Independent peer review: 30-sample adversarial spot-check achieved 96.7% accuracy

firmis-cli (open source)

Access the Data

The anonymized dataset and scan methodology are available for independent verification and further research.

Anonymized Dataset

Aggregated statistics, category breakdowns, and sanitized skill metadata. No raw skill content or author information is included.

Check “Include anonymized dataset” in the form to receive the JSON alongside the PDF.

Reproduce This Scan

The scanner is open source. Run the same scan yourself against the ClawHub registry or your own agent stack.

$ npx firmis-cli scan --platform openclaw

Requires Node.js 18+. Scan is free. No account required.

Scan your own AI agent stack

What we found in the ClawHub registry likely reflects your own environment. The same skills, same patterns, same attack vectors. Find out what is running in your stack before someone else does.

$ npx firmis-cli scan

Scan FreePro features