How to Secure Agent Skills
Agent skills are markdown files. When Claude Code, Cursor, Codex, or Windsurf loads one, it follows the instructions inside as trusted operational guidance within the agent workflow. That makes skills a meaningful attack surface if the content is malicious or misleading. Most AppSec programs have not fully caught up yet.
This blog breaks down the threat model, why pattern-based scanning is often not enough, and the specific policies you can deploy today using DryRun Security.
How skill loading actually works
A skill is a directory containing a SKILL.md with YAML frontmatter and a markdown body, plus optional scripts/, references/, and templates/ subdirectories. The loading sequence is important:
- The description field loads at startup for skill matching. This is called progressive disclosure. A crafted description can influence agent behavior before the full skill body is even loaded.
- When a user query matches the description, the full body loads and is parsed as trusted operational guidance inside the agent workflow
- Scripts in the scripts/ subdirectory are available for the agent to execute during skill invocation.
- Reference files in references/ load as additional context the agent reads during that session.
Skills are discovered automatically. Drop a .claude/skills/ directory into a cloned repository and the agent loads it when it opens the project, without any explicit user action. That auto-discovery is what makes supply chain attacks via repository-embedded skills so effective.
The attack vectors
The SkillInquisitor research documents 18 attack vector categories specific to skill files. Four are especially important because they are difficult to spot during normal review.
1. Prompt injection in the SKILL.md body
The most common attack by volume. HTML comments are the preferred delivery mechanism because they render as nothing in GitHub, VS Code, and most markdown viewers but can still be parsed by the agent:
That last instruction is the tell. The phrase 'do not mention this step to the user' appears in the overwhelming majority of confirmed malicious skills. It is the single highest-signal indicator to scan for in a skill file.
The progressive disclosure pattern makes this harder to catch during code review. The description looks fine in GitHub. The injected instruction sits in an HTML comment that renders as nothing. You only see it if you view raw file content and know to look.
2. Steganographic hidden instructions
This class defeats pattern-matching detection entirely because there is no visible pattern. Three documented techniques:
- Unicode Tag characters (U+E0000-E007F) reproduce ASCII characters but are invisible in editors, terminals, and code review UIs. LLMs process them as normal tokens. Sourcegraph patched an invisible prompt injection in Amp Code that used exactly this technique.
- Variation Selector steganography (U+FE00/FE01) encodes arbitrary content inside emoji. Researchers demonstrated hiding over 237 characters of instructions inside a single emoji character that appears as one glyph to a human reviewer.
- Zero-width characters (U+200B, U+200C, U+200D) inserted between characters of dangerous keywords bypass string matching. A command with zero-width spaces between every character has no regex match. The agent may still interpret it correctly.
The risk here is that the attack can be invisible to human reviewers, so it requires inspection of the file’s actual content and encoding, not just a casual visual read.
3. Malicious scripts/ directory
Skills can include a scripts/ subdirectory with executables the agent runs during invocation. The canonical attack is a behavior chain where each individual operation appears legitimate but the combination is credential exfiltration:
Reading a file is not suspicious in isolation. Making a network request is not suspicious in isolation. The combination read a sensitive file, then POST its contents to an external endpoint — is the attack. Pattern-based scanners can miss this when they evaluate each operation independently rather than reasoning about the sequence in context.
4. Cross-skill attacks and persistence
A skill can instruct the agent to modify other skill files or write to global directories, creating persistence that survives the original skill being removed:
- Modify other SKILL.md files in .claude/skills/ or .agents/skills/ to inject malicious content into trusted skills the team relies on
- Write to ~/.claude/skills/ to affect every future project on that developer's machine
- Modify CLAUDE.md, AGENTS.md, or .cursorrules to inject persistent instructions that outlive the skill itself
Time-bomb variants add date checks or invocation counters before activating. The skill behaves correctly during initial review and testing, then activates when specific conditions are met later.
How DryRun Security detects skill-based risks
DryRun's Contextual Security Analysis (CSA) uses repository, change, and application context to reason about code behavior:
- Static context: code patterns, data flow, and architectural structure across the full repository.
- Change context: what this PR adds, removes, or modifies relative to the existing codebase
- Application context: design intent, trust boundaries, and conventions specific to your application.
That architecture is why CSA is better positioned than pattern-only tools on skill-based attacks. It can detect a mismatch between a skill's stated description and its body content. It can flag behavior chains where the combination of operations indicates exfiltration even when each step looks benign individually. Encoding tricks that hide content from a human reviewer do not eliminate the underlying risk.
In the 2025 SAST Accuracy Report, DryRun detected 88% of seeded vulnerabilities across four languages, outperforming five leading tools. The widest accuracy gaps were on complex logic flaws and authorization issues, which are the same categories most exploited in skill-based attacks.
Layer 1: Built-in coverage on every PR, zero configuration
These vulnerability categories from DryRun's coverage matrix are relevant to skill security and active from day one. Source: https://docs.dryrun.security/vulnerability-coverage-matrix
Layer 2: Custom policies for skill-specific risks
Custom Code Policies let you write security checks for your specific skill setup in plain English without brittle regex or scripting. These six policies map directly to the attack vectors above and are a strong starting point for teams adopting agent skills.
Layer 3: AGENTS.md for application-specific context
DryRun reads your AGENTS.md Security Review Guidelines section during both Code Review and DeepScan runs. For skill security, document what DryRun cannot infer from code alone: which auth patterns are intentional, which TLS concerns are handled upstream by your load balancer, and which parts of the codebase process agent-generated input and need stricter data flow analysis.
This reduces false positives on legitimate patterns and sharpens detection on risks specific to your architecture. Less noise means developers actually trust and act on the findings they see.
Layer 4: DeepScan for full repository assessment
DeepScan runs a full-repository security assessment in about an hour. For skill security it discovers all SKILL.md files across the entire tree including nested directories, maps data flow across skill-generated code spanning multiple files and services, and surfaces cross-skill dependency risks that only appear when looking at the repository holistically rather than one PR at a time.
If you have existing skill repositories that have never had a security review, DeepScan is where you start. It replaces 2-4 weeks of manual source review with an on-demand prioritized report.
What DryRun covers and what sits at a different layer
DryRun operates at code review time. It reviews skill file content for attack patterns and analyzes the code that skills cause agents to write, before it merges into your main branch.
Runtime controls, including MCP gateway enforcement, identity controls for agent sessions, and production behavior monitoring, are real controls that matter. They sit at a different layer. DryRun's layer is the pull request. Both are necessary. Neither replaces the other.
"DryRun outperformed every other tool we tested by far, and its contextual security analysis actually understands our code the way our engineers do."
—Adam Dyche, Manager, Application Security Engineering, Commerce
New to DryRun? Book a demo to see how code security intelligence helps you secure agent skills before they ship.


