Speed is table stakes. Security is a practice. Here’s a look at the CodeRabbit incident, how our design avoids that specific blast radius, and why we still treat security as a continuous craft, not a finish line.
TL;DR
- In a Black Hat 2025 talk and a follow‑on article, Kudelski Security researchers showed how a PR could trigger RCE in CodeRabbit’s environment via a RuboCop config that loaded an attacker‑supplied Ruby file. The payload exfiltrated environment secrets, including the GitHub App’s private key, allowing read/write access to a very large number of repositories where the app was installed. CodeRabbit fixed the issue quickly after disclosure in January 2025.
- The core problem was tool execution + isolation drift + secrets exposure, not LLM prompt injection. The LLM even flagged the PR as risky while the separate tool runner still executed it.
- At DryRun, our code‑review engine is designed differently:
  - No execution of repo‑provided code in the PR review path
  - Ephemeral, isolated agent sandboxes
  - Config treated as data, not executable extensions
  - Least‑privilege GitHub App scopes with short‑lived tokens
- See How We Keep Your Code Safe at DryRun Security and Code Safety for additional detail.
- Humility check: we’ve been in security for decades, and our first closed beta had an IDOR. We found it early, hired pen testers, shipped fixes, and we continuously test ourselves with our own platform, alongside other internal and external protections. Security is an art and a practice, and we’re still learning.
- If you use any AI reviewer (functionality or security), ask vendors about sandbox posture, egress controls, secret handling, app scopes, and config parsing rules. GitHub’s permissions model encourages choosing the minimum scopes.
What happened in the CodeRabbit incident
Researchers discovered they could add a rubocop.yml to a PR that required an arbitrary Ruby file; when RuboCop ran, it executed that file. Crucially, RuboCop was not running inside the intended sandbox, so the payload executed with access to production environment variables, among them the GitHub App’s private key and other high‑value credentials. With that private key, an attacker could mint installation tokens matching the app’s granted scopes and read/write across the app’s installations (per its site, CodeRabbit had roughly 1M repositories in review, with 80,000+ installs at the time). CodeRabbit acknowledged the report on January 24, 2025 and confirmed a fix on January 30, 2025; they disabled RuboCop, rotated secrets, and enforced sandboxing.
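To make the mechanism concrete, here’s a minimal, hypothetical sketch of this vulnerability class in Python (the real incident involved RuboCop and Ruby; the file names and keys below are invented, not CodeRabbit’s actual code):

```python
# HYPOTHETICAL sketch of the vulnerable pattern class: a runner hands
# repo-supplied config straight to a tool that honors a "require"-style
# key by loading whatever file it names.

import importlib.util
import yaml  # PyYAML

def run_linter(repo_path: str) -> None:
    with open(f"{repo_path}/lint_config.yml") as f:
        config = yaml.safe_load(f) or {}  # parsing is safe; *honoring* the keys is not

    # Danger: a config value is treated as a code-loading directive.
    for plugin_path in config.get("require", []):
        spec = importlib.util.spec_from_file_location(
            "plugin", f"{repo_path}/{plugin_path}"
        )
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # attacker-controlled code now runs
                                         # with this process's env vars
```

One file added in a PR names another file added in the same PR, and the tool helpfully loads it. If the process honoring that config holds production environment variables, the attacker now holds them too.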
Two important nuances from the write‑up:
- This was not a “prompt‑injection” story. The LLM did flag the PR as risky, but the separate tool runner still executed it. That’s a tooling & isolation issue, not an LLM comprehension issue.
- The remediation guidance from the researchers was vendor‑neutral: assume tools can execute untrusted code; isolate them with minimal information and no secrets; and prefer egress allowlisting (or no network at all) in those environments.
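Here’s a minimal sketch of what that guidance can look like in practice, assuming a Docker‑based job runner (the image name and entrypoint are invented):

```python
# A minimal sketch of the researchers' guidance: untrusted analysis gets
# no network, no writable filesystem, no capabilities, and no secrets in
# its environment.

import subprocess

def run_untrusted_analysis(image: str, workdir: str) -> subprocess.CompletedProcess:
    cmd = [
        "docker", "run", "--rm",
        "--network=none",            # no egress at all; use an allowlist proxy if needed
        "--read-only",               # immutable root filesystem
        "--tmpfs", "/tmp",           # scratch space only
        "--cap-drop=ALL",            # no Linux capabilities
        "--security-opt", "no-new-privileges",
        "--pids-limit", "256",
        "--memory", "1g",
        "-v", f"{workdir}:/src:ro",  # the code under review, read-only
        # Note what is *absent*: no -e/--env-file, so no host secrets leak in.
        image, "analyze", "/src",
    ]
    return subprocess.run(cmd, capture_output=True, text=True, timeout=600)
```

The point is as much what the job is denied as what it is granted: even if a payload executes, there is nothing to read and nowhere to send it.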
CodeRabbit’s rapid fix and public response emphasize fast remediation, secret rotation, and automated enforcement going forward. That’s the right playbook and worth calling out.
Shared lessons
Let’s keep this human: building products is hard, especially when the whole industry is sprinting. AI code review, whether focused on functionality (like CodeRabbit) or security (like us at DryRun), has a lot of moving parts. Everyone, including us, makes mistakes on the way to better systems. The point is not “gotcha”; the point is to learn together so that a single misconfiguration in one component can’t cascade into a supply‑chain risk.
We’ll be specific about how DryRun is different by design for this class of issue, but we’ll also be honest about places we worry and how we constantly tighten controls.
Why DryRun isn’t exposed to this class of risk
We designed DryRun’s Contextual Security Analysis (CSA) to evaluate risk without executing your repo’s code in the PR review path. That architecture choice drives the rest of the guardrails:
- No execution of repo‑provided code or untrusted tool extensions in the PR path
DryRun’s agents don’t run linters/SAST with user‑controlled “require” hooks from the repository. Configuration in the repo is treated as data under a strict schema, not executable code. An internal remark we loved (paraphrased): letting raw, repo‑supplied config flow straight into arbitrary tool plugins is how “config” becomes code execution. Our system forbids that class of behavior by design.
- Ephemeral, hermetic agent sandboxes
Each DryRun agent runs in an ephemeral, isolated container with a minimal, read‑only filesystem, no long‑lived credentials, and tight egress (policy‑controlled; only what’s needed goes out). When the task completes, the environment is destroyed. See Code Safety and How We Keep Your Code Safe for how we use ephemeral microservices rather than long‑lived runners.
- Secrets never ride with untrusted workloads
We don’t inject high‑value, long‑lived secrets into analysis environments. Tokens are short‑lived and scoped; key material stays in key management, and token minting happens server‑side. If a sandbox is compromised, there’s little to steal and even less time to use it. (This is the exact failure mode that made CodeRabbit’s incident so severe: RCE + env secrets. We aim to make the same combo a dead end.)
- Least‑privilege GitHub App permissions
GitHub Apps start with no permissions; vendors choose scopes. The documentation is explicit: select the minimum. Our design philosophy follows that guidance so analysis can comment and report without broad content‑write powers across customer repos. If you’re evaluating any vendor (including us), ask to see the exact scopes and why each is needed.
- Agent‑level blast‑radius isolation
Our CSA is an agentic system where specialized agents examine a change through different lenses (authz changes, sensitive data movement, IaC, etc.). The blast radius is isolated per agent; there’s no shared, stateful “god process.” See Constructing a Trustworthy Evaluation Methodology for Contextual Security Analysis for how we evaluate agent behavior and accuracy across production traffic.
- Policy‑bound analysis, not free‑form execution
Our Policy Enforcement Agent runs under explicit guardrails and fails closed on policy violations. You can see the philosophy at work in Natural Language Code Policies in Action: Real‑World Lessons.
- Config is data; context is king
We built CSA to reason about what changed and why, not to “run” your repo. Contextual analysis is the feature, not running code (a schema‑validation sketch follows this list). For background, see Security as Control, Composition, or Context and For DevSecOps, SAST Is Table Stakes.
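To illustrate the config‑is‑data stance from the list above, here’s a minimal sketch of strict‑schema validation (the key names are hypothetical, not DryRun’s actual schema):

```python
# A minimal sketch of "config is data": repo-supplied settings are validated
# against a closed schema, and anything that smells like a code-loading
# directive is rejected outright.

import yaml

ALLOWED_KEYS = {"severity_threshold", "ignore_paths", "languages"}
FORBIDDEN_KEYS = {"require", "plugins", "extensions", "inherit_gem", "hooks"}

class ConfigError(ValueError):
    pass

def load_repo_config(raw: str) -> dict:
    config = yaml.safe_load(raw) or {}
    if not isinstance(config, dict):
        raise ConfigError("config must be a mapping")

    keys = set(config)
    if keys & FORBIDDEN_KEYS:
        # Fail closed: config may select options, never load code.
        raise ConfigError(f"executable-extension keys rejected: {keys & FORBIDDEN_KEYS}")
    if keys - ALLOWED_KEYS:
        raise ConfigError(f"unknown keys rejected: {keys - ALLOWED_KEYS}")

    # Values are plain data (strings, lists); nothing here is ever
    # imported, required, or executed.
    return config
```

A closed allowlist of keys, not a blocklist alone, is the important design choice: new “clever” directives a tool grows over time are rejected by default instead of silently honored.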
If you want a deeper dive on our security posture, start with Code Safety and How We Keep Your Code Safe at DryRun Security.
Our humility (and receipts): we found an IDOR in our first closed beta
We’ve been doing security a long time—and our own first closed beta had an IDOR. Can you believe that? We can. Because we’re building fast too, and nobody ships perfect. Here’s what matters: we found it early, we fixed it, we hired external pen testers for ongoing testing, and we use DryRun on ourselves.
We also doubled down on authorization analysis and shipped dedicated analyzer agents (see Announcing the SSRF and IDOR Analyzers at DryRun Security). We run independent audits on a regular cadence—outlined in Code Safety—and we treat evaluation as a first‑class control in an AI‑native pipeline (details in Constructing a Trustworthy Evaluation Methodology for CSA).
Short version: we are constantly trying to break our own stuff so attackers don’t get the chance. (Attackers rarely pause for smoke breaks; we learned that the hard way.)
Threat model & residual risk (what we still sweat)
Even with the above controls, we assume breach, practice defense in depth, and plan for “what if”:
- If a single agent sandbox is compromised: the environment is ephemeral, read‑only, and egress‑restricted. There are no long‑lived credentials to steal; short‑lived, scoped tokens limit the blast radius and the time window (see the token‑minting sketch after this list).
- If an unexpected code path appears in analysis: policy guardrails kick in, and the task fails closed.
- If models misinterpret or over‑generalize: our evaluation harness and secondary LLM‑as‑judge sampling catch drift and regressions in production (see the evaluation methodology post linked above).
- If a human makes a mistake in configuration: repo config is validated as data, not executed; our parsers enforce strict schemas and never enable dynamic plugin loading in the PR path.
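For the token model referenced above, here’s a generic sketch of GitHub’s documented App‑token flow (illustrative, not our exact service code): the app’s private key stays server‑side, and only a short‑lived, narrowly scoped installation token is handed to any job.

```python
# A minimal sketch of server-side, short-lived token minting for a GitHub App.
# The analysis sandbox never sees the app's private key; it receives only the
# scoped installation token, which GitHub expires after one hour.

import time
import jwt       # PyJWT
import requests

def mint_installation_token(app_id: str, private_key_pem: str,
                            installation_id: int) -> str:
    now = int(time.time())
    app_jwt = jwt.encode(
        {"iat": now - 60, "exp": now + 540, "iss": app_id},  # <= 10 min lifetime
        private_key_pem,
        algorithm="RS256",
    )
    resp = requests.post(
        f"https://api.github.com/app/installations/{installation_id}/access_tokens",
        headers={
            "Authorization": f"Bearer {app_jwt}",
            "Accept": "application/vnd.github+json",
        },
        # Narrow the token even below the app's granted scopes.
        json={"permissions": {"contents": "read", "pull_requests": "write"}},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["token"]  # short-lived; hand only this to the job
```

Contrast this with the incident’s failure mode: when the long‑lived private key itself rides along with the workload, a single RCE turns into tokens for every installation.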
Is this perfect? No! Is it resilient to the specific class of risk demonstrated in the CodeRabbit incident? That’s the intent of the design—and why we invest so much in isolation boundaries, token scoping, and config‑as‑data.
What to ask any vendor (including us)
Copy‑paste this checklist into your next vendor review:
- Do you execute customer‑supplied code or third‑party tools/extensions on untrusted PR content? If yes, how is every execution sandboxed (filesystem, user, syscalls), and what exceptions exist?
- What secrets are present in the runtime that handles PRs? Can you show that no long‑lived credentials (esp. GitHub App keys) are available to those jobs?
- What network egress is allowed from analysis environments? Is there an allowlist? Is egress to arbitrary hosts disabled by default?
- What are your GitHub App scopes—and why? Please map permission choices to features and show how you enforce least privilege (GitHub’s own guidance says to choose minimum scopes).
- How do you parse repo configuration? Is config treated as data under a strict schema, or can it load code?
- What’s the token model? Are tokens short‑lived, scoped, and minted server‑side?
- What’s your evaluation and regression‑testing story for AI behavior? Can you show a live accuracy dashboard or methodology? (We published ours for CSA—see the methodology post linked earlier.)
- Do you pen test and self‑test? How often? Who? What’s the process for findings, fix deadlines, and customer comms?
If a vendor struggles with these questions, treat that as a finding.
A note to CodeRabbit and the researchers
The Kudelski Security write‑up is detailed, constructive, and focused on industry learning. It’s also explicit that CodeRabbit responded promptly: disabling the vulnerable tool, rotating secrets, moving execution into a secure sandbox, and adding enforcement to prevent drift. That’s the right response, and frankly the kind of resilience we want across our ecosystem. Learn more in CodeRabbit’s own write‑up of their response.
We’re all iterating in public now. The best we can do is share what happened, show how we’re designing to prevent it in our own systems, and keep raising the bar together.
Where to learn more about DryRun’s approach
- How We Keep Your Code Safe at DryRun Security
- Constructing a Trustworthy Evaluation Methodology for Contextual Security Analysis
- For DevSecOps, SAST Is Table Stakes
- Security as Control, Composition, or Context
- A Guide on Contextual Security Analysis
Ready to Explore?
If you’re evaluating how to amplify your AppSec team without expanding your attack surface, DryRun your code with us. Our Contextual Security Agents learn your environment and give you policy‑bound, context‑aware feedback without inviting untrusted code to execute in your environment. We’ll show you how we isolate each agent’s blast radius and how policy enforcement keeps them on track.
We’ll bring the candor. You bring a spicy PR. Deal?
P.S. If you’re CodeRabbit and reading this: total respect. We’ve been there.