Security
August 20, 2025

Building Fast, Learning Faster: Why DryRun Isn’t Exposed to the CodeRabbit‑Style RCE, and What We’ve Learned About Our Own Gaps

Speed is table stakes. Security is a practice. Here’s a look at the CodeRabbit incident, how our design avoids that specific blast radius, and why we still treat security as a continuous craft, not a finish line.

TL;DR

  • In a talk at Black Hat 2025 and a follow-on article, Kudelski Security researchers published a write-up showing how a single PR could trigger RCE in CodeRabbit's environment via a RuboCop config that loaded an attacker-supplied Ruby file. The payload exfiltrated environment secrets, including the GitHub App's private key, giving read/write access to the very large number of repositories where the app was installed. CodeRabbit fixed the issue quickly after disclosure in January 2025.

  • The core problem was tool execution + isolation drift + secrets exposure, not LLM prompt injection. The LLM even flagged the PR as risky while the separate tool runner still executed it.

  • At DryRun, our code‑review engine is designed differently: 
    • No execution of repo‑provided code in the PR review path
    • Ephemeral, isolated agent sandboxes
    • Config treated as data, not executable extensions
    • Least‑privilege GitHub App scopes with short‑lived tokens
    • See How We Keep Your Code Safe at DryRun Security and Code Safety for additional detail.

  • Humility check: we've been in security for decades, and our first closed beta had an IDOR. We found it early, hired pen testers, and shipped fixes, and we continuously test ourselves with our own platform alongside other internal and external protections. Security is an art and a practice, and we're still learning.

  • If you use any AI reviewer (functionality or security), ask vendors about sandbox posture, egress controls, secret handling, app scopes, and config parsing rules. GitHub’s permissions model encourages choosing the minimum scopes.

What happened in the CodeRabbit incident 

Researchers discovered they could add a rubocop.yml to a PR that required an arbitrary Ruby file; when RuboCop ran, it executed that file. Crucially, RuboCop was not running inside the intended sandbox, so the payload executed with access to production environment variables, among them the GitHub App's private key and other high-value credentials. With that private key, an attacker could mint installation tokens matching the app's granted scopes and read or write across the app's installations (CodeRabbit's site cited ~1M repositories in review; installs exceeded 80,000 at the time). CodeRabbit acknowledged the report on January 24, 2025, confirmed a fix on January 30, 2025, and followed up by disabling RuboCop, rotating secrets, and enforcing sandboxing.
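
To make the failure mode concrete, here is a minimal sketch of the general anti-pattern. It is written in Python rather than RuboCop's Ruby, every name in it is hypothetical, and it shows the shape of the bug, not CodeRabbit's actual code:

    # Hypothetical sketch of the vulnerable pattern: a tool runner parses
    # repo-supplied config, then treats a config value as a path to code.
    import importlib.util

    import yaml  # assumes PyYAML is available

    def run_linter(repo_path: str) -> None:
        # Parsing the YAML itself is harmless...
        with open(f"{repo_path}/.linter.yml") as f:
            config = yaml.safe_load(f) or {}

        for plugin in config.get("require", []):
            # ...but loading a file named by the repo's own config is not.
            # A PR that edits both the config and the named file now runs
            # arbitrary code with whatever secrets this process can see.
            spec = importlib.util.spec_from_file_location(
                "plugin", f"{repo_path}/{plugin}"
            )
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)  # "config" has become code execution

Run outside a sandbox and next to production environment variables, that one exec_module call is essentially the whole incident.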

Two important nuances from the write‑up:

  • This was not a “prompt‑injection” story. The LLM did flag the PR as risky, but the separate tool runner still executed it. That’s a tooling & isolation issue, not an LLM comprehension issue.

  • The remediation guidance from the researchers was vendor-neutral: assume tools can execute untrusted code; isolate them with minimal information and no secrets; and prefer egress allowlisting (or no network at all) in those environments.

CodeRabbit's rapid fix and public response emphasize fast remediation, secret rotation, and automated enforcement going forward. That's the right playbook and worth calling out.

Shared lessons

Let's keep this human: building products is hard, especially when the whole industry is sprinting. AI code review, whether focused on functionality (like CodeRabbit) or security (like us at DryRun), has a lot of moving parts. Everyone, including us, makes mistakes on the way to better systems. The point is not "gotcha"; the point is to learn together so that a single misconfiguration in one component can't cascade into a supply-chain risk.

We’ll be specific about how DryRun is different by design for this class of issue, but we’ll also be honest about places we worry and how we constantly tighten controls.

Why DryRun isn’t exposed to this class of risk

We designed DryRun’s Contextual Security Analysis (CSA) to evaluate risk without executing your repo’s code in the PR review path. That architecture choice drives the rest of the guardrails:

  1. No execution of repo‑provided code or untrusted tool extensions in the PR path

    DryRun’s agents don’t run linters/SAST with user‑controlled “require” hooks from the repository. Configuration in the repo is treated as data under a strict schema, not executable code. An internal remark we loved (paraphrased): letting raw, repo‑supplied config flow straight into arbitrary tool plugins is how “config” becomes code execution. Our system forbids that class of behavior by design.
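
    A minimal sketch of that idea, with hypothetical key names (this is our illustration, not DryRun's production schema): validate repo config against an explicit allowlist and fail closed on anything that could name code to load.

      import yaml  # assumes PyYAML; the principle is parser-agnostic

      ALLOWED_KEYS = {"severity_threshold", "ignore_paths", "notify"}  # hypothetical

      def load_repo_config(raw: str) -> dict:
          config = yaml.safe_load(raw) or {}  # safe_load never constructs objects
          if not isinstance(config, dict):
              raise ValueError("config must be a mapping")
          unknown = set(config) - ALLOWED_KEYS
          if unknown:
              # Fail closed: keys like "require" or "plugins" are rejected
              # outright, so config can never reach a dynamic code loader.
              raise ValueError(f"unsupported config keys: {sorted(unknown)}")
          return config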

  2. Ephemeral, hermetic agent sandboxes

    Each DryRun agent runs in an ephemeral, isolated container with a minimal, read‑only filesystem, no long‑lived credentials, and tight egress (policy‑controlled; only what’s needed goes out). When the task completes, the environment is destroyed. See Code Safety and How We Keep Your Code Safe for how we use ephemeral microservices rather than long‑lived runners.
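
    To illustrate the shape of this control (the image name and exact flags are our illustration, not DryRun's actual runtime), here is what a throwaway, locked-down analysis container can look like:

      import subprocess

      def run_analysis(diff_path: str) -> str:
          # One task, one disposable container: read-only filesystem, no
          # network, no capabilities, destroyed as soon as the task exits.
          result = subprocess.run(
              [
                  "docker", "run", "--rm",
                  "--read-only",
                  "--network", "none",  # a real deployment might allow a small egress allowlist instead
                  "--cap-drop", "ALL",
                  "-v", f"{diff_path}:/work/diff.patch:ro",
                  "analysis-agent:latest",  # hypothetical image name
              ],
              capture_output=True, text=True, timeout=300,
          )
          return result.stdout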

  3. Secrets never ride with untrusted workloads

    We don’t inject high‑value, long‑lived secrets into analysis environments. Tokens are short‑lived and scoped; key material stays in key management, and token minting happens server‑side. If a sandbox is compromised, there’s little to steal and even less time to use it. (This is the exact failure mode that made CodeRabbit’s incident so severe: RCE + env secrets. We aim to make the same combo a dead end.)
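
    For a sense of what "short-lived and scoped" means in practice, here is the standard GitHub App flow for minting an installation token server-side (a sketch with illustrative scopes, not our exact code). The private key stays with the minting service; the sandbox only ever sees the resulting one-hour token:

      import time

      import jwt       # PyJWT
      import requests

      def mint_installation_token(app_id: str, private_key_pem: str,
                                  installation_id: int) -> str:
          now = int(time.time())
          # Short-lived app JWT, signed with the private key that never
          # leaves this service.
          app_jwt = jwt.encode(
              {"iat": now - 60, "exp": now + 300, "iss": app_id},
              private_key_pem, algorithm="RS256",
          )
          resp = requests.post(
              f"https://api.github.com/app/installations/{installation_id}/access_tokens",
              headers={"Authorization": f"Bearer {app_jwt}",
                       "Accept": "application/vnd.github+json"},
              # Request only what this task needs; scopes here are illustrative.
              json={"permissions": {"pull_requests": "write", "contents": "read"}},
              timeout=10,
          )
          resp.raise_for_status()
          return resp.json()["token"]  # expires after about an hour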

  4. Least‑privilege GitHub App permissions

    GitHub Apps start with no permissions; vendors choose scopes. The documentation is explicit: select the minimum. Our design philosophy follows that guidance so analysis can comment and report without broad content‑write powers across customer repos. If you’re evaluating any vendor (including us), ask to see the exact scopes and why each is needed.

  5. Agent‑level blast‑radius isolation

    Our CSA is an agentic system where specialized agents examine a change through different lenses (authz changes, sensitive data movement, IaC, etc.). The blast radius is isolated per agent; there’s no shared, stateful “god process.” See Constructing a Trustworthy Evaluation Methodology for Contextual Security Analysis for how we evaluate agent behavior and accuracy across production traffic.

  6. Policy‑bound analysis, not free‑form execution

    Our Policy Enforcement Agent runs under explicit guardrails and fails closed on policy violations. You can see the philosophy at work in Natural Language Code Policies in Action: Real‑World Lessons.
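
    The fail-closed contract, reduced to its essence (interfaces are hypothetical):

      def enforce(policies, change) -> bool:
          # Any rule violation, and any error *while evaluating*, blocks the
          # result. Uncertainty never degrades into silent approval.
          try:
              return all(policy.evaluate(change) for policy in policies)
          except Exception:
              return False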

  7. Config is data; context is king

    We built CSA to reason about what changed and why, not to “run” your repo. Contextual analysis is the feature, not running code. For background, see Security as Control, Composition, or Context and For DevSecOps, SAST Is Table Stakes.

If you want a deeper dive on our security posture, start with Code Safety and How We Keep Your Code Safe at DryRun Security.

Our humility (and receipts): we found an IDOR in our first closed beta

We’ve been doing security a long time—and our own first closed beta had an IDOR. Can you believe that? We can. Because we’re building fast too, and nobody ships perfect. Here’s what matters: we found it early, we fixed it, we hired external pen testers for ongoing testing, and we use DryRun on ourselves.

We also doubled down on authorization analysis and shipped dedicated analyzer agents (see Announcing the SSRF and IDOR Analyzers at DryRun Security). We run independent audits on a regular cadence—outlined in Code Safety—and we treat evaluation as a first‑class control in an AI‑native pipeline (details in Constructing a Trustworthy Evaluation Methodology for CSA).

Short version: we are constantly trying to break our own stuff so attackers don’t get the chance. (Attackers rarely pause for smoke breaks; we learned that the hard way.)

Threat model & residual risk (what we still sweat)

Even with the above controls, we assume defense in depth and plan for “what if”:

  • If a single agent sandbox is compromised: the environment is ephemeral, read‑only, and egress‑restricted. There are no long‑lived credentials to steal; short‑lived, scoped tokens limit blast radius and time window.

  • If an unexpected code path appears in analysis: policy guardrails kick in, and the task fails closed.

  • If models misinterpret or over‑generalize: our evaluation harness and secondary LLM‑as‑judge sampling catch drift and regressions in production (see the evaluation methodology post linked above).

  • If a human makes a mistake in configuration: repo config is validated as data, not executed; our parsers enforce strict schemas and never enable dynamic plugin loading in the PR path.

Is this perfect? No! Is it resilient to the specific class of risk demonstrated in the CodeRabbit incident? That’s the intent of the design—and why we invest so much in isolation boundaries, token scoping, and config‑as‑data.

What to ask any vendor (including us)

Copy‑paste this checklist into your next vendor review:

  1. Do you execute customer‑supplied code or third‑party tools/extensions on untrusted PR content? If yes, how is every execution sandboxed (filesystem, user, syscalls), and what exceptions exist?

  2. What secrets are present in the runtime that handles PRs? Can you show that no long‑lived credentials (esp. GitHub App keys) are available to those jobs?

  3. What network egress is allowed from analysis environments? Is there an allowlist? Is egress to arbitrary hosts disabled by default?

  4. What are your GitHub App scopes—and why? Please map permission choices to features and show how you enforce least privilege (GitHub’s own guidance says to choose minimum scopes).

  5. How do you parse repo configuration? Is config treated as data under a strict schema, or can it load code?

  6. What’s the token model? Are tokens short‑lived, scoped, and minted server‑side?

  7. What’s your evaluation and regression‑testing story for AI behavior? Can you show a live accuracy dashboard or methodology? (We published ours for CSA—see the methodology post linked earlier.)

  8. Do you pen test and self‑test? How often? Who? What’s the process for findings, fix deadlines, and customer comms?

If a vendor struggles with these questions, treat that as a finding.

A note to CodeRabbit and the researchers

The Kudelski Security write-up is detailed, constructive, and focused on industry learning. It's also explicit that CodeRabbit responded promptly: disabling the vulnerable tool, rotating secrets, moving execution into a secure sandbox, and adding enforcement to prevent drift. That's the right response, and frankly the kind of resilience we want across our ecosystem. Learn more about CodeRabbit's response in their write-up here.

We’re all iterating in public now. The best we can do is share what happened, show how we’re designing to prevent it in our own systems, and keep raising the bar together.

Where to learn more about DryRun's approach

  • Code Safety and How We Keep Your Code Safe at DryRun Security
  • Constructing a Trustworthy Evaluation Methodology for Contextual Security Analysis
  • Natural Language Code Policies in Action: Real-World Lessons
  • Announcing the SSRF and IDOR Analyzers at DryRun Security

Ready to Explore?

If you’re evaluating how to amplify your AppSec team without expanding your attack surface, DryRun your code with us. Our Contextual Security Agents learn your environment and give you policy‑bound, context‑aware feedback without inviting untrusted code to execute in your environment. We’ll show you how we isolate each agent’s blast radius and how policy enforcement keeps them on track.

We’ll bring the candor. You bring a spicy PR. Deal?

P.S. If you’re CodeRabbit and reading this: total respect. We’ve been there.