| Tool | Accuracy of Findings | Detects Non-Pattern-Based Issues? | Coverage of SAST Findings | Speed of Scanning | Usability & Dev Experience |
|---|---|---|---|---|---|
| DryRun Security | Very high – caught multiple critical issues missed by others | Yes – context-based analysis, logic flaws & SSRF | Broad coverage of standard vulns, logic flaws, and extendable | Near real-time PR feedback | |
| Snyk Code | High on well-known patterns (SQLi, XSS), but misses other categories | Limited – AI-based, focuses on recognized vulnerabilities | Good coverage of standard vulns; may miss SSRF or advanced auth logic issues | Fast, often near PR speed | Decent GitHub integration, but rules are a black box |
| GitHub Advanced Security (CodeQL) | Very high precision for known queries, low false positives | Partial – strong dataflow for known issues, needs custom queries | Good for SQLi and XSS, but logic flaws require advanced CodeQL experience | Moderate to slow (GitHub Action based) | Requires CodeQL expertise for custom logic |
| Semgrep | Medium, but there is a good community for adding rules | Primarily pattern-based with limited dataflow | Decent coverage with the right rules, can still miss advanced logic or SSRF | Fast scans | Has custom rules, but dev teams must maintain them |
| SonarQube | Low – misses serious issues in our testing | Limited – mostly pattern-based, code quality oriented | Basic coverage for standard vulns, many hotspots require manual review | Moderate, usually in CI | Dashboard-based approach, can pass “quality gate” despite real vulns |
[Table: per-vulnerability detection results for Snyk (partial), GitHub CodeQL (partial), Semgrep, SonarQube, and DryRun Security across SQL Injection, Cross-Site Scripting (XSS), SSRF, Auth Flaw / IDOR, User Enumeration, and Hardcoded Token; per-tool pass/fail marks not preserved.]
| Tool | Accuracy of Findings | Detects Non-Pattern-Based Issues? | Coverage of C# Vulnerabilities | Scan Speed | Developer Experience |
|---|---|---|---|---|---|
| DryRun Security | Very high – caught all critical flaws missed by others | Yes – context-based analysis finds logic errors, auth flaws, etc. | Broad coverage of OWASP Top 10 vulns plus business logic issues | Near real-time (PR comment within seconds) | Clear single PR comment with detailed insights; no config or custom scripts needed |
| Snyk Code | High on known patterns (SQLi, XSS), but misses logic/flow bugs | Limited – focuses on recognizable vulnerability patterns | Good for standard vulns; may miss SSRF or auth logic issues | Fast (integrates into PR checks) | Decent GitHub integration, but rules are a black box (no easy customization) |
| GitHub Advanced Security (CodeQL) | Low – missed everything except SQL Injection | Mostly pattern-based | Low – only discovered SQL Injection | Slowest of all, but finished in 1 minute | Concise annotation with a suggested fix and optional auto-remediation |
| Semgrep | Medium – finds common issues with community rules, some misses | Primarily pattern-based, limited data flow analysis | Decent coverage with the right rules; misses advanced logic flaws | Very fast (runs as lightweight CI) | Custom rules possible, but require maintenance and security expertise |
| SonarQube | Low – missed serious issues in our testing | Mostly pattern-based (code quality focus) | Basic coverage for known vulns; many issues flagged as “hotspots” require manual review | Moderate (runs in CI/CD pipeline) | Results in dashboard; risk of false sense of security if quality gate passes despite vulnerabilities |
[Table: per-vulnerability detection results for Snyk Code, GitHub Advanced Security (CodeQL), Semgrep, SonarQube, and DryRun Security across SQL Injection (SQLi), Cross-Site Scripting (XSS), Server-Side Request Forgery (SSRF), Auth Logic/IDOR, User Enumeration, and Hardcoded Credentials; per-tool pass/fail marks not preserved.]
[Table: detection results for five test vulnerabilities – (1) Remote Code Execution via Unsafe Deserialization, (2) Code Injection via eval() Usage, (3) SQL Injection in a Raw Database Query, (4) Weak Encryption (AES ECB Mode), (5) Broken Access Control / Logic Flaw in Authentication; per-vulnerability marks not preserved. Totals:]

| Tool | Total Found |
|---|---|
| DryRun Security | 5/5 |
| Semgrep | 3/5 |
| GitHub CodeQL | 1/5 |
| SonarQube | 1/5 |
| Snyk Code | 0/5 |
[Table: per-vulnerability detection results for DryRun Security, Snyk, CodeQL, SonarQube, and Semgrep across Server-Side Request Forgery (SSRF), Cross-Site Scripting (XSS), SQL Injection (SQLi), IDOR / Broken Access Control, Invalid Token Validation Logic, and Broken Email Verification Logic; per-tool marks not preserved, aside from one “(Hotspot)” annotation on the SSRF row.]
| Dimension | Why It Matters |
|---|---|
| Surface | Entry points & data sources highlight tainted flows early. |
| Language | Code idioms reveal hidden sinks and framework quirks. |
| Intent | What is the purpose of the code being changed/added? |
| Design | Robustness and resilience of the code being changed. |
| Environment | Libraries, build flags, and infrastructure (IaC) metadata all give clues about the risks in changing code. |
| KPI | Pattern-Based SAST | DryRun CSA |
|---|---|---|
| Mean Time to Regex | 3–8 hrs per noisy finding set | Not required |
| Mean Time to Context | N/A | < 1 min |
| False-Positive Rate | 50–85% | < 5% |
| Logic-Flaw Detection | < 5% | 90%+ |
Contextual Security Analysis
June 3, 2025

Determinism vs. Probabilism: Rethinking Accuracy in Static Application Security Testing

If you only need strict style compliance, a linter will do. If you need to surface context‑dependent security flaws, you need more than the deterministic pattern matching found in traditional SAST.

AppSec teams keep echoing the same mandate: don’t chase theoretical perfection—deliver signal that matters. They want scanners that surface logic flaws nobody could spot before, not dashboards full of ghost bugs. 

Developers already dread legacy SAST triage, and security engineers are burned out from patching brittle rules. Underneath this sits the debate between determinism and probabilism, and the role each should play in application security. In this blog, I'd like to offer my thoughts on why this matters and how we're dealing with it at DryRun Security.

Deterministic SAST: Strengths and Limits

Determinism assumes identical outputs for identical inputs. Classic SAST tools such as Fortify, Checkmarx, and Veracode encode this principle through rule engines and pattern matching.

The approach yields repeatable results, which is valuable for quality‑assurance checkpoints that cannot tolerate variance. Yet software development is fundamentally creative—even with AI‑assisted coding—so pattern matching quickly falters once the code drifts beyond narrowly defined rules.
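
To make that concrete, here is a minimal sketch of the deterministic approach (rule names and patterns invented for illustration): a rule engine reduced to a dictionary of regexes. Given the same input, it always returns the same findings, which is the whole appeal.

```python
import re

# Two illustrative pattern rules in the spirit of deterministic SAST:
# the same input always yields the same findings.
BANNED_CALL_RULES = {
    "python-eval": re.compile(r"\beval\s*\("),             # direct eval() call
    "yaml-unsafe-load": re.compile(r"\byaml\.load\s*\("),  # unsafe deserialization
}

def scan(source: str) -> list:
    """Return one finding per rule match, keyed by line number."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule_id, pattern in BANNED_CALL_RULES.items():
            if pattern.search(line):
                findings.append({"rule": rule_id, "line": lineno})
    return findings

# The rule fires on the literal pattern...
print(scan("result = eval(user_input)"))  # [{'rule': 'python-eval', 'line': 1}]
# ...but stays silent when the same sink is reached indirectly.
print(scan("fn = getattr(__builtins__, 'ev' + 'al')\nresult = fn(user_input)"))  # []
```

The second call returns nothing even though the behavior is identical, and that blind spot is exactly where the lists below draw the line.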

Where deterministic SAST excels:

  • Syntax and style violations
  • Blocking banned functions and libraries
  • Simple API misuse in monolithic codebases

Where deterministic SAST falls short:

  • Microservices and serverless functions
  • Multi‑repo applications or multi‑app monorepos
  • Cross‑system logic, authorization, and business‑workflow flaws

The Accuracy Gap

Seasoned reviewers know deterministic SAST has never achieved perfect precision or recall. Rule sets flood dashboards with false positives while overlooking deeper issues. AppSec teams spend countless cycles updating those rules—often just to keep pace with the latest JavaScript/TypeScript craze or a fresh take on Spring.

Inaccuracy is almost guaranteed, because no generic rule set can fully understand proprietary libraries or domain‑specific frameworks. 
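
One small example of why: imagine a hypothetical in-house database helper (`dbkit.run`, invented here) that every service calls instead of the driver. A generic rule set knows `cursor.execute` is a SQL sink, but it has never seen the wrapper, so tainted strings never reach anything it recognizes.

```python
# shared/dbkit.py -- hypothetical in-house helper reused across services
def run(sql: str, conn):
    """Thin convenience wrapper teams call instead of the driver directly."""
    return conn.cursor().execute(sql)  # the real sink hides one hop away

# service code: string-built SQL flows into the wrapper, not a known sink
def find_user(conn, name: str):
    # Injectable, but a generic rule set sees only a call to `run`.
    return run(f"SELECT * FROM users WHERE name = '{name}'", conn)
```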

As architectures have grown more distributed, the gulf between what deterministic SAST can model and what modern software actually does has only widened.

Key drivers of inaccuracy:

  • Service decomposition
    A single request can touch multiple repositories and runtimes.
  • Externalized authorization
    Access checks often live in shared libraries outside the local codebase.
  • Business‑logic flaws
    Vulnerabilities such as broken object‑level authorization rarely follow consistent syntactic patterns.
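
The last point deserves a concrete picture. Below is a hypothetical Flask-style handler (routes, data, and header names all invented) with a broken object-level authorization flaw. Nothing in it calls a banned function or builds a tainted string, so there is no syntax for a deterministic rule to match.

```python
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

# Toy datastore standing in for an ORM.
INVOICES = {
    1: {"owner": "alice", "amount": 120.00},
    2: {"owner": "bob", "amount": 45.50},
}

@app.route("/api/invoices/<int:invoice_id>")
def get_invoice(invoice_id: int):
    invoice = INVOICES.get(invoice_id) or abort(404)
    user = request.headers.get("X-User")  # stand-in for real authentication
    # BUG (broken object-level authorization): the handler never verifies
    # invoice["owner"] == user, so any authenticated user can read any
    # invoice by iterating IDs. No banned function, no tainted sink, and
    # no unusual syntax, so pattern-based rules stay silent.
    return jsonify(invoice)
```

The fix is a one-line ownership check, but spotting its absence requires knowing who should be allowed to read an invoice. That is context, not syntax.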

These accuracy gaps push teams into perpetual rule‑tuning cycles—operational toil that siphons time from real security improvements. 

OWASP has repeatedly highlighted this maintenance burden. The OWASP Software Assurance Maturity Model (SAMM) estimates that maintaining SAST rule sets can consume up to 20 percent of an AppSec engineer’s time for every release—a figure many practitioners consider conservative. A 2023 session at OWASP Global AppSec Dublin, “SAST Rule Maintainability in DevSecOps,” captured the pain succinctly: “Current SAST tools are limited… and produce high numbers of false positives.” This echoes OWASP’s broader assessment of source‑code analysis (OWASP).

This burden exists because traditional tools can only follow the strict recipe cards they have used for years, and those cards must be constantly updated for every new variant and threat. To break out, we need to throw away the recipe cards and hire an all-knowing chef who analyzes the context in the restaurant (ingredients, customers, tools) to create the best dish possible every time.

Probabilistic, AI‑Native Analysis

DryRun Security’s Contextual Security Analysis (CSA) adopts probabilistic techniques instead of static rules. By weighting context, relevance, validation likelihood, and continuous feedback, CSA applies advances in natural‑language processing to the unique challenges of secure code review—an approach rooted in the founders’ years of AppSec training and GitHub code‑review leadership.

We introduced this paradigm in 2023 with the publication of our Contextual Security Analysis Guide. From day one, DryRun has been AI‑first and AI‑native (founded after the transformer breakthrough) so we never had to retrofit a language model onto brittle pattern engines.

Because CSA was born in the LLM era, we systematically evaluated every major model, fine‑tuned domain‑specific variants, and built a library of Natural‑Language Code Policies (NLCPs) capable of reasoning across repositories. 

We do not hide a legacy pattern matcher beneath a thin AI veneer; our pipeline is probabilistic from invocation to outcome.

Early experimentation taught us how to balance token budgets, model specialization, and human‑in‑the‑loop validation. Those lessons shaped the accuracy framework below.

Our Accuracy Framework

  1. Scoped context windows
    Segment code into coherent chunks to preserve intent without exceeding token limits.
  2. Multi‑pass pipelines
    Initial passes surface candidates; subsequent passes perform semantic validation.
  3. Quantitative evaluation
    A regression harness measures every pipeline change.
  4. Agent‑based validation
    Specialized model agents cross‑check one another’s conclusions.
  5. Model specialization
    Each task routes to the language model that offers the best cost‑accuracy balance.
  6. Code‑aware queries
    Agents navigate repositories to confirm security‑critical patterns, such as proper authorization checks.
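
To be clear, what follows is not DryRun's implementation; it is a deliberately tiny sketch of how steps 1, 2, and 4 fit together, with budgets, heuristics, and thresholds all invented: segment the change into scoped chunks, let a recall-oriented first pass nominate candidates, then let a validation pass assign each candidate a likelihood and gate on it.

```python
from dataclasses import dataclass

CONTEXT_BUDGET = 2_000  # illustrative per-chunk character budget (step 1)

@dataclass
class Finding:
    rule: str
    chunk: str
    confidence: float  # 0.0-1.0, assigned during validation

def segment(diff: str, budget: int = CONTEXT_BUDGET) -> list:
    """Step 1: split a change into coherent, budget-sized contexts.
    Real segmentation would respect function and file boundaries."""
    chunks, current, size = [], [], 0
    for line in diff.splitlines():
        if size + len(line) > budget and current:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

def first_pass(chunk: str) -> list:
    """Step 2, pass one: a cheap, recall-oriented candidate pass
    (a keyword heuristic stands in for a language model here)."""
    if ".get(" in chunk and "owner" not in chunk:
        return [Finding("possible-missing-authz", chunk, confidence=0.0)]
    return []

def validate(finding: Finding) -> Finding:
    """Step 2, pass two (and step 4): semantic validation by a second
    agent, which assigns a likelihood instead of a hard yes/no."""
    finding.confidence = 0.1 if "authorize(" in finding.chunk else 0.9
    return finding

def pipeline(diff: str, threshold: float = 0.5) -> list:
    candidates = [f for chunk in segment(diff) for f in first_pass(chunk)]
    return [f for f in map(validate, candidates) if f.confidence >= threshold]
```

Every stage hands the next one a scored artifact rather than a verdict, which is what makes the final output filterable.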

Does Probabilistic Mean Less Certain?

“Probabilistic” does not mean “random.” It means each potential finding is scored by likelihood and then vetted by companion agents. Instead of a brittle yes/no rule, CSA delivers a confidence‑weighted verdict you can sort, filter, and dispute—complete with the contextual evidence that drove the call.
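
In practice, that means a finding is structured data you can operate on. A small self-contained illustration (field names invented):

```python
from dataclasses import dataclass, field

@dataclass
class Verdict:
    title: str
    confidence: float  # likelihood assigned by validation agents
    evidence: list = field(default_factory=list)  # context that drove the call

verdicts = [
    Verdict("Missing ownership check on /api/invoices", 0.92,
            ["handler fetches invoice by ID", "no owner comparison in call path"]),
    Verdict("Possible open redirect", 0.31,
            ["redirect target partially validated upstream"]),
]

# Sort by likelihood, surface only confident calls, keep the evidence attached.
for v in sorted(verdicts, key=lambda v: v.confidence, reverse=True):
    if v.confidence >= 0.8:
        print(f"{v.confidence:.0%}  {v.title}  evidence: {'; '.join(v.evidence)}")
```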

The payoff is visible in the numbers. In the 2025 SAST Accuracy Report, CSA surfaced 88 percent of critical vulnerabilities versus 45 percent for the best deterministic scanner. That 43‑point swing represents entire classes of business‑logic flaws finally appearing on dashboards instead of in post‑incident write‑ups.

For mature AppSec programs, adopting CSA is a pragmatic upgrade, not a moon‑shot experiment. It plugs into existing CI pipelines, replaces the noisiest step, and immediately cuts toil for both developers and security engineers. Lower noise, higher recall, and evidence you can act on—hard to call that anything but a sure bet.

Toward Evidence‑Based Security

The conversation is shifting from “prove your rules cover everything” to “prove your results with data.” Probabilistic methods can adapt alongside evolving codebases and attacker tactics, offering a sustainable path to better outcomes.

For a full breakdown of metrics and methodology, download the 2025 SAST Accuracy Report. The report compares CSA with leading pattern‑matching SAST tools and provides complete benchmark data—proof that in application security, context beats patterns every time.