Hard Lessons from Taking LLMs Out of the Lab and Into Production
Every engineering team I talk to today is building with AI, whether they say it out loud or not. Sometimes it’s a flashy copilot. Sometimes it’s a quiet LLM stitched into search, support, or internal tooling. The form factor changes, but the pattern doesn’t.
Teams move fast, ship value, and then stumble into risks they did not realize they had created.
After spending the last year reviewing real implementations, incidents, and production architectures, the same mistakes show up again and again. Not because teams are careless, but because LLMs break assumptions we’ve relied on for decades.
This blog outlines the most common AI application security mistakes teams make when moving LLM-powered features into production, based on real-world implementations and incidents.
Here are the seven mistakes I see most often.
The seven mistakes, up front
- Assuming the old application security threat model still applies
- Treating the model as trusted compute
- Encoding policy and business logic inside prompts
- Giving agents too much authority too quickly
- Treating RAG and vector stores as harmless infrastructure
- Dismissing misinformation as a “quality” problem
- Forgetting that tokens are a finite, attackable resource
Now let’s unpack why each of these shows up, and why they matter.
1. Assuming the old threat model still applies
This is the root mistake that feeds all the others.
Teams assume that securing an AI feature is just an extension of securing a web app or API. Input validation, authentication, authorization: these all get done like we’ve always done them. But once an LLM enters the system, the application stops behaving like a deterministic service.
Models reason over untrusted input, retrieve external data, synthesize outputs, and increasingly act through tools. That creates failure modes that do not map cleanly to classic injection or access control flaws. Prompt injection, indirect injection through retrieved content, excessive agent behavior, and cost-based abuse are not edge cases; they are structural risks.
If your threat model does not change when an LLM is introduced, your system will surprise you later.
2. Treating the model as trusted compute
Many teams implicitly trust model output because it sounds reasonable.
That trust leaks into code: outputs get rendered directly into interfaces, parsed into commands, used to call internal services, and allowed to make commitments on behalf of the business.
This is backwards. An LLM is not a trusted execution environment; it is closer to a probabilistic interpreter sitting on top of untrusted data. Every token that enters and leaves the model should be treated as tainted until validated.
Structure, schemas, sanitization, and sandboxing are not optional. If the model can influence behavior outside itself, guardrails must exist in code and infrastructure, not in natural language instructions.
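As a minimal sketch of what “tainted until validated” can look like in practice (the action names, fields, and limits below are hypothetical, not taken from any particular product or incident):

```python
import json

# Hypothetical example: the model was asked to propose a support action as JSON.
# Nothing downstream runs until the output passes validation in code.
ALLOWED_ACTIONS = {"refund", "escalate", "reply"}
MAX_REFUND_CENTS = 5_000

def parse_model_action(raw_output: str) -> dict:
    """Treat raw model output as tainted: parse it, validate it, reject anything off-schema."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        raise ValueError("model output is not valid JSON; refusing to act")

    action = data.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action {action!r} is not on the allowlist")

    if action == "refund":
        amount = data.get("amount_cents")
        if not isinstance(amount, int) or not (0 < amount <= MAX_REFUND_CENTS):
            raise ValueError("refund amount missing or outside policy bounds")

    return data

# A plausible-sounding but out-of-policy response gets rejected in code,
# not by asking the model nicely.
print(parse_model_action('{"action": "reply", "message": "Thanks for reaching out!"}'))
```

The point is not the specific schema; it is that the check runs outside the model, so a convincing hallucination or an injected instruction cannot talk its way past it.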
3. Encoding policy and logic inside prompts
This mistake feels efficient right up until it fails.
Teams encode access rules, safety constraints, and business logic directly into system prompts. It works in early prototypes, but then prompts leak: they get logged, extracted, copied into debugging tools, or surfaced through indirect injection.
Once policy lives in a prompt, it is no longer enforceable. Prompts are not versioned policy engines, and they should be thought of as suggestions. At DryRun, we have NLCPs (natural language code policies) that get enforced through our Custom Policy Agent. They are separate, enforceable policies.
The model should consume policy, not define it. If breaking a rule is possible because the model misunderstood or ignored instructions, that rule was never real.
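Here is a small sketch of that separation, assuming a hypothetical role-to-permission map and tool names; the shape matters more than the details:

```python
# Hypothetical sketch: the model may *propose* a tool call, but the policy check
# lives in code and runs against the caller's identity, not prompt instructions.
ROLE_PERMISSIONS = {
    "support_agent": {"read_ticket", "reply_ticket"},
    "billing_admin": {"read_ticket", "reply_ticket", "issue_refund"},
}

def execute_tool_call(user_role: str, tool_name: str, run_tool) -> str:
    """Enforce policy outside the model: deny by default, allow only by explicit grant."""
    allowed = ROLE_PERMISSIONS.get(user_role, set())
    if tool_name not in allowed:
        # The model "deciding" the user is authorized changes nothing here.
        raise PermissionError(f"{user_role} is not permitted to call {tool_name}")
    return run_tool()

# Even if a prompt injection convinces the model to request a refund,
# the support_agent role cannot reach that tool.
execute_tool_call("support_agent", "reply_ticket", lambda: "replied")
```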
4. Giving agents too much authority too quickly
Agentic systems compress time. They also compress blast radius.
As soon as an LLM can call tools, mutate state, or interact with production systems, mistakes become operational. A single crafted input can trigger actions that would normally require human intent, context, and approval.
Teams often start permissive and promise to tighten controls later. Later usually arrives as an incident, once users and attackers exploit that authority in novel ways with LLM intelligence and tools.
Least privilege still applies, and so do explicit allowlists, step limits, approvals for high-impact actions, and bounded execution. If an agent can spend money, move data, or change records, assume it will eventually be confused, manipulated, or both. Setting bounds on your agents is crucial.
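A minimal sketch of a bounded agent loop, assuming hypothetical tool names and stand-in callables for planning, approval, and execution:

```python
# Hypothetical sketch of bounded agent execution: an explicit allowlist, a hard
# step limit, and a human approval gate for high-impact tools.
LOW_IMPACT_TOOLS = {"search_docs", "read_record"}
HIGH_IMPACT_TOOLS = {"issue_refund", "delete_record"}
MAX_STEPS = 10

def run_agent(plan_next_step, require_approval, execute) -> str:
    """plan_next_step, require_approval, and execute are stand-ins for your agent loop."""
    for _ in range(MAX_STEPS):
        tool, args = plan_next_step()
        if tool is None:
            return "done"
        if tool in HIGH_IMPACT_TOOLS:
            if not require_approval(tool, args):   # human in the loop
                return "halted: approval denied"
        elif tool not in LOW_IMPACT_TOOLS:
            return f"halted: {tool} is not on the allowlist"
        execute(tool, args)
    return "halted: step limit reached"            # bounded blast radius
```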
5. Treating RAG and vector stores as harmless infrastructure
Retrieval systems feel safe because they look like search. They are not.
Vector stores are data systems with all the usual risks: misconfiguration, cross-tenant leakage, poisoning, and weak access controls. When they fail, they fail quietly. Answers degrade or skew long before anyone notices. Provenance, filtering, tenant isolation, and observability are table stakes, not advanced features.
RAG expands your data perimeter. If you don’t secure it like one, it could become an attack surface.
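One way to make that concrete is to enforce tenant scoping in the retrieval layer itself. The in-memory store below is a placeholder for whatever vector database you use; the point is that a missing tenant filter becomes a hard error instead of a silent cross-tenant leak:

```python
from dataclasses import dataclass

# Hypothetical sketch: provenance travels with every chunk, and retrieval
# refuses to run without a tenant scope.
@dataclass
class Chunk:
    tenant_id: str
    source_uri: str   # provenance for citations and audits
    text: str
    score: float

def retrieve(store: list[Chunk], tenant_id: str, top_k: int = 5) -> list[Chunk]:
    if not tenant_id:
        raise ValueError("refusing to query without a tenant scope")
    scoped = [c for c in store if c.tenant_id == tenant_id]   # filter before ranking
    return sorted(scoped, key=lambda c: c.score, reverse=True)[:top_k]
```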
6. Dismissing misinformation as a quality problem
Hallucination is often treated as an embarrassment rather than a liability.
That framing is outdated now that you own the output. If your system presents information as authoritative, you own the consequences, and those consequences land in the hands of courts, regulators, and customers who don’t care that the model “made it up.”
High-impact outputs require grounding, citations, or refusal paths. Some require humans in the loop. Correctness is now part of security and product design, not an afterthought.
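A refusal path can be as simple as the sketch below, assuming a hypothetical relevance score from retrieval and a stand-in generate() call; the threshold is illustrative and should be tuned against your own evaluations:

```python
# Hypothetical sketch: if retrieval returns nothing usable, the system declines
# instead of letting the model improvise an authoritative-sounding answer.
MIN_SOURCES = 1
MIN_SCORE = 0.75   # illustrative threshold, not a recommendation

def answer_or_refuse(question: str, sources: list[dict], generate) -> dict:
    grounded = [s for s in sources if s.get("score", 0.0) >= MIN_SCORE]
    if len(grounded) < MIN_SOURCES:
        return {"answer": None, "refusal": "I don't have a reliable source for that."}
    answer = generate(question, grounded)   # generate() is a stand-in for your LLM call
    return {"answer": answer, "citations": [s["uri"] for s in grounded]}
```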
7. Forgetting that tokens are a finite, attackable resource
LLMs introduce a new form of denial of service: resource exhaustion that burns real money.
Unbounded prompts, recursive agents, retries without backoff, and poorly designed loops can burn through budgets fast. Attackers are already exploring this, and waiting for it to show up on a provider dashboard will not save you in real time.
Tokens need limits and quotas, and agents need timeboxed execution windows and circuit breakers.
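As a rough sketch of those controls stacked together (the numbers are illustrative, and call_model and estimate_tokens are stand-ins for your provider client and tokenizer):

```python
import time

# Hypothetical sketch: a per-request token budget, a wall-clock timebox, and a
# circuit breaker that trips after repeated failures, with backoff between retries.
MAX_TOKENS_PER_REQUEST = 20_000
MAX_SECONDS = 30
MAX_CONSECUTIVE_FAILURES = 3

def bounded_run(call_model, estimate_tokens, prompt: str) -> str:
    if estimate_tokens(prompt) > MAX_TOKENS_PER_REQUEST:
        raise RuntimeError("prompt exceeds token budget; rejecting before spend")

    deadline = time.monotonic() + MAX_SECONDS
    failures = 0
    while time.monotonic() < deadline:
        try:
            return call_model(prompt)
        except Exception:
            failures += 1
            if failures >= MAX_CONSECUTIVE_FAILURES:
                raise RuntimeError("circuit breaker open: too many failures")
            time.sleep(2 ** failures)              # backoff instead of hot retries
    raise TimeoutError("timeboxed execution window exceeded")
```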
The pattern behind the pattern
All seven mistakes trace back to the same assumption: that LLMs are magic instead of machinery.
Secure AI systems are built the same way secure systems have always been built: clear boundaries, least privilege, explicit policy, strong defaults, and controls outside the component you don’t fully trust.
If you are building with AI, the risks are already in your product. The only real decision left is whether you design for them intentionally or discover them the hard way.
We built a whitepaper that maps each of these risks to a reference architecture with controls, tools, and real incident examples. Get the full implementation guide here.
Want to see how DryRun Security catches these issues directly in your codebase before production? Let’s chat!


