Prompt Firewall

Real-time inline security for AI applications. Inspect, filter, and enforce policies on every prompt and response flowing between users and LLMs.

How The Prompt Firewall Works

A transparent proxy that inspects every prompt and response against your configured policies before allowing them through.

Traffic Controller

The transparent proxy entry point. Users never talk directly to the LLM — all traffic routes through AccuKnox first.

IN

User prompt → scan against policies

OUT

LLM response → validate before delivery

ACT

Block · Sanitize · Monitor

Policy Governance

Evaluates every prompt and response against your configured policies. Customizable per application.

IN

14 built-in policy types

IN

Custom regex + domain-specific rules

ACT

Global or per-app policy scope

Audit & Compliance

Every request and response is recorded for compliance, investigation, and forensic analysis.

LOG

Full conversation history

LOG

Per-policy risk scores

LOG

Block/monitor/pass status + threshold
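The three components above can be read as a single request path: scan the prompt, call the model, scan the response, and log each decision. The following is a minimal sketch of that flow, not AccuKnox's implementation. Every name in it (`Verdict`, `AuditLog`, `evaluate`, `proxy_call`, `toy_injection`) is hypothetical, and it assumes each policy returns a risk score between 0 and 1 that is compared against a single threshold.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical policy signature: text in, risk score in [0, 1] out.
Policy = Callable[[str], float]


@dataclass
class Verdict:
    status: str               # "BLOCK", "MONITOR", or "PASS"
    scores: dict              # per-policy risk scores
    threshold: float          # threshold the scores were compared against


@dataclass
class AuditLog:
    records: list = field(default_factory=list)

    def write(self, direction: str, text: str, verdict: Verdict) -> None:
        # Keep the text, status, scores, and threshold for later investigation.
        self.records.append({
            "direction": direction,
            "text": text,
            "status": verdict.status,
            "scores": verdict.scores,
            "threshold": verdict.threshold,
        })


def evaluate(text: str, policies: dict, threshold: float, enforce: bool = True) -> Verdict:
    scores = {name: policy(text) for name, policy in policies.items()}
    violated = any(score >= threshold for score in scores.values())
    if not violated:
        status = "PASS"
    else:
        # In monitor mode a violation is recorded but traffic is not interrupted.
        status = "BLOCK" if enforce else "MONITOR"
    return Verdict(status, scores, threshold)


def proxy_call(prompt: str, llm: Callable[[str], str], policies: dict,
               log: AuditLog, threshold: float = 0.5) -> str:
    verdict = evaluate(prompt, policies, threshold)       # IN: scan the prompt
    log.write("IN", prompt, verdict)
    if verdict.status == "BLOCK":
        return "[blocked by input policy]"
    response = llm(prompt)
    verdict = evaluate(response, policies, threshold)     # OUT: scan the response
    log.write("OUT", response, verdict)
    if verdict.status == "BLOCK":
        return "[blocked by output policy]"
    return response


def toy_injection(text: str) -> float:
    # Toy keyword check standing in for an ML-based detector.
    return 1.0 if "ignore all previous instructions" in text.lower() else 0.0
```

A real deployment would replace `toy_injection` with trained detectors and would persist the audit records rather than keep them in memory.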

What The Firewall Filters

Input policies inspect prompts before the LLM. Output policies inspect responses before the user. Both directions, every interaction.

Anonymize

SANITIZE

Detect and mask PII/PHI — names, SSNs, emails, credit cards, medical records.

"My SSN is 123-45-6789" → "My SSN is [REDACTED]"

Ban Code

BLOCK

Block programming constructs, code snippets, and scripts in any language.

print("Hello") or C++ snippet → blocked

Prompt Injection

BLOCK

ML-based detection of instruction overrides, role-play exploits, and jailbreaks.

"Ignore all previous instructions…" → blocked

Toxicity

BLOCK

RoBERTa classifier + Perspective API for hate speech, threats, explicit content.

Racial slurs, death threats → blocked

Secrets

BLOCK

Detect API keys, tokens, passwords, and credentials before they reach the LLM.

"sk-12345abcde…" → blocked

Gibberish

BLOCK

Language model scoring to identify nonsensical, random, or garbled text inputs.

"asdf jkl; %$#@" → blocked

Sentiment

FLAG

Sentiment analysis with configurable thresholds for aggressive or hostile inputs.

Extremely hostile message → flagged

Ban Topics

BLOCK

Topic classification against restricted topic lists per application scope.

Finance bot asked for medical advice → blocked

Ban Competitors

BLOCK

Context-aware detection of competing products and companies.

"How is [Competitor] better?" → handled

Language

BLOCK

Enforce approved language lists on prompts and responses.

French query to English-only bot → blocked

Regex

MASK

Custom pattern matching — SSNs, credit cards, internal IDs, any format.

Credit card pattern → masked or blocked

Token Limit

BLOCK

Prevent excessively long inputs that could cause DoS or cost explosion.

50-page document paste → blocked

Relevance

BLOCK

Semantic similarity scoring against the application's defined scope.

Banking bot asked "How to bake a cake?" → blocked

Code (Allow)

FILTER

Restrict code to specific programming languages only via whitelist.

Only SQL allowed but Python submitted → blocked
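To make the pattern-based policies above concrete, here is a rough sketch of three of them: SSN masking (Anonymize or Regex), key detection (Secrets), and a length check (Token Limit). It is illustrative only. The patterns, the function names, the 4096-token default, and the whitespace-based token count are all assumptions, and a production firewall would rely on much broader detectors and a real tokenizer.

```python
import re

# Assumed patterns for illustration; real PII and secret detection covers far more formats.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
SECRET_PATTERN = re.compile(r"\bsk-[A-Za-z0-9]{8,}")


def sanitize(text: str) -> str:
    """SANITIZE/MASK: replace anything shaped like a US SSN with a placeholder."""
    return SSN_PATTERN.sub("[REDACTED]", text)


def contains_secret(text: str) -> bool:
    """BLOCK candidate: text contains something shaped like an 'sk-' API key."""
    return SECRET_PATTERN.search(text) is not None


def exceeds_token_limit(text: str, max_tokens: int = 4096) -> bool:
    """BLOCK candidate: input is longer than the allowed budget.

    Splitting on whitespace is a crude stand-in for a model tokenizer.
    """
    return len(text.split()) > max_tokens


print(sanitize("My SSN is 123-45-6789"))  # My SSN is [REDACTED]
```

Masking lets the rest of the prompt through, whereas the secret and length checks only report a violation and leave the block-or-monitor decision to the configured action.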

Integrate In Minutes

Wrap your existing LLM calls with prompt and response scanning. One import, two function calls.

  • Session linking for full audit trails
  • BLOCK, MONITOR, PASS, or SANITIZE responses
  • Per-policy risk scores for debugging
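The "two function calls" pattern described above might look roughly like the sketch below. This is not the AccuKnox SDK: `StubFirewallClient`, `scan_prompt`, `scan_response`, `ScanResult`, and `guarded_completion` are hypothetical names chosen for illustration, and the stub's keyword check stands in for real policy evaluation. Consult the product documentation for the actual import and function signatures.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ScanResult:
    action: str        # "BLOCK", "MONITOR", "PASS", or "SANITIZE"
    text: str          # original text, or the sanitized version
    session_id: str    # links the prompt scan to the response scan
    risk_scores: dict  # per-policy risk scores, useful for debugging


class StubFirewallClient:
    """Hypothetical stand-in for a firewall client; not the real SDK."""

    def scan_prompt(self, prompt: str) -> ScanResult:
        hit = "ignore all previous instructions" in prompt.lower()
        action = "BLOCK" if hit else "PASS"
        return ScanResult(action, prompt, "session-1",
                          {"prompt_injection": 1.0 if hit else 0.0})

    def scan_response(self, response: str, session_id: str) -> ScanResult:
        return ScanResult("PASS", response, session_id, {})


def guarded_completion(client, llm: Callable[[str], str], prompt: str) -> Optional[str]:
    """Wrap an existing LLM call with a prompt scan and a response scan."""
    checked = client.scan_prompt(prompt)                        # call 1: scan the prompt
    if checked.action == "BLOCK":
        return None
    reply = llm(checked.text)                                   # forward the (possibly sanitized) text
    result = client.scan_response(reply, checked.session_id)    # call 2: scan the response
    return None if result.action == "BLOCK" else result.text
```

Passing the same `session_id` to both calls is what would let the two scans appear as one conversation in the audit trail.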

Every Model. Every Platform.

Cloud, managed, or self-hosted — the Prompt Firewall works with your stack.

CLOUD LLM PROVIDERS

MANAGED AI SERVICES

ON-PREMISE MODELS

ENTERPRISE

Set Up In Five Steps

From onboarding your application to monitoring violations in real time.

1

Add Your Application

Navigate to AI/ML → Applications → Add Application. Name and tag your AI app.

2

Configure Policies

Apply global policies for org-wide rules or local policies per application. Choose Block, Monitor, or Allow.

3

Set Policy Scope

Global policies apply to all apps. Local policies let you customize — ban code in support bots but allow it in dev assistants.

4

Monitor the Dashboard

Real-time visibility into total queries, policy violations, and active enforcement.

5

Investigate & Audit

Click any violation for full conversation history, per-policy risk scores, and block/monitor status.
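The global versus local scope from steps 2 and 3 can be pictured as a simple merge. The sketch below assumes that a local policy overrides the global setting for the same policy name, which the steps imply but do not state outright. The policy names, application names, and `effective_policies` helper are hypothetical.

```python
# Org-wide defaults (hypothetical policy names and actions).
GLOBAL_POLICIES = {
    "prompt_injection": "Block",
    "ban_code": "Block",
    "toxicity": "Monitor",
}

# Per-application overrides (hypothetical application names).
LOCAL_POLICIES = {
    "dev-assistant": {"ban_code": "Allow"},
    "support-bot": {},
}


def effective_policies(app: str) -> dict:
    """Start from the global policies, then apply the app's local overrides."""
    merged = dict(GLOBAL_POLICIES)
    merged.update(LOCAL_POLICIES.get(app, {}))
    return merged


print(effective_policies("dev-assistant")["ban_code"])  # Allow
print(effective_policies("support-bot")["ban_code"])    # Block
```

This mirrors the example in step 3: code stays banned in the support bot while the developer assistant is allowed to handle it.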

Prompt Firewall + Red Teaming

Red teaming finds the gaps. The Prompt Firewall enforces the rules to close them. Both work together.

CAPABILITY               | RED TEAMING                                  | PROMPT FIREWALL
-------------------------|----------------------------------------------|-----------------------------------------------------
When                     | Pre-deployment and scheduled scans           | Runtime: every live request
What it does             | Simulates adversarial attacks against models | Enforces policies on real user traffic
Action                   | Generate findings and risk reports           | Block, sanitize, or monitor in real time
Purpose                  | Discover vulnerabilities before attackers    | Prevent attacks from succeeding
Coverage                 | Point-in-time assessment                     | Continuous, always-on protection
PII protection           | Identifies potential exposure risks          | Masks PII in real time before LLM processes it
Prompt injection defense | Tests known injection patterns               | ML-based detection on every request
Compliance               | Audit reports                                | Continuous audit trail with full conversation logging

See The Firewall In Action

From policy configuration to violation forensics — everything in one dashboard.

  • AI-Security Dashboard

    Query volumes, violations, and active policies at a glance.

  • Policy Configuration

    Add and customize local policies per application.

  • Applied Policies

    View all active policies enforcing on an application.

  • Violation Analysis

    Breakdown by policy type with severity and action taken.

Ready For A Personalized Security Assessment?

“Choosing AccuKnox was driven by opensource KubeArmor’s novel use of eBPF and LSM technologies, delivering runtime security”

idt

Golan Ben-Oni

Chief Information Officer

“At Prudent, we advocate for a comprehensive end-to-end methodology in application and cloud security. AccuKnox excelled in all areas in our in depth evaluation.”

prudent

Manoj Kern

CIO

“Tible is committed to delivering comprehensive security, compliance, and governance for all of its stakeholders.”

tible

Merijn Boom

Managing Director

Prompt Firewall FAQs

The AccuKnox Prompt Firewall is a transparent proxy that sits between users and the LLM. Every prompt and response routes through AccuKnox first, gets scanned against configured policies, and is then blocked, sanitized, monitored, or passed. Users never talk directly to the model. AccuKnox is one of the few platforms enforcing this inline on live traffic in both directions.
It uses ML-based detection to catch instruction overrides, role-play exploits, and jailbreak attempts on every request. A prompt like "Ignore all previous instructions" is blocked before it reaches the model. Detection runs continuously on live traffic rather than as a one-time scan, so new attack variants get caught at runtime.
AccuKnox enforces 14 built-in policy types covering prompt injection, toxicity, secrets and API key leakage, PII and PHI exposure, banned topics, competitor mentions, gibberish, token limit abuse, relevance drift, and unapproved languages or code. Custom regex and domain-specific rules layer on top. This breadth of runtime coverage sets the AccuKnox firewall apart from single-purpose filters.
Yes. The Anonymize policy detects and masks names, SSNs, emails, credit cards, and medical records in real time. An input like "My SSN is 123-45-6789" becomes "My SSN is [REDACTED]" before the LLM ever processes it. The Regex policy handles any custom format such as internal IDs or account numbers.
Input policies inspect prompts before they reach the LLM. Output policies inspect responses before they reach the user. Both run on every interaction. This two-way enforcement blocks a malicious prompt going in and catches sensitive data or policy violations in the model's reply coming back out.
Model-level filters are baked into the LLM and cannot be tuned per application. The AccuKnox Prompt Firewall enforces custom policies outside the model, in both directions, with a full audit trail the model never provides. Enforcement can differ for a banking bot versus a developer assistant. That per-application control is core to why teams pick AccuKnox over native filters.
Four actions. BLOCK stops the request entirely, SANITIZE masks the sensitive content and lets the rest through, MONITOR logs the event without interrupting flow, and PASS allows it. Each policy returns a per-policy risk score, so thresholds are configurable and every action is traceable.
Yes. AccuKnox supports cloud, managed, and self-hosted deployments, including on-premise and air-gapped environments. It works with cloud LLM providers, managed AI services, on-premise open models, and enterprise stacks. This deployment flexibility makes AccuKnox a strong fit for federal, defense, and regulated teams that cannot send traffic to a vendor cloud.
Global policies apply org-wide across every application. Local policies apply to a single app for customized enforcement. Code can be banned in a customer support bot while allowed in a developer assistant, all from the same dashboard.
It works across cloud LLM providers, managed AI services, on-premise open-source models, and enterprise platforms. Because the AccuKnox firewall operates as a transparent proxy wrapping existing LLM calls, it stays model-agnostic and platform-agnostic. Traffic points through it without swapping out the model.
Minutes. One package installs with pip install accuknox-llm-defense, then existing LLM calls get wrapped with two function calls for prompt and response scanning. Session linking ties requests together for full audit trails, and per-policy risk scores help with debugging during setup.
Red teaming finds gaps. The Prompt Firewall closes them. Red teaming runs pre-deployment and on scheduled scans to simulate attacks and produce risk reports. The firewall enforces policies on every live request at runtime, blocking and sanitizing in real time. AccuKnox ships both together, so vulnerabilities get discovered and prevented from the same platform.
Yes. Every request and response is recorded with full conversation history, per-policy risk scores, and block, monitor, or pass status. This continuous audit trail supports compliance, investigation, and forensic analysis, mapping directly to evidence requirements under SOC 2, HIPAA, and NIST. AccuKnox gives compliance teams a defensible record of every AI interaction.
It combines a RoBERTa classifier with the Perspective API to catch hate speech, threats, and explicit content. Racial slurs and death threats get blocked. A separate Sentiment policy flags aggressive or hostile inputs against configurable thresholds, allowing tone monitoring without hard blocking every heated message.
Five steps. Add the application under AI/ML, configure policies and choose Block, Monitor, or Allow, then set policy scope to global or local. The dashboard shows queries and violations in real time, and clicking any violation reveals full conversation history and per-policy risk scores. Setup runs entirely from the AccuKnox dashboard.