LLM Security · Open Beta

Rakshak

A firewall for LLMs.

Sits between your users and your model.
Blocks what goes in. Sanitizes what comes out.

Request Early Access →
01 / The Problem

LLMs in production are vulnerable. A single prompt can extract your system instructions, leak API keys, expose internal URLs, or override your model's behavior entirely. Most teams discover this during an incident — a user screenshots your system prompt, a researcher publishes your internals, or an attacker exfiltrates data through the chat window. By then it's too late.

02 / Input Layer
BLOCK

Intercept. Classify. Block.

Intercepts every prompt before it reaches your model and runs it through three stages: pattern detection for known attack signatures, semantic similarity against a threat dataset, and an LLM classifier for novel attacks. If flagged, it's blocked. Your model never sees it.

01
Pattern Detection

Regex + signature matching against known injection patterns, jailbreak templates, and encoded attack payloads.

02
Semantic Similarity

Embedding-based comparison against a curated threat dataset. Catches paraphrased and structurally similar attacks.

03
LLM Classifier

Final pass through a fine-tuned classifier for novel and zero-day attack patterns. A prompt passes only if the classifier is highly confident it's clean.
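As an illustration only (Rakshak's internals aren't public), the three-stage cascade might look like this minimal Python sketch. All names are invented, and the semantic and classifier stages are stubbed out:

```python
import re

# Hypothetical signatures standing in for Stage 1's known-attack patterns.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?previous instructions", re.I),
    re.compile(r"reveal (your )?system prompt", re.I),
]

def pattern_stage(prompt: str) -> bool:
    """Stage 1: regex/signature match against known injection patterns."""
    return any(p.search(prompt) for p in INJECTION_PATTERNS)

def semantic_stage(prompt: str) -> bool:
    """Stage 2: embedding similarity vs. a threat dataset (stubbed here)."""
    return False  # a real check would compare embeddings to curated attacks

def classifier_stage(prompt: str) -> float:
    """Stage 3: fine-tuned classifier threat score in [0, 1] (stubbed here)."""
    return 0.0

def check_input(prompt: str, threshold: float = 0.8) -> dict:
    """Short-circuit on the first stage that flags the prompt."""
    if pattern_stage(prompt):
        return {"blocked": True, "stage": "pattern", "confidence": 1.0}
    if semantic_stage(prompt):
        return {"blocked": True, "stage": "semantic", "confidence": 1.0}
    score = classifier_stage(prompt)
    if score >= threshold:
        return {"blocked": True, "stage": "classifier", "confidence": score}
    return {"blocked": False}
```

The cheap regex pass runs first so obvious attacks never pay for an embedding lookup or a classifier call.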

Architecture
USER
prompt
INPUT LAYER
detect · classify · block
↓ BLOCK
YOUR LLM
protected
OUTPUT LAYER
scan · redact · sanitize
↓ REDACT
USER
clean response
Attacks intercepted at input. Leaks sanitized at output. Your model and your users never see the threat.
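The diagram reads as a single wrap around the model call. A hedged sketch of that flow (function names and return shapes are invented for illustration, not Rakshak's API):

```python
from typing import Callable

def guarded_call(
    prompt: str,
    model_fn: Callable[[str], str],
    check_input: Callable[[str], dict],
    sanitize_output: Callable[[str], dict],
) -> str:
    """Input check -> model -> output sanitization, as in the diagram."""
    verdict = check_input(prompt)
    if verdict.get("blocked"):
        return "Request blocked."       # the model never sees the prompt
    result = sanitize_output(model_fn(prompt))
    if result["action"] == "block":
        return "Something went wrong."  # generic error; leak contained
    return result["response"]           # clean (possibly redacted) response
```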
03 / Output Layer
SANITIZE

Scan. Redact. Deliver clean.

Scans every response before it reaches your user. Detects system prompt leakage, PII exposure, and policy violations. Redacts sensitive data in-place or blocks entirely depending on severity. User gets a clean response or a generic error — never the leak.

01
Leakage Detection

Identifies system prompt fragments, internal instructions, and configuration data in model responses.

02
PII Redaction

Detects and redacts email addresses, phone numbers, names, credentials, and sensitive identifiers in-place.

03
Policy Enforcement

Custom rules per deployment. Block, redact, or flag based on your content policy and risk tolerance.
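As a rough illustration of the scan-then-redact flow above (the regexes, markers, and severity rules here are invented stand-ins, not Rakshak's actual detectors):

```python
import re

# Hypothetical redaction rules standing in for the PII pass.
PII_RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d().\s-]{7,}\d"),
}
# Hypothetical markers standing in for leakage detection.
LEAK_MARKERS = ("SYSTEM PROMPT:", "You are a helpful assistant")

def sanitize_output(response: str) -> dict:
    """Leakage is critical: block the whole response. PII: redact in-place."""
    if any(m in response for m in LEAK_MARKERS):
        return {"action": "block", "user_sees": "generic error"}
    redacted, fields = response, 0
    for label, rule in PII_RULES.items():
        redacted, n = rule.subn(f"[REDACTED:{label}]", redacted)
        fields += n
    return {
        "action": "redact" if fields else "pass",
        "response": redacted,
        "fields_redacted": fields,
    }
```

The severity split mirrors the copy: a leaked system-prompt fragment drops the whole response, while PII is surgically replaced so the rest of the answer still reaches the user.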

04 / Proof

Let the output speak.

Real runs. Real attacks. Blocked at confidence 1.0.

rakshak · live demo · running
Rakshak live demo — blocking prompt injection at confidence 1.0
input_layer · block
$ rakshak run --mode detect
Input received.
Stage 1: pattern... MATCH
Stage 2: semantic... MATCH
Stage 3: classifier...

BLOCKED
confidence: 1.0
threat: prompt_injection
BLOCK at confidence 1.0
output_layer · redact
$ rakshak run --mode sanitize
Response received.
PII scan... DETECTED
email → [REDACTED]
phone → [REDACTED]

SANITIZED
fields_redacted: 2
action: in-place redact
REDACT in action
output_layer · leak
$ rakshak run --mode full
Scanning response...
Leakage check... DETECTED
type: system_prompt_fragment
severity: critical

BLOCKED
user_sees: generic error
leak_contained: true
System prompt leak — caught
05 / What It Catches

Threat coverage.

Prompt injection
Jailbreaks
Social engineering
Encoded & obfuscated attacks
Role overrides
System prompt leakage
PII in responses
Policy violations
Open Beta

Rakshak is in open beta.

General use case supported. Domain-specific versions — finance, healthcare, legal — in progress.

One API call between your users and your model.

Request Early Access →