LLM Security · Open Beta

Rakshak

A firewall for LLMs.

Sits between your users and your model.
Blocks what goes in. Sanitizes what comes out.

Request Early Access →
01 / The Problem

LLMs in production are vulnerable. A single prompt can extract your system instructions, leak API keys, expose internal URLs, or override your model's behavior entirely. Most teams discover this during an incident — a user screenshots your system prompt, a researcher publishes your internals, or an attacker exfiltrates data through the chat window. By then it's too late.

02 / Input Layer
BLOCK

Intercept. Classify. Block.

Intercepts every prompt before it reaches your model and runs it through three stages: pattern detection for known attack signatures, semantic similarity against a threat dataset, and an LLM classifier for novel attacks. If flagged, it's blocked. Your model never sees it.

01
Pattern Detection

Regex + signature matching against known injection patterns, jailbreak templates, and encoded attack payloads.

02
Semantic Similarity

Embedding-based comparison against a curated threat dataset. Catches paraphrased and structurally similar attacks.

03
LLM Classifier

Final pass through a fine-tuned classifier for novel and zero-day attack patterns. A prompt passes only if the classifier is highly confident it's clean.
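As an illustration only (Rakshak's internals aren't public), the three-stage cascade might look like this minimal Python sketch. All names are invented, and the semantic and classifier stages are stubbed out:

```python
import re

# Hypothetical signatures standing in for Stage 1's known-attack patterns.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?previous instructions", re.I),
    re.compile(r"reveal (your )?system prompt", re.I),
]

def pattern_stage(prompt: str) -> bool:
    """Stage 1: regex/signature match against known injection patterns."""
    return any(p.search(prompt) for p in INJECTION_PATTERNS)

def semantic_stage(prompt: str) -> bool:
    """Stage 2: embedding similarity vs. a threat dataset (stubbed here)."""
    return False  # a real check would compare embeddings to curated attacks

def classifier_stage(prompt: str) -> float:
    """Stage 3: fine-tuned classifier threat score in [0, 1] (stubbed here)."""
    return 0.0

def check_input(prompt: str, threshold: float = 0.8) -> dict:
    """Short-circuit on the first stage that flags the prompt."""
    if pattern_stage(prompt):
        return {"blocked": True, "stage": "pattern", "confidence": 1.0}
    if semantic_stage(prompt):
        return {"blocked": True, "stage": "semantic", "confidence": 1.0}
    score = classifier_stage(prompt)
    if score >= threshold:
        return {"blocked": True, "stage": "classifier", "confidence": score}
    return {"blocked": False}
```

The cheap regex pass runs first so obvious attacks never pay for an embedding lookup or a classifier call.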

Architecture
USER
prompt
INPUT LAYER
detect · classify · block
↓ BLOCK
YOUR LLM
protected
OUTPUT LAYER
scan · redact · sanitize
↓ REDACT
USER
clean response
Attacks intercepted at input. Leaks sanitized at output. Your model and your users never see the threat.
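The diagram reads as a single wrap around the model call. A hedged sketch of that flow (function names and return shapes are invented for illustration, not Rakshak's API):

```python
from typing import Callable

def guarded_call(
    prompt: str,
    model_fn: Callable[[str], str],
    check_input: Callable[[str], dict],
    sanitize_output: Callable[[str], dict],
) -> str:
    """Input check -> model -> output sanitization, as in the diagram."""
    verdict = check_input(prompt)
    if verdict.get("blocked"):
        return "Request blocked."       # the model never sees the prompt
    result = sanitize_output(model_fn(prompt))
    if result["action"] == "block":
        return "Something went wrong."  # generic error; leak contained
    return result["response"]           # clean (possibly redacted) response
```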
03 / Output Layer
SANITIZE

Scan. Redact. Deliver clean.

Scans every response before it reaches your user. Detects system prompt leakage, PII exposure, and policy violations. Redacts sensitive data in-place or blocks entirely depending on severity. User gets a clean response or a generic error — never the leak.

01
Leakage Detection

Identifies system prompt fragments, internal instructions, and configuration data in model responses.

02
PII Redaction

Detects and redacts email addresses, phone numbers, names, credentials, and sensitive identifiers in-place.

03
Policy Enforcement

Custom rules per deployment. Block, redact, or flag based on your content policy and risk tolerance.
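As a rough illustration of the scan-then-redact flow above (the regexes, markers, and severity rules here are invented stand-ins, not Rakshak's actual detectors):

```python
import re

# Hypothetical redaction rules standing in for the PII pass.
PII_RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d().\s-]{7,}\d"),
}
# Hypothetical markers standing in for leakage detection.
LEAK_MARKERS = ("SYSTEM PROMPT:", "You are a helpful assistant")

def sanitize_output(response: str) -> dict:
    """Leakage is critical: block the whole response. PII: redact in-place."""
    if any(m in response for m in LEAK_MARKERS):
        return {"action": "block", "user_sees": "generic error"}
    redacted, fields = response, 0
    for label, rule in PII_RULES.items():
        redacted, n = rule.subn(f"[REDACTED:{label}]", redacted)
        fields += n
    return {
        "action": "redact" if fields else "pass",
        "response": redacted,
        "fields_redacted": fields,
    }
```

The severity split mirrors the copy: a leaked system-prompt fragment drops the whole response, while PII is surgically replaced so the rest of the answer still reaches the user.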

04 / Proof

Let the output speak.

Real runs. Real attacks. Blocked at confidence 1.0.

rakshak · live demo · running
Rakshak live demo — blocking prompt injection at confidence 1.0
input_layer · block
$ rakshak run --mode detect
Input received.
Stage 1: pattern... MATCH
Stage 2: semantic... MATCH
Stage 3: classifier...

BLOCKED
confidence: 1.0
threat: prompt_injection
BLOCK at confidence 1.0
output_layer · redact
$ rakshak run --mode sanitize
Response received.
PII scan... DETECTED
email → [REDACTED]
phone → [REDACTED]

SANITIZED
fields_redacted: 2
action: in-place redact
REDACT in action
output_layer · leak
$ rakshak run --mode full
Scanning response...
Leakage check... DETECTED
type: system_prompt_fragment
severity: critical

BLOCKED
user_sees: generic error
leak_contained: true
System prompt leak — caught
05 / What It Catches

Threat coverage.

Prompt injection
Jailbreaks
Social engineering
Encoded & obfuscated attacks
Role overrides
System prompt leakage
PII in responses
Policy violations
Open Beta

Rakshak is in open beta.

General use case supported. Domain-specific versions — finance, healthcare, legal — in progress.

One API call between your users and your model.

Request Early Access →