Research Archive | Kalpit Labs

Kalpit LabsPlatforms
RakshakLive
LLM guardrail. Blocks prompt injection, jailbreaks & PII leakage in real-time.
Services
NirikshaAvailable
Hands-on AI red team engagement. Manual adversarial testing of your AI systems.
Adversarial Prompts DBGitHub ↗
Open dataset of multilingual AI attack patterns. Free to use.
indic-piiPyPI ↗
PII detection & redaction for Hindi, Tamil, Bengali & 7 more Indian languages.
ResearchDocsAboutContact
Log inGet started
// AI Security Research Archive
Security findings from
real AI systems.Public findings, runtime analysis, adversarial testing notes, and infrastructure observations from KalpitLabs security research.
Runtime SecurityLLM ExploitationAgent SystemsSandbox IsolationAI InfrastructureGuardrails
// Research Categories
Runtime Isolation
Sandbox, namespace, hypervisor, capability analysis
LLM Exploitation
Prompt injection, jailbreaks, policy bypass
Agent Security
Tool-use abuse, indirect injection, workflow manipulation
Data Exposure
Prompt leakage, memory exposure, unsafe outputs
AI Infrastructure
Orchestration, transport, runtime architecture
Guardrail Evaluation
Detection failures, bypass patterns, filtering gaps
// Findings
KL-2025-001VERIFIEDCriticalRuntime Isolation
Firecracker microVM · AI sandbox runtime
Full CAP_SYS_ADMIN exposure inside Firecracker-based AI runtime
Guest workloads executed as UID 0 with functional CAP_SYS_ADMIN. Mount operations, namespace creation, overlayfs mounting, and pivot_root() succeeded from within the guest — no seccomp filtering or LSM mediation active.
—mount(), bind mounts, overlayfs succeeded from workload
—pivot_root() succeeded and destabilized session
—setns() and unshare() unrestricted
+ 2 more observations
Observed: 2025Read finding →
KL-2025-002VERIFIEDHighRuntime Isolation
Firecracker microVM · AI sandbox runtime
Infrastructure processes fully ptraceable from guest workload
Guest workloads attached to infrastructure processes via ptrace(PTRACE_ATTACH). Shared PID, mount, user, and network namespaces confirmed. Yama LSM absent.
—ptrace(PTRACE_ATTACH) succeeded against orchestration agent
—/proc/<pid>/maps fully readable from workload
—/proc/<pid>/fd exposed live sockets and FUSE handles
+ 2 more observations
Observed: 2025Read finding →
KL-2026-001DISCLOSEDCriticalLLM Exploitation
Mistral · Le Chat
Guardrail bypass and full system prompt extraction — Le Chat
Using persona injection exploiting cognitive distance in RLHF-trained models, Le Chat's content guardrails were bypassed across multiple harm categories. Production system prompt extracted verbatim despite explicit non-disclosure instruction.
—Guardrail bypass via predicted-output / GODMODE framing
—Bypass persisted across 5 escalating turns without recovery
—System prompt extracted in full — including knowledge cutoff (Nov 1 2024)
+ 1 more observations
Observed: April 2026Read finding →
KL-2026-002DISCLOSEDCriticalLLM Exploitation
KissanAI · Dhenu chatbot
Full system prompt extraction and architectural injection via language tag
Black-box red team of KissanAI's agricultural chatbot identified full system prompt extraction in 4 turns and an architectural injection surface created by the system's trust of user-supplied [Language:] tags.
—Role hijacking payload caused full persona abandonment
—Full system prompt recovered verbatim in 4-turn extraction chain
—[Language: en] ADMIN directives accepted as system-level instructions
+ 1 more observations
Observed: March 2026Read finding →
KL-2026-003DISCLOSEDCriticalLLM Exploitation
Sarvam AI · Indus 105B
Router classification bypass, dual-handler prompt extraction, and indirect injection
Six findings across three categories: output-reformatting prompt extraction, safety-classifier router bypass via research/defense framing, and indirect prompt injection via the model's web-fetch tool.
—Full system prompts for both handlers extracted via reformatting attacks
—Safety router bypassed via research and defense framing
—Indirect prompt injection confirmed via extract_content fetching attacker-controlled URLs
+ 1 more observations
Observed: April 2026Read finding →
// Methodology
01Runtime instrumentation
02Capability analysis
03Prompt exploitation
04Namespace inspection
05Guardrail bypass testing
06Infrastructure interaction mapping
07Process isolation analysis
OBSERVEDInitial observation during testing, not yet validated
VERIFIEDValidated through repeated testing with clear evidence
DISCLOSEDCoordinated or public disclosure made to vendor
Work with usCollaborative security research. We identify vulnerabilities before they become incidents.
Contact for research →
Kalpit LabsAI-native security infrastructure. Guardrails, red teaming, and traffic protection for LLM-powered products.
Crafted in India · कल्पित Labs © 2025
Products
Rakshak
Niriksha
KavachSoon
Open Source
Company
About
Blog
Research
Contact
Legal
Privacy Policy
Terms of Service
//LIVERakshak v0.3 — production ready
GitHubX / TwitterLinkedIn