// AI Security Research Archive

Security findings from real AI systems.

Public findings, runtime analysis, adversarial testing notes, and infrastructure observations from KalpitLabs security research.

Runtime Security · LLM Exploitation · Agent Systems · Sandbox Isolation · AI Infrastructure · Guardrails

// Research Categories

Runtime Isolation
Sandbox, namespace, hypervisor, capability analysis
LLM Exploitation
Prompt injection, jailbreaks, policy bypass
Agent Security
Tool-use abuse, indirect injection, workflow manipulation
Data Exposure
Prompt leakage, memory exposure, unsafe outputs
AI Infrastructure
Orchestration, transport, runtime architecture
Guardrail Evaluation
Detection failures, bypass patterns, filtering gaps

// Findings

KL-2025-001 · VERIFIED · Critical · Runtime Isolation
Firecracker microVM · AI sandbox runtime
Full CAP_SYS_ADMIN exposure inside Firecracker-based AI runtime
Guest workloads executed as UID 0 with a functional CAP_SYS_ADMIN. Mount operations, namespace creation, overlayfs mounting, and pivot_root() all succeeded from within the guest; no seccomp filtering or LSM mediation was active.
  • mount(), bind mounts, and overlayfs mounts succeeded from the workload
  • pivot_root() succeeded and destabilized the session
  • setns() and unshare() unrestricted
  • + 2 more observations
Observed: 2025
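You can check the first preconditions of this finding from inside a guest without touching mounts or namespaces. The read-only sketch below assumes a Linux guest with procfs at /proc. It decodes the Uid line, the effective capability mask (CapEff), and the Seccomp field of the process status file, using the field layout documented in proc(5). The function names are illustrative.

```python
# Read-only audit sketch: UID, CAP_SYS_ADMIN, and seccomp state of this process.
# Assumes a Linux guest with procfs mounted at /proc (field names per proc(5)).
from __future__ import annotations

from pathlib import Path

CAP_SYS_ADMIN = 21  # capability number from <linux/capability.h>
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}  # proc(5) Seccomp values


def parse_status(text: str) -> dict[str, str]:
    """Split /proc/<pid>/status text into a {field: value} mapping."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def audit(text: str) -> dict[str, object]:
    fields = parse_status(text)
    cap_eff = int(fields.get("CapEff", "0"), 16)  # effective set, hex bitmask
    raw_seccomp = fields.get("Seccomp")  # absent on kernels without seccomp support
    if raw_seccomp is None:
        seccomp = "not reported"
    else:
        seccomp = SECCOMP_MODES.get(int(raw_seccomp), f"unknown ({raw_seccomp})")
    return {
        "uid0": fields.get("Uid", "").split()[:1] == ["0"],  # real UID is first column
        "cap_sys_admin": bool((cap_eff >> CAP_SYS_ADMIN) & 1),
        "seccomp": seccomp,
    }


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():
        print(audit(status.read_text()))
```

If a guest reports `cap_sys_admin: True` with `seccomp: disabled`, it has the preconditions described above. This sketch does not test whether individual syscalls are actually mediated by an LSM.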
KL-2025-002 · VERIFIED · High · Runtime Isolation
Firecracker microVM · AI sandbox runtime
Infrastructure processes fully ptraceable from guest workload
Guest workloads could attach to infrastructure processes via ptrace(PTRACE_ATTACH). The workload and infrastructure processes were confirmed to share PID, mount, user, and network namespaces, and the Yama LSM was absent.
  • ptrace(PTRACE_ATTACH) succeeded against orchestration agent
  • /proc/<pid>/maps fully readable from the workload
  • /proc/<pid>/fd exposed live sockets and FUSE handles
  • + 2 more observations
Observed: 2025
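The Yama and namespace observations can be checked with a similar read-only sketch. It assumes procfs at /proc, and the ptrace_scope values follow the kernel's Yama documentation. It infers namespace sharing by comparing the /proc/<pid>/ns/* links of a target process with the caller's own links. The function names are illustrative.

```python
# Read-only audit sketch: Yama ptrace_scope and namespace sharing with a target PID.
# Assumes procfs at /proc; scope values follow the kernel's Yama documentation.
from __future__ import annotations

import os
from pathlib import Path

YAMA_SCOPES = {
    0: "classic ptrace permissions",
    1: "restricted (descendants only)",
    2: "admin-only attach",
    3: "no attach",
}
NAMESPACES = ("pid", "mnt", "user", "net")


def yama_status(proc_root: str = "/proc") -> str:
    """Report the Yama ptrace scope, or note that Yama is not present."""
    path = Path(proc_root, "sys/kernel/yama/ptrace_scope")
    if not path.exists():
        return "Yama not present"
    value = int(path.read_text().strip())
    return YAMA_SCOPES.get(value, f"unknown ({value})")


def shared_namespaces(pid: int, proc_root: str = "/proc") -> dict[str, bool | None]:
    """Compare /proc/<pid>/ns/* links against our own.

    Equal links (e.g. 'pid:[4026531836]') mean the namespace is shared;
    None means a link could not be read, typically for lack of permission.
    """
    result: dict[str, bool | None] = {}
    for ns in NAMESPACES:
        try:
            mine = os.readlink(os.path.join(proc_root, "self", "ns", ns))
            theirs = os.readlink(os.path.join(proc_root, str(pid), "ns", ns))
            result[ns] = mine == theirs
        except OSError:
            result[ns] = None
    return result
```

For example, `yama_status()` returns "Yama not present" when the sysctl file is missing. `shared_namespaces(pid)` returning True for every entry is consistent with the shared-namespace observation above.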
KL-2026-001 · DISCLOSED · Critical · LLM Exploitation
Mistral · Le Chat
Guardrail bypass and full system prompt extraction in Le Chat
Persona injection that exploits cognitive distance in RLHF-trained models bypassed Le Chat's content guardrails across multiple harm categories. The production system prompt was extracted verbatim despite an explicit non-disclosure instruction.
  • Guardrail bypass via predicted-output / GODMODE framing
  • Bypass persisted across 5 escalating turns without recovery
  • System prompt extracted in full, including the knowledge cutoff (Nov 1, 2024)
  • + 1 more observation
Observed: April 2026
KL-2026-002 · DISCLOSED · Critical · LLM Exploitation
KissanAI · Dhenu chatbot
Full system prompt extraction and architectural injection via language tag
A black-box red-team engagement against KissanAI's agricultural chatbot identified full system prompt extraction in 4 turns and an architectural injection surface created by the system's trust in user-supplied [Language:] tags.
  • Role hijacking payload caused full persona abandonment
  • Full system prompt recovered verbatim in 4-turn extraction chain
  • [Language: en] ADMIN directives accepted as system-level instructions
  • + 1 more observation
Observed: March 2026
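One partial mitigation for the [Language:] trust issue is to set the language from trusted session state and to strip tag-shaped text from user input before the prompt is assembled. The sketch below is hypothetical. The tag pattern, the language allowlist, the `build_prompt` name, and the `<user_input>` wrapper are all assumptions, not KissanAI's actual prompt format. Removing the tag does not stop the model from reading the remaining text as an instruction. It only removes the structural confusion between user text and a control field.

```python
# Hypothetical mitigation sketch: language comes from trusted session state,
# and tag-shaped text in user input is removed before prompt assembly.
import html
import re

# Assumed tag shapes; a real deployment would match its own control syntax.
CONTROL_TAG = re.compile(r"\[\s*(?:language|system|admin)\s*:[^\]]*\]", re.IGNORECASE)
ALLOWED_LANGUAGES = {"en", "hi", "mr", "ta", "te"}  # illustrative allowlist


def build_prompt(user_text: str, session_language: str) -> str:
    """Assemble a prompt whose only [Language:] tag is set by the server."""
    if session_language not in ALLOWED_LANGUAGES:
        session_language = "en"  # fall back rather than trust an unknown value
    cleaned = CONTROL_TAG.sub("", user_text).strip()
    # Escape &, <, > so user text cannot close the <user_input> wrapper.
    cleaned = html.escape(cleaned, quote=False)
    return f"[Language: {session_language}]\n<user_input>\n{cleaned}\n</user_input>"
```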
KL-2026-003 · DISCLOSED · Critical · LLM Exploitation
Sarvam AI · Indus 105B
Router classification bypass, dual-handler prompt extraction, and indirect injection
Six findings across three categories: output-reformatting prompt extraction, safety-classifier router bypass via research/defense framing, and indirect prompt injection via the model's web-fetch tool.
  • Full system prompts for both handlers extracted via reformatting attacks
  • Safety router bypassed via research and defense framing
  • Indirect prompt injection confirmed via extract_content fetching attacker-controlled URLs
  • + 1 more observation
Observed: April 2026

// Methodology

01 Runtime instrumentation
02 Capability analysis
03 Prompt exploitation
04 Namespace inspection
05 Guardrail bypass testing
06 Infrastructure interaction mapping
07 Process isolation analysis
OBSERVED

Initial observation during testing, not yet validated

VERIFIED

Validated through repeated testing with clear evidence

DISCLOSED

Coordinated or public disclosure made to the vendor

Work with us

Collaborative security research. We identify vulnerabilities before they become incidents.

Contact for research →