VaultRAG — Guardrail Pipeline Simulator

Select a scenario

The simulator runs against the live Hugging Face Spaces backend. If the backend is unavailable, the pipeline animation runs in simulated mode and shows no retrieval result.

Active scenario

Happy path — procedure found

Technician Marcus is standing next to the Haas VF-2SS CNC on Line 3. The machine threw an E-04 fault at 07:14. He holds his phone to his mouth and speaks the query. VaultRAG normalises the voice input, retrieves the matching procedure from the indexed manual, and returns a cited step-by-step answer in under 8 seconds.

Guardrail Pipeline

Voice Input

Web Speech API captures query from device microphone

G1 · Query Normaliser

Denoise transcription, reformat as structured query

G2 · Scope Guard

Check query relevance against document corpus

ChromaDB Retrieval

Semantic search · top-k=3 · cosine similarity

G3 · Confidence Threshold

Validate similarity scores ≥ 0.70

G4 · Safety Flag

Scan chunks for LOTO / hazard keywords

Llama 3.2 Generation

Structured prompt · step-by-step · citation mandatory

G5 · Citation Enforcer

Validate source reference present in output

Response Delivered

Cited answer returned to technician's phone

Voice Input — Web Speech API

Select a scenario and press Run simulation

Pipeline reference

Nine steps through the guardrail pipeline

From voice input to cited response, the pipeline executes nine steps in sequence. The simulator above traces each step with timing. The steps below describe the mechanism at each stage.
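
As an illustration of the sequential flow with per-step timing, here is a minimal Python sketch. It assumes each stage is a callable that returns an (ok, payload) pair and that a refusal stops the run early. The names run_pipeline and Stage are hypothetical and are not taken from the VaultRAG code.

```python
import time
from typing import Any, Callable, List, Tuple

# Hypothetical stage shape: (name, callable) where the callable
# takes the current payload and returns (ok, new_payload).
Stage = Tuple[str, Callable[[Any], Tuple[bool, Any]]]


def run_pipeline(stages: List[Stage], payload: Any):
    """Run stages in order and stop at the first one that refuses.

    Returns (completed, payload, trace), where trace holds one
    (stage_name, elapsed_seconds, ok) tuple per executed stage.
    """
    trace = []
    for name, stage in stages:
        start = time.perf_counter()
        ok, payload = stage(payload)
        trace.append((name, time.perf_counter() - start, ok))
        if not ok:
            # A guardrail refused: later stages never run.
            return False, payload, trace
    return True, payload, trace
```

Stopping at the first refusal means a query rejected by an early guardrail never reaches retrieval or generation, which are the slowest stages.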

Technician speaks

One-handed. One button. The Web Speech API captures audio from the device microphone; no install is required.

Channel: Browser mic

G1 normalises

Raw transcription ("uh E zero four error on the Haas") is cleaned and reformulated into a structured query by the LLM with a strict system prompt.

< 0.8s

G2 checks scope

Top-1 similarity pre-check against the corpus. If nothing is remotely relevant, refuse immediately before spending retrieval budget.

< 0.3s

ChromaDB retrieves

nomic-embed-text embeds the query. ChromaDB returns the top-3 chunks by cosine similarity. Procedural sections are returned with metadata: document title, section, page range.

< 1.2s
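
ChromaDB performs the ranking internally. To show what top-k retrieval by cosine similarity computes, here is a plain-Python sketch. It does not use the ChromaDB API; the chunk layout (a dict with an "embedding" key) and the function names are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunks, k=3):
    """Return the k highest-scoring (similarity, chunk) pairs, best first.

    Each chunk is assumed to be a dict with an "embedding" list plus any
    metadata (document title, section, page range).
    """
    scored = [(cosine_similarity(query_vec, c["embedding"]), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A vector store avoids this linear scan by using an index, but the scores it returns play the same role in the guardrails that follow.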

G3 validates confidence

Best chunk similarity must be ≥ 0.70. Below threshold: refuse with explanation. Never generate a response the corpus cannot support.

< 0.1s
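
The threshold check itself is small. A minimal sketch, assuming similarity scores arrive as a list of floats; only the 0.70 cut-off comes from the description above, while GateResult, confidence_gate, and the refusal wording are hypothetical.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.70  # cut-off stated in the pipeline description


@dataclass
class GateResult:
    passed: bool
    message: str


def confidence_gate(similarities, threshold=CONFIDENCE_THRESHOLD):
    """Pass only if the best retrieved chunk meets the threshold."""
    best = max(similarities, default=0.0)  # no chunks counts as no support
    if best >= threshold:
        return GateResult(True, f"best similarity {best:.2f} meets {threshold:.2f}")
    # Hypothetical refusal text; the real wording is not given in the source.
    return GateResult(
        False,
        f"Refused: best similarity {best:.2f} is below {threshold:.2f}, "
        "so the indexed manuals do not support an answer.",
    )
```

Because the gate looks only at the best score, one strong match is enough to proceed even when the other two chunks are weak.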

G4 flags safety

Scans retrieved chunks for LOTO, lockout, high-voltage, pressure-vessel, and hazmat keywords. If any keyword matches, a mandatory safety prefix is prepended to the response.

< 0.2s
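
A keyword scan of this kind can be sketched with a single case-insensitive regular expression. The keyword list mirrors the one above; the prefix wording, the function names, and the whole-word matching rule are assumptions, not the actual VaultRAG implementation.

```python
import re

# Keywords named in the pipeline description.
HAZARD_KEYWORDS = ("LOTO", "lockout", "high voltage", "pressure vessel", "hazmat")

# Hypothetical prefix text; the real wording is not given in the source.
SAFETY_PREFIX = "SAFETY: hazard keywords found in the source procedure."

# Whole-word, case-insensitive match for any keyword.
_HAZARD_PATTERN = re.compile(
    "|".join(r"\b" + re.escape(k) + r"\b" for k in HAZARD_KEYWORDS),
    re.IGNORECASE,
)


def flag_safety(chunks):
    """Return the sorted, lower-cased hazard keywords found in the chunk texts."""
    found = set()
    for text in chunks:
        for match in _HAZARD_PATTERN.finditer(text):
            found.add(match.group(0).lower())
    return sorted(found)


def apply_safety_prefix(response, chunks):
    """Prepend the safety prefix when any retrieved chunk mentions a hazard."""
    if flag_safety(chunks):
        return SAFETY_PREFIX + "\n" + response
    return response
```

Scanning the retrieved chunks rather than the generated answer means the prefix still appears if the model omits a warning that the manual contains.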

Llama 3.2 generates

The structured prompt enforces a step-by-step format with a maximum of 5 steps and a required citation. The response is generated entirely locally, with no external API call.

< 4.5s

G5 enforces citation

Validates that a source reference is present in the output. If it is missing, generation is retried once with a stricter prompt. If the reference is still absent, the response is blocked and a refusal is returned.

< 0.2s
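
The check-retry-block logic can be sketched as follows. The citation format matched here ("[Source: ...]") is an assumption, since the source does not specify one, and generate stands in for a hypothetical callable that wraps the model call and accepts a strict flag.

```python
import re
from typing import Callable, Optional

# Assumed citation marker, e.g. "[Source: <title>, Section <n>, p. <pages>]".
CITATION_PATTERN = re.compile(r"\[Source:[^\]]+\]")


def has_citation(text: str) -> bool:
    return bool(CITATION_PATTERN.search(text))


def enforce_citation(generate: Callable[[bool], str]) -> Optional[str]:
    """Return a cited response, or None if the response must be blocked.

    generate(strict) is a hypothetical wrapper around the model call;
    strict=True requests the stricter prompt used for the single retry.
    """
    first = generate(False)
    if has_citation(first):
        return first
    retry = generate(True)  # exactly one retry
    if has_citation(retry):
        return retry
    return None  # caller turns this into a refusal
```

Returning None instead of the uncited text keeps an unverifiable answer from being shown by accident.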

Answer delivered

Cited, structured procedure appears on the technician's phone. Section number, page range, document title. Traceable to source.

Total: < 8s (target)
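
The per-step limits listed above sum to 7.3 s, which leaves about 0.7 s of the 8 s target for voice capture and delivery, neither of which has a stated limit. A short sketch of that arithmetic, with hypothetical names and dictionary keys:

```python
# Per-step limits in seconds, taken from the step descriptions above.
STEP_BUDGETS_S = {
    "G1 normalise": 0.8,
    "G2 scope check": 0.3,
    "ChromaDB retrieval": 1.2,
    "G3 confidence": 0.1,
    "G4 safety flag": 0.2,
    "Llama 3.2 generation": 4.5,
    "G5 citation": 0.2,
}
TARGET_TOTAL_S = 8.0


def total_budget(budgets):
    """Sum of the per-step limits, rounded to avoid float noise."""
    return round(sum(budgets.values()), 2)


def over_budget(measured, budgets):
    """Names of steps whose measured time reached or exceeded its limit."""
    return [
        name
        for name, limit in budgets.items()
        if measured.get(name, 0.0) >= limit
    ]


headroom = round(TARGET_TOTAL_S - total_budget(STEP_BUDGETS_S), 2)
```

Generation alone accounts for 4.5 s of the 7.3 s, so it is the step where a slow run is most likely to break the target.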

Guardrail pipeline simulator — four reference scenarios
