VaultRAG — Roadmap

01 — Version Overview

Three stages, one coherent path

Each version solves a specific problem and sets up the next. v0.1 proves the core loop works. v1.0 makes it production-ready. The future vision extends the platform to adjacent use cases without re-solving trust or safety.

v0.1

Current · MVP

Core loop validated

Voice query → 5-layer guardrail pipeline → cited procedure response. On-prem deployment model. Single document corpus. Single concurrent user. Deployed on HuggingFace Spaces for portfolio demonstration purposes — the production deployment model is documented in page-03.

Portfolio demo · Built

v1.0

Production path

Multi-user, multi-corpus, auditable

Department-level document namespacing. Role-based query permissions. Immutable audit log of every query and response. LlamaGuard integration on the safety path. Whisper local for fully air-gapped voice. Word and PowerPoint ingestion. Production hardware with GPU acceleration.

Designed · Not built

v2+

Platform vision

Governed intelligence utility

VaultRAG as a platform abstraction layer: multiple secure assistants (maintenance, quality, safety, engineering) sharing a common on-prem trust infrastructure without re-solving the sovereignty and guardrail problems for each new use case. Proactive alerts when documents are updated. Multi-site federation.

Vision · Architecture sketched

02 — MVP Scope

v0.1 feature inventory

The scoping principle for v0.1 was: every component must either make the core loop function or make a guardrail fire. Anything that did neither was deferred. This table is the complete inventory.

VaultRAG v0.1 — complete feature scope

Feature	v0.1 Status	Decision rationale
Voice input via Web Speech API	In v0.1	Core user interface requirement. A technician on the floor cannot type reliably. Without voice, the system fails its primary user scenario.
Text fallback input	In v0.1	Voice recognition fails in extreme noise environments (above 100 dB). Keyboard fallback ensures the system never leaves a user without recourse.
PDF and TXT document ingestion	In v0.1	The two dominant formats for manufacturing documentation. Everything else is roadmap.
Procedural section chunking	In v0.1	Safety-critical requirement. Token-window chunking splits procedures and produces incomplete — potentially dangerous — responses. See ADR-008.
5-layer guardrail pipeline (G1–G5)	In v0.1	Safety requirement in a manufacturing context. All five layers implemented as custom prompt-based logic. Not optional for the core architecture validation.
ChromaDB vector store	In v0.1	Core retrieval infrastructure. The RAG pipeline cannot function without a vector store.
Llama 3.2 3B via Ollama	In v0.1	Local LLM is the architectural invariant. Ollama is the simplest local serving solution. 3B chosen for demo environment RAM constraints. See ADR-001.
Mobile-responsive chat UI	In v0.1	The primary user is on a phone on the factory floor. A desktop-first UI fails the primary persona entirely.
Single document corpus	In v0.1	Multi-corpus management adds indexing complexity, namespace isolation, and corpus selection logic that are v1.0 concerns. MVP validates the single-corpus pipeline first.
User authentication / login	Deferred v1.0	Authentication is an operational requirement but not a pipeline validation requirement. It adds session management, token handling, and role logic that are not needed to validate the core RAG loop. Documented as a production gap below.
Department-level document namespacing	Deferred v1.0	Role-based corpus access requires user identity, which requires authentication. The dependency chain means this follows auth, not precedes it.
Immutable audit log	Deferred v1.0	Essential for ISO 9001 compliance and internal governance. Not required to validate that the pipeline retrieves and cites correctly. Documented as a production gap below.
LlamaGuard integration	Deferred v1.0	Not feasible within the RAM limits of the free demo tier alongside Llama 3.2 3B. The custom G4 Safety Flag provides the MVP safety layer. In v1.0, LlamaGuard activates only on a G4 trigger: conditional, not always-on. See ADR-003.
Word / PowerPoint ingestion	Deferred v1.0	Many SOPs live in .docx format. python-docx integration is straightforward but adds a dependency and format-specific chunking logic. Deferred to keep MVP ingestion pipeline clean.
Whisper local voice (air-gapped)	Deferred v1.0	Web Speech API uses device STT (Google/Apple backend). For fully air-gapped facilities, local Whisper is required. Added RAM overhead makes it a production concern, not MVP. Documented as a production gap below. See ADR-005.
Text-to-speech response readback	Deferred v1.0	Logical extension of the voice-first design. Web Speech Synthesis API makes this a one-line addition. Deferred to keep MVP UI surface minimal.
Query history and session memory	Deferred v1.0	Persistent conversation history requires session storage and raises data retention questions. Each v0.1 query is stateless by design — simpler, safer, easier to audit.
Multi-site federation	v2+ vision	Multiple facility instances sharing a common document standard but isolated corpora. Requires a federation layer and inter-site governance model. Platform-level concern.
Proactive document update alerts	v2+ vision	Notify technicians when a procedure they previously queried has been updated. Requires query history (v1.0), document versioning, and a notification pipeline.
Custom disclaimer injection per department	v2+ vision	Prepend department-specific legal or compliance text to responses. Requires department namespacing and user identity — both v1.0 foundations.
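The procedural section chunking row above can be illustrated with a minimal sketch. It assumes plain-text SOPs in which every procedure starts with a heading line such as "PROCEDURE 4.1 ..."; that heading pattern and the function name chunk_by_procedure are hypothetical and are not taken from ADR-008.

```python
import re

# Assumed heading convention: a line that starts with "PROCEDURE" or "SECTION"
# followed by a dotted number. A real corpus will need its own pattern.
HEADING = re.compile(r"^(PROCEDURE|SECTION)\s+\d+(\.\d+)*\b.*$", re.MULTILINE)


def chunk_by_procedure(text: str) -> list:
    """Split text at heading lines so each chunk holds one whole procedure."""
    starts = [match.start() for match in HEADING.finditer(text)]
    if not starts:
        # No recognisable headings: keep the document as one chunk rather
        # than cutting it at an arbitrary token boundary.
        return [{"title": None, "body": text.strip()}]
    chunks = []
    preamble = text[: starts[0]].strip()
    if preamble:
        # Text before the first heading is kept as its own untitled chunk.
        chunks.append({"title": None, "body": preamble})
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        title, _, body = text[begin:end].strip().partition("\n")
        chunks.append({"title": title.strip(), "body": body.strip()})
    return chunks


sample = """Scope: press line 3.
PROCEDURE 4.1 Lockout
1. Notify operators.
2. Isolate energy source.
PROCEDURE 4.2 Restart
1. Remove locks.
"""
chunks = chunk_by_procedure(sample)
```

Each chunk carries every step of one procedure, so a retrieved chunk cannot end halfway through a lockout sequence the way a fixed token window can.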

03 — Known Production Gaps — v0.1

Known production gaps — v0.1

These are missing requirements in the current build — not deliberate architectural deferrals. They represent real limitations that would need to be addressed before v0.1 could be operated in a production manufacturing environment. Each has a documented v1.0 mitigation.

Data sovereignty scope

In the production deployment model, no data crosses the facility boundary. The demo prototype uses HuggingFace Spaces for inference and Web Speech API for voice — both are documented exceptions. v1.0 mitigations (local Whisper, on-prem server) restore the sovereignty invariant. The architecture is designed so the mitigations are additive — the application code does not change.

No user authentication — open to any device on the plant network

v0.1 has no login, session management, or identity layer. Any device connected to the plant WiFi can query the system and retrieve procedural documentation, including safety-critical LOTO procedures. There is no mechanism to restrict access by role, department, or individual user.

Security gap

v1.0 mitigation: JWT-based authentication. Role-based access control. Department-level namespace scoping tied to authenticated user identity.

v1.0 · Priority 1

No audit log — queries and responses are not persisted

v0.1 has no record of what was queried, what was retrieved, what guardrails fired, or what response was delivered. In a regulated manufacturing environment operating under ISO 9001, the absence of an audit trail means the system cannot be included in a management review and AI-assisted decisions cannot be traced.

Compliance gap

v1.0 mitigation: Append-only audit log with timestamp, user ID, normalised query, chunk references, guardrail outcomes, and response hash. Written before response delivery. Cannot be modified or deleted.

v1.0 · Priority 3

Single document namespace — no department-level isolation

v0.1 indexes all documents into a single ChromaDB collection. Any authenticated user (once auth is added) would have access to all indexed documents regardless of department, classification, or need-to-know. A maintenance technician would be able to query engineering drawings or quality management records they are not authorised to access.

Data integrity gap

v1.0 mitigation: ChromaDB collections mapped to organisational namespaces. Query routing scoped to the authenticated user's permitted collections. Depends on auth (Priority 1).

v1.0 · Priority 2

Web Speech API routes audio externally — violates air-gap in strict deployments

The Web Speech API uses the device's native speech recognition engine, which in practice routes audio to Google or Apple backends for transcription. In facilities operating under strict air-gap requirements — ITAR-controlled environments, defence supply chain, certain ISO 27001 scopes — this is a data sovereignty violation even if the inference layer is on-prem.

Sovereignty gap

v1.0 mitigation: Local Whisper medium (769M parameters) in a dedicated Docker container. Audio is routed to the on-prem server and transcribed locally, with no external API call. GPU-accelerated: <1s for a 5-second query.

v1.0 · Priority 5

04 — v1.0 Production Path

v1.0 production path — eight additions

These are not features for their own sake. Each v1.0 addition resolves a specific gap between the portfolio prototype and a system a manufacturer would trust in production. The ordering is intentional — auth enables namespacing; namespacing enables audit; audit enables compliance.

v1.0 · Priority 1

User authentication and role-based access

JWT-based authentication. Each user belongs to one or more departments. Queries are scoped to the user's permitted corpus namespaces. A maintenance technician cannot query the Legal department's documents. A user assigned to more than one department, such as an engineering manager, can query every namespace those departments permit.

Why first: Without auth, the system cannot enforce data governance at the user level. Every subsequent v1.0 feature depends on knowing who is asking.
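As a rough illustration of what "knowing who is asking" involves, the sketch below issues and verifies an HS256-signed JWT using only the Python standard library. The signing algorithm, the "departments" claim name, and both function names are assumptions made for this example; the roadmap specifies only that authentication is JWT-based, and a production build would more likely rely on a maintained library such as PyJWT.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()


def issue_token(claims: dict, secret: bytes) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signature = _b64url(_sign(f"{header}.{payload}", secret))
    return f"{header}.{payload}.{signature}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if signature and expiry are valid, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header, payload, signature = parts
    expected = _sign(f"{header}.{payload}", secret)
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("bad signature")
    # Pin the algorithm instead of trusting whatever the header claims.
    if json.loads(_b64url_decode(header)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload))
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:
        raise ValueError("token expired")
    return claims
```

The verified claims (here a hypothetical "departments" list) are what the later namespacing and audit features would read to decide scope and record identity.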

v1.0 · Priority 2

Department-level document namespacing

ChromaDB collections mapped to organisational namespaces. Maintenance docs, quality SOPs, engineering drawings, and safety procedures indexed in isolated collections. Query routing respects the authenticated user's permitted namespaces.

Why now: A single flat corpus is a demo simplification. Production facilities have strict need-to-know controls on document access — especially in ISO-audited environments.
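Query routing by namespace could look something like the sketch below. The department names, collection names, and both functions are hypothetical; the only behaviour taken from the roadmap is that a query reaches only the collections the authenticated user is permitted to search.

```python
# Hypothetical mapping from department to ChromaDB collection names.
DEPARTMENT_COLLECTIONS = {
    "maintenance": ["maintenance_docs", "safety_procedures"],
    "quality": ["quality_sops", "safety_procedures"],
    "engineering": ["engineering_drawings", "maintenance_docs", "safety_procedures"],
}


def permitted_collections(departments: list) -> list:
    """Union of the collections granted by each department, sorted by name.

    Unknown departments grant nothing, so the default is deny.
    """
    allowed = set()
    for department in departments:
        allowed.update(DEPARTMENT_COLLECTIONS.get(department, []))
    return sorted(allowed)


def route_query(requested: list, departments: list) -> list:
    """Keep only the requested collections this user may search."""
    allowed = set(permitted_collections(departments))
    return [name for name in requested if name in allowed]
```

With ChromaDB, each surviving name would then be opened with the client's get_collection call and queried separately, so a collection outside the permitted list is never touched.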

v1.0 · Priority 3

Immutable audit log — every query, every response

Append-only log with timestamp, user ID, normalised query, retrieved chunk references, guardrail outcomes, and response hash. Written before the response is delivered. Cannot be modified or deleted. Reviewed in ISO 9001 management reviews.

Why now: Regulated manufacturers require evidence that every AI-assisted decision is traceable. Without an audit trail, VaultRAG cannot be used in an ISO-audited environment. This is a compliance gate, not a nice-to-have.
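One way to picture an audit entry is the sketch below, which builds a record from the fields listed above and links each entry to the previous one by hash. The hash chain, the field names, and the GENESIS constant are assumptions added for illustration; the roadmap requires only an append-only log containing a response hash. A chain of this kind makes tampering detectable but does not prevent it, so storage-level write protection would still be needed.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    """Hash a canonical JSON form so key order cannot change the result."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def make_entry(prev_hash, user_id, query, chunk_ids, guardrails, response, timestamp=None):
    """Build one audit entry. Only a hash of the response text is stored."""
    entry = {
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "query": query,  # the normalised query
        "chunk_ids": chunk_ids,
        "guardrails": guardrails,
        "response_hash": hashlib.sha256(response.encode()).hexdigest(),
        "prev_hash": prev_hash,
    }
    entry["entry_hash"] = _digest(entry)
    return entry


def verify_chain(entries) -> bool:
    """Recompute every hash and check that each entry points at its predecessor."""
    prev = GENESIS
    for entry in entries:
        body = {key: value for key, value in entry.items() if key != "entry_hash"}
        if entry["prev_hash"] != prev or entry["entry_hash"] != _digest(body):
            return False
        prev = entry["entry_hash"]
    return True
```

Writing the entry before the response is delivered, as the roadmap specifies, means a response can never reach a technician without a matching record.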

v1.0 · Priority 4

LlamaGuard on G4 trigger — conditional, not always-on

LlamaGuard 2 activates only when the custom G4 Safety Flag fires — adding a safety model validation layer for the highest-stakes queries without the always-on RAM overhead. G4 acts as a triage gate; LlamaGuard acts as a secondary validator on G4-triggered queries only.

Why now: Custom keyword-based G4 has false-negative risk for novel hazard phrasing not in the keyword list. LlamaGuard adds adversarial robustness. The conditional activation pattern solves the RAM constraint identified in ADR-003.
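The conditional activation pattern can be sketched as a cheap check that gates an expensive one. The keyword list, the function names, and the result fields below are hypothetical, and the secondary validator is passed in as a plain callable so the example stays independent of any particular LlamaGuard serving setup.

```python
from typing import Callable

# Hypothetical keyword list; the real G4 list is not reproduced here.
HAZARD_KEYWORDS = ("lockout", "loto", "high voltage", "confined space")


def g4_flag(query: str) -> bool:
    """Cheap triage: does the query mention any listed hazard term?"""
    lowered = query.lower()
    return any(keyword in lowered for keyword in HAZARD_KEYWORDS)


def safety_check(query: str, secondary_validator: Callable[[str], bool]) -> dict:
    """Call the secondary validator only when G4 fires."""
    if not g4_flag(query):
        # The safety model is never invoked for unflagged queries.
        return {"g4_fired": False, "secondary_verdict": None}
    return {"g4_fired": True, "secondary_verdict": secondary_validator(query)}
```

In this shape the secondary validator only ever sees queries that G4 has already flagged, so the breadth of the G4 trigger determines what the second model gets a chance to catch.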

v1.0 · Priority 5

Whisper local STT for air-gapped facilities

Replace Web Speech API with locally deployed Whisper medium (769M parameters) for facilities that prohibit any device-level cloud connection. Whisper runs in a separate Docker container, receives audio via internal HTTP, and returns the transcription with no external calls. GPU-accelerated: <1s for a 5-second query.

Why now: Web Speech API routes audio to Google or Apple STT backends — this violates strict air-gapped requirements. Facilities operating under classified or defence supply chain constraints need local STT. v0.1 documents this as a known sovereignty gap.

v1.0 · Priority 6

Word and PowerPoint document ingestion

python-docx for .docx parsing with heading-based chunking. python-pptx for .pptx parsing with slide-title chunking. Many manufacturing SOPs are authored in Word; many training materials in PowerPoint. Extending ingestion beyond PDF/TXT captures the majority of the real-world corpus.

Why now: The MVP corpus is limited to PDF and TXT. A real facility's documentation estate is typically 60–70% Word documents. Without .docx ingestion, significant portions of the knowledge base are inaccessible.
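Heading-based chunking for .docx files might be sketched as follows. The grouping function works on (style name, text) pairs so it can be exercised without any third-party package; the loader uses python-docx and assumes documents are authored with the built-in "Heading 1", "Heading 2", ... styles. Both function names are hypothetical.

```python
def chunk_by_heading(paragraphs):
    """Group (style_name, text) pairs into chunks that begin at each heading."""
    chunks, current = [], None
    for style_name, text in paragraphs:
        if not text.strip():
            continue  # skip empty paragraphs used for spacing
        if style_name.startswith("Heading"):
            current = {"heading": text.strip(), "body": []}
            chunks.append(current)
        else:
            if current is None:
                # Content before the first heading goes into an untitled chunk.
                current = {"heading": None, "body": []}
                chunks.append(current)
            current["body"].append(text.strip())
    return chunks


def load_docx_paragraphs(path):
    """Read (style_name, text) pairs from a .docx file via python-docx."""
    from docx import Document  # third-party: pip install python-docx

    return [(p.style.name, p.text) for p in Document(path).paragraphs]
```

A file would be processed with chunk_by_heading(load_docx_paragraphs("sop.docx")), where the file name is a placeholder. Documents that fake headings with bold text instead of heading styles would end up as a single chunk and need a different rule.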

v1.0 · Priority 7

Text-to-speech response readback

Web Speech Synthesis API reads the response aloud after delivery. The technician holds their phone near their ear, speaks the query, and hears the procedure read step by step — fully hands-free. A single JavaScript call on top of the existing response flow.

Why now: The floor use case requires hands-free operation. Reading a response off a screen while one hand is on equipment is impractical. TTS completes the voice-first loop. It is deliberately not in v0.1 to keep the UI surface minimal and testable.

v1.0 · Priority 8

Document version management and re-indexing triggers

When a document is updated (new SOP revision, revised manual), the system detects the version change, re-indexes the affected chunks, and invalidates any cached responses referencing the old version. A document management webhook or scheduled check triggers re-ingestion automatically.

Why now: An ISO-controlled SOP that changes must be reflected immediately in the knowledge base. A system that answers from a superseded procedure is potentially more dangerous than no system at all. Version management is a production safety requirement.
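Detecting a changed document can be as simple as comparing content fingerprints, as in the sketch below. The manifest structure and function names are hypothetical, and a real deployment might instead rely on revision metadata from the document management system.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """SHA-256 of the raw file bytes."""
    return hashlib.sha256(content).hexdigest()


def plan_reindex(manifest: dict, current: dict) -> dict:
    """Compare stored fingerprints with the documents currently on disk.

    manifest: {doc_id: sha256 hex recorded at last ingestion}
    current:  {doc_id: raw bytes of the document as it exists now}
    """
    plan = {"added": [], "changed": [], "removed": []}
    for doc_id, content in current.items():
        if doc_id not in manifest:
            plan["added"].append(doc_id)
        elif manifest[doc_id] != fingerprint(content):
            plan["changed"].append(doc_id)
    plan["removed"] = [doc_id for doc_id in manifest if doc_id not in current]
    return plan
```

Chunks belonging to "changed" and "removed" documents would be deleted from the index before re-ingestion, so a superseded revision cannot be retrieved alongside its replacement.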

06 — Deferred Architectural Decisions

Deferred architectural decisions

These are architectural questions that were considered for v0.1 and deliberately excluded — not because they are unimportant, but because they do not affect the core architectural validation. They are not production gaps; they are optimisations and enhancements that sit above the minimum viable architecture.

Hybrid retrieval — dense + sparse (BM25)

Pure semantic search (dense retrieval) can miss exact keyword matches for specific part numbers, error codes, and model identifiers — exactly the kind of query a maintenance technician makes. Hybrid retrieval combining ChromaDB cosine similarity with a BM25 sparse index would improve precision on these high-specificity queries.

Deferred because: Pure semantic retrieval is sufficient to validate the pipeline architecture. Hybrid retrieval is an optimisation — valuable, but it does not change the guardrail design, the deployment model, or the sovereignty invariant. Added complexity without added architectural signal at MVP stage.

v1.0 candidate
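One common way to combine a dense ranking with a BM25 ranking is reciprocal rank fusion (RRF), sketched below. RRF is named here as an illustrative choice; the roadmap does not say which fusion method would be used. The chunk IDs are made up, and k=60 is the constant conventionally used with RRF.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of chunk IDs; a higher fused score ranks first.

    Each list contributes 1 / (k + rank) for every chunk it contains.
    Ties are broken by chunk ID so the output is deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


dense = ["c7", "c2", "c9"]   # hypothetical order from vector similarity
sparse = ["c2", "c4", "c7"]  # hypothetical order from BM25
fused = reciprocal_rank_fusion([dense, sparse])
# fused is ["c2", "c7", "c4", "c9"]: chunks found by both retrievers rise to the top
```

Because only ranks are used, the dense and sparse scores never need to be put on a common scale.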

Query rewriting — HyDE (Hypothetical Document Embeddings)

HyDE generates a hypothetical answer to the query before retrieval and uses that hypothetical answer's embedding to search the corpus — improving retrieval quality for queries phrased very differently from the document vocabulary. Relevant for voice queries where the transcription may not match the terminology in the manual.

Deferred because: Adds an extra LLM call per query (generation before retrieval), increasing latency and complexity. G1 Query Normaliser addresses the vocabulary gap for v0.1. HyDE is the next step if retrieval quality proves insufficient at production scale.

v1.0 candidate
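The HyDE flow described above reduces to three calls: generate a hypothetical answer, embed it, and search with that embedding. In the sketch below the LLM, the embedder, and the vector search are all injected as callables, and the toy stand-ins exist only to make the example runnable; none of the names come from the VaultRAG codebase.

```python
def hyde_search(query, generate, embed, search, top_k=3):
    """Search with the embedding of a hypothetical answer, not of the query."""
    prompt = f"Write a short procedure passage that answers: {query}"
    hypothetical = generate(prompt)  # the extra LLM call HyDE introduces
    return search(embed(hypothetical), top_k)


# Toy stand-ins: a bag-of-words "embedding" and an overlap-based search.
corpus = {
    "c1": "bleed pressure from the hydraulic pump",
    "c2": "replace the conveyor belt",
}


def toy_embed(text):
    return set(text.lower().split())


def toy_search(vector, top_k):
    ranked = sorted(corpus, key=lambda cid: (-len(vector & toy_embed(corpus[cid])), cid))
    return ranked[:top_k]


def toy_generate(prompt):
    # A real system would call the local LLM here.
    return "bleed pressure from the hydraulic pump slowly"


result = hyde_search("pump won't stop hissing", toy_generate, toy_embed, toy_search, top_k=1)
```

The spoken query shares almost no vocabulary with the manual, while the hypothetical answer does, which is the gap HyDE is meant to bridge.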

Re-ranking with a cross-encoder model

After top-k retrieval, a cross-encoder re-ranker scores each retrieved chunk against the specific query and re-orders the chunks by relevance. This typically improves precision at rank 1, the chunk most likely to ground the response. A lightweight cross-encoder would be the direct candidate; ColBERT, which is a late-interaction model rather than a cross-encoder, is an alternative.

Deferred because: Adds another model to the inference stack (additional RAM, additional latency). The procedural chunking strategy already improves retrieval precision by keeping complete procedures intact. Re-ranking is an optimisation for larger corpora where multiple procedures may appear relevant.

v2+ candidate

Multi-modal ingestion — diagrams, schematics, images

Manufacturing manuals contain critical information in diagrams, wiring schematics, and exploded-view drawings. A purely text-based RAG system cannot index this content. A multi-modal pipeline using a vision model (LLaVA, Phi-3 Vision) to generate textual descriptions of figures before indexing would capture this signal.

Deferred because: Requires a vision model alongside the text LLM — significant RAM increase. Figure description quality with local models is variable. The architecture is designed to accommodate this: the ingestion pipeline is modular, and a vision pre-processing step can be inserted before ChromaDB indexing without changing the retrieval or generation layers.

v2+ candidate

Giskard automated vulnerability scanning

Giskard is a testing framework for LLM applications — it automatically generates adversarial test cases (prompt injections, hallucination probes, bias tests) and evaluates the pipeline against them. Valuable for production QA but not a runtime guardrail.

Deferred because: Giskard belongs in the CI/CD pipeline, not in the inference path. It is a pre-deployment testing tool. It will be added to the GitHub Actions workflow before v1.0 release.

v1.0 CI/CD
