This page documents what v0.1 delivers, the known production gaps in the current build, the v1.0 path that addresses those gaps, and the architectural decisions deliberately deferred from MVP scope. The distinction between a production gap and a deliberate deferral is architecturally significant, so the two are documented here as separate categories.
Each version solves a specific problem and sets up the next. v0.1 proves the core loop works. v1.0 makes it production-ready. The future vision extends the platform to adjacent use cases without re-solving trust or safety.
v0.1 (Portfolio demo · Built): Voice query → 5-layer guardrail pipeline → cited procedure response. On-prem deployment model. Single document corpus. Single concurrent user. Deployed on HuggingFace Spaces for portfolio demonstration purposes; the production deployment model is documented in page-03.
v1.0 (Designed · Not built): Department-level document namespacing. Role-based query permissions. Immutable audit log of every query and response. LlamaGuard integration on the safety path. Whisper local for fully air-gapped voice. Word and PowerPoint ingestion. Production hardware with GPU acceleration.
Future vision (Vision · Architecture sketched): VaultRAG as a platform abstraction layer: multiple secure assistants (maintenance, quality, safety, engineering) sharing a common on-prem trust infrastructure without re-solving the sovereignty and guardrail problems for each new use case. Proactive alerts when documents are updated. Multi-site federation.
The scoping principle for v0.1 was: every component must either make the core loop function or make a guardrail fire. Anything that did neither was deferred. This table is the complete inventory.
| Feature | v0.1 Status | Decision rationale |
|---|---|---|
| Voice input via Web Speech API | In v0.1 | Core user interface requirement. A technician on the floor cannot type reliably. Without voice, the system fails its primary user scenario. |
| Text fallback input | In v0.1 | Voice recognition fails in extreme noise environments (above 100 dB). Keyboard fallback ensures the system never leaves a user without recourse. |
| PDF and TXT document ingestion | In v0.1 | The two dominant formats for manufacturing documentation. Everything else is roadmap. |
| Procedural section chunking | In v0.1 | Safety-critical requirement. Token-window chunking splits procedures and produces incomplete — potentially dangerous — responses. See ADR-008. |
| 5-layer guardrail pipeline (G1–G5) | In v0.1 | Safety requirement in a manufacturing context. All five layers implemented as custom prompt-based logic. Not optional for the core architecture validation. |
| ChromaDB vector store | In v0.1 | Core retrieval infrastructure. The RAG pipeline cannot function without a vector store. |
| Llama 3.2 3B via Ollama | In v0.1 | Local LLM is the architectural invariant. Ollama is the simplest local serving solution. 3B chosen for demo environment RAM constraints. See ADR-001. |
| Mobile-responsive chat UI | In v0.1 | The primary user is on a phone on the factory floor. A desktop-first UI fails the primary persona entirely. |
| Single document corpus | In v0.1 | Multi-corpus management adds indexing complexity, namespace isolation, and corpus selection logic that are v1.0 concerns. MVP validates the single-corpus pipeline first. |
| User authentication / login | Deferred v1.0 | Authentication is an operational requirement but not a pipeline validation requirement. It adds session management, token handling, and role logic that are not needed to validate the core RAG loop. Documented as a production gap below. |
| Department-level document namespacing | Deferred v1.0 | Role-based corpus access requires user identity, which requires authentication. The dependency chain means this follows auth, not precedes it. |
| Immutable audit log | Deferred v1.0 | Essential for ISO 9001 compliance and internal governance. Not required to validate that the pipeline retrieves and cites correctly. Documented as a production gap below. |
| LlamaGuard integration | Deferred v1.0 | RAM-infeasible alongside Llama 3.2 3B on free demo tier. Custom G4 Safety Flag provides the MVP safety layer. LlamaGuard activates on G4 trigger in v1.0 — conditional, not always-on. See ADR-003. |
| Word / PowerPoint ingestion | Deferred v1.0 | Many SOPs live in .docx format. python-docx integration is straightforward but adds a dependency and format-specific chunking logic. Deferred to keep MVP ingestion pipeline clean. |
| Whisper local voice (air-gapped) | Deferred v1.0 | Web Speech API uses device STT (Google/Apple backend). For fully air-gapped facilities, local Whisper is required. Added RAM overhead makes it a production concern, not MVP. Documented as a production gap below. See ADR-005. |
| Text-to-speech response readback | Deferred v1.0 | Logical extension of the voice-first design. Web Speech Synthesis API makes this a one-line addition. Deferred to keep MVP UI surface minimal. |
| Query history and session memory | Deferred v1.0 | Persistent conversation history requires session storage and raises data retention questions. Each v0.1 query is stateless by design — simpler, safer, easier to audit. |
| Multi-site federation | v2+ vision | Multiple facility instances sharing a common document standard but isolated corpora. Requires a federation layer and inter-site governance model. Platform-level concern. |
| Proactive document update alerts | v2+ vision | Notify technicians when a procedure they previously queried has been updated. Requires query history (v1.0), document versioning, and a notification pipeline. |
| Custom disclaimer injection per department | v2+ vision | Prepend department-specific legal or compliance text to responses. Requires department namespacing and user identity — both v1.0 foundations. |
These are missing requirements in the current build — not deliberate architectural deferrals. They represent real limitations that would need to be addressed before v0.1 could be operated in a production manufacturing environment. Each has a documented v1.0 mitigation.
v0.1 has no login, session management, or identity layer. Any device connected to the plant WiFi can query the system and retrieve procedural documentation, including safety-critical LOTO procedures. There is no mechanism to restrict access by role, department, or individual user.
v0.1 has no record of what was queried, what was retrieved, what guardrails fired, or what response was delivered. In a regulated manufacturing environment operating under ISO 9001, the absence of an audit trail means the system cannot be included in a management review and AI-assisted decisions cannot be traced.
v0.1 indexes all documents into a single ChromaDB collection. Any authenticated user (once auth is added) would have access to all indexed documents regardless of department, classification, or need-to-know. A maintenance technician would be able to query engineering drawings or quality management records they are not authorised to access.
The Web Speech API uses the device's native speech recognition engine, which in practice routes audio to Google or Apple backends for transcription. In facilities operating under strict air-gap requirements — ITAR-controlled environments, defence supply chain, certain ISO 27001 scopes — this is a data sovereignty violation even if the inference layer is on-prem.
These are not features for their own sake. Each v1.0 addition resolves a specific gap between the portfolio prototype and a system a manufacturer would trust in production. The ordering is intentional — auth enables namespacing; namespacing enables audit; audit enables compliance.
JWT-based authentication. Each user belongs to one or more departments. Queries are scoped to the user's permitted corpus namespaces. A maintenance technician cannot query the Legal department's documents. A user who belongs to two departments, such as an engineering manager, can query both corpora.
ChromaDB collections mapped to organisational namespaces. Maintenance docs, quality SOPs, engineering drawings, and safety procedures indexed in isolated collections. Query routing respects the authenticated user's permitted namespaces.
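The namespace scoping described above can be sketched as a small routing function. This is an illustrative sketch only: the `User` type, the `NAMESPACE_COLLECTIONS` mapping, and the collection names are hypothetical, and the department set is assumed to come from an already verified JWT claim.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical mapping from department namespace to ChromaDB collection name.
NAMESPACE_COLLECTIONS = {
    "maintenance": "docs_maintenance",
    "quality": "docs_quality",
    "engineering": "docs_engineering",
    "safety": "docs_safety",
}


@dataclass(frozen=True)
class User:
    user_id: str
    departments: frozenset  # assumed to be read from a verified JWT claim


def permitted_collections(user: User, requested: set | None = None) -> list:
    """Return the collection names this user may query, sorted for determinism."""
    allowed = set(user.departments) & set(NAMESPACE_COLLECTIONS)
    if requested is not None:
        # A request can narrow the scope but never widen it.
        allowed &= set(requested)
    return sorted(NAMESPACE_COLLECTIONS[ns] for ns in allowed)


technician = User("tech-01", frozenset({"maintenance"}))
manager = User("mgr-01", frozenset({"maintenance", "engineering"}))
print(permitted_collections(technician))
print(permitted_collections(manager))
```

The retrieval layer would then query only the returned collections, so a namespace the user does not hold is never searched rather than being filtered out after retrieval.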
Append-only log with timestamp, user ID, normalised query, retrieved chunk references, guardrail outcomes, and response hash. Written before the response is delivered. Cannot be modified or deleted. Reviewed in ISO 9001 management reviews.
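A minimal in-memory sketch of such a log follows. The fields mirror the list above. The hash chain linking each entry to the previous one is an added assumption, not something the design specifies: it makes tampering detectable but does not by itself prevent modification, which in production would also need write-once storage. The class and field names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64


def _digest(record: dict) -> str:
    """Stable SHA-256 over a JSON-serialisable record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class AuditLog:
    """Append-only log; each entry stores the hash of the entry before it."""

    def __init__(self):
        self._entries = []

    def append(self, user_id, query, chunk_refs, guardrails, response):
        prev_hash = self._entries[-1]["entry_hash"] if self._entries else GENESIS_HASH
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user_id": user_id,
            "query": " ".join(query.lower().split()),  # simple normalisation
            "chunk_refs": list(chunk_refs),
            "guardrails": dict(guardrails),
            "response_hash": hashlib.sha256(response.encode("utf-8")).hexdigest(),
            "prev_hash": prev_hash,
        }
        record["entry_hash"] = _digest(record)  # computed before the key is added
        self._entries.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain; False if any entry was altered or reordered."""
        prev = GENESIS_HASH
        for entry in self._entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if entry["prev_hash"] != prev or _digest(body) != entry["entry_hash"]:
                return False
            prev = entry["entry_hash"]
        return True
```

To honour the "written before the response is delivered" rule, the caller would invoke `append` and only return the response to the user once it succeeds.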
LlamaGuard 2 activates only when the custom G4 Safety Flag fires — adding a safety model validation layer for the highest-stakes queries without the always-on RAM overhead. G4 acts as a triage gate; LlamaGuard acts as a secondary validator on G4-triggered queries only.
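The triage-gate arrangement can be illustrated with a short sketch in which both checks are passed in as plain callables. The keyword-based G4 flag and the stub validator below are hypothetical stand-ins: the real G4 is prompt-based logic, and the real validator would call a Llama Guard model.

```python
from typing import Callable, Optional


def safety_path(
    query: str,
    g4_flag: Callable[[str], bool],
    secondary_validator: Callable[[str], bool],
) -> dict:
    """Run the secondary validator only when the G4 triage gate fires."""
    if not g4_flag(query):
        # Common case: the expensive model is never invoked.
        return {"g4_fired": False, "validator_ran": False, "validator_passed": None}
    passed: Optional[bool] = bool(secondary_validator(query))
    return {"g4_fired": True, "validator_ran": True, "validator_passed": passed}


# Hypothetical stand-ins, for illustration only.
SAFETY_TERMS = ("lockout", "loto", "high voltage", "confined space")


def keyword_g4_flag(query: str) -> bool:
    lowered = query.lower()
    return any(term in lowered for term in SAFETY_TERMS)


validator_calls = []


def stub_validator(query: str) -> bool:
    validator_calls.append(query)  # records how often the heavy path runs
    return True


routine = safety_path("What is the torque spec for the M8 bolt?", keyword_g4_flag, stub_validator)
critical = safety_path("Show the LOTO steps for press 4", keyword_g4_flag, stub_validator)
```

Because the validator is injected, it can be loaded lazily on the first G4 trigger, which is what keeps the always-on memory footprint down.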
Replace Web Speech API with locally deployed Whisper medium (769M parameters) for facilities that prohibit any device-level cloud connection. Whisper runs in a separate Docker container, receives audio via internal HTTP, and returns the transcription with no external calls. With GPU acceleration, the target is under 1 s for a 5-second query.
python-docx for .docx parsing with heading-based chunking. python-pptx for .pptx parsing with slide-title chunking. Many manufacturing SOPs are authored in Word; many training materials in PowerPoint. Extending ingestion beyond PDF/TXT captures the majority of the real-world corpus.
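Heading-based chunking can be sketched independently of the parser. The function below takes `(style_name, text)` pairs. With python-docx these could plausibly be built as `[(p.style.name, p.text) for p in Document(path).paragraphs]`, assuming the document uses the built-in "Heading 1", "Heading 2" style names (custom or localised templates may not). The function name and chunk layout are hypothetical.

```python
def chunk_by_heading(paragraphs):
    """Group (style_name, text) pairs into chunks that each start at a heading."""
    chunks = []
    current = None
    for style_name, text in paragraphs:
        text = text.strip()
        if not text:
            continue  # skip empty paragraphs
        if style_name.startswith("Heading"):
            # A heading always opens a new chunk, so a procedure is never split.
            current = {"heading": text, "body": []}
            chunks.append(current)
        else:
            if current is None:
                # Text before the first heading goes into an untitled chunk.
                current = {"heading": "", "body": []}
                chunks.append(current)
            current["body"].append(text)
    return chunks


sample = [
    ("Normal", "Document control: rev C"),
    ("Heading 1", "Lockout procedure"),
    ("List Number", "Notify affected personnel."),
    ("List Number", "Isolate the energy source."),
    ("Heading 1", "Restart procedure"),
    ("Normal", "Remove locks in reverse order."),
]
chunks = chunk_by_heading(sample)
```

The same grouping would apply to PowerPoint if each slide title is treated as the heading and the remaining text frames as the body.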
Web Speech Synthesis API reads the response aloud after delivery. The technician holds their phone near their ear, speaks the query, and hears the procedure read step by step — fully hands-free. A single JavaScript call on top of the existing response flow.
When a document is updated (new SOP revision, revised manual), the system detects the version change, re-indexes the affected chunks, and invalidates any cached responses referencing the old version. A document management webhook or scheduled check triggers re-ingestion automatically.
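One simple way a scheduled check could detect version changes is by comparing content hashes against those stored at index time. This is a sketch under that assumption. The design could equally rely on revision metadata from a document management system, and the function names here are hypothetical.

```python
import hashlib


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def detect_changes(indexed: dict, current: dict) -> dict:
    """Compare stored hashes (doc_id -> hash) with current bytes (doc_id -> bytes)."""
    added = sorted(doc for doc in current if doc not in indexed)
    removed = sorted(doc for doc in indexed if doc not in current)
    changed = sorted(
        doc
        for doc, data in current.items()
        if doc in indexed and indexed[doc] != content_hash(data)
    )
    return {"added": added, "changed": changed, "removed": removed}


indexed_hashes = {
    "sop-12.pdf": content_hash(b"rev A"),
    "sop-40.pdf": content_hash(b"rev A"),
    "manual-7.txt": content_hash(b"rev B"),
}
on_disk = {
    "sop-12.pdf": b"rev B",  # revised
    "sop-40.pdf": b"rev A",  # unchanged
    "sop-55.pdf": b"rev A",  # new
}
report = detect_changes(indexed_hashes, on_disk)
```

Documents listed under `changed` and `removed` are the ones whose chunks would be re-indexed or dropped, and whose cached responses would be invalidated.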
The v0.1 architecture is designed so that every v1.0 addition slots into the existing structure without requiring a rebuild. The core pipeline — voice → guardrails → retrieval → generation → citation — does not change. Each v1.0 component wraps or extends an existing layer.
These are architectural questions that were considered for v0.1 and deliberately excluded — not because they are unimportant, but because they do not affect the core architectural validation. They are not production gaps; they are optimisations and enhancements that sit above the minimum viable architecture.
Pure semantic search (dense retrieval) can miss exact keyword matches for specific part numbers, error codes, and model identifiers — exactly the kind of query a maintenance technician makes. Hybrid retrieval combining ChromaDB cosine similarity with a BM25 sparse index would improve precision on these high-specificity queries.
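How the dense and sparse results would be combined is not specified above. One common option is reciprocal rank fusion (RRF), sketched below as an assumption rather than the chosen design. It merges ranked lists of chunk IDs using only their positions, so cosine and BM25 scores never need to be put on a common scale. The chunk IDs are made up for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs; a higher fused score ranks earlier."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["c1", "c2", "c3"]   # e.g. from a vector similarity query
sparse_hits = ["c3", "c1", "c4"]  # e.g. from a BM25 keyword index
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

A chunk that appears in both lists, such as one matching an exact error code and also semantically close to the query, accumulates score from each list and rises above chunks found by only one retriever.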
HyDE generates a hypothetical answer to the query before retrieval and uses that hypothetical answer's embedding to search the corpus — improving retrieval quality for queries phrased very differently from the document vocabulary. Relevant for voice queries where the transcription may not match the terminology in the manual.
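The HyDE flow reduces to three steps, shown here with the LLM, embedding model, and vector search passed in as callables. Everything below the `hyde_retrieve` function is a toy stand-in invented for illustration: a fixed "generated" passage, a bag-of-words "embedding", and overlap-based search over a two-chunk corpus.

```python
def hyde_retrieve(query, generate, embed, search, top_k=3):
    """Retrieve using the embedding of a hypothetical answer, not of the query."""
    hypothetical = generate(f"Write a short passage that answers: {query}")
    return search(embed(hypothetical), top_k)


# Toy stand-ins (hypothetical), so the flow can be exercised end to end.
CORPUS = {
    "sop-12#3": "isolate hydraulic pressure before replacing the pump seal",
    "sop-40#1": "calibrate the torque wrench every six months",
}


def toy_generate(prompt):
    return "To replace the pump seal, first isolate hydraulic pressure."


def toy_embed(text):
    return frozenset(word.strip(".,").lower() for word in text.split())


def toy_search(vector, top_k):
    ranked = sorted(CORPUS, key=lambda cid: -len(vector & toy_embed(CORPUS[cid])))
    return ranked[:top_k]


hits = hyde_retrieve(
    "how do I swap the leaky thing on the pump",
    toy_generate, toy_embed, toy_search, top_k=1,
)
```

The query itself shares almost no vocabulary with the manual, but the hypothetical answer does, which is the effect HyDE relies on. The cost is one extra generation call per query.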
After top-k retrieval, a cross-encoder re-ranker evaluates each retrieved chunk in the context of the specific query and re-orders them by relevance — typically improving precision at rank 1, which is the chunk most likely to ground the response. ColBERT or a lightweight cross-encoder would be candidates.
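The re-ranking step itself is a sort over query-chunk scores. In the sketch below the scorer is injected. The token-overlap scorer is a hypothetical placeholder for a real cross-encoder, which would score each (query, chunk) pair jointly with a model.

```python
def rerank(query, chunks, score, top_n=None):
    """Re-order retrieved chunks by a query-aware relevance score, best first."""
    ranked = sorted(chunks, key=lambda chunk: score(query, chunk), reverse=True)
    return ranked if top_n is None else ranked[:top_n]


def overlap_score(query, chunk):
    """Placeholder scorer: number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))


retrieved = [
    "general conveyor maintenance schedule",
    "error code e42 indicates conveyor belt misalignment",
    "lubrication chart",
]
reranked = rerank("error code e42 on conveyor", retrieved, overlap_score)
```

Because the scorer sees the query and the chunk together, it is slower than vector search and is typically applied only to the top-k candidates, as the paragraph above describes.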
Manufacturing manuals contain critical information in diagrams, wiring schematics, and exploded-view drawings. A purely text-based RAG system cannot index this content. A multi-modal pipeline using a vision model (LLaVA, Phi-3 Vision) to generate textual descriptions of figures before indexing would capture this signal.
Giskard is a testing framework for LLM applications — it automatically generates adversarial test cases (prompt injections, hallucination probes, bias tests) and evaluates the pipeline against them. Valuable for production QA but not a runtime guardrail.