Architecture Portfolio · 2026 · Manufacturing · Enterprise AI
A design study in voice-first document retrieval for factory floor use. The architecture is built around three constraints that exclude cloud RAG services: data sovereignty requirements, air-gapped network conditions, and zero ongoing API cost. All inference runs on-prem. This portfolio documents the design decisions, trade-offs, and known limitations.
The manufacturing knowledge retrieval problem is not new. What changed between 2022 and 2024 is the availability of open-source components capable of running the full RAG pipeline locally — without cloud dependencies, without API costs, and without sending proprietary documents outside the facility network.
Llama 3.2 3B runs on commodity hardware with instruction-following quality sufficient for structured procedure retrieval. Paired with local retrieval, a model that can answer from a 400-page equipment manual and return the relevant procedure now runs entirely on a facility server, with no API calls and no internet dependency at inference time.
Llama 3.2 · Meta · 2024
The Web Speech API is available in the major mobile browsers, although speech recognition support still varies by browser. Where it is supported, a technician can speak a query one-handed without installing anything. In the production deployment this is replaced by local Whisper STT to eliminate the external STT backend dependency; that trade-off is documented in ADR-005.
Web Speech API · Baseline 2023
ChromaDB and similar embedded stores reduced the retrieval layer to a single dependency with persistent local storage. Semantic search over a facility's document corpus now requires no external service, no managed database, and no ongoing cost, which makes it deployable inside an air-gapped network.
ChromaDB · Apache 2.0 · 2024
Manufacturing firms in aerospace, automotive, and defence supply chains increasingly prohibit sending proprietary process data to third-party APIs. This contractual constraint (not a preference) architecturally excludes cloud-hosted RAG services and creates a genuine requirement for the on-prem pattern this project addresses.
ISO 27001 · ITAR · NDA clauses
The design scenario addresses two related problems: a business cost measured in downtime and rework, and an operational constraint measured in the time it takes a technician to locate and apply the correct procedure under fault conditions. Both are documented in detail on page 02.
A mid-size manufacturer typically maintains 12,000–40,000 pages of technical documentation — equipment manuals, ISO-controlled SOPs, non-conformance reports, maintenance logs. This documentation exists as PDFs on network drives and printed binders. It is not queryable. When a machine faults, the retrieval process is manual, sequential, and slow.
Every minute of that search is unplanned downtime. Every wrong procedure is extended downtime plus a potential non-conformance record. The cost is documented; the retrieval mechanism has not changed.
The Haas CNC has thrown an E-04 fault the floor hasn't seen before. The line is stopped. The manual is a 380-page PDF on a laptop forty metres away. The technician asks a colleague who thinks he remembers the procedure. They apply it. The fault clears and returns two hours later — now a recurring incident in the NCR log.
The knowledge existed in the documentation. The failure was retrieval: noise, distance, time pressure, and no device suitable for navigating a 380-page PDF one-handed under a running machine.
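The retrieval gap in the scenario above can be illustrated with a minimal sketch. It assumes the documentation has already been split into page-level chunks. It uses plain bag-of-words cosine similarity as a stand-in for the embedding model and vector store; it is not the project's actual retrieval code. The chunk texts, file names, page numbers, and the `retrieve` helper are all hypothetical placeholders.

```python
import math
import re
from collections import Counter

# Hypothetical, made-up chunks standing in for pages of facility documents.
CHUNKS = [
    {"source": "manual.pdf", "page": 212,
     "text": "Fault E-04: placeholder recovery procedure text for this fault code."},
    {"source": "manual.pdf", "page": 17,
     "text": "Placeholder text about routine coolant level checks."},
    {"source": "sop-031.pdf", "page": 3,
     "text": "Placeholder text about lockout and tagout before maintenance."},
]


def tokenize(text: str) -> list[str]:
    """Lowercase and split into tokens, keeping codes such as 'e-04' intact."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[dict], top_k: int = 2) -> list[dict]:
    """Return up to top_k chunks ranked by similarity, dropping zero-score chunks."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c["text"]))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:top_k] if score > 0]


results = retrieve("how do I clear fault E-04", CHUNKS)
for hit in results:
    print(f'{hit["source"]} p.{hit["page"]}: {hit["text"]}')
```

A real deployment would swap the term-count vectors for embeddings and the list scan for a persistent local store, but the shape of the step is the same: a spoken query goes in, and a ranked, source-attributed passage comes out.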
The architecture separates concerns across three layers. In the target production deployment, all three layers run inside the facility network: no data crosses the boundary at any point. The portfolio demo runs on HuggingFace Spaces with browser-native STT — those are documented exceptions that apply to the prototype only. The diagram below reflects the production architecture.
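One way to make the "no data crosses the boundary" rule checkable is a start-up guard over the configured service endpoints. The sketch below is an assumption, not part of the documented design. The component names (`stt`, `retrieval`, `llm`), addresses, and ports are hypothetical and do not claim to match the three layers. The check treats only literal private or loopback IP addresses as inside the facility network.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical endpoint maps; names, addresses, and ports are placeholders.
PRODUCTION = {
    "stt": "http://10.0.5.20:8001",
    "retrieval": "http://127.0.0.1:8002",
    "llm": "http://10.0.5.21:8003",
}

# A configuration with one external endpoint, loosely analogous to the
# prototype's reliance on a browser-native STT backend.
MIXED = {
    "stt": "https://stt.example.com",
    "retrieval": "http://127.0.0.1:8002",
    "llm": "http://127.0.0.1:8003",
}


def is_inside_boundary(url: str) -> bool:
    """True only if the URL's host is a literal private or loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames would need a facility DNS allowlist; this sketch
        # conservatively treats them as outside the boundary.
        return False
    return ip.is_private or ip.is_loopback


def violations(endpoints: dict[str, str]) -> list[str]:
    """Names of components whose endpoint falls outside the boundary."""
    return [name for name, url in endpoints.items() if not is_inside_boundary(url)]


print(violations(PRODUCTION))
print(violations(MIXED))
```

Failing fast on a non-empty violation list at start-up would turn the boundary rule from a convention into an enforced invariant. The prototype exceptions could then be expressed as an explicit, reviewed allowlist.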