This page presents two cost models: an estimate of the business cost attributable to the knowledge retrieval problem, and a breakdown of the solution's running cost. All figures are derived from published vendor pricing or cited research benchmarks applied to the FlexForm design scenario. Every assumption is stated. Every source is linked.
All calculations use the FlexForm anchor scenario from Page 03, and the figures are illustrative of a mid-size Tier-2 automotive supplier.
The three cost components below model the financial exposure associated with the knowledge retrieval gap. Figures are estimated from published research. The 23% human error attribution and £125K/hr downtime rate are sourced from Siemens, ABB, and Plutomen research cited below.
Published research attributes approximately 23% of unplanned manufacturing downtime to human error, including wrong or misremembered procedures. At 800 hours of unplanned downtime per year and a mid-size plant rate of £125,000/hr, the 23% attribution yields an annual exposure of £23.0M in this scenario (800 × £125,000 × 0.23). The source data does not break this fraction down by failure type, so the figure is an upper bound on the portion that better procedure retrieval could address.
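
The downtime exposure described above is a single multiplication, and a minimal Python sketch makes the inputs explicit. The three constants come from the paragraph; the function name is illustrative only.

```python
# Inputs stated for the FlexForm design scenario.
UNPLANNED_DOWNTIME_HOURS_PER_YEAR = 800
DOWNTIME_COST_GBP_PER_HOUR = 125_000
HUMAN_ERROR_SHARE = 0.23  # share of unplanned downtime attributed to human error


def downtime_exposure(hours: float, rate_gbp_per_hour: float, share: float) -> float:
    """Annual cost of the downtime hours attributed to human error."""
    return hours * rate_gbp_per_hour * share


exposure = downtime_exposure(
    UNPLANNED_DOWNTIME_HOURS_PER_YEAR,
    DOWNTIME_COST_GBP_PER_HOUR,
    HUMAN_ERROR_SHARE,
)
print(f"Modelled downtime exposure: £{exposure:,.0f} per year")
```

Because the result scales linearly with each input, a facility with its own downtime hours or hourly rate can substitute them directly.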
McKinsey research indicates employees spend an average of 1.8 hours per day searching for information. For a manufacturing floor context, a conservative 0.5 hrs/day is applied here. The rate used (£18/hr) reflects a floor-worker estimate; the maintenance section applies a separate IT-administrator rate of £35/hr. These are distinct assumptions and should not be conflated.
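
The search-time component can be sketched the same way. The 0.5 hrs/day and £18/hr values come from the paragraph above; the headcount and working-day values in the example call are hypothetical placeholders, since this page does not state them.

```python
SEARCH_HOURS_PER_DAY = 0.5    # stated: conservative floor-context figure
FLOOR_RATE_GBP_PER_HOUR = 18  # stated: floor-worker estimate


def annual_search_cost(workers: int, working_days: int,
                       hours_per_day: float = SEARCH_HOURS_PER_DAY,
                       rate_gbp_per_hour: float = FLOOR_RATE_GBP_PER_HOUR) -> float:
    """Annual cost of time spent searching for information."""
    return workers * working_days * hours_per_day * rate_gbp_per_hour


# Cost per worker per day follows directly from the stated inputs.
per_worker_per_day = SEARCH_HOURS_PER_DAY * FLOOR_RATE_GBP_PER_HOUR

# Hypothetical headcount and working days, for illustration only.
example = annual_search_cost(workers=100, working_days=250)
print(f"£{per_worker_per_day:.2f} per worker per day; "
      f"£{example:,.0f} per year for 100 workers over 250 days")
```

Keeping the £18/hr floor rate as a default here, separate from the £35/hr maintenance rate, avoids mixing the two assumptions.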
The FlexForm scenario includes 14 NCRs over 12 months, with 9 attributed to incorrect or incomplete procedure application in this model. This attribution (64%) is a design assumption, not a validated finding. NCR cost estimates use a composite of published automotive Tier-2 benchmarks. The range is wide; the figure used (£12,000 average) is mid-range.
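
Since the 9-of-14 attribution is a design assumption, it can help to see how the NCR exposure moves when that count changes. The sketch below uses the stated NCR counts and the £12,000 average; the alternative counts in the loop are hypothetical.

```python
TOTAL_NCRS = 14            # stated: NCRs over 12 months
PROCEDURE_NCRS = 9         # stated: attributed to procedure application (assumption)
AVG_NCR_COST_GBP = 12_000  # stated: mid-range average


def ncr_exposure(attributed: int, avg_cost_gbp: float = AVG_NCR_COST_GBP) -> float:
    """Annual cost of NCRs attributed to incorrect or incomplete procedures."""
    return attributed * avg_cost_gbp


attribution_share = PROCEDURE_NCRS / TOTAL_NCRS
base_case = ncr_exposure(PROCEDURE_NCRS)
print(f"Attribution: {attribution_share:.0%}, exposure: £{base_case:,.0f}")

# Hypothetical alternative attributions, to show sensitivity to the assumption.
for attributed in (5, 9, 12):
    print(f"{attributed}/{TOTAL_NCRS} NCRs -> £{ncr_exposure(attributed):,.0f}")
```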
The three cost components above represent the total modelled financial exposure from the knowledge retrieval gap in this scenario. These figures are additive under the stated assumptions. In practice, the fraction of downtime attributable to procedure retrieval specifically (as distinct from broader human error) is not disaggregated in the source literature and would require facility-level measurement to isolate.
Two line items carry a non-zero cost: server hardware amortisation and IT maintenance time. The other four of the six components incur zero ongoing cost because they use open-source software running locally with no managed cloud services. The analysis below itemises each component and its cost basis.
Ollama serves Llama 3.2 3B entirely locally. Once the model is downloaded (one-time, ~2.0GB), every inference runs at zero marginal API cost. There is no per-token fee, no API key, no rate limit, and no internet requirement. The marginal cost per query is electricity only.
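
To illustrate what "no API key, no per-token fee" looks like in practice, here is a minimal sketch of a local inference call using only the Python standard library. It assumes Ollama is listening on its default local port and that the model was pulled under the tag `llama3.2:3b`; both are assumptions about the deployment, and the VaultRAG code itself may call Ollama differently (for example through LlamaIndex).

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed: Ollama default endpoint
MODEL = "llama3.2:3b"                                # assumed model tag


def build_request(prompt: str, model: str = MODEL) -> urllib.request.Request:
    """Build a non-streaming generate request; note there is no auth header."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example (requires a running Ollama server):
# print(generate("Summarise the lockout procedure for press line 2."))
```

The request never leaves `localhost`, which is why the only marginal cost is the electricity used by the server.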
ChromaDB runs embedded in the Python process and persists the vector index to local disk. There is no managed service, no cloud egress, and no subscription fee. A 28,000-page document corpus produces approximately 140,000–280,000 chunks at procedural chunking density, or roughly 5 to 10 chunks per page. ChromaDB has been adequate at prototype scale; production performance at this vector count has not been load-tested in this implementation.
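
The chunk range above implies about 5 to 10 chunks per page, and a rough sizing sketch shows what that means for raw vector storage. The 768-dimension figure is an assumption about the nomic-embed-text model, and the estimate covers float32 vectors only; index structures, metadata, and stored chunk text would add to it.

```python
PAGES = 28_000
CHUNKS_PER_PAGE_LOW = 5    # implied by 140,000 chunks / 28,000 pages
CHUNKS_PER_PAGE_HIGH = 10  # implied by 280,000 chunks / 28,000 pages
EMBED_DIM = 768            # assumption: nomic-embed-text output dimension
BYTES_PER_FLOAT32 = 4


def chunk_range(pages: int, low: int, high: int) -> tuple[int, int]:
    """Lower and upper chunk counts for a corpus of the given size."""
    return pages * low, pages * high


def raw_vector_bytes(chunks: int, dim: int = EMBED_DIM) -> int:
    """Bytes needed for the embedding vectors alone, stored as float32."""
    return chunks * dim * BYTES_PER_FLOAT32


low, high = chunk_range(PAGES, CHUNKS_PER_PAGE_LOW, CHUNKS_PER_PAGE_HIGH)
for chunks in (low, high):
    gb = raw_vector_bytes(chunks) / 1e9
    print(f"{chunks:,} chunks -> about {gb:.2f} GB of raw vectors")
```

Under these assumptions the raw vectors stay below 1 GB, so disk capacity is unlikely to be the constraint; query latency at this count is the part that still needs load testing.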
LlamaIndex provides the ingestion pipeline, query engine, and retrieval orchestration. The open-source library (pip install llama-index) is free. LlamaCloud managed services exist but are not used in this implementation. All orchestration runs locally in the same Python process as the FastAPI server.
FastAPI is open source (MIT). The frontend is a single HTML file served by FastAPI's static file handler. In production, this runs on the on-prem server with zero external hosting cost. The demo environment uses GitHub Pages (free tier) and HuggingFace Spaces (free tier) for accessibility during portfolio review.
Production deployment requires a facility server with 32GB RAM and a consumer GPU capable of running a 3B parameter quantised model (RTX 3080 class or equivalent). Many manufacturing facilities already have servers of this specification for MES, SCADA, or CAD workloads. If VaultRAG is co-hosted on existing infrastructure, the incremental hardware cost is zero. The figure below assumes a dedicated server is provisioned.
The primary ongoing operational cost is re-indexing when documents change: new SOP versions, updated equipment manuals, revised NCR procedures. This is a script execution task, not a development task. An internal IT administrator runs the ingestion pipeline when documents are updated. The £35/hr rate applied to that time is a general knowledge-worker benchmark used as a proxy for an IT administrator, and may not reflect shop-floor IT support rates at a specific facility.
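
One way to keep re-indexing a short script task is to re-ingest only the files whose content has changed. The sketch below is a hypothetical helper, not part of the VaultRAG codebase: it keeps a JSON manifest of SHA-256 digests and returns the files that are new or modified since the last run. The function names and manifest layout are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used to detect changed documents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_documents(doc_dir: Path, manifest_path: Path) -> list[Path]:
    """Return new or modified files under doc_dir, then update the manifest.

    Keep manifest_path outside doc_dir so the manifest is not hashed itself.
    """
    previous = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    current = {
        str(p.relative_to(doc_dir)): file_digest(p)
        for p in sorted(doc_dir.rglob("*"))
        if p.is_file()
    }
    changed = [doc_dir / name for name, digest in current.items()
               if previous.get(name) != digest]
    manifest_path.write_text(json.dumps(current, indent=2))
    return changed
```

The returned list would then be passed to the ingestion pipeline. This sketch does not handle deleted documents; removing their chunks from the index would need a separate step.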
The table below places both cost models side by side under a common set of stated assumptions. The breakeven and ratio figures that follow are derived from these inputs and are illustrative of the design scenario — not a projection or a deployment outcome.
Breakeven threshold: a 0.006% reduction in procedure-error downtime recovers the full annual solution cost. A 10% reduction returns approximately £2.37M against £1,440 annual cost — a 1,647× ratio. This model uses illustrative benchmarks applied to the FlexForm design scenario and is not a projection. No deployment data exists for this system, and the 10% improvement assumption is unsourced.
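
The breakeven and ratio figures can be approximately reproduced from the two numbers quoted above. This sketch back-derives the total exposure from the rounded £2.37M figure, so small differences from the quoted 1,647× are expected; the variable names are illustrative.

```python
ANNUAL_SOLUTION_COST_GBP = 1_440
RETURN_AT_ASSUMED_REDUCTION_GBP = 2_370_000  # the rounded "£2.37M" figure
ASSUMED_REDUCTION = 0.10                     # unsourced 10% improvement assumption

# Total exposure implied by the quoted return at a 10% reduction.
implied_exposure = RETURN_AT_ASSUMED_REDUCTION_GBP / ASSUMED_REDUCTION

# Reduction (in percent) at which savings equal the annual solution cost.
breakeven_pct = ANNUAL_SOLUTION_COST_GBP / implied_exposure * 100

# Return per pound of annual solution cost at the assumed reduction.
ratio = RETURN_AT_ASSUMED_REDUCTION_GBP / ANNUAL_SOLUTION_COST_GBP

print(f"Implied exposure: £{implied_exposure:,.0f}")
print(f"Breakeven reduction: {breakeven_pct:.3f}%")
print(f"Return ratio at 10%: about {ratio:,.0f}x")
```

The ratio is extremely sensitive to the exposure figure and only weakly sensitive to the solution cost, which is why the attribution caveats matter more than the hardware estimate.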
Chart note: logarithmic scale. All figures are illustrative, derived from published industry benchmarks applied to the FlexForm design scenario. Actual impact depends on facility-specific downtime rates, error attribution data, and system adoption.
This table positions VaultRAG against named alternatives in the context of the FlexForm design scenario. The Limitations column reads differently by row: for VaultRAG v0.1, SharePoint, and printed binders it lists that option's own gaps; for the commercial cloud products it lists capabilities they offer that VaultRAG v0.1 does not currently match. The VaultRAG entries are genuine gaps in the current implementation, not deferred features.
| Solution | Annual Cost | Data Sovereignty | Voice Input | Mobile-First | NDA / ITAR Compatible | Limitations |
|---|---|---|---|---|---|---|
| VaultRAG v0.1 | £1,440/yr | ✓ On-prem · zero egress | ✓ Web Speech API | ✓ Designed for it | ✓ Architecturally enforced | No SSO · no audit log · single namespace · no enterprise support contract |
| Azure OpenAI on Your Data | £8K–£40K/yr (est.) | ✗ Data sent to Azure OpenAI | ~ Add-on required | ~ Depends on config | ✗ Cloud dependency · excluded by NDA/ITAR | Enterprise SSO · audit logs · Microsoft support |
| AWS Bedrock Knowledge Bases | £10K–£50K/yr (est. at scale) | ✗ Data processed in AWS | ~ Requires integration | ~ Custom build required | ✗ Cloud dependency | Enterprise support · IAM · multi-namespace · audit trail |
| Google Vertex AI Search | £8K–£30K/yr (est. at scale) | ✗ Data processed in GCP | ~ Requires Dialogflow CX | ~ Custom build required | ✗ Cloud dependency | Enterprise support · IAM · multi-tenant · audit trail |
| ServiceNow Knowledge Management | £360+/user/yr · £120K+ for 340 users | ~ ServiceNow cloud | ✗ Limited native voice | ~ Responsive UI, not floor-optimised | ✗ Cloud dependency | SSO · RBAC · audit logs · enterprise support · workflow integration |
| Guru / Notion AI | £5–£15/user/mo · £20K–£60K/yr | ✗ SaaS cloud storage | ✗ None | ~ General mobile support | ✗ Cloud dependent | SSO · audit logs · collaboration features · enterprise support |
| Microsoft Copilot for M365 | £360/user/yr · £122K for 340 users | ~ MS Azure cloud | ~ Limited | ~ Partial | ✗ Cloud dependency | SSO · audit logs · Microsoft enterprise support |
| SharePoint + keyword search | £0 marginal (existing M365) | ~ MS cloud | ✗ None | ✗ Desktop-first UI | ~ Cloud dependent | Keyword search only · no semantic retrieval · no voice |
| Printed binders + tribal knowledge | £0 direct cost | ✓ No digital egress | ✗ None | ✗ Not applicable | ✓ No digital data | Version control failure · no search · dependent on individual knowledge retention |

Per-query API cost, VaultRAG against an OpenAI-based RAG equivalent:

| Component | VaultRAG (on-prem) | OpenAI RAG equivalent | Saving |
|---|---|---|---|
| LLM inference per query | £0.000 · local Ollama | ~£0.0001 · GPT-4o-mini | 100% |
| Embedding per query | £0.000 · nomic-embed-text local | ~£0.000002 · OpenAI ada-002 | 100% |
| Vector search per query | £0.000 · ChromaDB local | ~£0.000033 · Pinecone serverless | 100% |
| Monthly total (500 queries) | £0.00 API costs | ~£0.07 · minimal at small scale | 100% |
| Data sent to external API | Zero bytes | Every query + every document chunk | Not a cost item · full data sovereignty retained |
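
The monthly total in the table is the sum of the three per-query estimates multiplied by query volume. A short sketch, using the table's own per-query figures, shows the arithmetic; the dictionary keys are illustrative labels, and the higher volume in the second call is hypothetical.

```python
# Per-query estimates for the OpenAI-based equivalent, taken from the table (GBP).
CLOUD_COST_GBP_PER_QUERY = {
    "llm_inference": 0.0001,     # GPT-4o-mini
    "embedding": 0.000002,       # OpenAI ada-002
    "vector_search": 0.000033,   # Pinecone serverless
}
QUERIES_PER_MONTH = 500


def monthly_api_cost(per_query: dict[str, float], queries: int) -> float:
    """Total monthly API spend for the given query volume."""
    return sum(per_query.values()) * queries


monthly = monthly_api_cost(CLOUD_COST_GBP_PER_QUERY, QUERIES_PER_MONTH)
print(f"500 queries/month: £{monthly:.2f}")

# Hypothetical higher volume, to show how the API cost scales.
print(f"50,000 queries/month: £{monthly_api_cost(CLOUD_COST_GBP_PER_QUERY, 50_000):.2f}")
```

At this volume the API saving is pennies per month, so the data sovereignty row, not the cost rows, carries the argument for local inference.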