VaultRAG — Cost Analysis

01 — Model Assumptions

Baseline scenario — FlexForm Precision · 340 employees

All calculations use the FlexForm anchor scenario from Page 03. Figures are illustrative of a mid-size Tier-2 automotive supplier. Sources are cited; assumptions are stated where data is estimated.

Employees	340
Floor workers	260
Downtime hrs/yr	800
Cost per downtime min	£4,333 (equals £260K/hr; the model below uses £125K/hr, about £2,083/min)
NCRs last 12 months	14
Documentation pages	28K+

Limitations of this model

All downtime cost figures use sector-wide averages (Aberdeen Group, Siemens, ABB). Actual facility costs vary significantly depending on plant size, product mix, and contract penalties.
The 23% human error attribution is applied as a flat rate across all downtime. In practice, the fraction attributable specifically to procedure retrieval failures is unknown without facility-specific root cause data.
The 10% improvement assumption used in the ROI calculation is illustrative and unsourced. No deployment data exists for this system; no live production instance has been operated.
The £35/hr rate applied to IT maintenance time is a general knowledge-worker rate from published salary benchmarks, and the £18/hr rate applied to floor search time is an estimate. Shop floor technician rates will differ, and both rates should be validated against the target facility's labour cost model.

02 — Problem Cost Estimate

Estimated annual cost of the knowledge retrieval gap — FlexForm scenario

Scenario disclosure: The cost figures below are applied to FlexForm Precision, a constructed design scenario. FlexForm is not a real client. The downtime rate, NCR data, and headcount are synthesised from published manufacturing sector benchmarks. These figures are presented to size the business problem the architecture is designed to address — they are not a client engagement finding or a deployment outcome.

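The three cost components below represent the directly addressable financial exposure modelled from the knowledge retrieval gap. Inputs are taken from published research: the 23% human error attribution and the £125K/hr downtime rate are sourced from the Siemens, ABB, and Plutomen research cited below. The search-time input is set conservatively, while the downtime component is an upper bound because the 23% rate covers all human error, not only procedure retrieval failures.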

Cost — 01 · Downtime attributable to procedure errors

Annual downtime cost from human error

Published research attributes approximately 23% of unplanned manufacturing downtime to human error, including wrong or misremembered procedures. At 800 hours of unplanned downtime per year and a mid-size plant rate of £125,000/hr, the 23% attribution gives 184 attributable hours and an annual exposure of £23M in this scenario. The source data does not disaggregate this fraction by failure type, so the figure is an upper bound on the portion addressable through better procedure retrieval.

Annual unplanned downtime	800 hrs
Human error attribution rate	23%
Attributable downtime hours	184 hrs/yr
Cost per hour (mid-size plant, ABB 2024)	£125,000/hr
Modelled annual cost attributable to knowledge gap	£23M/yr

↗ Siemens True Cost of Downtime 2024 (800 hrs figure) · ABB Value of Reliability 2024 (£125K/hr mid-size) · Plutomen / ABB (23% human error rate)
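The arithmetic behind the downtime figure can be reproduced with a minimal Python sketch. The three inputs are the scenario values stated above. The `RETRIEVAL_SHARES` values are hypothetical and are not from any source; they only illustrate how the upper bound would shrink if a facility measured the share of human-error downtime caused specifically by procedure retrieval failures.

```python
# Inputs copied from the FlexForm scenario.
ANNUAL_DOWNTIME_HRS = 800        # unplanned downtime, hrs/yr
HUMAN_ERROR_RATE = 0.23          # flat human error attribution
COST_PER_HOUR_GBP = 125_000      # mid-size plant rate

# Hypothetical shares of human-error downtime that are specifically
# procedure retrieval failures. These are NOT from the source data.
RETRIEVAL_SHARES = (0.25, 0.50, 1.00)


def attributable_cost(hours: float, error_rate: float, cost_per_hour: float) -> float:
    """Annual downtime cost attributed to human error, in GBP."""
    return hours * error_rate * cost_per_hour


upper_bound = attributable_cost(ANNUAL_DOWNTIME_HRS, HUMAN_ERROR_RATE, COST_PER_HOUR_GBP)
print(f"Attributable hours: {ANNUAL_DOWNTIME_HRS * HUMAN_ERROR_RATE:.0f} hrs/yr")
print(f"Upper bound: £{upper_bound:,.0f}/yr")

# Show how the addressable figure scales with the (unknown) retrieval share.
for share in RETRIEVAL_SHARES:
    print(f"  retrieval share {share:.0%}: £{upper_bound * share:,.0f}/yr")
```

A share of 100% reproduces the modelled figure; any measured share below that reduces the addressable exposure proportionally.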

Cost — 02 · Information search time across the floor workforce

Annual labour cost of document retrieval inefficiency

McKinsey research indicates employees spend an average of 1.8 hours per day searching for information. For a manufacturing floor context, a conservative 0.5 hrs/day is applied here. The rate used (£18/hr) reflects a floor-worker estimate; the maintenance section applies a separate IT-administrator rate of £35/hr. These are distinct assumptions and should not be conflated.

Floor workers	260
Search time per worker per day	0.5 hrs (conservative)
Working days per year	250
Total search hours per year	32,500 hrs
Floor labour cost per hour (estimated)	£18/hr
Modelled annual search time cost	£585,000/yr

↗ McKinsey Global Institute via Copernic 2025 (1.8 hrs/day baseline) · Gartner via M-Files (18 min per document)
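The search-time figure is a straight product of headcount, daily search time, working days, and hourly rate. The sketch below reproduces it from the stated inputs. The second call applies the 1.8 hrs/day McKinsey baseline for comparison only; the model does not apply that baseline to floor workers.

```python
FLOOR_WORKERS = 260
WORKING_DAYS_PER_YEAR = 250
FLOOR_RATE_GBP_PER_HR = 18       # estimated floor labour rate


def annual_search_cost(workers: int, hours_per_day: float, days: int, rate: float) -> tuple[float, float]:
    """Return (total search hours per year, annual cost in GBP)."""
    total_hours = workers * hours_per_day * days
    return total_hours, total_hours * rate


# Scenario assumption: 0.5 hrs/day per floor worker.
hours, cost = annual_search_cost(FLOOR_WORKERS, 0.5, WORKING_DAYS_PER_YEAR, FLOOR_RATE_GBP_PER_HR)
print(f"Scenario (0.5 hrs/day): {hours:,.0f} hrs -> £{cost:,.0f}/yr")

# Comparison only: the 1.8 hrs/day knowledge-worker baseline.
baseline_hours, baseline_cost = annual_search_cost(
    FLOOR_WORKERS, 1.8, WORKING_DAYS_PER_YEAR, FLOOR_RATE_GBP_PER_HR
)
print(f"Baseline (1.8 hrs/day): {baseline_hours:,.0f} hrs -> £{baseline_cost:,.0f}/yr")
```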

Cost — 03 · Non-conformance reports from procedure errors

Annual NCR cost from incorrect procedure application

The FlexForm scenario includes 14 NCRs over 12 months, with 9 attributed to incorrect or incomplete procedure application in this model. This attribution (64%) is a design assumption, not a validated finding. NCR cost estimates use a composite of published automotive Tier-2 benchmarks. The range is wide; the figure used (£12,000 average) is mid-range.

NCRs last 12 months (scenario)	14
NCRs attributed to procedure errors (modelled)	9 (64%)
Average NCR resolution cost (mid-range estimate)	£12,000
Range (minor to major)	£4,000–£40,000
Modelled annual NCR cost from procedure errors	£108,000/yr

↗ ASQ — 33% quality problems from human error · NCR cost estimate: composite of published automotive Tier-2 NCR cost studies
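Because the per-NCR cost range is wide, it helps to see the annual figure at both ends of the range as well as at the mid-range value used in the model. This is a minimal sketch using only the numbers stated above; the labels `minor`, `mid-range`, and `major` are just dictionary keys chosen for illustration.

```python
NCRS_TOTAL = 14
NCRS_ATTRIBUTED = 9              # design assumption, not a validated finding
NCR_COST_GBP = {"minor": 4_000, "mid-range": 12_000, "major": 40_000}


def annual_ncr_cost(count: int, cost_per_ncr: int) -> int:
    """Annual NCR cost in GBP for a given per-NCR resolution cost."""
    return count * cost_per_ncr


attribution_share = NCRS_ATTRIBUTED / NCRS_TOTAL
print(f"Attribution share: {attribution_share:.0%}")

# Annual cost at each point of the published range.
for label, unit_cost in NCR_COST_GBP.items():
    print(f"{label:>9}: £{annual_ncr_cost(NCRS_ATTRIBUTED, unit_cost):,}/yr")
```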

Cost — 04 · Total addressable problem cost

Combined annual exposure — FlexForm scenario

The three cost components above represent the total modelled financial exposure from the knowledge retrieval gap in this scenario. These figures are additive under the stated assumptions. In practice, the fraction of downtime attributable to procedure retrieval specifically (as distinct from broader human error) is not disaggregated in the source literature and would require facility-level measurement to isolate.

Downtime attributable to procedure errors	£23,000,000
Search time productivity loss	£585,000
NCR costs from procedure errors	£108,000
Total modelled annual addressable exposure	~£23.7M/yr

Illustrative model. Downtime figure uses £125K/hr (mid-size plant, ABB 2024), not the £260K/hr Aberdeen general manufacturing figure. All attribution rates are sector averages applied to this scenario, not measured at a specific facility.
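The total is a simple sum, but the relative weight of each component matters for interpreting it. The sketch below adds the three modelled components and prints each one's share of the total. It uses only the figures in the table above and shows that the downtime component accounts for roughly 97% of the total, so the result is most sensitive to the downtime assumptions.

```python
COMPONENTS_GBP = {
    "Downtime attributable to procedure errors": 23_000_000,
    "Search time productivity loss": 585_000,
    "NCR costs from procedure errors": 108_000,
}

total = sum(COMPONENTS_GBP.values())

# Print each component with its share of the combined exposure.
for name, value in COMPONENTS_GBP.items():
    print(f"{name:<45} £{value:>12,}  {value / total:6.1%}")
print(f"{'Total':<45} £{total:>12,}")
```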

03 — Solution Cost Breakdown

Solution cost breakdown — component by component

The solution cost is determined by two real line items: server hardware amortisation and IT maintenance time. Four of the six components incur zero ongoing cost because they use open-source software running locally with no managed cloud services. The analysis below itemises each component and its cost basis.

LLM Inference — Ollama + Llama 3.2 3B

Local execution · Port 11434 · No API calls

Ollama serves Llama 3.2 3B entirely locally. Once the model is downloaded (one-time, ~2.0GB), every inference runs at zero marginal API cost. There is no per-token fee, no API key, no rate limit, and no internet requirement. The marginal cost per query is electricity only.

Model download (one-time)	~2.0 GB · Llama 3.2 3B Q4
Per-query inference cost	£0.00 · local GPU/CPU
API calls to external services	Zero
Comparison: OpenAI GPT-4o-mini	~$0.00015 per query · $1.50/10K queries
Comparison: Anthropic Claude Haiku	~$0.00025 per query at 1K tokens

↗ Ollama.com: open source local model serving · Meta Llama 3.2: Llama 3.2 Community License (not Apache 2.0; free to use under Meta's licence terms)

£0.00 per month · Structurally free

Vector Store — ChromaDB

Embedded · Persistent to disk · No server

ChromaDB runs embedded in the Python process and persists the vector index to local disk. There is no managed service, no cloud egress, and no subscription fee. A 28,000-page document corpus produces approximately 140,000–280,000 chunks at procedural chunking density. ChromaDB handles this volume at the prototype scale; production performance at this vector count has not been load-tested in this implementation.

Licence	Apache 2.0 · free forever
Estimated corpus size (FlexForm scenario)	~140K–280K chunks
Disk storage for embeddings	~2–4 GB (nomic-embed-text, 768-dim)
Query latency at this scale	< 200ms for top-k=3 (estimate; not load-tested at this vector count)
Comparison: Pinecone Serverless	$0.033/1M vectors stored + query costs

↗ ChromaDB — open source embedding database

£0.00 per month · Structurally free
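The corpus sizing above can be sanity-checked with a short calculation. The sketch below derives the chunk range from the page count and estimates the size of the raw embedding vectors alone. Two things are assumptions here: the 5–10 chunks-per-page density is only what the 140K–280K estimate implies, and float32 storage (4 bytes per value) is assumed rather than confirmed for this implementation. The raw vectors come to well under 1 GB, so the ~2–4 GB disk figure would have to include chunk text, metadata, and index structures; that breakdown has not been measured.

```python
PAGES = 28_000
CHUNKS_PER_PAGE = (5, 10)        # implied by the 140K-280K chunk estimate
EMBEDDING_DIM = 768              # nomic-embed-text output dimension
BYTES_PER_VALUE = 4              # assumption: vectors stored as float32


def raw_vector_bytes(n_chunks: int, dim: int = EMBEDDING_DIM, width: int = BYTES_PER_VALUE) -> int:
    """Size of the embedding vectors only, excluding index and metadata."""
    return n_chunks * dim * width


for density in CHUNKS_PER_PAGE:
    n_chunks = PAGES * density
    size_gb = raw_vector_bytes(n_chunks) / 1e9
    print(f"{density:>2} chunks/page -> {n_chunks:,} chunks, ~{size_gb:.2f} GB raw vectors")
```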

RAG Orchestration — LlamaIndex

Open source · MIT licence · No managed tier required

LlamaIndex provides the ingestion pipeline, query engine, and retrieval orchestration. The open-source library (pip install llama-index) is free. LlamaCloud managed services exist but are not used in this implementation. All orchestration runs locally in the same Python process as the FastAPI server.

Licence	MIT · free forever
Managed services used	None
LlamaCloud (not used)	$0.30/1K documents indexed · skipped

↗ LlamaIndex — MIT licence

£0.00 per month · Structurally free

Backend + Frontend — FastAPI · HTML/CSS/JS

Open source · Served locally · No hosting cost in production

FastAPI is open source (MIT). The frontend is a single HTML file served by FastAPI's static file handler. In production, this runs on the on-prem server with zero external hosting cost. The demo environment uses GitHub Pages (free tier) and HuggingFace Spaces (free tier) for accessibility during portfolio review.

FastAPI licence	MIT · free
Demo frontend hosting	GitHub Pages · free
Demo backend hosting	HuggingFace Spaces free tier
Production hosting	On-prem server · zero external cost

↗ FastAPI — MIT licence · GitHub Pages — free · HuggingFace Spaces — free tier

£0.00 per month · Structurally free

Server Hardware — On-Prem Production

One-time capital cost · Shared infrastructure · 5-year depreciation

Production deployment requires a facility server with 32GB RAM and a consumer GPU capable of running a 3B parameter quantised model (RTX 3080 class or equivalent). Many manufacturing facilities already have servers of this specification for MES, SCADA, or CAD workloads. If VaultRAG is co-hosted on existing infrastructure, the incremental hardware cost is zero. The figure below assumes a dedicated server is provisioned.

Minimum production spec	32GB RAM · RTX 3080 (10GB VRAM) · 500GB SSD
Dedicated server cost (if new)	~£2,500–£4,000 one-time
Amortised over 5 years	~£42–£67/month
If shared with existing infrastructure	£0 incremental hardware cost
Power consumption (GPU inference)	~£8–£15/month at UK industrial rates

↗ Hardware pricing: Scan.co.uk · Overclockers.co.uk · 2024 market rates · Power cost: Ofgem industrial tariff estimates

£50 per month est. · Amortised capex
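The amortisation range follows from straight-line depreciation over 60 months. The sketch below reproduces it and also shows the range with the stated power cost added, since the £50/month estimate sits inside the amortisation-only range. Straight-line depreciation with no residual value is an assumption of this sketch.

```python
AMORTISATION_MONTHS = 5 * 12               # 5-year depreciation
SERVER_CAPEX_GBP = (2_500, 4_000)          # dedicated server, one-time
POWER_GBP_PER_MONTH = (8, 15)              # GPU inference, UK industrial rates


def monthly_amortisation(capex_gbp: float, months: int = AMORTISATION_MONTHS) -> float:
    """Straight-line monthly cost, assuming no residual value."""
    return capex_gbp / months


low = monthly_amortisation(SERVER_CAPEX_GBP[0])
high = monthly_amortisation(SERVER_CAPEX_GBP[1])
print(f"Amortisation only: £{low:.0f}-£{high:.0f}/month")

# Same range with the stated power consumption included.
low_with_power = low + POWER_GBP_PER_MONTH[0]
high_with_power = high + POWER_GBP_PER_MONTH[1]
print(f"With power:        £{low_with_power:.0f}-£{high_with_power:.0f}/month")
```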

Ongoing Maintenance — Document Re-indexing + Updates

Internal IT time · Estimated 2–4 hrs/month

The primary ongoing operational cost is re-indexing when documents change — new SOP versions, updated equipment manuals, revised NCR procedures. This is a script execution task, not a development task. An internal IT administrator runs the ingestion pipeline when documents are updated. The £35/hr rate is a general knowledge-worker benchmark and may not reflect shop floor IT support rates at a specific facility.

Re-indexing time per document update	~15–30 min per batch (automated pipeline)
Estimated update frequency	2–4 times per month
IT administrator time per month	~2 hrs (low end of the 2–4 hr estimate; basis for the £70 figure)
Rate used (general knowledge-worker benchmark)	~£35/hr

↗ IT salary benchmarks: Reed Technology Salary Guide 2025 · Hays UK Tech Salary Report 2025

£70 per month est. · Internal IT time
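The maintenance estimate can be cross-checked against its own inputs. The sketch below computes the monthly cost at the 2 hrs used for the £70 figure and at the 4 hrs upper estimate, then derives the hours implied by the batch time and update frequency. The batch-derived range covers script execution only, which is an interpretation made for this sketch; it may explain why it falls below the 2–4 hrs/month estimate.

```python
IT_RATE_GBP_PER_HR = 35          # general knowledge-worker benchmark


def monthly_maintenance_cost(hours: float, rate: float = IT_RATE_GBP_PER_HR) -> float:
    """Monthly internal IT cost in GBP."""
    return hours * rate


# Hours per month as stated: ~2 hrs used in the model, 4 hrs as the upper estimate.
for hours in (2, 4):
    print(f"{hours} hrs/month -> £{monthly_maintenance_cost(hours):.0f}/month")

# Hours implied by batch time x update frequency (script execution only).
low_hours = 2 * 15 / 60          # 2 updates x 15 min
high_hours = 4 * 30 / 60         # 4 updates x 30 min
print(f"Batch-derived: {low_hours}-{high_hours} hrs/month "
      f"-> £{monthly_maintenance_cost(low_hours):.2f}-£{monthly_maintenance_cost(high_hours):.2f}")
```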

04 — Cost Comparison Summary

Annual cost summary — problem vs solution

The table below places both cost models side by side under a common set of stated assumptions. The breakeven and ratio figures that follow are derived from these inputs and are illustrative of the design scenario — not a projection or a deployment outcome.

VaultRAG v0.1 · FlexForm scenario · Annual cost comparison — illustrative model

Modelled annual problem cost

£23.7M

Downtime + search time + NCR costs attributed to knowledge retrieval gap under stated assumptions

Annual solution cost

£1,440

£120/month · server amortisation + IT maintenance. All software components are open-source and £0.

Breakeven threshold

0.006%

The solution recovers its annual cost if it prevents 0.006% of the modelled problem cost

Ratio at 10% improvement

~1,645×

A 10% reduction in the total modelled problem cost returns approximately £2.37M against £1,440 annual cost, under the stated assumptions

Breakeven threshold: a 0.006% reduction in the modelled problem cost recovers the full annual solution cost. A 10% reduction returns approximately £2.37M against £1,440 annual cost, a ratio of roughly 1,645×. This model uses illustrative benchmarks applied to the FlexForm design scenario and is not a projection. No deployment data exists for this system, and the 10% improvement assumption is unsourced.
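The breakeven and ratio figures can be recomputed directly from the two cost models. The sketch below uses the unrounded problem cost (the sum of the three components) and the £120/month solution cost. The 1% and 5% improvement levels are hypothetical additions for sensitivity; only the 10% level appears in the model, and it is itself unsourced.

```python
PROBLEM_COST_GBP = 23_000_000 + 585_000 + 108_000   # downtime + search + NCR
SOLUTION_COST_GBP = 120 * 12                        # £120/month

# Fraction of the problem cost that must be prevented to cover the solution cost.
breakeven_fraction = SOLUTION_COST_GBP / PROBLEM_COST_GBP
print(f"Breakeven: {breakeven_fraction:.4%} of modelled problem cost")


def benefit_ratio(improvement: float) -> float:
    """Modelled saving divided by annual solution cost."""
    return PROBLEM_COST_GBP * improvement / SOLUTION_COST_GBP


# 0.10 is the level used in the model; 0.01 and 0.05 are hypothetical.
for improvement in (0.01, 0.05, 0.10):
    saving = PROBLEM_COST_GBP * improvement
    print(f"{improvement:.0%} improvement: £{saving:,.0f} saved, ratio {benefit_ratio(improvement):,.0f}x")
```

Because the ratio is linear in the improvement assumption, any error in that assumption carries through to the result one-for-one.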

Cost comparison — annual · logarithmic scale · FlexForm illustrative model

Logarithmic scale. All figures are illustrative — derived from published industry benchmarks applied to the FlexForm design scenario. Actual impact depends on facility-specific downtime rates, error attribution data, and system adoption. The 10% improvement assumption is unsourced; no deployment data exists for this system.

05 — Alternative Approaches

VaultRAG v0.1 vs alternative approaches

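This table positions VaultRAG against named alternatives in the context of the FlexForm design scenario. The Limitations column identifies areas where VaultRAG v0.1 does not currently compete: in the VaultRAG row it lists the missing capabilities, and in the rows for enterprise platforms it lists the capabilities those platforms provide that VaultRAG lacks. For SharePoint keyword search and printed binders it lists their own limitations. The VaultRAG gaps are genuine gaps in the current implementation, not deferred features.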

Solution comparison — FlexForm Precision context · 2026

Solution	Annual Cost	Data Sovereignty	Voice Input	Mobile-First	NDA / ITAR Compatible	Limitations
VaultRAG v0.1	£1,440/yr	✓ On-prem · zero egress	✓ Web Speech API	✓ Designed for it	✓ Architecturally enforced	No SSO · no audit log · single namespace · no enterprise support contract
Azure OpenAI on Your Data	£8K–£40K/yr (est.)	✗ Data sent to Azure OpenAI	~ Add-on required	~ Depends on config	✗ Cloud dependency · excluded by NDA/ITAR	Enterprise SSO · audit logs · Microsoft support
AWS Bedrock Knowledge Bases	£10K–£50K/yr (est. at scale)	✗ Data processed in AWS	~ Requires integration	~ Custom build required	✗ Cloud dependency	Enterprise support · IAM · multi-namespace · audit trail
Google Vertex AI Search	£8K–£30K/yr (est. at scale)	✗ Data processed in GCP	~ Requires Dialogflow CX	~ Custom build required	✗ Cloud dependency	Enterprise support · IAM · multi-tenant · audit trail
ServiceNow Knowledge Management	£360+/user/yr · £120K+ for 340 users	~ ServiceNow cloud	✗ Limited native voice	~ Responsive UI, not floor-optimised	✗ Cloud dependency	SSO · RBAC · audit logs · enterprise support · workflow integration
Guru / Notion AI	£5–£15/user/mo · £20K–£60K/yr	✗ SaaS cloud storage	✗ None	~ General mobile support	✗ Cloud dependent	SSO · audit logs · collaboration features · enterprise support
Microsoft Copilot for M365	£360/user/yr · £122K for 340 users	~ MS Azure cloud	~ Limited	~ Partial	✗ Cloud dependency	SSO · audit logs · Microsoft enterprise support
SharePoint + keyword search	£0 marginal (existing M365)	~ MS cloud	✗ None	✗ Desktop-first UI	~ Cloud dependent	Keyword search only · no semantic retrieval · no voice
Printed binders + tribal knowledge	£0 direct cost	✓ No digital egress	✗ None	✗ Not applicable	✓ No digital data	Version control failure · no search · dependent on individual knowledge retention

Per-query cost comparison — assuming 500 queries/month across 260 floor workers

Component	VaultRAG (on-prem)	OpenAI RAG equivalent	Saving
LLM inference per query	£0.000 · local Ollama	~£0.0001 · GPT-4o-mini	100%
Embedding per query	£0.000 · nomic-embed-text local	~£0.000002 · OpenAI ada-002	100%
Vector search per query	£0.000 · ChromaDB local	~£0.000033 · Pinecone serverless	100%
Monthly total (500 queries)	£0.00 API costs	~£0.07 · minimal at small scale	100%
Data sent to external API	Zero bytes	Every query + every document chunk	∞ data sovereignty
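The monthly API total in the table is the sum of the three per-query costs multiplied by query volume. The sketch below reproduces the 500-query figure from the table's per-query values. The 100,000-query volume is a hypothetical added to show how the API cost scales; it is not part of the FlexForm scenario.

```python
# Per-query costs for the hosted equivalent, in GBP, from the table above.
API_COST_PER_QUERY_GBP = {
    "LLM inference (GPT-4o-mini)": 0.0001,
    "Embedding (ada-002)": 0.000002,
    "Vector search (Pinecone serverless)": 0.000033,
}

per_query_total = sum(API_COST_PER_QUERY_GBP.values())


def monthly_api_cost(queries_per_month: int) -> float:
    """Monthly hosted-API cost in GBP at a given query volume."""
    return queries_per_month * per_query_total


# 500 is the scenario volume; 100_000 is a hypothetical scale-up.
for volume in (500, 100_000):
    print(f"{volume:>7,} queries/month -> £{monthly_api_cost(volume):,.2f}")
```

At the scenario volume the API cost is negligible, which is consistent with the table's note that it is minimal at small scale; the data sovereignty row, not the per-query saving, is the material difference.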
