01 — Model Assumptions

Baseline scenario — FlexForm Precision · 340 employees

All calculations use the FlexForm anchor scenario from Page 03. Figures are illustrative of a mid-size Tier-2 automotive supplier. Sources are cited; assumptions are stated where data is estimated.

Employees: 340
Floor workers: 260
Unplanned downtime: 800 hrs/yr
Cost per downtime minute: £2,083 (£125,000/hr ÷ 60)
NCRs, last 12 months: 14
Documentation pages: 28K+

Limitations of this model

  1. All downtime cost figures use sector-wide averages (Aberdeen Group, Siemens, ABB). Actual facility costs vary significantly depending on plant size, product mix, and contract penalties.
  2. The 23% human error attribution is applied as a flat rate across all downtime. In practice, the fraction attributable specifically to procedure retrieval failures is unknown without facility-specific root cause data.
  3. The 10% improvement assumption used in the ROI calculation is illustrative and unsourced. No deployment data exists for this system; no live production instance has been operated.
  4. The £35/hr rate applied to IT maintenance time (C6) is a general knowledge-worker rate from published salary benchmarks, and the £18/hr rate applied to floor-worker search time (Cost 02) is an estimate. Both will differ by facility and should be validated against the target facility's labour cost model.

02 — Problem Cost Estimate

Estimated annual cost of the knowledge retrieval gap — FlexForm scenario

Scenario disclosure: The cost figures below are applied to FlexForm Precision, a constructed design scenario. FlexForm is not a real client. The downtime rate, NCR data, and headcount are synthesised from published manufacturing sector benchmarks. These figures are presented to size the business problem the architecture is designed to address — they are not a client engagement finding or a deployment outcome.

The three cost components below represent the directly addressable financial exposure modelled from the knowledge retrieval gap. Inputs are taken from published research and chosen conservatively where benchmarks differ (for example £125K/hr rather than £260K/hr for downtime, and 0.5 rather than 1.8 search hours per day). The 23% human error attribution and £125K/hr downtime rate are sourced from Siemens, ABB, and Plutomen research cited below.

Cost — 01 · Downtime attributable to procedure errors
Annual downtime cost from human error

Published research attributes approximately 23% of unplanned manufacturing downtime to human error, including wrong or misremembered procedures. At 800 hours of unplanned downtime per year and a mid-size plant rate of £125,000/hr, the 23% attribution yields 184 attributable hours and a modelled exposure of £23M per year in this scenario. The source data does not disaggregate this fraction by failure type, so the figure is an upper bound on the portion addressable by better procedure retrieval.

Annual unplanned downtime | 800 hrs
Human error attribution rate | 23%
Attributable downtime hours | 184 hrs/yr
Cost per hour (mid-size plant, ABB 2024) | £125,000/hr
Modelled annual cost attributable to knowledge gap | £23M/yr
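
The arithmetic behind the table above can be reproduced in a few lines. This is a minimal sketch using the scenario inputs; the function and constant names are illustrative and not part of VaultRAG.

```python
# Scenario inputs taken from the Cost 01 table.
ANNUAL_DOWNTIME_HRS = 800
HUMAN_ERROR_SHARE = 0.23
COST_PER_HOUR_GBP = 125_000


def downtime_cost(hours: float, share: float, rate_per_hour: float) -> tuple[float, float]:
    """Return (attributable hours, attributable annual cost in GBP)."""
    attributable_hours = hours * share
    return attributable_hours, attributable_hours * rate_per_hour


hours, cost = downtime_cost(ANNUAL_DOWNTIME_HRS, HUMAN_ERROR_SHARE, COST_PER_HOUR_GBP)
print(f"Attributable downtime: {hours:.0f} hrs/yr")
print(f"Modelled annual cost:  £{cost:,.0f}")
```

The result scales linearly with each input, so halving `HUMAN_ERROR_SHARE` halves the modelled cost.
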
Cost — 02 · Information search time across the floor workforce
Annual labour cost of document retrieval inefficiency

McKinsey research indicates employees spend an average of 1.8 hours per day searching for information. For a manufacturing floor context, a conservative 0.5 hrs/day is applied here. The rate used (£18/hr) reflects a floor-worker estimate; the maintenance section applies a separate IT-administrator rate of £35/hr. These are distinct assumptions and should not be conflated.

Floor workers | 260
Search time per worker per day | 0.5 hrs (conservative)
Working days per year | 250
Total search hours per year | 32,500 hrs
Floor labour cost per hour (estimated) | £18/hr
Modelled annual search time cost | £585,000/yr
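
The search-time figure follows from a simple product of the four inputs above. The sketch below reproduces it and, as a sensitivity check only, reruns the same formula at the 1.8 hrs/day figure quoted in the text; names are illustrative.

```python
# Scenario inputs taken from the Cost 02 table.
FLOOR_WORKERS = 260
SEARCH_HRS_PER_DAY = 0.5
WORKING_DAYS_PER_YEAR = 250
FLOOR_RATE_GBP_PER_HR = 18


def search_time_cost(workers: int, hrs_per_day: float, days: int, rate: float) -> tuple[float, float]:
    """Return (total search hours per year, annual labour cost in GBP)."""
    total_hours = workers * hrs_per_day * days
    return total_hours, total_hours * rate


search_hours, search_cost = search_time_cost(
    FLOOR_WORKERS, SEARCH_HRS_PER_DAY, WORKING_DAYS_PER_YEAR, FLOOR_RATE_GBP_PER_HR
)
# Sensitivity check: same formula at the 1.8 hrs/day average, not a scenario input.
_, upper_cost = search_time_cost(FLOOR_WORKERS, 1.8, WORKING_DAYS_PER_YEAR, FLOOR_RATE_GBP_PER_HR)

print(f"Search hours per year: {search_hours:,.0f}")
print(f"Modelled annual cost:  £{search_cost:,.0f}")
print(f"At 1.8 hrs/day:        £{upper_cost:,.0f}")
```
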
Cost — 03 · Non-conformance reports from procedure errors
Annual NCR cost from incorrect procedure application

The FlexForm scenario includes 14 NCRs over 12 months, with 9 attributed to incorrect or incomplete procedure application in this model. This attribution (64%) is a design assumption, not a validated finding. NCR cost estimates use a composite of published automotive Tier-2 benchmarks. The range is wide; the figure used (£12,000 average) is mid-range.

NCRs last 12 months (scenario) | 14
NCRs attributed to procedure errors (modelled) | 9 (64%)
Average NCR resolution cost (mid-range estimate) | £12,000
Range (minor to major) | £4,000–£40,000
Modelled annual NCR cost from procedure errors | £108,000/yr
Sources: ASQ (33% of quality problems from human error) · NCR cost estimate: composite of published automotive Tier-2 NCR cost studies
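
Because the per-NCR cost range is wide, it may help to see the central figure alongside its bounds. The following sketch applies the stated range to the 9 modelled NCRs; variable names are illustrative.

```python
# Scenario inputs taken from the Cost 03 table.
TOTAL_NCRS = 14
PROCEDURE_NCRS = 9
AVG_NCR_COST_GBP = 12_000
NCR_COST_RANGE_GBP = (4_000, 40_000)  # minor to major

attribution_share = PROCEDURE_NCRS / TOTAL_NCRS
central_cost = PROCEDURE_NCRS * AVG_NCR_COST_GBP
low_cost = PROCEDURE_NCRS * NCR_COST_RANGE_GBP[0]
high_cost = PROCEDURE_NCRS * NCR_COST_RANGE_GBP[1]

print(f"Attribution share: {attribution_share:.0%}")
print(f"Central estimate:  £{central_cost:,}")
print(f"Range:             £{low_cost:,} to £{high_cost:,}")
```
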
Cost — 04 · Total addressable problem cost
Combined annual exposure — FlexForm scenario

The three cost components above represent the total modelled financial exposure from the knowledge retrieval gap in this scenario. These figures are additive under the stated assumptions. In practice, the fraction of downtime attributable to procedure retrieval specifically (as distinct from broader human error) is not disaggregated in the source literature and would require facility-level measurement to isolate.

Downtime attributable to procedure errors | £23,000,000
Search time productivity loss | £585,000
NCR costs from procedure errors | £108,000
Total modelled annual addressable exposure | ~£23.7M/yr
Illustrative model. Downtime figure uses £125K/hr (mid-size plant, ABB 2024), not the £260K/hr Aberdeen general manufacturing figure. All attribution rates are sector averages applied to this scenario, not measured at a specific facility.
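
The totals above can be summed and broken down by share, which makes the model's sensitivity visible: the downtime component contributes roughly 97% of the total, so the headline figure depends almost entirely on the 23% attribution and the £125K/hr rate. A minimal sketch, with illustrative names:

```python
# Component totals taken from the Cost 04 table (GBP per year).
components = {
    "Downtime attributable to procedure errors": 23_000_000,
    "Search time productivity loss": 585_000,
    "NCR costs from procedure errors": 108_000,
}

total = sum(components.values())
shares = {name: value / total for name, value in components.items()}

for name, value in components.items():
    print(f"{name:<42} £{value:>11,}  {shares[name]:6.1%}")
print(f"{'Total':<42} £{total:>11,}")
```
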
03 — Solution Cost Breakdown

Solution cost breakdown — component by component

The solution cost is determined by two real line items: server hardware amortisation and IT maintenance time. Four of the six components incur zero ongoing cost because they use open-source software running locally with no managed cloud services. The analysis below itemises each component and its cost basis.

C1
LLM Inference — Ollama + Llama 3.2 3B
Local execution · Port 11434 · No API calls

Ollama serves Llama 3.2 3B entirely locally. Once the model is downloaded (one-time, ~2.0GB), every inference runs at zero marginal API cost. There is no per-token fee, no API key, no rate limit, and no internet requirement. The marginal cost per query is electricity only.

Model download (one-time) | ~2.0 GB · Llama 3.2 3B Q4
Per-query inference cost | £0.00 · local GPU/CPU
API calls to external services | Zero
Comparison: OpenAI GPT-4o-mini | ~$0.00015 per query · $1.50/10K queries
Comparison: Anthropic Claude Haiku | ~$0.00025 per query at 1K tokens
£0.00 per month · Structurally free
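
A local inference call might look like the sketch below. It assumes Ollama is running on the port stated above and that the model was pulled under the tag `llama3.2:3b`; the tag, the helper names, and the timeout are assumptions for illustration and are not taken from the VaultRAG source.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # local Ollama endpoint
MODEL_TAG = "llama3.2:3b"  # assumed tag for Llama 3.2 3B


def build_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": MODEL_TAG, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to localhost and return the generated text."""
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `ask("Summarise the lockout procedure in two sentences.")` would return the model's answer as a string. The request never leaves `localhost`, which is the basis for the zero API cost claim.
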
C2
Vector Store — ChromaDB
Embedded · Persistent to disk · No server

ChromaDB runs embedded in the Python process and persists the vector index to local disk. There is no managed service, no cloud egress, and no subscription fee. A 28,000-page document corpus produces approximately 140,000–280,000 chunks at procedural chunking density. ChromaDB handles this volume at the prototype scale; production performance at this vector count has not been load-tested in this implementation.

Licence | Apache 2.0 · free forever
Estimated corpus size (FlexForm scenario) | ~140K–280K chunks
Disk storage for embeddings | ~2–4 GB (nomic-embed-text, 768-dim)
Query latency at this scale | < 200ms for top-k=3 (estimate; not load-tested)
Comparison: Pinecone Serverless | $0.033/1M vectors stored + query costs
£0.00 per month · Structurally free
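
The chunk-count range above implies roughly 5 to 10 chunks per page. The sketch below makes that assumption explicit and estimates the size of the raw float32 vectors only; the on-disk footprint also includes the index structure, stored chunk text, and metadata, which this sketch does not model.

```python
PAGES = 28_000
CHUNKS_PER_PAGE = (5, 10)  # assumption implied by the 140K–280K chunk range
EMBED_DIM = 768            # nomic-embed-text dimension, as stated above
BYTES_PER_FLOAT32 = 4


def raw_vector_gb(chunks: int, dim: int = EMBED_DIM) -> float:
    """Size of the bare float32 embedding matrix in decimal gigabytes."""
    return chunks * dim * BYTES_PER_FLOAT32 / 1e9


for per_page in CHUNKS_PER_PAGE:
    chunks = PAGES * per_page
    print(f"{per_page:>2} chunks/page -> {chunks:,} chunks, "
          f"{raw_vector_gb(chunks):.2f} GB of raw vectors")
```
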
C3
RAG Orchestration — LlamaIndex
Open source · MIT licence · No managed tier required

LlamaIndex provides the ingestion pipeline, query engine, and retrieval orchestration. The open-source library (pip install llama-index) is free. LlamaCloud managed services exist but are not used in this implementation. All orchestration runs locally in the same Python process as the FastAPI server.

Licence | MIT · free forever
Managed services used | None
LlamaCloud (not used) | $0.30/1K documents indexed · skipped
£0.00 per month · Structurally free

C4
Backend + Frontend — FastAPI · HTML/CSS/JS
Open source · Served locally · No hosting cost in production

FastAPI is open source (MIT). The frontend is a single HTML file served by FastAPI's static file handler. In production, this runs on the on-prem server with zero external hosting cost. The demo environment uses GitHub Pages (free tier) and HuggingFace Spaces (free tier) for accessibility during portfolio review.

FastAPI licence | MIT · free
Demo frontend hosting | GitHub Pages · free
Demo backend hosting | HuggingFace Spaces free tier
Production hosting | On-prem server · zero external cost
£0.00 per month · Structurally free

C5
Server Hardware — On-Prem Production
One-time capital cost · Shared infrastructure · 5-year depreciation

Production deployment requires a facility server with 32GB RAM and a consumer GPU capable of running a 3B parameter quantised model (RTX 3080 class or equivalent). Many manufacturing facilities already have servers of this specification for MES, SCADA, or CAD workloads. If VaultRAG is co-hosted on existing infrastructure, the incremental hardware cost is zero. The figure below assumes a dedicated server is provisioned.

Minimum production spec | 32GB RAM · RTX 3080 (10GB VRAM) · 500GB SSD
Dedicated server cost (if new) | ~£2,500–£4,000 one-time
Amortised over 5 years | ~£42–£67/month
If shared with existing infrastructure | £0 incremental hardware cost
Power consumption (GPU inference) | ~£8–£15/month at UK industrial rates
Hardware pricing: Scan.co.uk · Overclockers.co.uk · 2024 market rates · Power cost: Ofgem industrial tariff estimates
£50 per month est.
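
The monthly hardware figure can be derived from the ranges above. In this sketch, straight-line amortisation over 60 months plus the power estimate gives a range of roughly £50 to £82 per month, so the £50 estimate corresponds to the low end of a dedicated-server deployment. Names are illustrative.

```python
SERVER_COST_GBP = (2_500, 4_000)   # one-time, dedicated server
AMORTISATION_MONTHS = 5 * 12       # 5-year straight-line depreciation
POWER_GBP_PER_MONTH = (8, 15)      # GPU inference power estimate


def monthly_hardware_cost(capital_gbp: float, power_gbp: float,
                          months: int = AMORTISATION_MONTHS) -> float:
    """Amortised capital cost plus power, in GBP per month."""
    return capital_gbp / months + power_gbp


low = monthly_hardware_cost(SERVER_COST_GBP[0], POWER_GBP_PER_MONTH[0])
high = monthly_hardware_cost(SERVER_COST_GBP[1], POWER_GBP_PER_MONTH[1])

print(f"Amortisation only: £{SERVER_COST_GBP[0] / AMORTISATION_MONTHS:.0f}"
      f" to £{SERVER_COST_GBP[1] / AMORTISATION_MONTHS:.0f} per month")
print(f"Including power:   £{low:.0f} to £{high:.0f} per month")
```
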
C6
Ongoing Maintenance — Document Re-indexing + Updates
Internal IT time · Estimated ~2 hrs/month

The primary ongoing operational cost is re-indexing when documents change — new SOP versions, updated equipment manuals, revised NCR procedures. This is a script execution task, not a development task. An internal IT administrator runs the ingestion pipeline when documents are updated. The £35/hr rate is a general knowledge-worker benchmark and may not reflect shop floor IT support rates at a specific facility.

Re-indexing time per document update | ~15–30 min per batch (automated pipeline)
Estimated update frequency | 2–4 times per month
IT administrator time per month | ~2 hrs
Rate used (general knowledge-worker benchmark) | ~£35/hr
IT salary benchmarks: Reed Technology Salary Guide 2025 · Hays UK Tech Salary Report 2025
£70 per month est.
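
The maintenance estimate can be checked against the batch figures above. In this sketch, update frequency times batch duration gives 0.5 to 2 hours per month, so the 2-hour budget sits at the top of that range. Names are illustrative.

```python
IT_RATE_GBP_PER_HR = 35
BUDGETED_HOURS_PER_MONTH = 2
UPDATES_PER_MONTH = (2, 4)
MINUTES_PER_BATCH = (15, 30)


def hours_from_batches(updates: int, minutes_per_batch: int) -> float:
    """Monthly re-indexing hours implied by update count and batch length."""
    return updates * minutes_per_batch / 60


low_hours = hours_from_batches(UPDATES_PER_MONTH[0], MINUTES_PER_BATCH[0])
high_hours = hours_from_batches(UPDATES_PER_MONTH[1], MINUTES_PER_BATCH[1])
monthly_cost = BUDGETED_HOURS_PER_MONTH * IT_RATE_GBP_PER_HR
annual_cost = monthly_cost * 12

print(f"Implied re-indexing time: {low_hours} to {high_hours} hrs/month")
print(f"Budgeted cost: £{monthly_cost}/month, £{annual_cost}/year")
```
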
04 — Cost Comparison Summary

Annual cost summary — problem vs solution

The table below places both cost models side by side under a common set of stated assumptions. The breakeven and ratio figures that follow are derived from these inputs and are illustrative of the design scenario — not a projection or a deployment outcome.

VaultRAG v0.1 · FlexForm scenario · Annual cost comparison — illustrative model
Modelled annual problem cost: £23.7M
Downtime + search time + NCR costs attributed to the knowledge retrieval gap under stated assumptions.
Annual solution cost: £1,440
£120/month · server amortisation + IT maintenance. All software components are open-source and £0.
Breakeven threshold: 0.006%
The solution recovers its annual cost if it prevents 0.006% of the modelled problem cost.
Ratio at 10% improvement: ~1,645×
A 10% reduction in the total modelled problem cost returns £2.37M against £1,440 annual cost, under the stated assumptions.

Breakeven threshold: a 0.006% reduction in the modelled problem cost recovers the full annual solution cost. A 10% reduction returns approximately £2.37M against £1,440 annual cost, a ratio of roughly 1,645×. Applied to procedure-error downtime alone (£23M), a 10% reduction is £2.3M, or roughly 1,600×. This model uses illustrative benchmarks applied to the FlexForm design scenario and is not a projection. No deployment data exists for this system, and the 10% improvement assumption is unsourced.

Cost comparison — annual · logarithmic scale · FlexForm illustrative model
[Chart: annual cost (£) on a logarithmic axis from £2.5K to £25M. Bars: modelled problem cost £23.7M · value of 10% improvement £2.37M · annual solution cost £1,440.]

Logarithmic scale. All figures are illustrative — derived from published industry benchmarks applied to the FlexForm design scenario. Actual impact depends on facility-specific downtime rates, error attribution data, and system adoption. The 10% improvement assumption is unsourced; no deployment data exists for this system.
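
The breakeven and ratio figures follow directly from the two cost models. The sketch below recomputes them from the unrounded component totals; evaluated this way the ratio comes out near 1,645×. The 10% improvement is the same unsourced assumption used in the text, and all names are illustrative.

```python
PROBLEM_COST_GBP = 23_000_000 + 585_000 + 108_000  # downtime + search + NCR
MONTHLY_SOLUTION_GBP = 50 + 70                      # hardware + maintenance
ASSUMED_IMPROVEMENT = 0.10                          # illustrative, unsourced

annual_solution_cost = MONTHLY_SOLUTION_GBP * 12
# Fraction of the problem cost that must be avoided to cover the solution cost.
breakeven_fraction = annual_solution_cost / PROBLEM_COST_GBP
value_of_improvement = PROBLEM_COST_GBP * ASSUMED_IMPROVEMENT
ratio = value_of_improvement / annual_solution_cost

print(f"Annual solution cost:      £{annual_solution_cost:,}")
print(f"Breakeven threshold:       {breakeven_fraction:.3%}")
print(f"Value at 10% improvement:  £{value_of_improvement:,.0f}")
print(f"Ratio at 10% improvement:  {ratio:,.0f}x")
```
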

05 — Alternative Approaches

VaultRAG v0.1 vs alternative approaches

This table positions VaultRAG against named alternatives in the context of the FlexForm design scenario. The comparison includes a Limitations column that identifies areas where VaultRAG v0.1 does not currently compete. These are genuine gaps in the current implementation, not deferred features.

Solution comparison — FlexForm Precision context · 2026
Solution | Annual Cost | Data Sovereignty | Voice Input | Mobile-First | NDA / ITAR Compatible | Limitations
VaultRAG v0.1 | £1,440/yr | ✓ On-prem · zero egress | ✓ Web Speech API | ✓ Designed for it | ✓ Architecturally enforced | No SSO · no audit log · single namespace · no enterprise support contract
Azure OpenAI on Your Data | £8K–£40K/yr (est.) | ✗ Data sent to Azure OpenAI | ~ Add-on required | ~ Depends on config | ✗ Cloud dependency · excluded by NDA/ITAR | Enterprise SSO · audit logs · Microsoft support
AWS Bedrock Knowledge Bases | £10K–£50K/yr (est. at scale) | ✗ Data processed in AWS | ~ Requires integration | ~ Custom build required | ✗ Cloud dependency | Enterprise support · IAM · multi-namespace · audit trail
Google Vertex AI Search | £8K–£30K/yr (est. at scale) | ✗ Data processed in GCP | ~ Requires Dialogflow CX | ~ Custom build required | ✗ Cloud dependency | Enterprise support · IAM · multi-tenant · audit trail
ServiceNow Knowledge Management | £360+/user/yr · £120K+ for 340 users | ~ ServiceNow cloud | ✗ Limited native voice | ~ Responsive UI, not floor-optimised | ✗ Cloud dependency | SSO · RBAC · audit logs · enterprise support · workflow integration
Guru / Notion AI | £5–£15/user/mo · £20K–£60K/yr | ✗ SaaS cloud storage | ✗ None | ~ General mobile support | ✗ Cloud dependent | SSO · audit logs · collaboration features · enterprise support
Microsoft Copilot for M365 | £360/user/yr · £122K for 340 users | ~ MS Azure cloud | ~ Limited | ~ Partial | ✗ Cloud dependency | SSO · audit logs · Microsoft enterprise support
SharePoint + keyword search | £0 marginal (existing M365) | ~ MS cloud | ✗ None | ✗ Desktop-first UI | ~ Cloud dependent | Keyword search only · no semantic retrieval · no voice
Printed binders + tribal knowledge | £0 direct cost | ✓ No digital egress | ✗ None | ✗ Not applicable | ✓ No digital data | Version control failure · no search · dependent on individual knowledge retention

Per-query cost comparison — assuming 500 queries/month across 260 floor workers
Component | VaultRAG (on-prem) | OpenAI RAG equivalent | Saving
LLM inference per query | £0.000 · local Ollama | ~£0.0001 · GPT-4o-mini | 100%
Embedding per query | £0.000 · nomic-embed-text local | ~£0.000002 · OpenAI ada-002 | 100%
Vector search per query | £0.000 · ChromaDB local | ~£0.000033 · Pinecone serverless | 100%
Monthly total (500 queries) | £0.00 API costs | ~£0.07 · minimal at small scale | 100%
Data sent to external API | Zero bytes | Every query + every document chunk | ∞ data sovereignty
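
The monthly total in the table is the sum of the three per-query rates multiplied by the query volume. The sketch below reproduces it; at 500 queries per month the avoided API spend is a few pence, which suggests the practical difference at this scale lies in data egress rather than cost. Names are illustrative.

```python
QUERIES_PER_MONTH = 500

# Per-query cloud rates in GBP, taken from the table above.
cloud_cost_per_query = {
    "LLM inference": 0.0001,
    "Embedding": 0.000002,
    "Vector search": 0.000033,
}

per_query_total = sum(cloud_cost_per_query.values())
monthly_total = per_query_total * QUERIES_PER_MONTH
annual_total = monthly_total * 12

print(f"Cloud cost per query:  £{per_query_total:.6f}")
print(f"Monthly (500 queries): £{monthly_total:.2f}")
print(f"Annual:                £{annual_total:.2f}")
```
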