The Autonomous HR / Page 03 / Cost Methodology

$0.13/employee/month.
Here is every number.

Every cost figure on this site is derived from live, publicly documented pricing from the vendors themselves. This page shows the full calculation — every component, every assumption, every source — so you can verify, challenge, or adapt the model for your own scale and region.

Model assumptions — baseline scenario
Employees on the system: 50
Total HR interactions per month: 500
Voice / IVR calls per month: 50
Average voice call duration: 45s
Tokens per LLM interaction (avg): ~1,000
01 — Component by component

Six cost components.
Two are zero.

The architecture is designed so that the messaging and data components fall entirely within free tiers at SMB scale. Real costs accrue on four line items: LLM inference, speech-to-text, compute, and the voice gateway. The other two are structurally zero at under 200 employees.

C1 · LLM inference
Calculation
Interactions / month 500
Avg tokens per interaction (input + output) ~1,000 tokens
Total tokens / month ~500,000 tokens
Gemini 1.5 Flash input price $0.075 / 1M tokens
Gemini 1.5 Flash output price $0.30 / 1M tokens
Blended avg (70% input / 30% output) ~$0.143 / 1M tokens
Monthly cost ~$0.04 (all tokens at the input rate; ~$0.07 at the blended rate)
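The arithmetic in this table can be checked with a few lines of Python. This is a minimal sketch that uses only the prices and the 70/30 split quoted above; the function names are illustrative. At the blended rate, 500,000 tokens come to roughly $0.07, while pricing every token at the input rate gives roughly $0.04.

```python
# Sketch of the LLM line item. Prices are the Gemini 1.5 Flash list prices
# quoted in the table; the 70/30 input/output split is the page's assumption.
INPUT_PRICE_PER_M = 0.075   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.30   # USD per 1M output tokens


def blended_price_per_m(input_share: float) -> float:
    """Weighted average USD price per 1M tokens for a given input share."""
    return input_share * INPUT_PRICE_PER_M + (1.0 - input_share) * OUTPUT_PRICE_PER_M


def monthly_llm_cost(interactions: int, tokens_per_interaction: int,
                     input_share: float = 0.70) -> float:
    """Monthly USD cost for the given token volume at the blended rate."""
    total_tokens = interactions * tokens_per_interaction
    return total_tokens / 1_000_000 * blended_price_per_m(input_share)


if __name__ == "__main__":
    print(f"blended rate: ${blended_price_per_m(0.70):.4f} per 1M tokens")
    print(f"monthly cost: ${monthly_llm_cost(500, 1_000):.3f}")
    print(f"input-only:   ${monthly_llm_cost(500, 1_000, input_share=1.0):.3f}")
```

Changing `input_share` or `tokens_per_interaction` shows how sensitive the line item is to the workload mix; even a fourfold increase in tokens keeps it well under a dollar.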
Why Gemini 1.5 Flash
Gemini 1.5 Flash is retained as a managed GCP service rather than replaced with an open-source alternative because latency SLA matters here. A worker waiting for a leave confirmation on a phone call cannot tolerate an 8-second cold-start on a CPU-only container. Gemini Flash on Vertex AI delivers P99 latency under 2 seconds for these workloads.

At $0.075/1M input tokens, it is the cheapest capable hosted LLM available from any major provider as of Q1 2026. Gemini 2.5 Flash-Lite ($0.10/1M input tokens) and GPT-4o-mini ($0.15/1M input tokens) both cost more per input token for this task type.
C2 · Speech-to-text
Calculation
Voice calls / month 50 calls
Average call duration 45 seconds
Total audio / month 37.5 minutes
Whisper on T4 spot GPU · ~$0.0003/min ~$0.01
Cloud Run GPU spot surcharge (amortised) ~$0.01
Comparison: Google STT v2 at $0.006/15s would cost ~$0.90
Monthly cost (Whisper) ~$0.02
Why Whisper over Google STT
OpenAI Whisper large-v3 achieves word error rates below 8% across 99 languages, with particular strength in Indic languages, code-switching (Hinglish, Tenglish), and regional accents that Google STT v2 handles less consistently for the dialects spoken in South and Southeast Asian manufacturing centres.

Deployed as a containerised Cloud Run service with one minimum instance kept warm, Whisper avoids cold starts. On failure, the service falls back automatically to Google STT, keeping uptime at managed-service levels.

Cost saving vs Google STT: ~98% at this interaction volume.
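The speech-to-text comparison can be reproduced from the quoted rates. The sketch below is illustrative only: the function names are made up for this example, and it assumes each call is billed separately in 15-second increments rounded up, which is an assumption about billing granularity rather than something stated on this page.

```python
import math

CALLS_PER_MONTH = 50
CALL_SECONDS = 45
WHISPER_USD_PER_MIN = 0.0003     # spot-GPU estimate quoted in the table
GPU_SURCHARGE_USD = 0.01         # amortised surcharge quoted in the table
GOOGLE_STT_USD_PER_15S = 0.006   # comparison rate quoted in the table


def audio_minutes(calls: int, seconds_per_call: int) -> float:
    """Total audio minutes per month."""
    return calls * seconds_per_call / 60


def whisper_monthly(calls: int, seconds_per_call: int) -> float:
    """Self-hosted Whisper: per-minute GPU estimate plus amortised surcharge."""
    return audio_minutes(calls, seconds_per_call) * WHISPER_USD_PER_MIN + GPU_SURCHARGE_USD


def google_stt_monthly(calls: int, seconds_per_call: int) -> float:
    """Managed STT, assuming per-call billing in 15 s increments, rounded up."""
    increments = calls * math.ceil(seconds_per_call / 15)
    return increments * GOOGLE_STT_USD_PER_15S


if __name__ == "__main__":
    whisper = whisper_monthly(CALLS_PER_MONTH, CALL_SECONDS)
    google = google_stt_monthly(CALLS_PER_MONTH, CALL_SECONDS)
    print(f"audio:   {audio_minutes(CALLS_PER_MONTH, CALL_SECONDS):.1f} min")
    print(f"whisper: ${whisper:.3f}  google: ${google:.2f}")
    print(f"saving:  {1 - whisper / google:.0%}")
```

At 50 calls of 45 seconds, both options cost well under a dollar a month, so the choice between them rests more on accuracy for regional languages than on price.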
C3 · Cloud Run compute
Calculation
CPU rate (Tier 1 · asia-south1) $0.000024 / vCPU-sec
Memory rate $0.0000025 / GiB-sec
Free tier (monthly) 180K vCPU-sec · 2M req
Estimated active compute / month ~220K vCPU-sec
Billable after free tier ~40K vCPU-sec
Whisper min-instance warm keep (1 vCPU) ~$3.20 / month
Remaining active compute cost ~$2.00
Monthly cost ~$5.20
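The free-tier subtraction in this table can be sketched as follows. This is a minimal illustration using only the rates and usage figures quoted above; the function names are hypothetical. It prices CPU seconds only, so its output is a lower bound on the active-compute line rather than a reproduction of it.

```python
CPU_USD_PER_VCPU_SEC = 0.000024   # CPU rate quoted in the table
FREE_VCPU_SEC = 180_000           # monthly free tier quoted in the table
WARM_INSTANCE_USD = 3.20          # page's figure for the Whisper warm instance
ACTIVE_COMPUTE_USD = 2.00         # page's figure for remaining active compute


def billable_vcpu_seconds(used_vcpu_sec: int) -> int:
    """vCPU-seconds left to pay for after the monthly free tier."""
    return max(0, used_vcpu_sec - FREE_VCPU_SEC)


def cpu_only_cost(used_vcpu_sec: int) -> float:
    """CPU charge alone; memory and request charges are not modelled."""
    return billable_vcpu_seconds(used_vcpu_sec) * CPU_USD_PER_VCPU_SEC


def cloud_run_line_item() -> float:
    """The page's Cloud Run total: warm instance plus active compute."""
    return WARM_INSTANCE_USD + ACTIVE_COMPUTE_USD


if __name__ == "__main__":
    print(f"billable: {billable_vcpu_seconds(220_000):,} vCPU-sec")
    print(f"CPU only: ${cpu_only_cost(220_000):.2f}")
    print(f"line item: ${cloud_run_line_item():.2f}")
```

Any month with 180,000 vCPU-seconds or fewer of active compute yields zero billable seconds, leaving the warm instance as the only Cloud Run charge.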
Scale-to-zero architecture
Cloud Run is the single largest cost item, and intentionally so. Every service except the Whisper warm instance scales to zero when idle. A 50-person retail business does not generate HR interactions at 3am, so apart from that one fixed line, cost tracks usage.

The largest sub-cost is the Whisper minimum instance: keeping one warm container avoids a 3–8 second GPU cold start when a worker calls. This is a deliberate design trade: $3.20/month buys consistent sub-4-second transcription response rather than occasional cold-start delays that break the voice UX.

Cloud Run's free tier (180,000 vCPU-seconds/month) absorbs roughly the first 400 interactions at the ~440 vCPU-seconds per interaction implied by the estimate above (220K ÷ 500). At fewer than 200 employees, a significant share of each month's active compute still falls inside the free tier.
C4 · Voice gateway
Calculation
Voice calls / month 50 calls
Average call duration 45 seconds
Total minutes / month 37.5 minutes
Exotel India SIP rate (approx) ~₹0.30–0.40 / min
Equivalent in USD ~$0.004 / min
Comparison: Twilio Voice (US billing) $0.013 / min — 3× more
Monthly cost ~$1.30 (metered minutes alone: 37.5 min × ~$0.004 ≈ $0.15)
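The per-minute comparison between providers can be checked with the sketch below. It uses only the approximate USD rates quoted in the table, and the function name is illustrative. It covers metered minutes only, so it comes out below the monthly figure above.

```python
CALLS_PER_MONTH = 50
CALL_SECONDS = 45
EXOTEL_USD_PER_MIN = 0.004   # approximate rate quoted in the table
TWILIO_USD_PER_MIN = 0.013   # comparison rate quoted in the table


def metered_cost(calls: int, seconds_per_call: int, usd_per_min: float) -> float:
    """Usage-only charge: total call minutes times the per-minute rate."""
    return calls * seconds_per_call / 60 * usd_per_min


if __name__ == "__main__":
    exotel = metered_cost(CALLS_PER_MONTH, CALL_SECONDS, EXOTEL_USD_PER_MIN)
    twilio = metered_cost(CALLS_PER_MONTH, CALL_SECONDS, TWILIO_USD_PER_MIN)
    print(f"Exotel: ${exotel:.2f}  Twilio: ${twilio:.2f}")
    print(f"ratio:  {twilio / exotel:.2f}x")
```

Because both providers bill per minute, the ratio between them stays fixed at any call volume; only the absolute gap grows.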
Why a local India SIP provider
Exotel and Plivo both operate India-local SIP infrastructure with data centres in Mumbai and Chennai. Using a local PSTN interconnect instead of routing through a US-billed provider like Twilio eliminates the international leg cost and reduces call latency.

Exotel's credit-based pricing (1 credit = ₹1) and startup tiers make it practical for a 50-person business without a contract commitment. For deployments outside India, the same architecture swaps Exotel for the equivalent regional operator; the IVR layer is fully provider-agnostic.

Note: Exotel does not publish per-minute rates on its public pricing page; the ₹0.30–0.40/min figure is derived from their credit-based plan structure and third-party comparisons. Exact rates require a sales quote and vary by call type and volume tier.
C5 · WhatsApp messaging
Calculation
Total interactions / month 500
Nature of interactions User-initiated service
User-initiated service messages (Meta fee) $0.00 (free)
Utility templates in 24h service window $0.00 (free, from Jul 2025)
Business-initiated utility (India rate) $0.0107 / msg if outside window
Estimated business-initiated msgs / month ~0 (workers initiate)
Monthly cost $0.00
Why WhatsApp is structurally free for this use case
Meta's July 2025 pricing shift from conversation-based to per-message billing was a significant change — but one that favours this architecture specifically.

The HR system is almost entirely worker-initiated: the employee sends the message, the system replies. Under Meta's model, user-initiated service conversations carry zero Meta fee. Utility template replies sent within the 24-hour service window are also free from July 2025 onwards.

The only scenario incurring a charge would be a proactive human-in-the-loop (HITL) escalation message to the owner: a business-initiated utility template at ~$0.011 per message in India, or ~$0.05 in Europe. At fewer than 50 escalations/month, this cost is negligible (about $0.55/month worst case for India).

This is not a loophole; it is how the pricing model is meant to work. Under Meta's rules, user-initiated service conversations are free, and here the user is the worker.
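The fee rules described in this section reduce to a small decision function. The sketch below is a simplified model, not Meta's API: the `Message` class and function names are hypothetical, only service and utility messages are modelled (marketing and authentication templates are out of scope), and the India utility rate is the one quoted above.

```python
from dataclasses import dataclass

INDIA_UTILITY_USD = 0.0107  # business-initiated utility rate quoted above


@dataclass(frozen=True)
class Message:
    business_initiated: bool   # False when the worker started the exchange
    in_service_window: bool    # True inside the 24-hour service window


def meta_fee(msg: Message, utility_rate: float = INDIA_UTILITY_USD) -> float:
    """Meta fee for one message under the simplified rules on this page."""
    if not msg.business_initiated:
        return 0.0             # user-initiated service message
    if msg.in_service_window:
        return 0.0             # utility template inside the service window
    return utility_rate        # utility template outside the window


def monthly_fee(messages: list[Message]) -> float:
    """Sum of Meta fees across a month of messages."""
    return sum(meta_fee(m) for m in messages)


if __name__ == "__main__":
    baseline = [Message(False, True)] * 500     # all worker-initiated
    worst_case = [Message(True, False)] * 50    # 50 proactive escalations
    print(f"baseline:   ${monthly_fee(baseline):.2f}")
    print(f"worst case: ${monthly_fee(worst_case):.3f}")
```

Only the third branch ever costs money, which is why keeping escalations inside an open service window, where possible, keeps the line item at zero.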
C6 · Data services
Free tier limits vs. our usage
Firestore free reads / day 50,000 · we use ~600
Firestore free writes / day 20,000 · we use ~170
Pub/Sub free throughput / month 10 GB · we use ~0.1 MB
Cloud Functions free invocations / month 2,000,000 · we use ~1,000
Supabase free DB storage 500 MB · policy doc ~1–2 MB
Supabase free pgvector queries Unlimited on free tier
Monthly cost $0.00
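The headroom implied by this table can be computed directly. The sketch below uses only the limits and usage estimates listed above, treats 10 GB as 10,000 MB for simplicity, and uses illustrative names throughout.

```python
# (free-tier limit, estimated usage) in the units given in the table
FREE_TIER_USAGE = {
    "Firestore reads / day": (50_000, 600),
    "Firestore writes / day": (20_000, 170),
    "Pub/Sub MB / month": (10_000, 0.1),          # 10 GB taken as 10,000 MB
    "Cloud Functions invocations / month": (2_000_000, 1_000),
}


def headroom(limit: float, usage: float) -> float:
    """How many times the estimated usage fits inside the free limit."""
    return limit / usage


def headroom_report(table: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Headroom multiple for every service in the table."""
    return {name: headroom(limit, usage) for name, (limit, usage) in table.items()}


if __name__ == "__main__":
    for name, multiple in headroom_report(FREE_TIER_USAGE).items():
        print(f"{name:<38} {multiple:>10,.0f}x")
```

The smallest multiple in the report marks the service that would cross its free tier first as usage grows; with these figures that is Firestore reads.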
Structural free tier fit
A 50-person business generating 500 HR interactions per month sits roughly two or more orders of magnitude below the free tier thresholds of every data service in the stack.

Firestore's free tier alone covers 50,000 reads per day — roughly 83× our estimated daily read volume. Pub/Sub's 10GB free monthly allowance covers our ~100KB of event payloads with 99.99% headroom.

Supabase's free tier (500MB Postgres, including pgvector) is sufficient to index any HR policy document a small business would produce — a 20-page policy PDF produces roughly 500–800 chunks at 256 tokens each, consuming under 2MB total. The RAG store is structurally free at SMB scale.

These costs begin to appear at 500+ employees and 50,000+ interactions/month — well beyond the target segment for this architecture phase.
Monthly totals — baseline scenario (50 employees)
The full bill.
C1 · LLM inference
$0.04
Gemini 1.5 Flash · 500K tokens
C2 · Speech-to-text
$0.02
Whisper on Cloud Run GPU spot
C3 · Cloud Run compute
$5.20
Agents + Whisper warm instance
C4 · Voice gateway
$1.30
Exotel India SIP · 37.5 min
Per employee · per month
$0.13
$6.56 ÷ 50 employees · annualised: $1.56/employee/year
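The bill above sums as follows. This is a minimal sketch using the four component figures exactly as listed; the names are illustrative. The annualised value multiplies the monthly per-employee figure after rounding it to cents, which is how $1.56 is reached.

```python
BASELINE_EMPLOYEES = 50
COMPONENTS_USD = {
    "C1 LLM inference": 0.04,
    "C2 Speech-to-text": 0.02,
    "C3 Cloud Run compute": 5.20,
    "C4 Voice gateway": 1.30,
}


def monthly_total(components: dict[str, float]) -> float:
    """Sum of all monthly line items in USD."""
    return sum(components.values())


def per_employee(components: dict[str, float], employees: int) -> float:
    """Monthly cost per employee in USD, unrounded."""
    return monthly_total(components) / employees


if __name__ == "__main__":
    monthly = per_employee(COMPONENTS_USD, BASELINE_EMPLOYEES)
    print(f"total:        ${monthly_total(COMPONENTS_USD):.2f}")
    print(f"per employee: ${monthly:.2f} / month")
    print(f"annualised:   ${round(monthly, 2) * 12:.2f} / year")
```

Cloud Run accounts for roughly four fifths of the total, so any change to the warm-instance configuration moves the headline figure more than all other components combined.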
02 — Sensitivity analysis

What happens when scale changes.

The model scales sub-linearly. As employee count grows, the per-employee cost falls — because Cloud Run's fixed warm-instance cost gets amortised across more interactions while the free tiers still absorb most of the data and messaging load.

Scenario A — Micro
15 employees · 150 interactions/month
LLM inference ~$0.01
Whisper STT ~$0.01
Cloud Run (warm instance dominates) ~$3.50
Voice gateway (15 calls) ~$0.40
WhatsApp + Data $0.00
Monthly total ~$3.92
$0.26 per employee / month
Scenario B — Baseline ★
50 employees · 500 interactions/month
LLM inference ~$0.04
Whisper STT ~$0.02
Cloud Run compute ~$5.20
Voice gateway (50 calls) ~$1.30
WhatsApp + Data $0.00
Monthly total ~$6.56
$0.13 per employee / month
Scenario C — Growth
200 employees · 2,000 interactions/month
LLM inference ~$0.29
Whisper STT ~$0.08
Cloud Run compute (2 warm instances) ~$9.50
Voice gateway (200 calls) ~$5.20
WhatsApp + Data (still free tier) $0.00
Monthly total ~$15.07
$0.075 per employee / month

All figures are estimates. Cloud Run billing is metered to the nearest 100ms; actual costs will vary by interaction complexity, agent reasoning depth, and token length. The warm-instance cost dominates at low scale and amortises as volume grows.
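The sub-linear scaling claim can be verified from the three scenarios. The sketch below takes the line items exactly as listed for each scenario and recomputes the totals and per-employee figures; the data structure and function name are illustrative.

```python
# scenario -> (employees, [LLM, STT, Cloud Run, voice gateway, WhatsApp + data])
SCENARIOS = {
    "A Micro": (15, [0.01, 0.01, 3.50, 0.40, 0.00]),
    "B Baseline": (50, [0.04, 0.02, 5.20, 1.30, 0.00]),
    "C Growth": (200, [0.29, 0.08, 9.50, 5.20, 0.00]),
}


def summarise(employees: int, items: list[float]) -> tuple[float, float]:
    """Return (monthly total, per-employee monthly cost) in USD."""
    total = sum(items)
    return total, total / employees


if __name__ == "__main__":
    for name, (employees, items) in SCENARIOS.items():
        total, each = summarise(employees, items)
        print(f"{name:<11} {employees:>4} employees  total ${total:>6.2f}  per employee ${each:.3f}")
```

Headcount grows by more than 13× between Scenario A and Scenario C while the monthly total grows by less than 4×, which is what drives the falling per-employee figure.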

03 — Market comparison

What else could you buy for this money.

These are not cherry-picked comparisons. They are the actual options available to a 50-person business in 2026 that needs to manage HR for a workforce that does not sit at a desk.

| Solution | Monthly cost · 50 employees | Employee channel | Multi-language | Autonomous decisions | Deskless-native |
| --- | --- | --- | --- | --- | --- |
| The Autonomous HR | $6.56 | WhatsApp + Voice IVR | 200 languages | RAG-governed | Designed for it |
| GreytHR SMB | $60–100 | Web portal + mobile app | English only | Human HR required | Requires smartphone app |
| Keka Starter | $100–150 | Web + mobile app | English only | HR admin required | Desktop-first UX |
| Darwinbox | $300+ | Web + mobile app | Partial · limited Hindi | Partial · workflow rules | Enterprise-grade complexity |
| Workday HCM | $800–1,500 | Web + mobile app | Yes · enterprise | Partial · rules engine | Requires trained HR staff |
| Part-time HR assistant | $400–800 | WhatsApp (unstructured) | Depends on person | Manual, inconsistent | Varies |
| Manual (paper + WhatsApp) | $0 + HR staff time | WhatsApp (unstructured) | Any | None | Partially |
Methodology notes & caveats
All pricing sourced from vendor documentation as of Q1 2026. Cloud pricing changes frequently; treat these figures as directional estimates rather than contractual quotes. Each component links to its primary source above.

Exotel voice rates are not published as a standard per-minute rate. The ₹0.30–0.40/min figure is derived from Exotel's credit-based plan structure (exotel.com/pricing) and third-party pricing analyses. Exact rates require a direct sales quote and vary by call type, destination, and volume tier.

WhatsApp pricing changed materially on 1 July 2025 when Meta shifted from conversation-based to per-message billing. The $0.00 cost for this use case reflects that worker-initiated service messages and utility templates sent within the 24-hour customer service window carry no Meta fee under the new model. Source: Spur — WhatsApp Business API Pricing 2026.

Gemini 1.5 Flash pricing ($0.075/1M input, $0.30/1M output) is sourced from Google Cloud Vertex AI Pricing. Note that Google has released Gemini 2.5 Flash and 3.x models since this architecture was designed; newer Flash variants may offer better price-performance as they reach stable GA status on Vertex AI.

Token estimates (1,000 avg tokens/interaction) are conservative approximations. Simple leave balance queries may consume 200–400 tokens. Complex policy RAG lookups with context may consume 2,000–4,000 tokens. The 1,000 token average assumes a mixed workload typical of a 50-person business.
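A weighted average shows how a mixed workload can land near 1,000 tokens. The mix below is purely hypothetical (75% simple queries at 300 tokens, 25% complex lookups at 3,000 tokens); it is chosen from within the ranges above for illustration and is not measured data.

```python
# Hypothetical workload mix: (share of interactions, tokens per interaction).
# These shares are an assumption for illustration, not figures from this page.
ASSUMED_MIX = [
    (0.75, 300),     # simple leave-balance queries (range above: 200-400)
    (0.25, 3_000),   # complex policy RAG lookups (range above: 2,000-4,000)
]


def average_tokens(mix: list[tuple[float, int]]) -> float:
    """Share-weighted average tokens per interaction; shares must sum to 1."""
    total_share = sum(share for share, _ in mix)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("interaction shares must sum to 1")
    return sum(share * tokens for share, tokens in mix)


if __name__ == "__main__":
    print(f"average: {average_tokens(ASSUMED_MIX):.0f} tokens per interaction")
```

Shifting the share of complex lookups is the quickest way to test the estimate: each additional 10 percentage points of 3,000-token interactions adds about 270 tokens to the average under this mix.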

Currency note: All costs are shown in USD for international comparability. For India deployments, GCP asia-south1 pricing and Exotel INR rates apply. The $6.56/month baseline is approximately ₹550/month at current exchange rates, less than the cost of one day of a part-time contract HR assistant.