From the moment a worker sends a WhatsApp voice note to the moment they receive a written confirmation, eight system components fire in sequence. The simulation above shows each one in real time.
01
Worker sends voice note or calls IVR
Speaks naturally in their own language. No script, no menu navigation. "I need tomorrow off" in English, Hindi, Telugu, or any of the roughly 100 languages Whisper supports.
Channel: WhatsApp or IVR
02
Whisper transcribes. NLLB-200 normalises.
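Audio is transcribed by Whisper large-v3 (open-source, self-hosted on Cloud Run), which also detects the spoken language and handles code-switching such as Hinglish and Tenglish natively. The transcript is given a confidence score, and NLLB-200 then normalises it for the downstream components.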
< 4 seconds
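Whisper does not return a single confidence number for a whole transcript, so the "confidence scored" step needs an aggregation rule. A minimal sketch, assuming the per-segment fields that the open-source `openai-whisper` package's `transcribe()` returns (`start`, `end`, `avg_logprob`, `no_speech_prob`). The weighting scheme and the helper name `transcript_confidence` are illustrative assumptions, not this system's documented method.

```python
import math


def transcript_confidence(segments: list[dict]) -> float:
    """Hypothetical aggregate confidence for a Whisper transcript.

    Each segment contributes exp(avg_logprob), i.e. the geometric-mean token
    probability, discounted by its no-speech probability and weighted by its
    duration in seconds.
    """
    total = 0.0
    weighted = 0.0
    for seg in segments:
        duration = max(seg["end"] - seg["start"], 0.0)
        token_prob = math.exp(seg["avg_logprob"])
        weighted += duration * token_prob * (1.0 - seg["no_speech_prob"])
        total += duration
    # An empty or zero-length transcript carries no evidence, so score it 0.
    return weighted / total if total > 0 else 0.0
```

Weighting by duration keeps a short, garbled filler segment from dragging down an otherwise clean request.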
03
Gemini Flash classifies intent
The transcript is sent to the Intent Router, where Gemini 1.5 Flash classifies it as a leave request and extracts the leave type, dates, and employee ID. The request is then routed to the Leave Agent with this structured context.
< 1 second
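The hand-off from the Intent Router to the Leave Agent works best when the model's output is validated before anything acts on it. A sketch of that parsing and routing step, assuming the model has been asked to reply with JSON. The field names (`intent`, `leave_type`, `start_date`, `end_date`, `employee_id`), the `LeaveIntent` type, and the `ROUTES` table are hypothetical, not a documented Gemini response format.

```python
import json
from dataclasses import dataclass
from datetime import date


# Hypothetical structured context passed to the Leave Agent.
@dataclass(frozen=True)
class LeaveIntent:
    intent: str
    leave_type: str
    start: date
    end: date
    employee_id: str


def parse_intent(raw_json: str) -> LeaveIntent:
    """Validate the router's JSON reply; raises ValueError or KeyError on bad input."""
    data = json.loads(raw_json)
    intent = LeaveIntent(
        intent=data["intent"],
        leave_type=data["leave_type"],
        start=date.fromisoformat(data["start_date"]),
        end=date.fromisoformat(data["end_date"]),
        employee_id=data["employee_id"],
    )
    if intent.end < intent.start:
        raise ValueError("end_date precedes start_date")
    return intent


# Hypothetical mapping from intent label to downstream agent.
ROUTES = {"leave_request": "LeaveAgent"}


def route(intent: LeaveIntent) -> str:
    # Unknown intents go to a human instead of a guessed agent.
    return ROUTES.get(intent.intent, "HumanReview")
```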
04
Firestore queried for employee record
Leave balance retrieved. Team headcount on requested date checked. Prior leave history reviewed. All reads happen in a single Firestore transaction.
< 1 second
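Once the transactional read returns, the Leave Agent needs two derived facts: whether the balance covers the request, and whether the team stays adequately staffed on each requested day. A sketch of that derivation over an in-memory snapshot; the `EmployeeSnapshot` shape, the `precheck` helper, and the 25% maximum-absence ratio are hypothetical, since the source does not say how headcount is judged. The reads themselves would go through the Firestore client's transaction support.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical snapshot of the data read in the single Firestore transaction.
@dataclass
class EmployeeSnapshot:
    employee_id: str
    team_id: str
    leave_balance: dict[str, float]  # leave type -> days remaining
    team_absences: dict[date, int]   # date -> teammates already on leave
    team_size: int


def precheck(snap: EmployeeSnapshot, leave_type: str, days: list[date],
             max_absent_ratio: float = 0.25) -> dict:
    """Derive balance and headcount facts for the Leave Agent."""
    remaining = snap.leave_balance.get(leave_type, 0.0)
    # Count this request on top of existing absences for each requested day.
    peak_absent = max((snap.team_absences.get(d, 0) + 1 for d in days), default=0)
    return {
        "balance_ok": remaining >= len(days),
        "remaining_after": remaining - len(days),
        "headcount_ok": peak_absent <= max_absent_ratio * snap.team_size,
        "peak_absent": peak_absent,
    }
```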
05
HR Policy PDF consulted via RAG
The Leave Agent queries the pgvector RAG store, which returns the governing policy clause together with a confidence score. The request is then evaluated against every condition that clause sets.
< 2 seconds
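Retrieval in a vector store comes down to nearest-neighbour search over clause embeddings. A toy in-memory sketch of top-1 cosine retrieval; the 3-dimensional vectors are made up (real ones would come from an embedding model), and treating cosine similarity as the clause's "confidence score" is an assumption. In pgvector the equivalent lookup is typically an `ORDER BY embedding <=> :query LIMIT 1` query, where `<=>` is cosine distance (1 minus cosine similarity).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_clause(query_vec: list[float],
                    clauses: dict[str, tuple[str, list[float]]]) -> tuple[str, str, float]:
    """Return (clause_id, text, score) for the clause most similar to the query.

    `clauses` is a hypothetical stand-in for the pgvector table:
    clause id -> (clause text, embedding).
    """
    clause_id, (text, vec) = max(
        clauses.items(), key=lambda item: cosine_similarity(query_vec, item[1][1])
    )
    return clause_id, text, cosine_similarity(query_vec, vec)
```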
06
Decision made — or escalated
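If confidence is above 0.8 and no threshold is breached, the request is decided automatically: approved when the policy conditions are satisfied, denied when they are not. If confidence is 0.8 or below, or a threshold is breached, the human-in-the-loop (HITL) path fires and the owner is notified via WhatsApp.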
< 1 second
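The decision rule is small enough to state as a pure function, which also makes the tie case explicit. A sketch assuming a single combined confidence value (the source does not say how transcription, intent, and retrieval scores are combined) and that an exact 0.8 escalates; `Outcome` and `decide` are hypothetical names.

```python
from enum import Enum


class Outcome(Enum):
    AUTO_APPROVE = "AUTO_APPROVE"
    AUTO_DENY = "AUTO_DENY"
    ESCALATE = "ESCALATE"  # HITL path: owner notified on WhatsApp


CONFIDENCE_THRESHOLD = 0.8


def decide(confidence: float, policy_satisfied: bool, threshold_breached: bool) -> Outcome:
    # Escalate when confidence is not strictly above 0.8 or a business threshold trips.
    if confidence <= CONFIDENCE_THRESHOLD or threshold_breached:
        return Outcome.ESCALATE
    # Confident and within thresholds: the policy outcome alone decides.
    return Outcome.AUTO_APPROVE if policy_satisfied else Outcome.AUTO_DENY
```

Escalating on the boundary errs toward a human review rather than an automatic decision.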
07
Audit log written — before notification
Immutable Firestore record written: timestamp, employee ID, decision, policy clause cited, confidence score, decision path (AI or HUMAN). Written before any outbound message.
Synchronous write
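The ordering guarantee (audit first, message second) can be enforced by making the notification depend on the write succeeding. A sketch with injected callables standing in for the Firestore write and the WhatsApp send; `AuditRecord`, `finalize`, and both callables are hypothetical. Firestore documents are not immutable by default: one common approach is Security Rules that allow `create` but deny `update` and `delete` on the audit collection, keeping in mind that server code using the Admin SDK bypasses rules.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


# Fields mirror the audit record described above.
@dataclass(frozen=True)
class AuditRecord:
    timestamp: str
    employee_id: str
    decision: str
    policy_clause: str
    confidence: float
    decision_path: str  # "AI" or "HUMAN"


def utc_now_iso() -> str:
    return datetime.now(timezone.utc).isoformat()


def finalize(record: AuditRecord, write_audit, send_message) -> None:
    """Write the audit record synchronously, then notify.

    If write_audit raises, the exception propagates and no message is sent.
    """
    write_audit(asdict(record))
    send_message(record.employee_id, record.decision)
```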
08
Confirmation sent in worker's language
WhatsApp message in the detected language. Approval or denial with the policy reason. Remaining leave balance included. The worker receives this within 60 seconds of their original message.
< 2 seconds
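The confirmation combines the decision, the policy reason, and the remaining balance, and the 60-second promise can be checked per message. A sketch under stated assumptions: `TEMPLATES`, `build_confirmation`, and `within_sla` are hypothetical, only an English template is filled in, and other languages would be added as pre-translated templates (or produced by a translation step this sketch does not cover). Unknown language codes fall back to English.

```python
from datetime import datetime, timedelta

# Hypothetical templates keyed by language code; only English is shown.
TEMPLATES = {
    "en": {
        "approved": "Your {leave_type} leave for {dates} is approved ({clause}). "
                    "Remaining balance: {remaining} days.",
        "denied": "Your {leave_type} leave for {dates} is not approved: {reason} ({clause}). "
                  "Remaining balance: {remaining} days.",
    },
}


def build_confirmation(lang: str, approved: bool, leave_type: str, dates: str,
                       clause: str, remaining: int, reason: str = "") -> str:
    templates = TEMPLATES.get(lang, TEMPLATES["en"])  # fall back to English
    key = "approved" if approved else "denied"
    return templates[key].format(leave_type=leave_type, dates=dates, clause=clause,
                                 remaining=remaining, reason=reason)


def within_sla(received_at: datetime, sent_at: datetime,
               limit: timedelta = timedelta(seconds=60)) -> bool:
    # True when the reply went out within the limit of the original message.
    return sent_at - received_at <= limit
```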