ADR

The decisions that shaped the architecture.

Every significant architecture decision in this system is documented here in full ADR format: context, decision, alternatives considered, consequences (positive and negative), and the hardest rebuttals a design review would surface. The reasoning is the design.

ADR-001 WhatsApp + IVR as sole employee channels Accepted
ADR-002 Firestore over relational DB for primary storage Accepted
ADR-003 LangGraph over Vertex AI Agent Builder Accepted
ADR-004 Pub/Sub event fabric over direct service calls Accepted
ADR-005 Cloud Run over GKE for all compute Accepted
ADR-006 Exotel over Twilio for voice gateway Accepted
ADR-007 HITL timeout policy: 24h auto-deny Accepted
ADR-008 Mobile number as sole employee identity Accepted
ADR-001 Accepted Channel architecture 2026-Q1
WhatsApp + IVR as sole employee channels
The choice of channel is not a UX decision — it is the most important architectural decision in the system. The wrong channel choice makes every subsequent component irrelevant: a system that workers will not use is not a system. This ADR establishes that no new channel, app, or login will ever be introduced for employee-facing interactions.
Decision
All employee-facing interactions will be delivered exclusively via WhatsApp Business API (text and voice notes) and IVR voice calls (for workers who prefer calling over messaging). No mobile app will be built. No web portal will be offered. No SMS channel will be maintained.

This is not a cost decision. It is a user adoption decision. WhatsApp penetration among the target workforce — manufacturing, retail, construction workers across South and Southeast Asia, Africa, and Latin America — exceeds 90% in most markets. Workers already use WhatsApp for family communication, digital payments, and business messaging. The behavioural change required is zero.
Alternatives considered
React Native mobile app. Rejected. Keka — an established Indian HR platform — achieved 11/52 employee adoption at Rathi Textiles before being abandoned in six weeks. An app requires installation, onboarding, and sustained engagement. For workers who have never used a business app, the activation energy is prohibitive.

USSD. Rejected. Carrier access for third-party USSD applications in most target markets (India, Southeast Asia) has been withdrawn or heavily restricted. No conversational state. Cannot handle document delivery.

SMS. Rejected. One-way broadcast only. Cannot handle voice. Cannot deliver documents or payslips. No conversational state for multi-turn interactions like onboarding.
Positive consequences
Zero onboarding friction — workers need no training to use the system
98% message open rate vs 21% for email
Voice note support handles low-literacy workforce natively
Meta free tier covers entire SMB interaction volume
Negative consequences
System is dependent on Meta's WhatsApp Business API policies — changes can affect cost or capability
No web fallback if worker loses phone or changes number
WhatsApp not available in China — system cannot serve Chinese-market deployments without channel adaptation
Mitigations
Channel layer is abstracted behind a gateway interface — substituting a regional messaging platform (WeChat, Telegram) requires only a gateway swap
IVR voice channel provides a non-WhatsApp fallback for any worker
Number change handled via onboarding re-registration flow
Design review pushbacks
"WhatsApp has had multiple policy changes that broke business automation — what's your continuity plan?"
The channel gateway is provider-agnostic by design. The interface the rest of the system sees is send_message(worker_id, text, language) and receive_message(sender_id, content). Swapping the underlying provider is a gateway module replacement — no agent code, no state machine, no Firestore schema changes. The IVR channel provides full feature parity as a fallback. Migration timeline to an alternative channel: estimated 2 weeks of engineering.
"What happens when Meta's pricing changes make WhatsApp uneconomical?"
Meta's July 2025 pricing shift from conversation-based to per-message billing was the most significant change in the platform's business history — and it reduced cost for this use case. Worker-initiated service messages carry zero Meta fee. The economics of this architecture are structurally aligned with Meta's incentive to drive business adoption, not against it.
ADR-002 Accepted Data architecture 2026-Q1
Firestore over relational database
The data model for an HR system is not inherently relational. The primary entities — employees, leave requests, audit log entries — have variable schema (different leave types carry different fields), are accessed predominantly by document ID, and are written significantly more than they are queried in aggregate. These characteristics favour a document store.
Decision
Use Firestore in Native mode as the primary operational database for all transactional data: employee records, leave requests, HITL queue, conversation sessions, and audit log.

Key reasons: (1) Serverless with zero idle cost — no instance to keep warm, no connection pool to manage. (2) Free tier covers SMB scale entirely — 50K reads/day and 20K writes/day free; our estimated daily volume is 600 reads and 170 writes. (3) Native security rules allow the append-only audit log constraint to be enforced at the database layer, not the application layer. (4) Real-time listeners support the HITL queue without polling.
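Reason (3) can be made concrete with a minimal Firestore security rules fragment enforcing append-only semantics at the database layer. The collection name and auth condition here are illustrative assumptions, not the deployed ruleset:

```
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /audit_log/{entry} {
      allow read, create: if request.auth != null;
      allow update, delete: if false;  // append-only: no client can mutate history
    }
  }
}
```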
Alternatives considered
Cloud SQL (Postgres). Rejected at SMB scale. Cloud SQL minimum instance: ~$7/month always-on even with smallest tier. Schema migrations require downtime. Connection pooling adds operational complexity. For aggregate reporting (payroll calculations, leave summaries), the relational model is beneficial — but these queries are infrequent and can be satisfied by Firestore's query capabilities at SMB scale.

AlloyDB / Cloud Spanner. Rejected. Engineered for enterprise-scale transactional workloads. Minimum cost far exceeds the total infrastructure budget for this system.

Supabase Postgres (primary DB). Considered but rejected for transactional data. Supabase is used for pgvector (appropriate) but introducing a second Postgres instance as the primary DB adds an external dependency for the most critical data path.
Positive consequences
Zero idle cost — serverless billing aligns with SMB interaction patterns
Append-only audit log enforced at DB layer via security rules
Free tier covers entire Phase 1 workload
Real-time HITL queue updates without polling
Negative consequences
No SQL joins — payroll aggregation in Phase 3 requires application-level data assembly
Firestore query limitations (no inequality filters on multiple fields) require composite index planning
At 500+ employees, complex reporting may warrant a BigQuery export pipeline
Migration path
Phase 3 payroll: Firestore → Cloud Functions → BigQuery export for aggregate calculations
At 500+ employees: evaluate Cloud SQL addition for reporting layer only (CQRS pattern)
Audit log stays in Firestore permanently — the append-only guarantee is non-negotiable
Design review pushbacks
"Payroll requires complex aggregations — Firestore is not built for that. You'll have to rewrite the data layer in Phase 3."
Correct — and accounted for. Payroll calculations are not done in Firestore. Attendance records are read from Firestore, piped through a Cloud Function that performs the calculations in memory, and the result is written back as a payslip document. For a 52-person business, this is a single Cloud Function execution that completes in under 5 seconds. At 500+ employees, a BigQuery export layer is added for aggregate queries — the Firestore schema does not change.
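The in-memory aggregation step can be sketched as follows. The record fields and the flat daily-rate formula are placeholder assumptions; real payroll rules would come from the policy document:

```python
from dataclasses import dataclass


@dataclass
class Attendance:
    employee_id: str
    days_present: int


def compute_payslips(records: list[Attendance], daily_rate: float) -> dict[str, float]:
    """The Cloud Function step: read attendance docs from Firestore,
    aggregate in memory, return gross pay per employee. The result
    would be written back as payslip documents."""
    totals: dict[str, int] = {}
    for r in records:
        totals[r.employee_id] = totals.get(r.employee_id, 0) + r.days_present
    return {emp: days * daily_rate for emp, days in totals.items()}


payslips = compute_payslips(
    [Attendance("E1", 22), Attendance("E2", 20), Attendance("E1", 2)],
    daily_rate=500.0,
)
```

At 52 employees this is a trivial single-pass computation, which is why no SQL aggregation layer is needed in Phase 3.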
"Labour law requires data to be queryable for audits — can Firestore satisfy a labour inspector's data request?"
Yes. The audit log and leave register are fully queryable by employee ID, date range, and decision outcome. A labour inspection export is a Cloud Function that queries Firestore and generates a formatted PDF or Excel file. This is documented in the use case catalogue as a Phase 4 capability. The data model is specifically designed with statutory reporting fields (clause citations, decision timestamps, employee IDs) to support this export.
ADR-003 Accepted Agent orchestration 2026-Q1
LangGraph over Vertex AI Agent Builder
Agent orchestration is where the most complex logic in the system lives — state machines, tool calling, HITL routing, multi-turn conversation state. The choice of orchestration framework determines how much control the architecture has over every agent decision, and how auditable those decisions are.
Decision
Use LangGraph (Python, OSS, by LangChain) for all agent state machine orchestration. LangGraph provides explicit graph-based state machines where every node, edge, and transition condition is defined in code — not inferred by a managed service.

This gives the architecture complete control over the autonomy boundary: the HITL trigger on confidence < 0.80 is a graph edge condition evaluated in Python, not a prompt instruction to the LLM. The distinction matters enormously: a prompt instruction can be argued around by the model; a graph edge condition cannot.
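The distinction can be shown directly. A LangGraph conditional edge is driven by a plain Python routing function that returns the next node's name; the sketch below uses that pattern with illustrative node names rather than the system's actual graph:

```python
HITL_THRESHOLD = 0.80  # evaluated in code on every transition, never by the LLM


def route_after_decision(state: dict) -> str:
    """Routing function in the style LangGraph uses for conditional edges:
    given the current state, return the name of the next node."""
    if state["confidence"] < HITL_THRESHOLD:
        return "HITL_PENDING"  # unconditional escalation below threshold
    return "AUTO_DECIDE"


# No prompt wording can alter this path: confidence 0.79 always escalates.
next_node = route_after_decision({"confidence": 0.79})
```

The model can influence the confidence score it reports, but not the comparison against the threshold, which is the property the ADR depends on.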
Alternatives considered
Vertex AI Agent Builder. Google's managed agent platform. Provides built-in tool integration, managed state, and native GCP service bindings. Simpler to set up than LangGraph for standard use cases.

Rejected because: (1) The autonomy boundary cannot be hard-coded in Vertex AI Agent Builder — it is a prompt-level instruction, not a state machine constraint. For a system making employment decisions, this is unacceptable. (2) Vertex AI Agent Builder costs money per conversation — at SMB scale, comparable to Cloud Run costs but with less control. (3) State machine visibility for audit purposes is limited — LangGraph's explicit graph definition means every state transition is inspectable and loggable.

Google ADK (Agent Development Kit). Considered. ADK is more structured than Vertex AI Agent Builder but still abstracts the state machine. LangGraph gives lower-level control for the same Python effort.
Positive consequences
HITL trigger is a graph edge condition — cannot be overridden by LLM reasoning
Every state transition is logged — full agent reasoning audit trail
Zero managed agent cost — runs in existing Cloud Run containers
State machine definition is version-controlled alongside application code
Negative consequences
More code to maintain than a managed agent platform
LangGraph is an OSS project — breaking changes require active version management
No built-in GCP service bindings — each tool integration is manually implemented
Mitigations
LangGraph version pinned in requirements.txt — upgrades are deliberate, tested events
Tool implementations are thin wrappers around GCP SDK calls — minimal custom code per tool
Migration to ADK or Vertex AI Agent Builder possible if LangGraph is abandoned — state machine logic maps directly
Design review pushbacks
"Google ADK is the canonical GCP agent framework — why not use it? You're already committed to GCP."
ADK is an excellent framework for agent tool orchestration. It does not provide the explicit state machine semantics that this architecture requires. The HITL trigger — confidence < 0.80 fires an unconditional transition to HITL_PENDING — is a hard architectural constraint, not a soft agent behaviour. LangGraph's graph model makes this constraint explicit in code. ADK would implement it as a tool call condition or a prompt instruction. For an employment decision system, the difference matters legally.
"LangGraph is a startup-era OSS project. What happens if LangChain Inc. pivots or discontinues it?"
LangGraph is MIT-licensed. The codebase is forkable and the state machine primitives (nodes, edges, conditional routing, persistent state via checkpointers) are stable, well-documented abstractions. If LangChain discontinued LangGraph tomorrow, the existing code continues to work unchanged. The migration surface to an alternative (ADK, custom state machine) is the ~200 lines of graph definition code per agent — not the tool implementations, not the Firestore schema, not the Pub/Sub events.
ADR-004 Accepted Integration architecture 2026-Q1
Pub/Sub event fabric over direct service calls
In a multi-agent system, services need to communicate. The communication pattern determines whether the system is brittle (synchronous, tightly coupled) or resilient (asynchronous, independently deployable). For a system processing voice calls and WhatsApp messages, where individual steps have variable latency, asynchronous decoupling is not optional.
Decision
Use Cloud Pub/Sub as the event fabric connecting all services. Every pipeline stage communicates by publishing to a topic and subscribing to topics — never by direct HTTP call between services.

Five topics: INBOUND_MSG, STT_RESULT, AGENT_EVENT, HITL_ESCALATION, NOTIFICATION_OUT. Each topic has a dead-letter queue. Message retention is 7 days for replay on failure.

The gateway is the only service with a public HTTPS endpoint. Every other service is internal-only, reachable only via Pub/Sub push subscriptions.
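The publish-then-acknowledge shape of the gateway can be sketched with an in-process queue standing in for the INBOUND_MSG topic. Handler signature and payload fields are illustrative assumptions:

```python
import json
import queue

topic_inbound = queue.Queue()  # stand-in for the INBOUND_MSG Pub/Sub topic


def webhook_handler(payload: dict) -> tuple[int, str]:
    """Gateway behaviour: validate, publish, return 200 immediately.
    Transcription and agent work happen on downstream subscribers,
    so Meta's 5-second webhook deadline is never at risk."""
    topic_inbound.put(json.dumps(payload))
    return 200, "ok"


status, _ = webhook_handler({"from": "+919000000001", "type": "audio"})
```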
Alternatives considered
Direct HTTP between services. Simplest approach. Each service calls the next directly via Cloud Run internal URL. No Pub/Sub infrastructure. Easier to reason about synchronously.

Rejected because: (1) A direct HTTP call from the webhook gateway to the STT service means the gateway waits 4 seconds for transcription to complete before responding to Meta's webhook. Meta requires a 200 response within 5 seconds — tight, and breaks immediately if STT is under load. (2) A failed STT call fails the entire pipeline with no retry mechanism. (3) Tightly coupled services cannot be deployed or scaled independently.

Eventarc. GCP's managed event routing. Built on Pub/Sub under the hood. Adds trigger configuration complexity with no meaningful benefit over direct Pub/Sub for this use case.
Positive consequences
Webhook gateway responds to Meta in < 200ms — processing happens asynchronously downstream
Dead-letter queue provides automatic retry on any service failure
7-day message retention enables replay for debugging and recovery
Services can be deployed, scaled, and updated independently
Negative consequences
Distributed tracing is more complex — Cloud Trace correlation IDs must be propagated in message attributes
Local development requires a Pub/Sub emulator or test doubles
Message ordering not guaranteed by default — requires ordered delivery configuration for session-sensitive flows
Mitigations
Trace ID injected as Pub/Sub message attribute on every event — Cloud Trace correlates across topics
Pub/Sub emulator configured in docker-compose for local development
Ordered delivery enabled on session-sensitive subscriptions (leave agent conversation state)
Design review pushbacks
"Adding Pub/Sub to a simple request/response flow is over-engineering for an SMB HR system."
The flow is not simple request/response. It is: receive audio → transcribe (2–4s) → classify intent (1s) → query DB (0.6s) → query RAG (1.3s) → decide → write → notify. Total: 6–11 seconds of processing behind a Meta webhook that requires a 200 response in 5 seconds. Pub/Sub is not over-engineering — it is the minimum viable architecture for meeting the channel's SLA. A direct HTTP chain would fail Meta's webhook timeout on any interaction involving STT.
"Message duplication in Pub/Sub could cause duplicate leave approvals."
Idempotency is implemented at the Leave Agent layer. Every leave request is keyed by employee_id + date + leave_type. Before writing a new leave record, the agent checks for an existing record with the same key. A duplicate Pub/Sub delivery triggers a duplicate check — the second invocation finds the existing record and returns the already-committed decision without creating a new one. The audit log entry for the duplicate is flagged as a replay.
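The duplicate-delivery check described above can be sketched as a keyed lookup. The dict stands in for existing Firestore leave records; function names are illustrative:

```python
def idempotency_key(employee_id: str, date: str, leave_type: str) -> str:
    # The ADR's key: employee_id + date + leave_type
    return f"{employee_id}|{date}|{leave_type}"


committed: dict[str, str] = {}  # stand-in for already-written leave records


def handle_delivery(employee_id: str, date: str, leave_type: str, decision: str) -> str:
    """A duplicate Pub/Sub delivery finds the existing record and returns
    the already-committed decision without creating a new one."""
    key = idempotency_key(employee_id, date, leave_type)
    if key in committed:
        return committed[key]  # replay: flagged in the audit log, no new write
    committed[key] = decision
    return decision


first = handle_delivery("E1", "2026-03-02", "casual", "APPROVED")
replay = handle_delivery("E1", "2026-03-02", "casual", "DENIED")  # duplicate delivery
```

The second invocation returns the first decision regardless of its own input, so at-least-once delivery cannot produce a second approval.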
ADR-005 Accepted Compute 2026-Q1
Cloud Run over GKE for all compute
The choice between Cloud Run and GKE is fundamentally a question of operational complexity vs. control. At SMB scale, the right answer is not always the most powerful one.
Decision
Use Cloud Run for all compute — application containers, STT service, agent containers. No GKE cluster.

Cloud Run provides: scale-to-zero (zero idle cost), managed TLS, automatic scaling, built-in Pub/Sub push subscription support, and 99.95% SLA. The entire infrastructure is managed — no node pools, no pod disruption budgets, no cluster upgrades to manage.

At a 50-employee business generating 500 interactions/month, an always-on GKE cluster would cost more per month than the entire rest of the infrastructure combined.
Alternatives considered
GKE Autopilot. Managed Kubernetes. Provides more control over networking, sidecar injection, resource limits, and multi-container pod patterns. Required for workloads with GPU node pool requirements at scale, custom networking, or service mesh (Istio).

Rejected at current scale because: GKE Autopilot minimum cost ~$74/month for the control plane alone — exceeding our total infrastructure budget. No GPU node pools available in Autopilot (required for Whisper at scale, though T4 spot on Cloud Run covers SMB).

GKE Standard. Even more expensive. Full cluster management burden. Not appropriate for SMB scale.

Cloud Run functions (gen 2). Considered for notification dispatcher and audit log writer. Selected over standard Cloud Run for these lightweight, event-triggered functions due to simpler deployment model.
Positive consequences
Zero idle cost — scale-to-zero eliminates the largest infrastructure expense
No cluster management — no node upgrades, no pod scheduling, no etcd backups
99.95% SLA with zero operational overhead
GPU T4 spot instances available on Cloud Run for Whisper
Negative consequences
Cold start latency (mitigated by min-instance config on critical services)
No sidecar containers — each service must bundle all dependencies
At 500+ employees with sustained load, GKE may become more cost-effective
Migration trigger to GKE
When sustained concurrent requests exceed Cloud Run's concurrency-per-instance limits
When Whisper requires dedicated GPU node pool (not available on Cloud Run spot)
When monthly Cloud Run cost exceeds GKE Autopilot TCO — estimated at 300+ employees
Design review pushbacks
"Cloud Run cold starts on the GPU STT service will cause unacceptable IVR latency."
The STT service has min-instances: 1 configured in Terraform. One warm instance is kept always-on at a cost of approximately $3.20/month (1 vCPU + 16Gi memory idle rate on Cloud Run). This eliminates cold starts entirely on the critical IVR path. The 10-second GPU cold start only occurs if the min-instance is recycled — Cloud Run keeps min-instances warm indefinitely. P99 STT latency with min-instance warm: consistently < 4 seconds.
"A GCP customer already running GKE would expect this to run on their existing cluster."
Valid consideration for an enterprise deployment. The containerised architecture is cluster-agnostic. All services are standard Docker containers. Deploying to a customer's GKE cluster requires Helm chart definitions for each service (Deployment, Service, HPA) — approximately 2 days of work per service. The Terraform modules would be replaced with Helm charts. Application code is unchanged. This migration path is explicitly documented in the deployment guide.
ADR-006 Accepted Voice infrastructure 2026-Q1
Exotel over Twilio for voice gateway
Voice call infrastructure is the highest per-interaction cost component in markets where IVR is heavily used. For a system deployed in India and other South/Southeast Asian markets, the choice of voice provider has a significant impact on per-minute cost and call quality.
Decision
Use Exotel (India) or Plivo as the IVR voice gateway for India and Southeast Asia deployments. For other regions, substitute with the equivalent local SIP provider.

Exotel operates India-local SIP infrastructure with data centres in Mumbai and Chennai. Per-minute rate: approximately ₹0.30–0.40/min (~$0.004/min) vs Twilio's $0.013/min — a 68% cost reduction on voice calls. Exotel routes over the local PSTN, reducing latency for Indian callers. The IVR layer in the architecture is provider-agnostic — substituting Exotel with another SIP provider requires only a gateway configuration change.
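The 68% figure can be reproduced from the quoted rates. The INR/USD rate of 83 is an assumption for illustration; actual savings depend on the negotiated Exotel rate and the prevailing exchange rate:

```python
# Hedged check of the "~68% cost reduction" claim.
exotel_inr_per_min = 0.35   # midpoint of the quoted Rs 0.30-0.40 range
twilio_usd_per_min = 0.013  # Twilio's quoted per-minute rate
inr_per_usd = 83.0          # assumed exchange rate

exotel_usd_per_min = exotel_inr_per_min / inr_per_usd   # ~$0.0042/min
savings = 1 - exotel_usd_per_min / twilio_usd_per_min   # ~0.68
```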
Alternatives considered
Twilio Voice. The default choice for programmable voice. Excellent documentation, global coverage, reliable SLA, strong developer experience. Used by the majority of voice-AI startups globally.

Rejected for India-primary deployments because: (1) US-billed international leg adds latency and cost. (2) $0.013/min vs ₹0.35/min — Twilio costs 3× more for the same call. (3) No India data centre — call audio routes internationally before reaching the STT service, adding 80–120ms round-trip latency.

BSNL SIP Trunk. Considered. Cheapest possible option for India. Rejected: requires physical SIP hardware configuration, no API management interface, no programmable IVR capability without additional infrastructure.
Positive consequences
68% cost reduction on voice calls vs Twilio
India-local PSTN routing — lower latency for Indian callers
TRAI compliance handled natively by Exotel
Local DID numbers across all Indian states
Negative consequences
Exotel pricing not publicly documented as standard per-minute rate — requires sales engagement for exact pricing
Less mature international coverage — deployments outside India, SEA, MEA require provider substitution
Smaller developer ecosystem than Twilio — fewer third-party integrations
Provider substitution by region
Africa: Africa's Talking (Kenya, Nigeria) — local PSTN, competitive rates
Southeast Asia: Plivo with Singapore routing
Latin America: Twilio (no viable local alternative at comparable quality)
Europe/US: Twilio — cost differential less significant, developer ecosystem value higher
Design review pushbacks
"Exotel doesn't publish per-minute rates — how can you claim 68% savings if the pricing isn't verifiable?"
This is a fair challenge and is explicitly acknowledged in the cost methodology page of this portfolio. The ₹0.30–0.40/min estimate is derived from Exotel's credit-based plan structure and third-party comparisons, not from a published rate card. The cost methodology page documents this caveat and directs readers to request a direct quote from Exotel for production pricing. The 68% savings claim is directional — exact savings depend on call volume tier and negotiated rates.
"For a global product, defaulting to an India-only provider creates a two-tier architecture."
The architecture is explicitly designed for this. The IVR gateway is a replaceable module behind a provider-agnostic interface: initiate_call(number), stream_audio(session_id), end_call(session_id). Swapping Exotel for Twilio for a non-India deployment is a gateway config change — one environment variable and one provider SDK import. The cost methodology page shows region-specific provider recommendations. A global enterprise deployment would use Twilio globally and Exotel for India — the architecture supports both simultaneously.
ADR-007 Accepted HITL policy 2026-Q1
HITL timeout: 24-hour auto-deny
When the system escalates to the owner and the owner does not respond, the system must eventually resolve the escalation. The timeout policy is not a technical decision — it is a policy decision with legal consequences. It must be configurable, documented, and its default must reflect the least harm to the worker.
Decision
If an owner does not respond to a HITL escalation within 24 hours, the leave request is automatically denied. The worker is notified that the request could not be processed within the required window and that they should contact the owner directly.

At 4 hours, a re-escalation with an URGENT flag is sent. The 24-hour and 4-hour thresholds are configurable parameters in the HR Policy PDF — an owner can set different values in their policy document, and the system respects those values via the RAG configuration parser.

The timeout behaviour is written to the audit log: decision_by: TIMEOUT, reason: "Owner did not respond within configured timeout period".
Alternatives considered
Auto-approve on timeout. Argued that an unanswered escalation implies the owner has no objection. Rejected: auto-approval on silence creates legal liability. If a worker is approved for leave during a period where coverage falls below statutory minimums, and the approval was granted by a system timeout, the employer has no defence. Silence is not consent in employment law.

Keep escalation open indefinitely. No timeout. The worker's request remains pending until the owner responds. Rejected: a worker waiting indefinitely for a leave decision is a worse employee experience than a denial with a clear explanation and an instruction to escalate directly. Indefinite pending also breaks the system's SLA guarantee.

48-hour timeout. Considered. 24 hours is sufficient for an owner to respond to a WhatsApp message. 48 hours introduces a weekend problem — a Friday escalation would auto-deny on Sunday, which may not be appropriate. 24 hours is both more responsive and more consistent.
Positive consequences
Worker receives a definitive response within 24 hours in all cases
System SLA is upheld — no interaction is left permanently unresolved
Audit log records the timeout as a decision event — legally defensible
Negative consequences
A worker may be denied leave they were entitled to because the owner was unreachable
Weekend timing means a Friday evening escalation auto-denies Saturday evening
Configurable threshold means policy document must be clear — a poorly written policy could set timeout to 1 hour
Mitigations
Denial message instructs worker to contact owner directly — the denial is not final if the owner intervenes
Owner can reverse a timeout denial via the dashboard with a manual approval — full audit trail preserved
Sample HR Policy PDF sets 24-hour default with explanation of implications
Design review pushbacks
"Auto-denying a leave request because the owner didn't check WhatsApp is unfair to the worker."
The fairness question is real. The alternative — leaving a request pending indefinitely — is worse for the worker, not better. A worker who needs to plan around a leave decision cannot plan around a pending status. A denial with a clear explanation ("the system could not reach your employer within the required window — please contact them directly") gives the worker agency. An indefinite pending status gives them nothing. The owner retains the ability to reverse the denial at any point after the fact.
"This should be configurable per worker type, not a global setting."
It is configurable — via the HR Policy PDF. The policy document can specify different timeout thresholds for different leave types or worker categories. The RAG configuration parser extracts these values at indexing time. A policy that says "HITL timeout for casual leave: 12 hours; for sick leave: 4 hours; for emergency leave: 1 hour" is fully supported. The 24-hour global default in the sample policy is a reasonable starting point that the employer can adjust.
ADR-008 Accepted Identity & security 2026-Q1
Mobile number as sole employee identity
For a workforce with no company email addresses and no corporate device access, the conventional identity stack (SSO, LDAP, SAML, JWT) is not viable. The identity mechanism must be something the worker already has, already trusts, and cannot lose or forget.
Decision
The WhatsApp sender phone number, verified by Meta's API on every inbound message, is the employee's identity. The IVR caller ID, verified by Exotel, is the identity on voice calls. No username, no password, no app login, no OTP is required or offered.

On onboarding, the worker sends one WhatsApp message to register. Their phone number is stored in Firestore /employees/{id} as the mobile field. Every subsequent interaction is authenticated by matching the sender number against registered employees. The telco and Meta have already performed KYC on this number — the system inherits that verification.
Alternatives considered
WhatsApp OTP verification. Send a one-time passcode to the worker's number for each session. Adds a second factor. Rejected: the phone number IS the second factor. A WhatsApp OTP sent to the same phone number adds no security — an attacker with access to the phone can also receive the OTP. It adds friction without adding security for this threat model.

Employee ID + PIN. Worker provides their employee ID and a 4-digit PIN. Rejected: a manufacturing worker who forgets their PIN has no self-service reset mechanism. Support overhead on PIN resets would be significant. The threat model for a 52-person textile business does not justify this friction.

Biometric via smartphone. Fingerprint or face unlock before WhatsApp session. Rejected: Android fragmentation makes biometric API reliability inconsistent across the low-to-mid-range devices used by this workforce.
Positive consequences
Zero authentication friction — worker sends a message, system knows who they are
No password resets, no locked accounts, no support overhead
Meta and telco KYC inherited — number is tied to a verified identity
Number change handled via a simple re-registration message to the owner
Negative consequences
SIM swap attack: attacker obtains worker's number via telco — can impersonate worker
Shared phone: if a worker shares their phone, household member could apply for leave
Number change not self-service — requires owner to update the employee record
Risk acceptance
SIM swap accepted: affects < 0.01% of interactions at SMB scale; mitigated by audit log and HITL for suspicious patterns
Shared phone accepted: the same household member who could apply for leave could also walk into the factory and tell the supervisor — not a novel attack vector
Number change: documented process in the owner onboarding guide
Design review pushbacks
"SIM swap is a real and growing attack vector in India. This is not an acceptable risk for a payroll system."
The risk is real but the threat model is calibrated. SIM swap attacks target financial accounts with large balances, not HR systems at 52-person textile businesses. The incentive structure for a sophisticated SIM swap attack to falsely approve leave at Rathi Textiles is essentially zero. The audit trail records every decision with a timestamp and the sender number — an anomalous pattern (multiple leave applications in rapid succession, unusual dates) triggers HITL review. For Phase 3 payroll, the threat model is reassessed — payroll disbursement may warrant stronger identity controls for the payment trigger specifically.
"GDPR and DPDP require explicit consent for processing personal data — does a phone number registration constitute that?"
Yes, with proper implementation. The onboarding flow presents a consent message in the worker's detected language before their number is registered. The consent records the purpose of data processing (HR management, leave tracking, payroll), the data categories collected (mobile number, leave records), and the retention period. This consent record is stored in Firestore alongside the employee record. The sample HR Policy PDF includes a data processing notice that satisfies DPDP Act 2023 requirements for employer-employee HR data.