The agent tool manifests on Page 05 reference five ML models by name. This page specifies each of them in full: feature engineering, model architecture, SHAP explanation contracts, the Vertex AI Pipelines DAG, drift detection, and complete Model Cards. Every model satisfies the EU AI Act Article 11 documentation requirements before it ships.
All five AE models share a common Vertex AI platform — one Feature Store, one Model Registry, one Pipelines infrastructure, one monitoring stack. The shared platform means that MLOps patterns proven on RevRec AI are inherited by every subsequent model. The HITL-11 promotion checkpoint is a platform-level gate, not a model-specific configuration.
The Vertex AI Feature Store is the single source of truth for all ML features across the AE. Features are computed once and served to all models that need them — no duplicated feature logic, no inconsistency between training and serving. Every feature value carries a lineage tag: source system, ingestion timestamp, schema version, and quality score.
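As an illustration of what such a tag might carry, here is a minimal sketch of the lineage record described above. The field names follow the text, but the concrete Feature Store metadata schema, the `FeatureLineage` and `is_servable` names, and the 0.9 quality cut-off are assumptions, not the platform's actual API:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class FeatureLineage:
    """Lineage tag carried by every feature value (illustrative schema;
    the concrete Feature Store metadata format is an assumption)."""
    source_system: str      # originating system, e.g. ERP or telemetry
    ingested_at: datetime   # ingestion timestamp, UTC
    schema_version: str     # version of the feature schema at ingestion
    quality_score: float    # 0.0 to 1.0 data-quality score

def is_servable(tag: FeatureLineage, min_quality: float = 0.9) -> bool:
    """Gate serving on the quality score embedded in the lineage tag
    (the 0.9 default is a hypothetical policy, not a platform value)."""
    return tag.quality_score >= min_quality
```

A tag like this travels with the feature value, so the same quality gate applies identically at training and serving time.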
Each model specification covers the problem framing, feature inputs, architecture choice, training data, evaluation metrics, and SHAP explanation contract. Architecture choices are cross-referenced to ADRs — no undocumented decisions.
These are the gaps a principal ML engineer probes in a design review. Each decision below is documented because its absence would read as an oversight. None of these are afterthoughts; they shaped the design from the start.
The Vertex AI Pipelines DAG for RevRec AI is the canonical MLOps pipeline for the AE. Every other model's pipeline follows the same structure with model-specific steps. The HITL-11 promotion gate is the step that makes this pipeline EU AI Act-compliant — no model version reaches production without a human reviewer approving the Model Card diff and evaluation results.
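The gate's logic can be sketched as a plain function, independent of any pipeline framework. `PromotionRequest` and `hitl11_gate` are illustrative names and the approval payload is an assumption; in production this is a Vertex AI Pipelines step, not standalone code:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PromotionRequest:
    """What HITL-11 reviews before a model version can ship (sketch)."""
    model_version: str
    card_diff_reviewed: bool   # reviewer has read the Model Card diff
    eval_passed: bool          # evaluation metrics met their thresholds
    approver: Optional[str]    # human reviewer id; None until approved

def hitl11_gate(req: PromotionRequest) -> str:
    """Registry stage for a version: production only when a human has
    approved both the Model Card diff and the evaluation results."""
    if req.approver and req.card_diff_reviewed and req.eval_passed:
        return "production"
    return "staging"  # held back until HITL-11 approval completes
```

Because the gate is a platform-level step, every model inherits it without per-model configuration, which is the property the text emphasises.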
Model degradation in production is not a monitoring problem — it is an architecture problem. Drift detection is designed into the platform from day one: three types of drift, each with a detection method, an alert threshold, and a response that routes through HITL before any automated action executes.
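One common data-drift detector that fits this design is the Population Stability Index, comparing a feature's live distribution against its training-time distribution. The sketch below is a minimal stdlib implementation; the PSI > 0.2 alert rule mentioned in the docstring is a widely used rule of thumb, not a threshold taken from this platform:

```python
import math

def psi(expected: list, actual: list, bins: int = 10) -> float:
    """Population Stability Index between a training-time ("expected")
    and a live ("actual") feature sample. A common rule of thumb (an
    assumption here, not a platform setting): PSI > 0.2 is drift worth
    routing to a human reviewer."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0   # degenerate range falls back to 1
    def histogram(xs):
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / width), bins - 1)  # clamp overflow
            counts[max(i, 0)] += 1                    # clamp underflow
        # floor at a small epsilon so empty bins never produce log(0)
        return [max(c / len(xs), 1e-6) for c in counts]
    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

An identical live distribution yields PSI 0; a shifted one pushes PSI well past the alert level, which is what makes the metric usable as a simple gate before HITL routing.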
Every AE model has a full Model Card — created before training begins, updated with actual evaluation results before promotion, and versioned alongside the model in Vertex AI Model Registry. The Model Card is the primary input to HITL-11 and the evidence package for EU AI Act Article 11 compliance. Full cards for all five models are shown below.
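The HITL-11 reviewer approves a Model Card diff, which at its simplest is the set of fields that changed between two card versions. A minimal sketch; the flat-dict card structure is an assumption, real cards are richer:

```python
def card_diff(old: dict, new: dict) -> dict:
    """Fields that changed between two Model Card versions, the diff a
    HITL-11 reviewer approves before promotion (illustrative structure).
    Each changed key maps to an (old_value, new_value) pair."""
    keys = set(old) | set(new)
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}
```

Versioning the card alongside the model in the Model Registry is what makes this diff computable at promotion time.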
| Primary users | Finance Controller · CFO (via HITL-04 queue) |
| Deployment environment | ClaraVis GCP project · europe-west3 · VPC-SC perimeter |
| Data residency | All inference data stays within the EU boundary. CMEK encryption. |
| Training period | 2019-01 to 2025-12 |
| Records | 4,800 contracts · 18 features per record |
| Label source | Finance team manual classification + HITL override history |
| Known gaps | Limited data for contract values above €5M. Performance degrades at upper tail. |
| Weighted F1 | 0.94 (test set) |
| MULTI-ELEMENT Recall | 0.91 |
| MULTI-ELEMENT Precision | 0.89 |
| Expected Calibration Error | 0.032 (well-calibrated) |
| Baseline comparison | +0.03 F1 improvement over v2.0 |
| HITL override rate (30-day) | 8.2% (within threshold) |
| Hospital Tier 1–3 | F1 = 0.94–0.96 · No disparity |
| Hospital Tier 4 | F1 = 0.88 · Flagged for monitoring |
| EU contracts | F1 = 0.95 |
| Non-EU contracts | F1 = 0.89 · Limited training data |
| EU AI Act Art. 11 | ✓ Technical documentation complete |
| EU AI Act Art. 13 | ✓ Transparency — SHAP per inference |
| EU AI Act Art. 14 | ✓ HITL-04 mandatory for all classifications |
| HITL-11 approval | ✓ Approved 2026-02-14 by ML Lead |
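The card above reports an Expected Calibration Error of 0.032. For reference, ECE is the bin-weighted gap between predicted confidence and observed accuracy; a minimal sketch of the standard computation (10 equal-width bins are a conventional choice, not a platform setting):

```python
def expected_calibration_error(confidences: list, correct: list, bins: int = 10) -> float:
    """ECE: for each confidence bin, take |mean confidence - accuracy|,
    then average the gaps weighted by bin population (standard definition)."""
    total = len(confidences)
    ece = 0.0
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == lo)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += len(idx) / total * abs(avg_conf - accuracy)
    return ece
```

A well-calibrated model keeps this gap small in every bin, which is why a single ECE of 0.032 can be summarised as "well-calibrated".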
| Intended use | Decision-support for planned maintenance scheduling. Work orders above 0.82 confidence created autonomously. Below threshold: HITL-06. |
| Training data | 3yr telemetry · 8,400 unit-quarters · actual failure events as labels · censored survival data handled |
| MAE | 4.2 days · Recall@14d: 0.91 |
| Key limitation | Trained on MRI-7T and MRI-3T variants. CT-Premium performance lower (F1 = 0.84). Separate model in development. |
| Bias analysis | No significant regional performance disparity. Older units (age > 8yr) show lower recall (0.85) — flagged. |
| EU AI Act | Art. 11 ✓ · Art. 13 ✓ (SHAP sensor attribution) · Art. 14 ✓ (HITL-06) · HITL-11 approved 2026-01-22 |
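The Recall@14d figure above is interpreted here, as an assumption about the metric's definition, as the fraction of actual failures for which an alert fired with at least 14 days of lead time:

```python
def recall_at_lead(alert_day: list, fail_day: list, lead: int = 14) -> float:
    """Fraction of failures flagged with >= `lead` days of warning.
    `alert_day[i]` is the day the alert fired for unit i (None if no
    alert); `fail_day[i]` is the day the unit actually failed.
    The metric's exact definition is an assumption, not from the card."""
    caught = sum(1 for a, f in zip(alert_day, fail_day)
                 if a is not None and f - a >= lead)
    return caught / len(fail_day)
```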
| Intended use | Unit-level anomaly detection for early warning. Fleet anomaly patterns (≥3 units) trigger HITL-07 to VP Field Service. |
| Training data | 18 months normal operation · 6,200 unit-months · unsupervised (no failure labels required) · contamination: 0.05 |
| Recall @ alert threshold | 0.82 · False Positive Rate: 0.04 |
| Key limitation | New sensor types from MRI-7T Gen 2 units not in training data. Alert threshold raised to 0.80 for Gen 2 units pending data collection. |
| Bias analysis | EMEA-North performance (FPR 0.03) vs APAC-East (FPR 0.07) — climate-driven sensor baseline differences. Regional baselines in roadmap. |
| EU AI Act | Art. 11 ✓ · Art. 13 ✓ (SHAP sensor) · Art. 14 ✓ (HITL-06/07) · HITL-11 approved 2026-01-22 |
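Both anomaly models are trained unsupervised with a contamination parameter. Contamination translates into a score threshold: the quantile above which roughly that fraction of training points fall. A stdlib sketch of that translation (the production models use a library implementation, so this is illustrative only):

```python
def contamination_threshold(scores: list, contamination: float) -> float:
    """Threshold such that roughly `contamination` of training scores
    are at or above it, i.e. the fraction of training data assumed
    anomalous. Sketch only; not the production detector's internals."""
    ranked = sorted(scores, reverse=True)
    k = max(1, int(round(contamination * len(ranked))))
    return ranked[k - 1]   # k-th highest score becomes the cut-off
```

This is why contamination is a design decision, not a tuning knob: it fixes the expected alert volume before any alert fires.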
| Intended use | Clause-level risk pre-screening to prioritise Legal review. Clauses above 0.65 route to HITL-02. Does not replace legal judgment. |
| Training data | 12,400 labelled clauses · 4,800 contracts · Legal team labels · HITL decision history · Gemini text-embedding-004 semantic features |
| High-Risk Recall | 0.95 · AUC-ROC: 0.96 · FNR: 0.05 |
| Key limitation | Limited training data for emerging AI-specific contract clauses (IP ownership of AI outputs, data training rights). Performance lower on these clause types. |
| Bias analysis | No significant disparity across counterparty type. Non-English contracts (via Gemini translation): precision 0.78 vs 0.82 English. Flagged. |
| EU AI Act | Art. 11 ✓ · Art. 13 ✓ (SHAP clause features) · Art. 14 ✓ (HITL-02/03) · HITL-11 approved 2026-01-30 |
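The card's High-Risk Recall (0.95) and FNR (0.05) are complements by definition, which makes the pair an easy internal consistency check; a tiny sketch:

```python
def recall_and_fnr(tp: int, fn: int) -> tuple:
    """Recall = TP / (TP + FN); FNR = FN / (TP + FN) = 1 - recall.
    For a pre-screening model, FNR is the fraction of genuinely
    high-risk clauses that would slip past the triage step."""
    recall = tp / (tp + fn)
    return recall, 1 - recall
```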
| Intended use | Real-time financial anomaly pre-screening. Medium alerts (≥0.65): FC notification. High severity (≥0.85): HITL-08 simultaneous CFO + FC. Never acts autonomously on financial events. |
| Training data | 24 months transactions · 48,000 events · unsupervised · 3 separate models per event_type · contamination: 0.03 |
| HITL Precision @ 0.85 | 0.91 · False Positive Rate @ 0.85: 0.03 |
| Key limitation | Trained on standard ClaraVis payment patterns. The first year in a new market will show an elevated false positive rate until the baseline accumulates 90 days of history. |
| Bias analysis | Performance consistent across account risk tiers 1–3. Tier 4 (small clinics with irregular payment patterns): FPR 0.08 vs overall 0.03. Separate baseline for Tier 4 in roadmap. |
| EU AI Act | Art. 11 ✓ · Art. 13 ✓ (SHAP financial features) · Art. 14 ✓ (HITL-08) · HITL-11 approved 2026-02-03 |
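The two-tier escalation in the intended-use row reduces to a threshold routing function. The thresholds below come from the card; the return strings are illustrative labels, not actual queue identifiers:

```python
def route_alert(score: float) -> str:
    """Map an anomaly score to the escalation path described in the
    card: >= 0.85 routes to HITL-08 (simultaneous CFO + Finance
    Controller), >= 0.65 notifies the Finance Controller, and anything
    lower raises no alert. The model never acts autonomously."""
    if score >= 0.85:
        return "HITL-08: CFO + FC"
    if score >= 0.65:
        return "FC notification"
    return "no alert"
```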
ADR-010 through ADR-012 cover the key ML architecture choices. Each decision was made after evaluating alternatives, and the reasoning is recorded here because the reasoning, not the conclusion, is what a principal ML engineer will probe.
The ML models on this page require a production-grade GCP infrastructure to run on. Page 07 designs that infrastructure — the Terraform IaC, VPC-SC security perimeter, GKE and Cloud Run topology, CI/CD pipeline, FinOps cost allocation, and GreenOps carbon-aware scheduling that make the entire AE system deployable, auditable, and operationally sound.