Page 09 · ML Engineering & MLOps
Three purpose-built models operationalize the finance intelligence layer — anomaly detection, cash forecasting, and invoice classification — unified by a single MLOps spine: continuous retraining, multi-signal concept-drift monitoring, and HITL override rates as one of several early-warning signals of model health.
System Architecture
Data flows from source systems through the Feature Store to three independent model inference paths, each gated by HITL queues before downstream automation.
Data flows from SAP/TMS/External sources through M-08 Data Governance and the Vertex AI Feature Store into three parallel model paths (IC Anomaly, Cash Forecast, Invoice Classifier). Each model is gated by a HITL review queue before automation. Override rates and PSI signals loop back to trigger retraining.
Model Specifications
Each model card documents architecture, features, thresholds, performance metrics, explainability method, and drift signals — stored alongside the model artifact in the Registry.
Model 01 · Anomaly Detection
anomaly_score is the XGBoost probability output — IsoForest acts as a computational gate, not a voting member. This separation preserves label-free coverage while leveraging supervision where labels exist.
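The gate-then-score split can be sketched as follows. This is a minimal illustration, not the production pipeline: a `LogisticRegression` stands in for the XGBoost scorer to keep the sketch dependency-light, and all features, data, and the contamination setting are synthetic assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression  # stand-in for XGBoost

rng = np.random.default_rng(42)

# Synthetic IC transaction-pair features; the real features are in the model card.
X_train = rng.normal(0, 1, (2000, 5))
y_train = (X_train[:, 0] + rng.normal(0, 0.5, 2000) > 2).astype(int)  # rare verified mismatches

gate = IsolationForest(contamination=0.05, random_state=0).fit(X_train)   # label-free
scorer = LogisticRegression().fit(X_train, y_train)                        # supervised

def score_pair(x: np.ndarray) -> float:
    """Gate-then-score: IsolationForest cheaply filters the bulk of normal
    pairs; only gated candidates receive a supervised anomaly_score."""
    x = x.reshape(1, -1)
    if gate.predict(x)[0] == 1:      # 1 = inlier -> passes the gate unscored
        return 0.0
    return float(scorer.predict_proba(x)[0, 1])  # anomaly_score
```

Because the gate never votes on the score itself, transactions with no labeled precedent still surface via the unsupervised path, while the supervised model sharpens ranking where audit labels exist.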
Inference Flow
Task
Score each intercompany transaction pair for mismatch probability, routing high-confidence anomalies to the HITL-IC-01 queue before any automated posting.
Training Data
36 months of IC transaction history (Veldtmann Group + anonymised industry comparables). Positive labels: verified mismatch cases from prior audit cycles.
Input Features
Performance Metrics
Model 02 · Time Series Forecasting
Ensemble Architecture & Forecast Flow
Task
Produce a 7-day rolling cash position forecast per legal entity and currency, with calibrated 90% prediction intervals to support liquidity management decisions.
Ensemble Design
LightGBM captures structural signals (scheduled payments, IC sweeps, FX moves). Prophet contributes calendar seasonality. Outputs are blended with rolling weight optimisation.
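A minimal sketch of the rolling weight optimisation, assuming simple inverse-MSE weighting over a trailing error window; the 28-day window and the weighting scheme are assumptions of this sketch, not the documented production optimiser.

```python
import numpy as np

def rolling_blend_weight(errors_lgbm: np.ndarray, errors_prophet: np.ndarray,
                         window: int = 28) -> float:
    """Inverse-MSE weighting over a trailing window: the ensemble member
    with the lower recent error receives the higher blend weight."""
    mse_l = np.mean(np.square(errors_lgbm[-window:]))
    mse_p = np.mean(np.square(errors_prophet[-window:]))
    return float((1 / mse_l) / (1 / mse_l + 1 / mse_p))

def blend(f_lgbm: float, f_prophet: float, w_lgbm: float) -> float:
    """Blended point forecast for one entity/currency/day."""
    return w_lgbm * f_lgbm + (1 - w_lgbm) * f_prophet
```

Re-fitting the weight on each rolling window lets the blend shift toward Prophet around calendar-driven periods and back toward LightGBM when structural signals dominate.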
Input Features
Performance Metrics
Model 03 · Multiclass Classification
Classification Flow & Output Classes
Task
Classify each incoming invoice into one of five exception categories or APPROVED. High-confidence APPROVED predictions bypass the AP queue; all others receive SHAP explanations for the reviewer.
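The routing rule reduces to a small decision function. A sketch under stated assumptions: the 0.95 confidence threshold and the `MISSING_PO` class name are illustrative, not production values.

```python
def route_invoice(pred_class: str, confidence: float, threshold: float = 0.95) -> str:
    """High-confidence APPROVED bypasses the AP queue; everything else is
    routed to HITL-INV-03, where a SHAP explanation accompanies the item.
    The 0.95 threshold is an illustrative assumption."""
    if pred_class == "APPROVED" and confidence >= threshold:
        return "AUTO_POST"
    return "HITL-INV-03"
```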
Output Classes
Input Features
Performance Metrics
Shared Infrastructure
A single operational spine supports all three models — from weekly retraining on GreenOps schedules to per-model drift alerting in BigQuery.
Vertex AI Pipelines
All three models retrain on a weekly cadence via Vertex AI managed pipelines. Job scheduling is GreenOps-aware, with a ±6-hour flex window to target low-carbon compute availability.
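GreenOps-aware slot selection can be sketched as below, assuming an hourly grid carbon-intensity forecast is available; the source of that forecast is outside this sketch.

```python
from datetime import datetime, timedelta

def pick_run_slot(nominal: datetime, carbon_forecast: dict,
                  flex_hours: int = 6) -> datetime:
    """Pick the lowest-carbon hour within the +/-6h flex window.
    carbon_forecast maps hourly datetimes to grid intensity (gCO2/kWh)."""
    candidates = (nominal + timedelta(hours=h)
                  for h in range(-flex_hours, flex_hours + 1))
    return min((c for c in candidates if c in carbon_forecast),
               key=lambda c: carbon_forecast[c])
```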
Model Registry & Canary Deploy
Every trained artifact is version-pinned in the Vertex Model Registry with a data hash and pipeline run ID. Deployment uses a canary pattern: 10% traffic for 48 hours, then full promotion — or automatic rollback to the pinned champion if the override rate exceeds 1.5× baseline.
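The promote/rollback gate can be expressed as one decision function. The 48-hour window and 1.5× multiplier come from the text above; everything else is a sketch.

```python
def canary_decision(canary_override_rate: float, baseline_override_rate: float,
                    hours_elapsed: float) -> str:
    """Canary gate: roll back immediately if the canary's HITL override rate
    exceeds 1.5x the pinned champion's baseline; promote after 48h."""
    if canary_override_rate > 1.5 * baseline_override_rate:
        return "ROLLBACK"
    if hours_elapsed >= 48:
        return "PROMOTE"
    return "HOLD"  # remain at 10% canary traffic
```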
Multi-Signal Drift Monitoring
Drift is detected via three concurrent signals: Population Stability Index (PSI) on input features (a BigQuery scheduled query that triggers retraining when PSI > 0.2 on any key feature), prediction-confidence distribution shift (Cloud Monitoring), and the HITL override rate as an early behavioural proxy. A rising override rate alone does not trigger retraining; it is cross-checked against PSI before a retrain is launched.
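The PSI trigger fits in a few lines. In this sketch, quantile bins derived from the baseline and a small floor to avoid log(0) are implementation choices, not documented production settings.

```python
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index of one feature: `expected` is the training
    baseline, `actual` the live serving sample. PSI > 0.2 triggers retraining."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf          # catch values outside the baseline range
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)  # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))
```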
Training-Serving Skew Detection
Training-serving skew is validated post-deployment: offline training feature statistics, computed at pipeline compile time and stored as TFDV schema artefacts in GCS alongside each model version, are compared against live Feature Store serving statistics on each inference batch. Schema violations block the canary promotion step and raise a Cloud Monitoring alert before any further traffic shift.
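A simplified stand-in for the TFDV comparison, using a z-test on feature means rather than full schema validation; the feature name and tolerance are illustrative assumptions.

```python
import numpy as np

def skew_check(train_stats: dict, serving_batch: dict, z_tol: float = 3.0) -> list:
    """Simplified stand-in for the TFDV schema check: flag any feature whose
    serving-batch mean sits more than z_tol standard errors from the
    training-time mean. A non-empty result blocks canary promotion."""
    violations = []
    for feature, (mu, sigma) in train_stats.items():
        vals = np.asarray(serving_batch[feature], dtype=float)
        stderr = sigma / np.sqrt(len(vals))
        if abs(vals.mean() - mu) > z_tol * stderr:
            violations.append(feature)
    return violations
```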
Feature Governance
All features are validated by the Data Governance process (M-08) before any write to the Feature Store. Lineage tags link each feature to its source system and governance approval record.
EU AI Act Article 13 compliance: SHAP TreeExplainer is applied deterministically to every IC Anomaly score above the 0.72 threshold and to every non-APPROVED Invoice Classifier output. Explanation payloads are stored in the HITL audit trail alongside the original model inputs, prediction confidence, and reviewer decision — satisfying transparency and logging obligations for high-risk automated financial decisions.
| Art. 13 Obligation | Architectural Component | Status |
|---|---|---|
| Transparency of operation | SHAP TreeExplainer payloads stored in HITL audit trail (BigQuery) | ✓ MET |
| Human oversight capability | HITL queues HITL-IC-01 / HITL-CF-02 / HITL-INV-03 for all flagged decisions | ✓ MET |
| Logging of operations | All model inputs, outputs, confidence scores, and reviewer decisions logged to ae_finance.hitl_audit | ✓ MET |
| Intended purpose disclosure | System cards stored in Vertex Model Registry per artifact, documenting scope, known limitations, and deployment constraints | ✓ MET |
| Instructions for use (deployers) | Threshold configuration, HITL escalation paths, and override handling documented in runbook stored alongside each Model Registry entry | ✓ MET |
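For concreteness, one row of the ae_finance.hitl_audit log might take the shape below; the table name comes from the text, but field names beyond those listed above are assumptions of this sketch.

```python
from datetime import datetime, timezone

def audit_record(model_id, inputs, prediction, confidence,
                 shap_payload, reviewer_decision) -> dict:
    """Illustrative shape of one ae_finance.hitl_audit row: model inputs,
    output, confidence, SHAP payload, and reviewer decision, timestamped
    to satisfy the Art. 13 logging obligation. Field names are assumptions."""
    return {
        "model_id": model_id,
        "logged_at": datetime.now(timezone.utc).isoformat(),
        "inputs": inputs,
        "prediction": prediction,
        "confidence": confidence,
        "shap_payload": shap_payload,
        "reviewer_decision": reviewer_decision,
    }
```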