Page 09 · ML Engineering & MLOps

Machine Learning
Infrastructure & Models

Three purpose-built models operationalize the finance intelligence layer — anomaly detection, cash forecasting, and invoice classification — unified by a single MLOps spine: continuous retraining, multi-signal concept-drift monitoring, and HITL override rates as one of several early-warning signals of model health.

Models 3 production
Platform Vertex AI
Retraining Weekly · GreenOps
Compliance EU AI Act Art. 13

End-to-End ML Pipeline

Data flows from source systems through the Feature Store to three independent model inference paths, each gated by HITL queues before downstream automation.

[Diagram] Source systems (SAP · NetSuite · TMS), external feeds (FX rates · calendar), and 36 months of IC transaction history pass through M-08 Data Governance (schema validation · lineage tagging · PII masking · TFDV skew check on feature serve) into the Vertex AI Feature Store (online serving · batch export · time-travel). Three model paths follow: IC Anomaly (IsoForest → XGBoost, score > 0.72 → HITL-IC-01), Cash Forecast (LightGBM + Prophet, 7-day 90% PI, MAPE > 8% alert), and Invoice Exception (XGBoost multiclass, APPROVED > 0.92 → auto-approve). HITL queues (HITL-IC-01 · HITL-CF-02 · HITL-INV-03) gate downstream automation (IC posting agents · treasury liquidity dashboard · AP auto-approve/queue workflow). Override rates — one of several drift signals — are written to BigQuery ae_finance.model_drift, where PSI triggers retraining via the weekly Vertex AI Pipelines cycle (GreenOps ±6h carbon window · Model Registry · version pinning).


Data flows from SAP/TMS/External sources through M-08 Data Governance and the Vertex AI Feature Store into three parallel model paths (IC Anomaly, Cash Forecast, Invoice Classifier). Each model is gated by a HITL review queue before automation. Override rates and PSI signals loop back to trigger retraining.

Three Production Models

Each model card documents architecture, features, thresholds, performance metrics, explainability method, and drift signals — stored alongside the model artifact in the Registry.

Model 01 · Anomaly Detection

IC Anomaly Detector

Two-Stage: IsoForest → XGBoost
Two-stage architecture: IsolationForest (unsupervised) generates an outlier score for every IC transaction pair. Pairs scoring above a pre-filter threshold (0.50) are passed as candidates to a supervised XGBoost classifier, trained on verified mismatch labels from prior audit cycles. The final anomaly_score is the XGBoost probability output — IsoForest acts as a computational gate, not a voting member. This separation preserves label-free coverage while leveraging supervision where labels exist.
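The gate-then-score pattern can be sketched as follows. This is a minimal illustration with synthetic data and hypothetical feature values; GradientBoostingClassifier stands in for XGBoost so the sketch stays self-contained.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for the six IC pair features (values hypothetical).
X_train = rng.normal(size=(500, 6))
y_train = (rng.random(500) < 0.1).astype(int)  # verified-mismatch labels

# Stage 1: unsupervised outlier score for every IC transaction pair.
iso = IsolationForest(random_state=0).fit(X_train)

def iso_score(X):
    # score_samples returns higher = more normal; negate and min-max
    # normalise so higher = more anomalous, in [0, 1].
    s = -iso.score_samples(X)
    return (s - s.min()) / (s.max() - s.min() + 1e-9)

# Stage 2: supervised classifier trained on labelled mismatches
# (GradientBoostingClassifier standing in for XGBoost).
clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

def anomaly_score(X, prefilter=0.50):
    """IsoForest acts as a computational gate; the supervised model
    alone produces the final anomaly_score for gated candidates."""
    scores = np.zeros(len(X))
    gate = iso_score(X) >= prefilter
    if gate.any():
        scores[gate] = clf.predict_proba(X[gate])[:, 1]
    return scores

flags = anomaly_score(X_train) > 0.72   # flagged pairs → HITL-IC-01
```

Note that non-gated pairs keep a score of zero: the IsolationForest never votes on the final probability, exactly as described above.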

Inference Flow

[Diagram] IC transaction pair ingestion (6 features) → IsolationForest outlier score (gate ≥ 0.50) → XGBoost (supervised) on candidates → final anomaly_score = P(mismatch | x) → TreeExplainer SHAP values (Art. 13) → threshold > 0.72 → flag to HITL-IC-01 for reviewer decision.

Task

Score each intercompany transaction pair for mismatch probability, routing high-confidence anomalies to the HITL-IC-01 queue before any automated posting.

Training Data

36 months of IC transaction history (Veldtmann Group + anonymised industry comparables). Positive labels: verified mismatch cases from prior audit cycles.

Input Features

entity_pair_id · amount_vs_agreement_rate · posting_timing_delta · prior_mismatch_count · currency_pair · period_in_cycle

Performance Metrics

Precision @ 0.72 0.91
Recall @ 0.72 0.87
AUC-PR (held-out) 0.94 · 6-month holdout
Anomaly threshold anomaly_score > 0.72 → HITL-IC-01
Drift signals PSI on features · HITL override rate (7-day) · prediction score dist.
Explainability TreeExplainer · deterministic SHAP · EU AI Act Art. 13

Model 02 · Time Series Forecasting

Cash Forecast Model

LightGBM + Prophet Ensemble

Ensemble Architecture & Forecast Flow

[Diagram] Structural features (balance · payments · FX) → LightGBM (structural signal, ŷ₁); calendar features (payroll · due dates · IC) → Prophet (seasonality signal, ŷ₂) → weighted ensemble (7-day 90% PI) → gates: PI width > €1.5M → HITL · confidence < 0.75 → HITL → treasury dashboard (per entity / CCY); MAPE > 8% → retraining trigger.

Task

Produce a 7-day rolling cash position forecast per legal entity and currency, with calibrated 90% prediction intervals to support liquidity management decisions.

Ensemble Design

LightGBM captures structural signals (scheduled payments, IC sweeps, FX moves). Prophet contributes calendar seasonality. Outputs are blended with rolling weight optimisation.
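For a two-member ensemble, the rolling weight has a closed form: the squared-error-minimising blend weight over a trailing window. The sketch below uses synthetic series and an assumed 28-day window; the real members would be the LightGBM and Prophet forecasts.

```python
import numpy as np

def rolling_blend_weight(y_true, y1, y2, window=28):
    """Weight w minimising squared error of w*y1 + (1-w)*y2 over the
    trailing window (closed form for a two-member ensemble)."""
    a = y1[-window:] - y2[-window:]
    b = y_true[-window:] - y2[-window:]
    denom = float(a @ a)
    w = float(a @ b) / denom if denom > 0 else 0.5
    return min(max(w, 0.0), 1.0)        # clip to a convex blend

# Hypothetical per-entity daily series: actuals plus two member forecasts.
rng = np.random.default_rng(1)
actual = np.cumsum(rng.normal(0, 1, 120)) + 100
lgbm = actual + rng.normal(0, 0.5, 120)     # structural member (ŷ₁)
prophet = actual + rng.normal(0, 1.5, 120)  # seasonal member (ŷ₂)

w = rolling_blend_weight(actual, lgbm, prophet)
blend = w * lgbm + (1 - w) * prophet
```

Re-solving for w each day gives the "rolling weight optimisation" behaviour: whichever member has tracked actuals better recently earns more of the blend.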

Input Features

historical_balance_7d · scheduled_payments · IC_sweep_history · FX_rate_7d_ma · payroll_calendar · invoice_due_dates

Performance Metrics

Achieved MAPE (7-day) 4.3% · 3-month backtest
PI coverage (90%) 91.2% empirical coverage
HITL trigger — PI width PI width > €1.5M → HITL-CF-02
HITL trigger — confidence Point estimate confidence < 0.75
Drift signals forecast_vs_actual MAPE >8% · PSI on balance features
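The two backtest metrics above are straightforward to compute. A minimal sketch on synthetic data, assuming Gaussian forecast errors and a normal-approximation 90% interval:

```python
import numpy as np

def empirical_coverage(actual, lower, upper):
    """Fraction of actuals falling inside the interval — should sit
    near the nominal 90% for calibrated prediction intervals."""
    inside = (actual >= lower) & (actual <= upper)
    return float(inside.mean())

def mape(actual, forecast):
    return float(np.mean(np.abs((actual - forecast) / actual)))

rng = np.random.default_rng(2)
actual = 100 + rng.normal(0, 2, 90)          # 90-day backtest window
forecast = actual + rng.normal(0, 1, 90)     # unit-variance forecast error
half = 1.645 * 1.0                           # 90% normal half-width
cov = empirical_coverage(actual, forecast - half, forecast + half)
err = mape(actual, forecast)
```

In production the same MAPE computation runs against forecast_vs_actual pairs; crossing the 8% line is what fires the retraining trigger.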

Model 03 · Multiclass Classification

Invoice Exception Classifier

XGBoost Multiclass

Classification Flow & Output Classes

[Diagram] Invoice (6 features extracted) → XGBoost multiclass (5-class softmax, P(class | features)) → classes: APPROVED · PO_MISMATCH · PRICE_VARIANCE · MISSING_CC · DUPLICATE. P(APPROVED) > 0.92 → auto-approve; all non-APPROVED outputs → TreeExplainer SHAP (Art. 13) → HITL-INV-03 exception review with SHAP audit trail.

Task

Classify each incoming invoice into one of five exception categories or APPROVED. High-confidence APPROVED predictions bypass the AP queue; all others receive SHAP explanations for the reviewer.

Output Classes

APPROVED · PO_MISMATCH · PRICE_VARIANCE · MISSING_CC · DUPLICATE
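The routing rule over these classes reduces to a small function. A sketch, taking the softmax output as a class-to-probability dict (the dict shape is an assumption for illustration):

```python
CLASSES = ["APPROVED", "PO_MISMATCH", "PRICE_VARIANCE",
           "MISSING_CC", "DUPLICATE"]
AUTO_APPROVE_THRESHOLD = 0.92

def route_invoice(probs):
    """probs: dict mapping class name -> softmax probability."""
    top = max(probs, key=probs.get)
    if top == "APPROVED" and probs[top] > AUTO_APPROVE_THRESHOLD:
        return "AUTO_APPROVE"           # bypass the AP queue
    # Everything else — including low-confidence APPROVED — goes to
    # review with a SHAP explanation attached.
    return "HITL-INV-03"
```

Note the asymmetry: an APPROVED prediction at 0.90 is still queued, because only high-confidence approvals may skip human review.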

Input Features

po_match_score · unit_price_variance_pct · cost_center_present · supplier_invoice_hash · supplier_prior_exception_rate · amount_band

Performance Metrics

Macro F1 (5-class) 0.89 · 90-day holdout
APPROVED precision 0.97 @ 0.92 threshold
Auto-approve threshold P(APPROVED) > 0.92 → skip queue
Explainability SHAP for every non-APPROVED class · Art. 13
Drift signals Exception rate trend (7d MA) · PSI · APPROVED FP rate via HITL

MLOps Pipeline

A single operational spine supports all three models — from weekly retraining on GreenOps schedules to per-model drift alerting in BigQuery.

[Diagram] Weekly retraining cycle on Vertex AI Pipelines: trigger (weekly schedule, GreenOps ±6h carbon window) → feature export (Vertex Feature Store batch to GCS) → training jobs (all 3 models in parallel, Vertex custom jobs) → evaluation (hold-out set, champion/challenger gate) → Model Registry (version pin, model card stored with artifact) → deploy (canary 10% → 100% over a 48h window, auto-rollback if override rate > 1.5× baseline). Concept-drift monitoring: per-model metrics in BigQuery ae_finance.model_drift (PSI on key features · override rate · MAPE; PSI > 0.2 → retrain) → Cloud Monitoring alert policies (PagerDuty · Slack) → ad-hoc retrain trigger outside the weekly cycle. HITL override rates are written to the drift table, closing the drift-triggered retraining loop.


Vertex AI Pipelines

All three models retrain on a weekly cadence via Vertex AI managed pipelines. Job scheduling is GreenOps-aware, with a ±6-hour flex window to target low-carbon compute availability.

Weekly cadence · GreenOps ±6h · Parallel jobs · Flexible batch
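The ±6-hour flex window amounts to picking the lowest-carbon start hour around the nominal schedule. A minimal sketch, assuming a datetime-keyed carbon-intensity forecast (the function name, data source, and intensity curve are illustrative):

```python
from datetime import datetime, timedelta

def pick_green_slot(nominal, carbon_by_hour, flex_hours=6):
    """Within ±flex_hours of the nominal start, pick the hour with the
    lowest forecast grid carbon intensity (gCO2/kWh)."""
    candidates = [nominal + timedelta(hours=h)
                  for h in range(-flex_hours, flex_hours + 1)]
    candidates = [t for t in candidates if t in carbon_by_hour]
    return min(candidates, key=lambda t: carbon_by_hour[t])

nominal = datetime(2025, 1, 6, 2, 0)     # e.g. weekly Monday 02:00 run
forecast = {nominal + timedelta(hours=h): 200 + 15 * abs(h - 3)
            for h in range(-6, 7)}        # synthetic intensity curve
slot = pick_green_slot(nominal, forecast)
```

With this synthetic curve the cheapest hour is three hours after the nominal start, so the pipeline run shifts there while staying inside the flex window.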

Model Registry & Canary Deploy

Every trained artifact is version-pinned in the Vertex Model Registry with a data hash and pipeline run ID. Deployment uses a canary pattern: 10% traffic for 48 hours, then full promotion — or automatic rollback to the pinned champion if the override rate exceeds 1.5× baseline.

Version pinning · Canary 10%→100% · Auto-rollback · Model cards
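The promotion/rollback rule described above can be expressed as a small decision function — a sketch of the logic only, not the Vertex deployment API:

```python
def canary_decision(canary_override_rate, baseline_override_rate,
                    hours_elapsed, rollback_factor=1.5, window_hours=48):
    """Promote after the 48h window unless the canary's HITL override
    rate exceeds 1.5x the pinned champion's baseline."""
    if canary_override_rate > rollback_factor * baseline_override_rate:
        return "ROLLBACK"               # revert traffic to pinned champion
    if hours_elapsed >= window_hours:
        return "PROMOTE"                # 10% -> 100% traffic
    return "HOLD"                       # keep canary at 10%
```

Because the trigger is the override rate rather than a purely statistical metric, a misbehaving challenger is caught by reviewer behaviour even before feature drift is measurable.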

Multi-Signal Drift Monitoring

Drift is detected via three concurrent signals: Population Stability Index (PSI) on input features (a BigQuery scheduled query that triggers retraining if PSI > 0.2 on any key feature), prediction confidence distribution shift (Cloud Monitoring), and the HITL override rate as an early behavioural proxy. A rising override rate alone does not trigger retraining; it is cross-checked against PSI before a retrain is initiated.

PSI on features · BigQuery · Cloud Monitoring · Override rate proxy
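PSI itself is a simple divergence between binned training and serving distributions. A self-contained sketch with synthetic samples (in production this runs as the BigQuery scheduled query):

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a training (expected) and a
    serving (actual) sample, using quantile bins from the expected side."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf   # catch out-of-range values
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(3)
train = rng.normal(0, 1, 5000)
stable = rng.normal(0, 1, 5000)           # same distribution at serving
shifted = rng.normal(0.8, 1, 5000)        # drifted serving sample
retrain = psi(train, shifted) > 0.2       # the page's retrain threshold
```

An identically distributed serving sample scores near zero, while the mean-shifted one clears the 0.2 retrain threshold comfortably.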

Training-Serving Skew Detection

Training-serving skew is validated post-deployment: offline training feature statistics (computed at pipeline compile time and stored as TFDV schema artefacts in GCS alongside each model version) are compared against live Feature Store serving statistics on each inference batch via TFDV schema checks. Schema violations block the canary promotion step and raise a Cloud Monitoring alert before any further traffic shift.

TFDV schema checks · Post-deploy gate · Skew alert
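The shape of the check is worth seeing concretely. The sketch below is a deliberately simplified stand-in for the TFDV comparison — it flags a feature whose serving mean drifts too far from the training mean; the stats-dict shape, feature name, and z-score rule are all assumptions:

```python
def skew_check(train_stats, serve_stats, max_z=3.0):
    """Simplified stand-in for a TFDV schema check: flag any feature
    whose serving mean sits more than max_z training standard
    deviations from the training mean."""
    violations = []
    for name, t in train_stats.items():
        s = serve_stats.get(name)
        if s is None:
            violations.append((name, "missing_at_serving"))
            continue
        z = abs(s["mean"] - t["mean"]) / max(t["std"], 1e-9)
        if z > max_z:
            violations.append((name, f"mean_shift_z={z:.1f}"))
    return violations                    # non-empty -> block promotion

train_stats = {"historical_balance_7d": {"mean": 1.2e6, "std": 2.0e5}}
serve_stats = {"historical_balance_7d": {"mean": 2.1e6, "std": 2.2e5}}
blocked = bool(skew_check(train_stats, serve_stats))
```

Here the serving mean sits 4.5 training standard deviations away, so the promotion gate closes and the alert fires.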

Feature Lineage & Validation

All features are validated by the Data Governance process (M-08) before any write to the Feature Store. Lineage tags link each feature to its source system and governance approval record.

[Diagram] Source extract (raw field pull) → M-08 validation (schema · PII · approval) → Feature Store write (with lineage tag + version) → training / serving (feature served to model) → BigQuery lineage audit log.
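The governance-gated write can be modelled as a record that cannot reach the store without an M-08 approval reference. A minimal sketch — the record fields and approval-ID format are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class FeatureWrite:
    """Lineage-tagged Feature Store write record (field names illustrative)."""
    feature: str
    source_system: str
    governance_ref: str                  # M-08 approval record
    version: int
    written_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def write_feature(audit_log, record):
    # No M-08 approval reference -> the write never happens.
    if not record.governance_ref.startswith("M-08"):
        raise ValueError("write blocked: no M-08 governance approval")
    audit_log.append(record)             # stands in for the BigQuery lineage log
    return record

log = []
write_feature(log, FeatureWrite("po_match_score", "ERP PO table",
                                "M-08/2024-117", 3))
```

Enforcing the gate at write time, rather than at read time, is what makes the lineage table trustworthy: anything present in the store is approved by construction.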


Feature Source System Model(s) Governance Store Status
entity_pair_id SAP / Intercompany ledger IC Anomaly M-08 · approved ✓ VALIDATED
amount_vs_agreement_rate IC Agreement Register IC Anomaly M-08 · approved ✓ VALIDATED
historical_balance_7d Treasury TMS Cash Forecast M-08 · approved ✓ VALIDATED
FX_rate_7d_ma ECB / Bloomberg feed Cash Forecast M-08 · approved ✓ VALIDATED
IC_sweep_history Treasury TMS Cash Forecast M-08 · approved ✓ VALIDATED
po_match_score Procurement · ERP PO table Invoice Classifier M-08 · approved ✓ VALIDATED
supplier_invoice_hash AP module · invoice metadata Invoice Classifier M-08 · approved ✓ VALIDATED
supplier_prior_exception_rate AP history · raw exception log (30-day lagged window from the raw AP audit log, not the model output table, to prevent a feedback loop) Invoice Classifier M-08 · approved ✓ VALIDATED
payroll_calendar HR system · payroll schedule Cash Forecast M-08 · approved ✓ VALIDATED
prior_mismatch_count HITL override log · BigQuery IC Anomaly M-08 · approved ✓ VALIDATED


EU AI Act Article 13 compliance: SHAP TreeExplainer is applied deterministically to every IC Anomaly score above the 0.72 threshold and to every non-APPROVED Invoice Classifier output. Explanation payloads are stored in the HITL audit trail alongside the original model inputs, prediction confidence, and reviewer decision — satisfying transparency and logging obligations for high-risk automated financial decisions.
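The audit payload can be sketched as a JSON-serialisable record. The field names are an assumption; only the destination table (ae_finance.hitl_audit) and the content requirements — inputs, prediction, confidence, SHAP values, reviewer decision — come from this page:

```python
import json
from datetime import datetime, timezone

def build_audit_record(model, inputs, prediction, confidence,
                       shap_values, reviewer_decision=None):
    """Explanation payload destined for ae_finance.hitl_audit."""
    return {
        "model": model,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "inputs": inputs,
        "prediction": prediction,
        "confidence": confidence,
        "shap_values": shap_values,      # deterministic TreeExplainer output
        "reviewer_decision": reviewer_decision,  # filled in after review
    }

rec = build_audit_record(
    model="ic_anomaly@v12",
    inputs={"amount_vs_agreement_rate": 1.8, "posting_timing_delta": 4},
    prediction="FLAG", confidence=0.81,
    shap_values={"amount_vs_agreement_rate": 0.34,
                 "posting_timing_delta": 0.11},
)
payload = json.dumps(rec)                # serialisable for the BigQuery sink
```

Writing the record before the reviewer acts, then updating reviewer_decision afterwards, keeps the log complete even for decisions that are never overridden.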

Art. 13 Obligation Architectural Component Status
Transparency of operation SHAP TreeExplainer payloads stored in HITL audit trail (BigQuery) ✓ MET
Human oversight capability HITL queues HITL-IC-01 / HITL-CF-02 / HITL-INV-03 for all flagged decisions ✓ MET
Logging of operations All model inputs, outputs, confidence scores, and reviewer decisions logged to ae_finance.hitl_audit ✓ MET
Intended purpose disclosure System cards stored in Vertex Model Registry per artifact, documenting scope, known limitations, and deployment constraints ✓ MET
Instructions for use (deployers) Threshold configuration, HITL escalation paths, and override handling documented in runbook stored alongside each Model Registry entry ✓ MET