Page 09 · ML Engineering & MLOps

Machine Learning
Infrastructure & Models

Three purpose-built models operationalize the finance intelligence layer — anomaly detection, cash forecasting, and invoice classification — unified by a single MLOps spine: continuous retraining, multi-signal concept-drift monitoring, and HITL override rates as one of several early-warning signals of model health.

Models 3 production
Platform Vertex AI
Retraining Weekly · GreenOps
Compliance EU AI Act Art. 13

End-to-End ML Pipeline

Data flows from source systems through the Feature Store to three independent model inference paths, each gated by HITL queues before downstream automation.

[Diagram] Source systems (SAP · NetSuite · TMS), external feeds (FX rates · calendar), and 36 months of IC transaction history pass through M-08 Data Governance (schema validation · lineage tagging · PII masking · TFDV skew check on feature serve) into the Vertex AI Feature Store (online serving · batch export · time-travel). Three model paths follow: IC Anomaly (IsoForest → XGBoost, score > 0.72 → HITL-IC-01), Cash Forecast (LightGBM + Prophet, 7-day 90% PI, MAPE > 8% alert), and Invoice Exception (XGBoost multiclass, APPROVED > 0.92 → auto-approve). HITL queues (HITL-IC-01 · HITL-CF-02 · HITL-INV-03) gate downstream automation (IC posting agents · treasury liquidity dashboard · AP auto-approve/queue workflow). Override rates — one of several drift signals — are written to BigQuery ae_finance.model_drift, where PSI triggers retraining via the weekly Vertex AI Pipelines cycle (GreenOps ±6h carbon window · Model Registry · version pinning).


Data flows from SAP/TMS/External sources through M-08 Data Governance and the Vertex AI Feature Store into three parallel model paths (IC Anomaly, Cash Forecast, Invoice Classifier). Each model is gated by a HITL review queue before automation. Override rates and PSI signals loop back to trigger retraining.

Three Production Models

Each model card documents architecture, features, thresholds, performance metrics, explainability method, and drift signals — stored alongside the model artifact in the Registry.

Model 01 · Anomaly Detection

IC Anomaly Detector

Two-Stage: IsoForest → XGBoost
Two-stage architecture: IsolationForest (unsupervised) generates an outlier score for every IC transaction pair. Pairs scoring above a pre-filter threshold (0.50) are passed as candidates to a supervised XGBoost classifier, trained on verified mismatch labels from prior audit cycles. The final anomaly_score is the XGBoost probability output — IsoForest acts as a computational gate, not a voting member. This separation preserves label-free coverage while leveraging supervision where labels exist.
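The gate-then-score pattern can be sketched as follows. This is a minimal illustration with synthetic data and hypothetical feature values; GradientBoostingClassifier stands in for XGBoost so the sketch stays self-contained.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for the six IC pair features (values hypothetical).
X_train = rng.normal(size=(500, 6))
y_train = (rng.random(500) < 0.1).astype(int)  # verified-mismatch labels

# Stage 1: unsupervised outlier score for every IC transaction pair.
iso = IsolationForest(random_state=0).fit(X_train)

def iso_score(X):
    # score_samples returns higher = more normal; negate and min-max
    # normalise so higher = more anomalous, in [0, 1].
    s = -iso.score_samples(X)
    return (s - s.min()) / (s.max() - s.min() + 1e-9)

# Stage 2: supervised classifier trained on labelled mismatches
# (GradientBoostingClassifier standing in for XGBoost).
clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

def anomaly_score(X, prefilter=0.50):
    """IsoForest acts as a computational gate; the supervised model
    alone produces the final anomaly_score for gated candidates."""
    scores = np.zeros(len(X))
    gate = iso_score(X) >= prefilter
    if gate.any():
        scores[gate] = clf.predict_proba(X[gate])[:, 1]
    return scores

flags = anomaly_score(X_train) > 0.72   # flagged pairs → HITL-IC-01
```

Note that non-gated pairs keep a score of zero: the IsolationForest never votes on the final probability, exactly as described above.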

Inference Flow

[Diagram] IC transaction pair ingestion (6 features) → IsolationForest outlier score (gate ≥ 0.50) → XGBoost (supervised) on candidates → final anomaly_score = P(mismatch | x) → TreeExplainer SHAP values (Art. 13) → threshold > 0.72 → flag to HITL-IC-01 for reviewer decision.

Task

Score each intercompany transaction pair for mismatch probability, routing high-confidence anomalies to the HITL-IC-01 queue before any automated posting.

Training Data

36 months of IC transaction history (Veldtmann Group + anonymised industry comparables). Positive labels: verified mismatch cases from prior audit cycles.

Input Features

entity_pair_id · amount_vs_agreement_rate · posting_timing_delta · prior_mismatch_count · currency_pair · period_in_cycle

Performance Metrics

Precision @ 0.72 0.91
Recall @ 0.72 0.87
AUC-PR (held-out) 0.94 · 6-month holdout
Anomaly threshold anomaly_score > 0.72 → HITL-IC-01
Drift signals PSI on features · HITL override rate (7-day) · prediction score dist.
Explainability TreeExplainer · deterministic SHAP · EU AI Act Art. 13

Model 02 · Time Series Forecasting

Cash Forecast Model

LightGBM + Prophet Ensemble

Ensemble Architecture & Forecast Flow

[Diagram] Structural features (balance · payments · FX) → LightGBM (structural signal, ŷ₁); calendar features (payroll · due dates · IC) → Prophet (seasonality signal, ŷ₂) → weighted ensemble (7-day 90% PI) → gates: PI width > €1.5M → HITL · confidence < 0.75 → HITL → treasury dashboard (per entity / CCY); MAPE > 8% → retraining trigger.

Task

Produce a 7-day rolling cash position forecast per legal entity and currency, with calibrated 90% prediction intervals to support liquidity management decisions.

Ensemble Design

LightGBM captures structural signals (scheduled payments, IC sweeps, FX moves). Prophet contributes calendar seasonality. Outputs are blended with rolling weight optimisation.
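For a two-member ensemble, the rolling weight has a closed form: the squared-error-minimising blend weight over a trailing window. The sketch below uses synthetic series and an assumed 28-day window; the real members would be the LightGBM and Prophet forecasts.

```python
import numpy as np

def rolling_blend_weight(y_true, y1, y2, window=28):
    """Weight w minimising squared error of w*y1 + (1-w)*y2 over the
    trailing window (closed form for a two-member ensemble)."""
    a = y1[-window:] - y2[-window:]
    b = y_true[-window:] - y2[-window:]
    denom = float(a @ a)
    w = float(a @ b) / denom if denom > 0 else 0.5
    return min(max(w, 0.0), 1.0)        # clip to a convex blend

# Hypothetical per-entity daily series: actuals plus two member forecasts.
rng = np.random.default_rng(1)
actual = np.cumsum(rng.normal(0, 1, 120)) + 100
lgbm = actual + rng.normal(0, 0.5, 120)     # structural member (ŷ₁)
prophet = actual + rng.normal(0, 1.5, 120)  # seasonal member (ŷ₂)

w = rolling_blend_weight(actual, lgbm, prophet)
blend = w * lgbm + (1 - w) * prophet
```

Re-solving for w each day gives the "rolling weight optimisation" behaviour: whichever member has tracked actuals better recently earns more of the blend.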

Input Features

historical_balance_7d · scheduled_payments · IC_sweep_history · FX_rate_7d_ma · payroll_calendar · invoice_due_dates

Performance Metrics

Achieved MAPE (7-day) 4.3% · 3-month backtest
PI coverage (90%) 91.2% empirical coverage
HITL trigger — PI width PI width > €1.5M → HITL-CF-02
HITL trigger — confidence Point estimate confidence < 0.75
Drift signals forecast_vs_actual MAPE >8% · PSI on balance features
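The two backtest metrics above are straightforward to compute. A minimal sketch on synthetic data, assuming Gaussian forecast errors and a normal-approximation 90% interval:

```python
import numpy as np

def empirical_coverage(actual, lower, upper):
    """Fraction of actuals falling inside the interval — should sit
    near the nominal 90% for calibrated prediction intervals."""
    inside = (actual >= lower) & (actual <= upper)
    return float(inside.mean())

def mape(actual, forecast):
    return float(np.mean(np.abs((actual - forecast) / actual)))

rng = np.random.default_rng(2)
actual = 100 + rng.normal(0, 2, 90)          # 90-day backtest window
forecast = actual + rng.normal(0, 1, 90)     # unit-variance forecast error
half = 1.645 * 1.0                           # 90% normal half-width
cov = empirical_coverage(actual, forecast - half, forecast + half)
err = mape(actual, forecast)
```

In production the same MAPE computation runs against forecast_vs_actual pairs; crossing the 8% line is what fires the retraining trigger.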

Model 03 · Multiclass Classification

Invoice Exception Classifier

XGBoost Multiclass

Classification Flow & Output Classes

[Diagram] Invoice (6 features extracted) → XGBoost multiclass (5-class softmax, P(class | features)) → classes: APPROVED · PO_MISMATCH · PRICE_VARIANCE · MISSING_CC · DUPLICATE. P(APPROVED) > 0.92 → auto-approve; all non-APPROVED outputs → TreeExplainer SHAP (Art. 13) → HITL-INV-03 exception review with SHAP audit trail.

Task

Classify each incoming invoice into one of five exception categories or APPROVED. High-confidence APPROVED predictions bypass the AP queue; all others receive SHAP explanations for the reviewer.

Output Classes

APPROVED · PO_MISMATCH · PRICE_VARIANCE · MISSING_CC · DUPLICATE
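The routing rule over these classes reduces to a small function. A sketch, taking the softmax output as a class-to-probability dict (the dict shape is an assumption for illustration):

```python
CLASSES = ["APPROVED", "PO_MISMATCH", "PRICE_VARIANCE",
           "MISSING_CC", "DUPLICATE"]
AUTO_APPROVE_THRESHOLD = 0.92

def route_invoice(probs):
    """probs: dict mapping class name -> softmax probability."""
    top = max(probs, key=probs.get)
    if top == "APPROVED" and probs[top] > AUTO_APPROVE_THRESHOLD:
        return "AUTO_APPROVE"           # bypass the AP queue
    # Everything else — including low-confidence APPROVED — goes to
    # review with a SHAP explanation attached.
    return "HITL-INV-03"
```

Note the asymmetry: an APPROVED prediction at 0.90 is still queued, because only high-confidence approvals may skip human review.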

Input Features

po_match_score · unit_price_variance_pct · cost_center_present · supplier_invoice_hash · supplier_prior_exception_rate · amount_band

Performance Metrics

Macro F1 (5-class) 0.89 · 90-day holdout
APPROVED precision 0.97 @ 0.92 threshold
Auto-approve threshold P(APPROVED) > 0.92 → skip queue
Explainability SHAP for every non-APPROVED class · Art. 13
Drift signals Exception rate trend (7d MA) · PSI · APPROVED FP rate via HITL

MLOps Pipeline

A single operational spine supports all three models — from weekly retraining on GreenOps schedules to per-model drift alerting in BigQuery.

[Diagram] Weekly retraining cycle on Vertex AI Pipelines: trigger (weekly schedule, GreenOps ±6h carbon window) → feature export (Vertex Feature Store batch to GCS) → training jobs (all 3 models in parallel, Vertex custom jobs) → evaluation (hold-out set, champion/challenger gate) → Model Registry (version pin, model card stored with artifact) → deploy (canary 10% → 100% over a 48h window, auto-rollback if override rate > 1.5× baseline). Concept-drift monitoring: per-model metrics in BigQuery ae_finance.model_drift (PSI on key features · override rate · MAPE; PSI > 0.2 → retrain) → Cloud Monitoring alert policies (PagerDuty · Slack) → ad-hoc retrain trigger outside the weekly cycle. HITL override rates are written to the drift table, closing the drift-triggered retraining loop.


Vertex AI Pipelines

All three models retrain on a weekly cadence via Vertex AI managed pipelines. Job scheduling is GreenOps-aware, with a ±6-hour flex window to target low-carbon compute availability.

Weekly cadence · GreenOps ±6h · Parallel jobs · Flexible batch
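The ±6-hour flex window amounts to picking the lowest-carbon start hour around the nominal schedule. A minimal sketch, assuming a datetime-keyed carbon-intensity forecast (the function name, data source, and intensity curve are illustrative):

```python
from datetime import datetime, timedelta

def pick_green_slot(nominal, carbon_by_hour, flex_hours=6):
    """Within ±flex_hours of the nominal start, pick the hour with the
    lowest forecast grid carbon intensity (gCO2/kWh)."""
    candidates = [nominal + timedelta(hours=h)
                  for h in range(-flex_hours, flex_hours + 1)]
    candidates = [t for t in candidates if t in carbon_by_hour]
    return min(candidates, key=lambda t: carbon_by_hour[t])

nominal = datetime(2025, 1, 6, 2, 0)     # e.g. weekly Monday 02:00 run
forecast = {nominal + timedelta(hours=h): 200 + 15 * abs(h - 3)
            for h in range(-6, 7)}        # synthetic intensity curve
slot = pick_green_slot(nominal, forecast)
```

With this synthetic curve the cheapest hour is three hours after the nominal start, so the pipeline run shifts there while staying inside the flex window.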

Model Registry & Canary Deploy

Every trained artifact is version-pinned in the Vertex Model Registry with a data hash and pipeline run ID. Deployment uses a canary pattern: 10% traffic for 48 hours, then full promotion — or automatic rollback to the pinned champion if the override rate exceeds 1.5× baseline.

Version pinning · Canary 10%→100% · Auto-rollback · Model cards
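The promotion/rollback rule described above can be expressed as a small decision function — a sketch of the logic only, not the Vertex deployment API:

```python
def canary_decision(canary_override_rate, baseline_override_rate,
                    hours_elapsed, rollback_factor=1.5, window_hours=48):
    """Promote after the 48h window unless the canary's HITL override
    rate exceeds 1.5x the pinned champion's baseline."""
    if canary_override_rate > rollback_factor * baseline_override_rate:
        return "ROLLBACK"               # revert traffic to pinned champion
    if hours_elapsed >= window_hours:
        return "PROMOTE"                # 10% -> 100% traffic
    return "HOLD"                       # keep canary at 10%
```

Because the trigger is the override rate rather than a purely statistical metric, a misbehaving challenger is caught by reviewer behaviour even before feature drift is measurable.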

Multi-Signal Drift Monitoring

Drift is detected via three concurrent signals: Population Stability Index (PSI) on input features (a BigQuery scheduled query that triggers retraining if PSI > 0.2 on any key feature), prediction confidence distribution shift (Cloud Monitoring), and the HITL override rate as an early behavioural proxy. A rising override rate alone does not trigger retraining; it is cross-checked against PSI before a retrain is initiated.

PSI on features · BigQuery · Cloud Monitoring · Override rate proxy
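PSI itself is a simple divergence between binned training and serving distributions. A self-contained sketch with synthetic samples (in production this runs as the BigQuery scheduled query):

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a training (expected) and a
    serving (actual) sample, using quantile bins from the expected side."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf   # catch out-of-range values
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(3)
train = rng.normal(0, 1, 5000)
stable = rng.normal(0, 1, 5000)           # same distribution at serving
shifted = rng.normal(0.8, 1, 5000)        # drifted serving sample
retrain = psi(train, shifted) > 0.2       # the page's retrain threshold
```

An identically distributed serving sample scores near zero, while the mean-shifted one clears the 0.2 retrain threshold comfortably.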

Training-Serving Skew Detection

Training-serving skew is validated post-deployment: offline training feature statistics (computed at pipeline compile time and stored as TFDV schema artefacts in GCS alongside each model version) are compared against live Feature Store serving statistics on each inference batch via TFDV schema checks. Schema violations block the canary promotion step and raise a Cloud Monitoring alert before any further traffic shift.

TFDV schema checks · Post-deploy gate · Skew alert
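The shape of the check is worth seeing concretely. The sketch below is a deliberately simplified stand-in for the TFDV comparison — it flags a feature whose serving mean drifts too far from the training mean; the stats-dict shape, feature name, and z-score rule are all assumptions:

```python
def skew_check(train_stats, serve_stats, max_z=3.0):
    """Simplified stand-in for a TFDV schema check: flag any feature
    whose serving mean sits more than max_z training standard
    deviations from the training mean."""
    violations = []
    for name, t in train_stats.items():
        s = serve_stats.get(name)
        if s is None:
            violations.append((name, "missing_at_serving"))
            continue
        z = abs(s["mean"] - t["mean"]) / max(t["std"], 1e-9)
        if z > max_z:
            violations.append((name, f"mean_shift_z={z:.1f}"))
    return violations                    # non-empty -> block promotion

train_stats = {"historical_balance_7d": {"mean": 1.2e6, "std": 2.0e5}}
serve_stats = {"historical_balance_7d": {"mean": 2.1e6, "std": 2.2e5}}
blocked = bool(skew_check(train_stats, serve_stats))
```

Here the serving mean sits 4.5 training standard deviations away, so the promotion gate closes and the alert fires.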

Feature Lineage & Validation

All features are validated by the Data Governance process (M-08) before any write to the Feature Store. Lineage tags link each feature to its source system and governance approval record.

[Diagram] Source extract (raw field pull) → M-08 validation (schema · PII · approval) → Feature Store write (with lineage tag + version) → training / serving (feature served to model) → BigQuery lineage audit log.
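The governance-gated write can be modelled as a record that cannot reach the store without an M-08 approval reference. A minimal sketch — the record fields and approval-ID format are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class FeatureWrite:
    """Lineage-tagged Feature Store write record (field names illustrative)."""
    feature: str
    source_system: str
    governance_ref: str                  # M-08 approval record
    version: int
    written_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def write_feature(audit_log, record):
    # No M-08 approval reference -> the write never happens.
    if not record.governance_ref.startswith("M-08"):
        raise ValueError("write blocked: no M-08 governance approval")
    audit_log.append(record)             # stands in for the BigQuery lineage log
    return record

log = []
write_feature(log, FeatureWrite("po_match_score", "ERP PO table",
                                "M-08/2024-117", 3))
```

Enforcing the gate at write time, rather than at read time, is what makes the lineage table trustworthy: anything present in the store is approved by construction.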


Feature Source System Model(s) Governance Store Status
entity_pair_id SAP / Intercompany ledger IC Anomaly M-08 · approved ✓ VALIDATED
amount_vs_agreement_rate IC Agreement Register IC Anomaly M-08 · approved ✓ VALIDATED
historical_balance_7d Treasury TMS Cash Forecast M-08 · approved ✓ VALIDATED
FX_rate_7d_ma ECB / Bloomberg feed Cash Forecast M-08 · approved ✓ VALIDATED
IC_sweep_history Treasury TMS Cash Forecast M-08 · approved ✓ VALIDATED
po_match_score Procurement · ERP PO table Invoice Classifier M-08 · approved ✓ VALIDATED
supplier_invoice_hash AP module · invoice metadata Invoice Classifier M-08 · approved ✓ VALIDATED
supplier_prior_exception_rate AP history · raw exception log (30-day lagged window from the raw AP audit log, not the model output table, to prevent a feedback loop) Invoice Classifier M-08 · approved ✓ VALIDATED
payroll_calendar HR system · payroll schedule Cash Forecast M-08 · approved ✓ VALIDATED
prior_mismatch_count HITL override log · BigQuery IC Anomaly M-08 · approved ✓ VALIDATED


EU AI Act Article 13 compliance: SHAP TreeExplainer is applied deterministically to every IC Anomaly score above the 0.72 threshold and to every non-APPROVED Invoice Classifier output. Explanation payloads are stored in the HITL audit trail alongside the original model inputs, prediction confidence, and reviewer decision — satisfying transparency and logging obligations for high-risk automated financial decisions.
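The audit payload can be sketched as a JSON-serialisable record. The field names are an assumption; only the destination table (ae_finance.hitl_audit) and the content requirements — inputs, prediction, confidence, SHAP values, reviewer decision — come from this page:

```python
import json
from datetime import datetime, timezone

def build_audit_record(model, inputs, prediction, confidence,
                       shap_values, reviewer_decision=None):
    """Explanation payload destined for ae_finance.hitl_audit."""
    return {
        "model": model,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "inputs": inputs,
        "prediction": prediction,
        "confidence": confidence,
        "shap_values": shap_values,      # deterministic TreeExplainer output
        "reviewer_decision": reviewer_decision,  # filled in after review
    }

rec = build_audit_record(
    model="ic_anomaly@v12",
    inputs={"amount_vs_agreement_rate": 1.8, "posting_timing_delta": 4},
    prediction="FLAG", confidence=0.81,
    shap_values={"amount_vs_agreement_rate": 0.34,
                 "posting_timing_delta": 0.11},
)
payload = json.dumps(rec)                # serialisable for the BigQuery sink
```

Writing the record before the reviewer acts, then updating reviewer_decision afterwards, keeps the log complete even for decisions that are never overridden.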

Art. 13 Obligation Architectural Component Status
Transparency of operation SHAP TreeExplainer payloads stored in HITL audit trail (BigQuery) ✓ MET
Human oversight capability HITL queues HITL-IC-01 / HITL-CF-02 / HITL-INV-03 for all flagged decisions ✓ MET
Logging of operations All model inputs, outputs, confidence scores, and reviewer decisions logged to ae_finance.hitl_audit ✓ MET
Intended purpose disclosure System cards stored in Vertex Model Registry per artifact, documenting scope, known limitations, and deployment constraints ✓ MET
Instructions for use (deployers) Threshold configuration, HITL escalation paths, and override handling documented in runbook stored alongside each Model Registry entry ✓ MET