Every ML model in the Autonomous Supply Chain is designed with its explanation contract before a single line of training code is written. SHAP values generated at inference. Model Cards versioned alongside models. EU AI Act Articles 9, 10, 13, 14, 15, and 17 satisfied by design. GDPR Article 22 position documented per model. Rollback SLA: ≤15 minutes.
Every model below is classified as High-Risk under EU AI Act Annex III. Each carries a SHAP explanation contract — specified before training begins, generated at inference, and written to the immutable audit log before any procurement, sourcing, or operational action is taken.
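The ordering guarantee above — explanation generated and written to the immutable audit log before any action fires — can be sketched as a small enforcement layer. This is an illustrative, pure-Python sketch, not the production implementation: `ExplainedPrediction`, `AUDIT_LOG`, and the function names are assumptions introduced here for clarity.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional

@dataclass
class ExplainedPrediction:
    """A prediction bundled with its SHAP attribution — the only unit the contract allows to act on."""
    model_version: str
    prediction: float
    shap_values: Dict[str, float]          # feature name -> attribution
    audit_entry_id: Optional[str] = None   # set only after the audit write succeeds

AUDIT_LOG: List[dict] = []  # stand-in for the immutable audit log

def write_to_audit_log(p: ExplainedPrediction) -> ExplainedPrediction:
    """Append the explanation to the audit log and stamp the prediction with the entry id."""
    entry_id = f"audit-{len(AUDIT_LOG):08d}"
    AUDIT_LOG.append({
        "id": entry_id,
        "ts": datetime.now(timezone.utc).isoformat(),
        "model_version": p.model_version,
        "prediction": p.prediction,
        "shap_values": p.shap_values,
    })
    p.audit_entry_id = entry_id
    return p

def take_action(p: ExplainedPrediction) -> str:
    """Contract check: no procurement, sourcing, or operational action before the audit write."""
    if p.audit_entry_id is None:
        raise PermissionError("explanation not in audit log - action blocked")
    return f"dispatched:{p.audit_entry_id}"
```

The key design point is that the action path accepts only an object the audit writer has already stamped, so the ordering cannot be skipped by a caller.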
Every model passes through the same Vertex AI Pipelines topology. Promotion is gated on metric thresholds. Model Cards are auto-generated from training run metadata. High-risk models under EU AI Act require a manual gate before full deployment.
Great Expectations schema validation runs against the Feature Store snapshot before any training computation begins. If schema drift is detected against the registered feature contract, the pipeline fails immediately — no training job is submitted, no compute is wasted, and an alert is raised to the ML Engineer on-call via PagerDuty.
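The fail-fast gate can be sketched as a schema comparison against the registered feature contract. The production gate runs as a Great Expectations suite; the sketch below shows only the decision logic, and `FEATURE_CONTRACT` with its example columns is a hypothetical stand-in.

```python
class SchemaDriftError(Exception):
    """Raised before any training compute is requested; routed to on-call in production."""

FEATURE_CONTRACT = {            # hypothetical registered contract: column -> dtype
    "sku_id": "string",
    "weekly_demand": "float64",
    "lead_time_days": "int64",
}

def validate_snapshot_schema(snapshot_schema: dict, contract: dict = FEATURE_CONTRACT) -> None:
    """Fail the pipeline immediately if the Feature Store snapshot drifts from the contract."""
    missing = contract.keys() - snapshot_schema.keys()
    extra = snapshot_schema.keys() - contract.keys()
    retyped = {k for k in contract.keys() & snapshot_schema.keys()
               if contract[k] != snapshot_schema[k]}
    if missing or extra or retyped:
        # No training job is submitted past this point; the alert fires instead.
        raise SchemaDriftError(
            f"missing={sorted(missing)} extra={sorted(extra)} retyped={sorted(retyped)}")
```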
All feature transformations are executed with lineage metadata written to the Feature Store at each step. Transformation logic is versioned alongside the feature group — ensuring that every model version can be exactly reproduced from a point-in-time feature snapshot, satisfying EU AI Act data provenance requirements.
Distributed training on Vertex AI Training with hyperparameter tuning via Vertex AI Vizier. Each training run generates a full experiment lineage record: hyperparameter configuration, training data snapshot hash, framework version, and compute configuration — all captured in the Vertex AI Experiments registry before the model artifact is produced.
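The lineage record can be illustrated as a plain structure assembled before the artifact exists. This is a minimal sketch of the record's shape, not the Vertex AI Experiments API; the function name and field names are assumptions.

```python
import hashlib

def build_lineage_record(run_name: str, hyperparams: dict, snapshot_bytes: bytes,
                         framework_version: str, machine_type: str) -> dict:
    """Assemble the experiment lineage record captured before the model artifact is produced."""
    return {
        "run_name": run_name,
        "hyperparameters": dict(hyperparams),
        # Content hash pins the exact training data snapshot for reproduction.
        "training_data_snapshot_sha256": hashlib.sha256(snapshot_bytes).hexdigest(),
        "framework_version": framework_version,
        "compute": {"machine_type": machine_type},
    }
```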
Evaluation metrics are computed on the held-out evaluation set and compared to the registered champion model performance. Promotion to the next pipeline step is gated on metric thresholds — MAPE for DemandIQ, AUC-ROC for classifiers. A challenger model that does not improve on the champion is blocked from registration.
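The direction-aware gate matters here: MAPE improves downward, AUC-ROC upward. A minimal sketch of that gate logic (function and metric names are illustrative):

```python
def challenger_beats_champion(metric: str, challenger: float, champion: float) -> bool:
    """Registration gate: the challenger must strictly improve on the champion's metric."""
    lower_is_better = {"mape"}       # DemandIQ: forecast error
    higher_is_better = {"auc_roc"}   # classifier models
    if metric in lower_is_better:
        return challenger < champion
    if metric in higher_is_better:
        return challenger > champion
    raise ValueError(f"no promotion rule registered for metric {metric!r}")
```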
Model Card is auto-populated from training run metadata, evaluation results, and bias analysis computed over demographic slices of the evaluation set. Sections include: intended use, training data provenance, evaluation results, bias analysis, ethical considerations, EU AI Act classification, regulatory obligations, and version history. No manual authoring required.
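Because no manual authoring happens, generation must fail loudly rather than emit a partial card. A sketch of that invariant, assuming the eight sections listed above (the helper name and metadata keys are hypothetical):

```python
REQUIRED_SECTIONS = (
    "intended_use", "training_data_provenance", "evaluation_results",
    "bias_analysis", "ethical_considerations", "eu_ai_act_classification",
    "regulatory_obligations", "version_history",
)

def generate_model_card(run_metadata: dict) -> dict:
    """Populate every required section from pipeline metadata; refuse to emit a partial card."""
    card = {s: run_metadata.get(s) for s in REQUIRED_SECTIONS}
    empty = [s for s, v in card.items() if not v]
    if empty:
        raise ValueError(f"Model Card generation failed; empty sections: {empty}")
    return card
```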
Model artifact, Model Card, and SHAP explainer artifact are registered together as a versioned bundle in Vertex AI Model Registry. Each version carries a regulatory metadata tag set: EU AI Act risk class, applicable regulatory obligations, Model Card reference, and the identity of the training run that produced it.
The challenger model receives 10% of live inference traffic alongside the champion. Champion/challenger performance comparison runs over a 48-hour observation window. Both models log SHAP explanations to the audit trail during this period, enabling side-by-side explainability comparison as well as metric comparison.

Canary window / retraining conflict protocol: For SupplierSentinel (weekly retraining cadence), if a drift event fires during an active 48-hour canary window, the canary is immediately paused, the champion serves 100% traffic, and the retraining pipeline takes priority. The canary observation window restarts fresh once the new challenger clears the champion gate. This conflict protocol is encoded as a pipeline condition, not an operational procedure.
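The conflict protocol, since it is a pipeline condition rather than a runbook step, can be expressed as two pure state transitions. The state shape and function names below are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class CanaryState:
    challenger_traffic_pct: int = 10        # canary serves 10% alongside the champion
    window_hours_remaining: float = 48.0
    paused: bool = False

def on_drift_event(state: CanaryState) -> CanaryState:
    """Drift during an active canary: pause it, return 100% traffic to the champion."""
    return CanaryState(challenger_traffic_pct=0,
                       window_hours_remaining=state.window_hours_remaining,
                       paused=True)

def on_challenger_cleared_gate(_: CanaryState) -> CanaryState:
    """Retrained challenger clears the champion gate: the 48-hour window restarts fresh."""
    return CanaryState()
```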
Promotion to 100% traffic is conditional on two gates passing in sequence. First, canary metrics must hold against champion thresholds over the full observation window. Second, because all five models are EU AI Act High-Risk, full promotion requires explicit manual approval from the designated ML Engineer and a compliance sign-off record before traffic shift. Neither gate can be bypassed. Rollback capability is preserved: if post-promotion metrics degrade within the 72-hour monitoring window, automated rollback to the previous champion is triggered and logged to the immutable audit trail.
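The two-gate sequence and the 72-hour rollback condition can be sketched as follows. This is an illustrative reduction of the pipeline conditions, assuming boolean approval records (the function names are not from the production codebase):

```python
def promotion_decision(canary_metrics_hold: bool,
                       ml_engineer_approval: bool,
                       compliance_signoff: bool) -> bool:
    """Two gates in sequence; High-Risk classification means neither can be bypassed."""
    if not canary_metrics_hold:                          # gate 1: canary held over full window
        return False
    return ml_engineer_approval and compliance_signoff   # gate 2: manual approvals

def should_auto_rollback(hours_since_promotion: float, metrics_degraded: bool) -> bool:
    """Automated rollback to the previous champion inside the 72-hour monitoring window."""
    return metrics_degraded and hours_since_promotion <= 72.0
```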
Feature drift, prediction drift, and performance drift are monitored continuously via Vertex AI Model Monitoring and Cloud Monitoring custom metrics. Every drift event is timestamped, logged with the model version reference, and routed through PagerDuty before any retraining trigger fires.
| Drift Type | Threshold |
|---|---|
| Feature drift | PSI > 0.2 on any top-5 feature |
| Prediction drift | KL divergence > 0.1 on output distribution |
| Performance drift (MAPE) | Degrades >5pp against registered champion baseline |
| Performance drift (AUC-ROC) | Drops >0.05 against registered champion baseline |
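For reference, the PSI feature-drift check above can be computed as follows. This is a minimal self-contained sketch (equal-width bins over the baseline range; the production monitor may bin differently):

```python
import math

def population_stability_index(expected, actual, bins: int = 10) -> float:
    """PSI between the training baseline and the live distribution of one feature.
    A value above 0.2 on any top-5 feature fires the drift alert."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]
    def fractions(values):
        counts = [0] * bins
        for v in values:
            counts[sum(v > e for e in edges)] += 1
        eps = 1e-6  # floor empty buckets to keep the log finite
        return [max(c / len(values), eps) for c in counts]
    return sum((a - e) * math.log(a / e)
               for e, a in zip(fractions(expected), fractions(actual)))
```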
The Model Card template below is the standard applied across all five models. Every section is auto-populated from Vertex AI training run metadata and evaluation results — Model Cards are generated artifacts of the pipeline, not manually authored documents.
DemandIQ · XGBoost + ARIMA Time-Series Ensemble · 12-week rolling demand forecast per SKU per site
Feature groups are designed to be shared across models — eliminating training/serving skew and ensuring that a feature value computed once is consumed consistently by every model that depends on it. Feature freshness SLOs are enforced by the ingestion pipeline, not just measured. Access to feature groups is governed by IAM role-based access control — models and agents consume features via service accounts with least-privilege scope. GDPR data minimisation applies: no feature group stores personal data beyond what is necessary for the declared model purpose; supplier_features contains entity-level scores only, not the underlying personal data records from which scores are derived.
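"Enforced, not just measured" can be made concrete: a stale feature group blocks serving rather than only emitting a metric. A sketch under assumed SLO values (the group names match the document; the SLO durations and function names are hypothetical):

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_SLO = {  # hypothetical per-group SLOs; real values live in the feature contract
    "demand_features": timedelta(hours=1),
    "supplier_features": timedelta(hours=24),
}

class StaleFeatureGroupError(Exception):
    """Raised at ingestion/serving time when a freshness SLO is breached."""

def enforce_freshness(feature_group: str, last_ingested_at: datetime, now: datetime) -> None:
    """Block consumption of a feature group whose last ingestion exceeds its SLO."""
    age = now - last_ingested_at
    if age > FRESHNESS_SLO[feature_group]:
        raise StaleFeatureGroupError(
            f"{feature_group} stale by {age - FRESHNESS_SLO[feature_group]}")
```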
| Model | Retraining Cadence | Feature Drift Threshold | Performance Drift Trigger |
|---|---|---|---|
| DemandIQ | Monthly | PSI > 0.2 on any top-5 feature | MAPE degrades >5pp vs champion |
| SupplierSentinel | Weekly | PSI > 0.2 on any top-5 feature | AUC-ROC drops >0.05 vs champion |
| InventoryOrchestrator | Monthly | PSI > 0.2 on any top-5 feature | Inventory cost KPI degrades >3% |
| QualityTrace | Quarterly | PSI > 0.2 on any top-5 feature | Top-1 accuracy drops >5pp vs champion. Compensating control: MDR-class NCRs always route to quality engineer HITL regardless of model confidence — classification degradation cannot silently affect safety-critical decisions |
| ContractIntelligence | Quarterly | PSI > 0.2 on clause risk vector distribution | Procurement outcome correlation < 0.65 on 90-day rolling window (KL divergence on TCO output distribution > 0.1 used as leading indicator) |
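The KL-divergence leading indicator used for ContractIntelligence (and for prediction drift generally) reduces to a short computation over the discretised output distribution. A minimal sketch, assuming both distributions are already binned and normalised:

```python
import math

def kl_divergence(p, q, eps: float = 1e-9) -> float:
    """KL(P || Q) over a discretised output distribution. A value above 0.1 on the
    TCO output distribution is the leading indicator for ContractIntelligence drift."""
    # eps floors zero bins so the log stays finite.
    return sum(pi * math.log(max(pi, eps) / max(qi, eps)) for pi, qi in zip(p, q))
```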
Article 15 requires that high-risk AI systems achieve appropriate levels of accuracy, robustness, and cybersecurity across their operational lifetime — not just at deployment. This section specifies the robustness controls applied uniformly to all five models, the rollback procedure triggered when a promoted model degrades, and the GDPR Article 22 position for the two models whose outputs most directly affect third-party decisions.
| Model | Art. 22 Applicability | Safeguard | Right to Explanation |
|---|---|---|---|
| DemandIQ | Not applicable — outputs are inputs to a human procurement decision, not autonomous decisions affecting natural persons directly. | HITL gate above financial threshold. Procurement Manager approval required before PO raised. | SHAP top-5 features presented at HITL checkpoint. |
| SupplierSentinel | Potentially applicable — risk scores may produce significant effects on supplier commercial relationships (individuals acting as sole traders or named representatives). | All sourcing exclusion or downgrade actions require human procurement agent review. Score alone cannot trigger supplier removal — a human decision is required. HITL is structural, not configurable. | Top-3 risk dimensions + SHAP per supplier event available to affected party on request via DPO channel. Response SLA: 72 hours. |
| InventoryOrchestrator | Not applicable — reorder decisions affect inventory levels, not natural persons. | HITL gate for orders above CFO-defined financial threshold. | SHAP attribution available in procurement dashboard. |
| QualityTrace | Not applicable — root cause classification affects batch records, not natural persons directly. | All MDR-class NCRs route to quality engineer HITL unconditionally. | Root cause SHAP report attached to quality engineer review surface. |
| ContractIntelligence | Potentially applicable — clause risk scoring and TCO output may significantly affect contract outcomes for natural persons party to supplier agreements. | Legal reviewer HITL is mandatory before any counter-proposal is drafted. No contract action is taken from model output alone. | Gemini clause citations + SHAP TCO attribution surfaced in legal reviewer interface and available to affected party on request. |