The Autonomous Enterprise · AE Suite · Module 05

Asset IQ
Predictive Maintenance
— 12,000 units. Two models. One truth.

Asset IQ unifies telemetry from 12,000 MRI and CT units across six disconnected regional systems into a single predictive intelligence layer. Two ML tiers: a daily remaining-useful-life (RUL) regression predicting failure windows per unit, and a real-time anomaly detector surfacing cross-regional fleet patterns. ISO 13485 Device History Records are written at every maintenance event.

EU AI Act — High Risk (Annex III) · RUL Regression · Isolation Forest · 6 Regional Systems · Pub/Sub · ISO 13485 DHR · HITL-06 · HITL-07 · Operations ART · H2 · PI-4
System Context — C4 Level 1

Six regional systems in. One unified intelligence out.

Asset IQ's context is defined by its inputs — six regional asset telemetry systems that have never spoken to each other — and its outputs — predictive work orders, fleet anomaly alerts, and ISO 13485 Device History Records. The unified Pub/Sub ingestion pipeline is the H1 foundation that makes Asset IQ possible.

Architecture — Two-Tier ML

Batch RUL daily. Streaming anomaly always on.

The two-tier architecture separates workloads by their time characteristics. The RUL regression is a batch job — run once daily across all 12,000 units using GKE Autopilot with GPU access. The anomaly detector is a streaming agent — always running on Cloud Run, processing individual sensor events in near-real time. Neither tier alone is sufficient. Together they cover both scheduled prediction and reactive detection.
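The split above can be sketched in a few lines. This is an illustrative sketch only — the function names and routing labels are assumptions, not the production ADK interfaces; the 0.82 confidence threshold and 14-day horizon follow the figures used elsewhere on this page.

```python
def run_daily_rul_batch(units, predict_rul):
    """Batch tier: score the whole fleet once daily, routing
    low-confidence predictions to human review (HITL-06)."""
    work_orders, review_queue = [], []
    for unit in units:
        days_to_failure, confidence = predict_rul(unit)
        if days_to_failure <= 14:            # planning horizon
            if confidence >= 0.82:           # auto-dispatch threshold
                work_orders.append(unit["id"])
            else:
                review_queue.append(unit["id"])   # HITL-06
    return work_orders, review_queue


def on_sensor_event(event, score_anomaly, threshold=0.8):
    """Streaming tier: score one sensor event as it arrives."""
    if score_anomaly(event) >= threshold:
        return "check_fleet_pattern"          # may escalate to HITL-07
    return "ok"
```

The batch function sees the whole fleet at once; the streaming function sees one event at a time — which is exactly why they sit on different compute substrates.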

Agent State Machines — Side by Side

Two agents. Two state machines. One deployed per compute tier.

The RUL Batch Agent and the Anomaly Detection Agent are separate ADK deployments on separate compute substrates. Showing them side by side makes the parallel nature of the two-tier architecture immediately visible — different triggers, different processing patterns, different HITL paths, but the same shared platform underneath.

Data Flows — Two Sequences

RUL low-confidence prediction. Then fleet anomaly.

Two sequences. The first shows the daily RUL run finding a low-confidence prediction for the München MRI-7T — the same unit from the ContractGuard and RevRec AI demos, showing portfolio continuity. The second shows the fleet anomaly detection surfacing a cross-regional pattern across four EMEA-North units.

HITL-06 Presentation

What the Field Service Manager sees — sensor context before the decision.

HITL-06 differs from RevRec AI and ContractGuard in one important way: the human reviewer is in the field, not in an office. The interface must communicate urgency, sensor context, and decision clarity in a format that works at a glance. Three decisions: schedule a planned intervention, dismiss the alert, or request on-site verification before committing to a work order.

HITL-06 · Field Service Manager Queue — Unit MRI-7T-MCH-0042 · Universitätsklinikum München
Asset IQ — HITL-06 Review · MRI-7T-MCH-0042 · SLA: 5h 47m remaining · conf: 0.78 below threshold
RUL Prediction
PREDICTED DAYS TO FAILURE
8.4
days · q10: 4.2d · q90: 14.1d
MODEL CONFIDENCE
0.78
Below auto-threshold (0.82) → human review required
SHAP Sensor Attribution — Top 3 Features Driving Failure Prediction
gradient_coil_temp_p95
73.4°C
baseline: 61.2°C
SHAP: −3.8 days · 19.9% above 90-day baseline
helium_level_slope
−0.041 L/day
baseline: −0.018 L/day
SHAP: −2.7 days · 2.3× faster depletion than baseline
rf_power_deviation
+14.2 dB
baseline: ±2.1 dB
SHAP: −1.6 days · 6.8× above normal variance
ISO 13485 Context
Device History Record — MRI-7T-MCH-0042
Last planned maintenance: 2025-09-14 · 182 days ago
Last unplanned event: None in past 12 months
Cumulative scan count: 4,847 · utilisation: 68% of rated capacity
Unit age: 38 months · warranty status: Active (expires 2027-01-31)
Any work order created will generate an ISO 13485 DHR event in BigQuery ae_devices.dhr_events
✓ Schedule Maintenance
✗ Dismiss + Reason
↗ On-site Verify
⏱ SLA: 5h 47m remaining · Timeout → auto-schedule preventive maintenance · Decision immutably recorded
Unit Detail
Unit ID
MRI-7T-MCH-0042
Model
ClaraVis MRI-7T Gen 1
Location
Universitätsklinikum München · Radiology Dept B · Ward 4
Region
EMEA-North
Confidence vs threshold
0.78 vs 0.82 threshold
→ HITL-06 required
HITL checkpoint
HITL-06 · 8h SLA
SHAP record
Written to BigQuery before this queue item was created · EU AI Act Art. 13
Similar fleet units
No cross-regional pattern detected for this unit. 3 similar MRI-7T units in EMEA-North flagged separately today.
ISO 13485 ✓
DHR event created on any decision. Work order includes SHAP ref and HITL record ID.
Note
This unit is on active warranty. Emergency dispatch (~€42K) vs planned intervention (~€8K). Decision affects warranty reserve.
Architecture Decision Records

Three Asset IQ decisions. Every alternative documented.

ADR-011 (restated for Asset IQ)
Isolation Forest over autoencoder for unit-level anomaly detection
Autoencoder-based anomaly detection was the initial design for Asset IQ's streaming anomaly detector. Rejected for the same core reason as FinRisk Sentinel (ADR-011 on Page 06): autoencoders require KernelExplainer SHAP approximations that are too slow for real-time sensor event scoring — an MRI sensor event arrives every few minutes per unit across 12,000 units and must be scored within the 5-minute SLA. Isolation Forest's decision-tree structure is directly compatible with TreeExplainer, producing exact SHAP attributions in milliseconds. There is an additional asset-specific reason: anomaly detection for MRI units is inherently a "normal vs abnormal" problem without labelled failure examples — the training dataset is normal operating data only. Isolation Forest is designed for exactly this scenario (unsupervised, contamination parameter for expected anomaly rate). Autoencoders trained on normal data also work in principle but require careful architecture design to avoid learning the identity function. Isolation Forest's simplicity and interpretability are advantages, not limitations, for this use case.
Accepted · Phase ML Design · Page 06 ADR-011
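A minimal sketch of the accepted design: scikit-learn's Isolation Forest fitted on normal operating data only, with the contamination parameter expressing the expected anomaly rate. The sensor distributions are synthetic stand-ins built around the baseline figures shown in the HITL-06 card; the production model and the TreeExplainer SHAP step are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Synthetic "normal" telemetry: coil temp (°C), helium slope (L/day), RF deviation (dB)
normal = rng.normal(loc=[61.2, -0.018, 0.0], scale=[2.0, 0.004, 2.1], size=(5000, 3))

# Unsupervised fit on normal data only; contamination = expected anomaly rate
model = IsolationForest(contamination=0.01, random_state=0).fit(normal)

# A reading resembling the MCH-0042 alert: hot coil, fast helium loss, RF spike
event = np.array([[73.4, -0.041, 14.2]])
label = model.predict(event)[0]   # -1 = anomaly, +1 = normal
```

Because the model is a tree ensemble, exact SHAP attributions via TreeExplainer are available in milliseconds — the property the ADR depends on.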
ADR-AQ01 — Asset IQ specific
GKE Autopilot for batch RUL over Cloud Run — long-running GPU workloads need a different substrate
Cloud Run was considered for the RUL batch job alongside the anomaly detection agent, for consistency. It was rejected on two grounds. First, Cloud Run's maximum request timeout of 60 minutes leaves almost no headroom: the daily RUL batch job across 12,000 units takes approximately 45 minutes — technically within the limit, but with no margin for fleet growth. Second, the RUL batch job requires GPU access (an A100 for the Gemini embedding computation in the feature-engineering step), and Cloud Run GPU support comes at higher cost and with less scheduling flexibility than GKE Autopilot. With Autopilot, the team submits a pod spec describing the workload's requirements and is billed per pod rather than per instance-hour — the correct billing model for an intermittent batch job that runs once daily. Cloud Run's per-request billing model remains optimal for the anomaly detection agent (short, frequent requests); Autopilot's per-pod model is optimal for the RUL batch job (one long, infrequent job). The compute-tier split is a deliberate billing and capability optimisation, not a complexity preference.
Accepted · Phase Compute Design · Asset IQ module
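The billing argument reduces to duty-cycle arithmetic: a once-daily 45-minute job occupies compute for about 3% of the day, so paying per pod-runtime rather than for an always-on node is roughly a 30× difference in billed time. No real GCP prices are assumed here.

```python
job_minutes_per_day = 45          # daily RUL batch run (~45 min, per ADR-AQ01)
minutes_per_day = 24 * 60

busy_fraction = job_minutes_per_day / minutes_per_day       # ≈ 0.031
billed_time_ratio = minutes_per_day / job_minutes_per_day   # always-on vs per-pod

print(f"GPU busy {busy_fraction:.1%} of the day; "
      f"an always-on node bills {billed_time_ratio:.0f}x more compute time")
```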
ADR-AQ02 — Asset IQ specific
Unified Pub/Sub schema over per-region API adapters for telemetry ingestion
The alternative ingestion design was to build a per-region API adapter for each of the 6 regional systems — each adapter translating the region's native format into the canonical asset feature schema. This is the standard enterprise integration pattern and was the initial design. Rejected in favour of a unified Pub/Sub schema for three reasons: (1) Adapter proliferation — 6 adapters means 6 codebases to maintain, 6 failure modes to monitor, and 6 schema migration paths when a regional system updates its API. A single canonical schema pushes the translation burden to the ingestion point (where it belongs) and makes all downstream consumers — Feature Store, anomaly agent, RUL batch job, BigQuery analytics — schema-agnostic. (2) Cross-regional analytics require a common schema — the fleet anomaly detection (HITL-07) depends on querying across regions in a single BigQuery query. That query is only possible if all regions publish to the same schema. Per-region adapters that produce slightly different schemas (common in practice) break cross-regional analytics without a separate normalisation layer. (3) The Data Governance module (M-08) validates every Pub/Sub message against the canonical schema before it enters the Feature Store — this validation gate is only possible because there is one schema to validate against. The unified schema is the architectural decision that makes Asset IQ's cross-regional intelligence possible.
Accepted · Phase Data Design · Asset IQ module
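A toy version of the validation gate this ADR enables — one canonical schema, checked once at the ingestion boundary. The field subset and the quarantine-on-violation behaviour follow the text; the validator itself is a hypothetical sketch, not the M-08 Data Governance implementation.

```python
CANONICAL_FIELDS = {          # illustrative subset of the 24 mandatory sensor fields
    "device_id": str,
    "region": str,
    "gradient_coil_temp_p95": (float, type(None)),   # null allowed →
    "helium_level_slope": (float, type(None)),       # imputed downstream
    "rf_power_deviation": (float, type(None)),
}

def validate(message: dict) -> list[str]:
    """Return a list of schema violations; an empty list means the message
    may enter the Feature Store, otherwise it is quarantined."""
    errors = [f"missing field: {f}" for f in CANONICAL_FIELDS if f not in message]
    for field, expected in CANONICAL_FIELDS.items():
        if field in message and not isinstance(message[field], expected):
            errors.append(f"bad type for {field}: {type(message[field]).__name__}")
    return errors
```

Because there is exactly one schema, this check runs once per message for every downstream consumer, rather than once per adapter per consumer.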
Stakeholder Rebuttals

Six objections. Each with an architectural answer.

CTO · S-01
Why two separate ML models instead of one unified model?
"You have a RUL regressor for batch and an Isolation Forest for streaming. Why not one model that does both? The operational complexity of two separate Vertex AI endpoints, two monitoring jobs, and two HITL paths seems high."
Architectural response
The two models have fundamentally different problem formulations that cannot be merged without compromising both. The RUL regressor is a supervised regression model — it requires historical failure labels and produces a continuous prediction (days to failure) with a confidence interval. It is designed for daily batch processing where the full feature history of each unit is available. The Isolation Forest is an unsupervised anomaly detector — it requires no failure labels and produces a binary signal (normal vs abnormal) based on real-time sensor deviation from a learned baseline. It is designed for streaming, event-driven processing where a single sensor reading triggers an immediate response. Merging them would require a semi-supervised model that is harder to train, harder to explain, and harder to monitor. Separation of concerns is the right architectural decision: the RUL model answers "when will this unit fail?" on a planning horizon; the anomaly model answers "is something wrong with this unit right now?" on an operational horizon.
Evidence: ADR-AQ01 (different compute substrates) · Page 06 Model 02 (RUL) and Model 03 (Anomaly) — separate model cards, separate evaluation metrics, separate confidence thresholds
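The RUL tier's problem formulation — supervised regression with quantile estimates, producing the q10/q90 interval shown in the HITL-06 queue — can be sketched with scikit-learn's quantile loss. The data here is synthetic; the production model, features, and training labels are not shown.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))                                    # stand-in sensor features
y = np.clip(30 - 5 * X[:, 0] + rng.normal(0, 3, 2000), 0, None)   # days to failure

# One model per quantile: q10 / median / q90
models = {
    q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(X, y)
    for q in (0.1, 0.5, 0.9)
}
unit = X[:1]
q10, q50, q90 = (models[q].predict(unit)[0] for q in (0.1, 0.5, 0.9))
print(f"predicted {q50:.1f} days (q10 {q10:.1f}, q90 {q90:.1f})")
```

The unsupervised anomaly tier needs none of this apparatus — no labels, no quantiles — which is the structural reason the two models cannot be merged cleanly.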
CCO · S-02
Does the RUL prediction fall under EU AI Act Annex III?
"The EU AI Act Annex III covers AI systems used in critical infrastructure. Does a predictive maintenance model for MRI scanners in hospitals qualify? And if so, what are the specific documentation obligations?"
Architectural response
Yes. EU AI Act Annex III Category 2 covers AI systems used in the management and operation of critical infrastructure — including healthcare facilities. An MRI scanner failure in a hospital is a patient safety event. A predictive maintenance model that determines when maintenance interventions occur is directly influencing the operational status of medical critical infrastructure. The documentation obligations are: Article 11 (Technical documentation — satisfied by the Asset IQ Model Card on Page 06), Article 13 (Transparency — satisfied by SHAP sensor attribution per prediction), Article 14 (Human oversight — satisfied by HITL-06 for low-confidence predictions and HITL-07 for fleet anomalies), and Article 15 (Accuracy and robustness — satisfied by the drift detection monitoring and the quarterly full-rebuild validation). The ISO 13485 DHR requirement is separate from EU AI Act — it applies to the Device History Record for the physical MRI unit, not to the AI model itself.
Evidence: Page 06 Asset IQ Model Card · HITL-06/07 specification · EU AI Act Annex III Cat. 2 · ISO 13485 DHR integration in state machine
Field Service Manager
What if the model predicts a failure that doesn't happen?
"If Asset IQ sends me to a site for a planned intervention and the unit is fine, I've wasted an engineer's day. At €800 per field visit, false positives are expensive. What's the false positive rate and who is accountable when it's wrong?"
Architectural response
The false positive question is answered by design: confidence below 0.82 routes to you for a decision — it does not auto-dispatch an engineer. When the model is uncertain, you decide. At 0.82 confidence and above, the precision at the 14-day horizon is 0.87 — meaning 13% of auto-dispatched interventions are for units that would not have failed within 14 days. At approximately €800 per visit versus approximately €42,000 for an emergency dispatch plus hospital disruption costs, the break-even false positive rate is well above 13%. When the model is wrong, the override you enter in HITL-06 becomes a training example that improves the model's precision on that failure mode. Accountability is shared by design: below-threshold predictions require your professional judgment before any action is taken.
Evidence: Page 06 Asset IQ RUL Model Card (Precision@14d: 0.87, confidence threshold 0.82) · HITL-06 three-decision interface · override as training signal (concept drift section Page 06)
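The break-even claim can be made concrete under a deliberately simple cost model (an assumption for illustration, not the page's actuarial method): a false positive wastes one €800 visit, while a true positive converts a ~€42K emergency dispatch into a ~€8K planned intervention.

```python
visit_cost = 800            # € wasted per false-positive dispatch
emergency_cost = 42_000     # € if the failure happens unplanned
planned_cost = 8_000        # € for a planned intervention
saving_per_tp = emergency_cost - planned_cost   # €34,000 averted per true positive

# Break-even precision p*: p * saving_per_tp = (1 - p) * visit_cost
p_star = visit_cost / (saving_per_tp + visit_cost)
print(f"break-even precision ≈ {p_star:.1%}; "
      f"false positives tolerable up to {1 - p_star:.1%}")
```

Under this model, auto-dispatch stays economically positive even at a false positive rate far above the observed 13% — the margin the rebuttal relies on.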
Enterprise Architect · S-08
How do you handle 6 regional systems with different sensor schemas?
"The 6 regional asset management systems were built at different times by different vendors. Their sensor data schemas are almost certainly inconsistent. How does Asset IQ produce a unified feature vector from six incompatible data sources?"
Architectural response
ADR-AQ02 is the direct answer to this question. The unified Pub/Sub canonical schema is the architectural decision that resolves schema inconsistency at the ingestion boundary. Each regional system has a lightweight ingestion adapter that translates its native schema to the canonical asset event schema before publishing to Pub/Sub. The Data Governance module (M-08, H1) validates every Pub/Sub message against the canonical schema — malformed records are quarantined before they reach the Feature Store. The canonical schema defines 24 mandatory sensor fields — regional systems that do not produce all 24 fields publish null values for missing sensors, which are imputed at feature engineering time using fleet median values. The RUL model was trained on this imputed feature matrix — it handles missing sensors gracefully without requiring data quality perfection from every regional system on day one.
Evidence: ADR-AQ02 (unified schema) · M-08 Data Governance (schema validation gate) · Page 06 Asset IQ Feature Store (24 features, null imputation strategy)
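The null-imputation step can be sketched with pandas: missing canonical sensor fields filled with fleet median values at feature-engineering time. The third unit ID and all sensor values below are illustrative.

```python
import pandas as pd

fleet = pd.DataFrame({
    "device_id": ["MCH-0042", "DUS-0118", "AMS-0203"],
    "gradient_coil_temp_p95": [73.4, 62.0, None],   # one region lacks this sensor
    "helium_level_slope": [-0.041, None, -0.017],
})

# Fill nulls with the fleet median per sensor column
sensor_cols = ["gradient_coil_temp_p95", "helium_level_slope"]
fleet[sensor_cols] = fleet[sensor_cols].fillna(fleet[sensor_cols].median())
print(fleet)
```

Training the RUL model on the imputed matrix is what lets regional systems ramp up their sensor coverage incrementally instead of blocking go-live on day-one data completeness.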
CISO · S-09
Sensor telemetry data — what data classification applies?
"The asset telemetry includes scanner operating data from hospitals. Does that constitute health data under GDPR? Or is it purely machine data with no personal data implications?"
Architectural response
The sensor telemetry processed by Asset IQ is machine operational data — coil temperatures, helium levels, RF power deviation, scan utilisation rates. It does not include DICOM patient images, patient identifiers, scan schedules, or any data that can be linked to an individual patient. The data classification is Internal (not Confidential or Restricted) — it is commercially sensitive (ClaraVis's operational performance data) but not personal data under GDPR. The Pub/Sub canonical schema explicitly excludes any patient-linked fields — this is validated by the Data Governance schema at ingestion. The DLP API is not configured for asset telemetry because there is no PII to detect. The EU boundary (europe-west3 VPC-SC) applies to all asset telemetry regardless of data classification — both for data sovereignty and because the Org Policy region constraint applies uniformly across the entire GCP project.
Evidence: Page 07 data classification labels (asset telemetry: internal) · Pub/Sub canonical schema (no patient fields) · Page 07 DLP configuration (contracts only, not telemetry)
VP Field Service · S-06
Can we use RUL predictions to renegotiate warranty reserves?
"We currently hold €40M in warranty reserves because we assume worst-case failure scenarios. If Asset IQ can predict failures with 87% precision at 14 days, can the CFO use those predictions to reduce the reserve? And what's the audit trail if the predictions are wrong?"
Architectural response
This is a financial accounting decision that requires input from the Finance Controller and auditors — Asset IQ provides the data, not the accounting judgment. What Asset IQ can provide: a fleet-level RUL distribution showing the probability of failure within each time window across all 12,000 units, with confidence intervals. This distribution, combined with historical actual failure data and the precision metrics from the Model Card, gives the Finance team a defensible actuarial basis for reserve calculation that is more accurate than worst-case assumptions. The audit trail is already designed for this: every RUL prediction is written to the BigQuery Device entity with confidence interval, SHAP attribution, and the model version that produced it. The HITL decision (human override or approval) is linked to the prediction record. If an auditor asks "what was the fleet failure probability distribution on 31 March 2026?", the answer is a BigQuery query returning 12,000 rows of predictions with confidence intervals — an evidence package that justifies any reserve calculation based on it.
Evidence: BigQuery Device entity (RUL score · confidence interval · model version) · HITL override records (linked to prediction) · Page 06 Asset IQ Model Card (precision metrics for audit defence)
Demo Pathway

Three minutes. Two scenarios. Same München unit throughout.

The demo continues the portfolio narrative — the München MRI-7T unit (MCH-0042) appears here after its contract was processed by ContractGuard and its revenue recognised by RevRec AI. Now Asset IQ is monitoring its operational health. Scenario 1: a low-confidence RUL prediction triggers HITL-06. Scenario 2: a fleet anomaly across four EMEA-North units triggers HITL-07.

00
Setup · 30s before
Open the Asset IQ dashboard — show the fleet status panel
Open the Strategy Dashboard (or the Asset IQ module dashboard). Show the fleet status panel: 12,000 units, RUL distribution, colour-coded by risk tier. Point out the red cluster in EMEA-North — 4 units flagged. Navigate to the Cloud Scheduler — show the daily RUL batch job. This establishes the scale of the system before showing the detail.
Asset IQ dashboard · Fleet status panel · Cloud Scheduler
01
Batch RUL Run · 0:00
Trigger the daily RUL batch job — watch it run
Manually trigger the Cloud Scheduler job (simulating the 02:00 UTC run). Show the GKE Autopilot workload spinning up in the GKE Console — pod allocated, GPU attached. Show the Vertex AI Pipelines run starting. The batch job processes 12,000 units in approximately 45 minutes — for the demo, show a pre-computed run that completed earlier, then jump to the output.
"This is running on GKE Autopilot — I didn't provision a node pool, I submitted a pod spec. Autopilot allocated an A100 GPU node, ran the job, and will scale back to zero when it's done. The billing is per pod, not per node — I'm only paying for the 45 minutes the job runs."
GKE Autopilot · Vertex AI Pipelines · A100 GPU pod
02
HITL-06 · 0:30
Open the Field Service Manager HITL-06 queue — MCH-0042
Navigate to the HITL Approval UI. Open the FSM queue — one item: MRI-7T-MCH-0042, Universitätsklinikum München, RUL 8.4 days, confidence 0.78. Show the full interface: the RUL prediction with confidence interval, the SHAP sensor attribution (gradient_coil_temp_p95 is the dominant feature), and the ISO 13485 DHR context showing the unit's service history.
"Notice the confidence is 0.78 — below the 0.82 auto-dispatch threshold. So the model is saying 'I think this unit needs service in about 8 days, but I'm not confident enough to book the engineer without asking you.' The Field Service Manager sees the sensor context — the coil temperature is 20% above baseline, the helium is depleting 2.3× faster than normal. They click Schedule Maintenance."
HITL-06 UI · RUL prediction · SHAP sensor attribution · ISO 13485 DHR
03
Work Order · 1:15
Show the Salesforce work order and ISO 13485 DHR event
After the FSM approves, show the Salesforce Cases view — a preventive maintenance case has been created for MCH-0042, due 2026-03-22. Open BigQuery — the ae_devices.dhr_events table has a new row: device_id MCH-0042, event_type PREDICTIVE_MAINTENANCE, hitl_ref, shap_ref, ISO13485_standard_ref. This is the Device History Record that satisfies ISO 13485.
"Every work order creates a DHR entry — that's the ISO 13485 Device History Record. If ClaraVis ever needs to demonstrate to a regulator that this unit's maintenance history is documented, it's all in BigQuery. The SHAP explanation and the HITL approval record are linked to the DHR entry."
Salesforce Cases · BigQuery dhr_events · Firestore HITL record
04
Fleet Anomaly · 1:50
Trigger a sensor event — watch the fleet anomaly detection
Publish a synthetic sensor event to the ae-asset-events Pub/Sub topic simulating the Düsseldorf unit (DUS-0118) reporting an rf_power_deviation spike. Show the Cloud Run anomaly agent logs: "Event received · DUS-0118 · anomaly_score: 0.83", "SHAP computed · top sensor: rf_power_deviation +4.8", "Score above threshold · querying fleet pattern…", "4 units matching in EMEA-North · triggering HITL-07".
"This is the capability that makes Asset IQ different from standard predictive maintenance. One unit triggers an anomaly. The agent immediately checks: is this unit-specific, or is there a pattern across the fleet? It finds 3 other EMEA-North units with similar RF power deviation profiles. Four units, same symptom, same region — that's a fleet alert, not a single unit repair. This goes to VP Field Service, not Field Service Manager."
Pub/Sub synthetic event · Cloud Run anomaly agent · BigQuery fleet query · HITL-07
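The fleet-pattern check in this step can be sketched as a grouping rule: after one unit scores as anomalous, look for other recent anomalies in the same region with the same dominant sensor. The unit IDs other than DUS-0118, and the threshold, are hypothetical; the real check runs as a BigQuery fleet query.

```python
recent_anomalies = [
    {"unit": "DUS-0118", "region": "EMEA-North", "top_sensor": "rf_power_deviation"},
    {"unit": "HAM-0071", "region": "EMEA-North", "top_sensor": "rf_power_deviation"},
    {"unit": "CPH-0033", "region": "EMEA-North", "top_sensor": "rf_power_deviation"},
    {"unit": "OSL-0020", "region": "EMEA-North", "top_sensor": "rf_power_deviation"},
    {"unit": "MAD-0456", "region": "EMEA-South", "top_sensor": "helium_level_slope"},
]

def check_fleet_pattern(trigger, anomalies, min_units=4):
    """Escalate to HITL-07 when the trigger plus matching units in its
    region (same dominant sensor) reach min_units; otherwise stay unit-level."""
    matches = [a["unit"] for a in anomalies
               if a["region"] == trigger["region"]
               and a["top_sensor"] == trigger["top_sensor"]]
    route = "HITL-07" if len(matches) >= min_units else "unit-level"
    return route, matches
```

Four units, same symptom, same region clears the threshold; a lone anomaly stays a single-unit repair.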
05
HITL-07 · 2:20
Open the VP Field Service HITL-07 queue — fleet alert
Navigate to the HITL Approval UI — VP Field Service view. Show HITL-07: 4 affected units listed, anomaly scores, regions, SHAP attribution for each unit (rf_power_deviation dominant across all four), and the recommended action: Fleet Alert + Engineering escalation for potential recall review. VP Field Service clicks Fleet Alert.
"HITL-07 has a 2-hour SLA — because a cross-regional fleet pattern is urgent. The 4 units are in active service at hospitals. VP Field Service approves Fleet Alert, which dispatches engineers to all four units simultaneously and escalates to VP Engineering for a potential recall review. Four work orders created, four DHR events written, full fleet alert audit trail in BigQuery."
HITL-07 VP Field Service · 4x Salesforce Cases · 4x DHR events · Fleet alert audit trail
06
Audit · 2:50
Query the fleet decision audit trail
Open BigQuery. Show a query joining ae_audit.shap_explanations, ae_audit.hitl_events, and ae_devices.dhr_events for the four fleet anomaly units. The result: each unit's SHAP sensor attribution, the HITL-07 decision, the VP's approval, and the DHR event — all linked. Same audit pattern as ContractGuard and RevRec AI: one BigQuery query covers the full compliance evidence package.
"Three modules into the portfolio demo — ContractGuard, RevRec AI, Asset IQ — and they all produce the same evidence pattern: SHAP before HITL, HITL record immutable, everything queryable from BigQuery. That's not coincidence. That's a shared platform design."
BigQuery ae_audit · ae_devices.dhr_events · Cross-module audit join
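The evidence join at the heart of this step — SHAP explanation, HITL decision, and DHR event linked per unit — can be sketched with pandas standing in for the BigQuery join. The ref column names mirror those shown on this page; the row values are illustrative.

```python
import pandas as pd

shap = pd.DataFrame({"shap_ref": ["S1"], "device_id": ["DUS-0118"],
                     "top_sensor": ["rf_power_deviation"]})
hitl = pd.DataFrame({"hitl_ref": ["H1"], "shap_ref": ["S1"],
                     "decision": ["FLEET_ALERT"], "approver": ["VP Field Service"]})
dhr = pd.DataFrame({"device_id": ["DUS-0118"], "hitl_ref": ["H1"],
                    "event_type": ["PREDICTIVE_MAINTENANCE"]})

# SHAP → HITL → DHR: the full compliance evidence package in one join chain
evidence = shap.merge(hitl, on="shap_ref").merge(dhr, on="hitl_ref")
print(evidence[["device_id_x", "top_sensor", "decision", "event_type"]])
```

The same join shape works for ContractGuard and RevRec AI because all three modules write to the shared ae_audit tables — which is the point the demo closes on.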
AE Suite Navigation
Three modules complete.
Five remain.

ContractGuard, RevRec AI, and Asset IQ all demonstrate the same audit pattern: SHAP before HITL, immutable records, BigQuery queryable. The shared platform beneath all three modules is what makes the suite coherent.

PG 09
← AE Suite Index
All 8 modules · dependency matrix · demo pathway index
M-03
RevRec AI — Module 03
Same München contract · revenue recognition
M-02
ContractGuard — Module 02
Same München contract · clause risk analysis