The Autonomous Enterprise · AE Platform Layer · Module 08 · H1 Foundation · PI-1

Data Governance
The Platform Foundation
— Every feature has a lineage. Every record has a source.

Data Governance is deployed in Horizon 1 — before any ML model, before any HITL checkpoint, before any inference in any suite. It validates every record entering the AE data fabric, tags every feature with a traceable lineage, and quarantines anything that fails. Without it, the SHAP explanations that satisfy EU AI Act Article 11 have no verified provenance.

H1 Foundation · PI-1 Prerequisite | TFX Validate · Schema enforcement | Feature Store lineage · BigQuery | Pub/Sub · 6 regional sources | Quarantine-then-review · Steward alert | Platform Layer · Minimal Risk
This is the PI-1 prerequisite. No domain suite — Autonomous Seller, Autonomous HR, or any future suite — can deploy until Data Governance is live and the 6-region canonical schema is validated. Deployment order: M-08 (H1) → M-06 (H3 · PI-7) → M-07 (H3 · PI-8).
System Context — C4 Level 1

Raw data in from six sources. Validated, lineage-tagged features out to every suite.

Data Governance sits at the boundary between raw regional data sources and the AE data fabric. Everything entering the Feature Store, BigQuery financial stream, and contract event stream — from any suite — passes through Data Governance first. All suites are blocked until this module is live.

Data Governance — System Context (C4 Level 1)
H1 platform foundation · three input streams · six regions · Feature Store + BigQuery outputs · data steward for quarantine only
Figure (diagram not reproduced): Data Governance system context.
Inputs: 6 Regional Asset Systems (Pub/Sub ae-asset-events · sensor telemetry · DICOM events · variable schema quality) · Salesforce Events (Pub/Sub ae-q2c-events · contract · opportunity · payment) · Financial Events (Pub/Sub ae-financial-events)
Core: Data Governance · Module 08 · Platform Layer · H1 · PI-1 · TFX Schema Validator · Quality scorer (completeness · freshness) · Lineage tagger (source · event_id · ts) · Quarantine manager · Schema drift detector · Cloud Run · Pub/Sub · BigQuery · TFX · DLP API · No ML inference, deterministic validation only · All domain suites blocked until DG is live
Outputs: Vertex AI Feature Store (validated · lineage-tagged · quality_score per feature · all suites: AS · AHR · future) · BigQuery Canonical (ae_assets · ae_financial · ae_contracts · quality_score and lineage_ref per row) · Quarantine + Steward (ae_quarantine · failed records · P2 data steward alert · reinstate or discard)
Legend: Data Governance boundary (H1) · Feature Store (validated output) · Quarantine + steward alert
Canonical Schema

The load-bearing artifact. Every suite depends on this.

The 6-region canonical schema v2.0 is the contract between every regional data source and the AE data fabric. All suites are blocked until this schema is validated and live. Schema changes must be registered before deployment — unregistered changes trigger quarantine automatically.

AE Canonical Schema v2.0 — ae_assets · representative fields
BigQuery table definition · TFX Schema proto source of truth · 6-region enforcement · version-controlled in ML Metadata
| Field | Type | Constraint | Version | Notes |
|---|---|---|---|---|
| entity_id | STRING | REQUIRED | v1.0 | Unique asset identifier · format: {REGION}-{TYPE}-{SEQ} |
| event_id | STRING | REQUIRED | v1.0 | Pub/Sub message ID · used as lineage_ref anchor |
| event_timestamp | TIMESTAMP | REQUIRED | v1.0 | UTC · ISO-8601 · type mismatch triggers quarantine |
| source_region | STRING | REQUIRED | v1.0 | ENUM: EMEA-North · EMEA-West · APAC-East · APAC-South · AMER-East · AMER-West |
| asset_type | STRING | REQUIRED | v1.0 | ENUM: MRI-7T · CT-Premium · MRI-3T · Ultrasound-Elite |
| bearing_vibration_hz | FLOAT | REQUIRED | v1.0 | Range: 0 to 500 Hz · anomaly threshold: fleet_mean ± 3σ |
| bearing_temp_c | FLOAT | REQUIRED | v1.0 | Range: -10 to 150 °C · null triggers quality score deduction |
| bearing_temp_optical_c | FLOAT | OPTIONAL | v2.1 | New canonical field · maps from optical_sensor_temp_c (APAC-East firmware v4.2.1) |
| power_draw_kw | FLOAT | REQUIRED | v1.0 | Range: 0 to 50 kW · feeds GreenOps carbon model |
| firmware_version | STRING | REQUIRED | v1.0 | Semver · tracks schema drift to firmware release correlation |
| free_text_notes | STRING | OPTIONAL · DLP SCAN | v1.0 | Engineer field notes · DLP API scanned inline before TFX · PII detection routes to DLP HOLD (not schema quarantine) |
| schema_version | STRING | REQUIRED | v1.0 | Written by DG agent post-validation · used as lineage_ref component |
| quality_score | FLOAT | REQUIRED | v1.0 | Written by DG quality scorer · 0–1 · threshold 0.85 for Feature Store · 0.60 for BigQuery |
| lineage_ref | STRING | REQUIRED | v1.0 | Composite: event_id + schema_version + ingest_ts · EU AI Act Art. 11 anchor |
REQUIRED Schema violation → immediate quarantine
OPTIONAL Absence recorded in quality score
DLP SCAN Cloud DLP inline scan before TFX · routes to DLP HOLD on PII detection
v2.1 Version introduced · tracked in ML Metadata Schema artifact
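The composite lineage_ref described in the table can be sketched as follows. This is a minimal illustration, not the DG agent's actual code: the "|" separator, the timestamp format, and the function names build_lineage_ref and parse_lineage_ref are assumptions, since the schema only states that the reference combines event_id, schema_version, and ingest_ts.

```python
from datetime import datetime, timezone

# Assumption: components are joined with "|". The schema table only states
# that lineage_ref is a composite of event_id + schema_version + ingest_ts.
SEPARATOR = "|"
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def build_lineage_ref(event_id: str, schema_version: str, ingest_ts: datetime) -> str:
    """Join the three lineage components into one string (hypothetical layout)."""
    if ingest_ts.tzinfo is None:
        raise ValueError("ingest_ts must be timezone-aware")
    ts_utc = ingest_ts.astimezone(timezone.utc).strftime(TS_FORMAT)
    return SEPARATOR.join((event_id, schema_version, ts_utc))


def parse_lineage_ref(lineage_ref: str) -> dict:
    """Split a lineage_ref back into its components for an audit lookup."""
    event_id, schema_version, ts_utc = lineage_ref.rsplit(SEPARATOR, 2)
    ingest_ts = datetime.strptime(ts_utc, TS_FORMAT).replace(tzinfo=timezone.utc)
    return {
        "event_id": event_id,
        "schema_version": schema_version,
        "ingest_ts": ingest_ts,
    }


ref = build_lineage_ref(
    "1234567890", "v2.0", datetime(2026, 3, 15, 14, 22, 11, tzinfo=timezone.utc)
)
print(ref)  # 1234567890|v2.0|2026-03-15T14:22:11Z
```

Keeping the reference parseable means an auditor can recover the Pub/Sub message ID and schema version from the string alone, without a second lookup table.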
Cost Model — Cloud DLP (CFO rebuttal addendum)
DLP scope: Only free_text_notes (OPTIONAL field, present in ~12% of records at ClaraVis volume) is submitted to the Cloud DLP Content API. Structured numeric fields are not inspected, because DLP adds no value on FLOAT/TIMESTAMP fields.
Volume: 15,000 msg/day × 12% ≈ 1,800 DLP API calls/day → ~54,000/month (30-day month).
Billing assumption: 1 unit = 1 API call for content <500KB, at $0.003/unit. Monthly DLP cost: 54,000 calls × $0.003/unit ≈ $162/month.
Full revised cost model: Cloud Run ~€0.14 + BigQuery ~€2 + Cloud DLP ~$162 ≈ $165/month total.
Batch validation was not considered; see ADR-DG02. DLP cost scales linearly with free_text field prevalence; an operator flag suppresses the DLP scan when the notes field is absent.
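The DLP arithmetic above can be reproduced with a few lines. This is a sketch under the addendum's own figures (15,000 msg/day, 12% prevalence, $0.003 per unit, 30-day month); the function name monthly_dlp_cost is hypothetical.

```python
def monthly_dlp_cost(
    messages_per_day: int,
    notes_prevalence: float,
    price_per_unit_usd: float,
    days_per_month: int = 30,
) -> tuple[int, float]:
    """Return (DLP calls per month, cost in USD), one call per record with notes."""
    calls_per_day = round(messages_per_day * notes_prevalence)
    calls_per_month = calls_per_day * days_per_month
    return calls_per_month, round(calls_per_month * price_per_unit_usd, 2)


calls, cost_usd = monthly_dlp_cost(15_000, 0.12, 0.003)
print(calls, cost_usd)  # 54000 162.0
```

Because the cost is linear in notes_prevalence, doubling the share of records that carry free_text_notes doubles the DLP line item while leaving Cloud Run and BigQuery unchanged.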
Architecture — Validation Pipeline

Schema. Quality. Lineage. Three gates. Every record.

Data Governance uses TFX's deterministic validation engine and BigQuery lineage tables — no ML inference. Every record passes through schema conformance, quality scoring, and lineage tagging before reaching the Feature Store. Records that fail schema validation go to quarantine immediately. Records that pass schema but score below 0.60 quality also quarantine.
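The routing rule described above can be sketched as a pure function. The thresholds (0.85 and 0.60) come from the canonical schema; the Route enum and route_record name are illustrative assumptions, and treating a score of exactly 0.60 as a BigQuery write follows the "below 0.60" wording.

```python
from enum import Enum


class Route(Enum):
    FEATURE_STORE = "feature_store"
    BIGQUERY_WARNING = "bigquery_warning"  # written with quality_flag=WARNING
    QUARANTINE = "quarantine"


FEATURE_STORE_THRESHOLD = 0.85
BIGQUERY_THRESHOLD = 0.60


def route_record(schema_valid: bool, quality_score: float) -> Route:
    """Decide where a record goes after schema validation and quality scoring."""
    if not schema_valid:
        return Route.QUARANTINE  # schema failure quarantines immediately
    if quality_score < BIGQUERY_THRESHOLD:
        return Route.QUARANTINE  # schema-valid but too low quality
    if quality_score >= FEATURE_STORE_THRESHOLD:
        return Route.FEATURE_STORE
    return Route.BIGQUERY_WARNING


for valid, score in [(True, 0.91), (True, 0.72), (True, 0.41), (False, 0.99)]:
    print(valid, score, route_record(valid, score).name)
```

Note that the schema check runs first: a record with a high quality score still quarantines if it fails schema conformance.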

Data Governance — Validation Pipeline
Three input streams · TFX schema validation · quality scoring · lineage tagging · three output paths
Figure (diagram not reproduced): validation pipeline inside the VPC-SC perimeter · europe-west3.
Input: Pub/Sub topics ae-asset-events · ae-q2c-events · ae-financial-events (always-on pull)
DG Agent: Cloud Run · stateless · SA: dg-sa@ · WIF · DLP API inline scan
TFX Validator: schema conformance vs ae_schemas (BQ) · drift fingerprint check
Quality Scorer: completeness · freshness · null rate · range check
Lineage Tagger: event_id · schema_version · quality_score · ingest_ts
Outputs: ≥0.85 → Vertex AI Feature Store (lineage tag · quality_score · all suites) · 0.60–0.85 → BigQuery Canonical (conditional · quality_flag=WARNING) · schema fail → Quarantine Manager (ae_quarantine · P2 alert · raw record preserved) → Steward Alert (reinstate or discard decision)
Record State Machine

Every record. One of five states. No silent discard.

A record that fails validation is never silently discarded — it moves to QUARANTINED, where it waits for a data steward decision. If reinstated, it re-enters the VALIDATING state with the approved mapping rule applied. If discarded, an audit tombstone is written to ae_governance.discard_log.

Data Governance — Record State Machine
INGESTED → VALIDATING → QUALITY_SCORED → LINEAGE_TAGGED → FEATURE_STORE_READY · or QUARANTINED → REINSTATED | DISCARDED
Figure (diagram not reproduced), transition labels: VALIDATING → QUARANTINED (schema fail) · QUALITY_SCORED → QUARANTINED (quality<0.60) · QUARANTINED → reinstate or DISCARDED (discard) · DLP PII trigger → DLP HOLD (separate quarantine path)
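One way to make the "no silent discard" rule checkable is an explicit transition table. The sketch below is an assumption about how the states connect: the state names come from the diagram, but the DLP_HOLD entry and exit edges (entered from INGESTED, left via REINSTATED or DISCARDED) are inferred from the DLP scan running before TFX and from ADR-DG03, not stated in the state machine itself.

```python
from enum import Enum, auto


class State(Enum):
    INGESTED = auto()
    VALIDATING = auto()
    QUALITY_SCORED = auto()
    LINEAGE_TAGGED = auto()
    FEATURE_STORE_READY = auto()
    QUARANTINED = auto()
    REINSTATED = auto()
    DISCARDED = auto()
    DLP_HOLD = auto()


# Allowed next states. Terminal states map to an empty set.
TRANSITIONS = {
    State.INGESTED: {State.VALIDATING, State.DLP_HOLD},  # DLP edge is an assumption
    State.VALIDATING: {State.QUALITY_SCORED, State.QUARANTINED},
    State.QUALITY_SCORED: {State.LINEAGE_TAGGED, State.QUARANTINED},
    State.LINEAGE_TAGGED: {State.FEATURE_STORE_READY},
    State.QUARANTINED: {State.REINSTATED, State.DISCARDED},
    State.REINSTATED: {State.VALIDATING},  # re-enters validation with the mapping rule
    State.DLP_HOLD: {State.REINSTATED, State.DISCARDED},  # assumption, see ADR-DG03
    State.FEATURE_STORE_READY: set(),
    State.DISCARDED: set(),
}


def advance(current: State, target: State) -> State:
    """Return the target state, or raise if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target


state = advance(State.VALIDATING, State.QUARANTINED)
print(state.name)  # QUARANTINED
```

With this table, the only path to DISCARDED runs through QUARANTINED or DLP_HOLD, so a record cannot disappear without a steward decision.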
Data Steward Interface

The only human decision in Data Governance — quarantine reinstatement.

The data steward interface is operationally different from every other HITL in any suite. No time pressure (P2, not P0), no financial consequence to the individual decision, no dual-reviewer requirement. The steward's job is schema triage: understand the violation, confirm the mapping rule, reinstate or discard.

Data Governance · Quarantine Review · APAC-East · Schema v2.1 · 2 violations · P2 · 847 records on hold
Quarantine Summary
Records quarantined: 847 · Violations detected: 2 · Alert priority: P2
Violation Details
Violation 1 — Type Mismatch (BLOCKING)
Field: event_timestamp · Expected: TIMESTAMP · Arriving: STRING ("2026-03-15T14:22:11Z")
Suggested fix: PARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%SZ', event_timestamp) AS event_timestamp
Violation 2 — New Field (SCHEMA DRIFT)
Field: optical_sensor_temp_c (FLOAT) · Not in canonical schema v2.0
Source: APAC-East firmware v4.2.1 added optical bore temperature sensor
Suggested: optical_sensor_temp_c → bearing_temp_optical_c (new canonical field v2.1)
Schema Diff
Canonical v2.0
event_timestamp: TIMESTAMP
bearing_vibration_hz: FLOAT
... 8 other fields unchanged
Arriving v2.1 (APAC-East)
event_timestamp: STRING ⚠
bearing_vibration_hz: — (absent)
optical_sensor_temp_c: FLOAT ✗
Suggested Mapping Rule (auto-generated)
Mapping rule v2.1_APAC_EAST (pending approval):
1. PARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%SZ', event_timestamp) AS event_timestamp
2. optical_sensor_temp_c → stored as bearing_temp_optical_c (new canonical field)
3. bearing_vibration_hz → NULL · imputed via fleet median at inference

Approving reinstates all 847 records and updates ae_schemas to v2.1.
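The three mapping steps above can be sketched in Python for a single record. This is an illustration of the rule's effect, not the reinstatement job itself: the function name apply_mapping_v2_1_apac_east and the dict-based record shape are assumptions, and the timestamp parse mirrors the PARSE_TIMESTAMP format string from the suggested fix.

```python
from datetime import datetime, timezone


def apply_mapping_v2_1_apac_east(raw: dict) -> dict:
    """Return a copy of the record with the pending v2.1 mapping rule applied."""
    record = dict(raw)  # never mutate the quarantined raw record

    # Step 1: STRING timestamp -> timezone-aware UTC datetime.
    ts = record["event_timestamp"]
    if isinstance(ts, str):
        parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
        record["event_timestamp"] = parsed.replace(tzinfo=timezone.utc)

    # Step 2: rename the firmware field to the new canonical field.
    if "optical_sensor_temp_c" in record:
        record["bearing_temp_optical_c"] = record.pop("optical_sensor_temp_c")

    # Step 3: absent vibration reading becomes an explicit NULL.
    record.setdefault("bearing_vibration_hz", None)
    return record


raw = {"event_timestamp": "2026-03-15T14:22:11Z", "optical_sensor_temp_c": 41.5}
mapped = apply_mapping_v2_1_apac_east(raw)
print(mapped)
```

Copying the record before mapping keeps the raw quarantined payload intact, which matches the pipeline's "raw record preserved" guarantee.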
✓ Approve Mapping + Reinstate 847 Records
✗ Discard Batch
Batch Detail
Source region: APAC-East
Records quarantined: 847
Units affected: 14 MRI-7T + 9 CT-Premium · APAC-East
Schema version: v2.1 (unregistered)
Asset IQ impact: APAC-East: stale features, confidence degraded → HITL-06. Other 5 regions: unaffected.
EU AI Act: Reinstating with mapping rule preserves Art. 11 provenance chain. Mapping logged to ae_governance.schema_mappings.
P2, no SLA: Not an emergency. Asset IQ continues for 5 other regions. Reinstate when mapping is confirmed.
Architecture Decision Records

Three Data Governance decisions. Every alternative documented.

ADR-DG01 — Data Governance Platform
TFX ExampleValidator over custom JSON schema validation
Custom JSON schema validation (Python jsonschema or Pydantic) was the initial design. Replaced with TFX ExampleValidator for four reasons: (1) TFX is stateful — it learns statistics from observed data and can flag distributional drift, not just structural violations. A temperature reading within the valid range but 10 standard deviations above fleet mean would pass JSON schema validation and fail TFX's anomaly detection. (2) TFX integrates natively with Vertex AI Pipelines — the same schema definitions used in offline training are enforced at serving time, eliminating train-serve skew. (3) TFX's Schema artifact is versionable in the ML Metadata store — schema evolution is tracked automatically. (4) TFX Validate data quality expectations are living documentation satisfying EU AI Act Article 11's training data requirement. Custom JSON schema has none of these properties.
Accepted · Platform Layer · Data Governance
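The difference between a structural range check and a distributional check, as argued in ADR-DG01, can be shown with the standard library alone. This sketch does not use the TFX or TFDV API; the fleet values, the function names, and the reuse of the 3σ threshold for temperature are assumptions chosen for illustration.

```python
from statistics import mean, stdev

# Range for bearing_temp_c from the canonical schema.
TEMP_RANGE_C = (-10.0, 150.0)


def passes_range_check(value: float) -> bool:
    """Structural check: what a static JSON-schema style rule can express."""
    low, high = TEMP_RANGE_C
    return low <= value <= high


def is_distribution_anomaly(value: float, fleet: list[float], sigmas: float = 3.0) -> bool:
    """Statistical check: flag values far from the observed fleet mean."""
    z = (value - mean(fleet)) / stdev(fleet)
    return abs(z) > sigmas


fleet_temps = [70.0, 71.0, 72.0, 73.0, 74.0]  # hypothetical fleet readings
reading = 90.0

print(passes_range_check(reading))                    # True
print(is_distribution_anomaly(reading, fleet_temps))  # True
```

The 90.0 °C reading is more than ten sample standard deviations above this fleet's mean of 72.0, so it passes the range rule yet is flagged by the statistical rule.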
ADR-DG02 — Data Governance Platform
Quarantine-then-review over reject-on-arrival
The simpler design: reject schema-violating records at the Pub/Sub subscriber — log the error and discard. Rejected for two reasons: (1) Data loss is irreversible in a real-time sensor system. A regional system sending 847 records per batch cannot easily re-send data from 4 hours ago. Quarantine preserves the raw record in BigQuery, allowing reprocessing once the schema mapping is confirmed. Reject-on-arrival means the data is permanently gone. (2) Schema violations often indicate a legitimate upstream system update, not corrupted data. The APAC-East firmware update is the canonical example — the data is valid, the schema registration is missing. Reject-on-arrival silently discards valid, irreplaceable telemetry. Quarantine-then-review surfaces the issue to a human who can determine whether the data is recoverable.
Accepted · Platform Layer · Data Governance
ADR-AQ02 (cross-reference) — enforced here
Unified Pub/Sub schema over per-region API adapters — enforced by Data Governance
ADR-AQ02 was documented in the Asset IQ module as the schema standardisation decision. It is cross-referenced here because Data Governance is the component that makes ADR-AQ02 enforceable at runtime. The unified Pub/Sub canonical schema is not a convention that source systems are expected to follow — it is a gate that every record must pass. Without Data Governance, ADR-AQ02 is a design aspiration. With Data Governance, it is a hard technical constraint. Any regional system that updates its schema without registering the change will have its records quarantined until a steward reviews them. This creates a virtuous feedback loop — schema violations are visible, immediate, and operationally costly (P2 alert), incentivising regional teams to register schema changes before deploying.
Accepted · Cross-reference · Asset IQ ADR-AQ02 · enforced by Data Governance · → Asset IQ module
ADR-DG03 — Data Governance · DLP Routing
DLP PII Hold routes to Redact-then-Reinstate, not schema quarantine reinstatement
Two PII handling approaches were evaluated. (1) Reject-on-PII-detection: discard records containing PII in free_text_notes. Rejected because field engineer notes are operationally valuable — they often contain maintenance observations that inform the RUL model's context features. Silent discard removes recoverable signal. (2) DLP HOLD with Redact-then-Reinstate: quarantine the raw record, alert the data steward with the specific detected entity types (e.g., PERSON_NAME, EMAIL_ADDRESS), and require an explicit redaction action before reinstatement. The steward does not see a "Reinstate as-is" option for DLP holds — only "Redact detected fields + Reinstate" or "Discard". This distinction is critical: schema quarantine reinstatement is a mapping approval; DLP reinstatement without redaction would write raw PII to the Feature Store, violating GDPR data minimisation (Art. 5(1)(c)). The DLP HOLD state is therefore architecturally distinct from the QUARANTINED state — different steward UI, different approval action, different audit log entry (ae_governance.dlp_remediation_log vs ae_governance.discard_log). Cloud DLP is scoped to free_text_notes only; structured numeric/timestamp fields are not inspected. Cost: ~$162/month at ClaraVis volume — see Canonical Schema section cost addendum.
Accepted · Platform Layer · Data Governance · GDPR Art. 5(1)(c) · Cloud DLP Content API
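The ADR-DG03 rule that DLP holds never offer "Reinstate as-is" can be enforced with an allow-list per hold type. This is a sketch: the enum names and validate_steward_action are hypothetical, and only the set of permitted actions is taken from the ADR text.

```python
from enum import Enum


class Hold(Enum):
    QUARANTINED = "schema quarantine"
    DLP_HOLD = "DLP hold"


class Action(Enum):
    APPROVE_MAPPING_AND_REINSTATE = "approve mapping + reinstate"
    REDACT_AND_REINSTATE = "redact detected fields + reinstate"
    DISCARD = "discard"


# Which steward actions each hold type exposes.
ALLOWED_ACTIONS = {
    Hold.QUARANTINED: {Action.APPROVE_MAPPING_AND_REINSTATE, Action.DISCARD},
    Hold.DLP_HOLD: {Action.REDACT_AND_REINSTATE, Action.DISCARD},
}


def validate_steward_action(hold: Hold, action: Action) -> Action:
    """Reject any action the steward UI should not offer for this hold type."""
    if action not in ALLOWED_ACTIONS[hold]:
        raise PermissionError(f"{action.value!r} is not permitted for a {hold.value}")
    return action


ok = validate_steward_action(Hold.DLP_HOLD, Action.REDACT_AND_REINSTATE)
print(ok.name)  # REDACT_AND_REINSTATE
```

Enforcing the allow-list server-side, not only in the UI, means a DLP hold cannot be reinstated through a mapping approval even if a client sends that action directly.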
Stakeholder Rebuttals

Six objections. Each with an architectural answer.

CTO · S-01
Why a separate module — can't the Feature Store handle validation natively?
"Vertex AI Feature Store has data validation capabilities. Why build a separate Data Governance module when the Feature Store already handles ingestion quality?"
Architectural response
Vertex AI Feature Store validates that incoming data conforms to the feature group schema — it ensures correct feature types are present. It does not validate business rules that make a record meaningful: a temperature reading of 73.4°C is schema-valid but only meaningful if the freshness is within the expected window, the quality score meets the threshold, the source system is registered, and the lineage tag is attached. The Feature Store also does not write quarantine records, alert data stewards, track schema drift fingerprints, or produce the lineage metadata required for EU AI Act Article 11. Data Governance handles the layer between raw Pub/Sub events and Feature Store ingest — a prerequisite to it, not a replacement.
Evidence: ADR-DG01 (TFX ExampleValidator over custom JSON schema validation) · EU AI Act Art. 11 (lineage provenance requirement)
CCO · S-02
How does feature lineage satisfy EU AI Act Article 11?
"EU AI Act Article 11 requires documentation covering training data provenance, characteristics, and quality. How does attaching a lineage tag satisfy that obligation in a way an auditor can verify?"
Architectural response
For any SHAP explanation presented to a human reviewer, an auditor can trace: the lineage_ref field in the SHAP record → the Feature Store feature group entry with the lineage tag → the DG validation log entry confirming schema version and quality score → the original Pub/Sub message ID and source system. This is not a description of provenance — it is a queryable audit chain. The TFX validation log entry, quality score, and schema version are all written to ae_governance.validation_log at validation time. The Feature Store offline store records used for model training carry the same lineage tags, so the training dataset's provenance is fully documented.
Evidence: ae_governance.validation_log · Feature Store lineage_ref field · EU AI Act Art. 11
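The audit chain in this response can be sketched as a sequence of keyed lookups. In-memory dictionaries stand in for the Feature Store and ae_governance.validation_log here, and every field name other than lineage_ref, schema_version, quality_score, and event_id is a hypothetical placeholder.

```python
def trace_provenance(shap_record: dict, feature_store: dict, validation_log: dict) -> dict:
    """Follow lineage_ref from a SHAP record back to the source message."""
    ref = shap_record["lineage_ref"]
    for name, table in (("Feature Store", feature_store), ("validation_log", validation_log)):
        if ref not in table:
            raise LookupError(f"audit chain broken: {ref!r} missing from {name}")
    log_entry = validation_log[ref]
    return {
        "lineage_ref": ref,
        "feature_group": feature_store[ref]["feature_group"],
        "schema_version": log_entry["schema_version"],
        "quality_score": log_entry["quality_score"],
        "event_id": log_entry["event_id"],
        "source_region": log_entry["source_region"],
    }


# Hypothetical sample rows, keyed by lineage_ref.
REF = "1234567890|v2.0|2026-03-15T14:22:11Z"
feature_store = {REF: {"feature_group": "ae_assets"}}
validation_log = {
    REF: {
        "schema_version": "v2.0",
        "quality_score": 0.93,
        "event_id": "1234567890",
        "source_region": "EMEA-West",
    }
}

chain = trace_provenance({"lineage_ref": REF}, feature_store, validation_log)
print(chain["event_id"], chain["source_region"])  # 1234567890 EMEA-West
```

A missing link raises an error instead of returning a partial result, which is the behaviour an auditor would want: provenance is either complete or visibly broken.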
Enterprise Architect · S-08
What happens to downstream modules when a large batch is quarantined?
"If APAC-East sends 847 records and all are quarantined, does Asset IQ stop predicting for those units? Does the quarantine cascade to other regions?"
Architectural response
Asset IQ continues running on all non-APAC-East records without interruption — the quarantine is scoped to the batch and region that failed. For APAC-East units, Asset IQ uses the last valid feature snapshot from before the quarantine event. Stale features are flagged — the quality_score drops below the 0.85 high-confidence threshold, routing APAC-East unit predictions to HITL-06 rather than auto-work-order until the quarantine resolves. The quarantine cannot cascade to other regions because Pub/Sub topics use per-region message filtering and the DG agent processes each message independently.
Evidence: Asset IQ HITL-06 (confidence threshold + staleness flag) · Pub/Sub per-region filtering
Asset IQ — Field Service Manager
If APAC-East data is quarantined, do I have visibility into those units?
"We have 23 units in APAC-East. If their sensor data is quarantined for 4+ hours, am I flying blind while the data steward resolves the schema issue?"
Architectural response
You are not flying blind. Asset IQ uses the last validated feature snapshot for APAC-East units. The daily RUL batch job still produces predictions for all 23 units using yesterday's feature values, with a lower confidence score (freshness component of the quality metric degrades as features become stale). Predictions below the 0.82 confidence threshold route to your HITL-06 queue with the note: "Feature freshness below threshold — APAC-East schema quarantine active." You see the prediction, the SHAP attribution, and the staleness flag. The system never silently stops predicting — it degrades gracefully with visible flags.
Evidence: Asset IQ Feature Store (freshness in quality score) · HITL-06 interface (staleness flag)
CISO · S-09
Who has access to the quarantine dataset — it contains raw telemetry?
"ae_quarantine contains raw telemetry that failed validation — including original field values and device ID. Who has access, and for how long is data retained?"
Architectural response
The ae_quarantine dataset has four principals: the DG agent SA (write — inserts quarantine records), the data steward role (read — reviews violations), the DG admin SA (read-write — reinstatement processing), and the audit SA (read-only — 7-year audit access). No developer SA has access. Quarantine records are retained for 90 days after resolution (reinstatement or discard), then deleted by BigQuery table expiry. Unresolved records are retained indefinitely until a steward decision is made. VPC-SC applies to ae_quarantine as to all ae_ datasets.
Evidence: BigQuery IAM (ae_quarantine access policy) · 90-day quarantine retention · VPC-SC perimeter
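The retention rule in this response (90 days after resolution, indefinite while unresolved) reduces to a small date calculation. The sketch below is illustrative only; purge_date and is_purgeable are hypothetical names, and the actual deletion is described above as BigQuery table expiry.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION_AFTER_RESOLUTION = timedelta(days=90)


def purge_date(resolved_at: Optional[datetime]) -> Optional[datetime]:
    """Return when a quarantine record becomes deletable, or None if unresolved."""
    if resolved_at is None:
        return None  # unresolved records are retained until a steward decides
    return resolved_at + RETENTION_AFTER_RESOLUTION


def is_purgeable(resolved_at: Optional[datetime], now: datetime) -> bool:
    due = purge_date(resolved_at)
    return due is not None and now >= due


resolved = datetime(2026, 3, 15, 16, 4, tzinfo=timezone.utc)
print(purge_date(resolved))  # 2026-06-13 16:04:00+00:00
```

The retention clock starts at the steward decision, not at ingestion, so a record that sits unresolved for a long time still gets a full 90 days of post-resolution audit visibility.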
CFO · S-03
What is the cost of running Data Governance continuously on all three streams?
"Data Governance is always-on, processing three Pub/Sub topics continuously. What is the monthly infrastructure cost, and is there a cheaper architecture — batch validation, for example?"
Architectural response
Data Governance's Cloud Run cost is minimal because it is stateless and event-driven: it runs only when a Pub/Sub message arrives. At ClaraVis's volume (~15,000 messages/day), Cloud Run processing time per message is approximately 80ms. Total monthly Cloud Run compute: ~10 CPU-hours ≈ €0.14/month. BigQuery writes: approximately €2/month. Total infrastructure cost excluding Cloud DLP: approximately €2.20/month; the Cloud DLP cost addendum in the Canonical Schema section adds ~$162/month. Batch validation was not considered because schema violations need detection before the daily RUL batch job runs: a quarantine event discovered by a daily scan at 03:00 UTC means an entire day of potentially corrupt features is already in the Feature Store. Streaming validation at point of ingestion is both cheaper and faster than batch validation with quarantine rollback.
Evidence: Cloud Run event-driven pricing · ADR-DG02 · Asset IQ RUL batch (needs clean features before 02:00 UTC)
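The ~10 CPU-hours figure follows directly from the stated volume and per-message time. A quick check, assuming a 30-day month and one vCPU busy for the full 80 ms of each message (both assumptions, as is the function name):

```python
def monthly_cpu_hours(
    messages_per_day: int, seconds_per_message: float, days_per_month: int = 30
) -> float:
    """Total busy CPU time per month, in hours, at one vCPU per message."""
    busy_seconds = messages_per_day * days_per_month * seconds_per_message
    return busy_seconds / 3600


cpu_hours = monthly_cpu_hours(15_000, 0.080)
print(round(cpu_hours, 2))  # 10.0
```

The compute time scales linearly with message volume, so a tenfold increase in traffic would still amount to only about 100 CPU-hours per month under the same assumptions.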
Operational Dashboard

Platform health. Real-time. Six regions. One screen.

The Data Governance Ops Dashboard is the primary operational interface for the platform on-call engineer. It shows pipeline throughput per region, active quarantine queue depth, SLO burn rate, and DLQ status. All data is live-updated — every record processed, quarantined, or DLQ'd is visible within 5 seconds of the event.

Pipeline P99 Latency: 312ms · SLO target: <500ms · ✓ PASS
Quarantine Resolution SLO: 1h 42m · P2 target: <4h business hours · ✓ PASS
DLQ Messages (7d): 0 · Max delivery attempts: 5 · Retry budget: 60min
Feature Store Error Budget: 98.7% · 30d window · SLO: 99.5% · ⚠ 1.3% burned
AE Platform · Data Governance Ops · europe-west3 · ae-governance-ops · ● LIVE
Records today: 14,892 · ↑ 3.2% vs yesterday
Passed validation: 14,618 · 98.2% pass rate
Quarantined: 247 · 3 batches · 2 pending
DLP Holds: 27 · 1 pending redact review
P99 Latency: 312ms · SLO: <500ms · ✓
DLQ (7d): 0 · Retry budget: 5 attempts
Pipeline throughput · by region: 6 / 6 REGIONS
Quarantine queue: 2 PENDING
SLO burn rate · 30d · 2 SLOs
Pipeline P99 Latency <500ms: 312ms (gauge 0ms to 1000ms · SLO: 500ms)
Quarantine Resolution <4h: 1h 42m (gauge 0h to 8h · SLO: 4h)
Feature Store Error Budget (30d): 1.3% burned (budget: 0.5%)
⚠ Approaching budget limit: investigate APAC-East staleness impact
DLQ Failure Rate (7d): 0 messages (target: 0)
Interactive Demo

Three paths. Every record goes somewhere. Nothing is silently lost.

Select a scenario and run the simulation. Watch records arrive via Pub/Sub, pass through the TFX validation pipeline, and route to their outcome. The schema violation scenario shows the full quarantine-then-review path with a data steward alert — click Reinstate to watch the records flow through to the Feature Store.

Scenarios: Clean Record → Feature Store · Schema Violation → Quarantine · DLP Trigger → PII Hold
Each run publishes one record to ae-asset-events and steps it through the validation pipeline:
1. Pub/Sub message received
2. DLP API scan
3. TFX ExampleValidator · schema check
4. Quality scorer · completeness + freshness
5. Lineage tagger · event_id + schema_version
6. Feature Store write
Quarantine path: record written to ae_quarantine · ⚠ Data Steward Alert (P2) · steward reviews the quarantine queue and chooses "Approve Mapping + Reinstate" or "Discard + Tombstone"
Dead-Letter Queue (processing failure): message routed to ae-dg-dlq after max delivery attempts exceeded · P2 alert fired · on-call notified · retry budget: 5 attempts · backoff: 30s/2min/10min/30min/60min
Feature Store write: the new record shows Source Region (Pub/Sub origin) · Validation Result (TFX outcome) · Quality Score (0–1 · threshold: 0.85) · Lineage Tagged (EU AI Act Art. 11)
AE Platform Navigation
This is where the platform begins.

Data Governance is the H1 prerequisite. Deployment order: M-08 (H1 · PI-1) → M-06 GreenOps (H3 · PI-7) → M-07 Strategy Dashboard (H3 · PI-8). No suite can operate on the AE Platform until the 6-region canonical schema is validated and the Feature Store lineage pipeline is live.

M-06 · GreenOps Platform: Carbon-aware scheduling for all AE workloads · H3 · PI-7
M-07 · Strategy Dashboard: All suites · all signals · one screen · H3 · PI-8