The Autonomous Enterprise · AE Suite · Module 02

ContractGuard
Contract Intelligence
— Document AI · Gemini · HITL.

The most architecturally novel module in the AE Suite. ContractGuard reads entire contracts — not summaries, not extracted fields — using Gemini 1.5 Pro's 1M token context window. Every clause is classified, risk-scored, and explained before Legal sees it. The counter-position is drafted before the lawyer opens the document.

EU AI Act — High Risk · Annex III
Gemini 1.5 Pro · 1M token
Document AI · GCS
XGBoost + Embeddings
HITL-02 · HITL-03
Commercial
ART · H2 · PI-3
System Context — C4 Level 1

Document in. Scored clauses out. Legal HITL in between.

ContractGuard sits between the document store (where contracts arrive) and the Legal team (who reviews flagged clauses). It is the intelligence layer that makes the Legal team's time go further — they only see clauses that exceed the risk threshold, with full context and a draft counter-position already prepared.

Architecture — Two-Pass Processing

Document AI first. Gemini second. Each pass earns its cost.

The two-pass architecture is ADR-CG01 — the most important design decision on this page. Document AI provides fast, deterministic, structured clause extraction in Pass 1. Gemini 1.5 Pro provides full semantic reasoning over the entire document in Pass 2. Neither pass alone is sufficient. Together, they are cheaper and more accurate than Gemini alone.
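The division of labour can be sketched as a minimal pipeline — Pass 1 is structure-only and deterministic, Pass 2 is semantic scoring. The function bodies below are toy stand-ins for the real Document AI and Gemini/XGBoost calls, not the production implementation:

```python
from dataclasses import dataclass

@dataclass
class Clause:
    number: int
    text: str
    clause_type: str   # assigned in Pass 1 (Document AI)
    risk: float = 0.0  # assigned after Pass 2 (XGBoost risk model)

def pass_1_extract(document: str) -> list[Clause]:
    """Stand-in for Document AI: deterministic, structure-only extraction.
    Here we split on blank lines; the real parser returns typed, positioned
    clauses regardless of contract formatting."""
    blocks = [b.strip() for b in document.split("\n\n") if b.strip()]
    return [Clause(i + 1, b, "unclassified") for i, b in enumerate(blocks)]

def pass_2_flag(clauses: list[Clause], threshold: float = 0.30) -> list[Clause]:
    """Stand-in for Gemini semantic analysis + XGBoost scoring: only clauses
    whose risk exceeds the threshold reach the HITL-02 queue."""
    for c in clauses:
        # Toy heuristic in place of the real model: liability language scores high.
        c.risk = 0.82 if "liability" in c.text.lower() else 0.10
    return [c for c in clauses if c.risk > threshold]

doc = "Clause 1: Delivery terms.\n\nClause 14: Limitation of liability at 50%."
flagged = pass_2_flag(pass_1_extract(doc))
```

Because each pass has a single responsibility, each can be tested and monitored in isolation — the property the ADR names as the third reason for the split.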

Total Processing Latency — Per Contract · Typical ClaraVis MRI Purchase Agreement
Document AI parse · Pass 1 · ~40s
Gemini 1.5 Pro inference · Pass 2 · ~75s
Risk model + SHAP · ~8s
Vector Store lookup · ~4s
HITL checkpoint creation · ~6s
Total end-to-end (contract upload → HITL queue): ~133s typical · approximately 2–3 minutes per contract
Why this is acceptable: ContractGuard is asynchronous — the contract is uploaded and processed in the background. The Legal team is not waiting in front of a screen; the HITL-02 queue appears in their interface when processing completes. For a contract review that previously took 2–3 days, a 2–3 minute processing time is not a bottleneck — it is the elimination of one.

The 900K token limit covers 99%+ of ClaraVis's standard MRI purchase agreements. Contracts above this threshold route automatically to HITL-03 for manual Legal review, with the Document AI clause list as a starting point.
Agent State Machine

Two passes. Multiple HITL nodes. One coherent state machine.

The ContractGuard state machine reflects the two-pass architecture — Pass 1 and Pass 2 are distinct states with their own failure modes and timeout behaviours. HITL nodes can be triggered multiple times in a single contract run — once per flagged clause above the risk threshold, plus a separate HITL-03 if the governing law is non-standard.
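A minimal sketch of such a state machine, with illustrative state names — the real machine adds timeout, retry, and SLA-escalation semantics on top of this transition table:

```python
from enum import Enum, auto

class CGState(Enum):
    RECEIVED = auto()
    PASS_1_DOCAI = auto()
    PASS_2_GEMINI = auto()
    RISK_SCORING = auto()
    HITL_02 = auto()        # may repeat: once per flagged clause
    HITL_03 = auto()        # oversize contracts / non-standard governing law
    COUNTER_DRAFT = auto()
    COMPLETE = auto()
    ERROR = auto()

# Allowed transitions. Note HITL_02 -> HITL_02 (next flagged clause) and the
# Pass 1 -> HITL_03 branch for contracts over the token limit.
TRANSITIONS = {
    CGState.RECEIVED:      {CGState.PASS_1_DOCAI, CGState.ERROR},
    CGState.PASS_1_DOCAI:  {CGState.PASS_2_GEMINI, CGState.HITL_03, CGState.ERROR},
    CGState.PASS_2_GEMINI: {CGState.RISK_SCORING, CGState.ERROR},
    CGState.RISK_SCORING:  {CGState.HITL_02, CGState.COMPLETE, CGState.ERROR},
    CGState.HITL_02:       {CGState.HITL_02, CGState.COUNTER_DRAFT, CGState.HITL_03},
    CGState.COUNTER_DRAFT: {CGState.COMPLETE},
    CGState.HITL_03:       {CGState.COMPLETE},
}

def step(current: CGState, nxt: CGState) -> CGState:
    """Advance the machine, rejecting any transition not in the table."""
    if nxt not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt
```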

Data Flow & Sequence

One contract. Two passes. Three flagged clauses.

The same University Hospital München contract from the RevRec AI demo — uploaded to ContractGuard for risk analysis before signing. Document AI parses it in Pass 1. Gemini analyses the full document in Pass 2. Three clauses are flagged. The highest-risk clause routes to HITL-02. The Legal team approves with an amendment. A counter-proposal is generated.

HITL-02 Presentation

The most information-rich HITL interface in the suite. Everything Legal needs.

HITL-02 is more complex than RevRec AI's HITL-04 because the Legal reviewer needs more context: the clause text, the risk reasoning, three precedent contracts with their outcomes, and a draft counter-position already generated by Gemini. The goal is that by the time Legal opens the queue, the analysis work is done — they are making a judgment, not doing extraction.

HITL-02 Legal Review Queue — Clause: Liability Cap · risk score 0.82
ContractGuard — HITL-02 Review · Clause 14 of 47 · SLA: 20h 14m remaining · 2 more clauses queued
HIGH RISK · 0.82 · Liability Cap
Clause 14 — Limitation of Liability

"Notwithstanding anything to the contrary herein, ClaraVis's aggregate liability under or in connection with this Agreement, whether arising in contract, tort (including negligence), breach of statutory duty, or otherwise, shall not exceed fifty percent (50%) of the total Contract Value paid or payable in the twelve (12) months preceding the event giving rise to the claim."
liability_cap_ratio · +0.41 · val: 0.50 · ClaraVis standard: 1.0× contract value
governing_law_match · +0.24 · val: false · counterparty jurisdiction: Bavaria (DE) vs ClaraVis standard: Munich Courts
indemnification_asymmetry · +0.16 · val: 0.78 · counterparty-favourable indemnification obligations
CV2024-0108 · Hôpital Lariboisière Paris · similarity: 0.91
Liability cap: 50% → negotiated to 80% · outcome: ACCEPTED · approved by: A. Laurent (GC) · 2024-09-12
CV2025-0033 · UKE Hamburg · similarity: 0.87
Liability cap: 50% → accepted with carve-out for IP infringement · outcome: ACCEPTED_WITH_CARVE_OUT · 2025-03-08
CV2023-0271 · AMC Amsterdam · similarity: 0.74
Liability cap: 100% · no negotiation required · outcome: ACCEPTED · no precedent value for this case
Proposed amendment to Clause 14:

"ClaraVis's aggregate liability under or in connection with this Agreement shall not exceed one hundred percent (100%) of the total Contract Value paid or payable in the twelve (12) months preceding the event giving rise to the claim. Notwithstanding the foregoing, this limitation shall not apply to: (i) claims arising from ClaraVis's gross negligence or wilful misconduct; (ii) claims for personal injury or death; or (iii) claims arising from infringement of third-party intellectual property rights."

⚠ This is a Gemini-generated draft. Legal review and approval required before transmission to counterparty.
✓ Approve Counter-Position
✗ Request Revision
↗ External Counsel
⏱ SLA expires in 20h 14m · Timeout escalates to General Counsel's manager · Decision is immutably recorded
Contract ID · sfdc_contract_CV2026_0042
Counterparty · Universitätsklinikum München
Contract Value · €2,840,000
Clause position · Clause 14 of 47 · Section 8: Liability
This queue · 1 of 3 flagged clauses · Clause 22: Indemnity (0.71) · Clause 38: Termination (0.67)
Processing time · Pass 1: 42s · Pass 2: 73s · Risk + SHAP: 8s · Total: 2min 3s
SHAP record · Written to BigQuery ae_audit.shap_explanations before this queue was created
EU AI Act Art. 13 ✓ · SHAP attribution provided before HITL created. Counter-position is Gemini-generated — flagged as draft requiring human approval before use.
Counter-proposal status · DRAFT — Gemini generated · NOT sent to counterparty · Requires Legal approval
Architecture Decision Records

Three ContractGuard decisions. Every alternative documented.

ADR-010 governs the risk model. ADR-CG01 and ADR-CG02 are ContractGuard-specific — the two-pass architecture and the Vector Store choice. These are the decisions that distinguish ContractGuard from a naive "send the contract to Gemini and ask if it's risky" implementation.

ADR-CG01 — ContractGuard specific
Two-pass architecture (Document AI then Gemini) over single-pass Gemini-only analysis
The simplest design is to send the entire contract to Gemini with a system prompt asking for risk analysis. This was the initial prototype approach. Rejected for production because of three problems: (1) Gemini's output on a full contract risk analysis is an unstructured text response — converting it to a machine-readable clause-by-clause risk assessment reliably requires additional prompt engineering that is brittle and breaks on unusual contract structures. Document AI provides a deterministic, structured clause list that is the same regardless of contract formatting. (2) Gemini-only removes the ability to run a fast, cheap, deterministic classification on standard clauses — approximately 80% of ClaraVis's contract clauses are standard and score below 0.30 on the risk model. Document AI extracts these in 40 seconds for ~€0.08 per contract. Gemini at €0.35 per 1K tokens on a 100-page contract is substantially more expensive per clause when used for extraction rather than reasoning. (3) The two-pass architecture separates concerns: Document AI handles structured extraction (what it is optimised for), Gemini handles semantic reasoning (what it is optimised for). Each pass is testable, monitorable, and independently improvable.
Accepted · Phase Agent Design · ContractGuard module
ADR-010 (restated for ContractGuard)
XGBoost on structured features + Gemini embeddings over fine-tuned LLM for risk scoring
Fine-tuning a legal domain LLM (LegalBERT, SaulLM) for clause risk classification was evaluated. Rejected because: (1) fine-tuned LLMs do not produce deterministic SHAP attributions — the same EU AI Act Art. 13 argument from ADR-010 on Page 06 applies here. The General Counsel must be able to see exactly which structured contract features drove a risk score. "The model attended to certain tokens" is not an acceptable explanation in a legal review context. (2) The ContractGuard training dataset (12,400 labelled clauses) is not large enough to fine-tune an LLM without significant overfitting risk. XGBoost on structured features + Gemini embeddings achieves 0.95 recall at 4,800-contract scale with far lower overfitting risk. (3) The hybrid approach is the best of both worlds: Gemini's semantic understanding is captured in the text-embedding-004 vector, and XGBoost's structured feature interpretability is preserved in the TreeExplainer SHAP values.
Accepted · Phase ML Design · Page 06 ADR-010
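The hybrid input described in point (3) is, mechanically, a concatenation: structured features first, in a fixed order so TreeExplainer attributions map back to named features, then the clause embedding. A minimal sketch — the feature names mirror the HITL-02 mockup, everything else is illustrative:

```python
# Fixed feature ordering: SHAP values are positional, so this list is the
# contract between the model and the HITL-02 attribution display.
FEATURE_NAMES = [
    "liability_cap_ratio",
    "governing_law_match",
    "indemnification_asymmetry",
]

def build_feature_vector(structured: dict, embedding: list[float]) -> list[float]:
    """Concatenate structured clause features with the clause embedding
    (text-embedding-004 in production, 768 dims; truncated here). The first
    len(FEATURE_NAMES) positions stay interpretable for SHAP."""
    ordered = [
        structured["liability_cap_ratio"],
        1.0 if structured["governing_law_match"] else 0.0,
        structured["indemnification_asymmetry"],
    ]
    return ordered + list(embedding)

vec = build_feature_vector(
    {"liability_cap_ratio": 0.50,
     "governing_law_match": False,
     "indemnification_asymmetry": 0.78},
    embedding=[0.1, -0.2, 0.05, 0.3],  # toy 4-dim stand-in for the real 768
)
```

In production this vector feeds the XGBoost classifier; TreeExplainer then attributes the score across the named positions, which is what produces the +0.41 / +0.24 / +0.16 display shown in the mockup.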
ADR-CG02 — ContractGuard specific
Vertex AI Vector Store over BigQuery vector search for precedent lookup
BigQuery's VECTOR_SEARCH function was evaluated as an alternative to Vertex AI Vector Store for precedent contract lookup. The BigQuery approach has the advantage of keeping all data in one system — vectors stored alongside the contract metadata in BigQuery. Rejected for two reasons: (1) BigQuery VECTOR_SEARCH operates on the full dataset on each call — there is no approximate nearest-neighbour index for small-to-medium corpora. Vertex AI Vector Store uses an ANN index that returns top-k results in under 100ms for a corpus of 50,000 contracts. At ClaraVis's contract volume (growing towards 10,000 contracts per year), BigQuery VECTOR_SEARCH latency degrades as the corpus grows. (2) Vertex AI Vector Store has first-class integration with Vertex AI Feature Store and the Gemini embedding API — the embedding pipeline, the index, and the similarity query are all managed in one place. The operational overhead of maintaining a BigQuery vector search pipeline separately from the Vertex AI ML infrastructure is not justified by the cost saving.
Accepted · Phase Agent Design · ContractGuard module
Stakeholder Rebuttals

Six objections. Each with an architectural answer.

ContractGuard generates more stakeholder questions than any other module — because it operates in a domain where the stakes of being wrong (a bad contract term missed) are high and visible, and where the technology (LLM-based analysis) is the most unfamiliar to the people who matter most.

CTO · S-01
Why not just use Gemini for everything — including the risk score?
"You're already using Gemini for the semantic analysis. Why add a separate XGBoost risk model and a separate SHAP layer? Couldn't Gemini just output a risk score directly?"
Architectural response
Gemini can output a risk score. The problem is that Gemini's risk score is not auditable in the way EU AI Act Article 13 requires. If the General Counsel approves a contract clause based on a risk assessment, and an auditor later asks "what drove that risk score?", the answer cannot be "the model paid attention to certain tokens." The answer must be: "liability_cap_ratio was 0.50 versus the standard 1.0, governing_law_match was false, and indemnification_asymmetry was 0.78 — these three structured features produced a SHAP attribution of +0.41, +0.24, and +0.16 respectively." That answer requires a structured model with deterministic SHAP values. Gemini is used for the semantic reasoning that a structured model cannot do. XGBoost is used for the explainable scoring that a black-box model cannot provide. The two-pass architecture is not redundancy — it is a separation of concerns between what each system does well.
Evidence: ADR-CG01 (two-pass rationale) · ADR-010 (XGBoost over LLM for SHAP) · Page 06 SHAP faithfulness validation · EU AI Act Art. 13
CCO · S-02
What if Gemini hallucinates a clause risk assessment?
"LLMs hallucinate. If Gemini incorrectly identifies a clause as high-risk and the General Counsel rejects a good contract based on that, or misses a genuine risk, the liability is ours. How do you prevent this?"
Architectural response
Gemini does not produce the risk score — the XGBoost model does. Gemini's role in Pass 2 is semantic clause classification (what type of clause is this, what are the key terms) and counter-position drafting. The risk score that determines whether a clause goes to HITL-02 is produced by a deterministic XGBoost classifier on structured features — it cannot hallucinate because it does not generate text. The hallucination risk is in the counter-position draft, which is explicitly labelled "Gemini-generated draft — not sent to counterparty — requires Legal approval before use." The HITL-02 interface makes this warning unavoidable. Additionally, the SHAP faithfulness validation in the Vertex AI Pipeline (Page 06) ensures that the risk model's SHAP values are faithful to its actual predictions before promotion — unfaithful explanations cannot reach production.
Evidence: ADR-CG01 (Gemini for reasoning, XGBoost for scoring) · HITL-02 UI mockup (counter-proposal warning label) · Page 06 SHAP faithfulness gate · ADR-010
General Counsel
What is the legal standing of the Gemini-generated counter-position?
"ContractGuard generates a draft counter-position. If that draft contains an error — a clause that inadvertently weakens our position — and it gets sent to the counterparty before Legal reviews it, who is liable? The system? Siddharthan? Me?"
Architectural response
The counter-position draft cannot be sent to the counterparty without a committed HITL-02 approval record on that specific clause. The architecture enforces this in the same pattern as RevRec AI's SAP write guard: the generate_and_send_counter() tool requires a hitl_id parameter that corresponds to a HITL-02 approval decision of REQUEST_REVISION or APPROVE_COUNTER — not a pending or null state. The Firestore HITL record must exist and must have a human decision recorded before the counter-position is released from draft status. The HITL-02 UI labels the draft explicitly as "requires Legal approval before transmission" in a warning that is visually unmissable. The General Counsel's approval is not a rubber stamp — it is a required gate that is physically enforced by the architecture, not assumed by convention.
Evidence: HITL-02 UI mockup (warning label · approval gate) · Page 05 ContractGuard tool manifest (generate_counter requires approved HITL record) · ADR-R01 pattern (write guard as function signature)
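The write-guard pattern — approval as a required function parameter rather than a convention — can be sketched as follows. The function and decision names follow this page; the in-memory store lookup is an illustrative stand-in for the Firestore read:

```python
class HITLApprovalRequired(Exception):
    """Raised when a counter-position release lacks a committed approval."""

# Decisions that release a draft; PENDING / null states do not qualify.
APPROVED_DECISIONS = {"APPROVE_COUNTER", "REQUEST_REVISION"}

def generate_and_send_counter(clause_id: str, hitl_id: str, hitl_store: dict) -> str:
    """hitl_id is a required parameter — there is no code path that releases
    a counter-position without naming a HITL record, mirroring the RevRec AI
    SAP write guard. hitl_store stands in for the Firestore HITL collection."""
    record = hitl_store.get(hitl_id)
    if record is None:
        raise HITLApprovalRequired(f"no HITL record {hitl_id}")
    if record.get("decision") not in APPROVED_DECISIONS:
        raise HITLApprovalRequired(
            f"HITL {hitl_id} decision is {record.get('decision')!r}, not an approval")
    return f"counter-position for {clause_id} released under {hitl_id}"
```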
Enterprise Architect · S-08
What happens to contracts above 900K tokens?
"You've set a 900K token limit. ClaraVis has multi-year framework agreements and master service agreements that run to hundreds of pages. What happens when ContractGuard encounters a contract it can't process?"
Architectural response
Contracts above 900K tokens — approximately 675 pages at standard legal formatting — are extremely rare in ClaraVis's commercial portfolio. MRI purchase agreements typically run 40–80 pages (50–100K tokens). The 900K limit covers 99%+ of ClaraVis's standard contract types. For the rare exception: the agent detects the token count during Pass 1 before attempting the Gemini call. It routes the contract to HITL-03 with the Document AI clause list as a starting point and a note that full Gemini analysis was not performed due to size. The Legal team receives the structured clause extraction from Pass 1 (which has no token limit — it operates on the document structure, not the token count) and performs the semantic review manually. This is the correct fallback: partial automation with a clear human handoff, not a silent failure or a degraded result passed to Legal without a warning.
Evidence: Page 05 ContractGuard agent spec (900K token threshold · HITL-03 fallback) · two-pass architecture (Document AI has no token limit · Gemini limit is Pass 2 only) · ERROR state machine branch
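The routing decision is a single deterministic check taken after Pass 1, before any Gemini call — a minimal sketch, with an illustrative return convention:

```python
TOKEN_LIMIT = 900_000  # covers 99%+ of ClaraVis's standard contract types

def route_after_pass_1(token_count: int) -> str:
    """Decide the next state once Document AI has parsed the contract.
    Oversize contracts skip Pass 2 entirely: Legal receives the structured
    clause list plus a note that full Gemini analysis was not performed."""
    if token_count > TOKEN_LIMIT:
        return "HITL_03_MANUAL_REVIEW"
    return "PASS_2_GEMINI"
```

A typical MRI purchase agreement (50–100K tokens) passes comfortably; only the rare multi-year framework agreement trips the fallback.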
CISO · S-09
Contract documents contain sensitive commercial terms — where do they live?
"These are live customer contracts with pricing, liability terms, and commercial commitments. Where exactly does the contract PDF live in your architecture, and who can access it?"
Architectural response
The contract PDF is stored in a dedicated GCS bucket (claravis-contracts-eu) in europe-west3. The bucket has CMEK encryption using ClaraVis's own Cloud KMS key — Google cannot access the contents. DLP scanning runs automatically on every object creation to detect and flag PII before the file is passed to Document AI or Gemini. The bucket's IAM policy has three principals with read access: the ContractGuard service account (cg-sa@), the Cloud DLP service account, and a Legal team read-only role for direct access. No other principal — including developers and the Orchestrator SA — has bucket access. The VPC-SC perimeter prevents any data from leaving europe-west3. The contract content is never written to BigQuery or Firestore — only the structured analysis output (clause types, risk scores, SHAP values, HITL decisions) is persisted in those stores. The raw contract PDF stays in GCS and is deleted after a configurable retention period.
Evidence: Page 07 GCS bucket IAM · CMEK configuration · DLP scan · VPC-SC perimeter · ContractGuard agent tool manifest (gcs.read_document — read-only)
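The "structured output only" rule can be enforced with an explicit allowlist at the persistence boundary, so raw contract text can never reach BigQuery or Firestore by accident — a sketch with an illustrative field list:

```python
# Only structured analysis output may leave the GCS perimeter.
# Field names are illustrative; the raw clause text is deliberately absent.
PERSISTABLE_FIELDS = {"clause_type", "risk_score", "shap_values", "hitl_decision"}

def analysis_record(clause: dict) -> dict:
    """Project a clause analysis down to the persistable fields before the
    BigQuery/Firestore write. Anything not allowlisted — including the raw
    contract text — is dropped, not redacted after the fact."""
    return {k: v for k, v in clause.items() if k in PERSISTABLE_FIELDS}
```

An allowlist fails closed: a new sensitive field added upstream stays out of the audit stores until someone deliberately adds it to the list.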
CFO · S-03
2–3 minutes per contract — what does this cost at scale?
"If ContractGuard processes 200 contracts per month, at 2–3 minutes per contract plus Gemini token costs on a 100-page document, what is the monthly infrastructure cost? Is this better than having a paralegal do it?"
Architectural response
At 200 contracts per month, approximate monthly cost: Document AI at €0.08 per contract × 200 = €16. Gemini 1.5 Pro at approximately €0.35 per 1K tokens for a 75K-token contract = €26.25 per contract × 200 = €5,250. XGBoost inference + SHAP: negligible. Cloud Run processing: approximately €40. Vector Store lookups: approximately €10. Total: approximately €5,316 per month. A paralegal doing equivalent clause-level analysis across 200 contracts at 3 hours per contract × €35 per hour = €21,000 per month — not counting the variability in quality, the lack of SHAP audit trail, and the absence of precedent lookup consistency. The architecture is approximately 4× cheaper at this volume, produces a consistent auditable output, and frees the Legal team to focus on negotiation judgment rather than extraction work. The Gemini token cost is the dominant variable — it scales linearly with contract volume and is the number to monitor in the FinOps dashboard.
Evidence: Latency breakdown panel above · Page 07 FinOps cost allocation (module: contractguard tag) · Page 04 FRD-02 (Legal team time saved) · GreenOps (batch Gemini calls deferrable to low-carbon window)
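The arithmetic above, reproduced as a small cost model — the rates are the figures quoted on this page, not official list prices:

```python
def monthly_cost_eur(contracts: int,
                     tokens_per_contract: int = 75_000,
                     docai_per_contract: float = 0.08,
                     gemini_per_1k_tokens: float = 0.35,
                     cloud_run: float = 40.0,
                     vector_store: float = 10.0) -> float:
    """Approximate monthly ContractGuard cost. Gemini tokens are the
    dominant, linearly-scaling term; the fixed terms are small."""
    gemini = contracts * (tokens_per_contract / 1_000) * gemini_per_1k_tokens
    docai = contracts * docai_per_contract
    return round(gemini + docai + cloud_run + vector_store, 2)
```

At 200 contracts/month this reproduces the ~€5,316 figure; doubling volume roughly doubles the Gemini term while the Cloud Run and Vector Store terms stay near-constant, which is why the FinOps dashboard watches token spend rather than total spend.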
Demo Pathway

Three minutes. One contract. Two passes. One HITL moment.

The same München contract from the RevRec AI demo — showing the continuity between modules. Upload the PDF, watch both passes execute, open the HITL-02 Legal queue with the flagged liability cap clause, show the counter-position draft, approve with amendment.

00
Setup · 30 seconds before
Have the München contract PDF ready — show it's a real document
Open the contract PDF (Universitätsklinikum München MRI-7T purchase agreement). Scroll through it briefly — show it's a real 47-clause document, approximately 60 pages. Point out Clause 14 (liability cap at 50%). This establishes that ContractGuard is processing a realistic contract, not a toy example. Navigate to the GCS contract store bucket in the GCP Console — show it's empty before upload.
GCS contract bucket Contract PDF
01
Upload · 0:00
Upload the contract PDF to GCS — trigger the pipeline
Upload the München contract PDF to the GCS contract store bucket. Show the Pub/Sub monitoring console — the gcs-object-trigger message appears within 2 seconds. Show the Cloud Run logs beginning: "ContractGuard agent triggered — contract_id: sfdc_contract_CV2026_0042".
"The moment the contract lands in GCS, ContractGuard starts. No one needs to trigger it manually, no one needs to route it — the upload event is the trigger. Watch the logs."
GCS upload Pub/Sub trigger Cloud Run agent start
02
Pass 1 · 0:10
Watch Document AI parse the contract in real time
Show the Cloud Run logs: "Pass 1: Document AI processing…", "47 clauses extracted · typed · positioned". Show the Document AI output — a JSON structure with each clause's text, position, and type. This takes approximately 40 seconds. Point out that this is deterministic — the same contract always produces the same clause list.
"Pass 1 is Document AI — fast, structured, deterministic. 47 clauses in 40 seconds. This is the foundation that Pass 2 operates on. Without this, Gemini would have to extract structure from unstructured text — which is slower, more expensive, and less reliable."
Document AI Form Parser Structured clause JSON
03
Pass 2 · 0:55
Watch Gemini analyse the full contract — then the risk model score each clause
Show the Cloud Run logs continuing: "Pass 2: Gemini 1.5 Pro analysis — 68,420 tokens…", "Semantic analysis complete — 3 clauses flagged", "XGBoost scoring clause 14: risk 0.82", "SHAP computed — top 3 features: liability_cap_ratio +0.41, governing_law_match +0.24, indemnification_asymmetry +0.16". This is approximately 75 seconds for Gemini plus 8 seconds for the risk model and SHAP. Show the latency breakdown panel.
"Pass 2 is Gemini doing what it's best at — reading the whole document and understanding what each clause actually means, not just what type it is. The risk model then scores each flagged clause using structured features. Notice SHAP is computed and written to BigQuery before the HITL queue is created — the explanation is in the audit trail before Legal ever opens the queue."
Gemini 1.5 Pro · 68K tokens XGBoost risk model SHAP TreeExplainer BigQuery shap_explanations
04
HITL-02 · 2:10 — the moment
Open the Legal team's HITL-02 queue — the full interface
Navigate to the HITL Approval UI. Open the Legal team's queue — three items (3 flagged clauses). Open the highest-risk: Clause 14, liability cap, risk 0.82. Show the full interface: clause text, SHAP attribution bars (liability_cap_ratio is the dominant feature), three precedent contracts with their outcomes, and the Gemini-generated counter-position draft clearly labelled as requiring approval before sending.
"This is what HITL-02 looks like. The General Counsel opens the queue and immediately sees: what the clause says, why the model flagged it (in structured feature language — not 'the model thinks this is risky'), what happened in three similar contracts in the past, and a draft counter-position already prepared. The analysis work is done. Their job is judgment — do we accept, request a revision, or escalate to external counsel? Click Request Revision, enter the reason code."
HITL Approval UI Firestore HITL state Vector Store precedents Gemini counter-position draft
05
Counter-Proposal · 2:50
Show the approved counter-position and the Salesforce write-back
After Legal approves the counter-position (with the amendment from 50% to 80% liability cap), show the Cloud Run logs: "HITL-02 decision received: REQUEST_REVISION", "Generating counter-proposal with approved amendments…", "Updating Salesforce Contract object — risk_summary + HITL ref + counter_proposal_url". Navigate to Salesforce — show the Contract object now has a risk_score field (0.82), an Activity log entry with the ContractGuard analysis reference, and a link to the counter-proposal document.
"The counter-proposal was only generated after the Legal approval was committed. You cannot call generate_and_send_counter() without a HITL record. And the Salesforce Contract object now has a full audit trail — risk score, HITL reference, and the link to the counter-proposal that's ready to send to the counterparty."
Firestore immutable HITL write Gemini counter-proposal Salesforce Contract update
06
Optional · 3:00
Query the audit trail — show the full evidence package
Open BigQuery. Run a join query on shap_explanations and hitl_events for contract CV2026-0042. Show: the three SHAP attributions for Clause 14, the Legal team decision, the approver identity, and the timestamp. This is the EU AI Act Article 13 + 14 evidence package for this contract's risk analysis. Same pattern as the RevRec AI demo — shows consistency across the suite.
"Same audit trail pattern as RevRec AI. Every module in this suite writes its SHAP explanations and HITL decisions to the same BigQuery audit dataset. One query covers the compliance evidence for any module, any decision, any point in time."
BigQuery ae_audit shap_explanations · hitl_events join
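A sketch of the audit join — the dataset and table names follow this page's ae_audit convention, but the column names are assumptions for illustration:

```python
# Illustrative EU AI Act Art. 13 + 14 evidence query. Column names
# (clause_id, feature_name, shap_value, decision, approver, decided_at)
# are assumed for this sketch, not taken from the production schema.
AUDIT_QUERY = """
SELECT s.clause_id, s.feature_name, s.shap_value,
       h.decision, h.approver, h.decided_at
FROM `ae_audit.shap_explanations` AS s
JOIN `ae_audit.hitl_events` AS h
  ON s.contract_id = h.contract_id AND s.clause_id = h.clause_id
WHERE s.contract_id = @contract_id
ORDER BY ABS(s.shap_value) DESC
"""
```

Run with `@contract_id = 'sfdc_contract_CV2026_0042'`, this returns the three SHAP attributions for Clause 14 alongside the Legal decision, approver identity, and timestamp — one query, one evidence package.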
AE Suite Navigation
Two modules complete.
Six remain.

ContractGuard and RevRec AI share the same München contract across their demos — showing that the AE modules form a coherent workflow, not a collection of independent tools. The next module pages follow the same seven-section template.

PG 09
← AE Suite Index
All 8 modules · dependency matrix · demo pathway index
M-03
RevRec AI — Module 03
Same München contract · revenue recognition · SAP write