This repository provides a reference implementation and an ADK agent covering Workstream #2 of project Cogent.
Status: Production-Ready Reference Implementation
Framework: Google Agent Development Kit (ADK)
Pattern: Reasoning over Extracted Data (Hybrid Search)
This project demonstrates a complete Hybrid Search solution that combines:
- Structured Data (BigQuery) - Vendor spend records from ERP systems
- Unstructured Data (Vertex AI Search) - Contract PDFs and legal documents
- AI Reasoning (ADK Agent) - Intelligent correlation and risk detection
This repository is a reference implementation for Hybrid Search Reasoning. It is specifically designed for scenarios where transactional data (SQL) must be reconciled against unstructured ground truth (PDFs).
- Source of Truth Reconciliation: Identifying discrepancies between a database (e.g., ERP renewal dates) and legal contracts (e.g., termination clauses).
- Procurement & Compliance Audits: Automating risk detection in high-value vendor relationships.
- Heterogeneous Document Analysis: Extracting insights from diverse legal papers that lack a standard template.
Unlike basic RAG, this agent doesn't just "find" information; it reasons across systems to detect data "traps" that traditional automation would miss.
Read the full 'When to Use' guide (docs/WHEN_TO_USE.md) for deep-dive discovery questions and strategic selling points.
┌────────────────────────────────────────────────────────┐
│                Vendor Compliance Agent                 │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Orchestration: Correlates data across sources      │ │
│ │ Tools:                                             │ │
│ │   1. get_high_value_vendors (BigQuery)             │ │
│ │   2. check_contract_compliance (Vertex AI Search)  │ │
│ │   3. check_contract_expiration (Cross-reference)   │ │
│ └────────────────────────────────────────────────────┘ │
└───────────────┬───────────────────────────┬────────────┘
                │                           │
    ┌───────────▼──────────┐    ┌───────────▼──────────┐
    │       BigQuery       │    │   Vertex AI Search   │
    │       Dataset        │    │      Datastore       │
    ├──────────────────────┤    ├──────────────────────┤
    │ 50 Vendor Records    │    │ 20 PDF Contracts     │
    │ - Vendor ID          │    │ - Indemnification    │
    │ - Annual Spend       │    │ - Warranties         │
    │ - DB Renewal Date    │    │ - Termination Dates  │
    │ - Status             │    │ - Legal Clauses      │
    └──────────────────────┘    └──────────────────────┘
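The three tools in the diagram can be pictured as plain Python functions that the agent calls in sequence. The sketch below is not the code in app/tools.py: it replaces BigQuery and Vertex AI Search with hypothetical in-memory dictionaries, assumes the termination date has already been extracted from the contract, and uses made-up signatures and return shapes. It covers the structured lookup, the cross-reference step, and the orchestration loop; the contract search step is reduced to a dictionary lookup.

```python
from __future__ import annotations

from datetime import date

# Hypothetical stand-in for the BigQuery vendor table (the real tool runs SQL).
VENDORS = [
    {"vendor_id": 99, "vendor_name": "Apex Logistics", "annual_spend": 200_000_000,
     "renewal_date": date(2027, 1, 1), "status": "Active"},
    {"vendor_id": 1, "vendor_name": "Hypothetical Vendor", "annual_spend": 5_000_000,
     "renewal_date": date(2026, 6, 30), "status": "Active"},
]

# Hypothetical stand-in for termination dates extracted from contract PDFs
# (the real tool retrieves clause text from Vertex AI Search).
CONTRACT_TERMINATION = {99: date(2024, 12, 31)}


def get_high_value_vendors(min_spend: float) -> list[dict]:
    """Structured side: vendors whose annual spend exceeds min_spend."""
    return [v for v in VENDORS if v["annual_spend"] > min_spend]


def check_contract_expiration(vendor: dict, today: date) -> dict:
    """Cross-reference the DB renewal date with the contract termination date."""
    terminates = CONTRACT_TERMINATION.get(vendor["vendor_id"])
    if terminates is None:
        finding = "NO_CONTRACT_FOUND"
    elif terminates < today < vendor["renewal_date"]:
        finding = "CRITICAL"  # DB promises a future renewal, contract already ended
    elif terminates != vendor["renewal_date"]:
        finding = "MISMATCH"
    else:
        finding = "OK"
    return {"vendor": vendor["vendor_name"], "finding": finding}


def run_audit(min_spend: float, today: date) -> list[dict]:
    """Orchestration: fetch high-value vendors, then cross-check each one."""
    return [check_contract_expiration(v, today) for v in get_high_value_vendors(min_spend)]


for result in run_audit(100_000_000, date(2025, 12, 16)):
    print(result["vendor"], result["finding"])  # Apex Logistics CRITICAL
```

The reference agent relies on LLM reasoning for the correlation step, so treat this hard-coded comparison as an illustration of the logic (or a test oracle), not a replacement for the agent.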
The mock data includes a planted scenario:
- Vendor: Apex Logistics (ID 99)
- Database Shows: Active, $200M spend, Renewal Date: 2027-01-01
- Contract PDF Shows: "This agreement shall terminate automatically on December 31, 2024"
- Result: Agent detects the discrepancy and raises a CRITICAL ALERT
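The planted trap boils down to one date comparison: the contract's termination date is already in the past while the database renewal date is still in the future. A minimal sketch of that check follows, assuming the clause uses the exact wording of the Apex Logistics contract. The regex and function names are hypothetical, and a fixed pattern like this would not cope with the heterogeneous contracts the agent is meant to reason over.

```python
from __future__ import annotations

import re
from datetime import date, datetime

# Matches the planted wording only; real contracts phrase termination many ways.
TERMINATION_PATTERN = re.compile(
    r"terminate\s+automatically\s+on\s+([A-Z][a-z]+ \d{1,2}, \d{4})"
)


def extract_termination_date(clause_text: str) -> date | None:
    """Return the termination date stated in the clause, or None if none is found."""
    match = TERMINATION_PATTERN.search(clause_text)
    if match is None:
        return None
    # %B expects English month names (default C/English locale).
    return datetime.strptime(match.group(1), "%B %d, %Y").date()


def is_expiration_trap(db_renewal: date, clause_text: str, today: date) -> bool:
    """True when the contract has already ended but the DB renewal date is still ahead."""
    terminates = extract_termination_date(clause_text)
    return terminates is not None and terminates < today < db_renewal


clause = "This agreement shall terminate automatically on December 31, 2024"
print(extract_termination_date(clause))  # 2024-12-31
print(is_expiration_trap(date(2027, 1, 1), clause, date(2025, 12, 16)))  # True
```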
This demonstrates real-world scenarios where:
- Legacy systems have outdated data
- Manual entry errors occur
- Multi-system synchronization fails
- Fraud or unauthorized spending happens
Prerequisites: a Google Cloud project with billing enabled, the gcloud CLI authenticated (gcloud auth login), and Python 3.10+.
# 1. Configure your Google Cloud project
gcloud config set project <your-project-id>
gcloud auth application-default set-quota-project <your-project-id>
# 2. Install dependencies
make install
# 3. Setup infrastructure (~5-10 mins)
make infra
# 4. Launch the agent playground
make playground
Use the "Golden Queries" from PROMPTS.md to trigger the Apex Trap.
Use the end-to-end workflow below to demonstrate the complete data journey from Dynamics 365 → GCP → Agent.
# 1. Configure your Google Cloud project
gcloud config set project <your-project-id>
gcloud auth application-default set-quota-project <your-project-id>
# 2. Configure Dynamics 365 credentials
cd infra
cp .env.example .env
# Edit .env with your D365 credentials
cd ..
# 3. Run end-to-end demo (handles all dependencies automatically)
make demo-e2e
# 4. Launch the agent playground
make playground
The demo-e2e target will:
- Install all project dependencies (including DVC)
- Pull contract PDFs and CSV from DVC remote storage
- Upload data to Dynamics 365 CRM
- PAUSE for you to demo the D365 UI (press Enter to continue)
- Download data back from Dynamics 365
- Create BigQuery dataset and load vendor data
- Create Vertex AI Search datastore and index contracts
That's it! The agent will analyze vendors and detect the contract expiration trap.
Note: Both workflows automatically detect your project from gcloud config. You can also set the PROJECT_ID environment variable to override it.
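The note above describes a simple precedence rule: an explicit PROJECT_ID wins, otherwise the active gcloud configuration is used. The Makefile's actual detection logic is not shown here; this is a hypothetical Python sketch of the same rule, assuming the gcloud CLI is on PATH when no override is set.

```python
from __future__ import annotations

import os
import subprocess


def resolve_project_id() -> str | None:
    """Return the project ID: PROJECT_ID env var first, then the active gcloud config."""
    override = os.environ.get("PROJECT_ID")
    if override:
        return override
    try:
        result = subprocess.run(
            ["gcloud", "config", "get-value", "project"],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None  # gcloud is not installed or the command failed
    # Treat empty output as "no project configured".
    return result.stdout.strip() or None
```

For example, running with PROJECT_ID=my-project set makes resolve_project_id() return "my-project" without calling gcloud at all.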
# Verbose output showing all tool calls
uv run python tests/test_agent_verbose.py
# Full integration test suite
pytest tests/integration/test_agent.py -v
This project uses DVC (Data Version Control) to manage datasets stored in Google Cloud Storage.
The demo-e2e target demonstrates a complete data journey:
DVC (GCS) → Local Files → Dynamics 365 → Local Files → GCP (BigQuery + VAIS)
Step-by-step flow:
1. DVC Pull: Downloads production data from GCS remote storage
   - infra/data/contracts_to_upload/ - Contract PDFs ready for D365
   - infra/data/structured_to_upload/vendor_spend.csv - Vendor records
2. D365 Upload: Uploads data to Dynamics 365 CRM
   - Creates Account records with vendor metadata
   - Attaches contract PDFs as annotations
   - Creates Invoice records for spend tracking
   - PAUSES for demo of D365 UI
3. D365 Download: Extracts data back from Dynamics 365
   - Downloads contract PDFs to infra/data/contracts/
   - Generates CSV at infra/data/structured/vendor_spend.csv
   - Preserves original vendor IDs for consistency
4. GCP Setup: Creates cloud infrastructure
   - Uploads PDFs to GCS bucket
   - Loads CSV into BigQuery table
   - Indexes contracts in Vertex AI Search
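The key property of the download step is that the regenerated vendor_spend.csv keeps the original vendor IDs rather than the identifiers Dynamics 365 assigns. A sketch of that idea with Python's csv module is below. The account field names (original_vendor_id, accountid, and so on) and the CSV column order are assumptions for illustration, not the schema used by infra/scripts/d365_dump.py.

```python
from __future__ import annotations

import csv
import io

# Hypothetical CSV columns; the real vendor_spend.csv schema may differ.
FIELDS = ["vendor_id", "vendor_name", "annual_spend", "renewal_date", "status"]


def accounts_to_csv(accounts: list[dict], out: io.TextIOBase) -> None:
    """Write one CSV row per account, keyed by the original vendor ID.

    Dynamics 365 gives every Account its own GUID (accountid). This sketch
    assumes the original ID was stored in a separate field on upload and
    writes that value instead, so joins on vendor ID stay consistent.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for acct in sorted(accounts, key=lambda a: int(a["original_vendor_id"])):
        writer.writerow({
            "vendor_id": acct["original_vendor_id"],
            "vendor_name": acct["name"],
            "annual_spend": acct["annual_spend"],
            "renewal_date": acct["renewal_date"],
            "status": acct["status"],
        })


# Hypothetical records as they might come back from D365 (GUIDs are placeholders).
SAMPLE_ACCOUNTS = [
    {"accountid": "00000000-0000-0000-0000-000000000099", "original_vendor_id": "99",
     "name": "Apex Logistics", "annual_spend": 200000000,
     "renewal_date": "2027-01-01", "status": "Active"},
    {"accountid": "00000000-0000-0000-0000-000000000001", "original_vendor_id": "1",
     "name": "Hypothetical Vendor", "annual_spend": 5000000,
     "renewal_date": "2026-06-30", "status": "Active"},
]

buffer = io.StringIO()
accounts_to_csv(SAMPLE_ACCOUNTS, buffer)
print(buffer.getvalue())
```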
To manually pull data:
# Pull all tracked data
dvc pull
# Pull specific files
dvc pull infra/data/contracts_to_upload.dvc
dvc pull infra/data/structured_to_upload/vendor_spend.csv.dvc
- Remote Storage: gs://project-cogent-2-dvc/dvc-store
- Tracked Paths:
  - infra/data/contracts_to_upload/ - Original contract PDFs
  - infra/data/structured_to_upload/vendor_spend.csv - Original vendor data
Note: You need access to the GCS bucket project-cogent-2-dvc. Ensure you're authenticated with gcloud auth login and have read access to that bucket.
Example query:
"Analyze all vendors with annual spend over $100 million. For each vendor, check their contract compliance and verify that contract termination dates match our database renewal dates. Flag any discrepancies."
Expected output:
⚠️ CRITICAL ALERT: CONTRACT EXPIRATION MISMATCH!
============================================================
Database shows renewal date: 2027-01-01 (FUTURE)
Contract PDF indicates termination in 2024 (PAST/EXPIRED)
Current date: 2025-12-16
RISK: Active vendor with high spend ($200M) operating under EXPIRED contract!
ACTION REQUIRED: Immediate contract review and legal verification needed.
============================================================
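If you want a deterministic reference to compare the agent's response against (for example in a test), the alert above can be rebuilt from four values. The function below is a hypothetical helper, not part of the repository; the real alert is generated by the agent and its wording is likely to vary between runs.

```python
from __future__ import annotations

from datetime import date


def format_expiration_alert(db_renewal: date, contract_termination: date,
                            today: date, annual_spend: int) -> str | None:
    """Render the alert only when the DB renewal is future and the contract has ended."""
    if not (contract_termination < today < db_renewal):
        return None
    rule = "=" * 60
    return "\n".join([
        "⚠️ CRITICAL ALERT: CONTRACT EXPIRATION MISMATCH!",
        rule,
        f"Database shows renewal date: {db_renewal.isoformat()} (FUTURE)",
        f"Contract PDF indicates termination in {contract_termination.year} (PAST/EXPIRED)",
        f"Current date: {today.isoformat()}",
        f"RISK: Active vendor with high spend (${annual_spend // 1_000_000}M) "
        "operating under EXPIRED contract!",
        "ACTION REQUIRED: Immediate contract review and legal verification needed.",
        rule,
    ])


print(format_expiration_alert(date(2027, 1, 1), date(2024, 12, 31),
                              date(2025, 12, 16), 200_000_000))
```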
ge-multi-search/
├── app/                                # Core Agent Application
│   ├── agent.py                        # Main ADK agent logic & reasoning
│   ├── config.py                       # App configuration and environment mapping
│   ├── tools.py                        # Hybrid search tool definitions (BQ + VAIS)
│   └── __init__.py
├── docs/                               # Strategic & Sales Enablement
│   └── WHEN_TO_USE.md                  # Discovery guide & customer use cases
├── evals/                              # Quality & Validation
│   └── scenarios.md                    # 6 Detailed engineering test cases
├── infra/                              # Infrastructure & Data Hydration
│   ├── data/
│   │   ├── contracts_to_upload/        # [DVC] Original PDFs for D365 upload
│   │   ├── structured_to_upload/       # [DVC] Original CSV for D365 upload
│   │   ├── contracts/                  # Downloaded/Generated contract PDFs
│   │   └── structured/                 # Generated vendor_spend.csv
│   ├── scripts/                        # Automation scripts
│   │   ├── check_datastore.py          # VAIS health check utility
│   │   ├── d365_backfill.py            # Upload data to Dynamics 365
│   │   ├── d365_dump.py                # Download data from Dynamics 365
│   │   ├── generate_contracts.py       # PDF document generation (synthetic)
│   │   ├── setup_bigquery.py           # BQ schema & data hydration
│   │   └── setup_vertex_ai_search.py   # VAIS datastore & engine setup
│   ├── Makefile                        # One-command automation (infra, demo-e2e)
│   ├── README.md                       # Infrastructure-specific guide
│   ├── infrastructure_metadata.json
│   └── requirements.txt                # Infrastructure-specific dependencies
├── tests/                              # Test Suites
│   ├── integration/
│   │   └── test_agent.py               # E2E test for the "Apex Trap"
│   ├── unit/
│   ├── test_agent_verbose.py           # Log-heavy tool orchestration test
│   └── test_dummy.py
├── GEMINI.md                           # Project-specific AI notes
├── LICENSE                             # Apache 2.0 License
├── PROMPTS.md                          # "Greatest Hits" Demo Menu
├── pyproject.toml                      # Project metadata and dependencies
├── README.md                           # Main overview and Quick Start
└── uv.lock                             # Lockfile for reproducible environments
- Infrastructure Setup: infra/README.md
- ADK Documentation: https://github.com/google/adk-python
- Vertex AI Search: https://cloud.google.com/generative-ai-app-builder
- Demo Menu (Golden Queries): PROMPTS.md
- Engineering Test Scenarios: evals/scenarios.md
1. "No vendors found" error
Verify that the BigQuery table contains the vendor records:
bq query --use_legacy_sql=false \
"SELECT COUNT(*) FROM \`$PROJECT_ID.vendor_spend_dataset.vendor_spend\`"
Expected: 50
2. "No contract documents found"
Verify the Vertex AI Search datastore exists:
# Check datastore status
uv run python infra/scripts/check_datastore.py
# Check infrastructure metadata
cat infra/infrastructure_metadata.json
If documents are missing, wait 10-15 minutes for initial indexing after running make infra.
3. Agent doesn't detect the trap
Run the verbose test to see all tool calls:
uv run python tests/test_agent_verbose.py
The agent should:
- Call get_high_value_vendors to retrieve vendors from BigQuery
- Call search_contracts multiple times for each vendor
- Flag expired contracts with ⚠️ CRITICAL ALERT messages
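When reading the verbose output, the three expectations above can be turned into a quick checklist. The sketch assumes you have already collected the tool names in call order and the agent's final text (extracting those from the test's logs is left to you). The function is hypothetical, and it interprets "multiple times for each vendor" loosely as at least one search_contracts call per vendor.

```python
from __future__ import annotations


def check_trace(tool_calls: list[str], final_text: str, vendor_count: int) -> list[str]:
    """Return problems found in a recorded run; an empty list means it looks right."""
    problems = []
    if not tool_calls or tool_calls[0] != "get_high_value_vendors":
        problems.append("first tool call should be get_high_value_vendors")
    if tool_calls.count("search_contracts") < vendor_count:
        problems.append("expected at least one search_contracts call per vendor")
    if "CRITICAL ALERT" not in final_text:
        problems.append("final response has no CRITICAL ALERT")
    return problems


good_run = ["get_high_value_vendors", "search_contracts", "search_contracts", "search_contracts"]
print(check_trace(good_run, "⚠️ CRITICAL ALERT: CONTRACT EXPIRATION MISMATCH!", vendor_count=2))  # []
```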
After running this demo, you should see:
The Main Trap Detected: Apex Logistics ($200M spend)
- Database shows: renewal_date = 2027-01-01 (future), status = Active
- Contract PDF shows: "terminate automatically on December 31, 2024"
- Agent flags: ⚠️ CRITICAL ALERT - contract expired nearly a year ago
Additional Findings:
- Alpha Systems Inc - expired Oct 2025
- Premier Logistics Inc - expired Dec 2025
- Zeta Corporation - expires in 4 days
Hybrid Search Working:
- BigQuery provides structured vendor data (spend, renewal dates)
- Vertex AI Search retrieves unstructured contract terms (actual termination dates)
- Agent reasoning compares both sources to detect discrepancies
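The findings above distinguish contracts that have already expired from ones about to expire. A minimal sketch of that bucketing follows, assuming the run date 2025-12-16 shown in the sample alert and a hypothetical 30-day "expiring soon" window. The Zeta Corporation date used below (2025-12-20) is only inferred from "expires in 4 days" and is not taken from the dataset.

```python
from datetime import date

EXPIRING_SOON_DAYS = 30  # hypothetical cutoff; the agent's actual threshold is not documented


def classify_contract(termination: date, today: date) -> str:
    """Describe a contract's termination date relative to today."""
    days_left = (termination - today).days
    if days_left < 0:
        return f"expired {-days_left} days ago"
    if days_left == 0:
        return "expires today"
    if days_left <= EXPIRING_SOON_DAYS:
        return f"expires in {days_left} days"
    return "active"


run_date = date(2025, 12, 16)  # the "Current date" from the sample alert
print(classify_contract(date(2024, 12, 31), run_date))  # expired 350 days ago (Apex Logistics)
print(classify_contract(date(2025, 12, 20), run_date))  # expires in 4 days
```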