
misha-chertushkin/project-cogent-2

project-cogent-2

This repository provides a reference implementation and an ADK agent covering Workstream #2 of Project Cogent.

Vendor Spend Analysis Agent - Hybrid Search Demo

  • Status: Production-Ready Reference Implementation
  • Framework: Google Agent Development Kit (ADK)
  • Pattern: Reasoning over Extracted Data (Hybrid Search)

🎯 Overview

This project demonstrates a complete Hybrid Search solution that combines:

  • Structured Data (BigQuery) - Vendor spend records from ERP systems
  • Unstructured Data (Vertex AI Search) - Contract PDFs and legal documents
  • AI Reasoning (ADK Agent) - Intelligent correlation and risk detection

🎯 When to Use This Demo

This repository is a reference implementation for Hybrid Search Reasoning. It is specifically designed for scenarios where transactional data (SQL) must be reconciled against unstructured ground truth (PDFs).

✅ Ideal Use Cases

  • Source of Truth Reconciliation: Identifying discrepancies between a database (e.g., ERP renewal dates) and legal contracts (e.g., termination clauses).
  • Procurement & Compliance Audits: Automating risk detection in high-value vendor relationships.
  • Heterogeneous Document Analysis: Extracting insights from diverse legal papers that lack a standard template.

🚀 Key Value Prop

Unlike basic RAG, this agent doesn't just "find" information; it reasons across systems to detect data "traps" that traditional automation would miss.

📖 Read the full 'When to Use' guide (docs/WHEN_TO_USE.md) for deep-dive discovery questions and strategic selling points.

๐Ÿ—๏ธ Architecture

┌─────────────────────────────────────────────────────────────┐
│                   Vendor Compliance Agent                   │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  Orchestration: Correlates data across sources         │ │
│  │  Tools:                                                │ │
│  │    1. get_high_value_vendors (BigQuery)                │ │
│  │    2. check_contract_compliance (Vertex AI Search)     │ │
│  │    3. check_contract_expiration (Cross-reference)      │ │
│  └────────────────────────────────────────────────────────┘ │
└──────────────────────┬──────────────────┬───────────────────┘
                       │                  │
           ┌───────────▼──────────┐  ┌────▼─────────────────┐
           │   BigQuery           │  │ Vertex AI Search     │
           │   Dataset            │  │ Datastore            │
           ├──────────────────────┤  ├──────────────────────┤
           │ 50 Vendor Records    │  │ 20 PDF Contracts     │
           │ - Vendor ID          │  │ - Indemnification    │
           │ - Annual Spend       │  │ - Warranties         │
           │ - DB Renewal Date    │  │ - Termination Dates  │
           │ - Status             │  │ - Legal Clauses      │
           └──────────────────────┘  └──────────────────────┘
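
The orchestration step in the diagram can be pictured as a join keyed on vendor ID. The sketch below is a minimal in-memory stand-in, not the repository's implementation: `get_high_value_vendors` is the tool name from the diagram, but its signature, the `VendorRecord` fields, the `correlate` helper, and the second vendor row are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorRecord:
    """One structured row; field names are assumed, not the real schema."""
    vendor_id: int
    name: str
    annual_spend: float   # USD
    db_renewal_date: str  # ISO date as stored in the database
    status: str


# Apex Logistics mirrors the planted scenario; the second row is hypothetical.
RECORDS = [
    VendorRecord(99, "Apex Logistics", 200_000_000, "2027-01-01", "Active"),
    VendorRecord(1, "Example Vendor", 5_000_000, "2026-06-30", "Active"),
]


def get_high_value_vendors(records, min_spend):
    """Stand-in for the BigQuery tool: keep vendors above a spend threshold."""
    return [r for r in records if r.annual_spend > min_spend]


def correlate(vendors, contracts_by_vendor_id):
    """Pair each vendor with its contract text; report vendors with none."""
    paired, missing = [], []
    for vendor in vendors:
        text = contracts_by_vendor_id.get(vendor.vendor_id)
        if text is None:
            missing.append(vendor)
        else:
            paired.append((vendor, text))
    return paired, missing


if __name__ == "__main__":
    contracts = {
        99: "This agreement shall terminate automatically on December 31, 2024"
    }
    high_value = get_high_value_vendors(RECORDS, 100_000_000)
    paired, missing = correlate(high_value, contracts)
    for vendor, text in paired:
        print(vendor.name, "->", text)
```

Returning the unmatched vendors separately keeps a missing contract visible as its own finding instead of silently dropping the vendor from the analysis.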

✨ Key Features

The "Data Trap" - Risk Detection Demo

The mock data includes a planted scenario:

  • Vendor: Apex Logistics (ID 99)
  • Database Shows: Active, $200M spend, Renewal Date: 2027-01-01
  • Contract PDF Shows: "This agreement shall terminate automatically on December 31, 2024"
  • Result: Agent detects the discrepancy and raises a CRITICAL ALERT

This demonstrates real-world scenarios where:

  • Legacy systems have outdated data
  • Manual entry errors occur
  • Multi-system synchronization fails
  • Fraud or unauthorized spending happens
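
The core comparison behind the trap can be sketched in a few lines. This is an illustration under stated assumptions, not the agent's actual logic: in the demo the agent reasons over the retrieved contract text, whereas the hypothetical `extract_termination_date` below uses a regular expression that only matches the exact phrasing quoted above, and `is_data_trap` is an illustrative helper name.

```python
import re
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Matches only the phrasing quoted in the scenario; real contracts vary widely.
TERMINATION_RE = re.compile(
    r"terminate automatically on ([A-Z][a-z]+) (\d{1,2}), (\d{4})"
)


def extract_termination_date(contract_text):
    """Return the termination date found in the text, or None if absent."""
    match = TERMINATION_RE.search(contract_text)
    if match is None or match.group(1) not in MONTHS:
        return None
    month, day, year = match.groups()
    return date(int(year), MONTHS[month], int(day))


def is_data_trap(db_renewal_date, contract_termination, today):
    """True when the contract has ended but the database shows a future renewal."""
    return contract_termination < today < db_renewal_date


if __name__ == "__main__":
    clause = "This agreement shall terminate automatically on December 31, 2024"
    terminated = extract_termination_date(clause)
    print(is_data_trap(date(2027, 1, 1), terminated, date(2025, 12, 16)))
```

With the Apex Logistics values (database renewal 2027-01-01, contract termination 2024-12-31, current date 2025-12-16), the check evaluates to true.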

🚀 Quick Start

Prerequisites: Google Cloud Project with billing enabled, gcloud CLI authenticated (gcloud auth login), Python 3.10+

Option 1: Streamlined Setup (Recommended)

# 1. Configure your Google Cloud project
gcloud config set project <your-project-id>
gcloud auth application-default set-quota-project <your-project-id>

# 2. Install dependencies
make install

# 3. Setup infrastructure (~5-10 mins)
make infra

# 4. Launch the agent playground
make playground

Use the "Golden Queries" from PROMPTS.md to trigger the Apex Trap.

Option 2: End-to-End Demo (with Dynamics 365)

Use this to demonstrate the complete data journey from Dynamics 365 → GCP → Agent.

# 1. Configure your Google Cloud project
gcloud config set project <your-project-id>
gcloud auth application-default set-quota-project <your-project-id>

# 2. Configure Dynamics 365 credentials
cd infra
cp .env.example .env
# Edit .env with your D365 credentials
cd ..

# 3. Run end-to-end demo (handles all dependencies automatically)
make demo-e2e

# 4. Launch the agent playground
make playground

The demo-e2e target will:

  • Install all project dependencies (including DVC)
  • Pull contract PDFs and CSV from DVC remote storage
  • Upload data to Dynamics 365 CRM
  • PAUSE for you to demo the D365 UI (press Enter to continue)
  • Download data back from Dynamics 365
  • Create BigQuery dataset and load vendor data
  • Create Vertex AI Search datastore and index contracts

That's it! The agent will analyze vendors and detect the contract expiration trap.

Note: Both workflows automatically detect your project from gcloud config. You can also set the PROJECT_ID environment variable to override it.
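
One way such a lookup could work is sketched below. The precedence (PROJECT_ID first, then gcloud config) follows the note above, but the function name `resolve_project_id` and the use of `gcloud config get-value project` are assumptions; the repository's scripts may resolve the project differently.

```python
import os
import subprocess


def resolve_project_id(env=None):
    """Return the project ID from PROJECT_ID, else from gcloud config, else None."""
    env = os.environ if env is None else env

    # An explicit environment variable wins over the gcloud configuration.
    override = env.get("PROJECT_ID", "").strip()
    if override:
        return override

    try:
        result = subprocess.run(
            ["gcloud", "config", "get-value", "project"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # The gcloud CLI is not installed or not on PATH.
        return None

    project = result.stdout.strip()
    return project or None


if __name__ == "__main__":
    print(resolve_project_id())
```

Passing the environment as a parameter keeps the override easy to test without touching the real shell environment.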

Alternative Testing Options

# Verbose output showing all tool calls
uv run python tests/test_agent_verbose.py

# Full integration test suite
pytest tests/integration/test_agent.py -v

📦 Data Management with DVC

This project uses DVC (Data Version Control) to manage datasets stored in Google Cloud Storage.

End-to-End Demo Workflow

The demo-e2e target demonstrates a complete data journey:

DVC (GCS) → Local Files → Dynamics 365 → Local Files → GCP (BigQuery + VAIS)

Step-by-step flow:

  1. DVC Pull: Downloads production data from GCS remote storage
    • infra/data/contracts_to_upload/ - Contract PDFs ready for D365
    • infra/data/structured_to_upload/vendor_spend.csv - Vendor records

  2. D365 Upload: Uploads data to Dynamics 365 CRM
    • Creates Account records with vendor metadata
    • Attaches contract PDFs as annotations
    • Creates Invoice records for spend tracking
    • PAUSES for demo of D365 UI

  3. D365 Download: Extracts data back from Dynamics 365
    • Downloads to infra/data/contracts/
    • Generates CSV at infra/data/structured/vendor_spend.csv
    • Preserves original vendor IDs for consistency

  4. GCP Setup: Creates cloud infrastructure
    • Uploads PDFs to GCS bucket
    • Loads CSV into BigQuery table
    • Indexes contracts in Vertex AI Search

Manual DVC Operations

To manually pull data:

# Pull all tracked data
dvc pull

# Pull specific files
dvc pull infra/data/contracts_to_upload.dvc
dvc pull infra/data/structured_to_upload/vendor_spend.csv.dvc

DVC Configuration

  • Remote Storage: gs://project-cogent-2-dvc/dvc-store
  • Tracked Paths:
    • infra/data/contracts_to_upload/ - Original contract PDFs
    • infra/data/structured_to_upload/vendor_spend.csv - Original vendor data

Note: You need access to the GCS bucket project-cogent-2-dvc. Ensure you're authenticated with gcloud auth login and have the necessary permissions.
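
After a pull, a quick preflight check can confirm that the tracked paths are actually present before the upload step runs. The sketch below uses the two tracked paths listed above; the helper name `missing_dvc_data` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Tracked paths as listed in the DVC configuration above.
DVC_TRACKED = (
    "infra/data/contracts_to_upload",
    "infra/data/structured_to_upload/vendor_spend.csv",
)


def missing_dvc_data(repo_root):
    """Return the tracked paths that are absent or empty under repo_root."""
    root = Path(repo_root)
    missing = []
    for relative in DVC_TRACKED:
        path = root / relative
        if path.is_dir():
            # A directory counts as present only if it holds at least one entry.
            if not any(path.iterdir()):
                missing.append(relative)
        elif not path.is_file():
            missing.append(relative)
    return missing


if __name__ == "__main__":
    gaps = missing_dvc_data(".")
    if gaps:
        print("Missing after pull:", ", ".join(gaps))
    else:
        print("All tracked data is present.")
```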

📊 What the Demo Shows

Sample Query

"Analyze all vendors with annual spend over $100 million. For each vendor, check their contract compliance and verify that contract termination dates match our database renewal dates. Flag any discrepancies."

Expected Output

โš ๏ธ  CRITICAL ALERT: CONTRACT EXPIRATION MISMATCH!
============================================================
Database shows renewal date: 2027-01-01 (FUTURE)
Contract PDF indicates termination in 2024 (PAST/EXPIRED)
Current date: 2025-12-16

RISK: Active vendor with high spend ($200M) operating under EXPIRED contract!
ACTION REQUIRED: Immediate contract review and legal verification needed.
============================================================
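
For readers who want to reproduce this layout in their own tooling, the alert text could be assembled as follows. This is only a formatting sketch: `render_alert` and its parameters are hypothetical, the 60-character rule is an assumption, and the demo's agent may produce its output differently.

```python
from datetime import date

RULE = "=" * 60  # assumed rule width


def render_alert(db_renewal_date, termination_year, today, spend_label):
    """Build the mismatch alert as a single multi-line string."""
    lines = [
        "⚠️  CRITICAL ALERT: CONTRACT EXPIRATION MISMATCH!",
        RULE,
        f"Database shows renewal date: {db_renewal_date.isoformat()} (FUTURE)",
        f"Contract PDF indicates termination in {termination_year} (PAST/EXPIRED)",
        f"Current date: {today.isoformat()}",
        "",
        f"RISK: Active vendor with high spend ({spend_label}) "
        "operating under EXPIRED contract!",
        "ACTION REQUIRED: Immediate contract review and legal verification needed.",
        RULE,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_alert(date(2027, 1, 1), 2024, date(2025, 12, 16), "$200M"))
```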

๐Ÿ“ Project Structure

ge-multi-search/
├── app/                      # Core Agent Application
│   ├── agent.py              # Main ADK agent logic & reasoning
│   ├── config.py             # App configuration and environment mapping
│   ├── tools.py              # Hybrid search tool definitions (BQ + VAIS)
│   └── __init__.py
├── docs/                     # Strategic & Sales Enablement
│   └── WHEN_TO_USE.md        # Discovery guide & customer use cases
├── evals/                    # Quality & Validation
│   └── scenarios.md          # 6 Detailed engineering test cases
├── infra/                    # Infrastructure & Data Hydration
│   ├── data/
│   │   ├── contracts_to_upload/     # [DVC] Original PDFs for D365 upload
│   │   ├── structured_to_upload/    # [DVC] Original CSV for D365 upload
│   │   ├── contracts/               # Downloaded/Generated contract PDFs
│   │   └── structured/              # Generated vendor_spend.csv
│   ├── scripts/              # Automation scripts
│   │   ├── check_datastore.py       # VAIS health check utility
│   │   ├── d365_backfill.py         # Upload data to Dynamics 365
│   │   ├── d365_dump.py             # Download data from Dynamics 365
│   │   ├── generate_contracts.py    # PDF document generation (synthetic)
│   │   ├── setup_bigquery.py        # BQ schema & data hydration
│   │   └── setup_vertex_ai_search.py # VAIS datastore & engine setup
│   ├── Makefile              # One-command automation (infra, demo-e2e)
│   ├── README.md             # Infrastructure-specific guide
│   ├── infrastructure_metadata.json
│   └── requirements.txt      # Infrastructure-specific dependencies
├── tests/                    # Test Suites
│   ├── integration/
│   │   └── test_agent.py         # E2E test for the "Apex Trap"
│   └── unit/
│       ├── test_agent_verbose.py # Log-heavy tool orchestration test
│       └── test_dummy.py
├── GEMINI.md                 # Project-specific AI notes
├── LICENSE                   # Apache 2.0 License
├── PROMPTS.md                # "Greatest Hits" Demo Menu
├── pyproject.toml            # Project metadata and dependencies
├── README.md                 # Main overview and Quick Start
└── uv.lock                   # Lockfile for reproducible environments

📚 Documentation

🐛 Troubleshooting

Common Issues

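1. "No vendors found" error

Verify that the BigQuery table is populated (PROJECT_ID must be set in your shell):

bq query --use_legacy_sql=false \
  "SELECT COUNT(*) FROM \`$PROJECT_ID.vendor_spend_dataset.vendor_spend\`"

Expected: 50 (one row per vendor record)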

2. "No contract documents found"

Verify the Vertex AI Search datastore exists:

# Check datastore status
uv run python infra/scripts/check_datastore.py

# Check infrastructure metadata
cat infra/infrastructure_metadata.json

If documents are missing, wait 10-15 minutes for initial indexing after running make infra.

3. Agent doesn't detect the trap

Run the verbose test to see all tool calls:

uv run python tests/test_agent_verbose.py

The agent should:

  • Call get_high_value_vendors to retrieve vendors from BigQuery
  • Call search_contracts multiple times for each vendor
  • Flag expired contracts with ⚠️ CRITICAL ALERT messages

🎯 Success Criteria

After running this demo, you should see:

✅ The Main Trap Detected: Apex Logistics ($200M spend)

  • Database shows: renewal_date = 2027-01-01 (future), status = Active
  • Contract PDF shows: "terminate automatically on December 31, 2024"
  • Agent flags: ⚠️ CRITICAL ALERT - contract expired nearly a year ago

✅ Additional Findings:

  • Alpha Systems Inc - expired Oct 2025
  • Premier Logistics Inc - expired Dec 2025
  • Zeta Corporation - expires in 4 days

✅ Hybrid Search Working:

  • BigQuery provides structured vendor data (spend, renewal dates)
  • Vertex AI Search retrieves unstructured contract terms (actual termination dates)
  • Agent reasoning compares both sources to detect discrepancies
