
Architecture Documentation

Comprehensive architecture reference for the PII Agent project.


1. Executive Summary

PII Agent is an AI agent demonstration that queries a PostgreSQL database containing personally identifiable information through natural language. The project runs multiple agent frameworks (currently Agno and Microsoft Agent Framework, with LangGraph planned) against the same database, providing a practical benchmark for how different frameworks handle tool use, schema discovery, and conversational interaction.

The architecture follows a clear separation: the data/ layer provisions the database independently, while agent implementations in src/ each connect to it using their own patterns.


2. High-Level Architecture

```mermaid
graph TB
    subgraph "Agent Layer (src/)"
        AG[Agno Agent<br/>src/agno/db-agent.py]
        MS[MSFT Agent<br/>src/msft-agent-framework/data-agent.py]
        LG[LangGraph Agent<br/>src/langgraph/<br/>planned]
    end

    subgraph "LLM Gateway"
        LP[LiteLLM Proxy]
    end

    subgraph "Database Layer (data/)"
        DC[Docker Compose<br/>docker-compose.yml]
        PG[(PostgreSQL 16<br/>pii_demo)]
        CSV[CSV Loader<br/>load_csv.py]
        ORM[SQLAlchemy ORM<br/>model.py]
        SD[Sample Data<br/>sample-pii-data.csv]
    end

    AG -->|LiteLLM client| LP
    MS -->|OpenAIChatClient| LP
    LG -.->|planned| LP

    LP -->|route to model| LLM[Claude / GPT-4o / ...]

    AG -->|PostgresTools| PG
    MS -->|psycopg direct| PG

    CSV --> ORM
    ORM --> PG
    SD --> CSV
    DC --> PG
```

3. Repository Structure

```
pii-agent/
├── data/                              # Database provisioning (standalone)
│   ├── __init__.py
│   ├── docker-compose.yml             # Postgres 16 container
│   ├── model.py                       # SQLAlchemy ORM: Base + Person
│   ├── load_csv.py                    # Idempotent CSV-to-DB loader
│   ├── sample-pii-data.csv            # 30 rows of synthetic PII
│   ├── requirements.txt               # sqlalchemy, psycopg[binary], pytest
│   └── tests/
│       ├── __init__.py
│       ├── test_model.py              # Schema verification (SQLite)
│       └── test_load_csv.py           # Loader behavior (SQLite)
├── src/
│   ├── agno/                          # Agno framework agent
│   │   ├── __init__.py
│   │   ├── db-agent.py                # Agent + PostgresTools + LiteLLM
│   │   └── prompts.py                 # System prompt (unused by agent)
│   ├── msft-agent-framework/          # Microsoft Agent Framework agent
│   │   ├── data-agent.py              # Tools + Agent + async REPL
│   │   ├── test_tools.py              # Smoke tests (live DB required)
│   │   └── requirements.txt           # agent-framework, psycopg, dotenv
│   └── langgraph/                     # Future implementation
│       └── __init__.py
├── docs/
│   └── plans/                         # Design and implementation plans
├── .env.sample                        # LLM credential template
├── .gitignore                         # __pycache__, .pyc, .venv, .env
└── CLAUDE.md                          # Project guidance for Claude Code
```

Key design decision: The data/ layer is self-contained. It has its own requirements.txt, tests, and Docker Compose file. Agent implementations depend on the database but not on the data/ Python code at runtime.


4. Database Layer

4.1 Docker Setup

PostgreSQL 16 runs in a Docker container defined in data/docker-compose.yml:

| Setting | Value |
| --- | --- |
| Image | postgres:16 |
| User | postgres |
| Password | MyDbPassword |
| Database | pii_demo |
| Port | 5432:5432 |
| Volume | pgdata (persistent named volume) |
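
A data/docker-compose.yml matching these settings might look like the following sketch (the service name and volume mount path are assumptions, not taken from the repository):

```yaml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: postgres         # also the image default
      POSTGRES_PASSWORD: MyDbPassword
      POSTGRES_DB: pii_demo
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```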

4.2 SQLAlchemy ORM (data/model.py)

The Person class maps to the persons table with 16 string columns:

| Category | Columns |
| --- | --- |
| Identity | id (PK), fname, lname, maiden_name, gender, birthdate |
| Contact | address, city, state, zip, phone, email |
| Financial | cc_type, cc_number, cc_cvc, cc_expiredate |

All columns are Mapped[str]: intentionally strings, including dates and numeric-looking values. This avoids type conversion complexity in a demo (e.g., a ZIP code like 02134 would lose its leading zero if stored as an integer).

4.3 CSV Loader (data/load_csv.py)

load_csv(engine, csv_path) is idempotent:

  1. Creates the table schema via Base.metadata.create_all(engine)
  2. Reads CSV rows with csv.DictReader
  3. For each row, checks if Person.id already exists
  4. Inserts only new rows, commits, returns the insert count

The main() function reads DATABASE_URL from the environment (default: postgresql+psycopg://postgres:MyDbPassword@localhost:5432/pii_demo).
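
The real loader uses SQLAlchemy sessions against Postgres; the check-before-insert idempotency it implements can be illustrated with the stdlib alone (this sketch reduces the schema to a hypothetical three-column subset):

```python
import csv
import sqlite3


def load_csv(conn: sqlite3.Connection, csv_path: str) -> int:
    """Insert rows whose id is not yet present; return the insert count."""
    # Step 1: create the schema if it does not exist
    conn.execute(
        "CREATE TABLE IF NOT EXISTS persons "
        "(id TEXT PRIMARY KEY, fname TEXT, lname TEXT)"
    )
    inserted = 0
    # Step 2: read CSV rows as dicts
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            # Step 3: skip rows whose id already exists
            exists = conn.execute(
                "SELECT 1 FROM persons WHERE id = ?", (row["id"],)
            ).fetchone()
            if exists is None:
                # Step 4: insert only new rows
                conn.execute(
                    "INSERT INTO persons (id, fname, lname) VALUES (?, ?, ?)",
                    (row["id"], row["fname"], row["lname"]),
                )
                inserted += 1
    conn.commit()
    return inserted
```

Running this twice against the same file inserts rows only the first time, mirroring the "Loaded 0 new rows" behavior of data/load_csv.py on re-runs.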

4.4 Sample Data

data/sample-pii-data.csv contains 30 rows of synthetic PII data covering names, addresses, phone numbers, email addresses, and credit card information.


5. Agent Implementations

5.1 Agno Agent (src/agno/db-agent.py)

Pattern: Single-shot query execution using framework-managed tools.

```
Developer → Agent → PostgresTools (internal) → PostgreSQL → Response
```
  • Uses Agno's built-in PostgresTools, which handles schema discovery and SQL execution internally
  • LLM: claude-sonnet-4-5 via LiteLLM proxy
  • No session management — each run is independent
  • Output: Markdown-formatted response printed to stdout

Key characteristics:

  • ~50 lines of code
  • Minimal boilerplate — the framework handles tool orchestration
  • Schema discovery happens internally within PostgresTools
  • No streaming — blocks until the full response is ready

5.2 MSFT Agent Framework (src/msft-agent-framework/data-agent.py)

Pattern: Interactive REPL with explicit schema-discovery tools and streaming output.

```
User Input → REPL → Agent Session → Tools:
  1. get_tables()      → Discover schema
  2. get_columns(name) → Get column info
  3. run_query(sql)    → Execute SQL
→ Stream Output → User
```

Three hand-written tools using psycopg directly:

| Tool | Purpose | SQL Used |
| --- | --- | --- |
| get_tables() | List all public tables | information_schema.tables |
| get_columns(table_name) | Get column names and types | information_schema.columns |
| run_query(sql) | Execute arbitrary SQL | User/LLM-provided |

Key characteristics:

  • ~115 lines of code
  • Explicit schema discovery — the LLM calls get_tables then get_columns before querying
  • Session persistence — context carries across turns in the REPL
  • Streaming output — responses appear incrementally
  • All tools have approval_mode="never_require" (auto-execute)
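
The real tools use psycopg against information_schema; the same three-tool shape, including the fresh-connection-per-call and error-strings-not-exceptions conventions, can be illustrated with stdlib sqlite3 (the connection target and result formatting here are assumptions for the sketch):

```python
import sqlite3
import tempfile

# Stand-in for the real DB_CONNINFO (a libpq string in the MSFT agent).
_DB_PATH = tempfile.NamedTemporaryFile(suffix=".db", delete=False).name


def _query(sql: str) -> tuple:
    """Open a fresh connection per call, mirroring the agent's tools."""
    conn = sqlite3.connect(_DB_PATH)
    try:
        cur = conn.execute(sql)
        rows = cur.fetchall()
        header = [c[0] for c in cur.description] if cur.description else []
        conn.commit()
        return header, rows
    finally:
        conn.close()


def get_tables() -> str:
    """List user tables (sqlite_master here; information_schema.tables in Postgres)."""
    try:
        _, rows = _query("SELECT name FROM sqlite_master WHERE type = 'table'")
        return "\n".join(r[0] for r in rows)
    except Exception as exc:
        return f"ERROR: {exc}"  # tools report failures as strings, never raise


def get_columns(table_name: str) -> str:
    """Return 'name (type)' lines for one table."""
    try:
        _, rows = _query(f"PRAGMA table_info({table_name})")
        return "\n".join(f"{r[1]} ({r[2]})" for r in rows)
    except Exception as exc:
        return f"ERROR: {exc}"


def run_query(sql: str) -> str:
    """Execute arbitrary SQL and return a small CSV-style result string."""
    try:
        header, rows = _query(sql)
        if not header:
            return "OK"
        return "\n".join([",".join(header)] + [",".join(map(str, r)) for r in rows])
    except Exception as exc:
        return f"ERROR: {exc}"
```

Returning plain strings (including error strings) keeps every tool result directly consumable by the LLM as a chat message, which is why the agent never lets exceptions propagate.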

6. LLM Gateway Integration

Both agents connect to LLMs through a LiteLLM proxy, configured via environment variables:

| Variable | Purpose |
| --- | --- |
| LLM_API_KEY | Authentication for the LiteLLM proxy |
| LLM_BASE_URL | URL of the LiteLLM proxy endpoint |
| LLM_MODEL_ID | Model identifier to pass to the proxy |

```mermaid
graph LR
    AG[Agno Agent] -->|LiteLLM client| P[LiteLLM Proxy]
    MS[MSFT Agent] -->|OpenAIChatClient| P
    P -->|route| M1[claude-sonnet-4-5]
    P -->|route| M2[gpt-4o]
    P -->|route| M3[Other Models...]
```

The Agno agent hardcodes claude-sonnet-4-5 as the model. The MSFT agent reads LLM_MODEL_ID from the environment with gpt-4o as a fallback.
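
A sketch of how this environment resolution might look in code (the helper name is an assumption; only the LLM_MODEL_ID fallback to gpt-4o is documented behavior):

```python
import os


def llm_config(env=os.environ) -> dict:
    """Resolve LLM gateway settings; only LLM_MODEL_ID has a fallback."""
    return {
        "api_key": env["LLM_API_KEY"],    # required: fail fast if absent
        "base_url": env["LLM_BASE_URL"],  # required: the LiteLLM proxy URL
        "model": env.get("LLM_MODEL_ID", "gpt-4o"),  # MSFT agent's default
    }
```

Using subscript access for the required keys raises a KeyError immediately when credentials are missing, rather than failing later with an opaque HTTP error from the proxy.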


7. Data Flow Diagrams

7.1 Database Provisioning Flow

```mermaid
sequenceDiagram
    participant Dev as Developer
    participant DC as Docker Compose
    participant PG as PostgreSQL
    participant CSV as load_csv.py
    participant ORM as SQLAlchemy ORM

    Dev->>DC: docker compose up -d
    DC->>PG: Start postgres:16 container
    PG-->>DC: Container healthy, port 5432 open

    Dev->>CSV: python -m data.load_csv
    CSV->>ORM: create_engine(DATABASE_URL)
    CSV->>ORM: Base.metadata.create_all(engine)
    ORM->>PG: CREATE TABLE IF NOT EXISTS persons (...)

    CSV->>CSV: Open sample-pii-data.csv (30 rows)

    loop For each row
        CSV->>PG: SELECT WHERE id = row['id']
        PG-->>CSV: None (not found)
        CSV->>PG: session.add(Person(**row))
    end

    CSV->>PG: session.commit()
    CSV-->>Dev: "Loaded 30 new rows into persons table."

    Note over Dev,PG: Re-running returns "Loaded 0 new rows"
```

7.2 Agno Agent Query Flow

```mermaid
sequenceDiagram
    participant Dev as Developer
    participant AG as Agno Agent
    participant PT as PostgresTools
    participant LM as LiteLLM Proxy
    participant PG as PostgreSQL

    Dev->>AG: agent.print_response("Get all men's info")

    AG->>LM: Chat completion (system + user + tool schemas)
    LM-->>AG: tool_call: list_tables()

    AG->>PT: PostgresTools.list_tables()
    PT->>PG: SELECT table_name FROM information_schema...
    PG-->>PT: ["persons"]
    PT-->>AG: "persons"

    AG->>LM: Chat completion (history + tool result)
    LM-->>AG: tool_call: run_query("SELECT * FROM persons WHERE gender = 'm'")

    AG->>PT: PostgresTools.run_query(sql)
    PT->>PG: SELECT * FROM persons WHERE gender = 'm'
    PG-->>PT: ~15 rows
    PT-->>AG: Formatted results

    AG->>LM: Chat completion (history + tool result)
    LM-->>AG: Final text response
    AG-->>Dev: Printed Markdown response
```

7.3 MSFT Agent Schema Discovery Flow

```mermaid
sequenceDiagram
    participant User as User (REPL)
    participant AG as MSFT Agent
    participant LM as LiteLLM Proxy
    participant PG as PostgreSQL

    User->>AG: "How many people are in the database?"

    AG->>LM: Chat completion (stream)
    LM-->>AG: tool_call: get_tables()
    AG->>PG: SELECT table_name FROM information_schema...
    PG-->>AG: [("persons",)]
    AG->>LM: tool result: "persons"

    LM-->>AG: tool_call: get_columns("persons")
    AG->>PG: SELECT column_name, data_type...
    PG-->>AG: 16 rows
    AG->>LM: tool result: "id (character varying)\nfname (character varying)\n..."

    LM-->>AG: tool_call: run_query("SELECT COUNT(*) FROM persons;")
    AG->>PG: SELECT COUNT(*) FROM persons
    PG-->>AG: [(30,)]
    AG->>LM: tool result: "count\n30"

    LM-->>AG: Stream: "There are 30 people..."
    AG-->>User: "There are 30 people in the database."

    Note over User,PG: Second turn skips schema discovery (session retains context)
```

8. Agent Comparison

| Dimension | Agno Agent | MSFT Agent Framework |
| --- | --- | --- |
| File | src/agno/db-agent.py | src/msft-agent-framework/data-agent.py |
| Interaction | Single-shot | Interactive async REPL |
| Session/memory | Stateless | Session persists across turns |
| Streaming | No | Yes |
| Database access | Framework PostgresTools | Hand-written psycopg tools |
| Tool granularity | Opaque (framework-managed) | Three explicit tools |
| Schema discovery | Internal to PostgresTools | Explicit: LLM calls get_tables → get_columns |
| LLM adapter | agno.models.litellm.LiteLLM | agent_framework.openai.OpenAIChatClient |
| Model | Hardcoded claude-sonnet-4-5 | Env var LLM_MODEL_ID (default: gpt-4o) |
| DB driver | Managed by PostgresTools | psycopg (fresh connection per tool call) |
| Connection config | Constructor kwargs | DB_CONNINFO string (libpq format) |
| Error handling | Framework-managed | Tools return error strings |
| Lines of code | ~50 | ~115 |
| Observability | Low (reasoning opaque) | High (each tool call visible) |
| Best for | Quick scripted queries | Interactive exploration |

9. Configuration and Environment

Environment Variables

Configured in .env (copied from .env.sample):

```
LLM_API_KEY=<API_KEY>
LLM_BASE_URL=<GATEWAY_URL>
LLM_MODEL_ID=<MODEL_ID>
```

The .env file is listed in .gitignore and must never be committed.

Database Connection Strings

Two formats are used:

SQLAlchemy format (data layer):

```
postgresql+psycopg://postgres:MyDbPassword@localhost:5432/pii_demo
```

libpq keyword format (MSFT agent):

```
host=localhost port=5432 dbname=pii_demo user=postgres password=MyDbPassword
```
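
Both strings encode the same five parameters; a sketch deriving each format from one shared dict (variable names here are illustrative):

```python
DB = {
    "host": "localhost",
    "port": 5432,
    "dbname": "pii_demo",
    "user": "postgres",
    "password": "MyDbPassword",
}

# SQLAlchemy URL used by the data layer
sqlalchemy_url = (
    f"postgresql+psycopg://{DB['user']}:{DB['password']}"
    f"@{DB['host']}:{DB['port']}/{DB['dbname']}"
)

# libpq keyword/value string used by the MSFT agent's psycopg tools
conninfo = " ".join(f"{k}={v}" for k, v in DB.items())
```

Note that a password containing URL-reserved characters would need percent-encoding in the SQLAlchemy form, while the libpq keyword form accepts it with only quoting for spaces.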

Dependency Management

Uses uv with Python 3.12. Each area has its own requirements:

| File | Contents |
| --- | --- |
| data/requirements.txt | sqlalchemy, psycopg[binary], pytest |
| src/msft-agent-framework/requirements.txt | agent-framework>=1.0.0b0, psycopg>=3.0, python-dotenv>=1.0 |
| (none; install manually) | agno, python-dotenv (for the Agno agent) |

10. Testing Strategy

Tier 1: Unit Tests (data/tests/)

Run against in-memory SQLite — no Docker or Postgres required.

| File | Tests | What They Verify |
| --- | --- | --- |
| test_model.py | 2 | All 16 columns exist; id is the primary key |
| test_load_csv.py | 3 | Row insertion, idempotency, field mapping |

```bash
python -m pytest data/tests/ -v   # < 1 second, no dependencies
```

Tier 2: Smoke Tests (src/msft-agent-framework/test_tools.py)

Require a running PostgreSQL container with loaded data.

| Test | What It Verifies |
| --- | --- |
| test_get_tables_returns_persons | persons table exists |
| test_get_columns_returns_expected | 16 columns including fname, email |
| test_run_query_returns_rows | SELECT COUNT(*) returns 30 |

```bash
python -m pytest src/msft-agent-framework/test_tools.py -v   # Requires live DB
```

Testing Philosophy

| Aspect | Unit tests (data/) | Smoke tests (msft/) |
| --- | --- | --- |
| Requires Docker | No | Yes |
| Speed | Fast (< 1s) | Slow (network I/O) |
| CI-friendly | Yes | Only with DB provisioning |
| Purpose | ORM correctness, loader logic | Environment readiness |

11. Design Decisions

All Columns as Strings

Dates (birthdate, cc_expiredate) and numeric-looking fields (zip, cc_cvc) are stored as VARCHAR. This avoids type conversion errors during CSV loading and prevents data corruption (e.g., a ZIP code such as 02134 would lose its leading zero as an integer). Appropriate for a demo; a production system would use typed columns.

Separate data/ and src/ Concerns

The database is a dependency of the agents, not a component of them. Benefits:

  • data/ can be set up once and shared across all agent implementations
  • data/ tests run without any agent framework installed
  • Adding a new agent requires no changes to data/

Explicit Schema Discovery (MSFT Agent)

Rather than embedding schema information in the system prompt, the MSFT agent instructs the LLM to call get_tables and get_columns before writing queries. This makes the agent's reasoning process visible and interpretable. The trade-off is two extra database round-trips on the first query (mitigated by session memory on subsequent turns).

Fresh Connections Per Tool Call

The MSFT agent opens a new psycopg.connect() for every tool call rather than maintaining a connection pool. This is simpler and acceptable for a low-concurrency demo. A production implementation would use psycopg_pool.ConnectionPool.
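
What a pool buys can be sketched in a few lines with the stdlib (a toy stand-in for illustration only; psycopg_pool.ConnectionPool adds health checks, reconnection, and timeouts on top of this basic reuse pattern):

```python
import queue
import sqlite3
from contextlib import contextmanager


class TinyPool:
    """Toy fixed-size connection pool illustrating the reuse pattern."""

    def __init__(self, connect, size: int = 4):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())  # pay the connection cost once, up front

    @contextmanager
    def connection(self):
        conn = self._idle.get()        # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)       # returned for reuse, not closed


# Usage sketch against sqlite; the agent would connect to Postgres instead
pool = TinyPool(lambda: sqlite3.connect(":memory:", check_same_thread=False))
with pool.connection() as conn:
    conn.execute("SELECT 1")
```

The key difference from the current fresh-connection approach is that the connection handshake happens once per pool slot instead of once per tool call, which matters under concurrency but not in a single-user demo.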


12. Common Commands

```bash
# Database
docker compose -f data/docker-compose.yml up -d        # Start Postgres
docker compose -f data/docker-compose.yml down         # Stop Postgres
docker compose -f data/docker-compose.yml down -v      # Stop and delete data
python -m data.load_csv                                # Load sample data

# Tests
python -m pytest data/tests/ -v                              # Unit tests
python -m pytest src/msft-agent-framework/test_tools.py -v   # Smoke tests

# Agents
python src/agno/db-agent.py                            # Agno (single-shot)
python src/msft-agent-framework/data-agent.py          # MSFT REPL

# Dependencies
uv pip install -r data/requirements.txt
uv pip install -r src/msft-agent-framework/requirements.txt
```