Comprehensive architecture reference for the PII Agent project.
PII Agent is an AI agent demonstration that queries a PostgreSQL database containing personally identifiable information through natural language. The project compares multiple agent frameworks — Agno and Microsoft Agent Framework — against the same database, providing a practical benchmark for evaluating how different frameworks handle tool use, schema discovery, and conversational interaction.
The architecture follows a clear separation: the data/ layer provisions the database independently, while agent implementations in src/ each connect to it using their own patterns.
```mermaid
graph TB
    subgraph "Agent Layer (src/)"
        AG[Agno Agent<br/>src/agno/db-agent.py]
        MS[MSFT Agent<br/>src/msft-agent-framework/data-agent.py]
        LG[LangGraph Agent<br/>src/langgraph/<br/>planned]
    end
    subgraph "LLM Gateway"
        LP[LiteLLM Proxy]
    end
    subgraph "Database Layer (data/)"
        DC[Docker Compose<br/>docker-compose.yml]
        PG[(PostgreSQL 16<br/>pii_demo)]
        CSV[CSV Loader<br/>load_csv.py]
        ORM[SQLAlchemy ORM<br/>model.py]
        SD[Sample Data<br/>sample-pii-data.csv]
    end
    AG -->|LiteLLM client| LP
    MS -->|OpenAIChatClient| LP
    LG -.->|planned| LP
    LP -->|route to model| LLM[Claude / GPT-4o / ...]
    AG -->|PostgresTools| PG
    MS -->|psycopg direct| PG
    CSV --> ORM
    ORM --> PG
    SD --> CSV
    DC --> PG
```
```
pii-agent/
├── data/                          # Database provisioning (standalone)
│   ├── __init__.py
│   ├── docker-compose.yml         # Postgres 16 container
│   ├── model.py                   # SQLAlchemy ORM: Base + Person
│   ├── load_csv.py                # Idempotent CSV-to-DB loader
│   ├── sample-pii-data.csv        # 30 rows of synthetic PII
│   ├── requirements.txt           # sqlalchemy, psycopg[binary], pytest
│   └── tests/
│       ├── __init__.py
│       ├── test_model.py          # Schema verification (SQLite)
│       └── test_load_csv.py       # Loader behavior (SQLite)
├── src/
│   ├── agno/                      # Agno framework agent
│   │   ├── __init__.py
│   │   ├── db-agent.py            # Agent + PostgresTools + LiteLLM
│   │   └── prompts.py             # System prompt (unused by agent)
│   ├── msft-agent-framework/      # Microsoft Agent Framework agent
│   │   ├── data-agent.py          # Tools + Agent + async REPL
│   │   ├── test_tools.py          # Smoke tests (live DB required)
│   │   └── requirements.txt       # agent-framework, psycopg, dotenv
│   └── langgraph/                 # Future implementation
│       └── __init__.py
├── docs/
│   └── plans/                     # Design and implementation plans
├── .env.sample                    # LLM credential template
├── .gitignore                     # __pycache__, .pyc, .venv, .env
└── CLAUDE.md                      # Project guidance for Claude Code
```
Key design decision: The data/ layer is self-contained. It has its own requirements.txt, tests, and Docker Compose file. Agent implementations depend on the database but not on the data/ Python code at runtime.
PostgreSQL 16 runs in a Docker container defined in data/docker-compose.yml:
| Setting | Value |
|---|---|
| Image | postgres:16 |
| User | postgres |
| Password | MyDbPassword |
| Database | pii_demo |
| Port | 5432:5432 |
| Volume | pgdata (persistent named volume) |
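Given these settings, `data/docker-compose.yml` plausibly has the following shape (a sketch reconstructed from the table above, not the actual file):

```yaml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: MyDbPassword
      POSTGRES_DB: pii_demo
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data   # named volume so data survives restarts
volumes:
  pgdata:
```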
The Person class maps to the persons table with 16 string columns:
| Category | Columns |
|---|---|
| Identity | id (PK), fname, lname, maiden_name, gender, birthdate |
| Contact | address, city, state, zip, phone, email |
| Financial | cc_type, cc_number, cc_cvc, cc_expiredate |
All columns are `Mapped[str]` — intentionally strings, including dates and numeric-looking values. This avoids type-conversion complexity in a demo (e.g., a ZIP code like 02134 would lose its leading zero if stored as an integer).
`load_csv(engine, csv_path)` is idempotent:

- Creates the table schema via `Base.metadata.create_all(engine)`
- Reads CSV rows with `csv.DictReader`
- For each row, checks whether `Person.id` already exists
- Inserts only new rows, commits, and returns the insert count

The `main()` function reads `DATABASE_URL` from the environment (default: `postgresql+psycopg://postgres:MyDbPassword@localhost:5432/pii_demo`).
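The idempotency logic can be sketched with the stdlib `sqlite3` module standing in for the real SQLAlchemy/Postgres stack (the table is abridged to three columns; the helper itself is illustrative, not the project's code):

```python
import csv
import io
import sqlite3

def load_csv(conn, csv_file):
    """Insert only rows whose id is not already present; return insert count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS persons (id TEXT PRIMARY KEY, fname TEXT, lname TEXT)"
    )
    inserted = 0
    for row in csv.DictReader(csv_file):
        # Skip rows whose primary key already exists — this is what makes re-runs safe
        if conn.execute("SELECT 1 FROM persons WHERE id = ?", (row["id"],)).fetchone() is None:
            conn.execute(
                "INSERT INTO persons (id, fname, lname) VALUES (?, ?, ?)",
                (row["id"], row["fname"], row["lname"]),
            )
            inserted += 1
    conn.commit()
    return inserted

data = "id,fname,lname\n1,Ada,Lovelace\n2,Alan,Turing\n"
conn = sqlite3.connect(":memory:")
print(load_csv(conn, io.StringIO(data)))  # → 2
print(load_csv(conn, io.StringIO(data)))  # → 0 (re-run inserts nothing)
```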
data/sample-pii-data.csv contains 30 rows of synthetic PII data covering names, addresses, phone numbers, email addresses, and credit card information.
Pattern: Single-shot query execution using framework-managed tools.
```
Developer → Agent → PostgresTools (internal) → PostgreSQL → Response
```
- Uses Agno's built-in `PostgresTools`, which handles schema discovery and SQL execution internally
- LLM: `claude-sonnet-4-5` via LiteLLM proxy
- No session management — each run is independent
- Output: Markdown-formatted response printed to stdout
Key characteristics:

- ~50 lines of code
- Minimal boilerplate — the framework handles tool orchestration
- Schema discovery happens internally within `PostgresTools`
- No streaming — blocks until the full response is ready
Pattern: Interactive REPL with explicit schema-discovery tools and streaming output.
```
User Input → REPL → Agent Session → Tools:
  1. get_tables()       → Discover schema
  2. get_columns(name)  → Get column info
  3. run_query(sql)     → Execute SQL
→ Stream Output → User
```
Three hand-written tools using psycopg directly:
| Tool | Purpose | SQL Used |
|---|---|---|
| `get_tables()` | List all public tables | `information_schema.tables` |
| `get_columns(table_name)` | Get column names and types | `information_schema.columns` |
| `run_query(sql)` | Execute arbitrary SQL | User/LLM-provided |
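The shape of these hand-written tools can be sketched with the stdlib `sqlite3` module standing in for psycopg (the real agent queries Postgres's `information_schema`; this stand-in uses SQLite's `sqlite_master` purely for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id TEXT PRIMARY KEY, fname TEXT)")

def get_tables() -> str:
    """List tables, one per line.
    Postgres version would query information_schema.tables instead."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
    return "\n".join(r[0] for r in rows)

def run_query(sql: str) -> str:
    """Execute arbitrary SQL; errors come back as strings rather than
    exceptions, so the LLM can see the failure and adjust its query."""
    try:
        rows = conn.execute(sql).fetchall()
        return "\n".join(str(r) for r in rows)
    except Exception as e:
        return f"Query failed: {e}"

print(get_tables())                               # → persons
print(run_query("SELECT COUNT(*) FROM persons"))  # → (0,)
print(run_query("SELECT * FROM missing"))         # error string, not a crash
```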
Key characteristics:

- ~115 lines of code
- Explicit schema discovery — the LLM calls `get_tables` then `get_columns` before querying
- Session persistence — context carries across turns in the REPL
- Streaming output — responses appear incrementally
- All tools use `approval_mode="never_require"` (auto-execute)
Both agents connect to LLMs through a LiteLLM proxy, configured via environment variables:
| Variable | Purpose |
|---|---|
| `LLM_API_KEY` | Authentication for the LiteLLM proxy |
| `LLM_BASE_URL` | URL of the LiteLLM proxy endpoint |
| `LLM_MODEL_ID` | Model identifier to pass to the proxy |
```mermaid
graph LR
    AG[Agno Agent] -->|LiteLLM client| P[LiteLLM Proxy]
    MS[MSFT Agent] -->|OpenAIChatClient| P
    P -->|route| M1[claude-sonnet-4-5]
    P -->|route| M2[gpt-4o]
    P -->|route| M3[Other Models...]
```
The Agno agent hardcodes `claude-sonnet-4-5` as the model. The MSFT agent reads `LLM_MODEL_ID` from the environment with `gpt-4o` as a fallback.
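The MSFT agent's fallback can be sketched in one line (illustrative, matching the behavior described above):

```python
import os

# Env var with a hardcoded fallback: gpt-4o is used unless LLM_MODEL_ID is set
model_id = os.getenv("LLM_MODEL_ID", "gpt-4o")
print(model_id)
```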
```mermaid
sequenceDiagram
    participant Dev as Developer
    participant DC as Docker Compose
    participant PG as PostgreSQL
    participant CSV as load_csv.py
    participant ORM as SQLAlchemy ORM
    Dev->>DC: docker compose up -d
    DC->>PG: Start postgres:16 container
    PG-->>DC: Container healthy, port 5432 open
    Dev->>CSV: python -m data.load_csv
    CSV->>ORM: create_engine(DATABASE_URL)
    CSV->>ORM: Base.metadata.create_all(engine)
    ORM->>PG: CREATE TABLE IF NOT EXISTS persons (...)
    CSV->>CSV: Open sample-pii-data.csv (30 rows)
    loop For each row
        CSV->>PG: SELECT WHERE id = row['id']
        PG-->>CSV: None (not found)
        CSV->>PG: session.add(Person(**row))
    end
    CSV->>PG: session.commit()
    CSV-->>Dev: "Loaded 30 new rows into persons table."
    Note over Dev,PG: Re-running returns "Loaded 0 new rows"
```
```mermaid
sequenceDiagram
    participant Dev as Developer
    participant AG as Agno Agent
    participant PT as PostgresTools
    participant LM as LiteLLM Proxy
    participant PG as PostgreSQL
    Dev->>AG: agent.print_response("Get all men's info")
    AG->>LM: Chat completion (system + user + tool schemas)
    LM-->>AG: tool_call: list_tables()
    AG->>PT: PostgresTools.list_tables()
    PT->>PG: SELECT table_name FROM information_schema...
    PG-->>PT: ["persons"]
    PT-->>AG: "persons"
    AG->>LM: Chat completion (history + tool result)
    LM-->>AG: tool_call: run_query("SELECT * FROM persons WHERE gender = 'm'")
    AG->>PT: PostgresTools.run_query(sql)
    PT->>PG: SELECT * FROM persons WHERE gender = 'm'
    PG-->>PT: ~15 rows
    PT-->>AG: Formatted results
    AG->>LM: Chat completion (history + tool result)
    LM-->>AG: Final text response
    AG-->>Dev: Printed Markdown response
```
```mermaid
sequenceDiagram
    participant User as User (REPL)
    participant AG as MSFT Agent
    participant LM as LiteLLM Proxy
    participant PG as PostgreSQL
    User->>AG: "How many people are in the database?"
    AG->>LM: Chat completion (stream)
    LM-->>AG: tool_call: get_tables()
    AG->>PG: SELECT table_name FROM information_schema...
    PG-->>AG: [("persons",)]
    AG->>LM: tool result: "persons"
    LM-->>AG: tool_call: get_columns("persons")
    AG->>PG: SELECT column_name, data_type...
    PG-->>AG: 16 rows
    AG->>LM: tool result: "id (character varying)\nfname (character varying)\n..."
    LM-->>AG: tool_call: run_query("SELECT COUNT(*) FROM persons;")
    AG->>PG: SELECT COUNT(*) FROM persons
    PG-->>AG: [(30,)]
    AG->>LM: tool result: "count\n30"
    LM-->>AG: Stream: "There are 30 people..."
    AG-->>User: "There are 30 people in the database."
    Note over User,PG: Second turn skips schema discovery (session retains context)
```
| Dimension | Agno Agent | MSFT Agent Framework |
|---|---|---|
| File | `src/agno/db-agent.py` | `src/msft-agent-framework/data-agent.py` |
| Interaction | Single-shot | Interactive async REPL |
| Session/memory | Stateless | Session persists across turns |
| Streaming | No | Yes |
| Database access | Framework `PostgresTools` | Hand-written psycopg tools |
| Tool granularity | Opaque (framework-managed) | Three explicit tools |
| Schema discovery | Internal to `PostgresTools` | Explicit: LLM calls `get_tables` → `get_columns` |
| LLM adapter | `agno.models.litellm.LiteLLM` | `agent_framework.openai.OpenAIChatClient` |
| Model | Hardcoded `claude-sonnet-4-5` | Env var `LLM_MODEL_ID` (default: `gpt-4o`) |
| DB driver | Managed by `PostgresTools` | psycopg (fresh connection per tool call) |
| Connection config | Constructor kwargs | `DB_CONNINFO` string (libpq format) |
| Error handling | Framework-managed | Tools return error strings |
| Lines of code | ~50 | ~115 |
| Observability | Low — reasoning opaque | High — each tool call visible |
| Best for | Quick scripted queries | Interactive exploration |
Configured in .env (copied from .env.sample):
```
LLM_API_KEY=<API_KEY>
LLM_BASE_URL=<GATEWAY_URL>
LLM_MODEL_ID=<MODEL_ID>
```
The .env file is listed in .gitignore and must never be committed.
Two formats are used:
SQLAlchemy format (data layer):

```
postgresql+psycopg://postgres:MyDbPassword@localhost:5432/pii_demo
```

libpq keyword format (MSFT agent):

```
host=localhost port=5432 dbname=pii_demo user=postgres password=MyDbPassword
```
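Both formats carry the same parameters, so one can be derived from the other. A minimal sketch (the naive whitespace split here ignores quoted values, which real libpq conninfo strings may contain):

```python
conninfo = "host=localhost port=5432 dbname=pii_demo user=postgres password=MyDbPassword"

# Parse libpq key=value pairs (naive: assumes no quoting or embedded spaces)
params = dict(pair.split("=", 1) for pair in conninfo.split())

# Reassemble as a SQLAlchemy URL
url = (f"postgresql+psycopg://{params['user']}:{params['password']}"
       f"@{params['host']}:{params['port']}/{params['dbname']}")
print(url)
# → postgresql+psycopg://postgres:MyDbPassword@localhost:5432/pii_demo
```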
The project uses uv with Python 3.12. Each area has its own requirements file:
| File | Contents |
|---|---|
| `data/requirements.txt` | sqlalchemy, psycopg[binary], pytest |
| `src/msft-agent-framework/requirements.txt` | agent-framework>=1.0.0b0, psycopg>=3.0, python-dotenv>=1.0 |
| (none — install manually) | agno, python-dotenv (for Agno agent) |
The unit tests run against in-memory SQLite — no Docker or Postgres required.
| File | Tests | What They Verify |
|---|---|---|
| `test_model.py` | 2 | All 16 columns exist; `id` is the primary key |
| `test_load_csv.py` | 3 | Row insertion, idempotency, field mapping |

```shell
python -m pytest data/tests/ -v   # < 1 second, no dependencies
```

The smoke tests require a running PostgreSQL container with loaded data.
| Test | What It Verifies |
|---|---|
| `test_get_tables_returns_persons` | `persons` table exists |
| `test_get_columns_returns_expected` | 16 columns including `fname`, `email` |
| `test_run_query_returns_rows` | `SELECT COUNT(*)` returns 30 |

```shell
python -m pytest src/msft-agent-framework/test_tools.py -v   # Requires live DB
```

| Aspect | Unit tests (data/) | Smoke tests (msft/) |
|---|---|---|
| Requires Docker | No | Yes |
| Speed | Fast (< 1s) | Slow (network I/O) |
| CI-friendly | Yes | Only with DB provisioning |
| Purpose | ORM correctness, loader logic | Environment readiness |
Dates (birthdate, cc_expiredate) and numeric-looking fields (zip, cc_cvc) are stored as VARCHAR. This avoids type conversion errors during CSV loading and prevents data corruption (e.g., a ZIP code like 02134 would lose its leading zero as an integer). Appropriate for a demo; a production system would use typed columns.
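The leading-zero hazard is easy to demonstrate:

```python
zip_code = "02134"            # a real ZIP code with a leading zero

as_int = int(zip_code)        # integer coercion drops the leading zero
print(as_int)                 # → 2134

print(str(as_int) == zip_code)  # → False: the round trip corrupts the value
```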
The database is a dependency of the agents, not a component of them. Benefits:

- `data/` can be set up once and shared across all agent implementations
- `data/` tests run without any agent framework installed
- Adding a new agent requires no changes to `data/`
Rather than embedding schema information in the system prompt, the MSFT agent instructs the LLM to call get_tables and get_columns before writing queries. This makes the agent's reasoning process visible and interpretable. The trade-off is two extra database round-trips on the first query (mitigated by session memory on subsequent turns).
The MSFT agent opens a new psycopg.connect() for every tool call rather than maintaining a connection pool. This is simpler and acceptable for a low-concurrency demo. A production implementation would use psycopg_pool.ConnectionPool.
```shell
# Database
docker compose -f data/docker-compose.yml up -d    # Start Postgres
docker compose -f data/docker-compose.yml down     # Stop Postgres
docker compose -f data/docker-compose.yml down -v  # Stop and delete data
python -m data.load_csv                            # Load sample data

# Tests
python -m pytest data/tests/ -v                             # Unit tests
python -m pytest src/msft-agent-framework/test_tools.py -v  # Smoke tests

# Agents
python src/agno/db-agent.py                    # Agno (single-shot)
python src/msft-agent-framework/data-agent.py  # MSFT REPL

# Dependencies
uv pip install -r data/requirements.txt
uv pip install -r src/msft-agent-framework/requirements.txt
```