Quick-start guide to run and test every feature of the pipeline.
Prerequisites:

```bash
# Python 3.13 required (ChromaDB does not support 3.14)
python3 --version   # should show 3.13.x

# Anthropic API key set in .env
cat .env            # should contain ANTHROPIC_API_KEY=sk-ant-...
```

Set up the environment:

```bash
cd agentic-outreach-pipeline
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

Verify the imports:

```bash
python -c "
import sys; sys.path.insert(0, 'src')
from orchestrator import Orchestrator
from knowledge.product_loader import COMPANY_CONFIG, PRODUCTS_CONFIG
from context import ContextManager
print(f'Company: {COMPANY_CONFIG[\"name\"]}')
print(f'Products: {list(PRODUCTS_CONFIG[\"products\"].keys())}')
print(f'Case studies: {len(PRODUCTS_CONFIG[\"case_studies\"])}')
print('All imports OK')
"
```

Expected: the company name, product keys, case study count, and "All imports OK".
Preview the pipeline plan:

```bash
python src/main.py plan "Mueller Automotive GmbH, Germany"
```

Expected: structured log output showing the pipeline plan:
```text
pipeline_plan target='Mueller Automotive GmbH'
plan_step agent=researcher criticality=required group=0
plan_step_parallel agents='Analyst + Architect' group=1
plan_step agent=analyst criticality=optional group=1
plan_step agent=architect criticality=required group=1
plan_step agent=scorer criticality=optional group=2
plan_step agent=writer criticality=required group=3
```
This verifies the orchestrator and models work without spending any API credits.
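The `group` numbers in the plan encode execution order: groups run sequentially, and steps that share a group run in parallel; `criticality` decides whether a failure aborts the run. A minimal stdlib sketch of that pattern (an illustration, not the project's actual orchestrator; `run_agent` is a hypothetical stand-in for a real agent call):

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Step:
    agent: str
    criticality: str  # "required" or "optional"
    group: int

# Hypothetical plan mirroring the log output above.
PLAN = [
    Step("researcher", "required", 0),
    Step("analyst", "optional", 1),
    Step("architect", "required", 1),
    Step("scorer", "optional", 2),
    Step("writer", "required", 3),
]

def run_agent(step: Step) -> str:
    """Stand-in for a real agent invocation."""
    return f"{step.agent}: ok"

def run_plan(plan: list[Step]) -> dict[str, str]:
    results: dict[str, str] = {}
    for group in sorted({s.group for s in plan}):
        steps = [s for s in plan if s.group == group]
        # Steps sharing a group number run in parallel.
        with ThreadPoolExecutor(max_workers=len(steps)) as pool:
            futures = {pool.submit(run_agent, s): s for s in steps}
            for fut, step in futures.items():
                try:
                    results[step.agent] = fut.result()
                except Exception as exc:
                    if step.criticality == "required":
                        raise  # a required agent failing aborts the pipeline
                    results[step.agent] = f"skipped ({exc})"  # optional: continue
    return results

print(run_plan(PLAN))
```

With this shape, the optional analyst and scorer can fail without killing the run, while a researcher or writer failure stops everything.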
python -c "
import sys; sys.path.insert(0, 'src')
from tools.knowledge_query import KnowledgeQueryTool
tool = KnowledgeQueryTool()
result = tool.run(query='energy monitoring for stamping factories')
print(result[:500])
"Expected: Relevant product knowledge matching the query — product features, case studies, or ideal customer data. Results include relevance scores from ChromaDB vector similarity search.
First run note: ChromaDB downloads the all-MiniLM-L6-v2 embedding model (~80MB) on first use. Subsequent runs are instant.
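Those relevance scores come from vector similarity: the query and each document are embedded, and closeness is measured geometrically, typically with cosine similarity. A toy pure-Python illustration of the scoring step (the 3-dimensional "embeddings" are made up; the real all-MiniLM-L6-v2 model produces 384-dimensional vectors):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Made-up embeddings for two documents.
docs = {
    "energy monitoring product": [0.9, 0.1, 0.2],
    "hiring policy document":    [0.1, 0.8, 0.3],
}
query = [0.85, 0.15, 0.25]  # a query that embeds near the energy doc

ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])  # → energy monitoring product
```

Because similarity is computed on embeddings rather than words, a query can match a document with no shared vocabulary, which is what the next test exercises.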
python -c "
import sys; sys.path.insert(0, 'src')
from tools.knowledge_query import KnowledgeQueryTool
tool = KnowledgeQueryTool()
# Semantic search: no shared words with 'energy monitoring' but should still match
result = tool.run(query='power consumption analysis for metal forming')
print('=== Semantic match test ===')
print(result[:500])
print()
# Verify ChromaDB collections
from knowledge.store import get_knowledge_store
store = get_knowledge_store()
print(f'Store type: {type(store).__name__}')
print(f'ChromaDB initialized: {hasattr(store, \"_client\")}')
"Expected: Semantic search returns relevant product knowledge about energy/power monitoring even though the query uses different words ("power consumption analysis" vs "energy monitoring"). Store type should be VectorKnowledgeStore.
python -c "
import sys; sys.path.insert(0, 'src')
from tools.web_scraper import WebScraperTool
scraper = WebScraperTool()
result = scraper.run(url='https://httpbin.org/html')
print(result[:300])
"Expected: Clean text from the page. Errors return strings like Error: Could not connect..., not crashes.
Run the full proposal pipeline:

```bash
python src/main.py proposal "Koelle GmbH, Germany"
```

What to watch for:

- ▸ Researcher — multi-turn agentic research starts
  - Tool calls: `search_web(...)`, `query_knowledge_base(...)`, `scrape_company_website(...)`
  - Turn progress: `Turn 1/5`, `Turn 2/5`, etc.
  - ✓ Done — researcher complete with token counts
- ▸ Analyst + ▸ Architect — running in parallel
- ▸ Scorer — deal estimation
- ▸ Writer — proposal generation
- Pipeline summary table with total tokens and duration
Output: check `outputs/proposal_Koelle_GmbH_*.md` for the full proposal.
Run a pre-configured example:

```bash
python src/main.py --example
```

Pick 1, 2, or 3 from the menu. Each runs the full proposal pipeline with a pre-configured company.
| # | Prospect | Industry |
|---|---|---|
| 1 | Mueller Automotive GmbH, Germany | Auto parts stamping, 150 presses |
| 2 | Pacific Brass & Copper, USA | Copper fittings, 40+ machines |
| 3 | Vina Precision Parts, Vietnam | Electronics stamping, 80 presses |
Interactive mode:

```bash
python src/main.py --interactive
```

Type a company description (press Enter twice to submit):
```text
Samsung SDI, South Korea
Battery cell manufacturer for EV market
Large-scale production with 200+ machines
ESG compliance critical for European automotive customers
```
Search mode:

```bash
python src/main.py search "metal stamping companies Germany"
```

What happens:
- DuckDuckGo finds companies
- Contact info extracted (emails, phones)
- Deal sizes estimated (structured JSON via tool_choice)
- Rich table displayed — pick companies to pursue
- Full pipeline runs for each selected prospect (researcher + architect + writer)
- Cold emails written and previewed
- Confirm to send via Gmail (or skip)
- Results saved to `outputs/outreach_*.json`
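Getting "structured JSON via `tool_choice`" works by declaring a tool whose input schema is the output shape you want, then requiring the model to call that tool, so the response is schema-conforming JSON instead of prose. A sketch of how such a request could be assembled with the Anthropic Messages API (the `estimate_deal` schema and its fields are illustrative, not the project's actual definitions):

```python
# Tool whose input schema defines the JSON we want back.
# Field names here are illustrative, not the project's actual schema.
deal_tool = {
    "name": "estimate_deal",
    "description": "Estimate deal size for a prospect company.",
    "input_schema": {
        "type": "object",
        "properties": {
            "company": {"type": "string"},
            "estimated_value_usd": {"type": "number"},
            "category": {"type": "string", "enum": ["small", "medium", "large"]},
        },
        "required": ["company", "estimated_value_usd", "category"],
    },
}

request_kwargs = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 1024,
    "tools": [deal_tool],
    # Force a call to this specific tool, i.e. JSON matching input_schema.
    "tool_choice": {"type": "tool", "name": "estimate_deal"},
    "messages": [{"role": "user", "content": "Estimate the deal for Koelle GmbH."}],
}

# With the anthropic SDK installed and ANTHROPIC_API_KEY set, the call would be:
#   import anthropic
#   msg = anthropic.Anthropic().messages.create(**request_kwargs)
#   block = next(b for b in msg.content if b.type == "tool_use")
#   block.input is then a dict matching input_schema
print(sorted(request_kwargs))
```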
Try also:

```bash
python src/main.py search "automotive parts manufacturer Japan"
python src/main.py search "copper fittings factory USA"
python src/main.py search "injection molding company Vietnam"
```

Launch the Streamlit UI:

```bash
streamlit run app.py
```

Open http://localhost:8501. The UI has two tabs:
First tab (proposal pipeline):

- Select an example prospect from the dropdown (or enter custom input)
- Click "Run Sales Agent Pipeline" — watch all agents run with streaming progress
- Expand Research Brief and Solution Mapping to review intermediate outputs
- Download the proposal as Markdown
- Click "Generate Cold Email" — runs the Deal Estimator + Email Writer
- Edit the email, enter a recipient, and click "Send Email" (requires Gmail credentials in `.env`)
Second tab (prospect search):

- Enter a search query (e.g. "metal stamping companies Germany")
- Set max results and click "Search"
- Review the prospect table (company, industry, email, estimated deal, category)
- Select companies and click "Generate Detailed Proposals"
- Full pipeline runs per company: Researcher → Architect → Writer
- Review proposals, emails, and research for each company
- Download or send emails directly from the UI
Run the tests:

```bash
pytest tests/ -v
```

Expected: all tests pass. Tests use mocked API calls and don't require an Anthropic API key.
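Mocking the API boundary is what lets the suite run offline: the function that would hit the network is patched to return a canned value. A generic `unittest.mock` sketch of the technique (the `call_model`/`summarize` names are hypothetical, not the project's test structure):

```python
from unittest.mock import patch

def call_model(prompt: str) -> str:
    """Pretend production code path that would hit the Anthropic API."""
    raise RuntimeError("would make a real API call")

def summarize(prompt: str) -> str:
    # Production logic built on top of the API call.
    return call_model(prompt).upper()

# Patch the API-facing function so the test runs offline and free.
with patch(f"{__name__}.call_model", return_value="mocked research brief") as mock:
    result = summarize("Koelle GmbH")

print(result)           # → MOCKED RESEARCH BRIEF
print(mock.call_count)  # → 1
```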
After running tests, check the generated files:

```bash
ls -la outputs/
```

| File Pattern | From |
|---|---|
| `proposal_*.md` | Proposal mode — full Markdown proposal |
| `pipeline_*.json` | Proposal mode — debug output with agent results |
| `trace_*.json` | Proposal mode — per-agent timing and cost |
| `outreach_*.json` | Search mode — prospects, deals, emails |
| Issue | Fix |
|---|---|
| `ModuleNotFoundError` | `source venv/bin/activate && pip install -r requirements.txt` |
| "credit balance is too low" | Add credits at https://console.anthropic.com/settings/billing |
| No companies found (search) | Try more specific terms: "CNC machining factory Vietnam" |
| Rate limited during pipeline | Built-in exponential backoff handles this (3 retries, 10-40s waits) |
| Researcher falls back to legacy | API error during multi-turn research — check terminal logs |
| Gmail authentication failed | Use a Google App Password, not your regular password |
| ChromaDB import error | ChromaDB requires Python <= 3.13. Recreate the venv: `python3.13 -m venv venv` |
| Slow first knowledge query | ChromaDB downloads all-MiniLM-L6-v2 (~80MB) on first run; cached at `~/.cache/chroma/` |
| Stale knowledge results | Delete the `chroma_data/` directory and restart — it auto-recreates and re-seeds from YAML config |
| `chromadb.errors` on startup | Delete `chroma_data/` and let it regenerate: `rm -rf chroma_data/` |
Verify everything works without spending API credits:
python -c "
import sys; sys.path.insert(0, 'src')
# 1. Models
from models import PipelineResult, DealEstimate, ContextPacket
print('Models OK')
# 2. Knowledge
from knowledge.product_loader import COMPANY_CONFIG, get_full_product_context
ctx = get_full_product_context()
print(f'Knowledge OK ({len(ctx)} chars, {len(ctx.split(chr(10)))} lines)')
# 3. Tools
from tools.web_search import WebSearchTool
from tools.web_scraper import WebScraperTool
from tools.knowledge_query import KnowledgeQueryTool
print(f'Tools OK (3 tools loaded)')
# 4. ChromaDB vector store
from knowledge.store import get_knowledge_store
store = get_knowledge_store()
print(f'ChromaDB OK (store type: {type(store).__name__})')
# 5. Knowledge query (semantic search)
kb = KnowledgeQueryTool()
result = kb.run(query='defect detection')
print(f'KB query OK ({len(result)} chars)')
# 6. Orchestrator
from orchestrator import Orchestrator
o = Orchestrator(interactive=True)
plan = o.plan('Test Company, Germany')
print(f'Orchestrator OK ({len(plan.steps)} steps)')
# 7. Context
from context import ContextManager
cm = ContextManager()
tokens = cm.count_tokens('hello world')
print(f'Context OK ({tokens} tokens)')
print()
print('All systems operational')
"