A document analysis system that uses vision-language models (VLMs) to extract technical specifications from PDFs with full provenance tracking.
The Tech Diligence Report Generator processes PDF documents by:
- Extracting text from each page
- Converting each page to a high-quality image for VLM analysis
- Using GPT-4o vision to analyze both text and visual content (diagrams, tables, schematics)
- Tracking provenance (source document, page number, relevant text) for every claim
PdfImageExtractor (app/services/pdf_image_extractor.rb)
- Converts PDF pages to PNG images using pdftoppm (poppler-utils)
- 150 DPI for good quality while keeping file sizes manageable
- Supports single page or batch extraction
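
The conversion step above can be sketched roughly as follows. This is a minimal illustration, not the contents of pdf_image_extractor.rb: the method names `pdftoppm_command` and `convert_pages` and the output-prefix argument are hypothetical, while `-png`, `-r`, `-f`, and `-l` are standard pdftoppm options.

```ruby
require "open3"

# Hypothetical helper: builds the pdftoppm argument list.
# -png selects PNG output, -r sets the resolution in DPI,
# -f / -l select the first and last page to convert.
def pdftoppm_command(pdf_path, output_prefix, first_page: nil, last_page: nil, dpi: 150)
  cmd = ["pdftoppm", "-png", "-r", dpi.to_s]
  cmd += ["-f", first_page.to_s] if first_page
  cmd += ["-l", last_page.to_s] if last_page
  cmd + [pdf_path, output_prefix]
end

# Hypothetical wrapper: runs pdftoppm and returns the generated PNG paths.
# Raises if pdftoppm exits with a non-zero status.
def convert_pages(pdf_path, output_prefix, **opts)
  _stdout, stderr, status = Open3.capture3(*pdftoppm_command(pdf_path, output_prefix, **opts))
  raise "pdftoppm failed: #{stderr}" unless status.success?

  Dir.glob("#{output_prefix}-*.png").sort
end
```

Passing the same value for `first_page` and `last_page` covers the single-page case; omitting both converts the whole document in one batch.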
DocumentProcessingJob (app/jobs/document_processing_job.rb)
- Extracts text using the pdf-reader gem
- Generates page images and stores as Active Storage blobs
- Stores references in page_images JSON column: { "1" => "signed_blob_id", ... }
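
One detail of the page_images mapping worth illustrating: JSON object keys are always strings, so a lookup by integer page number has to convert the key first. The helper name `signed_blob_id_for` below is hypothetical and the blob IDs are placeholder values.

```ruby
require "json"

# Hypothetical helper: returns the signed blob ID stored for a page, or nil
# if no image was recorded. Page numbers are converted to strings because
# JSON object keys are strings.
def signed_blob_id_for(page_images, page_number)
  page_images[page_number.to_s]
end

# Round-trip through JSON to mimic what a JSON column hands back:
# the integer keys 1 and 2 come back as the strings "1" and "2".
page_images = JSON.parse(JSON.generate({ 1 => "signed_blob_id_a", 2 => "signed_blob_id_b" }))

first_page_id = signed_blob_id_for(page_images, 1)
```

Inside the Rails app, a signed ID obtained this way could presumably be resolved with `ActiveStorage::Blob.find_signed`, assuming the stored values are standard Active Storage signed IDs.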
Agent (app/agents/tech_diligence_agent.rb)
- Inherits from ApplicationAgent with has_context for provenance tracking
- Supports vision API via api_version: :chat
- Four main actions:
  - answer_questions - Answer specific technical questions
  - analyze_pages - Deep analysis of specific pages
  - extract_specs - Extract structured specifications
  - verify_claims - Verify claims against document evidence
Views (app/views/tech_diligence_agent/)
- answer_questions.text.erb - Q&A prompt with citation requirements
- analyze_pages.text.erb - Page analysis prompt
- extract_specs.text.erb - Structured spec extraction
- verify_claims.text.erb - Claim verification prompt
AgentFragment (app/models/agent_fragment.rb)
- New fragment types: citation, spec_extraction
- Metadata JSON includes:
  - page_number - Source page
  - source_document - Document filename
  - confidence - Extraction confidence score
- Scopes: citations, with_page_reference(page_num)
POST /assistants/tech_diligence/questions
document_id: integer (required)
questions: array of strings
use_vision: boolean (default: true)
POST /assistants/tech_diligence/pages
document_id: integer (required)
page_numbers: array of integers
focus_areas: array of strings
POST /assistants/tech_diligence/specs
document_id: integer (required)
spec_categories: array (default: [cpu, memory, storage, connectivity, power])
POST /assistants/tech_diligence/verify
document_id: integer (required)
claims: array of strings
All endpoints return { stream_id: string } for ActionCable streaming.
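
As a rough sketch of calling the questions endpoint from a Ruby client: the base URL, the JSON request encoding, and the absence of authentication are all assumptions here, and the helper names are hypothetical. Only the path, the parameter names, and the `stream_id` response key come from the listing above.

```ruby
require "json"
require "net/http"
require "uri"

# Assumption: the app is served locally; adjust to the real host.
BASE_URL = "http://localhost:3000"

# Hypothetical helper: builds (but does not send) the POST request.
# Assumes the endpoint accepts a JSON body.
def build_questions_request(document_id, questions, use_vision: true)
  uri = URI.join(BASE_URL, "/assistants/tech_diligence/questions")
  request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
  request.body = JSON.generate(
    document_id: document_id,
    questions: questions,
    use_vision: use_vision
  )
  [uri, request]
end

# Hypothetical helper: sends the request and returns the stream_id
# that identifies the ActionCable stream to subscribe to.
def ask_questions(document_id, questions, use_vision: true)
  uri, request = build_questions_request(document_id, questions, use_vision: use_vision)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  JSON.parse(response.body).fetch("stream_id")
end
```

The other three endpoints would follow the same shape with their own paths and parameters; the answer content itself arrives over ActionCable rather than in the HTTP response.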
# Process a PDF (extract text + page images)
rake tech_diligence:process_pdf[/path/to/motherboard_manual.pdf]
# Ask questions about a document
rake tech_diligence:ask[123,"What is the CPU socket type?"]
# Extract all specifications
rake tech_diligence:extract_specs[123]
# Run full test suite with example questions
rake tech_diligence:test_motherboard[123]

# Load and process a document
document = Document.find(123)
# Ask questions with vision
agent = TechDiligenceAgent.with(
  document: document,
  questions: [
    "What is the CPU socket type?",
    "What is the maximum RAM speed?",
    "Does it have optical audio?"
  ],
  use_vision: true
)
response = agent.answer_questions.generate
puts response.message.content
# Check provenance
agent.context.fragments.citations.each do |citation|
  puts "Page #{citation.source_page_number}: #{citation.generated_content}"
end

The agent is instructed to cite sources in a parseable format:
The CPU socket type is LGA 1700 [Source: motherboard_manual.pdf, Page 12]
These citations are automatically parsed and stored as AgentFragment records with full metadata.
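
Parsing that citation format could look roughly like the following. This is a sketch under stated assumptions, not the project's actual parser: `CITATION_PATTERN` and `parse_citations` are hypothetical names, and the pattern assumes filenames contain no commas or closing brackets.

```ruby
# Matches markers of the form [Source: <filename>, Page <number>].
# Assumption: the filename contains neither "," nor "]".
CITATION_PATTERN = /\[Source:\s*([^,\]]+),\s*Page\s+(\d+)\]/

# Hypothetical helper: returns one hash per citation found in the text,
# using the same keys as the fragment metadata (source_document, page_number).
def parse_citations(text)
  text.scan(CITATION_PATTERN).map do |document, page|
    { source_document: document.strip, page_number: page.to_i }
  end
end

answer = "The CPU socket type is LGA 1700 [Source: motherboard_manual.pdf, Page 12]"
citations = parse_citations(answer)
```

Each resulting hash carries the two provenance fields the metadata JSON needs; a confidence score would have to come from elsewhere, since the citation marker does not include one.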
- pdftoppm (from poppler-utils): brew install poppler
- pdf-reader gem (already in Gemfile)
- image_processing gem (already in Gemfile)
- OpenAI API key with GPT-4o vision access
- Line-level provenance - Track specific line numbers within pages
- Confidence scoring - ML-based confidence for extracted claims
- Multi-document analysis - Compare specs across multiple documents
- Export to structured formats - JSON, CSV, or custom report templates