Commit 60ae4e6

Model schemas and structured output enhancements (#221)
* Adding active record/model json schema for simple structured output
* Adding tests for schema generator to provide active record/model to JSON schema support
* Updating docs to ensure tested configs and code examples
* Updating deterministically generated docs
* Updating ci workflow for new active record models migration
* Updating ci for db migration
1 parent 8f6a8ae commit 60ae4e6

39 files changed

Lines changed: 2731 additions & 2127 deletions

.github/workflows/ci.yml

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -52,6 +52,16 @@ jobs:
         with:
           ruby-version: ${{ matrix.ruby }}
           bundler-cache: true
+      - name: Setup database
+        env:
+          RAILS_ENV: test
+          RAILS_MASTER_KEY: ${{ secrets.RAILS_MASTER_KEY }}
+          BUNDLE_GEMFILE: ${{ github.workspace }}/${{ matrix.gemfile }}
+        run: |
+          cd test/dummy
+          bundle exec rails db:create
+          bundle exec rails db:migrate
+          cd ../..
       - name: Run tests
         env:
           RAILS_ENV: test

.tool-versions

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
+nodejs 24.7.0

CLAUDE.md

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -195,9 +195,16 @@ This repository follows a strict documentation process to ensure all code exampl
 ### Key Principles
 
 1. **No hardcoded code blocks** - All code must come from tested files
+   - NEVER use ``` code blocks in docs/docs/ directory
+   - ALL code examples must use `<<<` imports from tested files
+   - Code blocks (```) should ONLY appear in deterministically generated docs/parts/ files from test helper
 2. **Use `<<<` imports only** - Import code from actual tested implementation and test files
 3. **Test everything** - If it's in docs, it must have a test
 4. **Include outputs** - Use `doc_example_output` for response examples
+5. **Configuration examples** - Must come from actual config files with proper regions
+   - ALWAYS include the `service:` key in provider configurations
+   - Use regions in config files (e.g., test/dummy/config/active_agent.yml)
+   - Import config examples using VitePress snippets with regions
 
 ### Import Patterns
 
@@ -1300,3 +1307,32 @@ When updating documentation:
 - VCR cassettes need to be removed and tests run again to record new cassettes when the request params change
 
 - Do not hardcode examples and make sure to use vscode regions and vite-press code snippets imports
+
+- use `bin/rubocop -a` to autofix linting issues
+- Follow the testing procedures to have deterministic tested code examples; never hardcode code examples in docs; always use the vite-press snippets along with the test helper for example outputs
+
+## Critical Documentation Rules (MUST FOLLOW)
+
+### NEVER Hardcode Examples
+- ❌ NEVER write ```ruby, ```yaml, ```bash or any ``` code blocks in docs/docs/
+- ✅ ALWAYS use <<< imports from tested files
+- ✅ Use regions in test files for specific snippets
+- ✅ Generated examples go in docs/parts/examples/ via doc_example_output
+
+### Configuration Documentation
+- ❌ NEVER hardcode config examples like:
+  ```yaml
+  openai:
+    access_token: ...
+  ```
+- ✅ ALWAYS use actual config files with regions:
+  - Add regions to test/dummy/config/active_agent.yml
+  - Import with: `<<< @/../test/dummy/config/active_agent.yml#region_name{yaml}`
+- ⚠️ REMEMBER: All provider configs MUST have `service:` key or they won't load
+
+### Testing Before Documenting
+1. Write the test first
+2. Add regions for important snippets
+3. Call doc_example_output for response examples
+4. Import in docs using VitePress snippets
+5. Verify with `npm run docs:build` - no hardcoded blocks should exist
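
Note on the `<<<` region imports these rules rely on: the idea is that a named region in a tested file is the only source for a docs snippet. The sketch below illustrates that idea in plain Ruby. It is not VitePress code; `extract_region` is a hypothetical helper, and the `# region name` / `# endregion name` comment markers are an assumption about how regions are written in the Ruby test files.

```ruby
# Hypothetical helper: returns the lines between "# region <name>" and
# "# endregion <name>", or nil when the region is missing.
# VitePress performs the real extraction at docs build time.
def extract_region(source, name)
  lines = source.lines
  start = lines.index { |line| line.strip == "# region #{name}" }
  finish = lines.index { |line| line.strip == "# endregion #{name}" }
  return nil unless start && finish && finish > start

  lines[(start + 1)...finish].join
end

# Assumed shape of a test file that carries a documentation region.
test_file = <<~RUBY
  test "generates a schema" do
    # region basic_schema_generation
    schema = User.to_json_schema
    # endregion basic_schema_generation
    assert_equal "object", schema[:type]
  end
RUBY

snippet = extract_region(test_file, "basic_schema_generation")
puts snippet
```

Only the line inside the region would reach the docs page, so the assertion that proves the snippet works stays in the test and out of the rendered example.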

docs/.vitepress/config.mts

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -102,15 +102,15 @@ export default defineConfig({
       },
       { text: 'Agents',
         items: [
-          { text: 'Browser User', link: '/docs/agents/browser-use-agent' },
+          { text: 'Browser Use', link: '/docs/agents/browser-use-agent' },
           { text: 'Data Extraction', link: '/docs/agents/data-extraction-agent' },
           { text: 'Translation', link: '/docs/agents/translation-agent' },
         ]
       },
       { text: 'Active Agent',
         items: [
           // { text: 'Generative UI', link: '/docs/active-agent/generative-ui' },
-          { text: 'Structured Output', link: '/docs/agents/data-extraction-agent#structured-output' },
+          { text: 'Structured Output', link: '/docs/active-agent/structured-output' },
           { text: 'Callbacks', link: '/docs/active-agent/callbacks' },
           { text: 'Generation', link: '/docs/active-agent/generation' },
           { text: 'Queued Generation', link: '/docs/active-agent/queued-generation' },
Lines changed: 239 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,239 @@
+# Structured Output
+
+Structured output allows agents to return responses in a predefined JSON format, ensuring consistent and reliable data extraction. ActiveAgent provides comprehensive support for structured output through JSON schemas and automatic model schema generation.
+
+## Overview
+
+Structured output ensures AI responses conform to a specific JSON schema, making it ideal for:
+- Data extraction from unstructured text, images, and documents
+- API integrations requiring consistent response formats
+- Form processing and validation
+- Database record creation from natural language
+
+## Key Features
+
+### Automatic JSON Parsing
+When using structured output, responses are automatically:
+- Tagged with `content_type: "application/json"`
+- Parsed from JSON strings to Ruby hashes
+- Validated against the provided schema
+
+### Schema Generator
+ActiveAgent includes a `SchemaGenerator` module that creates JSON schemas from:
+- ActiveRecord models with database columns and validations
+- ActiveModel classes with attributes and validations
+- Custom Ruby classes with the module included
+
+## Quick Start
+
+### Using Model Schema Generation
+
+ActiveAgent can automatically generate schemas from your Rails models:
+
+<<< @/../test/schema_generator_test.rb#agent_using_schema {ruby:line-numbers}
+
+### Basic Structured Output Example
+
+Define a schema and use it with the `output_schema` parameter:
+
+<<< @/../test/integration/structured_output_json_parsing_test.rb#34-70{ruby:line-numbers}
+
+The response will automatically have:
+- `content_type` set to `"application/json"`
+- `content` parsed as a Ruby Hash
+- `raw_content` available as the original JSON string
+
+## Schema Generation
+
+### From ActiveModel
+
+Create schemas from ActiveModel classes with validations:
+
+<<< @/../test/schema_generator_test.rb#basic_user_model {ruby:line-numbers}
+
+Generate the schema:
+
+<<< @/../test/schema_generator_test.rb#basic_schema_generation {ruby:line-numbers}
+
+### From ActiveRecord
+
+Generate schemas from database-backed models:
+
+<<< @/../test/schema_generator_test.rb#activerecord_schema_generation {ruby:line-numbers}
+
+### Strict Schemas
+
+For providers requiring strict schemas (like OpenAI):
+
+<<< @/../test/schema_generator_test.rb#strict_schema_generation {ruby:line-numbers}
+
+In strict mode:
+- All properties are marked as required
+- `additionalProperties` is set to false
+- The schema is wrapped with name and strict flags
+
+### Excluding Fields
+
+Exclude sensitive or unnecessary fields from schemas:
+
+<<< @/../test/schema_generator_test.rb#schema_with_exclusions {ruby:line-numbers}
+
+## JSON Response Handling
+
+### Automatic Parsing
+
+With structured output, responses are automatically parsed:
+
+```ruby
+# Without structured output
+response = agent.prompt(message: "Hello").generate_now
+response.message.content # => "Hello! How can I help?"
+response.message.content_type # => "text/plain"
+
+# With structured output
+response = agent.prompt(
+  message: "Extract user data",
+  output_schema: schema
+).generate_now
+response.message.content # => { "name" => "John", "age" => 30 }
+response.message.content_type # => "application/json"
+response.message.raw_content # => '{"name":"John","age":30}'
+```
+
+### Error Handling
+
+Handle JSON parsing errors gracefully:
+
+<<< @/../test/integration/structured_output_json_parsing_test.rb#155-169{ruby:line-numbers}
+
+## Provider Support
+
+Different AI providers have varying levels of structured output support:
+
+- **[OpenAI](/docs/generation-providers/openai-provider#structured-output)** - Native JSON mode with strict schema validation
+- **[OpenRouter](/docs/generation-providers/open-router-provider#structured-output-support)** - Support through compatible models, ideal for multimodal tasks
+- **[Anthropic](/docs/generation-providers/anthropic-provider#structured-output)** - Instruction-based JSON generation
+- **[Ollama](/docs/generation-providers/ollama-provider#structured-output)** - Local model support with JSON mode
+
+## Real-World Examples
+
+### Data Extraction Agent
+
+The [Data Extraction Agent](/docs/agents/data-extraction-agent#structured-output) demonstrates comprehensive structured output usage:
+
+<<< @/../test/agents/data_extraction_agent_test.rb#data_extraction_agent_parse_chart_with_structured_output {ruby:line-numbers}
+
+### Integration with Rails Models
+
+Use your existing Rails models for schema generation:
+
+<<< @/../test/integration/structured_output_json_parsing_test.rb#110-137{ruby:line-numbers}
+
+## Best Practices
+
+### 1. Use Model Schemas
+Leverage ActiveRecord/ActiveModel for single source of truth:
+
+```ruby
+class User < ApplicationRecord
+  include ActiveAgent::SchemaGenerator
+
+  validates :email, presence: true, format: { with: URI::MailTo::EMAIL_REGEXP }
+  validates :age, numericality: { greater_than: 18 }
+end
+
+# In your agent
+schema = User.to_json_schema(strict: true, name: "user_data")
+prompt(output_schema: schema)
+```
+
+### 2. Schema Design
+- Keep schemas focused and minimal
+- Use strict mode for critical data
+- Include validation constraints
+- Provide clear descriptions for complex fields
+
+### 3. Testing
+Always test structured output with real providers:
+
+```ruby
+test "extracts data with correct schema" do
+  VCR.use_cassette("structured_extraction") do
+    response = agent.extract_data.generate_now
+
+    assert_equal "application/json", response.message.content_type
+    assert response.message.content.is_a?(Hash)
+    assert_valid_schema response.message.content, expected_schema
+  end
+end
+```
+
+## Migration Guide
+
+### From Manual JSON Parsing
+
+Before:
+```ruby
+response = agent.prompt(message: "Extract data as JSON").generate_now
+data = JSON.parse(response.message.content) rescue {}
+```
+
+After:
+```ruby
+response = agent.prompt(
+  message: "Extract data",
+  output_schema: MyModel.to_json_schema(strict: true)
+).generate_now
+data = response.message.content # Already parsed!
+```
+
+### From Custom Schemas
+
+Before:
+```ruby
+schema = {
+  type: "object",
+  properties: {
+    name: { type: "string" },
+    age: { type: "integer" }
+  }
+}
+```
+
+After:
+```ruby
+class ExtractedUser
+  include ActiveModel::Model
+  include ActiveAgent::SchemaGenerator
+
+  attribute :name, :string
+  attribute :age, :integer
+end
+
+schema = ExtractedUser.to_json_schema(strict: true)
+```
+
+## Troubleshooting
+
+### Common Issues
+
+**Invalid JSON Response**
+- Ensure provider supports structured output
+- Check model compatibility
+- Verify schema is valid JSON Schema
+
+**Missing Fields**
+- Use strict mode to require all fields
+- Add validation constraints to model
+- Check provider documentation for limitations
+
+**Type Mismatches**
+- Ensure schema types match provider capabilities
+- Use appropriate type coercion in models
+- Test with actual provider responses
+
+## See Also
+
+- [Data Extraction Agent](/docs/agents/data-extraction-agent) - Complete extraction examples
+- [OpenAI Structured Output](/docs/generation-providers/openai-provider#structured-output) - OpenAI implementation details
+- [OpenRouter Structured Output](/docs/generation-providers/open-router-provider#structured-output-support) - Multimodal extraction
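
Note on the strict-mode rules in this new page (all properties required, `additionalProperties` false, schema wrapped with a name and strict flag): a minimal plain-Ruby sketch of those rules follows. It is not ActiveAgent's `SchemaGenerator`. The `SketchSchema` module, its `build` signature, the type map, and the exact wrapper shape are all assumptions made for illustration.

```ruby
# Hypothetical stand-in for a schema generator; names and output shape are assumed.
module SketchSchema
  TYPE_MAP = {
    string: "string",
    integer: "integer",
    float: "number",
    boolean: "boolean"
  }.freeze

  # attributes: hash of attribute name => type symbol, e.g. { name: :string }
  def self.build(attributes, required: [], exclude: [], strict: false, name: "schema")
    # Drop excluded fields before anything else sees them.
    kept = attributes.reject { |attr, _type| exclude.include?(attr) }
    properties = kept.to_h { |attr, type| [attr.to_s, { "type" => TYPE_MAP.fetch(type) }] }

    schema = {
      "type" => "object",
      "properties" => properties,
      # Strict mode requires every kept property; otherwise only the listed ones.
      "required" => (strict ? kept.keys : required & kept.keys).map(&:to_s)
    }
    return schema unless strict

    schema["additionalProperties"] = false
    { "name" => name, "strict" => true, "schema" => schema }
  end
end

attrs = { name: :string, age: :integer, password: :string }
loose = SketchSchema.build(attrs, required: [:name], exclude: [:password])
strict = SketchSchema.build(attrs, exclude: [:password], strict: true, name: "user_data")
```

Here `loose` requires only `name`, while `strict` requires both remaining properties and forbids extras. In both cases `password` never appears in the schema.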

docs/docs/agents/data-extraction-agent.md

Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -57,6 +57,42 @@ When using structured output:
 - The response content will be valid JSON matching your schema
 - Parse the response with `JSON.parse(response.message.content)`
 
+#### Generating Schemas from Models
+
+ActiveAgent provides a `SchemaGenerator` module that can automatically create JSON schemas from your ActiveRecord and ActiveModel classes. This makes it easy to ensure extracted data matches your application's data models.
+
+##### Basic Usage
+
+::: code-group
+<<< @/../test/schema_generator_test.rb#basic_user_model {ruby:line-numbers}
+<<< @/../test/schema_generator_test.rb#basic_schema_generation {ruby:line-numbers}
+:::
+
+The `to_json_schema` method generates a JSON schema from your model's attributes and validations.
+
+##### Schema with Validations
+
+Model validations are automatically included in the generated schema:
+
+<<< @/../test/schema_generator_test.rb#schema_with_validations {ruby:line-numbers}
+
+##### Strict Schema for Structured Output
+
+For use with AI providers that support structured output, generate a strict schema:
+
+::: code-group
+<<< @/../test/schema_generator_test.rb#blog_post_model {ruby:line-numbers}
+<<< @/../test/schema_generator_test.rb#strict_schema_generation {ruby:line-numbers}
+:::
+
+##### Using Generated Schemas in Agents
+
+Agents can use the schema generator to create structured output schemas dynamically:
+
+<<< @/../test/schema_generator_test.rb#agent_using_schema {ruby:line-numbers}
+
+This allows you to maintain a single source of truth for your data models and automatically generate schemas for AI extraction.
+
 ::: info Provider Support
 Structured output requires a generation provider that supports JSON schemas. Currently supported providers include:
 - **OpenAI** - GPT-4o, GPT-4o-mini, GPT-3.5-turbo variants
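
Note on response parsing: the context lines in this hunk still tell readers to call `JSON.parse(response.message.content)` themselves, while the new Structured Output page in this commit describes `content` as already parsed, with the original string kept in `raw_content`. The sketch below illustrates that second behaviour in plain Ruby. `SketchMessage` and `build_message` are hypothetical names, not ActiveAgent classes, and falling back to the raw string on malformed JSON is an assumption rather than documented behaviour.

```ruby
require "json"

# Hypothetical stand-in for a response message; not ActiveAgent's class.
SketchMessage = Struct.new(:content, :raw_content, :content_type, keyword_init: true)

# Builds a message, parsing the raw text as JSON only when structured output was requested.
def build_message(raw, structured:)
  unless structured
    return SketchMessage.new(content: raw, raw_content: raw, content_type: "text/plain")
  end

  parsed = JSON.parse(raw)
  SketchMessage.new(content: parsed, raw_content: raw, content_type: "application/json")
rescue JSON::ParserError
  # Assumption: on malformed JSON, keep the raw string instead of raising.
  SketchMessage.new(content: raw, raw_content: raw, content_type: "application/json")
end

ok = build_message('{"name":"John","age":30}', structured: true)
bad = build_message('{"name":', structured: true)
plain = build_message("Hello! How can I help?", structured: false)
```

With this shape, callers can check `content.is_a?(Hash)` to tell a parsed response from a fallback, and `raw_content` always holds the untouched provider output.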
