- Project Overview
- Architecture
- Development History
- Refactoring & Modernization
- Tool System
- Auto-Discovery Registry
- Security Framework
- OCR System Architecture
- API Documentation
- Installation & Setup
- Testing
- Recent Improvements
- Troubleshooting
- Future Enhancements
Frappe Assistant Core is a comprehensive, MIT-licensed, open-source Model Context Protocol (MCP) server that enables AI assistants such as Claude to interact seamlessly with Frappe Framework and ERPNext systems. The server implements a JSON-RPC 2.0-based assistant protocol for secure document operations, report execution, data analysis, and visualization with inline display capabilities.
- 🏗️ Plugin-Based Architecture - Complete architectural redesign from monolithic to modular plugin system
- 18 Comprehensive Tools across 4 plugin categories (see Tool Reference)
- 🔌 Plugin Auto-Discovery - Zero configuration plugin and tool loading
- 🎯 Plugin Manager - Centralized plugin lifecycle management with validation
- 📦 Tool Registry - Dynamic tool discovery and registration system
- ⚙️ Runtime Plugin Management - Enable/disable plugins through web interface
- 🐍 Python Code Execution - Safe sandboxed analysis environment with auto-import handling
- 📊 Data Analysis & Visualization - Statistical analysis with pandas/numpy and matplotlib integration
- 📄 File Processing & OCR - Extract content from PDFs, images, CSV, Excel, and documents for LLM analysis
- 📈 Enhanced Report Integration - Execute all Frappe report types with improved debugging
- 🔍 Advanced Search & Metadata - Comprehensive data exploration across all DocTypes
- 📋 Robust Document Operations - CRUD operations with enhanced error handling
- 🔒 Multi-Layer Security Framework - Comprehensive role-based access control with row-level security
- 🛡️ Advanced Permission System - Granular permissions with document-level and field-level security
- 🔐 Sensitive Data Protection - Automatic filtering of passwords, API keys, and sensitive information
- 📝 Comprehensive Audit Trail - Complete operation logging with security monitoring
- 🏛️ Modular Architecture - Clean, maintainable, extensible codebase
- 📊 Centralized Logging - Professional logging system replacing print statements
- 📦 Modern Python Packaging - pyproject.toml with proper dependency management
- ⚖️ MIT Licensed - Free for all commercial and personal use
- Backend: Python, Frappe Framework
- Protocol: JSON-RPC 2.0, MCP (Model Context Protocol)
- Data Analysis: pandas, numpy, matplotlib, seaborn
- Database: MariaDB (via Frappe ORM)
- Communication: WebSocket, HTTP REST API
- Security: Multi-layer security framework with role-based permissions, field-level filtering, and audit trails
- Architecture: Modular handlers, centralized constants, proper logging
frappe_assistant_core/
├── core/ # Core system components
│ ├── tool_registry.py # Auto-discovery tool registry
│ └── base_tool.py # Base tool implementation class
├── utils/ # Utility modules
│ ├── plugin_manager.py # Plugin discovery and loading
│ └── logger.py # Professional logging system
└── assistant_core/ # Frappe DocType implementations
└── doctype/
├── assistant_core_settings/ # Main settings
├── assistant_plugin_repository/ # Plugin management
└── assistant_tool_registry/ # Tool registration
frappe_assistant_core/plugins/
├── core/ # Core tools plugin (always enabled)
│ ├── plugin.py # Plugin definition and metadata
│ └── tools/ # Tool implementations
│ ├── create_document.py # Document creation
│ ├── get_document.py # Document retrieval
│ ├── update_document.py # Document updates
│ ├── delete_document.py # Document deletion
│ ├── list_documents.py # Document listing
│ ├── search_documents.py # Global search
│ ├── search_doctype.py # DocType-specific search
│ ├── search_link.py # Link field search
│ ├── metadata_*.py # Metadata tools
│ ├── report_*.py # Report tools
│ └── workflow_*.py # Workflow tools
├── data_science/ # Data science plugin (optional)
│ ├── plugin.py # Plugin definition
│ └── tools/
│ ├── run_python_code.py # Python code execution
│ ├── analyze_business_data.py # Data analysis
│ └── query_and_analyze.py # SQL analysis
├── visualization/ # Visualization plugin (optional)
│ ├── plugin.py # Plugin definition
│ └── tools/
│ ├── create_dashboard.py # Dashboard creation
│ ├── create_dashboard_chart.py # Chart creation
│ └── list_user_dashboards.py # Dashboard management
├── websocket/ # WebSocket plugin (optional)
└── batch_processing/ # Batch processing plugin (optional)
- 🔌 Modular Design: Tools organized in logical, discoverable plugins
- 🚀 Auto-Discovery: Automatic plugin and tool discovery on startup
- ⚙️ Runtime Management: Enable/disable plugins through web interface
- 🎯 Focused Functionality: Each plugin handles specific domain
- 🔧 Extensibility: Easy to add new plugins without core changes
- 📦 Dependency Management: Plugin-specific dependencies and validation
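To make the plugin contract concrete, here is a sketch of what a plugin definition (plugin.py) might look like. BasePlugin below is a minimal stand-in, and the plugin name, tools, and dependencies are illustrative; the real base class shipped by frappe_assistant_core may expose a different interface.

```python
class BasePlugin:
    """Minimal stand-in for the framework's plugin base class."""

    def get_info(self) -> dict:
        raise NotImplementedError


class MyAnalyticsPlugin(BasePlugin):
    """A hypothetical optional plugin contributing one tool."""

    def get_info(self) -> dict:
        return {
            "name": "my_analytics",
            "display_name": "My Analytics",
            "description": "Custom analytics tools",
            "version": "1.0.0",
            # Tool modules expected under this plugin's tools/ directory
            "tools": ["my_report_tool"],
            # Python packages the plugin validates before enabling
            "dependencies": ["pandas"],
        }


info = MyAnalyticsPlugin().get_info()
```

Because plugins declare their own dependencies, the plugin manager can refuse to enable a plugin whose requirements are missing instead of failing at tool-execution time.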
# Core registry handles plugin-based tool discovery
from frappe_assistant_core.core.tool_registry import ToolRegistry
registry = ToolRegistry()
# Automatically discovers tools from all enabled plugins
available_tools = registry.get_available_tools()

from frappe_assistant_core.utils.plugin_manager import get_plugin_manager
plugin_manager = get_plugin_manager()
# Discovers all available plugins
discovered_plugins = plugin_manager.get_discovered_plugins()
# Loads enabled plugins based on settings
plugin_manager.load_enabled_plugins(['core', 'data_science'])

from frappe_assistant_core.core.base_tool import BaseTool
class MyTool(BaseTool):
def __init__(self):
super().__init__()
self.name = "my_tool"
self.description = "Tool description"
self.inputSchema = {...} # JSON schema
def execute(self, arguments):
# Tool implementation
        return {"success": True, "result": "..."}

[build-system]
requires = ["setuptools>=64", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "frappe_assistant_core"
version = "2.3.1"
requires-python = ">=3.8"
dependencies = [
"pandas>=1.3.0",
"numpy>=1.20.0",
"matplotlib>=3.4.0",
"seaborn>=0.11.0",
"requests>=2.25.0",
"paddleocr>=2.9.0",
"paddlepaddle>=2.6.0,<3.1.0",
"pymupdf>=1.24.0",
# ... see pyproject.toml for full list
]

Benefits:
- ✅ Eliminates pip deprecation warnings
- ✅ Modern Python packaging standards
- ✅ Proper dependency management
- ✅ Development and analysis dependency groups
Frappe Assistant Core implements a custom MCP server instead of using generic MCP libraries. This decision was made to provide better integration with Frappe and solve critical compatibility issues.
frappe_assistant_core/
├── mcp/
│ ├── server.py # Custom MCPServer class
│ └── tool_adapter.py # BaseTool to MCP adapter
└── api/
├── fac_endpoint.py # Main MCP endpoint with OAuth validation
├── oauth_discovery.py # OAuth discovery endpoints
└── ...
Problem with Generic Libraries:
1. JSON Serialization Issues: Generic MCP libraries can't handle Frappe's data types

   # Frappe returns datetime, Decimal, frappe._dict, etc.
   result = frappe.get_doc("Sales Invoice", "INV-00001")
   json.dumps(result)  # ❌ TypeError: Object of type datetime is not JSON serializable

2. Pydantic Overhead: Heavy dependencies not needed for Frappe integration

3. Limited Frappe Integration: No built-in session, permission, or ORM integration

4. Poor Error Handling: Generic errors instead of Frappe-specific context
Our Solution:
# mcp/server.py - The key innovation
import json
def _format_tool_result(result):
"""Format tool result with proper JSON serialization"""
if isinstance(result, str):
result_text = result
else:
# CRITICAL FIX: Use default=str to handle ANY Python type
result_text = json.dumps(result, default=str, indent=2)
return {
"content": [{"type": "text", "text": result_text}],
"isError": False
    }

Benefits:
- ✅ Handles datetime, Decimal, frappe._dict automatically
- ✅ No Pydantic dependency (lighter, simpler)
- ✅ Full error tracebacks for debugging
- ✅ Direct Frappe session integration
- ✅ Optimized for Frappe's architecture
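The effect of the `default=str` fix can be reproduced in isolation: plain `json.dumps` rejects the datetime and Decimal values that Frappe documents routinely contain, while `default=str` falls back to `str()` for any type it doesn't recognize.

```python
import json
from datetime import datetime
from decimal import Decimal

# Typical Frappe document payload: datetime and Decimal values
doc = {
    "name": "INV-00001",
    "posting_date": datetime(2024, 1, 15),
    "grand_total": Decimal("199.99"),
}

try:
    json.dumps(doc)  # fails: datetime is not JSON serializable
except TypeError as exc:
    print(f"plain dumps: {exc}")

# default=str converts any non-serializable object via str()
serialized = json.dumps(doc, default=str, indent=2)
print(serialized)
```

The trade-off is that everything becomes a string on the wire, which is acceptable here because MCP tool results are delivered as text content anyway.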
class MCPServer:
"""Custom MCP server implementation for Frappe"""
def __init__(self, name: str, version: str = "1.0.0"):
self.name = name
self.version = version
self.tools = {} # Registered tools
self._endpoint_registered = False
def register(self, allow_guest=False, xss_safe=True):
"""Decorator to register the main MCP endpoint with Frappe"""
def decorator(func):
@frappe.whitelist(allow_guest=allow_guest, xss_safe=xss_safe)
def wrapper():
# Custom preprocessing (e.g., OAuth validation)
result = func()
if result is not None:
return result
# Handle MCP request
return self.handle()
return wrapper
return decorator
def tool(self, name: str = None):
"""Decorator to register MCP tools"""
def decorator(func):
tool_name = name or func.__name__
self.tools[tool_name] = {
"name": tool_name,
"description": func.__doc__ or "",
"handler": func
}
return func
return decorator
def add_tool(self, name: str, description: str, inputSchema: dict, handler):
"""Programmatically add a tool"""
self.tools[name] = {
"name": name,
"description": description,
"inputSchema": inputSchema,
"handler": handler
}
def handle(self):
"""Main request handler - routes JSON-RPC methods"""
try:
request = json.loads(frappe.request.data)
method = request.get("method")
if method == "initialize":
return self._handle_initialize(request)
elif method == "tools/list":
return self._handle_tools_list(request)
elif method == "tools/call":
return self._handle_tools_call(request)
else:
return self._error_response(request, -32601, "Method not found")
except Exception as e:
            return self._error_response(request, -32603, str(e))

The tool adapter bridges our BaseTool classes with the MCP server:
# mcp/tool_adapter.py
def register_base_tool(mcp_server, tool_instance):
"""
Register a BaseTool with the MCP server.
This adapter:
1. Extracts tool metadata (name, description, inputSchema)
2. Creates a wrapper that calls tool_instance._safe_execute()
3. Registers with MCPServer using add_tool()
4. All BaseTool features work automatically (validation, permissions, audit)
"""
def tool_handler(arguments):
# Call the tool's safe execution wrapper
# This handles validation, permissions, audit logging, etc.
return tool_instance._safe_execute(arguments)
# Register with MCP server
mcp_server.add_tool(
name=tool_instance.name,
description=tool_instance.description,
inputSchema=tool_instance.inputSchema,
handler=tool_handler
    )

Usage:
from frappe_assistant_core.mcp.server import MCPServer
from frappe_assistant_core.mcp.tool_adapter import register_base_tool
from frappe_assistant_core.core.tool_registry import ToolRegistry
# Create MCP server
mcp = MCPServer("frappe-assistant-core", "2.0.0")
# Get all available tools from registry
registry = ToolRegistry()
tools = registry.get_available_tools()
# Register each tool with MCP server
for tool in tools:
    register_base_tool(mcp, tool)

The MCP endpoint implements OAuth 2.0 Protected Resource (RFC 9728):
Located in api/fac_endpoint.py:
def validate_oauth_token():
"""
Validate OAuth Bearer token from Authorization header.
Returns:
User object if valid, raises exception if not
"""
# Extract Bearer token
auth_header = frappe.request.headers.get("Authorization", "")
if not auth_header.startswith("Bearer "):
raise_unauthorized("Missing or invalid Authorization header")
token = auth_header[7:] # Remove "Bearer " prefix
try:
# Query OAuth Bearer Token doctype
bearer_token = frappe.get_doc("OAuth Bearer Token", {"access_token": token})
# Validate token status
if bearer_token.status != "Active":
raise_unauthorized("Token is not active")
# Validate expiration (use Frappe's timezone-aware now_datetime)
from frappe.utils import now_datetime
if bearer_token.expiration_time < now_datetime():
raise_unauthorized("Token has expired")
# Set user session
frappe.set_user(bearer_token.user)
return bearer_token.user
except frappe.DoesNotExistError:
raise_unauthorized("Invalid token")
def raise_unauthorized(message):
"""Return RFC 9728 compliant 401 response"""
frappe.local.response.http_status_code = 401
frappe.local.response.headers = {
"WWW-Authenticate": (
f'Bearer realm="Frappe Assistant Core", '
f'error="invalid_token", '
f'error_description="{message}", '
f'resource_metadata="{get_server_url()}/.well-known/oauth-protected-resource"'
)
}
    frappe.throw(message, exc=frappe.AuthenticationError)

Frappe v15 Compatibility Layer:
Since Frappe v15 doesn't have built-in OAuth discovery, we backported the functionality:
1. Custom Page Renderer (api/oauth_wellknown_renderer.py)
   - Intercepts `.well-known/*` paths
   - Returns proper JSON instead of HTML
   - Registered via the `page_renderer` hook

2. Discovery Endpoints (api/oauth_discovery.py)
   - `/.well-known/openid-configuration` - OpenID Connect discovery
   - `/.well-known/oauth-authorization-server` - OAuth server metadata (RFC 8414)
   - `/.well-known/oauth-protected-resource` - Protected resource metadata (RFC 9728)

3. Compatibility Layer (utils/oauth_compat.py)
   - Detects Frappe version (v15 vs v16+)
   - Routes to native v16 OAuth Settings or the custom v15 implementation
   - Provides a unified settings interface
Version Detection:
# utils/oauth_compat.py
def get_oauth_settings():
"""
Get OAuth settings from appropriate source.
Frappe v16+: Uses native OAuth Settings doctype
Frappe v15: Uses Assistant Core Settings with OAuth fields
"""
if frappe.db.exists("DocType", "OAuth Settings"):
# Frappe v16+ has native OAuth Settings
return frappe.get_single("OAuth Settings")
else:
# Frappe v15 - use our backported settings
        return frappe.get_single("Assistant Core Settings")

┌────────────────────────────────────────────────────────┐
│ HTTP POST /api/method/.../fac_endpoint.handle_mcp │
│ Authorization: Bearer <token> │
│ Content-Type: application/json │
│ Body: {"jsonrpc": "2.0", "method": "tools/call", ...} │
└────────────────────┬───────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ 1. OAuth Validation (fac_endpoint.py) │
│ - Extract Bearer token from header │
│ - Query OAuth Bearer Token doctype │
│ - Validate status = "Active" │
│ - Validate not expired │
│ - frappe.set_user(token.user) │
│ - Return 401 if invalid │
└────────────────────┬───────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ 2. MCP Server Request Handling (mcp/server.py) │
│ - Parse JSON-RPC request │
│ - Route to method handler: │
│ * initialize → _handle_initialize() │
│ * tools/list → _handle_tools_list() │
│ * tools/call → _handle_tools_call() │
└────────────────────┬───────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ 3. Tool Registry (core/tool_registry.py) │
│ - Get requested tool from registry │
│ - Filter by user permissions │
│ - Return tool instance │
└────────────────────┬───────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ 4. Tool Adapter (mcp/tool_adapter.py) │
│ - Call tool_instance._safe_execute(arguments) │
│ - Handles validation, permissions, audit │
└────────────────────┬───────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ 5. Tool Execution (core/base_tool.py) │
│ - Validate arguments against inputSchema │
│ - Check user permissions │
│ - Execute tool.execute(arguments) │
│ - Log to audit trail │
│ - Return result │
└────────────────────┬───────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ 6. JSON Serialization (mcp/server.py) │
│ - json.dumps(result, default=str) │
│ - Handles datetime, Decimal, etc. │
│ - Format as MCP response │
└────────────────────┬───────────────────────────────────┘
│
▼
┌────────────────────────────────────────────────────────┐
│ HTTP 200 OK │
│ Content-Type: application/json │
│ Body: {"jsonrpc": "2.0", "result": {...}, "id": 1} │
└────────────────────────────────────────────────────────┘
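The request and response bodies in this flow follow standard JSON-RPC 2.0 framing. The sketch below only builds the message bodies; the tool name and arguments are illustrative, and the response shape mirrors the MCP text-content wrapper shown in the serialization step.

```python
import json

# A tools/call request body (tool name and arguments are illustrative)
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "list_documents",
        "arguments": {"doctype": "Customer", "limit": 10},
    },
}
body = json.dumps(request)

# The matching success response wraps tool output as MCP text content
response = {
    "jsonrpc": "2.0",
    "id": request["id"],
    "result": {
        "content": [{"type": "text", "text": "..."}],
        "isError": False,
    },
}
```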
| File | Purpose | Lines | Key Features |
|---|---|---|---|
| mcp/server.py | Custom MCP server | ~400 | JSON-RPC handler, tool registry, default=str fix |
| mcp/tool_adapter.py | Tool adapter | ~50 | BaseTool to MCP bridge |
| api/fac_endpoint.py | Main endpoint | ~200 | OAuth validation, MCP integration |
| api/oauth_discovery.py | OAuth discovery | ~280 | RFC 8414, RFC 9728 compliance |
| api/oauth_wellknown_renderer.py | Page renderer | ~110 | v15 .well-known support |
| utils/oauth_compat.py | Version compat | ~100 | v15/v16 abstraction |
Request Processing Time:
- OAuth validation: ~5-10ms
- Tool registry lookup: ~1-5ms
- Tool execution: Variable (depends on tool)
- JSON serialization: ~1-3ms
- Total overhead: ~10-20ms (excluding tool execution)
Memory Usage:
- MCP server: ~500KB
- Tool registry: ~1MB (with all tools)
- Per-request: ~50-100KB
Scalability:
- Stateless design (no session state in server)
- Token validation uses database query (can be cached)
- Tool instances are singletons (shared across requests)
- Handles concurrent requests via Frappe's WSGI
Before:
- Single 1580-line monolithic API file
- Hardcoded tool discovery
- Fixed tool structure
- Legacy setup.py packaging
After:
- Complete plugin-based architecture
- 4 distinct plugin categories (core, data_science, websocket, batch_processing)
- Automatic plugin and tool discovery
- Plugin Manager with lifecycle management
- Tool Registry with dynamic registration
- BaseTool framework for standardized development
- Modern pyproject.toml packaging
Additional Improvements:
- Centralized constants in dedicated module
- Professional logging system replacing print statements
- Enhanced error handling and validation
- Improved data compatibility (fixed pandas DataFrame issues)
Data Science Plugin Enhancements:
- ✅ Fixed invalid __array_struct__ errors with pandas DataFrame creation from Frappe data
- ✅ Resolved matplotlib import scope issues in visualization tools
- ✅ Enhanced SQL-based data fetching to bypass frappe._dict compatibility problems
- ✅ Improved data serialization for complex Frappe data types
- ✅ Better error handling with meaningful error messages for debugging
Impact:
- All 21 tools now functional without data conversion errors
- Seamless pandas integration with Frappe business data
- Reliable visualization creation across all chart types
- Enhanced user experience with clear error messages
| Metric | Before | After | Improvement |
|---|---|---|---|
| Main API File Size | 1580 lines | 200 lines | 87% reduction |
| Print Statements | 905 across 26 files | 0 in production code | 100% eliminated |
| Hardcoded Strings | 50+ scattered | Centralized in constants | Organized |
| Module Structure | Monolithic | Modular handlers | Clean separation |
New Architecture Files:
- frappe_assistant_core/constants/definitions.py
- frappe_assistant_core/utils/logger.py
- frappe_assistant_core/api/handlers/initialize.py
- frappe_assistant_core/api/handlers/tools.py
- frappe_assistant_core/api/handlers/prompts.py
- frappe_assistant_core/tools/registry.py
- frappe_assistant_core/tools/executor.py
- pyproject.toml
Import Errors Fixed:
- ✅ Created missing tools/registry.py module
- ✅ Created missing tools/executor.py module
- ✅ Fixed missing notification handler
- ✅ Corrected DocType name inconsistencies
Module Structure Fixed:
- ✅ Added missing __init__.py files
- ✅ Organized proper import hierarchy
- ✅ Created compatibility wrappers
Cleanup Completed:
- ✅ Removed 26 temporary test/debug files
- ✅ Removed backup files
- ✅ Cleaned empty directories
Essential Frappe operations that are always available:
Document Operations (document_*)
- create_document - Create new documents
- get_document - Retrieve document data
- update_document - Update existing documents
- delete_document - Delete documents
- list_documents - List documents with filters
Search Tools (search_*)
- search_documents - Global search across all DocTypes
- search_doctype - DocType-specific search
- search_link - Link field search and filtering
Metadata Tools (metadata_*)
- get_doctype_info - Get DocType structure information
- metadata_list_doctypes - List all available DocTypes
- get_doctype_info_fields - Get field definitions
- metadata_permissions - Permission information
- metadata_workflow - Workflow information
Report Tools (report_*)
- generate_report - Execute any Frappe report type
- report_list - Get available reports
- get_report_data - Get report structure and parameters
Workflow Tools (workflow_*)
- workflow_action - Execute workflow actions
- workflow_list - List available workflows
- workflow_status - Get workflow status
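As an illustration of how these tools are invoked over MCP, a create_document call might carry arguments like the following. The argument names here ("values" in particular) are an assumption for illustration; the authoritative shape is each tool's inputSchema.

```python
# Illustrative tools/call arguments for create_document; the exact
# argument names are defined by the tool's inputSchema and may differ.
arguments = {
    "doctype": "ToDo",  # a built-in Frappe doctype
    "values": {
        "description": "Follow up with ACME Corp",
        "priority": "High",
    },
}
```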
Advanced analytics and visualization capabilities:
Analysis Tools
- run_python_code - Sandboxed Python execution with Frappe context
- analyze_business_data - Statistical data analysis with pandas
- query_and_analyze - SQL queries with advanced analysis
- extract_file_content - Extract content from PDFs, images (OCR), CSV, Excel, DOCX for LLM processing
Dependencies: pandas, numpy, matplotlib, seaborn, plotly, scipy, pypdf, Pillow, python-docx, paddleocr, paddlepaddle, pymupdf
Professional dashboard and chart creation:
Dashboard & Chart Tools
- create_dashboard - Create Frappe dashboards with multiple charts
- create_dashboard_chart - Create individual Dashboard Chart documents
- list_user_dashboards - List user's accessible dashboards
Dependencies: matplotlib, pandas, numpy
Real-time communication capabilities:
- Live data streaming
- Real-time notifications
- Interactive dashboard updates
Background and bulk operations:
- Large dataset processing
- Background task management
- Bulk operation optimization
# frappe_assistant_core/core/tool_registry.py
class ToolRegistry:
"""Registry for all available tools. Handles discovery from plugins."""
def __init__(self):
self.tools: Dict[str, BaseTool] = {}
self._discover_tools()
def _discover_tools(self):
"""Discover all available tools from enabled plugins"""
self._discover_plugin_tools()
def _discover_plugin_tools(self):
"""Discover tools from enabled plugins"""
plugin_manager = get_plugin_manager()
enabled_plugins = plugin_manager.get_enabled_plugins()
for plugin_name, plugin_instance in enabled_plugins.items():
# Load tools from each enabled plugin
plugin_tools = plugin_manager.get_plugin_tools(plugin_name)
        self.tools.update(plugin_tools)

# frappe_assistant_core/utils/plugin_manager.py
class PluginManager:
"""Manages plugin discovery, loading, and lifecycle"""
def _discover_plugins(self):
"""Auto-discover all plugins in the plugins directory"""
plugins_dir = Path(__file__).parent.parent / "plugins"
for item in plugins_dir.iterdir():
if item.is_dir() and not item.name.startswith(('_', '.')):
self._discover_plugin(item)
def get_enabled_plugins(self) -> Dict[str, BasePlugin]:
"""Get all enabled plugin instances"""
def get_plugin_tools(self, plugin_name: str) -> Dict[str, BaseTool]:
        """Get all tools from a specific plugin"""

Plugin enabled/disabled states are stored in individual DocType records for atomic operations and multi-worker consistency:
FAC Plugin Configuration
├── plugin_name: Data (Primary Key, autoname: field:plugin_name)
├── display_name: Data (read-only)
├── enabled: Check (0 or 1)
├── description: Small Text
├── discovered_at: Datetime (read-only)
└── last_toggled_at: Datetime (read-only)
Key Features:
- Atomic Toggle Operations: Single row UPDATE instead of read-modify-write JSON
- Multi-Worker Safe: Consistent state across Gunicorn workers (no race conditions)
- Automatic Cache Invalidation: on_update() hook clears plugin and tool caches
- Audit Trail: Built-in track_changes for modification history
- Auto-Sync: New plugins automatically get configuration records on bench migrate
State Persistence Flow:
# frappe_assistant_core/utils/plugin_manager.py
class PluginPersistence:
def load_enabled_plugins(self) -> Set[str]:
"""Load enabled plugins from FAC Plugin Configuration DocType"""
enabled = frappe.get_all(
"FAC Plugin Configuration",
filters={"enabled": 1},
pluck="plugin_name"
)
return set(enabled)
def save_plugin_state(self, plugin_name: str, enabled: bool) -> bool:
"""Atomic plugin state update"""
if frappe.db.exists("FAC Plugin Configuration", plugin_name):
doc = frappe.get_doc("FAC Plugin Configuration", plugin_name)
doc.enabled = 1 if enabled else 0
doc.last_toggled_at = frappe.utils.now()
doc.save(ignore_permissions=True)
frappe.db.commit()
        return True

# frappe_assistant_core/core/base_tool.py
class BaseTool:
"""Base class for all tools with standardized interface"""
def __init__(self):
self.name: str = ""
self.description: str = ""
self.inputSchema: Dict = {}
self.requires_permission: Optional[str] = None
def execute(self, arguments: Dict[str, Any]) -> Any:
"""Execute the tool - must be implemented by subclasses"""
raise NotImplementedError
def validate_arguments(self, arguments: Dict[str, Any]) -> bool:
"""Validate arguments against tool schema"""
def check_permission(self, user: str = None) -> bool:
        """Check if user has permission to use this tool"""

Individual tool configuration with granular access control:
FAC Tool Configuration
├── tool_name: Data (Primary Key, autoname: field:tool_name)
├── plugin_name: Data (Source plugin)
├── enabled: Check (Show/hide from MCP)
├── tool_category: Select (read_only, write, read_write, privileged)
├── auto_detected_category: Data (AST-detected, read-only)
├── category_override: Check (Manual category override)
├── description: Small Text
├── source_app: Data (Source Frappe app, read-only)
├── module_path: Data (Python module path, read-only)
├── role_access_mode: Select (Allow All, Restrict to Listed Roles)
└── role_access: Table (FAC Tool Role Access child table)
Tool Category Auto-Detection:
# frappe_assistant_core/utils/tool_category_detector.py
class ToolCategoryDetector:
def detect_category(self, tool_instance) -> str:
"""
Detect category by analyzing tool's source code.
Uses AST parsing to find perm_type values.
Returns: 'read_only', 'write', 'read_write', or 'privileged'
"""
# 1. Check hardcoded lists (fastest)
if tool_name in PRIVILEGED_TOOLS: # delete_document, run_python_code
return "privileged"
if tool_name in READ_ONLY_TOOLS: # get_document, list_documents
return "read_only"
# 2. Parse source code for perm_type values
perm_types = self._extract_perm_types(tool_instance)
        return self._categorize_from_perm_types(perm_types)

Role-Based Access Control:
# FAC Tool Configuration document method
def user_has_access(self, user: str = None) -> bool:
"""Check if user has access based on role configuration."""
if not self.enabled:
return False
if self.role_access_mode == "Allow All":
return True
# System Manager always has access
user_roles = set(frappe.get_roles(user))
if "System Manager" in user_roles:
return True
# Check allowed roles
for role_access in self.role_access:
if role_access.role in user_roles and role_access.allow_access:
return True
    return False

API Functions:
# Toggle a single tool
from frappe_assistant_core.assistant_core.doctype.fac_tool_configuration.fac_tool_configuration import toggle_tool
toggle_tool("run_python_code", enabled=False)
# Bulk toggle multiple tools
from frappe_assistant_core.assistant_core.doctype.fac_tool_configuration.fac_tool_configuration import bulk_toggle_tools
bulk_toggle_tools(["delete_document", "run_python_code"], enabled=False)
# Check tool access
from frappe_assistant_core.assistant_core.doctype.fac_tool_configuration.fac_tool_configuration import get_tool_access_status
status = get_tool_access_status("delete_document", user="test@example.com")

Multi-tool orchestration is a powerful capability that allows the run_python_code tool to call other tools directly inside the Python sandbox environment. This architectural pattern achieves 80-95% token savings for data analysis workflows by processing data in the sandbox instead of passing it through the LLM context.
┌─────────────┐
│ LLM │
└──────┬──────┘
│ "Analyze top customers"
↓
┌──────────────────────────────────────┐
│ run_python_code Tool │
│ ┌──────────────────────────────┐ │
│ │ Python Sandbox (RestrictedPython)│
│ │ ┌────────────────────────┐ │ │
│ │ │ User Code: │ │ │
│ │ │ result = tools.get_ │ │ │
│ │ │ documents(...) │ │ │
│ │ │ df = pd.DataFrame() │ │ │
│ │ │ analysis = df.groupby │ │ │
│ │ └───────────┬────────────┘ │ │
│ │ │ │ │
│ │ ┌───────────▼────────────┐ │ │
│ │ │ Tools API (tool_api.py)│ │ │
│ │ │ - get_documents() │ │ │
│ │ │ - get_document() │ │ │
│ │ │ - generate_report() │ │ │
│ │ └───────────┬────────────┘ │ │
│ └──────────────┼───────────────┘ │
│ │ │
└─────────────────┼───────────────────┘
│
┌──────────▼──────────┐
│ Tool Registry │
│ ┌────────────────┐ │
│ │ list_documents │ │
│ │ get_document │ │
│ │ generate_report│ │
│ └────────┬───────┘ │
└───────────┼─────────┘
│
┌───────────▼──────────┐
│ Frappe Framework │
│ - Database │
│ - Permissions │
│ - Reports │
└──────────────────────┘
Location: frappe_assistant_core/utils/tool_api.py
The FrappeAssistantAPI class provides a unified interface for tools to be called from within Python sandbox:
class FrappeAssistantAPI:
"""
Unified API for tool orchestration within run_python_code sandbox.
Allows LLMs to write Python code that calls other tools inside
the sandbox, achieving 80-95% token savings by processing data
in sandbox instead of passing through LLM context.
"""
def get_documents(self, doctype: str, filters: dict = None,
fields: list = None, limit: int = 100) -> dict:
"""
Fetch documents using list_documents tool.
Returns plain Python dicts (not frappe._dict) for pandas compatibility.
"""
def get_document(self, doctype: str, name: str) -> dict:
"""
Fetch single document using get_document tool.
Returns plain Python dict for pandas compatibility.
"""
def generate_report(self, report_name: str, filters: dict = None) -> dict:
"""
Execute Frappe report using generate_report tool.
Returns report data with plain Python dicts.
        """

Location: frappe_assistant_core/plugins/data_science/tools/run_python_code.py
The tools object is automatically injected into the sandbox environment:
def execute(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
"""Execute Python code in restricted sandbox"""
# Create tools API instance
from frappe_assistant_core.utils.tool_api import FrappeAssistantAPI
tools = FrappeAssistantAPI()
# Define restricted globals with tools API
restricted_globals = {
'__builtins__': safe_builtins,
'tools': tools, # Injected tools API
'frappe': frappe,
'pd': pd,
'np': np,
# ... other safe imports
}
# Execute user code with tools API available
    exec(compiled_code, restricted_globals, restricted_locals)

To ensure pandas compatibility, all data returned by the tools API is converted from frappe._dict to plain Python dicts:
# In tool_api.py - get_documents()
raw_data = frappe.get_all(doctype, filters=filters, fields=fields, limit=limit)
# Convert frappe._dict objects to plain Python dicts
data = [dict(item) for item in raw_data]
return {"success": True, "data": data, "count": len(data)}

This prevents ValueError: invalid __array_struct__ errors when creating pandas DataFrames.
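The conversion can be demonstrated without Frappe by standing in a dict subclass for frappe._dict (which is essentially a dict with attribute access):

```python
class FrappeDictStandIn(dict):
    """Stand-in for frappe._dict: a dict subclass with attribute access."""

    def __getattr__(self, key):
        return self.get(key)


raw_data = [
    FrappeDictStandIn(customer="ACME", total=100),
    FrappeDictStandIn(customer="Globex", total=250),
]

# Convert to plain dicts so pandas sees ordinary mappings rather than
# a custom subclass it may misinterpret during DataFrame construction.
data = [dict(item) for item in raw_data]
```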
User: "Analyze top 10 customers by revenue"
┌─────────────┐
│ LLM │ 1. Calls list_documents
└──────┬──────┘ (fetches 100 invoices)
│
↓ [5000 tokens: 100 invoice records sent to LLM]
┌─────────────┐
│ LLM │ 2. Manually analyzes in context
└──────┬──────┘ (burns 5000 tokens processing)
│
↓ [100 tokens: insights returned]
┌─────────────┐
│ User │ Total: 5100 tokens used
└─────────────┘
User: "Analyze top 10 customers by revenue"
┌─────────────┐
│ LLM │ 1. Writes Python code with
└──────┬──────┘ tools.get_documents() call
│
↓ [50 tokens: Python code sent to sandbox]
┌──────────────────────────┐
│ Python Sandbox │
│ - Fetches 100 invoices │ Data stays in sandbox
│ - Processes with pandas │ (0 tokens to LLM)
│ - Returns only insights │
└──────┬───────────────────┘
│
↓ [50 tokens: insights returned]
┌─────────────┐
│ User │ Total: 100 tokens used
└─────────────┘
Token Savings: 98%
The orchestration system maintains security through multiple layers:
Tools called via the API run with the same permissions as the authenticated user:
# In tool_api.py
def get_documents(self, doctype: str, ...):
# Permission checks happen in underlying tool
# User can only access documents they have permission for
tool = get_tool_registry().get_tool("list_documents")
result = tool.execute(arguments)
    return result
The tools API only exposes read operations:
- get_documents() - Read documents
- get_document() - Read single document
- generate_report() - Read report data
No write operations (create/update/delete) are exposed to prevent accidental data modification.
Python code runs in a RestrictedPython environment:
- No file system access
- No network access (except via tools API)
- No dangerous imports (os, sys, subprocess)
- No exec/eval on user strings
- Execution timeout limits
All tool calls through the API are logged:
# Audit trail includes:
- User who executed code
- Tools called via API
- Arguments passed to tools
- Execution time
- Success/failure status

| Operation | Traditional | Orchestrated | Difference |
|---|---|---|---|
| Fetch 100 records | 500ms | 500ms | Same |
| LLM processing | 2000ms | 0ms | -2000ms |
| Total | 2500ms | 500ms | 80% faster |
| Data Size | Traditional Tokens | Orchestrated Tokens | Savings |
|---|---|---|---|
| 10 records | 500 | 30 | 94% |
| 50 records | 3000 | 100 | 96.7% |
| 100 records | 5000 | 50 | 99% |
| 500 records | 15000 | 200 | 98.7% |
Average: 80-95% token savings
See the Python Code Orchestration Guide for:
- Complete examples with code
- Best practices for data handling
- Common patterns (aggregation, joining, time-series)
- Error handling strategies
- Performance optimization tips
The tools API returns structured responses with success/error status:
{
"success": True,
"data": [...],
"count": 100
}
# Or on error:
{
"success": False,
"error": "Permission denied for DocType 'Confidential Doc'",
"error_type": "PermissionError"
}
User code should always check for success:
result = tools.get_documents(doctype="Customer")
if result["success"]:
df = pd.DataFrame(result["data"])
# Process data...
else:
    print(f"Error: {result['error']}")
The tools API returns data with potential null values. User code must handle these:
# Handle None values before operations
df['amount'] = df['amount'].fillna(0)
df['status'] = df['status'].fillna('Unknown')
# Check before formatting
if pd.notna(value):
print(f"{value:,.2f}")
# Safe division
rate = (paid / total * 100) if total > 0 else 0
Potential future additions to the orchestration system:
- Write Operations - Controlled create/update operations through API
- Batch Processing - Bulk operations for efficiency
- Caching Layer - Cache frequently accessed data
- Async Operations - Non-blocking data fetching
- Progress Callbacks - Real-time progress for long operations
- Resource Limits - Per-user resource quotas
Frappe Assistant Core implements a comprehensive multi-layer security framework designed to ensure secure AI assistant operations within business environments. The security system provides role-based access control, document-level permissions, field-level data protection, and complete audit trails while seamlessly integrating with Frappe's built-in permission system.
File: frappe_assistant_core/core/security_config.py
The system implements a sophisticated role-based access matrix that controls which tools each user role can access:
ROLE_TOOL_ACCESS = {
"System Manager": {
"allowed_tools": "*", # Full access to all tools
"restricted_tools": [],
"description": "Full access to all assistant tools including dangerous operations"
},
"Assistant Admin": {
"allowed_tools": [
*BASIC_CORE_TOOLS, # All basic tools
"metadata_permissions", "metadata_workflow",
"tool_registry_list", "audit_log_view", "workflow_action"
],
"restricted_tools": ["run_python_code", "query_and_analyze"],
"description": "Administrative access without code execution capabilities"
},
"Assistant User": {
"allowed_tools": BASIC_CORE_TOOLS,
"restricted_tools": [
"run_python_code", "query_and_analyze",
"metadata_permissions", "tool_registry_list", "audit_log_view"
],
"description": "Basic business user access with document-level permissions"
},
"Default": {
"allowed_tools": BASIC_CORE_TOOLS,
"restricted_tools": ["run_python_code", "query_and_analyze"],
"description": "Basic tool access for all users - document permissions control actual access"
}
}
Tool Categories by Role:
- System Manager: Full access to all 21 tools including dangerous operations
- Assistant Admin: 16 tools excluding code execution and direct database queries
- Assistant User: 14 basic tools for standard business operations
- Default: 14 basic tools for any other Frappe user roles
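A lookup against this access matrix can be sketched as below. The function name mirrors `check_tool_access`, which appears later in this document's `BaseTool` example; the exact implementation in `security_config.py` may differ, and the matrix here is abbreviated:

```python
ROLE_TOOL_ACCESS = {
    "System Manager": {"allowed_tools": "*", "restricted_tools": []},
    "Default": {
        "allowed_tools": ["list_documents", "get_document"],
        "restricted_tools": ["run_python_code", "query_and_analyze"],
    },
}

def check_tool_access(role: str, tool_name: str) -> bool:
    """Unknown roles fall back to the Default entry; explicit denial wins."""
    entry = ROLE_TOOL_ACCESS.get(role, ROLE_TOOL_ACCESS["Default"])
    if tool_name in entry["restricted_tools"]:
        return False
    allowed = entry["allowed_tools"]
    return allowed == "*" or tool_name in allowed
```

Checking `restricted_tools` before `allowed_tools` makes denial take precedence even if a tool appears in both lists.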
Core Function: validate_document_access()
Every document operation goes through comprehensive permission validation:
def validate_document_access(user: str, doctype: str, name: str, perm_type: str = "read") -> Dict[str, Any]:
    """Multi-layer document access validation"""
    # Layer 1: Role-based DocType accessibility
    if not is_doctype_accessible(doctype, primary_role):
        return {"success": False, "error": f"Access to {doctype} is restricted for your role"}
    # Layer 2: Frappe DocType-level permissions
    if not frappe.has_permission(doctype, perm_type, user=user):
        return {"success": False, "error": f"Insufficient {perm_type} permissions for {doctype}"}
    # Layer 3: Document-specific permissions (row-level security)
    if name and not frappe.has_permission(doctype, perm_type, doc=name, user=user):
        return {"success": False, "error": f"Insufficient {perm_type} permissions for {doctype} {name}"}
    # Layer 4: Submitted document state validation
    if perm_type in ["write", "delete"]:
        # Prevent modification of submitted documents
        doc = frappe.get_doc(doctype, name)
        if hasattr(doc, 'docstatus') and doc.docstatus == 1:
            return {"success": False, "error": "Cannot modify submitted document"}
Permission Layers:
- Role-based DocType Access: Certain DocTypes restricted by user role
- Frappe DocType Permissions: Standard Frappe permission checking
- Row-Level Security: Document-specific access (company/user filters)
- Document State Validation: Submitted document protection
Sensitive Field Filtering: Automatic filtering of sensitive data based on user roles
SENSITIVE_FIELDS = {
"all_doctypes": [
"password", "new_password", "api_key", "api_secret", "secret_key",
"private_key", "access_token", "refresh_token", "reset_password_key",
"unsubscribe_key", "email_signature", "bank_account_no", "iban",
"encryption_key"
],
"User": [
"password", "api_key", "api_secret", "reset_password_key",
"login_after", "user_type", "simultaneous_sessions", "restrict_ip",
"last_password_reset_date", "last_login", "last_active"
],
"Email Account": [
"password", "smtp_password", "access_token", "refresh_token"
]
# ... 50+ sensitive fields across 15+ DocTypes
}
def filter_sensitive_fields(doc_dict: Dict[str, Any], doctype: str, user_role: str) -> Dict[str, Any]:
    """Filter sensitive fields based on user role"""
    if user_role == "System Manager":
        return doc_dict  # System Manager can see all fields
    filtered_doc = dict(doc_dict)
    sensitive_fields = SENSITIVE_FIELDS["all_doctypes"] + SENSITIVE_FIELDS.get(doctype, [])
    # Replace sensitive values with "***RESTRICTED***"
    for field in sensitive_fields:
        if field in filtered_doc:
            filtered_doc[field] = "***RESTRICTED***"
    return filtered_doc
Data Protection Features:
- Global Sensitive Fields: 18 universally sensitive fields across all DocTypes
- DocType-Specific Protection: Custom sensitive field lists for 15+ DocTypes
- Admin-Only Fields: System metadata hidden from Assistant Users
- Automatic Masking: Sensitive values replaced with ***RESTRICTED***
Restricted DocTypes for Assistant Users:
RESTRICTED_DOCTYPES = {
"Assistant User": [
# System administration DocTypes
"System Settings", "Print Settings", "Email Domain", "LDAP Settings",
"OAuth Settings", "Social Login Key", "Dropbox Settings",
# Security and permissions DocTypes
"Role", "User Permission", "Role Permission", "Custom Role",
"Module Profile", "Role Profile", "Custom DocPerm", "DocShare",
# System logs and audit DocTypes
"Error Log", "Activity Log", "Access Log", "View Log",
"Scheduler Log", "Integration Request",
# System customization DocTypes
"Server Script", "Client Script", "Custom Script", "Property Setter",
"DocType", "DocField", "DocPerm", "Custom Field",
# Development and maintenance DocTypes
"Package", "Data Import", "Data Export", "Bulk Update"
# ... 30+ restricted DocTypes total
]
}
Access Control:
- 30+ Restricted DocTypes for Assistant Users to prevent system tampering
- Administrative Protection: Core system DocTypes only accessible to System Managers
- Security-Critical Access: Permission and role management restricted to admins
- Development Tool Restriction: System customization tools restricted appropriately
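The DocType gate reduces to a membership check against this list. A minimal sketch, assuming the `RESTRICTED_DOCTYPES` shape shown above (the list here is abbreviated, and the real `is_doctype_accessible` in `security_config.py` may differ):

```python
RESTRICTED_DOCTYPES = {
    "Assistant User": [
        "System Settings", "Role", "User Permission",
        "Server Script", "DocType", "Custom Field",
    ],
}

def is_doctype_accessible(doctype: str, role: str) -> bool:
    """Roles with no restriction list (e.g. System Manager) pass everything."""
    return doctype not in RESTRICTED_DOCTYPES.get(role, [])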
Company-Based Filtering: Automatic enforcement of company-based data access
# Example: User DocType access control
if doctype == "User" and user_role in ["Assistant User", "Default"]:
filters["name"] = current_user # Users can only see themselves
# Example: Audit log access control
def get_permission_query_conditions(user=None):
if "System Manager" in frappe.get_roles(user):
return "" # See all records
elif "Assistant Admin" in frappe.get_roles(user):
return "" # See all records
elif "Assistant User" in frappe.get_roles(user):
return f"`tabAssistant Audit Log`.user = {frappe.db.escape(user)}" # Only own records
return "1=0" # No access for others
Row-Level Security Features:
- Company-Based Access: Automatic filtering based on user's company permissions
- User-Scoped Data: Users can only access their own audit logs and connection logs
- Permission Query Integration: Uses Frappe's permission query system
- Dynamic Filtering: Contextual data filtering based on user roles and permissions
Security Event Logging: Complete audit trail of all assistant operations
def audit_log_tool_access(user: str, tool_name: str, arguments: Dict[str, Any], result: Dict[str, Any]):
"""Log all tool access attempts for security monitoring"""
audit_log = frappe.get_doc({
"doctype": "Assistant Audit Log",
"user": user,
"action": "tool_execution",
"tool_name": tool_name,
"arguments": frappe.as_json(arguments),
"success": result.get("success", False),
"error": result.get("error", ""),
"ip_address": frappe.local.request_ip,
"timestamp": frappe.utils.now()
})
    audit_log.insert(ignore_permissions=True)
Audit Features:
- Complete Tool Logging: Every tool execution logged with full context
- Success/Failure Tracking: Both successful and failed operations recorded
- IP Address Tracking: Security monitoring with source IP logging
- User-Scoped Access: Users can only view their own audit entries
- Admin Oversight: System Managers can view all audit entries
- Argument Preservation: Complete record of tool arguments for security analysis
graph TD
A[Tool Execution Request] --> B[Extract User Context]
B --> C[Check Tool Permission]
C --> D{Tool Allowed?}
D -->|No| E[Access Denied]
D -->|Yes| F[Check DocType Access]
F --> G{DocType Accessible?}
G -->|No| H[DocType Restricted]
G -->|Yes| I[Frappe Permission Check]
I --> J{Frappe Permissions?}
J -->|No| K[Permission Denied]
J -->|Yes| L[Document-Level Check]
L --> M{Document Access?}
M -->|No| N[Document Access Denied]
M -->|Yes| O[Execute Tool]
O --> P[Filter Sensitive Fields]
P --> Q[Log Audit Entry]
Q --> R[Return Filtered Result]
File: frappe_assistant_core/core/base_tool.py
All tools inherit security validation:
class BaseTool:
"""Base class with built-in security validation"""
def execute_with_security(self, arguments: Dict[str, Any]) -> Any:
"""Execute tool with comprehensive security validation"""
# 1. User context validation
current_user = frappe.session.user
if not current_user or current_user == "Guest":
return {"success": False, "error": "Authentication required"}
# 2. Tool permission validation
user_role = get_user_primary_role(current_user)
if not check_tool_access(user_role, self.name):
return {"success": False, "error": "Insufficient permissions for this tool"}
# 3. Argument validation
if not self.validate_arguments(arguments):
return {"success": False, "error": "Invalid arguments provided"}
# 4. Execute with error handling
try:
result = self.execute(arguments)
# 5. Filter sensitive data
if isinstance(result, dict) and "data" in result:
result["data"] = self.filter_result_data(result["data"], user_role)
# 6. Audit logging
audit_log_tool_access(current_user, self.name, arguments, result)
return result
except Exception as e:
error_result = {"success": False, "error": str(e)}
audit_log_tool_access(current_user, self.name, arguments, error_result)
            return error_result
Example: Document Get Tool with Security
# frappe_assistant_core/plugins/core/tools/get_document.py
def execute(self, arguments):
doctype = arguments.get("doctype")
name = arguments.get("name")
current_user = frappe.session.user
# Security validation
validation_result = validate_document_access(
user=current_user,
doctype=doctype,
name=name,
perm_type="read"
)
if not validation_result["success"]:
return validation_result
# Special Administrator protection
if name == "Administrator" and current_user != "Administrator":
return {"success": False, "error": "Access denied: Cannot access Administrator record"}
# Get document data
doc = frappe.get_doc(doctype, name)
doc_dict = doc.as_dict()
# Filter sensitive fields based on user role
user_role = get_user_primary_role(current_user)
filtered_doc = filter_sensitive_fields(doc_dict, doctype, user_role)
    return {"success": True, "data": filtered_doc}
- Multiple Security Layers: Role → DocType → Document → Field validation
- Permission Redundancy: Both custom and Frappe permission checking
- Fail-Safe Defaults: Restrictive permissions by default
- Role-Based Minimum Access: Each role gets only necessary tool access
- Granular Permissions: Fine-grained control over operations
- Data Scoping: Users see only data they're authorized to access
- Comprehensive Logging: All operations logged with full context
- Security Monitoring: Failed access attempts recorded
- Forensic Capability: Complete audit trail for security investigations
- Sensitive Field Masking: Automatic filtering of sensitive information
- Role-Based Filtering: Data visibility based on user roles
- No Data Leakage: Secure error messages without sensitive data exposure
# Standard roles provided by the system
standard_roles = [
{"role": "Assistant User", "role_color": "#3498db"},
{"role": "Assistant Admin", "role_color": "#e74c3c"}
]
# Role assignment in Frappe
# System Manager: Built-in Frappe role
# Assistant Admin: Custom administrative role
# Assistant User: Custom business user role
# Default: Any other Frappe role gets basic access
# Custom permission query conditions
permission_query_conditions = {
"Assistant Audit Log": "frappe_assistant_core.utils.permissions.get_audit_permission_query_conditions"
}
# Security settings can be customized through DocTypes:
# - Assistant Core Settings: General security configuration
# - Assistant Tool Registry: Tool-specific permission settings
- Tool Usage Patterns: Track which tools are used most frequently
- Access Attempt Monitoring: Monitor failed access attempts for security threats
- User Activity Analysis: Analyze user behavior for anomaly detection
- Role Effectiveness: Evaluate if role permissions are appropriately configured
- Permission Denial Rate: Percentage of requests denied due to permissions
- Sensitive Data Access: Monitor access to sensitive DocTypes and fields
- Tool Usage by Role: Understand tool usage patterns across different user roles
- Security Incident Tracking: Track and analyze security-related events
- frappe.has_permission(): Deep integration with Frappe's permission engine
- Permission Query Conditions: Custom query filters for row-level security
- User Permissions: Automatic enforcement of user-specific data restrictions
- Company-Based Filtering: Seamless multi-company security support
- Session Management: Leverages Frappe's session handling
- IP Restriction: Integration with Frappe's IP-based access control
- Two-Factor Authentication: Compatible with Frappe's 2FA system
- Password Policies: Honors Frappe's password complexity requirements
Frappe Assistant Core provides dual-backend OCR (Optical Character Recognition) for extracting text from scanned documents and images. The system supports both local processing via PaddleOCR and AI-powered extraction via Ollama vision models, with automatic fallback.
File: frappe_assistant_core/plugins/data_science/tools/extract_file_content.py
Local OCR using PaddleOCR 3.x with PaddlePaddle inference engine. Processes images and PDFs entirely on the server.
Key characteristics:
- Supports 80+ languages
- No external service dependency
- Fast processing (~5-6 seconds per page)
- Model caching — loaded once per Gunicorn worker
- C++ crash recovery with automatic instance reset
PaddleOCR 3.x API:
from paddleocr import PaddleOCR
ocr = PaddleOCR(lang='en') # No use_angle_cls or show_log params
result = ocr.predict(image_or_path) # No cls param
# Result is list of OCRResult dict-like objects (one per page)
page = result[0]
texts = page['rec_texts'] # list of strings
scores = page['rec_scores'] # list of floats
polys = page['dt_polys']    # list of bounding box polygons
C++ Crash Recovery:
PaddleOCR's C++ inference backend can occasionally crash (e.g., segfault). The tool handles this with a two-attempt strategy:
for attempt in range(2):
    ocr = _get_paddle_ocr(lang) if attempt == 0 else _get_paddle_ocr_fresh(lang)
    try:
        result = ocr.predict(image)
        break  # success
    except Exception:
        if attempt == 0:
            continue  # retry with a fresh instance
        raise  # second failure is a real error
AI-powered OCR using an Ollama vision model (e.g., deepseek-ocr). Better at understanding document context and complex layouts.
Key characteristics:
- Requires a running Ollama instance
- Uses /api/generate endpoint with base64-encoded images
- PDF pages rendered to images via PyMuPDF before sending to Ollama
- Falls back to PaddleOCR if Ollama fails or returns empty results
- Configurable model, URL, and timeout
Ollama PDF OCR Flow:
PDF bytes
→ PyMuPDF renders each page to pixmap (150 DPI)
→ Convert to PIL Image → JPEG → Base64
→ POST to Ollama /api/generate with vision model
→ Collect text per page → Combined result
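The request body for a single page can be sketched as follows. The payload shape (`model`, `prompt`, `images`, `stream`) follows Ollama's /api/generate API; the helper name and prompt text are illustrative:

```python
import base64
import json

def build_ollama_ocr_payload(jpeg_bytes: bytes,
                             model: str = "deepseek-ocr:latest") -> str:
    """Build the JSON body POSTed to Ollama's /api/generate for one page."""
    payload = {
        "model": model,
        "prompt": "Extract all text from this image, preserving layout.",
        "images": [base64.b64encode(jpeg_bytes).decode("ascii")],
        "stream": False,  # request one complete response per page
    }
    return json.dumps(payload)
```

With `"stream": False`, the extracted text arrives in a single JSON response whose `response` field holds the page text.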
OCR settings are managed via Assistant Core Settings DocType:
| Field | Description | Default |
|---|---|---|
| `ocr_backend` | OCR engine: `paddleocr` or `ollama` | `paddleocr` |
| `ocr_language` | Default language for PaddleOCR | `en` |
| `ollama_api_url` | Ollama server URL | `http://localhost:11434` |
| `ollama_vision_model` | Ollama vision model name | `deepseek-ocr:latest` |
| `ollama_request_timeout` | Timeout in seconds | `120` |
| Package | Purpose |
|---|---|
| `paddleocr>=2.9.0` | OCR engine |
| `paddlepaddle>=2.6.0,<3.1.0` | PaddlePaddle inference (pinned - v3.3.0 has oneDNN bug) |
| `pymupdf>=1.24.0` | PDF page rendering for Ollama vision OCR |
| `Pillow>=10.0.0` | Image processing |
_perform_ocr(file_content, arguments, file_type)
│
├─ if backend == "ollama":
│ ├─ _try_ollama_ocr()
│ │ ├─ PDF → _perform_ollama_pdf_ocr() (PyMuPDF + Ollama)
│ │ └─ Image → _ollama_extract_from_image()
│ │
│ └─ if Ollama fails or empty → fall through to PaddleOCR
│
└─ _perform_paddle_ocr()
├─ PDF → _perform_paddle_pdf_ocr() (temp file + ocr.predict)
└─ Image → PIL → numpy array → ocr.predict
PaddleOCR returns bounding box coordinates for each text region. The tool reconstructs document layout:
- Extract positions: Get (y_center, x_left) from each bounding box
- Sort vertically: Order text regions top-to-bottom
- Group into lines: Regions within 10px vertical distance form a line
- Sort horizontally: Within each line, sort left-to-right
- Join with tabs: Preserves table column alignment
This ensures OCR output maintains the original document's spatial structure.
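The five steps above can be sketched as a small function. The `(y_center, x_left, text)` tuple shape and the 10px tolerance follow the description; the actual helper inside the tool may differ:

```python
def reconstruct_layout(regions, line_tol=10):
    """regions: list of (y_center, x_left, text) tuples from OCR boxes."""
    rows = sorted(regions)              # top-to-bottom, ties broken left-to-right
    lines, current, last_y = [], [], None
    for y, x, text in rows:
        if last_y is not None and y - last_y > line_tol:
            lines.append(current)       # vertical gap exceeds tolerance: new line
            current = []
        current.append((x, text))
        last_y = y
    if current:
        lines.append(current)
    # Within each line, sort left-to-right and join with tabs
    return "\n".join("\t".join(t for _, t in sorted(line)) for line in lines)
```

Joining with tabs rather than spaces is what keeps table columns aligned when the OCR output is later rendered in a monospaced context.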
POST /api/method/frappe_assistant_core.api.fac_endpoint.handle_mcp
Content-Type: application/json
{
"jsonrpc": "2.0",
"method": "initialize",
"params": {
"protocolVersion": "2025-06-18",
"capabilities": {}
},
"id": 1
}
List Tools:
{
"jsonrpc": "2.0",
"method": "tools/list",
"id": 2
}
Execute Tool:
{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "run_python_code",
"arguments": {
"code": "import pandas as pd\nprint('Hello World')"
}
},
"id": 3
}
List Prompts:
{
"jsonrpc": "2.0",
"method": "prompts/list",
"id": 4
}
Centralized Error Responses:
{
"jsonrpc": "2.0",
"error": {
"code": -32603,
"message": "Internal error",
"data": "Detailed error information"
},
"id": 1
}
Error Code Constants:
- `-32700`: Parse Error
- `-32600`: Invalid Request
- `-32601`: Method Not Found
- `-32602`: Invalid Params
- `-32603`: Internal Error
- `-32000`: Authentication Required
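A client-side request builder for this endpoint can be sketched as follows. Only the JSON-RPC 2.0 envelope is fixed by the protocol; the helper name and the id counter are illustrative:

```python
import itertools
import json
from typing import Optional

_request_ids = itertools.count(1)

def make_mcp_request(method: str, params: Optional[dict] = None) -> str:
    """Build a JSON-RPC 2.0 request body for the handle_mcp endpoint."""
    body = {"jsonrpc": "2.0", "method": method, "id": next(_request_ids)}
    if params is not None:
        body["params"] = params  # omit entirely for parameterless methods
    return json.dumps(body)
```

Omitting `params` when it is `None` matches the `tools/list` and `prompts/list` examples above, which carry no params member at all.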
- Frappe Framework 14+
- Python 3.8+
- MariaDB/MySQL
- Required Python packages (auto-installed via pyproject.toml)
# Navigate to your Frappe bench
cd frappe-bench
# Get the app
bench get-app https://github.com/paulclinton/frappe-assistant-core
# Install on site
bench --site [site-name] install-app frappe_assistant_core
# Run database migrations
bench --site [site-name] migrate
The app now uses modern Python packaging with pyproject.toml:
# Development installation (editable)
pip install -e .
# Production installation
pip install .
# With analysis dependencies
pip install .[analysis]
# With development dependencies
pip install .[dev]
# Configure settings through UI
bench --site [site-name] set-config assistant_enabled 1
# Or via assistant admin interface
https://your-site.com/desk#/fac-admin
version: "3.8"
services:
frappe:
image: frappe/erpnext:latest
volumes:
- ./frappe_assistant_core:/home/frappe/frappe-bench/apps/frappe_assistant_core
environment:
- INSTALL_APPS=frappe_assistant_core
Previous Issues: 26 temporary test files were cluttering the repository
Resolution: All temporary test files removed, proper test structure implemented
tests/
├── unit/
│ ├── test_api_handlers.py # API handler tests
│ ├── test_tool_registry.py # Tool registry tests
│ └── test_logging.py # Logging system tests
├── integration/
│ ├── test_tool_execution.py # End-to-end tool tests
│ └── test_mcp_protocol.py # MCP protocol tests
└── fixtures/
└── test_data.json # Test data
# Run all tests
bench --site [site-name] run-tests frappe_assistant_core
# Run specific test categories
pytest tests/unit/
pytest tests/integration/
# With coverage
pytest --cov=frappe_assistant_core tests/
- Enhanced Analytics: Advanced statistical analysis tools
- Real-time Collaboration: Multi-user sessions
- Plugin System: Third-party tool extensions
- API Rate Limiting: Advanced throttling mechanisms
- Webhook Integration: External service notifications
- Use Modular Handlers: Add new functionality in separate handler modules
- Leverage Constants: All strings and configuration in `constants/definitions.py`
- Professional Logging: Use `api_logger` instead of print statements
- Follow Patterns: Maintain consistency with existing code structure
- Test Coverage: Include comprehensive tests for new features
The Skills subsystem adds first-class MCP Resources support to FAC. Skills are markdown knowledge documents stored in the FAC Skill DocType and surfaced to MCP clients via resources/list + resources/read. They can be authored by users through the FAC Admin UI or shipped by external Frappe apps via an assistant_skills hook.
For end-user and developer-facing docs see the Skills User Guide and the Skills Developer Guide. This section focuses on the technical implementation.
Skills live in the FAC Skill DocType (fac_skill.json). Notable columns:
- `skill_id` (Data, unique) - matches `^[a-z0-9_-]+$`; used as the URI suffix in `fac://skills/<skill_id>`.
- `content` (Markdown Editor) - the full markdown body returned by `resources/read`.
- `status` (Select) - `Draft` / `Published` / `Deprecated`.
- `visibility` (Select) - `Private` / `Shared` / `Public`.
- `skill_type` (Select) - `Tool Usage` / `Workflow`.
- `linked_tool` (Data) - MCP tool name; only meaningful for Tool Usage skills.
- `is_system` (Check, read-only) / `source_app` (Data, read-only) - set by the installation path for hook-installed skills.
- `owner_user` (Link → User) - row owner; default `__user`.
- `shared_with_roles` (Table MultiSelect → Has Role) - populated when `visibility="Shared"`.
- `use_count` (Int), `last_used` (Datetime) - analytics, bumped inside `SkillManager.read_skill_content`.
DocType-level validation is in fac_skill.py: skill_id regex + uniqueness, shared_with_roles required when Shared, and a system-skill deletion guard that only admits deletes from the hook loader (via flags.allow_system_delete) or when the owning source_app is no longer installed.
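The slug rule and URI shape combine into a simple parser. A sketch, with the regex taken from the DocType validation above and the helper name hypothetical:

```python
import re

SKILL_SLUG_RE = re.compile(r"^[a-z0-9_-]+$")
URI_PREFIX = "fac://skills/"

def parse_skill_uri(uri: str) -> str:
    """Return the skill_id from a fac://skills/<skill_id> URI, or raise ValueError."""
    if not uri.startswith(URI_PREFIX):
        raise ValueError(f"not a skill URI: {uri!r}")
    slug = uri[len(URI_PREFIX):]
    if not SKILL_SLUG_RE.match(slug):
        raise ValueError(f"invalid skill_id: {slug!r}")
    return slug
```

Validating the slug against the same regex used at save time means a well-formed URI can never address a row whose `skill_id` would have been rejected by the DocType.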
SkillManager is a stateless helper — one instance per request, no shared state. Key methods:
- `get_user_accessible_skills(user=None)` - returns skills visible to the user. A single `or_filters` query fetches owner + Public + system skills; a separate SQL query joins `Has Role` for Shared skills. Results are deduplicated by `skill_id`.
- `get_skill_as_resource(skill_info)` - converts a skill row to an MCP resource descriptor (`uri`, `name`, `description`, `mimeType`).
- `read_skill_content(skill_id)` - resolves the skill, enforces the permission model (Drafts only readable by owner; Published subject to visibility + role matching), and increments the usage counter before returning the markdown. Raises `frappe.PermissionError` on deny; returns `None` for missing skills.
- `get_skill_by_tool(tool_name)` - finds a Published skill linked to a tool name that the caller can see.
- `get_tool_skill_map()` - returns `{tool_name: {description, skill_id}}` for every Published Tool Usage skill that has a `linked_tool`. Drives the `replace` skill mode.
- `increment_usage(skill_name)` - atomic `UPDATE ... SET use_count = use_count + 1, last_used = NOW()`; errors are logged and swallowed so analytics never break a read.
A module-level get_skill_manager() factory is kept for backwards compatibility.
The MCP server exposes two resource endpoints, both declared in initialize capabilities:
- `resources/list` (resources.py:237) - calls `SkillManager.get_user_accessible_skills()`, filters to `status == "Published"`, and returns each as an MCP resource descriptor. Returns `{"resources": []}` when the `FAC Skill` table doesn't exist yet (e.g. before first migrate).
- `resources/read` (resources.py:256) - validates the URI against `fac://skills/<slug>` and the `^[a-z0-9_-]+$` slug regex, then delegates to `SkillManager.read_skill_content()`. Raises `ValueError` for malformed URIs or missing skills and re-raises `frappe.PermissionError` so the MCP layer can map it to the right JSON-RPC error.
Both handlers are wired into the MCPServer dispatcher at mcp/server.py:201-206 (resources/list, resources/read, and a stub resources/templates/list that returns empty).
tools/list consults Assistant Core Settings.skill_mode:
settings = frappe.get_single("Assistant Core Settings")
if getattr(settings, "skill_mode", "supplementary") == "replace":
    skill_replace_map = get_skill_manager().get_tool_skill_map()
For each registered tool in the response, if the tool name is in the replace map, its description is rewritten to:
<tool_name>: <skill description>. Detailed guidance: fac://skills/<skill_id>
See mcp/server.py:303-334. Tools without a linked Published skill keep their original descriptions regardless of mode.
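The rewrite step can be sketched as follows; the tool and skill dict shapes are assumed from the description above, and the real code in `mcp/server.py` may differ:

```python
def apply_skill_descriptions(tools, skill_replace_map):
    """Rewrite descriptions in replace mode; tools without a skill keep theirs."""
    for tool in tools:
        skill = skill_replace_map.get(tool["name"])
        if skill:
            tool["description"] = (
                f"{tool['name']}: {skill['description']}. "
                f"Detailed guidance: fac://skills/{skill['skill_id']}"
            )
    return tools
```

Embedding the `fac://skills/<skill_id>` URI in the description lets the client follow up with a `resources/read` call for the full guidance document.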
The assistant_skills hook is processed during after_migrate by _install_app_skills (migration_hooks.py:660-812). For each entry:
- Resolve
manifestandcontent_diragainstfrappe.get_app_path(app). - Parse the manifest JSON.
- Delete every existing
FAC Skillwithis_system=1, source_app=<app>whoseskill_idis no longer in the manifest. Usesflags.allow_system_deleteto bypass the system-skill deletion guard. - For each manifest entry: read the markdown from
content_dir/<content_file>, then upsert. On create, fields are set from the manifest with these defaults:status=Published,visibility=Public,skill_type=Workflow(note: differs from the UI default of Tool Usage),owner_user=Administrator,is_system=1. On update, only manifest-owned fields (title,description,status,visibility,skill_type,linked_tool,category,content,source_app) are re-synced;owner_userandis_systemare preserved.
Uninstall cleanup is handled by before_app_uninstall (migration_hooks.py:550-578): every row with source_app=<uninstalling_app> is deleted. FAC itself is skipped to avoid wiping system skills during self-uninstall.
Two layers enforce visibility:
- DocType layer - `get_skill_permission_query_conditions` in `utils/permissions.py:135-180`, wired in via `hooks.py:119`. Constructs a SQL WHERE clause covering owner, Public + Published, System + Published, and Shared + Published (with a `Has Role` EXISTS subquery). System Managers see everything (empty conditions). Other roles are denied (`1=0`).
- MCP read layer - `SkillManager._user_can_access_skill` (resources.py:191-213) runs the same logic in Python when fetching a single skill, with the additional rule that the owner can read their own Drafts.
The two layers agree for Published skills; Drafts are only reachable through the owner path.
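The Python-side check can be sketched as follows; field names come from the FAC Skill DocType described above, the helper name is illustrative, and the real `_user_can_access_skill` may differ (for example, it also special-cases System Managers):

```python
def user_can_access_skill(skill: dict, user: str, user_roles: set) -> bool:
    if skill.get("owner_user") == user:
        return True                          # owners may read their own Drafts
    if skill.get("status") != "Published":
        return False                         # non-owners never see Drafts
    if skill.get("visibility") == "Public" or skill.get("is_system"):
        return True
    if skill.get("visibility") == "Shared":
        return bool(set(skill.get("shared_with_roles", [])) & user_roles)
    return False                             # Private and not the owner
```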
Skill queries use a Redis cache under the "skills" key (site-scoped). The cache is invalidated on:
- `FAC Skill.on_update` - every save (fac_skill.py:56-58)
- `FAC Skill.on_trash` - every delete (fac_skill.py:60-69)
- Admin UI publish/unpublish toggles
This makes skills safe under multi-worker Gunicorn deployments without extra synchronization.
use_count and last_used are updated only on resources/read, not on resources/list. The update is a single atomic SQL statement inside SkillManager.increment_usage and failures are logged but never surfaced to the caller — analytics never block reads. Use these fields to prune unused skills that are adding noise to resources/list.
- Tool Reference: Complete catalog of all available tools
- Development Guide: How to create custom tools
- Architecture Overview: System design and structure
- API Reference: MCP protocol and API endpoints
- External App Development: Create tools in your apps
- Internal Plugin Development: Create internal plugins
- Test Case Creation Guide: Testing patterns
- GitHub Repository: frappe-assistant-core
- License: AGPL v3 License
- Issues: GitHub Issues for bug reports and feature requests
Last Updated: February 2026 - Version 2.3.1
Architecture: Modular, Modern, Maintainable