
[RFC] Agentic AI Eval Platform: Agent-eval-scheduler plugin Design #2599

@lezzago

Description

Purpose & Motivation

The Agentic AI Eval Platform is built natively on OpenSearch, using OTel Collector for span ingestion and OpenSearch indices as the sole data store. The platform needs an asynchronous evaluation engine that detects newly ingested spans, runs evaluations (LLM-as-a-Judge, RAG metrics, deterministic checks), and writes scores back to OpenSearch — all without manual intervention.

OpenSearch Job Scheduler provides the scheduling infrastructure (interval triggers, distributed locking, job persistence), but it is an SPI framework — it requires a consumer plugin to define job types and execution logic. No existing OpenSearch plugin provides evaluation-specific orchestration. The agent-eval-scheduler plugin fills this gap: it implements the Job Scheduler SPI to schedule two recurring jobs that bridge span ingestion and evaluation execution.

The plugin's core architectural principle is the connection-based evaluation routing model. Each evaluation request is routed to a specific backend (Python Agent Service or ML Commons) using a specific protocol (REST or AG-UI) as determined by the Eval Agent Connection linked in the trigger configuration. There are no global backend or protocol settings — the connection is the single source of truth for how an evaluation is executed.


Key Concepts

  • Eval Agent Connection — A registered connection to an evaluation backend, defining the backend type (PYTHON_AGENT_SERVICE or ML_COMMONS), communication protocol (REST or AGUI), endpoint, and timeout. Stored in the eval_agent_connections index.

  • Eval Search Filter — A saved query configuration that matches spans by criteria (agent name, operation, tags) and links them to evaluators via evaluator assignments. Each assignment pairs an evaluator with a specific Eval Agent Connection, determining the backend and protocol at execution time.

  • Job Document — A document in the eval_job_metrics index representing a single evaluation job. Tracks status (PENDING/RUNNING/COMPLETED/FAILED), priority, retry count, and timing metadata.

  • Evaluator Template — A configuration specifying which OSS library, metric, model, and parameters to use for an LLM-dependent evaluation. Does not specify backend or protocol — those come from the connection.

  • Deterministic Evaluator — An evaluator that runs directly in the plugin without external calls (regex match, JSON validity, exact match, contains). Produces scores with minimal latency.


Architecture Overview

```mermaid
graph TB
    subgraph "OpenSearch Cluster"
        subgraph "Every Data Node"
            PLUGIN[agent-eval-scheduler-plugin]
            TS[Trigger Sweeper<br/><i>polls for new spans</i>]
            JE[Job Executor<br/><i>runs evaluations</i>]
            DET[Deterministic Engine<br/><i>regex, JSON, exact match</i>]
            PLUGIN --> TS
            PLUGIN --> JE
            JE --> DET
        end

        JS[Job Scheduler SPI<br/><i>scheduling + LockService</i>]
        PLUGIN -->|registers via SPI| JS

        subgraph "Indices"
            SPANS[otel-v1-apm-span-*]
            SCORES[eval_scores]
            JOBS[eval_job_metrics]
            CONNS[eval_agent_connections]
            FILTERS[eval_search_filters]
            TEMPLATES[eval_evaluator_templates]
        end
    end

    subgraph "Evaluation Backends"
        PAS[Python Agent Service<br/><i>Strands-based, external</i>]
        MLC[ML Commons Agent Framework<br/><i>in-cluster</i>]
    end

    LLM[LLM Providers<br/><i>Bedrock / OpenAI / Anthropic</i>]

    TS -->|poll new spans| SPANS
    TS -->|read trigger configs| FILTERS
    TS -->|create PENDING jobs| JOBS

    JE -->|pick up PENDING jobs| JOBS
    JE -->|read span data| SPANS
    JE -->|resolve connection| CONNS
    JE -->|load evaluator config| TEMPLATES
    JE -->|write scores| SCORES

    JE -->|REST or AG-UI| PAS
    JE -->|Execute Agent API or AG-UI| MLC

    PAS --> LLM
    MLC --> LLM
```

The plugin runs on every data node. Job Scheduler's LockService ensures each job is executed by exactly one node — no external coordination required.


How It Works

1. Set up connections. An operator registers one or more Eval Agent Connections via the plugin's REST API (/_plugins/_eval/connections). Each connection defines a backend type, protocol, endpoint, and timeout. For example: a Python Agent Service over REST at http://eval-agent:8080, or an ML Commons agent via AG-UI with agent ID agent-abc123.
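A connection registration request might look like the following. The field names come from the Eval Agent Connection data model below; the exact request shape is illustrative, not the final API contract.

```json
{
  "name": "python-eval-rest",
  "backendType": "PYTHON_AGENT_SERVICE",
  "protocol": "REST",
  "endpoint": "http://eval-agent:8080",
  "timeoutMs": 30000
}
```

Posted to /_plugins/_eval/connections, this would register a Python Agent Service backend reachable over synchronous REST.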

2. Create evaluation triggers. The operator creates Eval Search Filters (/_plugins/_eval/search-filters) that define which spans to evaluate and how. Each filter specifies span matching criteria (e.g., gen_ai.agent.name = "my-agent"), an evaluation mode (online or offline), and a list of evaluator assignments. Each assignment pairs an evaluator template with a specific connection — different evaluators in the same filter can use different backends.
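A filter body might look like the sketch below, using the field names from the Eval Search Filter data model. The evaluator and connection IDs are hypothetical; note that the two assignments route through different connections, illustrating per-evaluator backend routing.

```json
{
  "name": "my-agent-quality",
  "evaluationMode": "ONLINE",
  "spanMatchCriteria": { "agentName": "my-agent" },
  "evaluatorAssignments": [
    { "evaluatorId": "faithfulness-judge", "connectionId": "conn-python-rest" },
    { "evaluatorId": "answer-relevance",   "connectionId": "conn-mlcommons-agui" }
  ]
}
```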

3. Trigger Sweeper detects spans. The Trigger Sweeper runs on a configurable interval (default 5s). It loads all active Eval Search Filters, queries otel-v1-apm-span-* for root spans indexed since each filter's lastSweepTime watermark, and creates PENDING Job Documents for each span × evaluator match. Online jobs get priority: HIGH; offline jobs get priority: NORMAL. Before creating a job, the sweeper checks for existing targetSpanId + evaluatorId combinations to prevent duplicates.
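The duplicate check can be sketched as follows. This is an in-memory illustration with made-up names; the actual plugin would query eval_job_metrics for existing targetSpanId + evaluatorId pairs rather than hold a Set.

```java
import java.util.*;

public class SweepDedup {
    /**
     * Given the set of already-seen "spanId|evaluatorId" keys and a list of
     * candidate {targetSpanId, evaluatorId} pairs, return only the pairs
     * that need a new PENDING job. Duplicates are skipped regardless of
     * the existing job's status.
     */
    public static List<String[]> newJobs(Set<String> existingKeys,
                                         List<String[]> candidates) {
        List<String[]> fresh = new ArrayList<>();
        for (String[] c : candidates) {      // c = {targetSpanId, evaluatorId}
            String key = c[0] + "|" + c[1];
            if (existingKeys.add(key)) {     // add() returns false if already present
                fresh.add(c);
            }
        }
        return fresh;
    }
}
```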

4. Job Executor processes jobs. The Job Executor runs on a configurable interval (default 2s). It queries for PENDING jobs ordered by priority (HIGH before NORMAL before LOW), acquires a distributed lock per job, and executes the evaluation. For LLM-dependent evaluators, it resolves the Eval Agent Connection from the job, selects the matching backend implementation, and calls it using the connection's protocol. For deterministic evaluators, it runs the check in-plugin (no external call). On success, it writes scores to eval_scores and marks the job COMPLETED. On failure, it retries with exponential backoff (2^retryCount × 1000ms) up to a configurable max.
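The backoff formula above (2^retryCount × 1000ms) can be written as a small helper. Class and method names are illustrative, not taken from the plugin source.

```java
public class EvalRetryBackoff {
    static final long BASE_MS = 1000L;

    /** Milliseconds to wait before the next attempt: 2^retryCount * 1000ms. */
    public static long nextDelayMs(int retryCount) {
        return (1L << retryCount) * BASE_MS;
    }

    /** Epoch millis at which the job becomes eligible for pickup again. */
    public static long nextEligibleTime(long nowMs, int retryCount) {
        return nowMs + nextDelayMs(retryCount);
    }
}
```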

5. Deterministic evaluators skip the network. Simple checks — exact match, regex, JSON validity, contains — execute directly in the Java plugin. No backend call, no LLM invocation. This keeps latency minimal for straightforward quality gates.
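Three of the four deterministic checks can be sketched directly; JSON validity is omitted here since it would use a JSON parser (e.g., Jackson) rather than plain string operations. Names are illustrative, not the plugin's actual evaluator classes.

```java
import java.util.regex.Pattern;

public class DeterministicChecks {
    /** Exact string equality between actual output and expected output. */
    public static boolean exactMatch(String actual, String expected) {
        return actual.equals(expected);
    }

    /** Substring containment check. */
    public static boolean contains(String actual, String needle) {
        return actual.contains(needle);
    }

    /** True if the pattern matches anywhere in the actual output. */
    public static boolean regexMatch(String actual, String pattern) {
        return Pattern.compile(pattern).matcher(actual).find();
    }
}
```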


Evaluation Backends

The plugin uses an EvaluationBackend abstraction with two implementations, making backends interchangeable from the Job Executor's perspective.

Python Agent Service — An external Python service hosting a Strands-based eval agent. It invokes OSS evaluation libraries (Strands Eval, DeepEval, Ragas) and manages LLM provider connections. The plugin communicates with it over REST (synchronous HTTP) or AG-UI (streaming events), depending on the connection's protocol setting.

ML Commons Agent Framework — OpenSearch's native ML framework, running in-cluster. The plugin sends evaluation requests via the Execute Agent API (_plugins/_ml/agents/{agent_id}/_execute) or opens an AG-UI stream. No external service deployment required.

Both backends support both protocols. The protocol is a property of the connection, not a global setting — an operator can have one connection using REST and another using AG-UI, even to the same backend type. The Job Executor calls evaluateRest() or evaluateAgui() based on the resolved connection, and both return the same normalized score format.
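The interface described above might look like the following sketch. The type and method names are assumptions based on this RFC (the source names evaluateRest() and evaluateAgui()); the real signatures will be defined during implementation.

```java
public class BackendSketch {
    enum Protocol { REST, AGUI }

    /** Normalized score shape returned by every backend/protocol combination. */
    record EvalScore(String evaluatorId, double score) {}

    interface EvaluationBackend {
        EvalScore evaluateRest(String evaluatorId, String spanJson);
        EvalScore evaluateAgui(String evaluatorId, String spanJson);

        /** Dispatch on the protocol resolved from the Eval Agent Connection. */
        default EvalScore evaluate(Protocol protocol, String evaluatorId, String spanJson) {
            return protocol == Protocol.REST
                    ? evaluateRest(evaluatorId, spanJson)
                    : evaluateAgui(evaluatorId, spanJson);
        }
    }
}
```

The Job Executor only depends on the interface; adding a third backend would not change the execution loop.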


Data Models

The plugin manages three core data models, each stored in its own OpenSearch index.

Entity Relationships

```mermaid
erDiagram
    EVAL_SEARCH_FILTER ||--o{ EVALUATOR_ASSIGNMENT : contains
    EVALUATOR_ASSIGNMENT }o--|| EVAL_AGENT_CONNECTION : "routes via"
    EVALUATOR_ASSIGNMENT }o--|| EVALUATOR_TEMPLATE : references
    EVAL_SEARCH_FILTER ||--o{ JOB_DOCUMENT : "triggers creation of"
    JOB_DOCUMENT }o--|| EVAL_AGENT_CONNECTION : "resolved at execution"

    EVAL_SEARCH_FILTER {
        keyword id
        text name
        keyword evaluationMode
        object spanMatchCriteria
        nested evaluatorAssignments
        date lastSweepTime
    }

    EVAL_AGENT_CONNECTION {
        keyword id
        text name
        keyword backendType
        keyword protocol
        keyword endpoint
        integer timeoutMs
        keyword status
    }

    JOB_DOCUMENT {
        keyword jobId
        keyword jobType
        keyword status
        integer priority
        keyword evaluatorId
        keyword connectionId
        keyword targetSpanId
        integer retryCount
        date nextEligibleTime
    }

    EVALUATOR_TEMPLATE {
        keyword id
        keyword library
        keyword metric
        object modelConfig
    }
```

Eval Agent Connection (eval_agent_connections)

Represents a registered connection to an evaluation backend.

| Field | Description |
| --- | --- |
| id | Unique identifier |
| name | Human-readable connection name |
| backendType | PYTHON_AGENT_SERVICE or ML_COMMONS |
| protocol | REST or AGUI |
| endpoint | For Python Agent Service: HTTP URL (e.g., http://eval-agent:8080). For ML Commons: agent ID (e.g., agent-abc123) |
| timeoutMs | Request timeout in milliseconds |
| status | ACTIVE or INACTIVE — only active connections can be used for new jobs |

Eval Search Filter (eval_search_filters)

Defines which spans to evaluate and how, linking span criteria to evaluators via connections.

| Field | Description |
| --- | --- |
| id | Unique identifier |
| name | Human-readable filter name |
| evaluationMode | ONLINE (real-time) or OFFLINE (batch/experiment) |
| spanMatchCriteria | Query criteria: agent name, operation name, tags, custom filters |
| evaluatorAssignments | List of { evaluatorId, connectionId } pairs — each assignment routes an evaluator through a specific connection |
| lastSweepTime | Per-filter watermark tracking the most recent sweep position |

Job Document (eval_job_metrics)

Represents a single evaluation job created by the Trigger Sweeper and processed by the Job Executor.

| Field | Description |
| --- | --- |
| jobId | Unique identifier |
| jobType | online_agent_trace_eval, offline_agent_trace_eval_item, offline_agent_trace_eval_run, or annotation_lock_release |
| status | Current state: PENDING, RUNNING, COMPLETED, or FAILED |
| priority | HIGH (3) for online, NORMAL (2) for offline, LOW (1) for annotation lock release |
| evaluatorId | Reference to the evaluator template |
| connectionId | Reference to the Eval Agent Connection used for execution |
| targetSpanId | The span being evaluated |
| retryCount | Number of retry attempts so far |
| nextEligibleTime | Earliest time the job can be picked up (supports exponential backoff) |

Job Status Lifecycle

```mermaid
stateDiagram-v2
    [*] --> PENDING: Trigger Sweeper creates job
    PENDING --> RUNNING: Job Executor acquires lock
    RUNNING --> COMPLETED: Evaluation succeeds
    RUNNING --> FAILED: Max retries exceeded
    RUNNING --> PENDING: Transient failure, retry eligible
    note right of PENDING: nextEligibleTime = now + 2^retryCount × 1000ms
    FAILED --> [*]
    COMPLETED --> [*]
```

Plugin Name Suggestions

| Name | Rationale |
| --- | --- |
| agent-eval-scheduler-plugin (current) | Descriptive, clear purpose. Slightly generic — "scheduler" understates the evaluation execution responsibility. |
| eval-engine-plugin | Emphasizes that the plugin is the evaluation execution engine, not just a scheduler. Captures both orchestration and execution. |
| eval-orchestrator-plugin | Highlights the orchestration role (sweep → create jobs → delegate → write scores). Aligns with the multi-step pipeline nature. |
| eval-worker-plugin | Emphasizes the async worker pattern. Common in job-processing systems. |
| eval-pipeline-plugin | Captures the sweep → detect → evaluate → score pipeline. May conflict with OpenSearch's existing "pipeline" concept (ingest pipelines). |
| eval-runner-plugin | Simple, action-oriented. "Runner" aligns with Job Scheduler's ScheduledJobRunner SPI interface. |

Recommendation: eval-engine-plugin — it best captures the dual responsibility of job orchestration and evaluation execution, and distinguishes the plugin from a pure scheduling wrapper.


Key Design Decisions

  1. Job Scheduler SPI as infrastructure — The plugin does not implement its own scheduling, locking, or job persistence. It implements the three SPI interfaces (JobSchedulerExtension, ScheduledJobParameter, ScheduledJobRunner) and lets Job Scheduler handle the rest.

  2. EvaluationBackend abstraction — A common interface with two implementations (Python Agent Service, ML Commons), each supporting both REST and AG-UI protocols. New backends can be added without changing core job execution logic.

  3. Priority-based job pickup — Jobs are queried by priority DESC, createdAt ASC, ensuring online evaluations (HIGH) always execute before offline batch work (NORMAL) and annotation lock releases (LOW).

  4. Deduplication at creation time — The Trigger Sweeper checks for existing targetSpanId + evaluatorId combinations before creating jobs, preventing duplicate evaluations regardless of job status.

  5. Deterministic evaluators run in-plugin — Simple evaluations (regex, JSON validity, exact match, contains) execute directly in Java without external service calls, minimizing latency.

  6. Horizontal scaling via LockService — Every data node runs the plugin. Job Scheduler's distributed LockService ensures exactly-once execution per job across the cluster. Throughput scales linearly with cluster size.
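The priority-based pickup in decision 3 can be expressed as an OpenSearch query against eval_job_metrics. This is a sketch using the field names from the Job Document model; the plugin's actual query may differ.

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "status": "PENDING" } },
        { "range": { "nextEligibleTime": { "lte": "now" } } }
      ]
    }
  },
  "sort": [
    { "priority":  { "order": "desc" } },
    { "createdAt": { "order": "asc" } }
  ]
}
```

The range filter on nextEligibleTime is what makes exponential backoff work: a retried job simply stays invisible to the executor until its backoff window elapses.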


Configuration

All scheduling behavior is configurable via opensearch.yml under the eval.scheduler.* namespace, including sweep intervals, executor batch size, lock TTL, retry limits, and per-type concurrency limits. Sensible defaults are provided (e.g., 5s sweep interval, 2s executor interval, 3 max retries). Invalid values are rejected at startup with descriptive errors.
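A hypothetical opensearch.yml fragment is shown below. Only the 5s sweep interval, 2s executor interval, and 3 max retries come from this RFC; the setting names and the remaining values are placeholders to be finalized during implementation.

```yaml
# Illustrative settings only -- names are not final
eval.scheduler.sweep_interval: 5s        # Trigger Sweeper poll interval (default per RFC)
eval.scheduler.executor_interval: 2s     # Job Executor poll interval (default per RFC)
eval.scheduler.max_retries: 3            # Max retry attempts (default per RFC)
eval.scheduler.executor_batch_size: 50   # Placeholder value
eval.scheduler.lock_ttl: 60s             # Placeholder value
```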


FAQ - New Repository Proposal: agent-eval-scheduler-plugin

This section proposes creating a new repository under the opensearch-project GitHub organization to host the plugin described in this RFC. The content below addresses the questions from the opensearch-project proposal template.

What are you proposing?

A new repository under opensearch-project for the agent-eval-scheduler-plugin — the Java-based OpenSearch plugin described in this RFC. The plugin implements asynchronous evaluation job orchestration for the Agentic AI Eval Platform by consuming the OpenSearch Job Scheduler SPI. It is a core backend component of the platform described in the high-level design RFC (dashboards-observability#2592).

What users have asked for this feature?

  • The Agentic AI Eval Platform high-level design RFC identifies the need for an async evaluation engine that runs natively within OpenSearch.
  • The observability community has expressed interest in LLM evaluation capabilities integrated into the OpenSearch ecosystem — specifically for scoring agent traces using LLM-as-a-Judge, RAG metrics, and deterministic checks.
  • Existing evaluation tools (DeepEval, Ragas, Strands Eval) run externally and require separate infrastructure. Users want evaluation orchestration that leverages OpenSearch's native scheduling, indexing, and distributed execution capabilities.

What problems are you trying to solve?

When agent traces are ingested into OpenSearch via OTel Collector, there is no automated way to evaluate their quality (correctness, faithfulness, relevance) without external orchestration. OpenSearch Job Scheduler provides scheduling infrastructure but is an SPI framework — it requires a consumer plugin to define job types and execution logic. No existing OpenSearch plugin provides evaluation-specific orchestration.

When new agent traces are ingested into OpenSearch, a platform operator wants to automatically detect and evaluate those traces using configured evaluators and backends, so they get quality scores written back to OpenSearch without manual intervention or external orchestration.

What is the developer experience going to be?

The plugin exposes REST APIs under the /_plugins/_eval/ namespace:

  • /_plugins/_eval/connections — CRUD for Eval Agent Connections (register evaluation backends with backend type, protocol, endpoint, timeout)
  • /_plugins/_eval/search-filters — CRUD for Eval Search Filters (configure span matching criteria and evaluator-to-connection assignments)

No changes to existing OpenSearch APIs. The plugin depends on the Job Scheduler plugin (existing SPI dependency). All configuration is via opensearch.yml under the eval.scheduler.* namespace. See the Architecture Overview and Data Models sections above for full details.

Security considerations

  • The plugin integrates with OpenSearch's security plugin for index-level access control. All plugin indices (eval_job_metrics, eval_agent_connections, eval_search_filters) are subject to standard OpenSearch security policies.
  • Eval Agent Connection endpoints (external URLs, agent IDs) are stored as configuration — operators control which backends are reachable.
  • No new authentication mechanisms are introduced; the plugin relies on OpenSearch's existing security model.

Breaking changes to the API

None. This is a new plugin with new REST endpoints. No existing OpenSearch APIs are modified.

What is the user experience going to be?

  1. Operator installs the plugin alongside Job Scheduler.
  2. Operator registers Eval Agent Connections via REST API (e.g., a Python Agent Service over REST, or an ML Commons agent via AG-UI).
  3. Operator creates Eval Search Filters that define which spans to evaluate and which evaluator + connection to use.
  4. The plugin automatically detects new spans, creates evaluation jobs, executes them via the configured backends, and writes scores to eval_scores.
  5. Operators monitor job status and metrics via the eval_job_metrics index.

No breaking changes to existing user experience. The plugin is entirely additive.

Why should it be built? Any reason not to?

Why build it:

  • The Agentic AI Eval Platform needs an async evaluation engine that runs natively within OpenSearch. Without this plugin, evaluation orchestration would require external infrastructure (Airflow, Step Functions, custom cron jobs), adding operational complexity.
  • The plugin leverages Job Scheduler's existing distributed locking and scheduling infrastructure, avoiding reinventing these capabilities.
  • It enables horizontal scaling — the plugin runs on every data node, and Job Scheduler's LockService ensures exactly-once execution per job. Throughput scales linearly with cluster size.
  • The connection-based architecture allows operators to manage multiple evaluation backends independently, each with its own protocol and endpoint configuration.

Why a separate repository:

  • The plugin is a standalone OpenSearch server-side plugin (Java) with its own build lifecycle, release cadence, and SPI dependency on Job Scheduler.
  • It does not belong in OpenSearch Core — it is domain-specific evaluation orchestration, not core search/indexing functionality.
  • It does not belong in dashboards-observability — that repository hosts OpenSearch Dashboards UI components, not backend OpenSearch plugins.
  • It follows the same pattern as other standalone OpenSearch plugins in the organization (e.g., job-scheduler, anomaly-detection, index-management).

Potential concern: The plugin introduces new OpenSearch indices and REST endpoints. However, these are isolated to the eval.* namespace and do not affect existing functionality.

What will it take to execute?

  • The plugin will be bootstrapped from the opensearch-plugin-template-java.
  • Dependencies: OpenSearch Core (build dependency), Job Scheduler plugin (SPI runtime dependency).
  • Key components: EvalSchedulerExtension (plugin entry point), EvalTriggerSweeper, EvalJobExecutor, EvaluationBackend interface with PythonAgentServiceBackend and MLCommonsBackend implementations, DeterministicEvaluatorEngine, REST API handlers for connections and search filters.
  • Testing: Property-based tests using jqwik for correctness properties, unit tests for edge cases and error handling, integration tests with embedded OpenSearch cluster.
  • License: Apache License 2.0. No third-party dependencies that are incompatible with Apache-2.0.
  • Publication targets: Maven Snapshots / Sonatype Nexus, Maven Central.
  • Initial maintainers: [To be confirmed — list proposed maintainers here]

Any remaining open questions?

  • Final plugin name: The working name is agent-eval-scheduler-plugin, but alternatives like eval-engine-plugin or eval-orchestrator-plugin may better capture the plugin's dual responsibility. Community input is welcome — see the Plugin Name Suggestions section above.
  • AG-UI protocol specification: The plugin supports AG-UI as a streaming communication protocol alongside REST. The AG-UI integration details will be finalized as the protocol matures.
  • ML Commons Agent Framework integration: The exact request/response format for the Execute Agent API integration will be finalized during implementation.
  • Release coordination: The plugin is a backend component of the broader Agentic AI Eval Platform. The UI components live in dashboards-observability. Coordination on release cadence and compatibility will be needed.
