Skip to content

Commit 8040d7a

Browse files
authored
fix: Review large PR (#507)
* fix: rename `test-cases.json` to `test_cases.json` across repo for consistency * refactor: set default `ServerMode` to 'default' across functions, enforce explicit handling of empty tool selectors, and freeze tool result objects. Add new tests for mode-specific behavior.
1 parent 2c5c60b commit 8040d7a

File tree

17 files changed

+81
-24
lines changed

17 files changed

+81
-24
lines changed

evals/README.md

Lines changed: 2 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -44,13 +44,13 @@ export OPENROUTER_API_KEY="your_key"
4444
export OPENROUTER_BASE_URL="https://openrouter.ai/api/v1"
4545

4646
npm ci
47-
npm run evals:create-dataset # one-time: creates dataset from test-cases.json
47+
npm run evals:create-dataset # one-time: creates dataset from test_cases.json
4848
npm run evals:run # runs evaluation on default dataset (v1.4)
4949
```
5050

5151
### Using a specific dataset version
5252

53-
By default, the evaluation uses the dataset version from `test-cases.json` (`v1.4`). To use a different dataset:
53+
By default, the evaluation uses the dataset version from `test_cases.json` (`v1.4`). To use a different dataset:
5454

5555
```bash
5656
# Create a new dataset with custom name
@@ -285,4 +285,3 @@ NOTES:
285285
// System prompt - instructions mainly cursor (very similar instructions in copilot)
286286
// https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools/blob/main/Cursor%20Prompts/Agent%20Prompt%20v1.2.txt
287287
// https://github.com/x1xhlol/system-prompts-and-models-of-ai-tools/blob/main/VSCode%20Agent/Prompt.txt
288-

evals/config.ts

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -11,10 +11,10 @@ import log from '@apify/log';
1111
// Re-export shared config
1212
export { OPENROUTER_CONFIG, sanitizeHeaderValue, validateEnvVars, getRequiredEnvVars } from './shared/config.js';
1313

14-
// Read version from test-cases.json
14+
// Read the version from test-cases.json
1515
function getTestCasesVersion(): string {
1616
const dir = dirname(fileURLToPath(import.meta.url));
17-
const raw = readFileSync(join(dir, 'test-cases.json'), 'utf-8');
17+
const raw = readFileSync(join(dir, 'test_cases.json'), 'utf-8');
1818
return JSON.parse(raw).version;
1919
}
2020

@@ -28,7 +28,7 @@ export type EvaluatorName = typeof EVALUATOR_NAMES[keyof typeof EVALUATOR_NAMES]
2828

2929
// Models to evaluate
3030
// 'openai/gpt-4.1-mini', // DO NOT USE - it has much worse performance than gpt-4o-mini and other models
31-
// 'openai/gpt-4o-mini', // Neither used in cursor nor copilot
31+
// 'openai/gpt-4o-mini', // Neither used in cursor nor copilot
3232
// 'openai/gpt-4.1',
3333
export const MODELS_TO_EVALUATE = [
3434
'anthropic/claude-haiku-4.5',

evals/workflows/README.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,7 @@ npm run evals:workflow -- --category search
3232
# Run specific test
3333
npm run evals:workflow -- --id search-google-maps
3434

35-
# Filter by line range in test-cases.json
35+
# Filter by line range in test_cases.json
3636
npm run evals:workflow -- --lines 277-283
3737

3838
# Show detailed conversation logs
@@ -242,7 +242,7 @@ export OPENROUTER_API_KEY="your_openrouter_key" # Get from https://openrouter.ai
242242
| `--id <id>` | | Run specific test by ID | All tests |
243243
| `--lines <range>` | `-l` | Filter by line range in test-cases.json | All tests |
244244
| `--verbose` | | Show detailed conversation logs | `false` |
245-
| `--test-cases-path <path>` | | Custom test cases file path | `test-cases.json` |
245+
| `--test-cases-path <path>` | | Custom test cases file path | `test_cases.json` |
246246
| `--agent-model <model>` | | Override agent model | `anthropic/claude-haiku-4.5` |
247247
| `--judge-model <model>` | | Override judge model | `x-ai/grok-4.1-fast` |
248248
| `--tool-timeout <seconds>` | | Tool call timeout | `60` |
@@ -252,7 +252,7 @@ export OPENROUTER_API_KEY="your_openrouter_key" # Get from https://openrouter.ai
252252

253253
### Line Range Filtering
254254

255-
The `--lines` (or `-l`) option filters test cases by their line numbers in the `test-cases.json` file.
255+
The `--lines` (or `-l`) option filters test cases by their line numbers in the `test_cases.json` file.
256256

257257
**Format options:**
258258
- **Single line:** `--lines 100` (includes tests that contain line 100)

evals/workflows/test_cases_loader.ts

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,7 @@ export type WorkflowTestCaseWithLineNumbers = WorkflowTestCase & TestCaseWithLin
2222
* Load workflow test cases from JSON file with validation
2323
*/
2424
export function loadTestCases(filePath?: string): WorkflowTestCase[] {
25-
const testCasesPath = filePath || path.join(process.cwd(), 'evals/workflows/test-cases.json');
25+
const testCasesPath = filePath || path.join(process.cwd(), 'evals/workflows/test_cases.json');
2626

2727
if (!fs.existsSync(testCasesPath)) {
2828
throw new Error(`Test cases file not found: ${testCasesPath}`);
@@ -89,7 +89,7 @@ export function loadTestCasesWithLineNumbers(filePath?: string): {
8989
testCases: WorkflowTestCaseWithLineNumbers[];
9090
totalLines: number;
9191
} {
92-
const testCasesPath = filePath || path.join(process.cwd(), 'evals/workflows/test-cases.json');
92+
const testCasesPath = filePath || path.join(process.cwd(), 'evals/workflows/test_cases.json');
9393

9494
if (!fs.existsSync(testCasesPath)) {
9595
throw new Error(`Test cases file not found: ${testCasesPath}`);

src/mcp/utils.ts

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -42,8 +42,14 @@ export function getProxyMCPServerToolName(url: string, toolName: string): string
4242
* @param url The URL to process
4343
* @param apifyClient The Apify client instance
4444
* @param mode Server mode for tool variant resolution
45+
* @param actorStore
4546
*/
46-
export async function processParamsGetTools(url: string, apifyClient: ApifyClient, mode: ServerMode, actorStore?: ActorStore) {
47+
export async function processParamsGetTools(
48+
url: string,
49+
apifyClient: ApifyClient,
50+
mode: ServerMode = 'default',
51+
actorStore?: ActorStore,
52+
) {
4753
const input = parseInputParamsFromUrl(url);
4854
return await loadToolsFromInput(input, apifyClient, mode, actorStore);
4955
}

src/resources/resource_service.ts

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -23,12 +23,12 @@ type ResourceService = {
2323

2424
type ResourceServiceOptions = {
2525
skyfireMode?: boolean;
26-
mode: ServerMode;
26+
mode?: ServerMode;
2727
getAvailableWidgets: () => Map<string, AvailableWidget>;
2828
};
2929

3030
export function createResourceService(options: ResourceServiceOptions): ResourceService {
31-
const { skyfireMode, mode, getAvailableWidgets } = options;
31+
const { skyfireMode, mode = 'default', getAvailableWidgets } = options;
3232

3333
const listResources = async (): Promise<ListResourcesResult> => {
3434
const resources: Resource[] = [];

src/tools/categories.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -146,7 +146,7 @@ function resolveCategoryEntries(entries: readonly CategoryToolEntry[], mode: Ser
146146
* @param mode - Required. Use `'default'` or `'openai'`.
147147
* Made explicit (no default value) to prevent accidentally serving wrong-mode tools.
148148
*/
149-
export function getCategoryTools(mode: ServerMode): ToolCategoryMap {
149+
export function getCategoryTools(mode: ServerMode = 'default'): ToolCategoryMap {
150150
return Object.fromEntries(
151151
CATEGORY_NAMES.map((name) => [name, resolveCategoryEntries(toolCategories[name], mode)]),
152152
) as ToolCategoryMap;

src/tools/common/add_actor.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -50,7 +50,7 @@ USAGE EXAMPLES:
5050
const tools = await apifyMcpServer.loadActorsAsTools([parsed.actor], apifyClient);
5151
/**
5252
* If no tools were found, return a message that the Actor was not found
53-
* instead of returning that non existent tool was added since the
53+
* instead of returning that non-existent tool was added since the
5454
* loadActorsAsTools method returns an empty array and does not throw an error.
5555
*/
5656
if (tools.length === 0) {

src/tools/core/call_actor_common.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -52,7 +52,7 @@ For MCP server Actors, use format "actorName:toolName" to call a specific tool (
5252
.describe('The input JSON to pass to the Actor. Required.'),
5353
async: z.boolean()
5454
.optional()
55-
.describe(`When true: starts the run and returns immediately with runId. When false or not provided: waits for completion and returns results immediately. Default: true when UI mode is enabled (enforced), false otherwise. IMPORTANT: Only set async to true if the user explicitly asks to run the Actor in the background or does not need immediate results. When the user asks for data or results, always use async: false (default) so the results are returned immediately.`),
55+
.describe(`When true, starts the run and returns immediately with runId. When false or omitted, behavior depends on the active server mode/tool variant. IMPORTANT: use async=true only when the user explicitly asks to run in the background or does not need immediate results.`),
5656
previewOutput: z.boolean()
5757
.optional()
5858
.describe('When true (default): includes preview items. When false: metadata only (reduces context). Use when fetching fields via get-actor-output.'),

src/tools/index.ts

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@ export { CATEGORY_NAME_SET, CATEGORY_NAMES, getCategoryTools, toolCategories, to
2020
* Returns the tool entries for the default-enabled categories resolved for the given mode.
2121
* Computed here (not in helper file) to avoid module initialization issues.
2222
*/
23-
export function getDefaultTools(mode: ServerMode): ToolEntry[] {
23+
export function getDefaultTools(mode: ServerMode = 'default'): ToolEntry[] {
2424
return getExpectedToolsByCategories(toolCategoriesEnabledByDefault, mode);
2525
}
2626

0 commit comments

Comments
 (0)