This repository was archived by the owner on May 20, 2026. It is now read-only.

Commit be24308

Refactor thinking and effort control: per-request opt-in (#4515)
* Refactor thinking and effort control: make per-request opt-in via enableThinking and reasoningEffort
  - Add reasoning_effort to IChatModelCapabilities from CAPI model list
  - Add supportsReasoningEffort on ChatEndpoint/IChatEndpoint
  - Add enableThinking and reasoningEffort to IMakeChatRequestOptions
  - Build configurationSchema on VS Code LM API models for model picker effort dropdown
  - Remove disableThinking, AnthropicThinkingEffort, ResponsesApiReasoningEffort configs
  - Thinking is off by default; callers opt in with enableThinking: true
  - Agent mode (toolCallingLoop): enables thinking, passes reasoningEffort from modelConfiguration
  - ResponsesProxy / MessagesProxy: enables thinking
  - Inline chat, utility requests, LM wrapper: thinking off (default)
  - Effort level driven by configurationSchema in model picker (no default, user must choose)
  - BYOK Anthropic provider reads effort from options.modelConfiguration
* refactor: Improve reasoningEffort handling across multiple components
* Fix tests: add enableThinking: true to Agent location tests, restore maxThinkingBudget cap
* Add defaultReasoningEffort, thread enableThinking/reasoningEffort to subagent loops and proxy endpoints
  - Add defaultReasoningEffort to IChatEndpoint (computed per model family: high for Anthropic/Gemini, medium for OpenAI)
  - Use defaultReasoningEffort as fallback in responsesApi, messagesApi, and configurationSchema
  - Delegate supportsReasoningEffort/defaultReasoningEffort in pass-through endpoints
  - Thread enableThinking/reasoningEffort through execution and search subagent loops
  - Add enableThinking: true to oaiLanguageModelServer and claudeLanguageModelServer
  - Restore maxThinkingBudget cap in customizeCapiBody
* refactor: Adjust thinking budget calculation to use endpoint's maxThinkingBudget
* Address PR feedback: fix comment, validate effort, remove defaultReasoningEffort
  - Fix misleading comment in messagesApi (thinking gated by enableThinking, not reasoningEffort)
  - Validate reasoningEffort against known values before sending to Messages API
  - Remove defaultReasoningEffort from IChatEndpoint and ChatEndpoint
  - Compute picker default locally in buildConfigurationSchema (UI concern only)
  - Remove effort fallbacks from messagesApi and responsesApi (pure caller control)
* Address PR feedback round 2: validate effort, conditional schema default, location-gated thinking in fetch
  - Validate reasoningEffort against known values in messagesApi before sending
  - Fix comment to reflect enableThinking gating (not reasoningEffort)
  - Remove defaultReasoningEffort from endpoint (picker default is UI-only concern)
  - Compute picker default locally in buildConfigurationSchema
  - Gate thinking by location in DefaultToolCallingLoop.fetch() (Agent/MessagesProxy only)
  - Remove enableThinking from IToolCallingLoopOptions (decision made at fetch level)
  - Validate effort in BYOK anthropicProvider
* refactor: Enable effort picker only for Claude and GPT models in configuration schema
1 parent 4c7aeaf commit be24308

18 files changed

Lines changed: 140 additions & 88 deletions

package.json

Lines changed: 0 additions & 29 deletions
@@ -3224,19 +3224,6 @@
         "onExp"
     ]
 },
-"github.copilot.chat.anthropic.thinking.effort": {
-    "type": "string",
-    "markdownDescription": "%github.copilot.config.anthropic.thinking.effort%",
-    "enum": [
-        "low",
-        "medium",
-        "high"
-    ],
-    "default": "high",
-    "tags": [
-        "preview"
-    ]
-},
 "github.copilot.chat.anthropic.thinking.forceExtendedThinking": {
     "type": "boolean",
     "markdownDescription": "%github.copilot.config.anthropic.thinking.forceExtendedThinking%",
@@ -3790,22 +3777,6 @@
         "split"
     ]
 },
-"github.copilot.chat.responsesApiReasoningEffort": {
-    "type": "string",
-    "default": "default",
-    "markdownDescription": "%github.copilot.config.responsesApiReasoningEffort%",
-    "tags": [
-        "experimental",
-        "onExp"
-    ],
-    "enum": [
-        "low",
-        "medium",
-        "high",
-        "xhigh",
-        "default"
-    ]
-},
 "github.copilot.chat.responsesApiReasoningSummary": {
     "type": "string",
     "default": "detailed",

package.nls.json

Lines changed: 0 additions & 2 deletions
@@ -343,12 +343,10 @@
 "github.copilot.config.anthropic.toolSearchTool.enabled": "Enable tool search tool for Anthropic models. When enabled, tools are dynamically discovered and loaded on-demand using natural language search, reducing context window usage when many tools are available.",
 "github.copilot.config.anthropic.toolSearchTool.mode": "Controls how tool search works for Anthropic models. 'server' uses Anthropic's built-in regex-based tool search. 'client' uses local embeddings-based semantic search for more accurate tool discovery.",
 "github.copilot.config.useResponsesApi": "Use the Responses API instead of the Chat Completions API when supported. Enables reasoning and reasoning summaries.\n\n**Note**: This is an experimental feature that is not yet activated for all users.\n\n**Important**: URL API path resolution for custom OpenAI-compatible and Azure models is independent of this setting and fully determined by `url` property of `#github.copilot.chat.customOAIModels#` or `#github.copilot.chat.azureModels#` respectively.",
-"github.copilot.config.responsesApiReasoningEffort": "Sets the reasoning effort used for the Responses API. Requires `#github.copilot.chat.useResponsesApi#`.",
 "github.copilot.config.responsesApiReasoningSummary": "Sets the reasoning summary style used for the Responses API. Requires `#github.copilot.chat.useResponsesApi#`.",
 "github.copilot.config.responsesApiContextManagement.enabled": "Enables context management for the Responses API. Requires `#github.copilot.chat.useResponsesApi#`.",
 "github.copilot.config.updated53CodexPrompt.enabled": "Enables the updated prompt for gpt-5.3-codex model.",
 "github.copilot.config.anthropic.thinking.budgetTokens": "Maximum number of tokens to allocate for extended thinking in Anthropic models. Setting this value enables extended thinking. Valid range is `1,024` to `max_tokens-1`.",
-"github.copilot.config.anthropic.thinking.effort": "Controls how much thinking Claude does for models that support adaptive thinking. `high` (default) provides deep reasoning, `medium` offers a balance of speed and quality, `low` minimizes thinking for simpler tasks.",
 "github.copilot.config.anthropic.thinking.forceExtendedThinking": "Force extended thinking for models that support adaptive thinking (e.g., Sonnet 4.6, Opus 4.6). When enabled, uses explicit token budgets instead of adaptive thinking.",
 "github.copilot.config.anthropic.promptCaching.extendedTtl": "Enable extended prompt cache TTL for Anthropic models.",
 "github.copilot.config.anthropic.tools.websearch.enabled": "Enable Anthropic's native web search tool for BYOK Claude models. When enabled, allows Claude to search the web for current information. \n\n**Note**: This is an experimental feature only available for BYOK Anthropic Claude models.",

src/extension/byok/node/test/openAIEndpoint.spec.ts

Lines changed: 0 additions & 2 deletions
@@ -141,7 +141,6 @@ describe('OpenAIEndpoint - Reasoning Properties', () => {
 
     describe('Responses API mode (useResponsesApi = true)', () => {
         it('should preserve reasoning object when thinking is supported', () => {
-            accessor.get(IConfigurationService).setConfig(ConfigKey.ResponsesApiReasoningEffort, 'medium');
             accessor.get(IConfigurationService).setConfig(ConfigKey.ResponsesApiReasoningSummary, 'detailed');
             const endpoint = instaService.createInstance(OpenAIEndpoint,
                 modelMetadata,
@@ -171,7 +170,6 @@ describe('OpenAIEndpoint - Reasoning Properties', () => {
                 }
             };
 
-            accessor.get(IConfigurationService).setConfig(ConfigKey.ResponsesApiReasoningEffort, 'medium');
             accessor.get(IConfigurationService).setConfig(ConfigKey.ResponsesApiReasoningSummary, 'detailed');
             const endpoint = instaService.createInstance(OpenAIEndpoint,
                 modelWithoutThinking,

src/extension/byok/vscode-node/anthropicProvider.ts

Lines changed: 3 additions & 2 deletions
@@ -263,8 +263,9 @@ export class AnthropicLMProvider extends AbstractLanguageModelChatProvider {
             betas.push('advanced-tool-use-2025-11-20');
         }
 
-        const effort = supportsAdaptiveThinking
-            ? this._configurationService.getConfig(ConfigKey.AnthropicThinkingEffort)
+        const rawEffort = options.modelConfiguration?.reasoningEffort;
+        const effort = supportsAdaptiveThinking && typeof rawEffort === 'string'
+            ? rawEffort as 'low' | 'medium' | 'high'
             : undefined;
 
         const params: Anthropic.Beta.Messages.MessageCreateParamsStreaming = {
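
The commit message mentions validating effort in the BYOK provider, while the hunk above only checks that the value is a string before casting it. A minimal sketch of a stricter narrowing, assuming the three levels named in the cast are the full accepted set; `KNOWN_EFFORTS`, `parseEffort`, `resolveEffort`, and the `ModelConfiguration` shape are hypothetical names for illustration, not part of the commit:

```typescript
// Hypothetical helper names; the accepted levels are taken from the cast in the diff.
const KNOWN_EFFORTS = ['low', 'medium', 'high'] as const;
type Effort = typeof KNOWN_EFFORTS[number];

// Assumed, loosely typed shape of the per-request model configuration.
interface ModelConfiguration {
    reasoningEffort?: unknown;
}

// Returns the value only if it is one of the known effort levels.
function parseEffort(raw: unknown): Effort | undefined {
    return typeof raw === 'string' && (KNOWN_EFFORTS as readonly string[]).includes(raw)
        ? (raw as Effort)
        : undefined;
}

// Effort is only meaningful when the model supports adaptive thinking.
function resolveEffort(supportsAdaptiveThinking: boolean, config?: ModelConfiguration): Effort | undefined {
    return supportsAdaptiveThinking ? parseEffort(config?.reasoningEffort) : undefined;
}
```

With this shape, an unexpected string such as `'xhigh'` resolves to `undefined` instead of being forwarded to the API under a cast.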

src/extension/chatSessions/claude/node/claudeLanguageModelServer.ts

Lines changed: 5 additions & 0 deletions
@@ -230,6 +230,7 @@ export class ClaudeLanguageModelServer extends Disposable {
                 messages: messagesForLogging,
                 finishedCb: async () => undefined,
                 location: ChatLocation.MessagesProxy,
+                enableThinking: true,
                 userInitiatedRequest: isUserInitiatedMessage
             }, tokenSource.token);
 
@@ -615,6 +616,10 @@ class ClaudeStreamingPassThroughEndpoint implements IChatEndpoint {
         return this.base.maxThinkingBudget;
     }
 
+    public get supportsReasoningEffort(): string[] | undefined {
+        return this.base.supportsReasoningEffort;
+    }
+
     public get supportsToolCalls(): boolean {
         return this.base.supportsToolCalls;
     }

src/extension/conversation/vscode-node/languageModelAccess.ts

Lines changed: 57 additions & 2 deletions
@@ -44,6 +44,50 @@ import { PromptRenderer } from '../../prompts/node/base/promptRenderer';
 import { isImageDataPart } from '../common/languageModelChatMessageHelpers';
 import { LanguageModelAccessPrompt } from './languageModelAccessPrompt';
 
+/**
+ * Builds a configurationSchema for the model picker based on the endpoint's supported capabilities.
+ * Models that support reasoning_effort get a "Thinking Effort" dropdown in the model picker UI.
+ */
+function buildConfigurationSchema(endpoint: IChatEndpoint): { configurationSchema?: vscode.LanguageModelConfigurationSchema } {
+    const effortLevels = endpoint.supportsReasoningEffort;
+    if (!effortLevels || effortLevels.length === 0) {
+        return {};
+    }
+
+    // Only enable effort picker for Claude and GPT models
+    const family = endpoint.family.toLowerCase();
+    if (!family.startsWith('claude') && !family.startsWith('gpt-')) {
+        return {};
+    }
+
+    const preferred = family.startsWith('claude') ? 'high' : 'medium';
+    const defaultEffort = effortLevels.includes(preferred) ? preferred : undefined;
+
+    return {
+        configurationSchema: {
+            properties: {
+                reasoningEffort: {
+                    type: 'string',
+                    title: vscode.l10n.t('Thinking Effort'),
+                    enum: effortLevels,
+                    enumItemLabels: effortLevels.map(level => level.charAt(0).toUpperCase() + level.slice(1)),
+                    enumDescriptions: effortLevels.map(level => {
+                        switch (level) {
+                            case 'none': return vscode.l10n.t('No reasoning applied');
+                            case 'low': return vscode.l10n.t('Faster responses with less reasoning');
+                            case 'medium': return vscode.l10n.t('Balanced reasoning and speed');
+                            case 'high': return vscode.l10n.t('Maximum reasoning depth');
+                            default: return level;
+                        }
+                    }),
+                    default: defaultEffort,
+                    group: 'navigation',
+                }
+            }
+        }
+    };
+}
+
 /**
  * Returns a description of the model's capabilities and intended use cases.
  * This is shown in the rich hover when selecting models.
@@ -291,7 +335,8 @@ export class LanguageModelAccess extends Disposable implements IExtensionContrib
                 capabilities: {
                     imageInput: endpoint instanceof AutoChatEndpoint ? true : endpoint.supportsVision,
                     toolCalling: endpoint.supportsToolCalls,
-                }
+                },
+                ...buildConfigurationSchema(endpoint),
             };
 
             models.push(model);
@@ -566,7 +611,17 @@ export class CopilotLanguageModelWrapper extends Disposable {
         // This links the wrapper's chat span back to the original invoke_agent trace.
         const parentTraceContext = (_options as { modelOptions?: OTelModelOptions }).modelOptions?._otelTraceContext ?? undefined;
 
-        const makeRequest = () => endpoint.makeChatRequest('copilotLanguageModelWrapper', messages, callback, token, ChatLocation.Other, { extensionId }, options, !!extensionId, telemetryProperties);
+        const makeRequest = () => endpoint.makeChatRequest2({
+            debugName: 'copilotLanguageModelWrapper',
+            messages,
+            finishedCb: callback,
+            location: ChatLocation.Other,
+            source: { extensionId },
+            requestOptions: options,
+            userInitiatedRequest: !!extensionId,
+            telemetryProperties,
+            reasoningEffort: typeof _options.modelConfiguration?.reasoningEffort === 'string' ? _options.modelConfiguration.reasoningEffort : undefined,
+        }, token);
 
         // Run request within the parent OTel context (no extra span) so chat spans in chatMLFetcher inherit the agent trace
         const wrappedRequest = parentTraceContext
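
The picker-default rule in `buildConfigurationSchema` (family gate, then a preferred level that is used only if the endpoint advertises it) can be exercised without the `vscode` types. A standalone sketch, assuming a simplified schema shape; `EffortSchema` and `buildEffortSchema` are hypothetical stand-ins, and the family strings in the usage note are illustrative only:

```typescript
// Hypothetical, simplified stand-in for the schema object built in the diff.
interface EffortSchema {
    enum: string[];
    enumItemLabels: string[];
    default: string | undefined;
}

function buildEffortSchema(family: string, effortLevels: string[] | undefined): EffortSchema | undefined {
    // No advertised levels: no picker.
    if (!effortLevels || effortLevels.length === 0) {
        return undefined;
    }

    // Same family gate as the diff: only Claude and GPT families get a picker.
    const normalized = family.toLowerCase();
    const isClaude = normalized.startsWith('claude');
    if (!isClaude && !normalized.startsWith('gpt-')) {
        return undefined;
    }

    // The preferred level becomes the default only if the endpoint supports it.
    const preferred = isClaude ? 'high' : 'medium';
    return {
        enum: effortLevels,
        enumItemLabels: effortLevels.map(level => level.charAt(0).toUpperCase() + level.slice(1)),
        default: effortLevels.includes(preferred) ? preferred : undefined,
    };
}
```

For example, `buildEffortSchema('gpt-5', ['low', 'high'])` would yield labels `['Low', 'High']` with no default, because `'medium'` is not among the advertised levels.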

src/extension/externalAgents/node/oaiLanguageModelServer.ts

Lines changed: 5 additions & 0 deletions
@@ -205,6 +205,7 @@ export class OpenAILanguageModelServer extends Disposable {
                 messages: messagesForLogging,
                 finishedCb: async () => undefined,
                 location: ChatLocation.ResponsesProxy,
+                enableThinking: true,
                 userInitiatedRequest: isUserInitiatedMessage
             }, tokenSource.token);
 
@@ -420,6 +421,10 @@ class StreamingPassThroughEndpoint implements IChatEndpoint {
         return this.base.maxThinkingBudget;
     }
 
+    public get supportsReasoningEffort(): string[] | undefined {
+        return this.base.supportsReasoningEffort;
+    }
+
     public get supportsToolCalls(): boolean {
         return this.base.supportsToolCalls;
     }
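
Both pass-through endpoints gain the same delegating getter, so a wrapper reports whatever effort levels its base endpoint reports. A reduced sketch of that delegation pattern, assuming a two-property interface; `EffortCapableEndpoint` and `PassThroughEndpoint` are hypothetical names, far smaller than the real `IChatEndpoint`:

```typescript
// Hypothetical two-property slice of an endpoint interface.
interface EffortCapableEndpoint {
    readonly maxThinkingBudget: number | undefined;
    readonly supportsReasoningEffort: string[] | undefined;
}

// Wrapper that forwards capability queries to the wrapped endpoint unchanged.
class PassThroughEndpoint implements EffortCapableEndpoint {
    private readonly base: EffortCapableEndpoint;

    constructor(base: EffortCapableEndpoint) {
        this.base = base;
    }

    get maxThinkingBudget(): number | undefined {
        return this.base.maxThinkingBudget;
    }

    get supportsReasoningEffort(): string[] | undefined {
        return this.base.supportsReasoningEffort;
    }
}
```

Without the getter, a consumer that reads `supportsReasoningEffort` through the wrapper would see `undefined` even when the underlying model supports effort levels.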

src/extension/intents/node/toolCallingLoop.ts

Lines changed: 7 additions & 3 deletions
@@ -100,7 +100,7 @@ export interface IToolCallingBuiltPromptEvent {
     tools: LanguageModelToolInformation[];
 }
 
-export type ToolCallingLoopFetchOptions = Required<Pick<IMakeChatRequestOptions, 'messages' | 'finishedCb' | 'requestOptions' | 'userInitiatedRequest' | 'turnId'>> & Pick<IMakeChatRequestOptions, 'disableThinking'>;
+export type ToolCallingLoopFetchOptions = Required<Pick<IMakeChatRequestOptions, 'messages' | 'finishedCb' | 'requestOptions' | 'userInitiatedRequest' | 'turnId'>> & Pick<IMakeChatRequestOptions, 'enableThinking' | 'reasoningEffort'>;
 
 interface StartHookResult {
     /**
@@ -1139,7 +1139,10 @@ export abstract class ToolCallingLoop<TOptions extends IToolCallingLoopOptions =
         let statefulMarker: string | undefined;
         const toolCalls: IToolCall[] = [];
         let thinkingItem: ThinkingDataItem | undefined;
-        const disableThinking = isContinuation && isAnthropicFamily(endpoint) && !ToolCallingLoop.messagesContainThinking(effectiveBuildPromptResult.messages);
+        const rawEffort = this.options.request.modelConfiguration?.reasoningEffort;
+        const reasoningEffort = typeof rawEffort === 'string' ? rawEffort : undefined;
+        const shouldDisableThinking = isContinuation && isAnthropicFamily(endpoint) && !ToolCallingLoop.messagesContainThinking(effectiveBuildPromptResult.messages);
+        const enableThinking = !shouldDisableThinking;
         let phase: string | undefined;
         let compaction: OpenAIContextManagementResponse | undefined;
         const fetchResult = await this.fetch({
@@ -1187,7 +1190,8 @@ export abstract class ToolCallingLoop<TOptions extends IToolCallingLoopOptions =
                 })),
             },
             userInitiatedRequest: (iterationNumber === 0 && !isContinuation && !this.options.request.subAgentInvocationId) || this.stopHookUserInitiated,
-            disableThinking,
+            enableThinking,
+            reasoningEffort,
         }, token).finally(() => {
             this.stopHookUserInitiated = false;
         });
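
The loop now derives two per-request values: `reasoningEffort`, accepted only when the configuration value is a string, and `enableThinking`, the negation of the old `disableThinking` condition. A self-contained sketch of that derivation, assuming the three booleans are supplied by the caller; `ThinkingInputs`, `ThinkingDecision`, and `decideThinking` are hypothetical names:

```typescript
// Hypothetical input shape; the real loop reads these values from its own state.
interface ThinkingInputs {
    isContinuation: boolean;
    isAnthropicFamily: boolean;
    messagesContainThinking: boolean;
    modelConfiguration?: { reasoningEffort?: unknown };
}

interface ThinkingDecision {
    enableThinking: boolean;
    reasoningEffort: string | undefined;
}

function decideThinking(inputs: ThinkingInputs): ThinkingDecision {
    // Accept the configured effort only when it is a string.
    const rawEffort = inputs.modelConfiguration?.reasoningEffort;
    const reasoningEffort = typeof rawEffort === 'string' ? rawEffort : undefined;

    // Mirrors the diff: thinking is suppressed only for an Anthropic
    // continuation whose messages contain no thinking.
    const shouldDisableThinking =
        inputs.isContinuation && inputs.isAnthropicFamily && !inputs.messagesContainThinking;

    return { enableThinking: !shouldDisableThinking, reasoningEffort };
}
```

Note that the result is an opt-in request only; in this commit the final decision is still narrowed by location inside `DefaultToolCallingLoop.fetch()`.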

src/extension/prompt/node/defaultIntentRequestHandler.ts

Lines changed: 4 additions & 1 deletion
@@ -694,16 +694,19 @@ class DefaultToolCallingLoop extends ToolCallingLoop<IDefaultToolLoopOptions> {
         const debugName = this.options.request.subAgentInvocationId ?
             `tool/runSubagent${this.options.request.subAgentName ? `-${this.options.request.subAgentName}` : ''}` :
             `${ChatLocation.toStringShorter(this.options.location)}/${this.options.intent?.id}`;
+        const location = this.options.overrideRequestLocation ?? this.options.location;
+        const isThinkingLocation = location === ChatLocation.Agent || location === ChatLocation.MessagesProxy;
         return this.options.invocation.endpoint.makeChatRequest2({
             ...opts,
+            enableThinking: isThinkingLocation && opts.enableThinking,
             debugName,
             conversationId: this.options.conversation.sessionId,
             turnId: opts.turnId,
             finishedCb: (text, index, delta) => {
                 this.telemetry.markReceivedToken();
                 return opts.finishedCb!(text, index, delta);
             },
-            location: this.options.overrideRequestLocation ?? this.options.location,
+            location,
             requestOptions: {
                 ...opts.requestOptions,
                 tools: normalizeToolSchema(
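
The hunk above gates thinking twice: the effective location (override first, then the loop's own location) must be Agent or MessagesProxy, and the loop must have asked for thinking. A minimal sketch of that gate, assuming a string union in place of the real `ChatLocation` enum; the type, its members, and `resolveEnableThinking` are hypothetical, and unlike the diff (which can pass `undefined` through) the sketch normalizes the result to a boolean:

```typescript
// Hypothetical stand-in for the ChatLocation enum; only a few members are modeled.
type ChatLocation = 'panel' | 'agent' | 'messagesProxy' | 'other';

function resolveEnableThinking(
    location: ChatLocation,
    overrideRequestLocation: ChatLocation | undefined,
    requested: boolean | undefined,
): boolean {
    // The override, when present, wins over the loop's own location.
    const effective = overrideRequestLocation ?? location;
    const isThinkingLocation = effective === 'agent' || effective === 'messagesProxy';
    // Thinking stays off unless the location allows it and the caller opted in.
    return isThinkingLocation && requested === true;
}
```

One consequence worth noting: an override can both grant thinking to a loop that runs elsewhere and revoke it from an agent loop whose request is sent under a different location.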

src/extension/prompt/node/executionSubagentToolCallingLoop.ts

Lines changed: 3 additions & 1 deletion
@@ -121,13 +121,15 @@ export class ExecutionSubagentToolCallingLoop extends ToolCallingLoop<IExecution
         return allTools.filter(tool => allowedExecutionTools.has(tool.name as ToolName));
     }
 
-    protected async fetch({ messages, finishedCb, requestOptions }: ToolCallingLoopFetchOptions, token: CancellationToken): Promise<ChatResponse> {
+    protected async fetch({ messages, finishedCb, requestOptions, enableThinking, reasoningEffort }: ToolCallingLoopFetchOptions, token: CancellationToken): Promise<ChatResponse> {
         const endpoint = await this.getEndpoint();
         return endpoint.makeChatRequest2({
             debugName: ExecutionSubagentToolCallingLoop.ID,
             messages,
             finishedCb,
             location: this.options.location,
+            enableThinking,
+            reasoningEffort,
             requestOptions: {
                 ...(requestOptions ?? {}),
                 temperature: 0
