Do Custom Agent Tool Restrictions Actually Reduce Context Window Usage And Token Impact #198247

r-shakeri · 2026-06-06T22:02:11Z

🏷️ Discussion Type

Question

💬 Feature/Topic Area

VS Code

Body

Hi,

I have a question about how tool schemas are handled in the context window when using custom agents in GitHub Copilot (VS Code).

From the documentation, I understand that:

MCP servers expose tools, and their schemas are discovered by the client
Custom agents can define a subset of tools using the tools field

However, it's unclear how this affects the context window in practice.

My question is:

If I define a custom agent with a limited set of tools (e.g. only Tool A),

does this guarantee that only the schemas for Tool A are included in the model's context window,

or are all tool schemas from enabled MCP servers still injected into the context, with the agent only restricting usage?

In other words:

Is tools in a custom agent equivalent to filtering tool schemas sent to the model?
Or does it only affect which tools the agent is allowed to call, without reducing context size?

I'm trying to understand whether custom agents can be used as a reliable way to reduce context window usage, or whether we should always be very cautious when adding new tools.

Thanks!

Answered by henderson-01

Jun 7, 2026

henderson-01 · 2026-06-06T22:17:58Z

Hi @r-shakeri,

The answer to your question: Yes, restricting the tools array in a custom agent physically filters the tool schemas sent to the model, directly reducing your context window usage and token impact.

It is not just an execution barrier; it is a context filter. Using custom agents to limit tools is actually one of the most effective ways to optimise token consumption in Copilot.

Here is a breakdown of how this works under the hood and why it matters.

How Tool Schemas Consume Context

Every time Copilot makes a request to the LLM, it has to tell the model which tools it can use. It does this by injecting the JSON schema for every enabled tool (including its name, description, and parameter definitions) directly into the system prompt.

Without restrictions: If you have multiple MCP servers and extensions running, Copilot might inject dozens of tool schemas into the context window. GitHub engineering recently noted on their developer blog that stripping out unused tools from an MCP configuration reduced per-call context size by 8–12 KB, saving several thousand tokens per run.
With a custom agent: When you define a .github/agents/my-agent.agent.md file and explicitly set the tools: [ "my_specific_tool" ] array in the YAML frontmatter, the Copilot client intercepts this. It builds the API payload using only the schemas of the tools explicitly listed. The remaining tools are entirely omitted from the LLM's context.
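
The filtering described above can be pictured with a minimal Python sketch. This is not Copilot's actual implementation: the function name `build_tools_payload`, the schema shape, and the tool names are all hypothetical, chosen only to show how an allowlist shrinks what gets serialised into a request.

```python
import json

# Hypothetical tool registry: name -> JSON-schema-style definition.
# These names and shapes are illustrative, not real Copilot tools.
ALL_TOOLS = {
    "my_specific_tool": {
        "description": "Does the one job this agent needs.",
        "parameters": {"type": "object", "properties": {"path": {"type": "string"}}},
    },
    "search_issues": {
        "description": "Search issues in a repository by free-text query.",
        "parameters": {"type": "object", "properties": {"query": {"type": "string"}}},
    },
    "run_query": {
        "description": "Run a read-only SQL query against a database.",
        "parameters": {"type": "object", "properties": {"sql": {"type": "string"}}},
    },
}


def build_tools_payload(registry, allowed=None):
    """Return the tool definitions a client would serialise into a request.

    allowed=None mimics an agent with no `tools` field (everything is sent);
    a list mimics `tools: [...]` in the agent frontmatter.
    """
    names = registry.keys() if allowed is None else [n for n in allowed if n in registry]
    return [{"name": n, **registry[n]} for n in names]


full = build_tools_payload(ALL_TOOLS)
restricted = build_tools_payload(ALL_TOOLS, allowed=["my_specific_tool"])

full_bytes = len(json.dumps(full))
restricted_bytes = len(json.dumps(restricted))
print(f"all tools: {len(full)} schemas, {full_bytes} bytes")
print(f"restricted: {len(restricted)} schema, {restricted_bytes} bytes")
```

Whether a given client really builds its payload this way is exactly what has to be checked by measurement; the sketch only shows why a filtered list would produce a smaller request.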

Why You Should Curate Tools

Relying on custom agents to aggressively restrict tools is a highly recommended best practice for two main reasons:

Token Efficiency: As mentioned, removing irrelevant schemas frees up a significant amount of the context window. This allows the model to "remember" more of your conversation history and file contents without bumping into token limits or pushing older context out.
Model Performance: LLM performance tends to degrade as the context fills up (the long-context weakness that "needle in a haystack" tests are designed to probe). By only providing the tool schemas relevant to the task, you reduce the chance of the LLM getting confused or hallucinating tool calls for irrelevant services.

Summary

You are thinking about this exactly right! Defining a tightly scoped custom agent with only the tools necessary for its specific job is not just a security/execution restriction; it is also a critical optimisation technique. You should absolutely continue being cautious about adding global tools and instead scope them tightly within specialised agents.

6 replies

r-shakeri Jun 6, 2026
Author

@henderson-01
Thanks — this is really insightful.
Do you have any official Copilot documentation or references confirming this behavior?

henderson-01 Jun 6, 2026

Hi @r-shakeri,

Glad you found that helpful! Yes, there is official documentation that supports this. While GitHub doesn't explicitly detail the exact kilobyte or token savings in their user-facing docs (those numbers usually come from their engineering blogs and developer presentations), the mechanics of how tools impact the context window and why you should restrict them are officially documented.

Here are the key references:

1. Custom Agent Profile Definition
In the official guidance for creating custom agents, the .agent.md YAML frontmatter definition explicitly shows the tools property. The documentation notes:

tools: Array of tool names the agent can use. If you don't specify this property, all available tools are enabled.

By defining this array, you are physically restricting the API payload to only the tools listed. If you omit it, Copilot injects every enabled tool schema you have into the LLM's context.
Reference: GitHub Docs: About custom agents (and the Agent File Reference)

2. Tool Relevance & Context Limits
The Copilot documentation directly advises filtering tools to improve model performance and avoid exhausting the system's limits. It explicitly states:

"Tip: Select only the tools that are relevant for your prompt to improve your results."

It also warns about a hard limit on the number of tools per request:

"If you see an error about exceeding 128 tools per request... deselect some tools or entire MCP servers to reduce the count."
Reference: VS Code Docs: Use tools with agents

When you combine these two documented behaviours (omitting the tools property enables everything, and loading everything degrades results and can hit request limits), it indicates that scoping the tools array is a direct context-filtering mechanism.

Hope this provides the official backing you need!

r-shakeri Jun 7, 2026
Author

Thanks for pointing me to the docs. However, I still haven't found an answer to this specific question in any of them.

Even statements like "Tip: Select only the tools that are relevant for your prompt to improve your results" seem open to interpretation. I understood that as improving the model's responses by reducing the chance of invoking irrelevant or unnecessary tools, rather than as guidance related to context window usage or prompt size.

henderson-01 Jun 7, 2026

@r-shakeri

That is a completely fair critique. You are absolutely right that the documentation frames this around model accuracy (preventing the LLM from getting confused or invoking the wrong tool) rather than explaining the underlying payload mechanics.

However, from a purely technical standpoint, restricting the tools array does physically reduce the prompt size and context window usage.

Because Large Language Models are stateless, they have no built-in awareness of your local VS Code environment. For an agent to know a tool exists, the Copilot client must inject the tool's complete JSON schema (its name, description, and all parameter definitions) directly into the hidden system prompt for every single request.

When you specify a restricted tools array in your custom agent, the Copilot client physically drops the excluded schemas from the outgoing API payload.

Since the official docs don't explicitly highlight the token savings, the best way to confirm this behaviour is to measure the exact token drop in your own VS Code environment:

Open the Output panel in VS Code (Ctrl/Cmd + Shift + U).
Select GitHub Copilot Chat from the dropdown menu on the right.
Submit a prompt using the default Copilot agent (which has all tools enabled) and look at the logs for the prompt_tokens count.
Submit the exact same prompt using your custom agent (restricted to just one tool).
Compare the token counts.

Depending on how many MCP servers and extensions you have running, you will typically see the prompt size drop by several thousand tokens.
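
Once you have the two readings, the comparison itself is simple arithmetic. Here is a minimal Python sketch, assuming you have copied the two prompt-token counts out of the logs by hand; the function name `compare_prompt_tokens` and the sample values are hypothetical placeholders, not output from any real run.

```python
def compare_prompt_tokens(baseline, restricted):
    """Summarise the change between two prompt-token readings."""
    if baseline <= 0:
        raise ValueError("baseline must be a positive token count")
    delta = baseline - restricted
    return {
        "delta": delta,
        "percent_saved": round(100 * delta / baseline, 1),
        "dropped": delta > 0,
    }


# Illustrative readings only; replace with the two counts from your own logs.
result = compare_prompt_tokens(baseline=21_000, restricted=12_000)
print(result)
```

If `dropped` comes back `False` for an identical prompt, the restriction is probably only gating execution rather than trimming what is sent.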

So your initial understanding is correct: it keeps the model focused. But it achieves that focus by physically omitting data, which gives you the substantial secondary benefit of freeing up context space for your actual code!

Hope this helps. I'm really not sure what else to suggest.

Answer selected by r-shakeri

r-shakeri Jun 8, 2026
Author

@henderson-01 Thanks for the help. I tested this myself in Copilot. I created two agents with the same setup but with different numbers of tools and sent the same prompt to both. The difference in token usage was quite significant: one used around 12k tokens, while the other used about 21k. I could also see which tools were added to the request using the Chat Debug view. The idea of checking it directly helped me a lot.

mnifzied-create Jun 8, 2026

@r-shakeri nice — that 12k vs 21k gap is a clean confirmation, and checking it in the debug view is exactly the right way to verify it for your own client.

That ~9k delta lines up with the per-tool schema cost: across 79 tools I measured, a tool definition averages ~123 tokens (median 103; heavier ones run 300–360), and it's all fixed input re-sent on every turn — so the savings scale with how many tools you cut, and compound on every message.

If you ever want to quantify a toolset without spinning up two agents and reading the debug view each time, the free Agent Token Profiler does the per-tool breakdown in-browser (paste your tool schemas → tokens + cost per turn, no signup): https://mnifzied-create.github.io/agentloop/ — and the underlying 13-server dataset is here if it's useful: https://mnifzied-create.github.io/agentloop/token-tax/

tanvishinde017 · 2026-06-07T12:56:43Z

Hi,

That's a good question. Based on the current documentation, custom agents allow you to restrict which tools an agent can use through the tools field, but I haven't seen documentation that explicitly guarantees how tool schemas are included in the model context.

My understanding is that there are two separate concerns:

Tool availability — which tools the agent is allowed to invoke.
Context construction — which tool schemas and metadata are actually sent to the model.

The docs clearly describe the first, but I haven't found a definitive statement about the second. Because of that, I would be cautious about assuming that limiting tools automatically reduces token usage or context size.

If context efficiency is important, it would be helpful for GitHub to clarify:

Whether only the selected tools' schemas are injected into the model context.
Whether schemas from other enabled MCP servers are still loaded.
Whether custom agents can be used as a context optimization mechanism, or only as a permission boundary.

Hopefully someone from the Copilot team can confirm the exact behavior.

0 replies

mnifzied-create · 2026-06-07T18:36:23Z

Adding the quantitative side, since the real answer is "yes — if the client sends the filtered tools array" — and the cleanest way to confirm it for your setup is to measure rather than trust the docs (which, as @tanvishinde017 notes, don't guarantee the behavior).

Each tool schema is fixed per-turn input, re-sent on every request. From measuring 79 tools across 13 MCP servers/agents, a tool averages ~123 tokens (median 103; heavier ones run 300–360). So if your custom agent actually filters the sent schemas, going from ~20 tools to ~5 should shave roughly 15 × ~123 ≈ ~1,800 tokens per turn — billed on every message, so it compounds fast.
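
The estimate above can be reproduced with a short Python sketch. The ~123 tokens/tool average is the figure quoted in this comment, not a universal constant, and the 30-turn session length and the name `schema_savings` are assumptions added purely for illustration.

```python
AVG_TOKENS_PER_TOOL = 123  # average quoted in this comment, not a universal constant


def schema_savings(tools_before, tools_after, turns=1, avg_tokens=AVG_TOKENS_PER_TOOL):
    """Estimate input tokens saved by sending fewer tool schemas."""
    removed = max(tools_before - tools_after, 0)
    per_turn = removed * avg_tokens
    return per_turn, per_turn * turns


# 20 -> 5 tools as in the example above; 30 turns is a hypothetical session length.
per_turn, per_session = schema_savings(tools_before=20, tools_after=5, turns=30)
print(f"~{per_turn} tokens per turn, ~{per_session} over 30 turns")
```

Because the schemas are re-sent on every request, the per-session figure grows linearly with the number of turns, which is why the saving compounds.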

To verify whether your client truly filters (vs. just gating execution): count the input tokens with the full toolset vs. the restricted agent on an identical prompt. If the input drops, schemas are being filtered; if it's unchanged, you're only restricting usage. Anthropic's count_tokens works, or this free in-browser profiler gives the per-tool breakdown directly: https://mnifzied-create.github.io/agentloop/ — and a fuller writeup on per-agent schema cost (incl. that ~123-tok/tool figure across 13 real agents): https://mnifzied-create.github.io/agentloop/token-tax/
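
For a quick offline look at which tools are heaviest, a rough per-tool breakdown can be sketched in Python. This assumes a crude rule of thumb of about four characters per token, which is not a real tokenizer; the schemas and the names `estimate_tokens` and `per_tool_breakdown` are hypothetical. For accurate numbers, swap the estimator for a real tokenizer or a token-counting endpoint.

```python
import json

CHARS_PER_TOKEN = 4  # crude rule of thumb, NOT a real tokenizer


def estimate_tokens(schema):
    """Approximate the token cost of one tool schema from its serialised length."""
    text = json.dumps(schema, separators=(",", ":"))
    return max(1, round(len(text) / CHARS_PER_TOKEN))


def per_tool_breakdown(tools):
    """Return (name, estimated_tokens) pairs, most expensive first."""
    costs = [(t["name"], estimate_tokens(t)) for t in tools]
    return sorted(costs, key=lambda pair: pair[1], reverse=True)


# Hypothetical schemas for illustration only.
tools = [
    {"name": "read_file", "description": "Read a file.",
     "parameters": {"type": "object", "properties": {"path": {"type": "string"}}}},
    {"name": "search_code",
     "description": "Search the workspace for a pattern and return matching lines with context.",
     "parameters": {"type": "object", "properties": {
         "pattern": {"type": "string"}, "max_results": {"type": "integer"}}}},
]

breakdown = per_tool_breakdown(tools)
for name, cost in breakdown:
    print(f"{name}: ~{cost} tokens")
print("total:", sum(cost for _, cost in breakdown))
```

Long descriptions and large parameter objects dominate the cost, so the breakdown is mostly useful for spotting which tools are worth dropping first.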

0 replies

Replies: 3 comments · 6 replies

Uh oh!

How Tool Schemas Consume Context

Why You Should Curate Tools

Summary

Uh oh!

r-shakeri Jun 6, 2026 Author

Uh oh!

Uh oh!

r-shakeri Jun 7, 2026 Author

Uh oh!

Uh oh!

r-shakeri Jun 8, 2026 Author

Uh oh!

Uh oh!

Uh oh!

Replies: 3 comments 6 replies

r-shakeri Jun 6, 2026
Author

r-shakeri Jun 7, 2026
Author

r-shakeri Jun 8, 2026
Author