This repository was archived by the owner on May 20, 2026. It is now read-only.

Kevin m kent/v2 domain intent taxonomy #4071

Merged
digitarald merged 11 commits into microsoft:main from kevin-m-kent:kevin-m-kent/v2-domain-intent-taxonomy on Mar 2, 2026

@kevin-m-kent kevin-m-kent commented Feb 27, 2026

Contributor

This PR derives the domain and intent taxonomy from prompt clustering and labeling in our 1p data. It also adds prompting that reduces the classifier's error rate (in particular, cases where domain and intent were swapped). This approach also reduces the number of categories by about 50%.

FYI @digitarald, pending our discussion later.

Kevin Kent and others added 3 commits February 27, 2026 11:34
Replace the original domain (20 categories) and intent (38 categories) definitions
with v2 taxonomy derived from clustering analysis:
- 16 domain categories (cicd_cloud_infra, cli_scripting, automated_testing, etc.)
- 13 intent categories (explain, find_content, research, review, etc.)
- Updated classification guidance with domain vs intent independence framing
- Updated prompt generation to match new category format

Scope and time estimate dimensions are unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
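
For readers who want a concrete picture of the "domain vs intent independence" framing in the commit message above, here is a minimal sketch. It is not the code from `promptCategorizationTaxonomy.ts`: the key lists contain only the category names quoted in the commit message plus `unknown`, and `parseClassification` with its fall-back-to-`unknown` behavior is a hypothetical helper added for illustration.

```typescript
// Only the keys named in the commit message are listed here; the real v2
// taxonomy has 16 domains and 13 intents.
const DOMAIN_KEYS = ['cicd_cloud_infra', 'cli_scripting', 'automated_testing', 'unknown'] as const;
const INTENT_KEYS = ['explain', 'find_content', 'research', 'review', 'unknown'] as const;

type Domain = typeof DOMAIN_KEYS[number];
type Intent = typeof INTENT_KEYS[number];

interface Classification {
  domain: Domain; // the subject area (a noun)
  intent: Intent; // the developer action (a verb)
}

function isDomain(value: string): value is Domain {
  return (DOMAIN_KEYS as readonly string[]).indexOf(value) !== -1;
}

function isIntent(value: string): value is Intent {
  return (INTENT_KEYS as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper: each dimension is validated on its own, so a swapped
// answer (an intent key in the domain slot) is caught instead of accepted.
// Falling back to 'unknown' is an assumption made for this sketch.
function parseClassification(raw: { domain: string; intent: string }): Classification {
  return {
    domain: isDomain(raw.domain) ? raw.domain : 'unknown',
    intent: isIntent(raw.intent) ? raw.intent : 'unknown',
  };
}
```

Because the two key sets are disjoint apart from `unknown`, a swapped pair such as `{ domain: 'explain', intent: 'cli_scripting' }` fails both checks in this sketch rather than passing silently.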

Copilot AI left a comment

Contributor

Pull request overview

Updates the prompt-categorization taxonomy and prompting used by the panel prompt classifier to a new “v2” domain/intent scheme derived from clustering, with additional guidance intended to reduce domain/intent swaps.

Changes:

  • Replaces the intent and domain category definitions with a reduced v2 taxonomy.
  • Updates taxonomy prompt formatting and adds explicit “domain vs intent” separation guidance (and changes ordering to domain-first).
  • Tweaks the system prompt wording for the categorization prompt.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/extension/prompts/node/panel/promptCategorization.tsx | Updates the system prompt text used to instruct the classifier and includes the taxonomy prompt. |
| src/extension/prompt/common/promptCategorizationTaxonomy.ts | Reworks domain/intent taxonomy definitions and regenerates the taxonomy prompt/guidance formatting. |
Comments suppressed due to low confidence (1)

src/extension/prompt/common/promptCategorizationTaxonomy.ts:327

  • The guidance explicitly instructs using `unknown` for intent/domain, but it doesn't mention what to do when scope is unclear. Since the scope taxonomy uses `unknown_scope` (not `unknown`), the current guidance can increase invalid tool outputs (rejected by `isValidScope`). Consider adding an explicit rule like "If scope is unclear, use `unknown_scope`" to keep tool calls valid.
**Domain** is the technical subject area or problem space the user is operating in.
- It describes a system, architecture, technology area, or problem space — never an activity.
- Think of it as answering: "What area of technology is this about?"
- If the prompt does not clearly indicate a technical domain, use \`unknown\`.

**Intent** is the developer action or goal being performed within that domain.
- It describes what the user is trying to accomplish — the verb, not the noun.
- Think of it as answering: "What is the user trying to do?"
- If the prompt does not clearly indicate an intent, use \`unknown\`.

Comment thread src/extension/prompts/node/panel/promptCategorization.tsx
Comment thread src/extension/prompt/common/promptCategorizationTaxonomy.ts Outdated
Kevin Kent and others added 7 commits February 27, 2026 14:32
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…cking

Remove configuration files and documentation references from description and keywords.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…header

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@digitarald digitarald self-requested a review March 2, 2026 17:59
@digitarald digitarald enabled auto-merge March 2, 2026 18:04

@digitarald digitarald left a comment

Contributor

Taxonomy changes look good — data-driven consolidation is a solid approach. Left a few non-blocking nits for follow-up (telemetry versioning, prompt boundary, dead code).


// ============================================================================
// INTENTS - What action the user wants
// ============================================================================

Contributor

nit (non-blocking, follow-up): The `promptCategorization` telemetry event in `promptCategorizer.ts` emits `intent` and `domain` as raw strings. Since v2 keys are completely different from v1 (e.g., `code_fixing` → `troubleshoot_debug`, `frontend` → `web_ui`), queries on this data will get mixed v1/v2 values with no way to distinguish them once this ships.

Suggestion: Add a `taxonomyVersion: 'v2'` property to the telemetry event and its GDPR annotation so dashboards can filter correctly.
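
A rough sketch of what the suggestion could look like follows. The event shape and both function names are hypothetical (the real event in `promptCategorizer.ts` is not shown on this page), and the GDPR annotation is left out because its format is not visible here. Treating events without the field as v1 is an assumption of this sketch.

```typescript
// Hypothetical event shape; the real telemetry event is not shown in this PR page.
const TAXONOMY_VERSION = 'v2';

interface PromptCategorizationEvent {
  intent: string;
  domain: string;
  taxonomyVersion: string;
}

// Stamps every emitted event with the taxonomy version that produced its keys.
function buildCategorizationEvent(intent: string, domain: string): PromptCategorizationEvent {
  return { intent, domain, taxonomyVersion: TAXONOMY_VERSION };
}

// Dashboard-side filter. Events recorded before the field existed carry no
// version, so this sketch assumes they are v1.
function filterByTaxonomyVersion(
  events: Array<Partial<PromptCategorizationEvent>>,
  version: string,
): Array<Partial<PromptCategorizationEvent>> {
  return events.filter(e => (e.taxonomyVersion ?? 'v1') === version);
}
```

Stamping the version at emit time keeps old and new rows separable even though the `intent` and `domain` columns themselves stay plain strings.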

Contributor Author

I like this idea. Let me update.

'You are an expert classifier for AI coding assistant prompts. Classify developer requests in context of their workspace and active file across domain, intent, time estimate, and scope.',
'You MUST use the categorize_prompt tool to provide your classification.',
generateTaxonomyPrompt(),
].join('\n\n');

Contributor

nit (non-blocking): `systemPrompt` is joined with `\n\n` but then rendered immediately before `<SafetyRules />` with no separator, so the last line of the taxonomy can run together with the safety rules text.

Consider adding a trailing `\n\n`:

].join('\n\n') + '\n\n';

Or a `<br />` before `<SafetyRules />` in the JSX.
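
To see why the trailing separator matters, here is a small standalone sketch of the first option. `buildSystemPrompt` is a hypothetical wrapper, the taxonomy text is passed in as a plain string instead of calling `generateTaxonomyPrompt()`, and the safety rules are represented by a placeholder string rather than the `<SafetyRules />` component.

```typescript
// Hypothetical wrapper around the join shown in the diff context above.
function buildSystemPrompt(taxonomyPrompt: string): string {
  return [
    'You are an expert classifier for AI coding assistant prompts. Classify developer requests in context of their workspace and active file across domain, intent, time estimate, and scope.',
    'You MUST use the categorize_prompt tool to provide your classification.',
    taxonomyPrompt,
  ].join('\n\n') + '\n\n'; // join() only separates elements; it adds nothing after the last one
}

// Placeholder strings stand in for the taxonomy and the rendered safety rules.
const renderedPrompt = buildSystemPrompt('TAXONOMY_TEXT') + 'SAFETY_RULES_TEXT';
```

Without the `+ '\n\n'`, `renderedPrompt` would contain `TAXONOMY_TEXTSAFETY_RULES_TEXT` with no boundary between the two sections.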

parts.push(`- Keywords: ${def.keywords.join(', ')}`);
}
if (def.signals?.length) {
parts.push(`Signals: ${def.signals.join(', ')}`);

Contributor

nit (non-blocking): `def.signals` rendering is now dead code: no v2 definitions use `signals` anymore (all switched to `keywords`). Could be cleaned up in a follow-up.
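
For context, a definition formatter along the lines of the diff excerpt above might look like the sketch below. The `Definition` interface, the `formatDefinition` name, and the `key: description` first line are assumptions for illustration. The sketch keeps the `signals` branch and gives it the same `- ` bullet as `keywords`, which is one possible reading of the follow-up commit's "consistent bullet formatting".

```typescript
// Hypothetical definition shape; the real type is not shown in this PR page.
interface Definition {
  description: string;
  keywords?: string[];
  signals?: string[];
}

// Renders one taxonomy entry. Optional lists are skipped when absent or empty.
function formatDefinition(key: string, def: Definition): string {
  const parts: string[] = [`${key}: ${def.description}`];
  if (def.keywords?.length) {
    parts.push(`- Keywords: ${def.keywords.join(', ')}`);
  }
  if (def.signals?.length) {
    parts.push(`- Signals: ${def.signals.join(', ')}`);
  }
  return parts.join('\n');
}
```

If no definition supplies `signals`, the second branch never runs, which is the dead-code situation the comment describes.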

@vs-code-engineering vs-code-engineering Bot added this to the March 2026 milestone Mar 2, 2026
@digitarald digitarald added this pull request to the merge queue Mar 2, 2026
Merged via the queue into microsoft:main with commit 177a295 Mar 2, 2026
9 checks passed
digitarald pushed a commit that referenced this pull request Mar 2, 2026
- Add taxonomyVersion field to telemetry events to distinguish v1/v2 data
- Fix prompt/SafetyRules boundary by adding trailing newlines
- Consistent bullet formatting for signals in scope definitions
github-merge-queue Bot pushed a commit that referenced this pull request Mar 2, 2026
- Add taxonomyVersion field to telemetry events to distinguish v1/v2 data
- Fix prompt/SafetyRules boundary by adding trailing newlines
- Consistent bullet formatting for signals in scope definitions

Co-authored-by: Harald Kirschner <digitarald@gmail.com>