Skip to content

feat: database detection + AKS workload identity support#630

Open
gambtho wants to merge 8 commits intomainfrom
thgamble/dbmigration
Open

feat: database detection + AKS workload identity support#630
gambtho wants to merge 8 commits intomainfrom
thgamble/dbmigration

Conversation

@gambtho
Copy link
Copy Markdown
Contributor

@gambtho gambtho commented Mar 12, 2026

Summary

  • Database detector (analyze-repo): new detectDatabases() function identifies databases from dependency lists across Node, Python, Java, .NET, Go, and Rust ecosystems. Results surface as detectedDatabases on each module in the analysis output and in the summary line.
  • Gradle parser fix: expanded dependency regex to capture runtimeOnly, api, and compileOnly scopes (previously only implementation), so DB drivers like org.postgresql:postgresql declared as runtimeOnly are no longer missed.
  • K8s manifest generation: when database dependencies are detected, automatically includes serviceaccount.yaml and appends workload identity guidance (annotations, pod labels, passwordless auth) to the manifest plan.
  • AKS loop prompt: inserts a new "Check for database dependencies" step after repo analysis, and passes DB types as detectedDependencies to manifest generation.
  • Knowledge packs: 3 new Kubernetes entries (workload identity SA, pod label, passwordless DB auth) and 1 new database entry (Azure managed DB migration pattern with ConfigMap + workload identity).

Test plan

  • 27 unit tests for detectDatabases covering all ecosystems, groupId:artifactId (Java), Go module paths, dedup, and case-insensitivity
  • Existing analyze-repo tests pass (41 total)
  • Manual validation against tmp/spring-petclinic — detects MySQL + PostgreSQL from Gradle runtimeOnly deps
  • Run npm run validate (pre-existing embedded-packs build issue unrelated to this PR)
  • End-to-end test via MCP inspector: analyze-repogenerate-k8s-manifests with detected DB types

Copilot AI review requested due to automatic review settings March 12, 2026 21:32
@gambtho gambtho changed the title feat: enhance db detection and associated manifests feat: database detection + AKS workload identity support Mar 12, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds database dependency detection to analyze-repo and threads that signal into AKS/Kubernetes manifest generation guidance (including workload identity ServiceAccount hints), plus supporting prompt steps and knowledge-pack recommendations.

Changes:

  • Introduces a detectDatabases() mapper and exposes results as modules[].detectedDatabases in analyze-repo output (and summary text).
  • Updates k8s manifest planning to optionally include serviceaccount.yaml and workload identity instructions when DB types are detected.
  • Expands Gradle dependency extraction and adds knowledge-pack entries + new unit tests for DB detection.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
test/unit/tools/database-detector.test.ts Adds unit coverage for the new DB detection function.
src/tools/generate-k8s-manifests/tool.ts Conditionally adds ServiceAccount manifest + workload identity instruction based on detected DB types.
src/tools/analyze-repo/tool.ts Calls DB detector and adds detected DB types to the user-facing analysis summary.
src/tools/analyze-repo/schema.ts Extends module schema with optional detectedDatabases output.
src/tools/analyze-repo/parsers.ts Broadens Gradle dependency keyword matching.
src/tools/analyze-repo/database-detector.ts New dependency-to-database detection/normalization logic (core of feature).
src/prompts/shared/steps.ts Adds a prompt step instructing the workflow to check detected DBs.
src/prompts/aks-loop/prompt.ts Updates AKS loop prompt to pass detected DB info into manifest generation.
knowledge/packs/kubernetes-pack.json Adds AKS workload identity ServiceAccount + pod-label recommendations.
knowledge/packs/database-pack.json Adds Azure-managed DB migration/config guidance for k8s + workload identity.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@gambtho gambtho force-pushed the thgamble/dbmigration branch from 4247bfe to a0ecd1c Compare March 12, 2026 21:45
Copilot AI review requested due to automatic review settings March 13, 2026 14:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Copilot AI review requested due to automatic review settings March 13, 2026 15:01
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

gambtho and others added 4 commits March 25, 2026 15:04
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 25, 2026 19:33
@gambtho gambtho force-pushed the thgamble/dbmigration branch from 3124d49 to 54b63ee Compare March 25, 2026 19:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

… quality

- Return { vars, warning? } from detectEnvVarsFromDockerCompose instead
  of silently swallowing YAML parse errors
- Add debug logging for undefined vs empty detectedDatabases in
  generate-k8s-manifests to aid orchestration debugging
- Extract partitionEnvVarNames helper to deduplicate env var
  classification across generate-dockerfile and generate-k8s-manifests
- Pre-compile regex patterns in env-detector for better performance
@gambtho gambtho force-pushed the thgamble/dbmigration branch from 6c48f57 to 49864b8 Compare March 25, 2026 19:43
@gambtho gambtho enabled auto-merge (squash) March 26, 2026 01:08
' 1. Confirm the classifications (secret, database, config) are correct.',
' 2. For secret-classified vars, confirm they will be injected at runtime (not baked into the image).',
' 3. For config-classified vars, confirm default values or ask for correct values.',
'- Pass the confirmed `detectedEnvVars` to downstream tools (generate-dockerfile, generate-k8s-manifests).',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's use TOOL_NAME var properties to ref the tools here so they don't drift
ex: TOOL_NAME.VERIFY_DEPLOY

case 'javascript':
case 'typescript':
add('NODE_ENV', 'production');
add('PORT', '3000');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can probably move this out into a separate detection task within analyze repo next

Copilot AI review requested due to automatic review settings April 9, 2026 19:53
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +63 to +77
/**
* Partition env vars into config names and secret/database names in a single pass.
*/
export function partitionEnvVarNames(vars: DetectedEnvVar[]): { configNames: string[]; secretNames: string[] } {
const configNames: string[] = [];
const secretNames: string[] = [];
for (const v of vars) {
if (v.classification === 'config') {
configNames.push(v.name);
} else {
secretNames.push(v.name);
}
}
return { configNames, secretNames };
}
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partitionEnvVarNames currently groups both 'secret' and 'database' classifications into secretNames. This causes DB_* vars like DB_HOST/DB_PORT to be treated as secrets downstream (Secret generation + 'do not bake' warnings), which conflicts with the separate 'database' classification and common practice of putting non-secret DB connection details into ConfigMaps. Consider returning separate arrays (config/database/secret) or at least grouping 'database' with configNames for downstream guidance.

Copilot uses AI. Check for mistakes.
Comment on lines +190 to +212
export function detectEnvVarsFromSpringConfig(content: string): DetectedEnvVar[] {
const vars: DetectedEnvVar[] = [];
const source = 'application.properties';
const seen = new Set<string>();

// Match ${VAR_NAME} or ${VAR_NAME:defaultValue}
const pattern = /\$\{([A-Za-z_][A-Za-z0-9_.]*?)(?::([^}]*))?\}/g;
let match: RegExpExecArray | null;

while ((match = pattern.exec(content)) !== null) {
const name = match[1]!;
if (seen.has(name)) continue;
seen.add(name);

const defaultValue = match[2];
vars.push({
name,
classification: classifyEnvVar(name),
source,
required: defaultValue === undefined,
...(defaultValue !== undefined && { defaultValue }),
});
}
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

detectEnvVarsFromSpringConfig hard-codes source='application.properties', so vars extracted from application.yml will be attributed to the wrong file. Also, the placeholder regex allows dots in the variable name ([A-Za-z0-9_.]), which will capture property placeholders like ${spring.profiles.active} as "env vars". Consider accepting a source param and restricting names to env-var-like tokens (e.g., [A-Za-z_][A-Za-z0-9_]), or filtering out keys with '.' to reduce false positives.

Copilot uses AI. Check for mistakes.
Comment on lines +485 to +488
envVarInstruction += ` Create ConfigMap with config vars: ${configNames.join(', ')}.`;
}
if (secretNames.length > 0) {
envVarInstruction += ` Reference Secrets for secret vars via envFrom: ${secretNames.join(', ')}.`;
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The instruction text says "Reference Secrets for secret vars via envFrom: <VAR1, VAR2>" but envFrom references Secret/ConfigMap resources, not individual environment variable names/keys. This is likely to generate invalid manifests. Rephrase to something like "Create a Secret with keys: ... and reference it via envFrom/secretKeyRef" (and similarly for ConfigMap) so the generated YAML is syntactically correct.

Suggested change
envVarInstruction += ` Create ConfigMap with config vars: ${configNames.join(', ')}.`;
}
if (secretNames.length > 0) {
envVarInstruction += ` Reference Secrets for secret vars via envFrom: ${secretNames.join(', ')}.`;
envVarInstruction += ` Create a ConfigMap with keys for config vars: ${configNames.join(', ')} and reference it from the workload via envFrom/configMapRef or individual env entries with configMapKeyRef as appropriate.`;
}
if (secretNames.length > 0) {
envVarInstruction += ` Create a Secret with keys for secret vars: ${secretNames.join(', ')} and reference it from the workload via envFrom/secretRef or individual env entries with secretKeyRef as appropriate.`;

Copilot uses AI. Check for mistakes.
Comment on lines +437 to +448
// Partition env vars once for manifest decisions and instruction building
const { configNames, secretNames } = partitionEnvVarNames(input.detectedEnvVars ?? []);

// Add configmap if there are ports or config-classified environment variables
if ((input.ports && input.ports.length > 0) || configNames.length > 0) {
manifestFiles.push({ path: './k8s/configmap.yaml', purpose: 'Configuration management' });
}

// Add secret if secret-classified environment variables are detected
if (secretNames.length > 0) {
manifestFiles.push({ path: './k8s/secret.yaml', purpose: 'Secret management' });
}
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because partitionEnvVarNames treats classification==='database' as a secret, secretNames will include DB_HOST/DB_PORT/etc. This will (1) create secret.yaml instead of configmap.yaml for non-secret DB settings, and (2) omit configmap.yaml entirely when only database vars exist and ports are empty. Once partitioning is fixed, consider adjusting these manifest file decisions so 'database' vars map to a ConfigMap (and only credentials/connection strings map to a Secret).

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +19
import { detectDatabases } from './database-detector';
import {
detectEnvVarsFromEnvFile,
detectEnvVarsFromDockerCompose,
detectEnvVarsFromSpringConfig,
inferFrameworkEnvVars,
deduplicateEnvVars,
} from './env-detector';
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description focuses on database detection + workload identity, but this PR also introduces a full env var detection/inference pipeline (new analyze-repo output fields, prompt step, Dockerfile/K8s instruction changes). Please update the PR description/test plan to explicitly call out env var detection changes and any expected user-facing behavior (e.g., always-inferred vars) so reviewers/users know what’s being introduced.

Copilot uses AI. Check for mistakes.
- Merge main (1 commit behind, no conflicts)
- Use TOOL_NAME constants in envVarCheckStep (steps.ts)
- Separate database from secret in partitionEnvVarNames (3-way split)
- Fix detectEnvVarsFromSpringConfig: accept source param, reject
  dotted Spring property names from regex
- Fix envFrom instruction syntax in generate-k8s-manifests to use
  proper configMapRef/secretRef K8s syntax
- Add start anchors to unanchored regex patterns in database-detector
- Expand CONFIG_FILE_PATTERN to match .yaml extension variants
- Redact defaultValue for secret-classified env vars to prevent
  credential leakage in tool output
- Update generate-dockerfile to handle databaseNames as config
- Add TODO for future env-detector extraction
- Fix pre-existing ESLint warnings in env-detector.ts

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants