Skip to content

Latest commit

 

History

History
280 lines (227 loc) · 7.74 KB

File metadata and controls

280 lines (227 loc) · 7.74 KB

Implementation Tasks for Vendor Surface Extraction

Based on research findings in RESEARCH_VENDOR_SURFACE.md.

Status: Ready for implementation
Estimated Total Effort: 1 week
Dependencies: Issue #35 (vendor detection)


Task 1: Foundation - Package Detection & Entry Point Resolution

Effort: ~2 days (junior developer)
File: src/vendor_surface.rs (new module)
PR Title: feat: add vendor package detection and entry point resolution (#37)

Scope

  • Create VendorSurfaceConfig struct
  • Implement get_package_name(path: &Path) -> Option<String>
  • Implement get_package_entry_point(package_name: &str, repo_root: &Path) -> Result<PathBuf>
  • Parse package.json for main, module, exports fields
  • Handle scoped packages (@types/react)
  • Handle conditional exports (import/require)

Tests

#[test]
fn test_extract_package_name() {
    assert_eq!(
        get_package_name("node_modules/react/index.js"),
        Some("react".to_string())
    );
    assert_eq!(
        get_package_name("node_modules/@types/react/index.d.ts"),
        Some("@types/react".to_string())
    );
}

#[test]
fn test_resolve_package_entry() {
    // Mock package.json with "main": "dist/index.js"
    // Assert resolves to correct path
}

Definition of Done

  • All tests passing
  • Handles common package.json formats
  • Logs warnings for unresolvable entries
  • No changes to existing indexing logic (foundation only)

Task 2: Export-Only Extraction

Effort: ~3 days (medior developer)
Files:

  • src/detectors/javascript/exports.rs (new)
  • src/commands/index.rs (modified)
  • src/schema.rs (add is_vendor flag)

PR Title: feat: implement lightweight export-only extraction for vendor code (#37)

Scope

  • Implement extract_exports_lightweight(source: &str) -> Vec<SymbolInfo>
    • Parse only export statements (skip function bodies)
    • Extract: name, arguments, return_type from TypeScript annotations
    • Mark with is_vendor: true
  • Modify src/commands/index.rs:
    • Detect vendor files via vendor_detection::classify_file()
    • If --vendor-surface flag set and file is vendor entry point:
      • Call extract_exports_lightweight() instead of full analysis
    • Store in vendor_symbols/<package>/exports.toon
  • Add --vendor-surface CLI flag

Performance Target

  • 10x faster than full analysis for vendor files
  • Measure: time to index node_modules/react/index.js (exports only vs full)

Tests

#[test]
fn test_extract_exports_only() {
    let source = r#"
        export function useState<T>(initial: T): [T, (value: T) => void] {}
        export const useEffect = (fn, deps) => {};
        function internal() {} // Should NOT be extracted
    "#;
    let exports = extract_exports_lightweight(source);
    assert_eq!(exports.len(), 2);
    assert!(exports[0].is_vendor);
}

Definition of Done

  • All tests passing
  • 10x speed improvement vs full analysis
  • Only extracts exports, skips internals
  • CLI flag --vendor-surface works
  • Documentation in CLI help

Task 3: Type Definition Support (.d.ts)

Effort: ~2-3 days (medior developer)
Files:

  • src/detectors/javascript/types.rs (new)
  • src/schema.rs (add TypeInfo struct)

PR Title: feat: add TypeScript type definition extraction for vendor surface (#37)

Scope

  • Create TypeInfo struct for type definitions
  • Implement .d.ts parsing:
    • interface Foo { ... }
    • type Bar = ...
    • declare module "..." { ... }
    • declare class ...
  • Store in vendor_symbols/<package>/types.toon (lazy-loaded)
  • Add --vendor-surface-mode=types CLI flag

Example

// Input: node_modules/@types/react/index.d.ts
interface FunctionComponent<P = {}> {
    (props: P): ReactElement | null;
}

// Output: TypeInfo
TypeInfo {
    name: "FunctionComponent",
    kind: TypeKind::Interface,
    type_params: ["P"],
    properties: [...],
}

Tests

#[test]
fn test_extract_interface() {
    let source = r#"
        interface User {
            name: string;
            age: number;
        }
    "#;
    let types = extract_type_definitions(source);
    assert_eq!(types[0].name, "User");
    assert_eq!(types[0].properties.len(), 2);
}

Definition of Done

  • All tests passing
  • Parses common TypeScript type constructs
  • Lazy-loads on-demand (not in main index)
  • CLI flag --vendor-surface-mode=types works
  • Memory usage under 5 MB for 200 packages

Task 4: Configuration, Limits & Statistics

Effort: ~1 day (junior developer)
Files:

  • src/cli.rs (add flags)
  • src/vendor_surface.rs (add config)
  • src/commands/index.rs (add statistics)

PR Title: feat: add vendor surface configuration and indexing statistics (#37)

Scope

  • Add CLI flags:
    • --vendor-surface-limit=<N> (default: 500)
    • --vendor-surface-whitelist=<packages>
    • --vendor-surface-blacklist=<packages>
    • --vendor-surface-depth=<N> (default: 1)
  • Implement limits:
    • Extract top N most-imported packages
    • Whitelist overrides limit
    • Blacklist excludes even if in top N
  • Add index statistics:
    Index Statistics:
      First-party: 1,500 symbols (300 files)
      Vendor surface: 2,300 exports (42 packages)
      Excluded: 145 packages (not in top 500)
      Memory overhead: 2.1 MB (+3.5%)
    

Tests

#[test]
fn test_whitelist_override() {
    let config = VendorSurfaceConfig {
        max_packages: 10,
        whitelist: vec!["react".to_string()],
        ..Default::default()
    };
    // "react" should be extracted even if not in top 10
}

Definition of Done

  • All CLI flags work
  • Whitelist/blacklist respected
  • Statistics printed after indexing
  • Documentation updated

Future Enhancements (Not in MVP)

Python Support

  • Detect __init__.py entry points
  • Parse setup.py / pyproject.toml
  • Extract type stubs (.pyi files)

JSDoc Parsing

  • Extract @param, @returns, @type from comments
  • Use for JavaScript without TypeScript

Rust/Go/Java Entry Points

  • Cargo.toml / lib.rs (Rust)
  • go.mod / module path (Go)
  • Maven/Gradle POM parsing (Java)

Testing Strategy

Unit Tests

  • Each task has dedicated unit tests (see above)

Integration Tests

  • End-to-end: Index a real React project with --vendor-surface
  • Verify: Call graph contains ext:react:useState with signature
  • Verify: Memory usage under 5% overhead

Performance Tests

  • Benchmark: Index 200-package project with/without vendor surface
  • Target: <10% slowdown, <5% disk overhead

Regression Tests

  • Ensure --first-party-only still works (excludes vendor entirely)
  • Ensure no --vendor-surface flag = old behavior (no vendor surface)

Rollout Plan

  1. Phase 1 (Task 1): Foundation only, no indexing changes
  2. Phase 2 (Task 2): Export extraction behind --vendor-surface flag (opt-in)
  3. Phase 3 (Task 3): Type definitions behind --vendor-surface-mode=types
  4. Phase 4 (Task 4): Configuration & limits
  5. Beta: Test with semfora-engine itself, OpenClaw, other Rust/TS projects
  6. GA: Enable by default for MCP server, keep opt-in for CLI

Risk Mitigation

Risk Mitigation Owner
Entry point resolution fails Log warnings, fall back to index.js Task 1
Memory explosion Enforce max_packages limit Task 4
Vendor symbol pollution Default exclude in search Task 2
Performance regression Benchmark before merge All tasks

Success Metrics

  • Overhead <5% for 200-dep project ✅ (estimated 3.5%)
  • No vendor→vendor call graph edges
  • Entry point detection works for 95%+ of npm packages
  • Call graph shows first-party→vendor edges with signatures
  • Documentation clear and complete