Based on research findings in RESEARCH_VENDOR_SURFACE.md.
Status: Ready for implementation
Estimated Total Effort: 1 week
Dependencies: Issue #35 (vendor detection)
Effort: ~2 days (junior developer)
File: src/vendor_surface.rs (new module)
PR Title: feat: add vendor package detection and entry point resolution (#37)
- Create
VendorSurfaceConfigstruct - Implement
get_package_name(path: &Path) -> Option<String> - Implement
get_package_entry_point(package_name: &str, repo_root: &Path) -> Result<PathBuf> - Parse
package.jsonformain,module,exportsfields - Handle scoped packages (
@types/react) - Handle conditional exports (import/require)
#[test]
fn test_extract_package_name() {
assert_eq!(
get_package_name("node_modules/react/index.js"),
Some("react".to_string())
);
assert_eq!(
get_package_name("node_modules/@types/react/index.d.ts"),
Some("@types/react".to_string())
);
}
#[test]
fn test_resolve_package_entry() {
// Mock package.json with "main": "dist/index.js"
// Assert resolves to correct path
}- All tests passing
- Handles common package.json formats
- Logs warnings for unresolvable entries
- No changes to existing indexing logic (foundation only)
Effort: ~3 days (medior developer)
Files:
src/detectors/javascript/exports.rs(new)src/commands/index.rs(modified)src/schema.rs(addis_vendorflag)
PR Title: feat: implement lightweight export-only extraction for vendor code (#37)
- Implement
extract_exports_lightweight(source: &str) -> Vec<SymbolInfo>- Parse only
exportstatements (skip function bodies) - Extract: name, arguments, return_type from TypeScript annotations
- Mark with
is_vendor: true
- Parse only
- Modify
src/commands/index.rs:- Detect vendor files via
vendor_detection::classify_file() - If
--vendor-surfaceflag set and file is vendor entry point:- Call
extract_exports_lightweight()instead of full analysis
- Call
- Store in
vendor_symbols/<package>/exports.toon
- Detect vendor files via
- Add
--vendor-surfaceCLI flag
- 10x faster than full analysis for vendor files
- Measure: time to index
node_modules/react/index.js(exports only vs full)
#[test]
fn test_extract_exports_only() {
let source = r#"
export function useState<T>(initial: T): [T, (value: T) => void] {}
export const useEffect = (fn, deps) => {};
function internal() {} // Should NOT be extracted
"#;
let exports = extract_exports_lightweight(source);
assert_eq!(exports.len(), 2);
assert!(exports[0].is_vendor);
}- All tests passing
- 10x speed improvement vs full analysis
- Only extracts exports, skips internals
- CLI flag
--vendor-surfaceworks - Documentation in CLI help
Effort: ~2-3 days (medior developer)
Files:
src/detectors/javascript/types.rs(new)src/schema.rs(addTypeInfostruct)
PR Title: feat: add TypeScript type definition extraction for vendor surface (#37)
- Create
TypeInfostruct for type definitions - Implement
.d.tsparsing:interface Foo { ... }type Bar = ...declare module "..." { ... }declare class ...
- Store in
vendor_symbols/<package>/types.toon(lazy-loaded) - Add
--vendor-surface-mode=typesCLI flag
// Input: node_modules/@types/react/index.d.ts
interface FunctionComponent<P = {}> {
(props: P): ReactElement | null;
}
// Output: TypeInfo
TypeInfo {
name: "FunctionComponent",
kind: TypeKind::Interface,
type_params: ["P"],
properties: [...],
}#[test]
fn test_extract_interface() {
let source = r#"
interface User {
name: string;
age: number;
}
"#;
let types = extract_type_definitions(source);
assert_eq!(types[0].name, "User");
assert_eq!(types[0].properties.len(), 2);
}- All tests passing
- Parses common TypeScript type constructs
- Lazy-loads on-demand (not in main index)
- CLI flag
--vendor-surface-mode=typesworks - Memory usage under 5 MB for 200 packages
Effort: ~1 day (junior developer)
Files:
src/cli.rs(add flags)src/vendor_surface.rs(add config)src/commands/index.rs(add statistics)
PR Title: feat: add vendor surface configuration and indexing statistics (#37)
- Add CLI flags:
--vendor-surface-limit=<N>(default: 500)--vendor-surface-whitelist=<packages>--vendor-surface-blacklist=<packages>--vendor-surface-depth=<N>(default: 1)
- Implement limits:
- Extract top N most-imported packages
- Whitelist overrides limit
- Blacklist excludes even if in top N
- Add index statistics:
Index Statistics: First-party: 1,500 symbols (300 files) Vendor surface: 2,300 exports (42 packages) Excluded: 145 packages (not in top 500) Memory overhead: 2.1 MB (+3.5%)
#[test]
fn test_whitelist_override() {
let config = VendorSurfaceConfig {
max_packages: 10,
whitelist: vec!["react".to_string()],
..Default::default()
};
// "react" should be extracted even if not in top 10
}- All CLI flags work
- Whitelist/blacklist respected
- Statistics printed after indexing
- Documentation updated
- Detect
__init__.pyentry points - Parse
setup.py/pyproject.toml - Extract type stubs (
.pyifiles)
- Extract
@param,@returns,@typefrom comments - Use for JavaScript without TypeScript
- Cargo.toml / lib.rs (Rust)
- go.mod / module path (Go)
- Maven/Gradle POM parsing (Java)
- Each task has dedicated unit tests (see above)
- End-to-end: Index a real React project with
--vendor-surface - Verify: Call graph contains
ext:react:useStatewith signature - Verify: Memory usage under 5% overhead
- Benchmark: Index 200-package project with/without vendor surface
- Target: <10% slowdown, <5% disk overhead
- Ensure
--first-party-onlystill works (excludes vendor entirely) - Ensure no
--vendor-surfaceflag = old behavior (no vendor surface)
- Phase 1 (Task 1): Foundation only, no indexing changes
- Phase 2 (Task 2): Export extraction behind
--vendor-surfaceflag (opt-in) - Phase 3 (Task 3): Type definitions behind
--vendor-surface-mode=types - Phase 4 (Task 4): Configuration & limits
- Beta: Test with semfora-engine itself, OpenClaw, other Rust/TS projects
- GA: Enable by default for MCP server, keep opt-in for CLI
| Risk | Mitigation | Owner |
|---|---|---|
| Entry point resolution fails | Log warnings, fall back to index.js | Task 1 |
| Memory explosion | Enforce max_packages limit |
Task 4 |
| Vendor symbol pollution | Default exclude in search | Task 2 |
| Performance regression | Benchmark before merge | All tasks |
- Overhead <5% for 200-dep project ✅ (estimated 3.5%)
- No vendor→vendor call graph edges
- Entry point detection works for 95%+ of npm packages
- Call graph shows first-party→vendor edges with signatures
- Documentation clear and complete