FileSynchronizer tracks binary files: merkle DAG bypasses supportedExtensions filter #286

## Summary

`FileSynchronizer.generateFileHashes()` hashes **all** non-ignored files regardless of extension. It does not receive or use `supportedExtensions`, so the merkle DAG tracks binary files (PDFs, PNGs, tar.gz, etc.). During `reindexByChange()`, changed binary files are passed to the embedding provider as utf-8 text.

## Root cause

Two independent file-traversal systems with different filters:

| Component | Filters by extension? | Result |
|-----------|----------------------|--------|
| `Context.getCodeFiles()` | Yes — checks `supportedExtensions.includes(ext)` | Correct: only indexes supported files |
| `FileSynchronizer.generateFileHashes()` | No — only checks `ignorePatterns` | Bug: hashes PDFs, images, archives |

The `FileSynchronizer` constructor accepts `ignorePatterns` but has no concept of `supportedExtensions`. Since `DEFAULT_IGNORE_PATTERNS` doesn't exclude `.pdf`, `.png`, `.tar.gz`, etc., those files pass the only filter that is applied and end up tracked in the merkle DAG.
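To make the mismatch concrete, here is a minimal sketch of the two filters applied to the same file list. The helper names (`isIgnored`, `selectForIndexing`, `selectForHashing`), the sample paths, and the simplified directory-based ignore check are all hypothetical; the real methods walk the filesystem, and I'm assuming `getCodeFiles()` applies ignore patterns in addition to the extension check.

```typescript
import * as path from "path";

// Hypothetical stand-ins for the real configuration values.
const supportedExtensions = [".ts", ".py"];
const ignoredDirs = ["node_modules"]; // simplified stand-in for ignorePatterns

function isIgnored(file: string): boolean {
    return file.split("/").some((segment) => ignoredDirs.includes(segment));
}

// Modeled on Context.getCodeFiles(): ignore patterns plus extension allowlist.
function selectForIndexing(files: string[]): string[] {
    return files.filter(
        (f) => !isIgnored(f) && supportedExtensions.includes(path.extname(f))
    );
}

// Modeled on FileSynchronizer.generateFileHashes(): ignore patterns only.
function selectForHashing(files: string[]): string[] {
    return files.filter((f) => !isIgnored(f));
}

const files = [
    "src/app.ts",
    "docs/spec.pdf",
    "assets/logo.png",
    "node_modules/x/index.ts",
];

const indexed = selectForIndexing(files);
const hashed = selectForHashing(files);

// Files the merkle DAG tracks that the indexer would never have picked up.
const trackedButNeverIndexed = hashed.filter((f) => !indexed.includes(f));
console.log(trackedButNeverIndexed);
```

Any file in `trackedButNeverIndexed` is a candidate for the bug: a later change to it is reported by the synchronizer even though the initial index never included it.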

## Impact

1. **Binary content sent to embedding provider**: `reindexByChange()` detects "changes" to binary files, reads them as utf-8 (`readFile(filePath, 'utf-8')`), and sends garbled content to the embedding API. This wastes tokens and can pollute the vector space with meaningless embeddings.
2. **Wasted I/O on every sync cycle**: the merkle DAG hashes binary files on every `checkForChanges()` call, even though they'll never be indexed.
3. **Privacy risk**: sensitive binary files (e.g., confidential PDFs) get hashed and their paths stored in `~/.context/merkle/` snapshots.
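For the first point, a small standalone sketch shows why decoding binary data as utf-8 is lossy. It does not touch the project's code; it only decodes the standard 8-byte PNG signature the same way `readFile(filePath, 'utf-8')` would decode the start of a PNG file.

```typescript
import { Buffer } from "buffer";

// First eight bytes of every PNG file (the PNG signature).
const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Same decoding step that reading the file with the 'utf-8' encoding performs.
const decoded = pngSignature.toString("utf-8");

// 0x89 is not a valid UTF-8 lead byte, so it is replaced with U+FFFD.
const hasReplacementChar = decoded.includes("\uFFFD");

// Re-encoding does not give the original bytes back: the decode was lossy.
const roundTripped = Buffer.from(decoded, "utf-8");
const isLossless = roundTripped.equals(pngSignature);

console.log({ hasReplacementChar, isLossless });
```

The text that reaches the embedding provider is therefore neither the original bytes nor meaningful prose.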

## Reproduction

1. Create or use a codebase containing PDF files not listed in `.gitignore`
2. Index the codebase with `index_codebase`
3. Inspect the merkle snapshot:
   ```bash
   jq -r '.fileHashes[][0]' ~/.context/merkle/<hash>.json | grep -E '\.(pdf|png|jpg|zip|tar|gz)$'
   ```
4. Observe binary files are tracked despite not being in `supportedExtensions`
5. Modify a tracked binary file and trigger a sync — embedding provider receives garbled utf-8 content
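The same check as step 3 can be done against the extension allowlist instead of a hard-coded regex. This is a sketch only: the `MerkleSnapshot` shape is an assumption inferred from the `jq` expression above (an array of `[path, hash]` pairs under `fileHashes`), and `findUnsupportedTrackedFiles` plus the sample data are made up for illustration.

```typescript
import * as path from "path";

// Assumed snapshot shape, inferred from the jq expression `.fileHashes[][0]`.
interface MerkleSnapshot {
    fileHashes: [string, string][];
}

// Hypothetical helper: list tracked paths whose extension is not in the allowlist.
function findUnsupportedTrackedFiles(
    snapshot: MerkleSnapshot,
    supportedExtensions: string[]
): string[] {
    return snapshot.fileHashes
        .map(([filePath]) => filePath)
        .filter((filePath) => !supportedExtensions.includes(path.extname(filePath)));
}

// Made-up sample data standing in for a parsed snapshot file.
const sampleSnapshot: MerkleSnapshot = {
    fileHashes: [
        ["src/index.ts", "aaa"],
        ["docs/contract.pdf", "bbb"],
        ["dist/bundle.tar.gz", "ccc"],
    ],
};

const offenders = findUnsupportedTrackedFiles(sampleSnapshot, [".ts", ".js"]);
console.log(offenders);
```

To run it against a real snapshot, parse the JSON file from `~/.context/merkle/` and pass the result in place of `sampleSnapshot`, along with the extension list your configuration actually uses.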

## Suggested fix

Pass `supportedExtensions` to `FileSynchronizer` and filter in `generateFileHashes()`:

```typescript
// In FileSynchronizer constructor
constructor(rootDir: string, ignorePatterns: string[] = [], supportedExtensions: string[] = []) {
    // ...
    this.supportedExtensions = supportedExtensions;
}

// In generateFileHashes(), after isFile() check
} else if (stat.isFile()) {
    const ext = path.extname(entry.name);
    if (this.supportedExtensions.length > 0 && !this.supportedExtensions.includes(ext)) {
        continue; // Skip unsupported extensions
    }
    // ... existing hash logic
}
```

Update callers in `Context.indexCodebase()` and `Context.reindexByChange()` to pass `this.supportedExtensions` when constructing `FileSynchronizer`.
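The filter condition from the snippet above can be pulled into a standalone predicate to check its edge cases in isolation. `shouldTrackFile` is a hypothetical name, not an existing function in the codebase, and the sample file names are made up.

```typescript
import * as path from "path";

// Hypothetical helper wrapping the condition proposed for generateFileHashes().
function shouldTrackFile(fileName: string, supportedExtensions: string[]): boolean {
    // An empty allowlist keeps today's behaviour (track everything),
    // matching the default parameter value in the proposed constructor.
    if (supportedExtensions.length === 0) {
        return true;
    }
    return supportedExtensions.includes(path.extname(fileName));
}

const allowlist = [".ts", ".md"];

const decisions = {
    source: shouldTrackFile("index.ts", allowlist),
    pdf: shouldTrackFile("report.pdf", allowlist),
    // path.extname() only returns the last extension, ".gz" here.
    archive: shouldTrackFile("backup.tar.gz", allowlist),
    // The comparison is case-sensitive, so ".MD" does not match ".md".
    upperCase: shouldTrackFile("README.MD", allowlist),
    // No allowlist configured: nothing is filtered out.
    noAllowlist: shouldTrackFile("report.pdf", []),
};

console.log(decisions);
```

Two things worth deciding in the real fix: whether extensions should be compared case-insensitively (I have not checked what `Context.getCodeFiles()` does, and the two should agree), and whether an empty `supportedExtensions` should really mean "track everything", since that silently preserves the current behaviour for any caller that forgets to pass the list.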

## Environment

- `@zilliz/claude-context-mcp@latest`
- Local Milvus
- Local embedding provider
- macOS
