Chunk size (800) exceeds embedding model token limit (256 tokens / ~512 chars)

## Chunk size (800) exceeds embedding model token limit (256 tokens / ~512 chars)

### Problem

The default `CHUNK_SIZE = 800` in `miner.py` exceeds the token limit of the default embedding model.

**Details:**
- **Chunk size:** 800 characters (line 56 in `miner.py`)
- **Embedding model:** `all-MiniLM-L6-v2` (ChromaDB default via ONNX)
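- **Model token limit:** 256 tokens (~512 characters, assuming roughly 2 characters per token; the real ratio depends on the content being chunked)
- **Result:** Content beyond the token limit (~512 chars under that estimate) is silently truncated before embedding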
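
As a rough illustration of the numbers above, the sketch below estimates what share of each chunk never reaches the embedding. The 2 characters per token ratio is an assumption taken from the "~512 chars" estimate in this issue, not a measured value, and `truncated_fraction` is a hypothetical helper, not part of `miner.py`.

```python
CHUNK_SIZE = 800          # characters, current default in miner.py
MODEL_MAX_TOKENS = 256    # token limit cited for all-MiniLM-L6-v2
CHARS_PER_TOKEN = 2.0     # assumption: 512 chars / 256 tokens, as estimated in this issue


def truncated_fraction(chunk_chars: int, max_tokens: int, chars_per_token: float) -> float:
    """Estimate the fraction of a chunk that falls past the model's token limit."""
    embedded_chars = max_tokens * chars_per_token
    lost_chars = max(0.0, chunk_chars - embedded_chars)
    return lost_chars / chunk_chars


share = truncated_fraction(CHUNK_SIZE, MODEL_MAX_TOKENS, CHARS_PER_TOKEN)
print(f"Estimated share of each chunk lost to truncation: {share:.0%}")  # prints 36%
```

The result is sensitive to the ratio: at 4 characters per token the same call returns 0, so measuring real token counts on representative content would give a firmer number.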

### Impact

1. **Lost context:** Text at roughly character positions 512-800 of each chunk is not reflected in the embedding
2. **Reduced recall:** Semantic search can miss queries that match only the truncated portion of a chunk
3. **Silent degradation:** No error or warning is raised, so users don't know this is happening
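
One way to make the degradation visible would be a check at chunking time. This is a minimal sketch, not existing mempalace code: `warn_if_truncated` and `whitespace_token_count` are hypothetical names, and the whitespace counter is only a crude placeholder for the model's real tokenizer, which usually produces more tokens than there are words.

```python
import warnings
from typing import Callable


def whitespace_token_count(text: str) -> int:
    """Placeholder token counter; a real check should use the model's tokenizer."""
    return len(text.split())


def warn_if_truncated(
    chunk: str,
    count_tokens: Callable[[str], int],
    max_tokens: int = 256,  # limit cited in this issue for all-MiniLM-L6-v2
) -> bool:
    """Warn and return True when a chunk is longer than the model can embed."""
    n_tokens = count_tokens(chunk)
    if n_tokens > max_tokens:
        warnings.warn(
            f"Chunk has {n_tokens} tokens; only the first {max_tokens} will be embedded.",
            stacklevel=2,
        )
        return True
    return False
```

Calling `warn_if_truncated(chunk, whitespace_token_count)` for each chunk before it is stored would surface the problem without changing the chunking behavior.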

### Suggested Fixes

**Option 1: Reduce default chunk size**
```python
# miner.py line 56-57
CHUNK_SIZE = 400  # Was 800 - safer for 256 token limit
CHUNK_OVERLAP = 50  # Was 100
```

**Option 2: Add CLI configuration**
```bash
mempalace mine <path> --chunk-size 400 --chunk-overlap 50
```
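
The flags above could be wired up roughly as follows. This is a sketch under assumptions: it uses `argparse` and hypothetical function names (`build_parser`, `parse_mine_args`), and it does not reflect how the mempalace CLI is actually structured. The defaults mirror the current values of 800 and 100.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a `mine` subcommand that accepts chunking options."""
    parser = argparse.ArgumentParser(prog="mempalace")
    subcommands = parser.add_subparsers(dest="command", required=True)
    mine = subcommands.add_parser("mine")
    mine.add_argument("path")
    mine.add_argument("--chunk-size", type=int, default=800)     # current default
    mine.add_argument("--chunk-overlap", type=int, default=100)  # current default
    return parser


def parse_mine_args(argv: list[str]) -> argparse.Namespace:
    """Parse arguments and reject combinations that cannot produce valid chunks."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.chunk_size <= 0:
        parser.error("--chunk-size must be positive")
    if not 0 <= args.chunk_overlap < args.chunk_size:
        parser.error("--chunk-overlap must be non-negative and smaller than --chunk-size")
    return args
```

Validating the overlap against the chunk size matters here because an overlap equal to or larger than the chunk size would keep a sliding-window chunker from advancing.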

**Option 3: Auto-detect model limits**
Query the embedding function for its maximum token count and derive the chunk size and overlap from it, falling back to a conservative default when the limit is not exposed.
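
A minimal sketch of that idea follows. It assumes the embedding function exposes its limit as a `max_tokens` attribute, which is hypothetical: this issue does not establish that ChromaDB's embedding functions provide such an attribute, so the lookup falls back to 256 when it is missing. `chunk_params_for`, the 2 characters per token ratio, and the 10% headroom are likewise illustrative assumptions.

```python
def chunk_params_for(
    embedding_fn: object,
    chars_per_token: float = 2.0,    # assumption: ratio implied by "~512 chars" above
    headroom: float = 0.9,           # assumption: keep 10% below the estimated limit
    overlap_ratio: float = 0.125,    # same ratio as the current 100 / 800 defaults
    fallback_max_tokens: int = 256,  # limit cited for all-MiniLM-L6-v2
) -> tuple[int, int]:
    """Derive (chunk_size, chunk_overlap) in characters from a model token limit."""
    # Hypothetical attribute: not a documented ChromaDB API.
    max_tokens = getattr(embedding_fn, "max_tokens", None)
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        max_tokens = fallback_max_tokens
    chunk_size = int(max_tokens * chars_per_token * headroom)
    chunk_overlap = int(chunk_size * overlap_ratio)
    return chunk_size, chunk_overlap
```

With the fallback limit this yields a chunk size of 460 characters, close to the 400 proposed in Option 1. A character-based estimate remains approximate; counting tokens with the model's own tokenizer would be more reliable if it is accessible.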

### Environment

- mempalace version: 3.0.0
- ChromaDB version: 1.5.7
- Embedding: Default (all-MiniLM-L6-v2 via ONNX)

### Priority

Medium: search still works, but retrieval quality could improve significantly once chunks fit within the model's token limit.

