feat(index): piggyback indexing on PCH/PCM builds and open-file compiles#402
feat(index): piggyback indexing on PCH/PCM builds and open-file compiles#40216bit-ykiko merged 6 commits intomainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughWorkers now build and serialize TUIndex payloads (optionally with an interested_only flag) and send them to the master. The master stores per-open-file in-memory indices, merges non-empty TUIndex shards into project indices while skipping open files, and query/lookup flows prefer open-file indices before consulting merged data. Changes
Sequence DiagramsequenceDiagram
participant Worker as Worker
participant Protocol as ProtocolLayer
participant Server as MasterServer
participant Project as ProjectIndex
participant Client as QueryHandler
Worker->>Worker: compile / build unit
Worker->>Worker: index::TUIndex::build(interested_only?)
Worker->>Worker: serialize TUIndex
Worker->>Protocol: Send Result + tu_index_data
Protocol->>Server: Deliver result
Server->>Server: if tu_index_data non-empty -> deserialize TUIndex
Server->>Server: store OpenFileIndex and mark path in open_proj_path_ids
Server->>Project: merge tu_index_data into persistent shards (exclude open files)
rect rgba(100,150,200,0.5)
Client->>Server: query at position / relations
Server->>Server: check open_file_indices first
alt file covered by OpenFileIndex
Server->>Server: use OpenFileIndex.content + relations/symbols
else
Server->>Project: query merged ProjectIndex (skip open_proj_path_ids)
end
end
Note over Server: on didClose -> erase OpenFileIndex, remove from open_proj_path_ids, queue background indexing
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/server/master_server.cpp`:
- Around line 919-933: The current open-file index cache (open_file_indices and
open_proj_path_ids) is not invalidated when a document is edited or when a
compile produces no TU index; tie the cache to DocumentState::generation by
storing the generation alongside OpenFileIndex (or as part of OpenFileIndex)
when you populate it from tu_index_data, and on lookups (e.g., in index-first
query paths and in didChange handling) compare stored generation vs
DocumentState::generation and clear/bypass the cached entry if they differ; also
ensure that after a failed/empty tu_index_data compile you remove any existing
open_file_indices[pid] and erase its path id from open_proj_path_ids so stale
results are not served.
In `@src/server/stateless_worker.cpp`:
- Around line 119-127: TUIndex::build is being called after the compilation Unit
has already been reset, so it builds an index from a dead unit and yields no
header data; move the TU index construction/serialization (the call to
index::TUIndex::build(unit) and subsequent clearing of tu_index.main_file_index
and serialize into tu_index_serialized) to run before the code that resets or
destroys the `unit` (the place where `unit` is cleared/reset), ensuring you
still clear the main file index via tu_index.main_file_index =
index::FileIndex{} before serialization so only PCH headers are kept.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 743cb666-144e-4583-8489-1bb290599fa5
📒 Files selected for processing (7)
src/index/tu_index.cppsrc/index/tu_index.hsrc/server/master_server.cppsrc/server/master_server.hsrc/server/protocol.hsrc/server/stateful_worker.cppsrc/server/stateless_worker.cpp
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/server/stateless_worker.cpp`:
- Around line 115-123: The current code clears tu_index.main_file_index but
still serializes it, causing master_server::merge_index_result to merge an empty
FileIndex for PCH builds; instead mark the TU as a PCH-only result and have the
master skip main-file merges. Concretely: in stateless_worker.cpp where you call
index::TUIndex::build(unit) set a PCH flag on the TUIndex (e.g.,
tu_index.is_pch_build = true or similar) and avoid serializing/sending
main_file_index for PCH units, and then update master_server::merge_index_result
(and the merge path that always merges main_file_index) to check that
TUIndex::is_pch_build and skip merging main_file_index when true so PCH results
only contribute headers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9a8c4690-6975-4ee7-9520-a3328db767af
📒 Files selected for processing (2)
src/server/master_server.cppsrc/server/stateless_worker.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
- src/server/master_server.cpp
There was a problem hiding this comment.
🧹 Nitpick comments (3)
src/server/master_server.cpp (3)
2354-2358: Consider braces for clarity around conditionalforloop.The
if(sym_it != project_index.symbols.end())without braces immediately followed byforis valid but can be misread.♻️ Clearer formatting
auto sym_it = project_index.symbols.find(info->hash); - if(sym_it != project_index.symbols.end()) - for(auto file_id: sym_it->second.reference_files) { + if(sym_it != project_index.symbols.end()) { + for(auto file_id: sym_it->second.reference_files) {And close with an extra
}after the loop body.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/master_server.cpp` around lines 2354 - 2358, The conditional checking project_index.symbols (sym_it != project_index.symbols.end()) is written without braces before the following for-loop which harms readability; add explicit braces around the if block so the for-loop and its body are clearly scoped (i.e., wrap the for(auto file_id: sym_it->second.reference_files) { ... } inside an if { ... } block) and ensure a matching closing brace is placed after the loop body; reference symbols: project_index.symbols, sym_it, reference_files, and open_proj_path_ids to locate the code to update.
1519-1525:whileloop never iterates—simplify toif.The
whileloop with an unconditionalbreakat the end is misleading; it executes at most one iteration. Replace with a clearer conditional.♻️ Suggested simplification
- auto it = std::ranges::lower_bound(occs, *offset_opt, {}, [](const index::Occurrence& o) { - return o.range.end; - }); - while(it != occs.end()) { - if(it->range.contains(*offset_opt)) { - symbol_hash = it->target; - break; - } - break; - } + auto it = std::ranges::lower_bound(occs, *offset_opt, {}, [](const index::Occurrence& o) { + return o.range.end; + }); + if(it != occs.end() && it->range.contains(*offset_opt)) { + symbol_hash = it->target; + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/master_server.cpp` around lines 1519 - 1525, The loop using it over occs is misleading because it always breaks after one iteration; replace the while(it != occs.end()) { ... break; } with a simple conditional: check if it != occs.end() and if it->range.contains(*offset_opt) then assign symbol_hash = it->target; remove the unnecessary iterator loop and the unreachable code paths around occs/it/offset_opt/range.contains to make intent clear.
1673-1680: Samewhile-with-immediate-breakpattern—simplify.Same issue as in
query_index_relations. Thiswhileloop exits after one check regardless of the condition result.♻️ Suggested simplification
- while(it != occs.end()) { - if(it->range.contains(*offset_opt)) { - symbol_hash = it->target; - occ_range = it->range; - break; - } - break; - } + if(it != occs.end() && it->range.contains(*offset_opt)) { + symbol_hash = it->target; + occ_range = it->range; + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/master_server.cpp` around lines 1673 - 1680, The loop uses a while(it != occs.end()) with an unconditional break so it only ever checks the first element; replace it with a simple conditional or proper loop: check if it != occs.end() and it->range.contains(*offset_opt) then assign symbol_hash = it->target and occ_range = it->range (or iterate with a for/while until a match is found). Update the block around variables it, occs, offset_opt, symbol_hash and occ_range (same pattern as in query_index_relations) to remove the redundant break and use a single if or a correct search loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/server/master_server.cpp`:
- Around line 2354-2358: The conditional checking project_index.symbols (sym_it
!= project_index.symbols.end()) is written without braces before the following
for-loop which harms readability; add explicit braces around the if block so the
for-loop and its body are clearly scoped (i.e., wrap the for(auto file_id:
sym_it->second.reference_files) { ... } inside an if { ... } block) and ensure a
matching closing brace is placed after the loop body; reference symbols:
project_index.symbols, sym_it, reference_files, and open_proj_path_ids to locate
the code to update.
- Around line 1519-1525: The loop using it over occs is misleading because it
always breaks after one iteration; replace the while(it != occs.end()) { ...
break; } with a simple conditional: check if it != occs.end() and if
it->range.contains(*offset_opt) then assign symbol_hash = it->target; remove the
unnecessary iterator loop and the unreachable code paths around
occs/it/offset_opt/range.contains to make intent clear.
- Around line 1673-1680: The loop uses a while(it != occs.end()) with an
unconditional break so it only ever checks the first element; replace it with a
simple conditional or proper loop: check if it != occs.end() and
it->range.contains(*offset_opt) then assign symbol_hash = it->target and
occ_range = it->range (or iterate with a for/while until a match is found).
Update the block around variables it, occs, offset_opt, symbol_hash and
occ_range (same pattern as in query_index_relations) to remove the redundant
break and use a single if or a correct search loop.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d644e415-24c0-415b-a806-eb577f9346c9
📒 Files selected for processing (1)
src/server/master_server.cpp
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/server/stateless_worker.cpp (1)
118-123:⚠️ Potential issue | 🟠 MajorSuppress main-file merge semantics for PCH TUIndex payloads.
BuildPCHstill serializes and sends a fullTUIndexafter clearingmain_file_index. The master’s unified TUIndex merge path still processes main-file merge logic for incoming payloads, so this can create/overwrite a shard for the PCH request file instead of remaining strictly header-only.Suggested direction
- auto tu_index = index::TUIndex::build(unit); - tu_index.main_file_index = index::FileIndex(); - llvm::raw_string_ostream os(tu_index_serialized); - tu_index.serialize(os); + auto tu_index = index::TUIndex::build(unit); + // Mark payload as PCH/header-only and let master skip main_file_index merge for this kind. + // (Requires protocol/TUIndex metadata + merge-path branch in master_server.) + tu_index.main_file_index = index::FileIndex(); + llvm::raw_string_ostream os(tu_index_serialized); + tu_index.serialize(os);Also applies to: 151-151
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/server/stateless_worker.cpp` around lines 118 - 123, BuildPCH currently serializes a TUIndex built via index::TUIndex::build(unit) and clears tu_index.main_file_index before tu_index.serialize, but the master still applies main-file merge logic; update BuildPCH to mark these payloads so the master will skip main-file merging (e.g., add/set a PCH/header-only flag on the TUIndex before serialization). Specifically, when creating tu_index in BuildPCH, keep the cleared tu_index.main_file_index and also set a clear suppress-main-file-merge indicator on the TUIndex (a new or existing flag the master merge path will check), so the master’s unified TUIndex merge code can detect and ignore main-file merge semantics for PCH payloads.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/server/stateless_worker.cpp`:
- Around line 118-123: BuildPCH currently serializes a TUIndex built via
index::TUIndex::build(unit) and clears tu_index.main_file_index before
tu_index.serialize, but the master still applies main-file merge logic; update
BuildPCH to mark these payloads so the master will skip main-file merging (e.g.,
add/set a PCH/header-only flag on the TUIndex before serialization).
Specifically, when creating tu_index in BuildPCH, keep the cleared
tu_index.main_file_index and also set a clear suppress-main-file-merge indicator
on the TUIndex (a new or existing flag the master merge path will check), so the
master’s unified TUIndex merge code can detect and ignore main-file merge
semantics for PCH payloads.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a613e544-d20b-498a-a4fe-229e4df58015
📒 Files selected for processing (2)
src/server/master_server.cppsrc/server/stateless_worker.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
- src/server/master_server.cpp
Instead of re-compiling files from scratch in a stateless worker just for indexing, extract index data during the compilations that already happen: - PCH build: index all preamble headers, merge into MergedIndex - PCM build: index module interface, merge into MergedIndex - Stateful worker compile: index main file only (interested_only=true), keep in-memory as OpenFileIndex — no disk persistence Query paths (GoToDefinition, FindReferences, CallHierarchy, TypeHierarchy) now check OpenFileIndex for open files and fall back to MergedIndex for disk-indexed files. Background indexing skips open files; on didClose the file is re-queued for proper MergedIndex merging. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move TUIndex::build before unit destruction in PCH handler (was a use-after-destroy bug) - Use is_one_of() instead of operator& for RelationKind::Kind enum values to avoid ambiguous overload with built-in operator& Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PCM indexing with interested_only=false traversed the entire AST (including transitive headers), making it too slow. Use interested_only=true to only index the module interface's own content. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Open files store symbols only in OpenFileIndex, not ProjectIndex. Query handlers for workspace/symbol, callHierarchy, and typeHierarchy now also search open_file_indices for symbol info and relations, so they work correctly when only open files have been indexed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Apply clang-format 21 formatting
- Use FileIndex() instead of FileIndex{} to avoid explicit constructor
copy-init error on Clang
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract lookup_occurrence() helper with proper innermost-match logic, replacing broken while/break/break pattern that never iterated - Extract find_symbol_info() to deduplicate symbol lookup across open file indices and ProjectIndex (6 call sites consolidated) - Fix resolve_hierarchy_item to check open file indices, not just ProjectIndex (symbols may not be in ProjectIndex yet) - Re-queue non-open dependents in didSave for background re-indexing after header changes - Add detailed comments explaining PCH indexing trade-offs Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
d5bafbc to
93d1b9d
Compare
Summary
Piggyback index construction onto existing compilation steps, eliminating redundant recompilation in background indexing:
TUIndex::buildgainsinterested_onlyparameter:truetraverses only the main file's top-level decls;false(default) traverses the full ASTTUIndex::build(unit)(full traversal) after successfulBuildPCH, clearsmain_file_index, serializes and sends back; master merges into MergedIndexTUIndex::build(unit, true)after successfulBuildPCM; master merges into MergedIndexTUIndex::build(unit, true)after successfulCompile, serialized inCompileResultOpenFileIndexin-memory structure: master holdsFileIndex + SymbolTable + buffer textper open file — not persisted to disk, not merged, discarded on closequery_index_relations,lookup_symbol_at_position,find_symbol_definition_location, all hierarchy handlers, andworkspace/symbolcheckOpenFileIndexfirst (fresher), then fall back toMergedIndex(disk-indexed)documents.count(); ondidClosethe file is re-queued intoindex_queuedidSavere-queues non-open dependents: dirtied files fromcompile_graph->update()that are not open get pushed intoindex_queuefor background re-indexinglookup_occurrencehelper: binary search + forward scan picking the innermost (narrowest) match, replacing a brokenwhile/break/breakpatternfind_symbol_infohelper: consolidates 6 duplicated "search open file indices then ProjectIndex" lookups into one methodresolve_hierarchy_itemchecks open file indices: no longer limited to ProjectIndex onlyTest plan
test_indexcases: GoToDefinition, FindReferences, CallHierarchy, TypeHierarchy, WorkspaceSymbol)🤖 Generated with Claude Code