Skip to content

feat(index): piggyback indexing on PCH/PCM builds and open-file compiles#402

Merged
16bit-ykiko merged 6 commits intomainfrom
feat/improve-index
Apr 6, 2026
Merged

feat(index): piggyback indexing on PCH/PCM builds and open-file compiles#402
16bit-ykiko merged 6 commits intomainfrom
feat/improve-index

Conversation

@16bit-ykiko
Copy link
Copy Markdown
Member

@16bit-ykiko 16bit-ykiko commented Apr 6, 2026

Summary

Piggyback index construction onto existing compilation steps, eliminating redundant recompilation in background indexing:

  • TUIndex::build gains interested_only parameter: true traverses only the main file's top-level decls; false (default) traverses the full AST
  • PCH build indexes preamble headers: stateless worker calls TUIndex::build(unit) (full traversal) after successful BuildPCH, clears main_file_index, serializes and sends back; master merges into MergedIndex
  • PCM build indexes module interface: stateless worker calls TUIndex::build(unit, true) after successful BuildPCM; master merges into MergedIndex
  • Open-file compile indexes main file: stateful worker calls TUIndex::build(unit, true) after successful Compile, serialized in CompileResult
  • New OpenFileIndex in-memory structure: master holds FileIndex + SymbolTable + buffer text per open file — not persisted to disk, not merged, discarded on close
  • Dual-source query path: query_index_relations, lookup_symbol_at_position, find_symbol_definition_location, all hierarchy handlers, and workspace/symbol check OpenFileIndex first (fresher), then fall back to MergedIndex (disk-indexed)
  • Background indexing skips open files: checked via documents.count(); on didClose the file is re-queued into index_queue
  • didSave re-queues non-open dependents: dirtied files from compile_graph->update() that are not open get pushed into index_queue for background re-indexing
  • Extract lookup_occurrence helper: binary search + forward scan picking the innermost (narrowest) match, replacing a broken while/break/break pattern
  • Extract find_symbol_info helper: consolidates 6 duplicated "search open file indices then ProjectIndex" lookups into one method
  • resolve_hierarchy_item checks open file indices: no longer limited to ProjectIndex only

Test plan

  • 465 unit tests pass
  • 105 integration tests pass (including all test_index cases: GoToDefinition, FindReferences, CallHierarchy, TypeHierarchy, WorkspaceSymbol)
  • Manual: open a file and immediately use GoToDefinition — should work without waiting for background indexing
  • Manual: close a file and verify background indexing picks it up and produces a MergedIndex shard

🤖 Generated with Claude Code

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 6, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Workers now build and serialize TUIndex payloads (optionally with an interested_only flag) and send them to the master. The master stores per-open-file in-memory indices, merges non-empty TUIndex shards into project indices while skipping open files, and query/lookup flows prefer open-file indices before consulting merged data.

Changes

Cohort / File(s) Summary
TUIndex Build API
src/index/tu_index.h, src/index/tu_index.cpp
Added bool interested_only parameter to TUIndex::build (default false) and forwarded it into Builder/SemanticVisitor construction and call sites.
Protocol & Worker Serialization
src/server/protocol.h, src/server/stateful_worker.cpp, src/server/stateless_worker.cpp
Added tu_index_data field to compile/PCH/PCM results. Workers build and serialize TUIndex (compile/PCM use interested_only=true; PCH clears main_file_index) and set *_result.tu_index_data.
MasterServer State
src/server/master_server.h
Introduced OpenFileIndex (file_index, symbols, content) plus open_file_indices map and open_proj_path_ids set to track per-open-file in-memory indices.
MasterServer Behavior & Queries
src/server/master_server.cpp
On receiving non-empty tu_index_data, master deserializes TUIndex, populates open_file_indices and open_proj_path_ids, merges TUIndex shards into persistent project index (skipping open files), defers background indexing for open files, and refactors lookup/query/hierarchy to consult open-file indices first.

Sequence Diagram

sequenceDiagram
    participant Worker as Worker
    participant Protocol as ProtocolLayer
    participant Server as MasterServer
    participant Project as ProjectIndex
    participant Client as QueryHandler

    Worker->>Worker: compile / build unit
    Worker->>Worker: index::TUIndex::build(interested_only?)
    Worker->>Worker: serialize TUIndex
    Worker->>Protocol: Send Result + tu_index_data
    Protocol->>Server: Deliver result
    Server->>Server: if tu_index_data non-empty -> deserialize TUIndex
    Server->>Server: store OpenFileIndex and mark path in open_proj_path_ids
    Server->>Project: merge tu_index_data into persistent shards (exclude open files)

    rect rgba(100,150,200,0.5)
    Client->>Server: query at position / relations
    Server->>Server: check open_file_indices first
    alt file covered by OpenFileIndex
        Server->>Server: use OpenFileIndex.content + relations/symbols
    else
        Server->>Project: query merged ProjectIndex (skip open_proj_path_ids)
    end
    end

    Note over Server: on didClose -> erase OpenFileIndex, remove from open_proj_path_ids, queue background indexing
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Poem

🐰 I hopped through code with nimble feet,

Built TUIndex parcels tidy and neat,
Open buffers kept close and bright,
Closed files queued to index by night,
Now queries bounce to the freshest seat.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: integrating indexing into PCH/PCM builds and open-file compiles to avoid redundant work.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/improve-index

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/server/master_server.cpp`:
- Around line 919-933: The current open-file index cache (open_file_indices and
open_proj_path_ids) is not invalidated when a document is edited or when a
compile produces no TU index; tie the cache to DocumentState::generation by
storing the generation alongside OpenFileIndex (or as part of OpenFileIndex)
when you populate it from tu_index_data, and on lookups (e.g., in index-first
query paths and in didChange handling) compare stored generation vs
DocumentState::generation and clear/bypass the cached entry if they differ; also
ensure that after a failed/empty tu_index_data compile you remove any existing
open_file_indices[pid] and erase its path id from open_proj_path_ids so stale
results are not served.

In `@src/server/stateless_worker.cpp`:
- Around line 119-127: TUIndex::build is being called after the compilation Unit
has already been reset, so it builds an index from a dead unit and yields no
header data; move the TU index construction/serialization (the call to
index::TUIndex::build(unit) and subsequent clearing of tu_index.main_file_index
and serialize into tu_index_serialized) to run before the code that resets or
destroys the `unit` (the place where `unit` is cleared/reset), ensuring you
still clear the main file index via tu_index.main_file_index =
index::FileIndex{} before serialization so only PCH headers are kept.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 743cb666-144e-4583-8489-1bb290599fa5

📥 Commits

Reviewing files that changed from the base of the PR and between 1627b96 and d88fb52.

📒 Files selected for processing (7)
  • src/index/tu_index.cpp
  • src/index/tu_index.h
  • src/server/master_server.cpp
  • src/server/master_server.h
  • src/server/protocol.h
  • src/server/stateful_worker.cpp
  • src/server/stateless_worker.cpp

Comment thread src/server/master_server.cpp
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/server/stateless_worker.cpp`:
- Around line 115-123: The current code clears tu_index.main_file_index but
still serializes it, causing master_server::merge_index_result to merge an empty
FileIndex for PCH builds; instead mark the TU as a PCH-only result and have the
master skip main-file merges. Concretely: in stateless_worker.cpp where you call
index::TUIndex::build(unit) set a PCH flag on the TUIndex (e.g.,
tu_index.is_pch_build = true or similar) and avoid serializing/sending
main_file_index for PCH units, and then update master_server::merge_index_result
(and the merge path that always merges main_file_index) to check that
TUIndex::is_pch_build and skip merging main_file_index when true so PCH results
only contribute headers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9a8c4690-6975-4ee7-9520-a3328db767af

📥 Commits

Reviewing files that changed from the base of the PR and between d88fb52 and 14f27b4.

📒 Files selected for processing (2)
  • src/server/master_server.cpp
  • src/server/stateless_worker.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/server/master_server.cpp

Comment thread src/server/stateless_worker.cpp
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
src/server/master_server.cpp (3)

2354-2358: Consider braces for clarity around conditional for loop.

The if(sym_it != project_index.symbols.end()) without braces immediately followed by for is valid but can be misread.

♻️ Clearer formatting
         auto sym_it = project_index.symbols.find(info->hash);
-        if(sym_it != project_index.symbols.end())
-        for(auto file_id: sym_it->second.reference_files) {
+        if(sym_it != project_index.symbols.end()) {
+            for(auto file_id: sym_it->second.reference_files) {

And close with an extra } after the loop body.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/master_server.cpp` around lines 2354 - 2358, The conditional
checking project_index.symbols (sym_it != project_index.symbols.end()) is
written without braces before the following for-loop which harms readability;
add explicit braces around the if block so the for-loop and its body are clearly
scoped (i.e., wrap the for(auto file_id: sym_it->second.reference_files) { ... }
inside an if { ... } block) and ensure a matching closing brace is placed after
the loop body; reference symbols: project_index.symbols, sym_it,
reference_files, and open_proj_path_ids to locate the code to update.

1519-1525: while loop never iterates—simplify to if.

The while loop with an unconditional break at the end is misleading; it executes at most one iteration. Replace with a clearer conditional.

♻️ Suggested simplification
-        auto it = std::ranges::lower_bound(occs, *offset_opt, {}, [](const index::Occurrence& o) {
-            return o.range.end;
-        });
-        while(it != occs.end()) {
-            if(it->range.contains(*offset_opt)) {
-                symbol_hash = it->target;
-                break;
-            }
-            break;
-        }
+        auto it = std::ranges::lower_bound(occs, *offset_opt, {}, [](const index::Occurrence& o) {
+            return o.range.end;
+        });
+        if(it != occs.end() && it->range.contains(*offset_opt)) {
+            symbol_hash = it->target;
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/master_server.cpp` around lines 1519 - 1525, The loop using it
over occs is misleading because it always breaks after one iteration; replace
the while(it != occs.end()) { ... break; } with a simple conditional: check if
it != occs.end() and if it->range.contains(*offset_opt) then assign symbol_hash
= it->target; remove the unnecessary iterator loop and the unreachable code
paths around occs/it/offset_opt/range.contains to make intent clear.

1673-1680: Same while-with-immediate-break pattern—simplify.

Same issue as in query_index_relations. This while loop exits after one check regardless of the condition result.

♻️ Suggested simplification
-        while(it != occs.end()) {
-            if(it->range.contains(*offset_opt)) {
-                symbol_hash = it->target;
-                occ_range = it->range;
-                break;
-            }
-            break;
-        }
+        if(it != occs.end() && it->range.contains(*offset_opt)) {
+            symbol_hash = it->target;
+            occ_range = it->range;
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/master_server.cpp` around lines 1673 - 1680, The loop uses a
while(it != occs.end()) with an unconditional break so it only ever checks the
first element; replace it with a simple conditional or proper loop: check if it
!= occs.end() and it->range.contains(*offset_opt) then assign symbol_hash =
it->target and occ_range = it->range (or iterate with a for/while until a match
is found). Update the block around variables it, occs, offset_opt, symbol_hash
and occ_range (same pattern as in query_index_relations) to remove the redundant
break and use a single if or a correct search loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/server/master_server.cpp`:
- Around line 2354-2358: The conditional checking project_index.symbols (sym_it
!= project_index.symbols.end()) is written without braces before the following
for-loop which harms readability; add explicit braces around the if block so the
for-loop and its body are clearly scoped (i.e., wrap the for(auto file_id:
sym_it->second.reference_files) { ... } inside an if { ... } block) and ensure a
matching closing brace is placed after the loop body; reference symbols:
project_index.symbols, sym_it, reference_files, and open_proj_path_ids to locate
the code to update.
- Around line 1519-1525: The loop using it over occs is misleading because it
always breaks after one iteration; replace the while(it != occs.end()) { ...
break; } with a simple conditional: check if it != occs.end() and if
it->range.contains(*offset_opt) then assign symbol_hash = it->target; remove the
unnecessary iterator loop and the unreachable code paths around
occs/it/offset_opt/range.contains to make intent clear.
- Around line 1673-1680: The loop uses a while(it != occs.end()) with an
unconditional break so it only ever checks the first element; replace it with a
simple conditional or proper loop: check if it != occs.end() and
it->range.contains(*offset_opt) then assign symbol_hash = it->target and
occ_range = it->range (or iterate with a for/while until a match is found).
Update the block around variables it, occs, offset_opt, symbol_hash and
occ_range (same pattern as in query_index_relations) to remove the redundant
break and use a single if or a correct search loop.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d644e415-24c0-415b-a806-eb577f9346c9

📥 Commits

Reviewing files that changed from the base of the PR and between cb8e309 and 0fd401c.

📒 Files selected for processing (1)
  • src/server/master_server.cpp

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
src/server/stateless_worker.cpp (1)

118-123: ⚠️ Potential issue | 🟠 Major

Suppress main-file merge semantics for PCH TUIndex payloads.

BuildPCH still serializes and sends a full TUIndex after clearing main_file_index. The master’s unified TUIndex merge path still processes main-file merge logic for incoming payloads, so this can create/overwrite a shard for the PCH request file instead of remaining strictly header-only.

Suggested direction
- auto tu_index = index::TUIndex::build(unit);
- tu_index.main_file_index = index::FileIndex();
- llvm::raw_string_ostream os(tu_index_serialized);
- tu_index.serialize(os);
+ auto tu_index = index::TUIndex::build(unit);
+ // Mark payload as PCH/header-only and let master skip main_file_index merge for this kind.
+ // (Requires protocol/TUIndex metadata + merge-path branch in master_server.)
+ tu_index.main_file_index = index::FileIndex();
+ llvm::raw_string_ostream os(tu_index_serialized);
+ tu_index.serialize(os);

Also applies to: 151-151

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/server/stateless_worker.cpp` around lines 118 - 123, BuildPCH currently
serializes a TUIndex built via index::TUIndex::build(unit) and clears
tu_index.main_file_index before tu_index.serialize, but the master still applies
main-file merge logic; update BuildPCH to mark these payloads so the master will
skip main-file merging (e.g., add/set a PCH/header-only flag on the TUIndex
before serialization). Specifically, when creating tu_index in BuildPCH, keep
the cleared tu_index.main_file_index and also set a clear
suppress-main-file-merge indicator on the TUIndex (a new or existing flag the
master merge path will check), so the master’s unified TUIndex merge code can
detect and ignore main-file merge semantics for PCH payloads.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/server/stateless_worker.cpp`:
- Around line 118-123: BuildPCH currently serializes a TUIndex built via
index::TUIndex::build(unit) and clears tu_index.main_file_index before
tu_index.serialize, but the master still applies main-file merge logic; update
BuildPCH to mark these payloads so the master will skip main-file merging (e.g.,
add/set a PCH/header-only flag on the TUIndex before serialization).
Specifically, when creating tu_index in BuildPCH, keep the cleared
tu_index.main_file_index and also set a clear suppress-main-file-merge indicator
on the TUIndex (a new or existing flag the master merge path will check), so the
master’s unified TUIndex merge code can detect and ignore main-file merge
semantics for PCH payloads.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a613e544-d20b-498a-a4fe-229e4df58015

📥 Commits

Reviewing files that changed from the base of the PR and between 0fd401c and 74e14c3.

📒 Files selected for processing (2)
  • src/server/master_server.cpp
  • src/server/stateless_worker.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/server/master_server.cpp

@16bit-ykiko 16bit-ykiko changed the title feat(index): piggyback indexing on PCH/PCM/AST compilation feat(index): piggyback indexing on PCH/PCM builds and open-file compiles Apr 6, 2026
16bit-ykiko and others added 6 commits April 6, 2026 19:55
Instead of re-compiling files from scratch in a stateless worker just
for indexing, extract index data during the compilations that already
happen:

- PCH build: index all preamble headers, merge into MergedIndex
- PCM build: index module interface, merge into MergedIndex
- Stateful worker compile: index main file only (interested_only=true),
  keep in-memory as OpenFileIndex — no disk persistence

Query paths (GoToDefinition, FindReferences, CallHierarchy,
TypeHierarchy) now check OpenFileIndex for open files and fall back
to MergedIndex for disk-indexed files. Background indexing skips
open files; on didClose the file is re-queued for proper MergedIndex
merging.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Move TUIndex::build before unit destruction in PCH handler (was a
  use-after-destroy bug)
- Use is_one_of() instead of operator& for RelationKind::Kind enum
  values to avoid ambiguous overload with built-in operator&

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PCM indexing with interested_only=false traversed the entire AST
(including transitive headers), making it too slow. Use
interested_only=true to only index the module interface's own content.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Open files store symbols only in OpenFileIndex, not ProjectIndex.
Query handlers for workspace/symbol, callHierarchy, and typeHierarchy
now also search open_file_indices for symbol info and relations,
so they work correctly when only open files have been indexed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Apply clang-format 21 formatting
- Use FileIndex() instead of FileIndex{} to avoid explicit constructor
  copy-init error on Clang

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract lookup_occurrence() helper with proper innermost-match logic,
  replacing broken while/break/break pattern that never iterated
- Extract find_symbol_info() to deduplicate symbol lookup across open
  file indices and ProjectIndex (6 call sites consolidated)
- Fix resolve_hierarchy_item to check open file indices, not just
  ProjectIndex (symbols may not be in ProjectIndex yet)
- Re-queue non-open dependents in didSave for background re-indexing
  after header changes
- Add detailed comments explaining PCH indexing trade-offs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@16bit-ykiko 16bit-ykiko merged commit ada202e into main Apr 6, 2026
15 checks passed
@16bit-ykiko 16bit-ykiko deleted the feat/improve-index branch April 6, 2026 12:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant