refactor: migrate to modules and support module-based document generation #1
tsurumi-yizhou merged 7 commits into main from
Conversation
📝 Walkthrough

Moves many subsystems from headers to C++20 modules, deletes the corresponding headers, updates the build/toolchain (pixi.toml, CMake) to enable module scanning and pin clang-tools, adapts main/tests to import modules, and adds module-aware extraction, scanning, compilation-database, generation, and LLM-call logic.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Generator as Generator (generate)
    participant FS as Filesystem (temp files)
    participant Shell as Shell / curl
    participant LLM as LLM API
    Note over Generator,FS: call_llm(model, prompt)
    Generator->>FS: create temp files (request, headers, response)
    Generator->>Shell: invoke curl with temp files
    Shell->>LLM: HTTP request (OPENAI_BASE_URL, API key)
    LLM-->>Shell: HTTP response (body / error)
    Shell-->>FS: write response to temp file
    Shell-->>Generator: exit code, stdout/stderr
    Generator->>FS: read response file
    Generator->>Generator: parse JSON -> content or LLMError
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/config/validate.cppm (1)
33-55: ⚠️ Potential issue | 🟠 Major

Use non-throwing `std::error_code` overloads for filesystem checks to prevent exceptions from escaping the `std::expected` flow.

The code at lines 33, 38, 48, 52, 61, and 79 uses throwing `std::filesystem` overloads. Permission-denied or other filesystem errors will throw `std::filesystem_error`, bypassing your `std::expected<void, ValidationError>` error-handling contract. Replace them with the non-throwing overloads that accept an `std::error_code` parameter.

Example fix pattern

```diff
 namespace clore::config {
 auto validate(const TaskConfig& config) -> std::expected<void, ValidationError> {
     namespace fs = std::filesystem;
+    auto fs_fail = [](std::string_view field, const fs::path& p, const std::error_code& ec) {
+        return std::unexpected(ValidationError{
+            .message = std::format("{} filesystem error for '{}': {}", field, p.string(), ec.message())});
+    };
-    if(!fs::exists(config.compile_commands_path)) {
+    std::error_code ec;
+    if(!fs::exists(config.compile_commands_path, ec)) {
+        if(ec) return fs_fail("compile_commands_path", config.compile_commands_path, ec);
         return std::unexpected(ValidationError{
             .message = std::format("compile_commands_path does not exist: {}", config.compile_commands_path)});
     }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/config/validate.cppm` around lines 33-55: the filesystem checks (fs::exists, fs::is_regular_file, and fs::is_directory against config.compile_commands_path and config.project_root) use throwing overloads; replace them with the non-throwing overloads that take an std::error_code& (e.g. fs::exists(path, ec)), check the error code after each call, and on error return std::unexpected(ValidationError{ .message = std::format("...: {} ({})", path, ec.message()) }) so filesystem exceptions cannot escape the std::expected flow. Update all occurrences that validate config.compile_commands_path and config.project_root accordingly.

src/main.cpp (1)
111-114: ⚠️ Potential issue | 🟠 Major

Validate invalid `--log-level` values instead of accepting them silently.

At line 112, `spdlog::level::from_str` maps unknown values to `off`, which silently disables logging and hides user mistakes.

Suggested fix

```diff
 if(opts.log_level.has_value()) {
     auto level = spdlog::level::from_str(*opts.log_level);
+    if(level == spdlog::level::off && *opts.log_level != "off") {
+        clore::logging::err("invalid --log-level: {}", *opts.log_level);
+        return 1;
+    }
     clore::logging::options.level = level;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/main.cpp` around lines 111 - 114, The current code accepts any --log-level because spdlog::level::from_str maps unknown strings to spdlog::level::off; update the opts.log_level handling to validate the user string: call spdlog::level::from_str(*opts.log_level) as before but then check if the returned level equals spdlog::level::off while the provided string (case-insensitive) is not "off"; if so, emit a clear error (e.g., processLogger or std::cerr) indicating an invalid log level and exit with nonzero status, otherwise assign clore::logging::options.level = level. Ensure you reference opts.log_level, spdlog::level::from_str, and clore::logging::options.level when making the change.
🧹 Nitpick comments (5)
src/generate/generate.cppm (3)
414-436: Code duplication with file-based page graph.

The call-graph edge-building logic (lines 414-436) is nearly identical to lines 318-340 in `build_file_page_graph`. Consider extracting it into a shared helper function.

♻️ Suggested extraction

```cpp
auto add_call_graph_edges(PageGraph& graph) -> void {
    std::unordered_map<extract::SymbolID, std::string> sym_to_page;
    for(auto& [page_path, node] : graph.nodes) {
        for(auto& ctx : node.plan.contexts) {
            sym_to_page[ctx.self.id] = page_path;
        }
    }
    for(auto& [page_path, node] : graph.nodes) {
        std::unordered_set<std::string> already;
        for(auto& dep : node.depends_on) already.insert(dep);
        for(auto& ctx : node.plan.contexts) {
            for(auto& callee_id : ctx.self.calls) {
                auto it = sym_to_page.find(callee_id);
                if(it == sym_to_page.end()) continue;
                if(it->second == page_path) continue;
                if(!already.insert(it->second).second) continue;
                node.depends_on.push_back(it->second);
                graph.nodes[it->second].depended_by.push_back(page_path);
                node.plan.linked_pages.push_back(it->second);
            }
        }
    }
}
```

Then call `add_call_graph_edges(graph);` at the end of both `build_file_page_graph` and `build_module_page_graph`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/generate/generate.cppm` around lines 414 - 436, Extract the duplicated call-graph edge construction into a helper (e.g., add_call_graph_edges(PageGraph& graph)) and call it from both build_file_page_graph and build_module_page_graph: move the sym_to_page construction and the second loop that updates node.depends_on, graph.nodes[...].depended_by and node.plan.linked_pages into that function, keep types like extract::SymbolID, PageGraph, node.plan.contexts, ctx.self.calls and the local "already" set unchanged, and replace the original duplicated blocks with a single call add_call_graph_edges(graph).
404-411: Partition-to-main module dependency is incomplete.

The code adds the main module to `linked_pages` but not to `depends_on`. This means the topological sort won't enforce that main modules are generated before their partitions, potentially affecting cross-reference quality in the generated documentation.

♻️ Suggested fix to add dependency edge

```diff
 // Partition imports from main module: if this is a partition, depend on the main module
 if(auto colon_pos = mod_name.find(':'); colon_pos != std::string::npos) {
     auto main_name = mod_name.substr(0, colon_pos);
     auto main_it = mod_to_page.find(main_name);
     if(main_it != mod_to_page.end() && main_it->second != page_a) {
+        node_a_it->second.depends_on.push_back(main_it->second);
+        graph.nodes[main_it->second].depended_by.push_back(page_a);
         node_a_it->second.plan.linked_pages.push_back(main_it->second);
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/generate/generate.cppm` around lines 404 - 411, The partition-to-main dependency only pushes the main module into node_a_it->second.plan.linked_pages but misses adding the same main page to the dependency list; update the block that handles partitions (using mod_name, main_name, mod_to_page, page_a, and node_a_it) so that when main_it->second is added to node_a_it->second.plan.linked_pages you also add main_it->second to node_a_it->second.plan.depends_on (or the equivalent dependency container) to ensure the topological sort orders the main module before its partition.
695-724: Consider providing a summary of failed page generations.

When some pages fail LLM generation, the code logs individual warnings and continues. Users may not easily notice missing documentation pages. Consider logging a summary at the end (e.g. "Generated X/Y pages successfully, Z failed").

♻️ Optional: Add failure summary

```diff
+std::size_t failed_count = 0;
 for(std::size_t i = 0; i < prompts.size(); ++i) {
     auto& prompt = prompts[i];
     logging::info("generating page {}/{}: {}", i + 1, prompts.size(), prompt.relative_path);
     auto llm_result = call_llm(llm_model, prompt.prompt);
     if(!llm_result.has_value()) {
         logging::warn("LLM call failed for {}: {}", prompt.relative_path, llm_result.error().message);
+        ++failed_count;
         continue;
     }
     // ... rest of loop
 }
+if(failed_count > 0) {
+    logging::warn("generated {}/{} pages ({} failed)",
+                  pages.size(), prompts.size(), failed_count);
+}
 if(pages.empty()) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/generate/generate.cppm` around lines 695-724: track failures during the loop that iterates prompts (the for loop that calls call_llm) by incrementing a counter or collecting the failed prompt.relative_path whenever llm_result.has_value() is false; then, after the loop (before the pages.empty() check and return), emit a concise summary log via logging::info or logging::warn stating how many pages succeeded (pages.size()), how many prompts there were (prompts.size()), and how many failed (or listing the failed relative_path values), so the final summary reflects the actual outcomes.

src/extract/compdb.cppm (1)
79-82: Simplify vector population.

The loop can be replaced with a direct assignment, since `CommandLine` is a `std::vector<std::string>`.

♻️ Suggested simplification

```diff
 entry.arguments.reserve(cmd.CommandLine.size());
-for(auto& arg : cmd.CommandLine) {
-    entry.arguments.push_back(arg);
-}
+entry.arguments = cmd.CommandLine;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/extract/compdb.cppm` around lines 79-82: replace the manual reserve + loop that copies cmd.CommandLine into entry.arguments with a direct vector assignment (e.g. entry.arguments = cmd.CommandLine) or an assign call, and remove the now-unnecessary loop, to simplify and make the copy idiomatic.

src/extract/scan.cppm (1)
205-217: Consider adding a debug log when file read fails.

If the file cannot be opened for the fast module scan, the code silently falls through. While the subsequent Clang-based scan will handle the error, a debug log would aid troubleshooting.

♻️ Optional: Add debug logging

```diff
 std::ifstream ifs(file_path);
 if(ifs.is_open()) {
     std::string content((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());
     scan_module_decl(content, result);
+} else {
+    logging::debug("could not open file for module scan: {}", file_path.string());
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/extract/scan.cppm` around lines 205 - 217, When opening files for the fast module scan in the block that constructs file_path from entry.file/entry.directory and before calling scan_module_decl(content, result), add a debug log when std::ifstream ifs(file_path) fails to open (i.e., else of ifs.is_open()). Log the attempted file_path and a brief error detail (e.g., strerror(errno) or filesystem error) so failures are visible during debugging; update the same scope that currently calls scan_module_decl to emit this debug message instead of silently falling through.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/extract/extract.cppm`:
- Around line 509-510: The fallback branch that re-scans files produces a
ScanResult that is never stored back into scan_cache, so
build_module_info(model, scan_cache) never sees those recovered results; update
the fallback re-scan branch to persist the newly produced ScanResult into
scan_cache (e.g., insert/emplace/update scan_cache keyed by the file path)
before continuing so build_module_info will include the recovered module/import
metadata for that file.
In `@src/extract/model.cppm`:
- Around line 74-80: The model currently keys module units by module name
causing later units to overwrite earlier ones; update the data structure and
build_module_info logic so modules are unique per source file or allow multiple
units per module name: modify ProjectModel.modules to key by
ModuleUnit.source_file (or change its value to a collection like
std::vector<ModuleUnit> mapped from the module name) and update
build_module_info and any consumers to use the new keying (look for uses of
ProjectModel.modules and the ModuleUnit struct and adjust creation/population in
build_module_info to insert a distinct ModuleUnit per source_file or push_back
into the module-name collection).
In `@src/support/logging.cppm`:
- Around line 3-8: Add an explicit `#include <utility>` to the existing include
block so that std::forward is provided by the header rather than relying on
transitive includes; locate the include list that currently has <cstdint>,
<format>, <optional>, <string>, <string_view> and insert <utility> there to
cover uses of std::forward in the file.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3db2a0af-4529-477d-b466-6448972d0ed8
⛔ Files ignored due to path filters (1)
`pixi.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (42)
`pixi.toml`, `src/CMakeLists.txt`, `src/config/config.cppm`, `src/config/config.h`, `src/config/load.cppm`, `src/config/load.h`, `src/config/normalize.cppm`, `src/config/normalize.h`, `src/config/schema.cppm`, `src/config/validate.cppm`, `src/config/validate.h`, `src/extract/ast.cppm`, `src/extract/ast.h`, `src/extract/compdb.cpp`, `src/extract/compdb.cppm`, `src/extract/compdb.h`, `src/extract/extract.cppm`, `src/extract/extract.h`, `src/extract/model.cppm`, `src/extract/scan.cppm`, `src/extract/scan.h`, `src/extract/symbol.cpp`, `src/extract/symbol.cppm`, `src/extract/symbol.h`, `src/extract/tooling.cppm`, `src/extract/tooling.h`, `src/generate/generate.cppm`, `src/generate/generate.h`, `src/generate/llm.cppm`, `src/generate/llm.h`, `src/main.cpp`, `src/support/logging.cpp`, `src/support/logging.cppm`, `tests/unit/config/config.cpp`, `tests/unit/config/validate.cpp`, `tests/unit/extract/ast.cpp`, `tests/unit/extract/compdb.cpp`, `tests/unit/extract/extract.cpp`, `tests/unit/extract/scan.cpp`, `tests/unit/extract/symbol.cpp`, `tests/unit/generate/generate.cpp`, `tests/unit/generate/llm.cpp`
💤 Files with no reviewable changes (15)
- src/support/logging.cpp
- src/config/config.h
- src/extract/symbol.cpp
- src/config/normalize.h
- src/config/validate.h
- src/extract/tooling.h
- src/config/load.h
- src/extract/extract.h
- src/extract/compdb.h
- src/extract/symbol.h
- src/generate/llm.h
- src/extract/ast.h
- src/extract/compdb.cpp
- src/extract/scan.h
- src/generate/generate.h
Pull request overview
Refactors the C++ codebase to use C++20 named modules (.cppm) instead of header/source pairs, and extends the documentation generator to support module-based page planning/generation when a project uses modules.
Changes:
- Migrates `config`, `extract`, `generate`, and `support` APIs from headers to exported modules, updating the app and unit tests to `import ...`.
- Adds module detection to extraction and introduces a module-based page graph plus module-oriented prompt framing in generation.
- Updates CMake and Pixi environment dependencies to build with module scanning and ensure required Clang tooling is available.
Reviewed changes
Copilot reviewed 42 out of 43 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/generate/llm.cpp | Switches unit test include to import clore.generate. |
| tests/unit/generate/generate.cpp | Switches unit tests to module imports for config/extract/generate. |
| tests/unit/extract/symbol.cpp | Switches unit test include to import clore.extract. |
| tests/unit/extract/scan.cpp | Switches unit test include to import clore.extract. |
| tests/unit/extract/extract.cpp | Switches integration/unit tests to module imports. |
| tests/unit/extract/compdb.cpp | Switches unit test include to import clore.extract. |
| tests/unit/extract/ast.cpp | Switches unit test include to import clore.extract. |
| tests/unit/config/validate.cpp | Switches unit test include to import clore.config. |
| tests/unit/config/config.cpp | Switches unit test include to import clore.config. |
| src/support/logging.cppm | Converts logging to exported module (clore.support) and removes legacy logging macros. |
| src/support/logging.cpp | Removes legacy TU that only included the old header. |
| src/main.cpp | Replaces header includes with module imports; adjusts logging setup to use module API. |
| src/generate/llm.h | Deletes header in favor of module interface. |
| src/generate/llm.cppm | Turns LLM client into module partition (clore.generate:llm) and refactors internals slightly. |
| src/generate/generate.h | Deletes header in favor of module interface. |
| src/generate/generate.cppm | Exports clore.generate module and adds module-based page graph + module-specific prompt text. |
| src/extract/tooling.h | Deletes header in favor of module partition. |
| src/extract/tooling.cppm | Turns tooling helper into module partition (clore.extract:tooling). |
| src/extract/symbol.h | Deletes header in favor of module partition. |
| src/extract/symbol.cppm | Adds exported module partition (clore.extract:symbol) defining SymbolKind, SymbolID, and std::hash. |
| src/extract/symbol.cpp | Removes legacy implementation TU. |
| src/extract/scan.h | Deletes header in favor of module partition. |
| src/extract/scan.cppm | Exports scan-related types/APIs as a module partition and adds fast module directive scanning. |
| src/extract/model.cppm | Exports model as a module partition and extends ProjectModel with module-unit metadata. |
| src/extract/extract.h | Deletes header in favor of module interface. |
| src/extract/extract.cppm | Exports clore.extract module and populates module info from scan cache into the model. |
| src/extract/compdb.h | Deletes header in favor of module partition. |
| src/extract/compdb.cppm | Adds exported module partition (clore.extract:compdb) and moves argument-sanitization helpers into it. |
| src/extract/compdb.cpp | Removes legacy implementation TU. |
| src/extract/ast.h | Deletes header in favor of module partition. |
| src/extract/ast.cppm | Turns AST extraction into module partition (clore.extract:ast). |
| src/config/validate.h | Deletes header in favor of module partition. |
| src/config/validate.cppm | Turns validate into module partition (clore.config:validate). |
| src/config/schema.cppm | Turns schema into module partition (clore.config:schema). |
| src/config/normalize.h | Deletes header in favor of module partition. |
| src/config/normalize.cppm | Turns normalize into module partition (clore.config:normalize). |
| src/config/load.h | Deletes header in favor of module partition. |
| src/config/load.cppm | Turns load into module partition (clore.config:load). |
| src/config/config.h | Deletes umbrella header. |
| src/config/config.cppm | Adds umbrella module (clore.config) that re-exports schema/load/normalize/validate. |
| src/CMakeLists.txt | Switches library sources to .cppm module file sets, enables module scanning, and locates clang-scan-deps. |
| pixi.toml | Adds clang-tools dependency (needed for module compilation support tooling). |
| pixi.lock | Updates lockfile to reflect the added clang-tools dependency. |
Comments suppressed due to low confidence (10)
- `src/support/logging.cppm:7` — This module uses `std::forward` in the templated logging overloads later in the file, but `<utility>` is not included in the global module fragment. Add `#include <utility>` to avoid relying on transitive includes.
- `src/extract/scan.cppm:191` — `import` directives like `import :tooling;` are currently recorded as just `":tooling"`, so `build_module_page_graph()` won't be able to match them against fully-qualified module names (e.g. `clore.extract:tooling`). Consider normalizing partition imports here (prefixing them with the current module's main name) and de-duplicating `module_imports` to avoid repeated edges later.
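The normalization this comment suggests can be sketched with plain string handling; `qualify_partition_import` is a hypothetical helper name for illustration, not the actual scan.cppm API:

```cpp
#include <cassert>
#include <string>

// Sketch: qualify a partition import like ":tooling" with the importing
// unit's main module name, so "clore.extract" + ":tooling" yields
// "clore.extract:tooling". Non-partition imports pass through unchanged.
std::string qualify_partition_import(const std::string& import_name,
                                     const std::string& current_module) {
    if(import_name.empty() || import_name.front() != ':') return import_name;
    // The importing unit may itself be a partition; keep only its main name.
    return current_module.substr(0, current_module.find(':')) + import_name;
}
```

A partition importing a sibling partition resolves through the shared main module name, which is why the helper strips any `:suffix` from `current_module` first.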
- `src/generate/generate.cppm:409` — The comment says partitions should "depend on the main module", but the code only adds the main module to `linked_pages` (no `depends_on`/`depended_by` edge). If generation order matters here, add the corresponding dependency edge so `sort_page_graph()` can enforce it.
- `src/generate/generate.cppm:401` — `depends_on` edges added from `mod_unit.imports` are not de-duplicated. If a module imports the same dependency more than once, `sort_page_graph()` will over-count in-degrees (it uses `depends_on.size()`), which can produce an incorrect generation order or false cycle detection. Consider guarding with a set (or de-duping `mod_unit.imports`) before pushing.
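A minimal sketch of the suggested guard, using a stand-in `Node` type rather than the PR's real page-graph node:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Sketch: record each dependency edge only once, so an in-degree count based
// on depends_on.size() stays correct even if the same module is imported twice.
struct Node {
    std::vector<std::string> depends_on;
};

void add_dependency(Node& node, std::unordered_set<std::string>& seen,
                    const std::string& dep) {
    if(seen.insert(dep).second)   // insert() returns {it, false} on duplicates
        node.depends_on.push_back(dep);
}
```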
- `src/generate/llm.cppm:188` — The request body is written to a predictable temp path (`.../clore_llm_request.json`). This can collide across concurrent runs and is susceptible to symlink/hardlink attacks on multi-user systems. Use a unique temp file (e.g. `mkstemp`; avoid `std::tmpnam`-style approaches) and consider opening with exclusive-create semantics.
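A sketch of the unique-temp-file approach using POSIX `mkstemp` (assumes a POSIX system; `write_request_body` is a hypothetical helper, not the module's actual function):

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <unistd.h>

// Sketch: mkstemp() picks a unique name and opens it with exclusive-create
// semantics, instead of writing to a fixed clore_llm_request.json path.
std::string write_request_body(const std::string& body) {
    char path[] = "/tmp/clore_llm_request_XXXXXX";  // template; XXXXXX is replaced
    int fd = mkstemp(path);
    if(fd < 0) return {};
    if(write(fd, body.data(), body.size()) < 0) { close(fd); return {}; }
    close(fd);
    return path;  // caller is responsible for unlink()ing when done
}
```

Because `mkstemp` opens with `O_CREAT | O_EXCL`, an attacker cannot pre-plant a symlink at the chosen path, and two concurrent runs get distinct files.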
- `src/generate/llm.cppm:204` — The `curl` command embeds the API key directly into a shell command string and is executed via `popen()`. This risks leaking the key via process listings/logs, and also allows shell injection if the env-provided URL or key contains special characters. Prefer a non-shell execution path (an argv-based process spawn) or a proper HTTP client library, and avoid putting secrets on the command line.
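A sketch of an argv-based spawn that avoids the shell entirely (POSIX `fork`/`execvp`; `run_argv` is a hypothetical helper, not the module's actual API):

```cpp
#include <cassert>
#include <string>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Sketch: run a command with an argv array, so arguments (URL, header values)
// are passed as discrete strings and never interpreted by a shell.
int run_argv(const std::vector<std::string>& args) {
    std::vector<char*> argv;
    argv.reserve(args.size() + 1);
    for(const auto& a : args) argv.push_back(const_cast<char*>(a.c_str()));
    argv.push_back(nullptr);

    pid_t pid = fork();
    if(pid < 0) return -1;
    if(pid == 0) {                  // child: replace image, no shell involved
        execvp(argv[0], argv.data());
        _exit(127);                 // exec failed
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Even with an argv spawn, the key is still visible in the process listing if passed as a `curl` argument; feeding it via a header file (`curl -H @file`) or `--config` keeps it off the command line entirely.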
- `src/generate/generate.cppm:641` — `build_page_graph()` now has a module-based path (`model.uses_modules`) and module-specific prompt framing, but the existing unit tests only cover the file-based flow. Add tests that build a `ProjectModel` with `uses_modules == true` and `modules` populated, and assert that output paths (including partitions) and prompt headers ("## Module:") are correct.
- `src/extract/extract.cppm:300` — Module detection now populates `ProjectModel::uses_modules` and `ProjectModel::modules`, but the current extract tests appear to cover only header/source workflows. Consider adding an integration/unit test that feeds a small module interface unit (`export module ...; import :partition;`) and asserts that `uses_modules == true`, module names, and import lists are captured and normalized as expected.
- `src/extract/scan.cppm:13` — This module uses `std::make_unique`/`std::unique_ptr` (e.g. when wiring PPCallbacks and actions), but `<memory>` is not included. Add `#include <memory>` to avoid relying on transitive includes from Clang headers.
- `src/extract/ast.cppm:9` — This module uses `std::make_unique` (and `std::unique_ptr` indirectly) but does not include `<memory>`. Add `#include <memory>` to make the dependency explicit and avoid relying on transitive includes from Clang headers.
Quoted include block from the reviewed module unit:

```cpp
#include <expected>
#include <filesystem>
#include <format>
#include <string>
#include <string_view>
#include <vector>
```

`std::erase_if` is used below, but `<algorithm>` is not included in this module unit. This will fail to compile on standard libraries that don't include it transitively; add `#include <algorithm>` (or stop using `std::erase_if`).
Quoted include block from the reviewed module unit:

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>
```

This file defines a defaulted three-way comparison (`operator<=>`) for `SymbolID` but does not include `<compare>`, which declares the comparison category types used by the built-in `<=>`. Add `#include <compare>` to avoid non-portable reliance on transitive includes.
Actionable comments posted: 3
🧹 Nitpick comments (5)
src/extract/compdb.cppm (1)
91-105: Consider normalizing both paths consistently for robust matching.

The `lookup` function normalizes both paths with `lexically_normal()`, which is good. However, if `entry.file` is a relative path and `file` is absolute (or vice versa), the comparison may fail unexpectedly. This could be intentional if all entries are expected to be normalized upstream.
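The consistent normalization suggested here can be sketched as follows (assuming C++17 `std::filesystem`; `same_entry` is a hypothetical helper, not the module's actual `lookup`):

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Sketch: bring both sides to an absolute, lexically normal form before
// comparing, so a relative spelling and an absolute spelling of the same
// file match. fs::absolute resolves relative paths against the current
// directory without touching the filesystem; use fs::canonical instead if
// symlinks must also be resolved.
bool same_entry(const fs::path& a, const fs::path& b) {
    return fs::absolute(a).lexically_normal() == fs::absolute(b).lexically_normal();
}
```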
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/extract/compdb.cppm` around lines 91-105: the path comparison in lookup(CompilationDatabase) can fail when one side is relative and the other absolute; normalize both to absolute (or canonical, if you want symlink resolution) paths before comparing: for both the input file and each entry.file, call std::filesystem::absolute(...) (or std::filesystem::canonical(...)) and then .lexically_normal(), then compare the results (references: lookup, CompilationDatabase, CompileEntry, db.entries, entry.file).

src/extract/ast.cppm (1)
117-131: Minor: potential issue with offset calculation for source snippets.

At line 128, `end_offset > buffer.size()` is checked, but `begin_offset` isn't validated against `buffer.size()`. If `begin_offset` somehow exceeds `buffer.size()`, the `substr` call could still be problematic. Consider validating both offsets.

🛡️ Defensive bounds check

```diff
 if(end_offset > buffer.size()) return "";
+if(begin_offset > buffer.size()) return "";
 return std::string(buffer.substr(begin_offset, end_offset - begin_offset));
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/extract/ast.cppm` around lines 117-131: validate and clamp both begin_offset and end_offset before calling substr: check begin_offset against buffer.size() (returning an empty string or clamping to buffer.size()), ensure begin_offset <= end_offset after applying max_bytes, and clamp end_offset to buffer.size(); update the logic around sm.getFileOffset(range.getBegin()), sm.getFileOffset(range.getEnd()), and the subsequent bounds checks so that both offsets lie within [0, buffer.size()] and begin_offset <= end_offset before calling buffer.substr.

src/extract/scan.cppm (1)
239-244: Silent failure when file cannot be opened.
If `ifs.is_open()` returns false (file doesn't exist or can't be read), the function silently proceeds without module declaration info. While the subsequent preprocessor pass may still work, this could lead to missing module metadata. Consider logging a warning.
📝 Add warning for file open failure
```diff
 std::ifstream ifs(file_path);
 if(ifs.is_open()) {
     std::string content((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());
     scan_module_decl(content, result);
+} else {
+    logging::warn("unable to read file for module scan: {}", file_path.string());
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/extract/scan.cppm` around lines 239 - 244, When opening the file with std::ifstream ifs(file_path) before calling scan_module_decl(content, result), add handling for the failure case: if ifs.is_open() is false, emit a warning log mentioning file_path (and optionally strerror(errno) or std::error_code) so missing/unreadable files don’t fail silently; keep the existing behavior otherwise. Update the block around std::ifstream ifs, the if(ifs.is_open()) branch, and the else branch to call the project logger or a warning helper so callers can see which file could not be opened.
src/generate/llm.cppm (2)
171-181: Use `llvm::json` instead of hand-serializing the request body.
This module already depends on `llvm/Support/JSON.h`, but Lines 177-181 still build JSON via `std::format` plus a custom escaper. That duplicates escaping logic and makes request-shape changes easier to break than constructing the payload with `llvm::json::Object`/`Array`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/generate/llm.cppm` around lines 171 - 181, The build_request_json function currently hand-serializes JSON using std::format and escape_json_string; replace that with llvm::json types to avoid manual escaping and make payload mutations safer: construct an llvm::json::Object with keys "model" and "messages" (the latter an llvm::json::Array containing two Objects with "role"/"content"), populate the system_msg string into the system message object and the prompt into the user message object, then stringify via llvm::json::Value(...).toString() (or llvm::json::toJSON) and return that string; update references to escape_json_string and remove the std::format usage in build_request_json.
31-37: Keep `detail` out of the exported module surface.
Because `src/generate/generate.cppm:19-27` re-exports `:llm`, the declarations in Lines 31-37 become public `clore.generate` API. That makes `build_request_json` and `parse_response` much harder to change later even though they read like internal helpers. Prefer keeping only `call_llm` in the interface and moving `detail` to a non-exported namespace.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/generate/llm.cppm` around lines 31 - 37, The declarations build_request_json and parse_response are currently in the exported detail namespace and must be hidden; remove or relocate the detail namespace from the exported module interface and leave only call_llm exposed. Concretely, move the detail::build_request_json and detail::parse_response declarations (and their definitions if present in the same module) into a non-exported/internal translation unit or an internal (unnamed or private) namespace within the implementation file, and update call_llm to call those now-internal helpers; ensure the exported module interface (the :llm export) only declares the call_llm symbol.
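For context on the escaping burden these comments refer to, here is a minimal sketch of what a hand-rolled JSON string escaper has to cover (hypothetical code, not the project's actual `escape_json_string`) — exactly the logic `llvm::json` would otherwise own:

```cpp
#include <cstdio>
#include <string>
#include <string_view>

// Hypothetical stand-in for a hand-rolled escaper: every quote, backslash,
// and control character must be handled, or the request body becomes
// malformed JSON for some prompt contents.
std::string escape_json_string(std::string_view input) {
    std::string out;
    out.reserve(input.size());
    for(unsigned char c: input) {
        switch(c) {
            case '"': out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\b': out += "\\b"; break;
            case '\f': out += "\\f"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:
                if(c < 0x20) {  // remaining control chars need \u00XX form
                    char buf[8];
                    std::snprintf(buf, sizeof(buf), "\\u%04x", c);
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}
```

Any missed case here silently corrupts the payload, which is why the review prefers delegating to a JSON library.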
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/generate-docs.yml:
- Around line 60-61: The "Stop sccache" step (named "Stop sccache") runs only on
success and can be skipped on failures, leaving the sccache server running;
update that step to always run by adding if: always() and make it resilient by
adding continue-on-error: true so cleanup never blocks job results—modify the
"Stop sccache" job step to include those two fields.
In `@cmake/toolchain.cmake`:
- Around line 6-10: The cached scanner path CMAKE_CXX_COMPILER_CLANG_SCAN_DEPS
can persist when find_program("clang-scan-deps") fails, causing broken module
scanning; update the logic around CLANG_SCAN_DEPS_PATH and
CMAKE_CXX_COMPILER_CLANG_SCAN_DEPS so that when find_program does not find
clang-scan-deps you clear the cached variable (e.g., unset the
CMAKE_CXX_COMPILER_CLANG_SCAN_DEPS cache entry or reset it to an empty CACHE
FILEPATH) instead of leaving the stale value, keeping references to the symbols
CLANG_SCAN_DEPS_PATH, find_program, and CMAKE_CXX_COMPILER_CLANG_SCAN_DEPS to
locate the change.
In `@src/generate/llm.cppm`:
- Around line 296-325: The curl invocation built in arg_storage (used to
populate args and passed into llvm::sys::ExecuteAndWait) lacks timeouts and the
--fail-with-body flag, so stalled connections hang and HTTP 4xx/5xx can appear
as success; update arg_storage to include "--fail-with-body" and sensible
timeout flags (for example "--max-time" with a duration and "--connect-timeout"
with a shorter duration) before calling ExecuteAndWait, keeping curl_path, args,
redirects, err_msg and execution_failed usage unchanged so HTTP errors surface
via nonzero exit codes and ExecuteAndWait will time out instead of hanging.
---
Nitpick comments:
In `@src/extract/ast.cppm`:
- Around line 117-131: Validate and clamp both begin_offset and end_offset
before using substr: check begin_offset against buffer.size() (and return empty
string or adjust to buffer.size()), ensure begin_offset <= end_offset after
applying max_bytes, and also clamp end_offset to buffer.size(); update the logic
around sm.getFileOffset(range.getBegin()), sm.getFileOffset(range.getEnd()), and
the subsequent bounds checks so that begin_offset and end_offset are both within
[0, buffer.size()] and begin_offset <= end_offset before calling buffer.substr.
In `@src/extract/compdb.cppm`:
- Around line 91-105: The path comparison in lookup(CompilationDatabase) can
fail when one side is relative and the other absolute; change the normalization
to produce absolute (or canonical if you want symlink resolution) paths before
comparing: for both the input file and each entry.file call
std::filesystem::absolute(...) (or std::filesystem::canonical(...) if resolving
symlinks) and then .lexically_normal(), then compare those resulting paths
(references: lookup, CompilationDatabase, CompileEntry, db.entries, entry.file).
In `@src/extract/scan.cppm`:
- Around line 239-244: When opening the file with std::ifstream ifs(file_path)
before calling scan_module_decl(content, result), add handling for the failure
case: if ifs.is_open() is false, emit a warning log mentioning file_path (and
optionally strerror(errno) or std::error_code) so missing/unreadable files don’t
fail silently; keep the existing behavior otherwise. Update the block around
std::ifstream ifs, the if(ifs.is_open()) branch, and the else branch to call the
project logger or a warning helper so callers can see which file could not be
opened.
In `@src/generate/llm.cppm`:
- Around line 171-181: The build_request_json function currently hand-serializes
JSON using std::format and escape_json_string; replace that with llvm::json
types to avoid manual escaping and make payload mutations safer: construct an
llvm::json::Object with keys "model" and "messages" (the latter an
llvm::json::Array containing two Objects with "role"/"content"), populate the
system_msg string into the system message object and the prompt into the user
message object, then stringify via llvm::json::Value(...).toString() (or
llvm::json::toJSON) and return that string; update references to
escape_json_string and remove the std::format usage in build_request_json.
- Around line 31-37: The declarations build_request_json and parse_response are
currently in the exported detail namespace and must be hidden; remove or
relocate the detail namespace from the exported module interface and leave only
call_llm exposed. Concretely, move the detail::build_request_json and
detail::parse_response declarations (and their definitions if present in the
same module) into a non-exported/internal translation unit or an internal
(unnamed or private) namespace within the implementation file, and update
call_llm to call those now-internal helpers; ensure the exported module
interface (the :llm export) only declares the call_llm symbol.
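The offset-clamping advice in the `ast.cppm` items above can be sketched with plain `std::string_view` (a hypothetical stand-in for the real code, which works through clang's SourceManager):

```cpp
#include <algorithm>
#include <string>
#include <string_view>

// Hypothetical snippet of the clamping described above: both offsets are
// bounded by the buffer size and ordered before substr is called, so no
// combination of inputs can index past the end of the buffer.
std::string safe_snippet(std::string_view buffer, size_t begin_offset, size_t end_offset) {
    begin_offset = std::min(begin_offset, buffer.size());
    end_offset = std::min(end_offset, buffer.size());
    if(begin_offset >= end_offset) return "";
    return std::string(buffer.substr(begin_offset, end_offset - begin_offset));
}
```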
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 59df32d8-0739-4f69-8774-9d71cfc186e7
📒 Files selected for processing (15)
- .github/workflows/generate-docs.yml
- cmake/toolchain.cmake
- src/CMakeLists.txt
- src/extract/ast.cppm
- src/extract/compdb.cppm
- src/extract/extract.cppm
- src/extract/model.cppm
- src/extract/scan.cppm
- src/extract/symbol.cppm
- src/generate/generate.cppm
- src/generate/llm.cppm
- src/support/logging.cppm
- tests/unit/extract/extract.cpp
- tests/unit/extract/scan.cpp
- tests/unit/generate/generate.cpp
✅ Files skipped from review due to trivial changes (1)
- src/extract/symbol.cppm
🚧 Files skipped from review as they are similar to previous changes (6)
- tests/unit/extract/scan.cpp
- src/CMakeLists.txt
- src/support/logging.cppm
- src/extract/model.cppm
- tests/unit/extract/extract.cpp
- src/generate/generate.cppm
```diff
 - name: Stop sccache
-  if: always()
-  run: pixi run -- sccache --stop-server 2>$null
+  run: sccache --stop-server
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify whether Stop sccache runs unconditionally
rg -n -C2 'name:\s+Stop sccache|if:\s+always\(\)|sccache --stop-server' .github/workflows/generate-docs.yml
Repository: clice-io/clore
Length of output: 161
🏁 Script executed:
cat -n .github/workflows/generate-docs.yml
Repository: clice-io/clore
Length of output: 1927
Add conditional execution to ensure sccache is always shut down.
The "Stop sccache" step only runs when prior steps succeed. If the Build or documentation generation fails, this step is skipped, leaving the sccache server running. Add if: always() to ensure cleanup occurs regardless of job status, and continue-on-error: true to prevent the cleanup failure from failing the job.
Suggested change

```diff
 - name: Stop sccache
+  if: always()
+  continue-on-error: true
   run: sccache --stop-server
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/generate-docs.yml around lines 60 - 61, The "Stop sccache"
step (named "Stop sccache") runs only on success and can be skipped on failures,
leaving the sccache server running; update that step to always run by adding if:
always() and make it resilient by adding continue-on-error: true so cleanup
never blocks job results—modify the "Stop sccache" job step to include those two
fields.
```cpp
std::vector<std::string> arg_storage{
    curl_path,
    "--silent",
    "--show-error",
    "--request",
    "POST",
    url,
    "--config",
    curl_config_path.string(),
    "--data-binary",
    body_arg,
};
llvm::SmallVector<llvm::StringRef, 10> args;
args.reserve(arg_storage.size());
for(auto& arg : arg_storage) {
    args.push_back(arg);
}

std::array<std::optional<llvm::StringRef>, 3> redirects{
    std::nullopt,
    llvm::StringRef(response_path_string),
    llvm::StringRef(stderr_path_string),
};

logging::info("calling LLM: {} model={}", url, model);

std::string err_msg;
bool execution_failed = false;
auto exit_code = llvm::sys::ExecuteAndWait(curl_path, args, std::nullopt, redirects,
                                           0, 0, &err_msg, &execution_failed);
```
🧩 Analysis chain
🏁 Script executed:
# First, check if the file exists and read the relevant section
head -n 330 src/generate/llm.cppm | tail -n 50
Repository: clice-io/clore
Length of output: 1700
🏁 Script executed:
# Check LLVM documentation and ExecuteAndWait usage
rg "ExecuteAndWait" --type cpp -A 5 -B 5 | head -100Repository: clice-io/clore
Length of output: 40
🌐 Web query:
LLVM ExecuteAndWait timeout parameter documentation
💡 Result:
The LLVM ExecuteAndWait function with timeout parameter is llvm::sys::ExecuteAndWait, declared in llvm/include/llvm/Support/Program.h. Signature (from LLVM main, as of recent sources):

```cpp
int llvm::sys::ExecuteAndWait(
    llvm::StringRef Program,
    llvm::ArrayRef<llvm::StringRef> Args,
    std::optional<llvm::ArrayRef<llvm::StringRef>> Env = std::nullopt,
    llvm::ArrayRef<std::optional<llvm::StringRef>> Redirects = {},
    unsigned SecondsToWait = 0,
    unsigned MemoryLimit = 0,
    std::string* ErrMsg = nullptr,
    bool* ExecutionFailed = nullptr,
    std::optional<llvm::sys::ProcessStatistics>* ProcStat = nullptr,
    llvm::BitVector* AffinityMask = nullptr);
```

The SecondsToWait parameter specifies the maximum number of seconds to wait for the child process to exit. If non-zero and the time expires, the child process is killed, and the function returns -2 (indicating timeout or crash). If 0 (default), it waits indefinitely until termination. This function launches an external program, waits for it to finish (with optional timeout), and returns the exit code. Use ExecuteNoWait for non-blocking execution. The implementation in llvm/lib/Support/Program.cpp calls Execute to start the process, then Wait(PI, SecondsToWait == 0 ? std::nullopt : std::optional(SecondsToWait), ...). Note: no ExecuteAndWait with timeout exists in the LLVM ORC JIT APIs (e.g., ExecutionSession or ExecutorProcessControl).
Citations:
- 1: https://docs.hdoc.io/hdoc/llvm-project/fDC2FF79EAC7F1E2C.html
- 2: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Support/Program.cpp
- 3: https://llvm.org/doxygen/Program_8cpp_source.html
🌐 Web query:
curl --fail-with-body --connect-timeout --max-time flags
💡 Result:
The curl flags --fail-with-body, --connect-timeout, and --max-time are used for error handling and timeouts.
--fail-with-body: When an HTTP server returns a non-success response (e.g., 4xx), curl normally outputs the error body but exits with code 0. This flag makes curl exit with error code 22 while still outputting the response body. Mutually exclusive with -f/--fail. Added in curl 7.76.0. Example: curl --fail-with-body https://example.com
--connect-timeout <seconds>: Maximum time allowed to establish a connection (DNS, TCP/TLS/QUIC handshake). Defaults to no timeout (system-dependent, often ~5 min). Accepts decimals (e.g., 2.5). Example: curl --connect-timeout 10 https://example.com
--max-time <seconds> / -m: Maximum total time for the entire operation (connect + transfer). Exits with code 28 on timeout. Accepts decimals. Example: curl -m 30 https://example.com
These are often combined for robust scripting: curl --fail-with-body --connect-timeout 10 --max-time 30 https://example.com
Citations:
- 1: https://curl.se/docs/manpage.html
- 2: https://curl.se/docs/manpage.html?pubDate=20250316
- 3: https://everything.curl.dev/usingcurl/timeouts.html
- 4: http://curl.se/docs/manpage.html
🏁 Script executed:
# Check for curl version requirements and CMake/build configuration
rg "curl" --type cmake -A 3 -B 3 | head -80Repository: clice-io/clore
Length of output: 40
🏁 Script executed:
# Check for error handling around ExecuteAndWait return value
rg "exit_code" src/generate/llm.cppm -A 10 -B 2Repository: clice-io/clore
Length of output: 1247
🏁 Script executed:
# Check if there's any response body parsing/error handling
rg "stderr" src/generate/llm.cppm -A 5 -B 2Repository: clice-io/clore
Length of output: 2224
Add timeouts and --fail-with-body flag to curl invocation.
The curl subprocess currently has no timeout, allowing stalled connections to hang indefinitely. Additionally, without --fail-with-body, HTTP 4xx/5xx responses will exit with code 0 (success), causing them to bypass the error check at line 318 and surface as a generic JSON parse error instead of the actual HTTP error.
Suggested fixes

```diff
 std::vector<std::string> arg_storage{
     curl_path,
     "--silent",
     "--show-error",
+    "--fail-with-body",
+    "--connect-timeout",
+    "10",
+    "--max-time",
+    "60",
     "--request",
     "POST",
     url,
     "--config",
     curl_config_path.string(),
     "--data-binary",
     body_arg,
 };
```

```diff
 auto exit_code = llvm::sys::ExecuteAndWait(curl_path, args, std::nullopt, redirects,
-                                           0, 0, &err_msg, &execution_failed);
+                                           70, 0, &err_msg, &execution_failed);
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/generate/llm.cppm` around lines 296 - 325, The curl invocation built in
arg_storage (used to populate args and passed into llvm::sys::ExecuteAndWait)
lacks timeouts and the --fail-with-body flag, so stalled connections hang and
HTTP 4xx/5xx can appear as success; update arg_storage to include
"--fail-with-body" and sensible timeout flags (for example "--max-time" with a
duration and "--connect-timeout" with a shorter duration) before calling
ExecuteAndWait, keeping curl_path, args, redirects, err_msg and execution_failed
usage unchanged so HTTP errors surface via nonzero exit codes and ExecuteAndWait
will time out instead of hanging.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/extract/scan.cppm (1)
279-305: ⚠️ Potential issue | 🟠 Major
Normalize scanned file keys with the compile-command working directory.
This function uses `entry.file` as a lexical path key, but compile databases often store that path relative to `entry.directory`. The result is that `entry_files`, `graph.files`, and `cache` can disagree with the absolute include paths reported by the preprocessor, so dependency edges disappear or unrelated entries collapse together.
Suggested fix
```diff
 for(auto& entry : db.entries) {
     namespace fs = std::filesystem;
-    auto normalized = fs::path(entry.file).lexically_normal().string();
+    auto normalized = normalize_argument_path(entry.file, entry.directory).string();
     entry_files.insert(normalized);
     if(file_set.insert(normalized).second)
         graph.files.push_back(normalized);
 }

 for(auto& entry : db.entries) {
     namespace fs = std::filesystem;
-    auto normalized = fs::path(entry.file).lexically_normal().string();
+    auto normalized = normalize_argument_path(entry.file, entry.directory).string();
     auto scan_result = scan_file(entry);
     if(!scan_result.has_value()) {
         return std::unexpected(std::move(scan_result.error()));
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/extract/scan.cppm` around lines 279 - 305, Normalize each entry and included path relative to the compile-command working directory (entry.directory) before inserting into entry_files, graph.files, and cache so keys match preprocessor-reported absolute/relative includes; specifically, replace usages of fs::path(entry.file).lexically_normal() with fs::path(entry.directory) / entry.file then .lexically_normal() (and for includes, if inc.path is relative, resolve it against entry.directory before normalizing) in the loops that populate entry_files/file_set/graph.files, when calling scan_file(entry) results are inserted into cache.emplace(normalized, ...), and when pushing DependencyEdge use the normalized resolved paths for .from and .to.
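The normalization strategy described above can be sketched with plain `std::filesystem` (the helper name `normalize_entry_path` is illustrative — the suggested fix refers to a project helper called `normalize_argument_path`):

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: resolve a possibly relative compile-command path
// against its working directory, then normalize lexically, so equal files
// always produce equal map keys.
std::string normalize_entry_path(const std::string& file, const std::string& directory) {
    fs::path p(file);
    if(p.is_relative()) {
        p = fs::path(directory) / p;  // resolve against the compile dir
    }
    return p.lexically_normal().generic_string();
}
```

With this shape, a relative `entry.file` and the preprocessor's absolute include path map to the same key.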
♻️ Duplicate comments (1)
src/generate/llm.cppm (1)
296-325: ⚠️ Potential issue | 🟠 Major
Bound the curl subprocess and fail on HTTP status codes.
This is still missing `--fail-with-body`, `--connect-timeout`, and `--max-time`, and Line 324 still passes `0` as `SecondsToWait`. A stalled network call can hang indefinitely, and HTTP 4xx/5xx can still fall through as misleading JSON-parse errors.
Suggested fix
```diff
 std::vector<std::string> arg_storage{
     curl_path,
     "--silent",
     "--show-error",
+    "--fail-with-body",
+    "--connect-timeout",
+    "10",
+    "--max-time",
+    "60",
     "--request",
     "POST",
     url,
     "--config",
     curl_config_path.string(),
     "--data-binary",
     body_arg,
 };
@@
-    auto exit_code = llvm::sys::ExecuteAndWait(curl_path, args, std::nullopt, redirects,
-                                               0, 0, &err_msg, &execution_failed);
+    auto exit_code = llvm::sys::ExecuteAndWait(curl_path, args, std::nullopt, redirects,
+                                               70, 0, &err_msg, &execution_failed);
```

Verify that the current invocation still has no HTTP failure flags and no subprocess timeout:
```bash
#!/bin/bash
sed -n '296,325p' src/generate/llm.cppm
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/generate/llm.cppm` around lines 296 - 325, The curl invocation lacks HTTP-failure flags and a subprocess timeout: update arg_storage (used to build args) to include "--fail-with-body", "--connect-timeout", and "--max-time" with appropriate numeric values (e.g., connect and total timeouts) and ensure these are inserted into the std::vector<std::string> arg_storage before executing; also change the SecondsToWait argument passed to llvm::sys::ExecuteAndWait (currently 0) to a non-zero timeout value (or a named timeout variable) so the subprocess is bounded, and keep references to arg_storage, args, and the call to ExecuteAndWait when making these edits.
🧹 Nitpick comments (2)
src/support/logging.cppm (1)
15-23: Encapsulate exported mutable logging state.
`options` is exported as mutable global state, which makes accidental concurrent writes possible across importers. Consider keeping state private and exposing a small setter/getter API (optionally synchronized) to prevent data-race-prone usage.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/support/logging.cppm` around lines 15 - 23, The exported mutable global options (clore::logging::Options options) must be made non-exported and accessed via a small API: move options into private scope (e.g., anonymous namespace or static inside the translation unit) and provide functions such as clore::logging::setLevel / clore::logging::getLevel or clore::logging::setOptions / clore::logging::getOptions that operate on the hidden Options instance; protect mutations with a std::mutex (or std::atomic for the level field) to avoid data races and ensure thread-safety when callers update or read the logging level.
src/generate/generate.cppm (1)
448-485: Stabilize the ready set if generation order needs to stay reproducible.
`graph.nodes` and `in_degree` are unordered maps, so zero-dependency pages are currently queued in hash order. Independent pages can reshuffle between runs, which makes prompt/page ordering unstable even when the graph itself is unchanged.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/generate/generate.cppm` around lines 448 - 485, In sort_page_graph the ready set is non-deterministic because you use an unordered_map and a FIFO queue q to hold zero-degree nodes; replace q with a deterministic container (e.g., a min-heap or ordered set) so nodes are always popped in a stable order: change q to a priority queue or std::set<string> and push initial zero-degree paths from in_degree into it, then when a dependent hits zero (inside the loop where you decrement in_degree[dependent]) push that dependent into the same ordered container; leave generation_order and the rest of the algorithm unchanged so the topological order becomes reproducible across runs.
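The determinism fix can be sketched with ordered standard containers (a simplified stand-alone version, not the project's `sort_page_graph`):

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch of the fix: ordered containers replace the hash-ordered
// map and FIFO queue, so zero-degree nodes are always emitted in
// lexicographic order regardless of hashing.
// `deps` maps each page to the pages it depends on.
std::vector<std::string> sort_pages(const std::map<std::string, std::vector<std::string>>& deps) {
    std::map<std::string, int> in_degree;
    std::map<std::string, std::vector<std::string>> dependents;
    for(const auto& [page, prereqs]: deps) {
        in_degree.try_emplace(page, 0);
        for(const auto& pre: prereqs) {
            in_degree.try_emplace(pre, 0);
            ++in_degree[page];
            dependents[pre].push_back(page);
        }
    }
    std::set<std::string> ready;  // ordered: always pops the smallest path first
    for(const auto& [page, deg]: in_degree)
        if(deg == 0) ready.insert(page);
    std::vector<std::string> order;
    while(!ready.empty()) {
        auto page = *ready.begin();
        ready.erase(ready.begin());
        order.push_back(page);
        for(const auto& dep: dependents[page])
            if(--in_degree[dep] == 0) ready.insert(dep);
    }
    return order;
}
```

The topological order is unchanged in substance; only ties among ready nodes are now broken deterministically.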
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/extract/compdb.cppm`:
- Around line 91-104: lookup currently compares only entry.file which fails for
relative compile_commands.json entries; change the comparison to resolve
entry.file against entry.directory before normalizing. In function lookup (and
types CompilationDatabase / CompileEntry), build entry_path as
std::filesystem::path(entry.directory) / entry.file and call .lexically_normal()
on that result, keep target as std::filesystem::path(file).lexically_normal(),
then compare entry_path == target and push_back(&entry) on match so relative
filenames are resolved against their working directory.
In `@src/generate/generate.cppm`:
- Around line 417-422: The code is adding a reverse dependency for partition
imports by calling add_page_dependency(graph, page_a, main_it->second) when
mod_name contains ':' (symbols: mod_name, colon_pos, main_name, mod_to_page,
add_page_dependency, page_a), which creates cycles; remove or disable that
add_page_dependency call for the partition->main case so only the
main->partition dependency (already created elsewhere) remains and no reverse
edge is introduced.
---
Outside diff comments:
In `@src/extract/scan.cppm`:
- Around line 279-305: Normalize each entry and included path relative to the
compile-command working directory (entry.directory) before inserting into
entry_files, graph.files, and cache so keys match preprocessor-reported
absolute/relative includes; specifically, replace usages of
fs::path(entry.file).lexically_normal() with fs::path(entry.directory) /
entry.file then .lexically_normal() (and for includes, if inc.path is relative,
resolve it against entry.directory before normalizing) in the loops that
populate entry_files/file_set/graph.files, when calling scan_file(entry) results
are inserted into cache.emplace(normalized, ...), and when pushing
DependencyEdge use the normalized resolved paths for .from and .to.
---
Duplicate comments:
In `@src/generate/llm.cppm`:
- Around line 296-325: The curl invocation lacks HTTP-failure flags and a
subprocess timeout: update arg_storage (used to build args) to include
"--fail-with-body", "--connect-timeout", and "--max-time" with appropriate
numeric values (e.g., connect and total timeouts) and ensure these are inserted
into the std::vector<std::string> arg_storage before executing; also change the
SecondsToWait argument passed to llvm::sys::ExecuteAndWait (currently 0) to a
non-zero timeout value (or a named timeout variable) so the subprocess is
bounded, and keep references to arg_storage, args, and the call to
ExecuteAndWait when making these edits.
---
Nitpick comments:
In `@src/generate/generate.cppm`:
- Around line 448-485: In sort_page_graph the ready set is non-deterministic
because you use an unordered_map and a FIFO queue q to hold zero-degree nodes;
replace q with a deterministic container (e.g., a min-heap or ordered set) so
nodes are always popped in a stable order: change q to a priority queue or
std::set<string> and push initial zero-degree paths from in_degree into it, then
when a dependent hits zero (inside the loop where you decrement
in_degree[dependent]) push that dependent into the same ordered container; leave
generation_order and the rest of the algorithm unchanged so the topological
order becomes reproducible across runs.
In `@src/support/logging.cppm`:
- Around line 15-23: The exported mutable global options
(clore::logging::Options options) must be made non-exported and accessed via a
small API: move options into private scope (e.g., anonymous namespace or static
inside the translation unit) and provide functions such as
clore::logging::setLevel / clore::logging::getLevel or
clore::logging::setOptions / clore::logging::getOptions that operate on the
hidden Options instance; protect mutations with a std::mutex (or std::atomic for
the level field) to avoid data races and ensure thread-safety when callers
update or read the logging level.
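The encapsulation suggested for the logging state can be sketched as follows, assuming a simple `Options` struct guarded by a mutex (names are illustrative, not the project's actual API):

```cpp
#include <mutex>
#include <string>

// Hypothetical shape of the fix: the Options value lives in an internal
// (non-exported) scope and is only reachable through synchronized accessors.
namespace logging {
    struct Options {
        std::string level = "info";
        bool color = true;
    };

    namespace {  // internal: would not be exported from the module
        Options options;
        std::mutex options_mutex;
    }

    void set_options(Options opts) {
        std::lock_guard<std::mutex> lock(options_mutex);
        options = std::move(opts);
    }

    Options get_options() {
        std::lock_guard<std::mutex> lock(options_mutex);
        return options;
    }
}
```

Returning `Options` by value keeps readers from holding references into state another thread may overwrite.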
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 729c7da6-6bb7-4e80-a23e-347ae5ac3055
📒 Files selected for processing (27)
- .github/workflows/generate-docs.yml
- cmake/toolchain.cmake
- src/config/config.cppm
- src/config/load.cppm
- src/config/normalize.cppm
- src/config/schema.cppm
- src/config/validate.cppm
- src/extract/ast.cppm
- src/extract/compdb.cppm
- src/extract/extract.cppm
- src/extract/model.cppm
- src/extract/scan.cppm
- src/extract/symbol.cppm
- src/extract/tooling.cppm
- src/generate/generate.cppm
- src/generate/llm.cppm
- src/main.cpp
- src/support/logging.cppm
- tests/unit/config/config.cpp
- tests/unit/config/validate.cpp
- tests/unit/extract/ast.cpp
- tests/unit/extract/compdb.cpp
- tests/unit/extract/extract.cpp
- tests/unit/extract/scan.cpp
- tests/unit/extract/symbol.cpp
- tests/unit/generate/generate.cpp
- tests/unit/generate/llm.cpp
✅ Files skipped from review due to trivial changes (7)
- tests/unit/config/config.cpp
- src/main.cpp
- tests/unit/extract/ast.cpp
- tests/unit/extract/compdb.cpp
- tests/unit/config/validate.cpp
- tests/unit/generate/llm.cpp
- tests/unit/extract/symbol.cpp
🚧 Files skipped from review as they are similar to previous changes (12)
- .github/workflows/generate-docs.yml
- src/config/config.cppm
- src/config/schema.cppm
- cmake/toolchain.cmake
- src/extract/tooling.cppm
- tests/unit/extract/extract.cpp
- src/config/normalize.cppm
- src/extract/model.cppm
- src/config/validate.cppm
- src/extract/extract.cppm
- tests/unit/generate/generate.cpp
- src/extract/ast.cppm
Summary by CodeRabbit
Build & Infrastructure
New Features
Improvements