
refactor: migrate to modules and support module-based document generation #1

Merged
tsurumi-yizhou merged 7 commits into main from modules
Apr 11, 2026

Conversation

@tsurumi-yizhou (Member)

@tsurumi-yizhou tsurumi-yizhou commented Apr 11, 2026

Summary by CodeRabbit

  • Build & Infrastructure

    • Pinned clang-tools and enabled parallel builds; improved CMake support for module-aware compilation and dependency scanning.
  • New Features

    • Migrated core subsystems to C++20 module interfaces for module-aware extraction, scanning, and page generation.
    • Added integrated LLM support for prompt-based content generation.
  • Improvements

    • Tests and logging updated to use modules; deterministic generation ordering and clearer module-aware outputs.

Copilot AI review requested due to automatic review settings April 11, 2026 04:58

coderabbitai bot commented Apr 11, 2026

Warning

Rate limit exceeded

@tsurumi-yizhou has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 9 minutes and 8 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 9 minutes and 8 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d9a39ff9-2238-4725-bda2-e3ff7a2b4d51

📥 Commits

Reviewing files that changed from the base of the PR and between 18aed86 and b6c6e63.

📒 Files selected for processing (4)
  • src/extract/compdb.cppm
  • src/generate/generate.cppm
  • tests/unit/extract/compdb.cpp
  • tests/unit/generate/generate.cpp
📝 Walkthrough

Walkthrough

Moves many subsystems from headers to C++20 modules, deletes corresponding headers, updates build/toolchain (pixi.toml, CMake) to enable module scanning and pin clang-tools, adapts main/tests to import modules, and adds module-aware extraction, scanning, compilation-database, generation, and LLM call logic.

Changes

Cohort / File(s) Summary
Build & toolchain
pixi.toml, src/CMakeLists.txt, cmake/toolchain.cmake, CMakeLists.txt
Pinned clang-tools = "==20.1.8"; changed build command to cmake --build ... --parallel; detect/configure clang-scan-deps in toolchain cache; enabled CXX_SCAN_FOR_MODULES and adjusted module-aware target/source handling and LTO/IPO gating.
Config modules
src/config/config.cppm, src/config/load.cppm, src/config/normalize.cppm, src/config/schema.cppm, src/config/validate.cppm
Converted config subsystem to exported C++20 modules; added exported error/result types and API declarations (load/normalize/validate); introduced umbrella export module config; re-exporting submodules.
Config headers removed
src/config/config.h, src/config/load.h, src/config/normalize.h, src/config/schema.h, src/config/validate.h
Deleted legacy headers that previously exposed config APIs (now provided by modules).
Extract modules
src/extract/.../*.cppm (ast.cppm, compdb.cppm, extract.cppm, model.cppm, scan.cppm, symbol.cppm, tooling.cppm)
Added exported extract module and named submodules: AST, compdb, scan, model (module units), symbol, tooling; exported data types and APIs for loading/lookup, scanning, dependency graph, extraction entrypoints, and compiler-instance creation.
Extract headers & legacy units removed
src/extract/*.h, src/extract/compdb.cpp, src/extract/symbol.cpp
Removed legacy headers and two translation units; their functionality moved into .cppm module interfaces/implementations.
Generate modules
src/generate/generate.cppm, src/generate/llm.cppm
Converted generation pipeline and LLM integration to exported modules; added module-aware page-graph planning, prompt/page types, LLM request/response builders, and exported generation APIs (build_page_graph, build_prompts, generate_pages, write_*).
Generate headers removed
src/generate/generate.h, src/generate/llm.h
Deleted headers that previously declared generate/LLM APIs (now provided by modules).
Support / logging
src/support/logging.cppm, src/support/logging.cpp
Converted logging into export module support; exported clore::logging namespace and removed macro-based logging API; adjusted logging translation unit include.
Main & tests
src/main.cpp, tests/unit/...
Replaced header includes with import statements in main and tests; updated/added unit tests exercising module scanning, dependency graph/toposort, module-unit detection, page-graph and prompt generation, and LLM helpers.
CI workflow
.github/workflows/generate-docs.yml
Switched Windows runner version, added cached configure-artifacts restore/save, starts sccache server before configure, and changed build invocation to pixi run cmake-build.
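
The config cohort above converts the subsystem to module partitions re-exported by an umbrella clore.config module. As an illustrative C++20 fragment (partition names taken from the summary; the project's actual config.cppm may differ):

```cpp
// config.cppm — illustrative umbrella interface unit.
// Each partition (schema, load, normalize, validate) lives in its own
// .cppm file; the primary interface re-exports them so consumers can
// simply write `import clore.config;`.
export module clore.config;

export import :schema;
export import :load;
export import :normalize;
export import :validate;
```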

Sequence Diagram(s)

sequenceDiagram
  participant Generator as Generator (generate)
  participant FS as Filesystem (temp files)
  participant Shell as Shell / curl
  participant LLM as LLM API

  Note over Generator,FS: call_llm(model,prompt)
  Generator->>FS: create temp files (request, headers, response)
  Generator->>Shell: invoke curl with temp files
  Shell->>LLM: HTTP request (OPENAI_BASE_URL, API key)
  LLM-->>Shell: HTTP response (body / error)
  Shell-->>FS: write response to temp file
  Shell-->>Generator: exit code, stdout/stderr
  Generator->>FS: read response file
  Generator->>Generator: parse JSON -> content or LLMError

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I nibbled old headers, turned them into spree,
Modules now blooming, tidy as can be.
I scanned, I parsed, I called the LLM drum,
The rabbit cheers softly: "Parallel builds — here they come!" 🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 7.69%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title accurately summarizes the main objectives of the pull request: migrating the codebase to C++20 modules and adding support for module-based document generation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/config/validate.cppm (1)

33-55: ⚠️ Potential issue | 🟠 Major

Use non-throwing std::error_code overloads for filesystem checks to prevent exceptions from escaping std::expected flow.

The code at lines 33, 38, 48, 52, 61, and 79 uses throwing std::filesystem overloads. Permission-denied or filesystem errors will throw std::filesystem_error, bypassing your std::expected<void, ValidationError> error handling contract. Replace with non-throwing overloads that accept an std::error_code parameter.

Example fix pattern
 namespace clore::config {

 auto validate(const TaskConfig& config) -> std::expected<void, ValidationError> {
     namespace fs = std::filesystem;
+    auto fs_fail = [](std::string_view field, const fs::path& p, const std::error_code& ec) {
+        return std::unexpected(ValidationError{
+            .message = std::format("{} filesystem error for '{}': {}", field, p.string(), ec.message())});
+    };

-    if(!fs::exists(config.compile_commands_path)) {
+    std::error_code ec;
+    if(!fs::exists(config.compile_commands_path, ec)) {
+        if(ec) return fs_fail("compile_commands_path", config.compile_commands_path, ec);
         return std::unexpected(ValidationError{
             .message = std::format("compile_commands_path does not exist: {}",
                                    config.compile_commands_path)});
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/validate.cppm` around lines 33 - 55, The filesystem checks in
validate.cppm (the checks using fs::exists, fs::is_regular_file, and
fs::is_directory against config.compile_commands_path and config.project_root)
use throwing overloads; replace them with the non-throwing overloads that take
an std::error_code& (e.g., fs::exists(path, ec), fs::is_regular_file(path, ec),
fs::is_directory(path, ec)), check the error_code after each call, and on error
return std::unexpected(ValidationError{ .message = std::format("...: {} ({})",
path, ec.message()) }) so filesystem exceptions cannot escape the std::expected
flow. Ensure you update all occurrences that validate
config.compile_commands_path and config.project_root accordingly.
src/main.cpp (1)

111-114: ⚠️ Potential issue | 🟠 Major

Validate invalid --log-level values instead of accepting them silently.

At line 112, spdlog::level::from_str maps unknown values to off, which silently disables logging and hides user mistakes.

Suggested fix
 if(opts.log_level.has_value()) {
     auto level = spdlog::level::from_str(*opts.log_level);
+    if(level == spdlog::level::off && *opts.log_level != "off") {
+        clore::logging::err("invalid --log-level: {}", *opts.log_level);
+        return 1;
+    }
     clore::logging::options.level = level;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/main.cpp` around lines 111 - 114, The current code accepts any
--log-level because spdlog::level::from_str maps unknown strings to
spdlog::level::off; update the opts.log_level handling to validate the user
string: call spdlog::level::from_str(*opts.log_level) as before but then check
if the returned level equals spdlog::level::off while the provided string
(case-insensitive) is not "off"; if so, emit a clear error (e.g., processLogger
or std::cerr) indicating an invalid log level and exit with nonzero status,
otherwise assign clore::logging::options.level = level. Ensure you reference
opts.log_level, spdlog::level::from_str, and clore::logging::options.level when
making the change.
🧹 Nitpick comments (5)
src/generate/generate.cppm (3)

414-436: Code duplication with file-based page graph.

The call graph edge building logic (lines 414-436) is nearly identical to lines 318-340 in build_file_page_graph. Consider extracting this to a shared helper function.

♻️ Suggested extraction
auto add_call_graph_edges(PageGraph& graph) -> void {
    std::unordered_map<extract::SymbolID, std::string> sym_to_page;
    for(auto& [page_path, node] : graph.nodes) {
        for(auto& ctx : node.plan.contexts) {
            sym_to_page[ctx.self.id] = page_path;
        }
    }
    for(auto& [page_path, node] : graph.nodes) {
        std::unordered_set<std::string> already;
        for(auto& dep : node.depends_on) already.insert(dep);

        for(auto& ctx : node.plan.contexts) {
            for(auto& callee_id : ctx.self.calls) {
                auto it = sym_to_page.find(callee_id);
                if(it == sym_to_page.end()) continue;
                if(it->second == page_path) continue;
                if(!already.insert(it->second).second) continue;
                node.depends_on.push_back(it->second);
                graph.nodes[it->second].depended_by.push_back(page_path);
                node.plan.linked_pages.push_back(it->second);
            }
        }
    }
}

Then call add_call_graph_edges(graph); at the end of both build_file_page_graph and build_module_page_graph.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/generate/generate.cppm` around lines 414 - 436, Extract the duplicated
call-graph edge construction into a helper (e.g.,
add_call_graph_edges(PageGraph& graph)) and call it from both
build_file_page_graph and build_module_page_graph: move the sym_to_page
construction and the second loop that updates node.depends_on,
graph.nodes[...].depended_by and node.plan.linked_pages into that function, keep
types like extract::SymbolID, PageGraph, node.plan.contexts, ctx.self.calls and
the local "already" set unchanged, and replace the original duplicated blocks
with a single call add_call_graph_edges(graph).

404-411: Partition-to-main module dependency is incomplete.

The code adds the main module to linked_pages but not to depends_on. This means the topological sort won't enforce that main modules are generated before their partitions, potentially affecting cross-reference quality in generated documentation.

♻️ Suggested fix to add dependency edge
         // Partition imports from main module: if this is a partition, depend on the main module
         if(auto colon_pos = mod_name.find(':'); colon_pos != std::string::npos) {
             auto main_name = mod_name.substr(0, colon_pos);
             auto main_it = mod_to_page.find(main_name);
             if(main_it != mod_to_page.end() && main_it->second != page_a) {
+                node_a_it->second.depends_on.push_back(main_it->second);
+                graph.nodes[main_it->second].depended_by.push_back(page_a);
                 node_a_it->second.plan.linked_pages.push_back(main_it->second);
             }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/generate/generate.cppm` around lines 404 - 411, The partition-to-main
dependency only pushes the main module into node_a_it->second.plan.linked_pages
but misses adding the same main page to the dependency list; update the block
that handles partitions (using mod_name, main_name, mod_to_page, page_a, and
node_a_it) so that when main_it->second is added to
node_a_it->second.plan.linked_pages you also add main_it->second to
node_a_it->second.plan.depends_on (or the equivalent dependency container) to
ensure the topological sort orders the main module before its partition.

695-724: Consider providing a summary of failed page generations.

When some pages fail LLM generation, the code logs individual warnings and continues. Users may not easily notice missing documentation pages. Consider logging a summary at the end (e.g., "Generated X/Y pages successfully, Z failed").

♻️ Optional: Add failure summary
+    std::size_t failed_count = 0;
     for(std::size_t i = 0; i < prompts.size(); ++i) {
         auto& prompt = prompts[i];
         logging::info("generating page {}/{}: {}",
                       i + 1, prompts.size(), prompt.relative_path);

         auto llm_result = call_llm(llm_model, prompt.prompt);
         if(!llm_result.has_value()) {
             logging::warn("LLM call failed for {}: {}", prompt.relative_path,
                           llm_result.error().message);
+            ++failed_count;
             continue;
         }
         // ... rest of loop
     }

+    if(failed_count > 0) {
+        logging::warn("generated {}/{} pages ({} failed)",
+                      pages.size(), prompts.size(), failed_count);
+    }
+
     if(pages.empty()) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/generate/generate.cppm` around lines 695 - 724, Track failures during the
loop that iterates prompts (the for loop that calls call_llm) by incrementing a
counter or collecting failed prompt.relative_path whenever
llm_result.has_value() is false, then after the loop (before the pages.empty()
check and return) emit a concise summary log using logging::info or
logging::warn that states how many pages succeeded (pages.size()), how many
total prompts there were (prompts.size()), and how many failed (or list failed
relative_path values); update references in the generate routine that creates
GeneratedPage and pushes to pages to use this failure counter/list so the final
summary reflects the actual outcomes.
src/extract/compdb.cppm (1)

79-82: Simplify vector population.

The loop can be replaced with direct assignment since CommandLine is a std::vector<std::string>.

♻️ Suggested simplification
         entry.arguments.reserve(cmd.CommandLine.size());
-        for(auto& arg : cmd.CommandLine) {
-            entry.arguments.push_back(arg);
-        }
+        entry.arguments = cmd.CommandLine;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/extract/compdb.cppm` around lines 79 - 82, Replace the manual reserve +
loop that copies cmd.CommandLine into entry.arguments with a direct vector
assignment (e.g., entry.arguments = cmd.CommandLine) or an assign call; remove
the now-unnecessary reserve and loop. This change targets the code that
manipulates entry.arguments and cmd.CommandLine in compdb.cppm to simplify and
make the copy idiomatic.
src/extract/scan.cppm (1)

205-217: Consider adding a debug log when file read fails.

If the file cannot be opened for the fast module scan, the code silently falls through. While the subsequent Clang-based scan will handle the error, a debug log would aid troubleshooting.

♻️ Optional: Add debug logging
         std::ifstream ifs(file_path);
         if(ifs.is_open()) {
             std::string content((std::istreambuf_iterator<char>(ifs)),
                                  std::istreambuf_iterator<char>());
             scan_module_decl(content, result);
+        } else {
+            logging::debug("could not open file for module scan: {}", file_path.string());
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/extract/scan.cppm` around lines 205 - 217, When opening files for the
fast module scan in the block that constructs file_path from
entry.file/entry.directory and before calling scan_module_decl(content, result),
add a debug log when std::ifstream ifs(file_path) fails to open (i.e., else of
ifs.is_open()). Log the attempted file_path and a brief error detail (e.g.,
strerror(errno) or filesystem error) so failures are visible during debugging;
update the same scope that currently calls scan_module_decl to emit this debug
message instead of silently falling through.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/extract/extract.cppm`:
- Around line 509-510: The fallback branch that re-scans files produces a
ScanResult that is never stored back into scan_cache, so
build_module_info(model, scan_cache) never sees those recovered results; update
the fallback re-scan branch to persist the newly produced ScanResult into
scan_cache (e.g., insert/emplace/update scan_cache keyed by the file path)
before continuing so build_module_info will include the recovered module/import
metadata for that file.

In `@src/extract/model.cppm`:
- Around line 74-80: The model currently keys module units by module name
causing later units to overwrite earlier ones; update the data structure and
build_module_info logic so modules are unique per source file or allow multiple
units per module name: modify ProjectModel.modules to key by
ModuleUnit.source_file (or change its value to a collection like
std::vector<ModuleUnit> mapped from the module name) and update
build_module_info and any consumers to use the new keying (look for uses of
ProjectModel.modules and the ModuleUnit struct and adjust creation/population in
build_module_info to insert a distinct ModuleUnit per source_file or push_back
into the module-name collection).

In `@src/support/logging.cppm`:
- Around line 3-8: Add an explicit `#include` <utility> to the existing include
block so that std::forward is provided by the header rather than relying on
transitive includes; locate the include list that currently has <cstdint>,
<format>, <optional>, <string>, <string_view> and insert <utility> there to
cover uses of std::forward in the file.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3db2a0af-4529-477d-b466-6448972d0ed8

📥 Commits

Reviewing files that changed from the base of the PR and between 85a223a and caec233.

⛔ Files ignored due to path filters (1)
  • pixi.lock is excluded by !**/*.lock
📒 Files selected for processing (42)
  • pixi.toml
  • src/CMakeLists.txt
  • src/config/config.cppm
  • src/config/config.h
  • src/config/load.cppm
  • src/config/load.h
  • src/config/normalize.cppm
  • src/config/normalize.h
  • src/config/schema.cppm
  • src/config/validate.cppm
  • src/config/validate.h
  • src/extract/ast.cppm
  • src/extract/ast.h
  • src/extract/compdb.cpp
  • src/extract/compdb.cppm
  • src/extract/compdb.h
  • src/extract/extract.cppm
  • src/extract/extract.h
  • src/extract/model.cppm
  • src/extract/scan.cppm
  • src/extract/scan.h
  • src/extract/symbol.cpp
  • src/extract/symbol.cppm
  • src/extract/symbol.h
  • src/extract/tooling.cppm
  • src/extract/tooling.h
  • src/generate/generate.cppm
  • src/generate/generate.h
  • src/generate/llm.cppm
  • src/generate/llm.h
  • src/main.cpp
  • src/support/logging.cpp
  • src/support/logging.cppm
  • tests/unit/config/config.cpp
  • tests/unit/config/validate.cpp
  • tests/unit/extract/ast.cpp
  • tests/unit/extract/compdb.cpp
  • tests/unit/extract/extract.cpp
  • tests/unit/extract/scan.cpp
  • tests/unit/extract/symbol.cpp
  • tests/unit/generate/generate.cpp
  • tests/unit/generate/llm.cpp
💤 Files with no reviewable changes (15)
  • src/support/logging.cpp
  • src/config/config.h
  • src/extract/symbol.cpp
  • src/config/normalize.h
  • src/config/validate.h
  • src/extract/tooling.h
  • src/config/load.h
  • src/extract/extract.h
  • src/extract/compdb.h
  • src/extract/symbol.h
  • src/generate/llm.h
  • src/extract/ast.h
  • src/extract/compdb.cpp
  • src/extract/scan.h
  • src/generate/generate.h

Comment thread src/extract/extract.cppm
Comment thread src/extract/model.cppm
Comment thread src/support/logging.cppm

Copilot AI left a comment


Pull request overview

Refactors the C++ codebase to use C++20 named modules (.cppm) instead of header/source pairs, and extends the documentation generator to support module-based page planning/generation when a project uses modules.

Changes:

  • Migrates config, extract, generate, and support APIs from headers to exported modules, updating app and unit tests to import ....
  • Adds module detection to extraction and introduces a module-based page graph + module-oriented prompt framing in generation.
  • Updates CMake and Pixi environment dependencies to build with module scanning and ensure required Clang tooling is available.

Reviewed changes

Copilot reviewed 42 out of 43 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/unit/generate/llm.cpp Switches unit test include to import clore.generate.
tests/unit/generate/generate.cpp Switches unit tests to module imports for config/extract/generate.
tests/unit/extract/symbol.cpp Switches unit test include to import clore.extract.
tests/unit/extract/scan.cpp Switches unit test include to import clore.extract.
tests/unit/extract/extract.cpp Switches integration/unit tests to module imports.
tests/unit/extract/compdb.cpp Switches unit test include to import clore.extract.
tests/unit/extract/ast.cpp Switches unit test include to import clore.extract.
tests/unit/config/validate.cpp Switches unit test include to import clore.config.
tests/unit/config/config.cpp Switches unit test include to import clore.config.
src/support/logging.cppm Converts logging to exported module (clore.support) and removes legacy logging macros.
src/support/logging.cpp Removes legacy TU that only included the old header.
src/main.cpp Replaces header includes with module imports; adjusts logging setup to use module API.
src/generate/llm.h Deletes header in favor of module interface.
src/generate/llm.cppm Turns LLM client into module partition (clore.generate:llm) and refactors internals slightly.
src/generate/generate.h Deletes header in favor of module interface.
src/generate/generate.cppm Exports clore.generate module and adds module-based page graph + module-specific prompt text.
src/extract/tooling.h Deletes header in favor of module partition.
src/extract/tooling.cppm Turns tooling helper into module partition (clore.extract:tooling).
src/extract/symbol.h Deletes header in favor of module partition.
src/extract/symbol.cppm Adds exported module partition (clore.extract:symbol) defining SymbolKind, SymbolID, and std::hash.
src/extract/symbol.cpp Removes legacy implementation TU.
src/extract/scan.h Deletes header in favor of module partition.
src/extract/scan.cppm Exports scan-related types/APIs as a module partition and adds fast module directive scanning.
src/extract/model.cppm Exports model as a module partition and extends ProjectModel with module-unit metadata.
src/extract/extract.h Deletes header in favor of module interface.
src/extract/extract.cppm Exports clore.extract module and populates module info from scan cache into the model.
src/extract/compdb.h Deletes header in favor of module partition.
src/extract/compdb.cppm Adds exported module partition (clore.extract:compdb) and moves argument-sanitization helpers into it.
src/extract/compdb.cpp Removes legacy implementation TU.
src/extract/ast.h Deletes header in favor of module partition.
src/extract/ast.cppm Turns AST extraction into module partition (clore.extract:ast).
src/config/validate.h Deletes header in favor of module partition.
src/config/validate.cppm Turns validate into module partition (clore.config:validate).
src/config/schema.cppm Turns schema into module partition (clore.config:schema).
src/config/normalize.h Deletes header in favor of module partition.
src/config/normalize.cppm Turns normalize into module partition (clore.config:normalize).
src/config/load.h Deletes header in favor of module partition.
src/config/load.cppm Turns load into module partition (clore.config:load).
src/config/config.h Deletes umbrella header.
src/config/config.cppm Adds umbrella module (clore.config) that re-exports schema/load/normalize/validate.
src/CMakeLists.txt Switches library sources to .cppm module file sets, enables module scanning, and locates clang-scan-deps.
pixi.toml Adds clang-tools dependency (needed for module compilation support tooling).
pixi.lock Updates lockfile to reflect the added clang-tools dependency.
Comments suppressed due to low confidence (10)

src/support/logging.cppm:7
  • This module uses std::forward in the templated logging overloads later in the file, but <utility> is not included in the global module fragment. Add #include <utility> to avoid relying on transitive includes.

src/extract/scan.cppm:191
  • import directives like import :tooling; are currently recorded as just ":tooling", so build_module_page_graph() won't be able to match them against fully-qualified module names (e.g. clore.extract:tooling). Consider normalizing partition imports here (prefix with the current module's main name) and de-duplicating module_imports to avoid repeated edges later.

src/generate/generate.cppm:409
  • The comment says partitions should "depend on the main module", but the code only adds the main module to linked_pages (no depends_on / depended_by edge). If generation order matters here, add the corresponding dependency edge so sort_page_graph() can enforce it.

src/generate/generate.cppm:401
  • depends_on edges added from mod_unit.imports are not de-duplicated. If a module imports the same dependency more than once, sort_page_graph() will over-count in-degrees (it uses depends_on.size()), which can produce an incorrect generation order or false cycle detection. Consider guarding with a set (or de-duping mod_unit.imports) before pushing.

src/generate/llm.cppm:188
  • The request body is written to a predictable temp path (.../clore_llm_request.json). This can collide across concurrent runs and is susceptible to symlink/hardlink attacks on multi-user systems. Use a unique temp file (e.g., mkstemp or a random filename; avoid std::tmpnam) and consider opening with exclusive-create semantics.

src/generate/llm.cppm:204
  • The curl command embeds the API key directly into a shell command string and is executed via popen(). This risks leaking the key via process listings / logs and also allows shell injection if the env-provided URL/key contain special characters. Prefer a non-shell execution path (argv-based process spawn) or a proper HTTP client library, and avoid putting secrets on the command line.

src/generate/generate.cppm:641
  • build_page_graph() now has a module-based path (model.uses_modules) and module-specific prompt framing, but the existing unit tests only cover the file-based flow. Add tests that build a ProjectModel with uses_modules=true and modules populated, then assert that output paths (including partitions) and prompt headers ("## Module:") are correct.

src/extract/extract.cppm:300
  • Module detection now populates ProjectModel::uses_modules and ProjectModel::modules, but the current extract tests appear to only cover header/source workflows. Consider adding an integration/unit test that feeds a small module interface unit (export module ...; import :partition;) and asserts that uses_modules==true and that module names and import lists are captured/normalized as expected.

src/extract/scan.cppm:13
  • This module uses std::make_unique/std::unique_ptr (e.g., when wiring PPCallbacks and actions), but <memory> is not included. Add #include <memory> to avoid relying on transitive includes from Clang headers.

src/extract/ast.cppm:9
  • This module uses std::make_unique (and std::unique_ptr indirectly) but does not include <memory>. Add #include <memory> to make the dependency explicit and avoid relying on transitive includes from Clang headers.
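
The partition-import normalization suggested for scan.cppm:191 can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper names and the flat string representation of imports are assumptions.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_set>
#include <vector>

// Hypothetical helper: qualify a partition import (":tooling") with the
// owning module's main name, so ":tooling" scanned inside clore.extract
// becomes "clore.extract:tooling".
std::string qualify_import(std::string_view import_name, std::string_view current_module) {
    if (!import_name.empty() && import_name.front() == ':') {
        return std::string(current_module) + std::string(import_name);
    }
    return std::string(import_name);
}

// De-duplicate while qualifying, so repeated imports do not later produce
// repeated graph edges (and over-counted in-degrees in sort_page_graph()).
std::vector<std::string> normalize_imports(const std::vector<std::string>& raw,
                                           std::string_view current_module) {
    std::vector<std::string> out;
    std::unordered_set<std::string> seen;
    for (const auto& imp : raw) {
        auto q = qualify_import(imp, current_module);
        if (seen.insert(q).second) {  // keep only the first occurrence
            out.push_back(std::move(q));
        }
    }
    return out;
}
```

Running this over a scanned import list would both resolve partition names and drop duplicates in a single pass.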


Comment thread src/extract/compdb.cppm
Comment on lines +3 to +8
#include <expected>
#include <filesystem>
#include <format>
#include <string>
#include <string_view>
#include <vector>

Copilot AI Apr 11, 2026


std::erase_if is used below but <algorithm> is not included in this module unit. This will fail to compile on standard libraries that don't include it transitively; add #include <algorithm> (or stop using std::erase_if).

Comment thread src/extract/symbol.cppm
Comment on lines +3 to +7
#include <cstdint>
#include <functional>
#include <string>
#include <string_view>


Copilot AI Apr 11, 2026


This file defines a defaulted three-way comparison (operator<=>) for SymbolID but does not include <compare>, which declares the comparison category types used by built-in <=>. Add #include <compare> to avoid non-portable reliance on transitive includes.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (5)
src/extract/compdb.cppm (1)

91-105: Consider normalizing both paths consistently for robust matching.

The lookup function normalizes both paths with lexically_normal(), which is good. However, if entry.file is a relative path and file is absolute (or vice versa), the comparison may fail unexpectedly. This could be intentional if all entries are expected to be normalized upstream.
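
The suggested hardening can be sketched as below. This assumes the compile-command entry carries a working directory to anchor relative paths against, which is how compile databases conventionally work; the function and parameter names are illustrative, not the real compdb API.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Resolve a possibly-relative path against the compile command's working
// directory, then normalize lexically, so relative and absolute spellings
// of the same file compare equal.
bool same_file(const fs::path& query, const fs::path& entry_file, const fs::path& entry_dir) {
    auto resolve = [&](const fs::path& p) {
        return (p.is_absolute() ? p : entry_dir / p).lexically_normal();
    };
    return resolve(query) == resolve(entry_file);
}
```

Using std::filesystem::canonical instead of the lexical resolution above would additionally collapse symlinks, at the cost of touching the filesystem.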

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/extract/compdb.cppm` around lines 91 - 105, The path comparison in
lookup(CompilationDatabase) can fail when one side is relative and the other
absolute; change the normalization to produce absolute (or canonical if you want
symlink resolution) paths before comparing: for both the input file and each
entry.file call std::filesystem::absolute(...) (or
std::filesystem::canonical(...) if resolving symlinks) and then
.lexically_normal(), then compare those resulting paths (references: lookup,
CompilationDatabase, CompileEntry, db.entries, entry.file).
src/extract/ast.cppm (1)

117-131: Minor: potential issue with offset calculation for source snippets.

At line 128, end_offset > buffer.size() is checked, but begin_offset isn't validated against buffer.size(). If begin_offset somehow exceeds buffer.size(), the substr call could still be problematic. Consider validating both offsets.

🛡️ Defensive bounds check
     if(end_offset > buffer.size()) return "";
+    if(begin_offset > buffer.size()) return "";
 
     return std::string(buffer.substr(begin_offset, end_offset - begin_offset));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/extract/ast.cppm` around lines 117 - 131, Validate and clamp both
begin_offset and end_offset before using substr: check begin_offset against
buffer.size() (and return empty string or adjust to buffer.size()), ensure
begin_offset <= end_offset after applying max_bytes, and also clamp end_offset
to buffer.size(); update the logic around sm.getFileOffset(range.getBegin()),
sm.getFileOffset(range.getEnd()), and the subsequent bounds checks so that
begin_offset and end_offset are both within [0, buffer.size()] and begin_offset
<= end_offset before calling buffer.substr.
src/extract/scan.cppm (1)

239-244: Silent failure when file cannot be opened.

If ifs.is_open() returns false (file doesn't exist or can't be read), the function silently proceeds without module declaration info. While the subsequent preprocessor pass may still work, this could lead to missing module metadata. Consider logging a warning.

📝 Add warning for file open failure
         std::ifstream ifs(file_path);
         if(ifs.is_open()) {
             std::string content((std::istreambuf_iterator<char>(ifs)),
                                  std::istreambuf_iterator<char>());
             scan_module_decl(content, result);
+        } else {
+            logging::warn("unable to read file for module scan: {}", file_path.string());
         }
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/extract/scan.cppm` around lines 239 - 244, When opening the file with
std::ifstream ifs(file_path) before calling scan_module_decl(content, result),
add handling for the failure case: if ifs.is_open() is false, emit a warning log
mentioning file_path (and optionally strerror(errno) or std::error_code) so
missing/unreadable files don’t fail silently; keep the existing behavior
otherwise. Update the block around std::ifstream ifs, the if(ifs.is_open())
branch, and the else branch to call the project logger or a warning helper so
callers can see which file could not be opened.
src/generate/llm.cppm (2)

171-181: Use llvm::json instead of hand-serializing the request body.

This module already depends on llvm/Support/JSON.h, but Lines 177-181 still build JSON via std::format plus a custom escaper. That duplicates escaping logic and makes changes to the request shape more error-prone than constructing the payload with llvm::json::Object/Array.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/generate/llm.cppm` around lines 171 - 181, The build_request_json
function currently hand-serializes JSON using std::format and
escape_json_string; replace that with llvm::json types to avoid manual escaping
and make payload mutations safer: construct an llvm::json::Object with keys
"model" and "messages" (the latter an llvm::json::Array containing two Objects
with "role"/"content"), populate the system_msg string into the system message
object and the prompt into the user message object, then stringify via
llvm::json::Value(...).toString() (or llvm::json::toJSON) and return that
string; update references to escape_json_string and remove the std::format usage
in build_request_json.

31-37: Keep detail out of the exported module surface.

Because src/generate/generate.cppm:19-27 re-exports :llm, the declarations in Lines 31-37 become public clore.generate API. That makes build_request_json and parse_response much harder to change later even though they read like internal helpers. Prefer keeping only call_llm in the interface and moving detail to a non-exported namespace.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/generate/llm.cppm` around lines 31 - 37, The declarations
build_request_json and parse_response are currently in the exported detail
namespace and must be hidden; remove or relocate the detail namespace from the
exported module interface and leave only call_llm exposed. Concretely, move the
detail::build_request_json and detail::parse_response declarations (and their
definitions if present in the same module) into a non-exported/internal
translation unit or an internal (unnamed or private) namespace within the
implementation file, and update call_llm to call those now-internal helpers;
ensure the exported module interface (the :llm export) only declares the
call_llm symbol.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/generate-docs.yml:
- Around line 60-61: The "Stop sccache" step (named "Stop sccache") runs only on
success and can be skipped on failures, leaving the sccache server running;
update that step to always run by adding if: always() and make it resilient by
adding continue-on-error: true so cleanup never blocks job results—modify the
"Stop sccache" job step to include those two fields.

In `@cmake/toolchain.cmake`:
- Around line 6-10: The cached scanner path CMAKE_CXX_COMPILER_CLANG_SCAN_DEPS
can persist when find_program("clang-scan-deps") fails, causing broken module
scanning; update the logic around CLANG_SCAN_DEPS_PATH and
CMAKE_CXX_COMPILER_CLANG_SCAN_DEPS so that when find_program does not find
clang-scan-deps you clear the cached variable (e.g., unset the
CMAKE_CXX_COMPILER_CLANG_SCAN_DEPS cache entry or reset it to an empty CACHE
FILEPATH) instead of leaving the stale value, keeping references to the symbols
CLANG_SCAN_DEPS_PATH, find_program, and CMAKE_CXX_COMPILER_CLANG_SCAN_DEPS to
locate the change.

In `@src/generate/llm.cppm`:
- Around line 296-325: The curl invocation built in arg_storage (used to
populate args and passed into llvm::sys::ExecuteAndWait) lacks timeouts and the
--fail-with-body flag, so stalled connections hang and HTTP 4xx/5xx can appear
as success; update arg_storage to include "--fail-with-body" and sensible
timeout flags (for example "--max-time" with a duration and "--connect-timeout"
with a shorter duration) before calling ExecuteAndWait, keeping curl_path, args,
redirects, err_msg and execution_failed usage unchanged so HTTP errors surface
via nonzero exit codes and ExecuteAndWait will time out instead of hanging.

---

Nitpick comments:
In `@src/extract/ast.cppm`:
- Around line 117-131: Validate and clamp both begin_offset and end_offset
before using substr: check begin_offset against buffer.size() (and return empty
string or adjust to buffer.size()), ensure begin_offset <= end_offset after
applying max_bytes, and also clamp end_offset to buffer.size(); update the logic
around sm.getFileOffset(range.getBegin()), sm.getFileOffset(range.getEnd()), and
the subsequent bounds checks so that begin_offset and end_offset are both within
[0, buffer.size()] and begin_offset <= end_offset before calling buffer.substr.

In `@src/extract/compdb.cppm`:
- Around line 91-105: The path comparison in lookup(CompilationDatabase) can
fail when one side is relative and the other absolute; change the normalization
to produce absolute (or canonical if you want symlink resolution) paths before
comparing: for both the input file and each entry.file call
std::filesystem::absolute(...) (or std::filesystem::canonical(...) if resolving
symlinks) and then .lexically_normal(), then compare those resulting paths
(references: lookup, CompilationDatabase, CompileEntry, db.entries, entry.file).

In `@src/extract/scan.cppm`:
- Around line 239-244: When opening the file with std::ifstream ifs(file_path)
before calling scan_module_decl(content, result), add handling for the failure
case: if ifs.is_open() is false, emit a warning log mentioning file_path (and
optionally strerror(errno) or std::error_code) so missing/unreadable files don’t
fail silently; keep the existing behavior otherwise. Update the block around
std::ifstream ifs, the if(ifs.is_open()) branch, and the else branch to call the
project logger or a warning helper so callers can see which file could not be
opened.

In `@src/generate/llm.cppm`:
- Around line 171-181: The build_request_json function currently hand-serializes
JSON using std::format and escape_json_string; replace that with llvm::json
types to avoid manual escaping and make payload mutations safer: construct an
llvm::json::Object with keys "model" and "messages" (the latter an
llvm::json::Array containing two Objects with "role"/"content"), populate the
system_msg string into the system message object and the prompt into the user
message object, then stringify via llvm::json::Value(...).toString() (or
llvm::json::toJSON) and return that string; update references to
escape_json_string and remove the std::format usage in build_request_json.
- Around line 31-37: The declarations build_request_json and parse_response are
currently in the exported detail namespace and must be hidden; remove or
relocate the detail namespace from the exported module interface and leave only
call_llm exposed. Concretely, move the detail::build_request_json and
detail::parse_response declarations (and their definitions if present in the
same module) into a non-exported/internal translation unit or an internal
(unnamed or private) namespace within the implementation file, and update
call_llm to call those now-internal helpers; ensure the exported module
interface (the :llm export) only declares the call_llm symbol.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 59df32d8-0739-4f69-8774-9d71cfc186e7

📥 Commits

Reviewing files that changed from the base of the PR and between caec233 and bc32261.

📒 Files selected for processing (15)
  • .github/workflows/generate-docs.yml
  • cmake/toolchain.cmake
  • src/CMakeLists.txt
  • src/extract/ast.cppm
  • src/extract/compdb.cppm
  • src/extract/extract.cppm
  • src/extract/model.cppm
  • src/extract/scan.cppm
  • src/extract/symbol.cppm
  • src/generate/generate.cppm
  • src/generate/llm.cppm
  • src/support/logging.cppm
  • tests/unit/extract/extract.cpp
  • tests/unit/extract/scan.cpp
  • tests/unit/generate/generate.cpp
✅ Files skipped from review due to trivial changes (1)
  • src/extract/symbol.cppm
🚧 Files skipped from review as they are similar to previous changes (6)
  • tests/unit/extract/scan.cpp
  • src/CMakeLists.txt
  • src/support/logging.cppm
  • src/extract/model.cppm
  • tests/unit/extract/extract.cpp
  • src/generate/generate.cppm

Comment thread .github/workflows/generate-docs.yml Outdated
Comment on lines +60 to +61
- name: Stop sccache
if: always()
run: pixi run -- sccache --stop-server 2>$null
run: sccache --stop-server


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify whether Stop sccache runs unconditionally
rg -n -C2 'name:\s+Stop sccache|if:\s+always\(\)|sccache --stop-server' .github/workflows/generate-docs.yml

Repository: clice-io/clore

Length of output: 161


🏁 Script executed:

cat -n .github/workflows/generate-docs.yml

Repository: clice-io/clore

Length of output: 1927


Add conditional execution to ensure sccache is always shut down.

The "Stop sccache" step only runs when prior steps succeed. If the Build or documentation generation fails, this step is skipped, leaving the sccache server running. Add if: always() to ensure cleanup occurs regardless of job status, and continue-on-error: true to prevent the cleanup failure from failing the job.

Suggested change
       - name: Stop sccache
+        if: always()
+        continue-on-error: true
         run: sccache --stop-server
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/generate-docs.yml around lines 60 - 61, The "Stop sccache"
step (named "Stop sccache") runs only on success and can be skipped on failures,
leaving the sccache server running; update that step to always run by adding if:
always() and make it resilient by adding continue-on-error: true so cleanup
never blocks job results—modify the "Stop sccache" job step to include those two
fields.

Comment thread cmake/toolchain.cmake Outdated
Comment thread src/generate/llm.cppm
Comment on lines +296 to +325
std::vector<std::string> arg_storage{
curl_path,
"--silent",
"--show-error",
"--request",
"POST",
url,
"--config",
curl_config_path.string(),
"--data-binary",
body_arg,
};
llvm::SmallVector<llvm::StringRef, 10> args;
args.reserve(arg_storage.size());
for(auto& arg : arg_storage) {
args.push_back(arg);
}

std::array<std::optional<llvm::StringRef>, 3> redirects{
std::nullopt,
llvm::StringRef(response_path_string),
llvm::StringRef(stderr_path_string),
};

logging::info("calling LLM: {} model={}", url, model);

std::string err_msg;
bool execution_failed = false;
auto exit_code = llvm::sys::ExecuteAndWait(curl_path, args, std::nullopt, redirects,
0, 0, &err_msg, &execution_failed);


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists and read the relevant section
head -n 330 src/generate/llm.cppm | tail -n 50

Repository: clice-io/clore

Length of output: 1700


🏁 Script executed:

# Check LLVM documentation and ExecuteAndWait usage
rg "ExecuteAndWait" --type cpp -A 5 -B 5 | head -100

Repository: clice-io/clore

Length of output: 40


🌐 Web query:

LLVM ExecuteAndWait timeout parameter documentation

💡 Result:

The LLVM ExecuteAndWait function with a timeout parameter is llvm::sys::ExecuteAndWait, declared in llvm/include/llvm/Support/Program.h. Signature (from LLVM main, as of recent sources):

    int llvm::sys::ExecuteAndWait(
        llvm::StringRef Program,
        llvm::ArrayRef<llvm::StringRef> Args,
        std::optional<llvm::ArrayRef<llvm::StringRef>> Env = std::nullopt,
        llvm::ArrayRef<std::optional<llvm::StringRef>> Redirects = {},
        unsigned SecondsToWait = 0,
        unsigned MemoryLimit = 0,
        std::string* ErrMsg = nullptr,
        bool* ExecutionFailed = nullptr,
        std::optional<llvm::sys::ProcessStatistics>* ProcStat = nullptr,
        llvm::BitVector* AffinityMask = nullptr);

The SecondsToWait parameter specifies the maximum number of seconds to wait for the child process to exit. If non-zero and the time expires, the child process is killed and the function returns -2 (indicating timeout or crash). If 0 (the default), it waits indefinitely until termination. This function launches an external program, waits for it to finish (with optional timeout), and returns the exit code. Use ExecuteNoWait for non-blocking execution. The implementation in llvm/lib/Support/Program.cpp calls Execute to start the process, then Wait(PI, SecondsToWait == 0 ? std::nullopt : std::optional(SecondsToWait), ...). Note: no ExecuteAndWait with timeout exists in the LLVM ORC JIT APIs (e.g., ExecutionSession or ExecutorProcessControl).


🌐 Web query:

curl --fail-with-body --connect-timeout --max-time flags

💡 Result:

The curl flags --fail-with-body, --connect-timeout, and --max-time are used for error handling and timeouts.

--fail-with-body: When an HTTP server returns a non-success response (e.g., 4xx), curl normally outputs the error body but exits with code 0. This flag makes curl exit with error code 22 while still outputting the response body. Mutually exclusive with -f/--fail. Added in curl 7.76.0. Example: curl --fail-with-body https://example.com

--connect-timeout: Maximum time allowed to establish a connection (DNS, TCP/TLS/QUIC handshake). Defaults to no timeout (system-dependent, often ~5min). Accepts decimals (e.g., 2.5). Example: curl --connect-timeout 10 https://example.com

--max-time / -m: Maximum total time for the entire operation (connect + transfer). Exits with code 28 on timeout. Accepts decimals. Example: curl -m 30 https://example.com

These are often combined for robust scripting: curl --fail-with-body --connect-timeout 10 --max-time 30 https://example.com



🏁 Script executed:

# Check for curl version requirements and CMake/build configuration
rg "curl" --type cmake -A 3 -B 3 | head -80

Repository: clice-io/clore

Length of output: 40


🏁 Script executed:

# Check for error handling around ExecuteAndWait return value
rg "exit_code" src/generate/llm.cppm -A 10 -B 2

Repository: clice-io/clore

Length of output: 1247


🏁 Script executed:

# Check if there's any response body parsing/error handling
rg "stderr" src/generate/llm.cppm -A 5 -B 2

Repository: clice-io/clore

Length of output: 2224


Add timeouts and --fail-with-body flag to curl invocation.

The curl subprocess currently has no timeout, allowing stalled connections to hang indefinitely. Additionally, without --fail-with-body, HTTP 4xx/5xx responses will exit with code 0 (success), causing them to bypass the error check at line 318 and surface as a generic JSON parse error instead of the actual HTTP error.

Suggested fixes
     std::vector<std::string> arg_storage{
         curl_path,
         "--silent",
         "--show-error",
+        "--fail-with-body",
+        "--connect-timeout",
+        "10",
+        "--max-time",
+        "60",
         "--request",
         "POST",
         url,
         "--config",
         curl_config_path.string(),
         "--data-binary",
         body_arg,
     };
     auto exit_code = llvm::sys::ExecuteAndWait(curl_path, args, std::nullopt, redirects,
-                                               0, 0, &err_msg, &execution_failed);
+                                               70, 0, &err_msg, &execution_failed);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/generate/llm.cppm` around lines 296 - 325, The curl invocation built in
arg_storage (used to populate args and passed into llvm::sys::ExecuteAndWait)
lacks timeouts and the --fail-with-body flag, so stalled connections hang and
HTTP 4xx/5xx can appear as success; update arg_storage to include
"--fail-with-body" and sensible timeout flags (for example "--max-time" with a
duration and "--connect-timeout" with a shorter duration) before calling
ExecuteAndWait, keeping curl_path, args, redirects, err_msg and execution_failed
usage unchanged so HTTP errors surface via nonzero exit codes and ExecuteAndWait
will time out instead of hanging.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/extract/scan.cppm (1)

279-305: ⚠️ Potential issue | 🟠 Major

Normalize scanned file keys with the compile-command working directory.

This function uses entry.file as a lexical path key, but compile databases often store that path relative to entry.directory. The result is that entry_files, graph.files, and cache can disagree with the absolute include paths reported by the preprocessor, so dependency edges disappear or unrelated entries collapse together.

Suggested fix
     for(auto& entry : db.entries) {
         namespace fs = std::filesystem;
-        auto normalized = fs::path(entry.file).lexically_normal().string();
+        auto normalized = normalize_argument_path(entry.file, entry.directory).string();
         entry_files.insert(normalized);
         if(file_set.insert(normalized).second) graph.files.push_back(normalized);
     }
 
     for(auto& entry : db.entries) {
         namespace fs = std::filesystem;
-        auto normalized = fs::path(entry.file).lexically_normal().string();
+        auto normalized = normalize_argument_path(entry.file, entry.directory).string();
 
         auto scan_result = scan_file(entry);
         if(!scan_result.has_value()) {
             return std::unexpected(std::move(scan_result.error()));
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/extract/scan.cppm` around lines 279 - 305, Normalize each entry and
included path relative to the compile-command working directory
(entry.directory) before inserting into entry_files, graph.files, and cache so
keys match preprocessor-reported absolute/relative includes; specifically,
replace usages of fs::path(entry.file).lexically_normal() with
fs::path(entry.directory) / entry.file then .lexically_normal() (and for
includes, if inc.path is relative, resolve it against entry.directory before
normalizing) in the loops that populate entry_files/file_set/graph.files, when
calling scan_file(entry) results are inserted into cache.emplace(normalized,
...), and when pushing DependencyEdge use the normalized resolved paths for
.from and .to.
♻️ Duplicate comments (1)
src/generate/llm.cppm (1)

296-325: ⚠️ Potential issue | 🟠 Major

Bound the curl subprocess and fail on HTTP status codes.

This is still missing --fail-with-body, --connect-timeout, and --max-time, and Line 324 still passes 0 as SecondsToWait. A stalled network call can hang indefinitely, and HTTP 4xx/5xx can still fall through as misleading JSON-parse errors.

Suggested fix
     std::vector<std::string> arg_storage{
         curl_path,
         "--silent",
         "--show-error",
+        "--fail-with-body",
+        "--connect-timeout",
+        "10",
+        "--max-time",
+        "60",
         "--request",
         "POST",
         url,
         "--config",
         curl_config_path.string(),
         "--data-binary",
         body_arg,
     };
@@
-    auto exit_code = llvm::sys::ExecuteAndWait(curl_path, args, std::nullopt, redirects,
-                                               0, 0, &err_msg, &execution_failed);
+    auto exit_code = llvm::sys::ExecuteAndWait(curl_path, args, std::nullopt, redirects,
+                                               70, 0, &err_msg, &execution_failed);

Verify that the current invocation still has no HTTP failure flags and no subprocess timeout:

#!/bin/bash
sed -n '296,325p' src/generate/llm.cppm
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/generate/llm.cppm` around lines 296 - 325, The curl invocation lacks
HTTP-failure flags and a subprocess timeout: update arg_storage (used to build
args) to include "--fail-with-body", "--connect-timeout", and "--max-time" with
appropriate numeric values (e.g., connect and total timeouts) and ensure these
are inserted into the std::vector<std::string> arg_storage before executing;
also change the SecondsToWait argument passed to llvm::sys::ExecuteAndWait
(currently 0) to a non-zero timeout value (or a named timeout variable) so the
subprocess is bounded, and keep references to arg_storage, args, and the call to
ExecuteAndWait when making these edits.
🧹 Nitpick comments (2)
src/support/logging.cppm (1)

15-23: Encapsulate exported mutable logging state.

options is exported as mutable global state, which makes accidental concurrent writes possible across importers. Consider keeping state private and exposing a small setter/getter API (optionally synchronized) to prevent data-race-prone usage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/support/logging.cppm` around lines 15 - 23, The exported mutable global
options (clore::logging::Options options) must be made non-exported and accessed
via a small API: move options into private scope (e.g., anonymous namespace or
static inside the translation unit) and provide functions such as
clore::logging::setLevel / clore::logging::getLevel or
clore::logging::setOptions / clore::logging::getOptions that operate on the
hidden Options instance; protect mutations with a std::mutex (or std::atomic for
the level field) to avoid data races and ensure thread-safety when callers
update or read the logging level.
src/generate/generate.cppm (1)

448-485: Stabilize the ready set if generation order needs to stay reproducible.

graph.nodes and in_degree are unordered maps, so zero-dependency pages are currently queued in hash order. Independent pages can reshuffle between runs, which makes prompt/page ordering unstable even when the graph itself is unchanged.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/generate/generate.cppm` around lines 448 - 485, In sort_page_graph the
ready set is non-deterministic because you use an unordered_map and a FIFO queue
q to hold zero-degree nodes; replace q with a deterministic container (e.g., a
min-heap or ordered set) so nodes are always popped in a stable order: change q
to a priority queue or std::set<string> and push initial zero-degree paths from
in_degree into it, then when a dependent hits zero (inside the loop where you
decrement in_degree[dependent]) push that dependent into the same ordered
container; leave generation_order and the rest of the algorithm unchanged so the
topological order becomes reproducible across runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/extract/compdb.cppm`:
- Around line 91-104: lookup currently compares only entry.file, which fails for
relative compile_commands.json entries; change the comparison to resolve

entry.file against entry.directory before normalizing. In function lookup (and
types CompilationDatabase / CompileEntry), build entry_path as
std::filesystem::path(entry.directory) / entry.file and call .lexically_normal()
on that result, keep target as std::filesystem::path(file).lexically_normal(),
then compare entry_path == target and push_back(&entry) on match so relative
filenames are resolved against their working directory.

In `@src/generate/generate.cppm`:
- Around line 417-422: The code is adding a reverse dependency for partition
imports by calling add_page_dependency(graph, page_a, main_it->second) when
mod_name contains ':' (symbols: mod_name, colon_pos, main_name, mod_to_page,
add_page_dependency, page_a), which creates cycles; remove or disable that
add_page_dependency call for the partition->main case so only the
main->partition dependency (already created elsewhere) remains and no reverse
edge is introduced.

---

Outside diff comments:
In `@src/extract/scan.cppm`:
- Around line 279-305: Normalize each entry and included path relative to the
compile-command working directory (entry.directory) before inserting into
entry_files, graph.files, and cache so keys match preprocessor-reported
absolute/relative includes; specifically, replace usages of
fs::path(entry.file).lexically_normal() with fs::path(entry.directory) /
entry.file then .lexically_normal() (and for includes, if inc.path is relative,
resolve it against entry.directory before normalizing) in the loops that
populate entry_files/file_set/graph.files, when calling scan_file(entry) results
are inserted into cache.emplace(normalized, ...), and when pushing
DependencyEdge use the normalized resolved paths for .from and .to.

---

Duplicate comments:
In `@src/generate/llm.cppm`:
- Around line 296-325: The curl invocation lacks HTTP-failure flags and a
subprocess timeout: update arg_storage (used to build args) to include
"--fail-with-body", "--connect-timeout", and "--max-time" with appropriate
numeric values (e.g., connect and total timeouts) and ensure these are inserted
into the std::vector<std::string> arg_storage before executing; also change the
SecondsToWait argument passed to llvm::sys::ExecuteAndWait (currently 0) to a
non-zero timeout value (or a named timeout variable) so the subprocess is
bounded, and keep references to arg_storage, args, and the call to
ExecuteAndWait when making these edits.

---

Nitpick comments:
In `@src/generate/generate.cppm`:
- Around line 448-485: In sort_page_graph the ready set is non-deterministic
because you use an unordered_map and a FIFO queue q to hold zero-degree nodes;
replace q with a deterministic container (e.g., a min-heap or ordered set) so
nodes are always popped in a stable order: change q to a priority queue or
std::set<string> and push initial zero-degree paths from in_degree into it, then
when a dependent hits zero (inside the loop where you decrement
in_degree[dependent]) push that dependent into the same ordered container; leave
generation_order and the rest of the algorithm unchanged so the topological
order becomes reproducible across runs.

In `@src/support/logging.cppm`:
- Around line 15-23: The exported mutable global options
(clore::logging::Options options) must be made non-exported and accessed via a
small API: move options into private scope (e.g., anonymous namespace or static
inside the translation unit) and provide functions such as
clore::logging::setLevel / clore::logging::getLevel or
clore::logging::setOptions / clore::logging::getOptions that operate on the
hidden Options instance; protect mutations with a std::mutex (or std::atomic for
the level field) to avoid data races and ensure thread-safety when callers
update or read the logging level.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 729c7da6-6bb7-4e80-a23e-347ae5ac3055

📥 Commits

Reviewing files that changed from the base of the PR and between bc32261 and 5d50080.

📒 Files selected for processing (27)
  • .github/workflows/generate-docs.yml
  • cmake/toolchain.cmake
  • src/config/config.cppm
  • src/config/load.cppm
  • src/config/normalize.cppm
  • src/config/schema.cppm
  • src/config/validate.cppm
  • src/extract/ast.cppm
  • src/extract/compdb.cppm
  • src/extract/extract.cppm
  • src/extract/model.cppm
  • src/extract/scan.cppm
  • src/extract/symbol.cppm
  • src/extract/tooling.cppm
  • src/generate/generate.cppm
  • src/generate/llm.cppm
  • src/main.cpp
  • src/support/logging.cppm
  • tests/unit/config/config.cpp
  • tests/unit/config/validate.cpp
  • tests/unit/extract/ast.cpp
  • tests/unit/extract/compdb.cpp
  • tests/unit/extract/extract.cpp
  • tests/unit/extract/scan.cpp
  • tests/unit/extract/symbol.cpp
  • tests/unit/generate/generate.cpp
  • tests/unit/generate/llm.cpp
✅ Files skipped from review due to trivial changes (7)
  • tests/unit/config/config.cpp
  • src/main.cpp
  • tests/unit/extract/ast.cpp
  • tests/unit/extract/compdb.cpp
  • tests/unit/config/validate.cpp
  • tests/unit/generate/llm.cpp
  • tests/unit/extract/symbol.cpp
🚧 Files skipped from review as they are similar to previous changes (12)
  • .github/workflows/generate-docs.yml
  • src/config/config.cppm
  • src/config/schema.cppm
  • cmake/toolchain.cmake
  • src/extract/tooling.cppm
  • tests/unit/extract/extract.cpp
  • src/config/normalize.cppm
  • src/extract/model.cppm
  • src/config/validate.cppm
  • src/extract/extract.cppm
  • tests/unit/generate/generate.cpp
  • src/extract/ast.cppm

@tsurumi-yizhou tsurumi-yizhou merged commit 76164cd into main Apr 11, 2026
1 of 2 checks passed
@tsurumi-yizhou tsurumi-yizhou deleted the modules branch April 11, 2026 06:55
