feat(logs): add log rotate#470
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR replaces per-daemon filesystem logs with a centralized SQLite LogStore. Supervisor appends lines to the DB; CLI, MCP, TUI, Web, and API query/poll the store; hourly retention pruning runs; legacy text logs are auto-imported; config, schema, docs, and tests updated accordingly. ChangesLog Store Implementation and Integration
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Greptile SummaryThis PR replaces per-daemon text log files with a single centralised SQLite database (
Confidence Score: 4/5Safe to merge with the API status-code fix addressed; the remaining items are non-blocking quality improvements. The raw HTTP tail endpoint returns 404 to callers when the underlying SQLite query fails, causing clients to misinterpret a transient database error as a missing resource. All other changes are well-structured; retention logic, SSE clear detection, and auto-migration guard against the in-memory fallback case are all correct. src/web/routes/api/logs.rs (wrong HTTP status on DB error) and src/log_store/sqlite.rs (empty legacy files, single-connection mutex). Important Files Changed
Reviews (7): Last reviewed commit: "fix(web/logs): wrap all blocking SQLite ..." | Re-trigger Greptile |
1819b28 to
44eb89a
Compare
There was a problem hiding this comment.
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/cli/logs.rs (1)
149-192:⚠️ Potential issue | 🔴 CriticalFix
pitchfork logs --migrateCLI/docs mismatch
--migrateis documented inpitchfork.usage.kdlanddocs/cli/logs.md/docs/guides/logs.md, butsrc/cli/logs.rs’sLogsstruct has nomigratefield andrun()never checksself.migrate(it only always callsmigrate_legacy_log_dirs()).Also, legacy import into SQLite is already performed automatically by
src/log_store/sqlite.rswhenLOG_STOREis opened, so the flag’s “manual one-time operation” description doesn’t match runtime behavior.Either wire
--migrateto control/trigger both stages (including ensuring the SQLite import runs when the flag is passed), or remove--migratefrom the CLI/docs to reflect the current automatic migration; then regenerate rendered docs (mise run render) after updating the clap/usage source.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cli/logs.rs` around lines 149 - 192, The CLI flag `--migrate` is documented but not implemented: add a migrate toggle to the Logs CLI or remove the flag/docs; to fix either (A) implement the flag by adding a field migrate: bool to the Logs struct and update impl Logs::run to only call migrate_legacy_log_dirs() when self.migrate is true and also force the SQLite import by triggering LOG_STORE initialization (call or reference LOG_STORE so auto_migrate_legacy_logs(&store) runs) or (B) remove `--migrate` from pitchfork.usage.kdl and the docs (docs/cli/logs.md, docs/guides/logs.md) and update any references since migration is already automatic via src/log_store/sqlite.rs auto_migrate_legacy_logs; then regenerate rendered docs. Ensure you modify the Logs struct, impl Logs::run, migrate_legacy_log_dirs(), and references to LOG_STORE accordingly so runtime behavior matches documentation.tests/test_e2e_namespace.rs (1)
76-87:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse polling instead of single-shot reads after fixed sleeps.
At Line 77/83 and Line 365, reading logs once after
sleepcan race daemon startup/log flush under CI load. Usewait_for_logsso tests wait deterministically for expected content.Proposed change
- let log_a_content = env.read_logs("project-a/api"); + let log_a_content = env.wait_for_logs( + "project-a/api", + "Hello from project-a", + Duration::from_secs(5), + ); @@ - let log_b_content = env.read_logs("project-b/api"); + let log_b_content = env.wait_for_logs( + "project-b/api", + "Hello from project-b", + Duration::from_secs(5), + ); @@ - let log_content = env.read_logs("my-cool-project/my-service"); + let log_content = env.wait_for_logs( + "my-cool-project/my-service", + "encoding test", + Duration::from_secs(5), + );Also applies to: 363-369
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_e2e_namespace.rs` around lines 76 - 87, Replace the single-shot env.read_logs calls with the polling helper env.wait_for_logs to avoid flakiness: instead of calling env.read_logs("project-a/api") and asserting contains immediately, call env.wait_for_logs("project-a/api", "Hello from project-a") (and similarly for "project-b/api") so the test polls until the expected string appears; also update the other occurrence that reads logs after a fixed sleep (the read_logs at the other noted location) to use wait_for_logs with the expected message to ensure deterministic waiting for daemon startup/log flush.
🧹 Nitpick comments (2)
src/cli/logs.rs (2)
549-569: 💤 Low valueVerify the initial tail position logic when start_from_end is true.
When
start_from_endis true, the code queries withlimit: Some(1)andorder_desc: trueto find the most recent log entry's ID. With these options, the SQL backend returns only the single most recent entry, so.last()on a 1-element Vec simply returns that entry. While this works, using.first()would be more semantically clear sinceorder_desc: truemeans the first result is the newest.♻️ Optional clarification
match LOG_STORE.query(&LogQuery { daemon_ids: vec![id.qualified()], from: None, to: None, limit: Some(1), order_desc: true, after_id: None, }) { - Ok(entries) => entries.last().map(|e| e.id).unwrap_or(0), + Ok(entries) => entries.first().map(|e| e.id).unwrap_or(0), Err(_) => 0, }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cli/logs.rs` around lines 549 - 569, The initial tail-position logic for building the states map uses LOG_STORE.query with order_desc: true and limit: Some(1) but then calls entries.last(), which is semantically confusing; change the code that computes since (inside the map over names when start_from_end is true) to use entries.first() instead of entries.last() so the newest entry (first result with order_desc: true) is selected — update the closure that builds states (refer to names, states, LOG_STORE, and LogQuery) accordingly.
597-600: 💤 Low valueMinor inefficiency: avoid cloning strings in sort key.
The sort key
(l.0.to_string(), l.1.to_string())clones the timestamp and daemon_id strings that are already owned. Sincel.0andl.1are alreadyString, calling.to_string()creates unnecessary copies. You can use references or tuples of the owned strings directly.♻️ Optional optimization
if !out.is_empty() { let out = out .into_iter() - .sorted_by_cached_key(|l| (l.0.to_string(), l.1.to_string())) + .sorted_by_cached_key(|l| (l.0.clone(), l.1.clone())) .collect_vec();Or, since the tuple elements are already owned:
- .sorted_by_cached_key(|l| (l.0.to_string(), l.1.to_string())) + .sorted_by(|a, b| (&a.0, &a.1).cmp(&(&b.0, &b.1)))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/cli/logs.rs` around lines 597 - 600, The sort currently clones strings via .sorted_by_cached_key(|l| (l.0.to_string(), l.1.to_string())), causing unnecessary allocations; change the key closure to borrow the existing Strings instead (e.g. .sorted_by_cached_key(|l| (&l.0, &l.1))) so you compare by reference without cloning—update the call on the variable out where sorted_by_cached_key is used.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/reference/file-locations.md`:
- Around line 78-80: The markdown fenced code blocks containing the path strings
"~/.local/state/pitchfork/logs/logs.db" and
"~/.local/state/pitchfork/logs/<namespace>--<daemon-name>/" are unlabeled and
trigger MD040; update each triple-backtick fence to include a language
identifier (e.g., add "text" after the opening ```), so the fences become
```text ... ``` to satisfy markdownlint without changing the path content.
In `@src/cli/mcp.rs`:
- Around line 358-388: Currently errors from LOG_STORE.list_daemon_ids() and
LOG_STORE.query(...) are ignored and treated as empty results; change these
handlers to propagate an internal error instead of swallowing them: when calling
LOG_STORE.list_daemon_ids() inside the daemon_ids assignment, map Err(e) to
return internal_err with a message that includes the error and context (e.g.,
"Failed to list daemon IDs: {e}"); similarly, in the loop where
LOG_STORE.query(&LogQuery { ... }) currently matches Err(_) => continue, replace
that continue with returning internal_err including the daemon_id and the query
error (e.g., "Failed to query logs for daemon {daemon_id}: {e}"), so both
failures surface to the caller rather than producing a misleading "No daemon
logs found" success.
- Around line 378-401: The current loop queries LOG_STORE per daemon causing
params.n to be applied per-daemon; instead build a single LogQuery with
daemon_ids: daemon_ids.iter().map(|d| d.qualified()).collect(), limit:
Some(params.n), order_desc: true and call LOG_STORE.query once to retrieve the
global most-recent N entries. Then, for output, iterate over the original
daemon_ids and for each daemon filter the single returned entries for entries
matching that daemon (e.g., entry.daemon_id or equivalent), reverse each
per-daemon slice to chronological order and format timestamp/message as before;
update references to LOG_STORE.query, LogQuery, daemon_ids, params.n, and
entry.timestamp/entry.message accordingly.
- Around line 356-388: The MCP handler currently calls blocking SQLite functions
inline: LOG_STORE.list_daemon_ids() and LOG_STORE.query(&LogQuery { ... }),
which can block the Tokio reactor; wrap these synchronous calls in a blocking
context (tokio::task::block_in_place or tokio::task::spawn_blocking) and await
the result so the async function (pitchfork_logs / the MCP handler) does not
perform rusqlite actions on the async executor; ensure you offload both the
list_daemon_ids and query usages and propagate errors/results back into the
async flow the same way append already does.
In `@src/log_store/sqlite.rs`:
- Around line 203-225: The append() method currently does blocking rusqlite
writes under self.conn.lock() (serialized per line); instead introduce an async
Tokio writer: add an mpsc::Sender (e.g., log_tx) to the Sqlite log store struct
and spawn a background worker task that receives messages, batches them into a
single transaction and performs the blocking DB writes inside spawn_blocking (or
a dedicated sync task) using the existing connection and prepared INSERT
statements; change append(&self, ...) to be async (or return quickly) by sending
the (daemon_id.qualified(), timestamp, message) tuple to log_tx (use try_send or
await send) so callers are not blocked, and ensure the background worker handles
graceful shutdown/flush and contention by using batched INSERT/transaction to
reduce mutex contention on self.conn.lock().
In `@src/supervisor/watchers.rs`:
- Around line 309-318: The code silently ignores malformed retention settings by
using humantime::parse_duration(...).ok() and treating non-positive
line_retention as None; change this to validate and surface errors: in the block
computing global_age and global_count (variables global_age, global_count, using
settings() and settings.logs.time_retention / settings.logs.line_retention),
detect parse errors and invalid values and either return/config error or log a
clear error including the offending key/value (don't fall back silently), and
ensure per-daemon resolution that uses policy.count.or(...) / policy.age.or(...)
respects and propagates these validation errors instead of silently inheriting
globals; update the code paths at the other occurrences you noted (around the
blocks that set per-daemon policy at lines referenced: 336-345 and 377-379) to
use the same validation/logging approach.
In `@src/tui/app.rs`:
- Around line 1459-1474: The query is currently returning the oldest
TUI_LOG_LIMIT rows because order_desc is false while LIMIT is applied after
ORDER BY; update the LogQuery passed to LOG_STORE.query (in the self.log_content
assignment) to request the newest rows by setting order_desc: true (keep limit:
Some(TUI_LOG_LIMIT) and the same daemon_ids/after_id handling) so the database
returns the most recent 5,000 entries.
- Around line 1455-1475: The synchronous LOG_STORE.query call in load_logs is
blocking the TUI; change the flow so the DB query runs on a Tokio blocking
background task (use tokio::task::spawn_blocking around SqliteLogStore::query)
and send the formatted Vec<String> lines back to the App instead of doing the
query inline. Concretely: make load_logs spawn a blocking task that calls
LOG_STORE.query (or introduce a new async helper called from apply_refresh),
format timestamps/messages inside that background task, and deliver the result
via an async channel or a task handle so the App can update self.log_content
when the task completes; ensure apply_refresh (or the async handler) uses
tokio::select! to drop or ignore stale in-flight tasks for previous refreshes.
Reference functions/types: App::load_logs, App::apply_refresh, LOG_STORE, and
SqliteLogStore::query when implementing the spawn_blocking + result delivery and
cancellation logic.
In `@src/web/routes/logs.rs`:
- Around line 336-345: The SSE poll currently uses LogQuery { limit: None }
which will fetch an unbounded backlog; change it to query in fixed-size batches
(e.g., limit: Some(BATCH_SIZE)) using LOG_STORE.query and then drain in a loop:
after each successful batch update after_id/last_id to the last returned entry's
id and continue fetching until the returned batch size is less than BATCH_SIZE,
breaking the loop; ensure this logic lives in the same SSE poll loop around
LOG_STORE.query and references LogQuery.limit, LogQuery.after_id, last_id, and
the variable holding the returned entries to advance the cursor for subsequent
batches.
- Line 366: The call to LOG_STORE.clear(&[daemon_id]) currently discards its
Result causing the route to always return success; change this to propagate
failures instead of ignoring them by handling the Result from LOG_STORE.clear
(e.g. use the ? operator or map_err to convert the error into the route's error
type) so that errors bubble up to the HTTP response, and add a contextual log
message referencing daemon_id when the clear fails; update the surrounding
handler (the route function containing the LOG_STORE.clear call) to return an
appropriate Result/Response so a SQLite or clear error results in a non-OK HTTP
response instead of silently reporting success.
- Around line 64-76: fetch_latest_logs and the SSE/route code call
LOG_STORE.query, LOG_STORE.last_clear_generation, and LOG_STORE.clear
synchronously but those methods block on a mutexed rusqlite connection; move
each DB call onto a blocking thread (eg. wrap calls to LOG_STORE.query /
last_clear_generation / clear in tokio::task::spawn_blocking or
tokio::task::block_in_place) so async handlers and stream_sse don't stall the
Tokio runtime, change the stream_sse periodic query to pass a bounded limit
instead of None to avoid unbounded backlog fetches, and stop swallowing errors
from LOG_STORE.clear (propagate or log and only advance clear-generation/SSE
state on successful clear) — update references in code to fetch_latest_logs,
stream_sse, LOG_STORE.query, LOG_STORE.last_clear_generation, and
LOG_STORE.clear accordingly.
In `@tests/common/mod.rs`:
- Around line 187-193: The current use of rows.flatten() in the loop silently
discards any row-level rusqlite errors and can return partial/incorrect logs;
change the code to explicitly handle Result<Row, rusqlite::Error> from rows:
iterate over rows (replace rows.flatten()) and on Err(e) propagate the error
(e.g., change the function signature to return Result<Option<String>,
rusqlite::Error> and return Err(e)) or collect rows via
rows.collect::<Result<Vec<_>, _>>() before building out, ensuring you either
return the error or fail the test instead of dropping it; update references to
out, rows.flatten(), and the Option return accordingly.
- Around line 163-166: read_logs currently falls back to legacy file reads when
read_logs_from_db returns None, allowing DB regressions to be missed; add a
strict DB-only helper (e.g., read_logs_db_only or read_logs_from_db_only) that
calls read_logs_from_db(daemon_id) and returns an Err or empty string only if
the DB read explicitly yields failure (do NOT call fs::read_to_string or use
log_path), update tests that assert SQLite behavior to call this new helper
instead of read_logs, and keep the existing read_logs for compatibility tests;
reference the existing read_logs_from_db and read_logs functions to implement
the new DB-only path.
In `@tests/test_e2e_migration.rs`:
- Around line 273-275: The invocation env.run_command(&["logs"]) in the test
(both the first call around the auto-migration check and the second idempotency
run) does not assert the command exit status; capture the result of
env.run_command(&["logs"]) into a variable (e.g., let out =
env.run_command(&["logs"]) ) and assert out.status.success() before using
warn_count or other outputs so a failing command cannot silently pass; update
both occurrences (the first call near the migration import and the later
idempotency run) to perform this success assertion and then proceed to inspect
out.stdout/stderr or warn_count.
---
Outside diff comments:
In `@src/cli/logs.rs`:
- Around line 149-192: The CLI flag `--migrate` is documented but not
implemented: add a migrate toggle to the Logs CLI or remove the flag/docs; to
fix either (A) implement the flag by adding a field migrate: bool to the Logs
struct and update impl Logs::run to only call migrate_legacy_log_dirs() when
self.migrate is true and also force the SQLite import by triggering LOG_STORE
initialization (call or reference LOG_STORE so auto_migrate_legacy_logs(&store)
runs) or (B) remove `--migrate` from pitchfork.usage.kdl and the docs
(docs/cli/logs.md, docs/guides/logs.md) and update any references since
migration is already automatic via src/log_store/sqlite.rs
auto_migrate_legacy_logs; then regenerate rendered docs. Ensure you modify the
Logs struct, impl Logs::run, migrate_legacy_log_dirs(), and references to
LOG_STORE accordingly so runtime behavior matches documentation.
In `@tests/test_e2e_namespace.rs`:
- Around line 76-87: Replace the single-shot env.read_logs calls with the
polling helper env.wait_for_logs to avoid flakiness: instead of calling
env.read_logs("project-a/api") and asserting contains immediately, call
env.wait_for_logs("project-a/api", "Hello from project-a") (and similarly for
"project-b/api") so the test polls until the expected string appears; also
update the other occurrence that reads logs after a fixed sleep (the read_logs
at the other noted location) to use wait_for_logs with the expected message to
ensure deterministic waiting for daemon startup/log flush.
---
Nitpick comments:
In `@src/cli/logs.rs`:
- Around line 549-569: The initial tail-position logic for building the states
map uses LOG_STORE.query with order_desc: true and limit: Some(1) but then calls
entries.last(), which is semantically confusing; change the code that computes
since (inside the map over names when start_from_end is true) to use
entries.first() instead of entries.last() so the newest entry (first result with
order_desc: true) is selected — update the closure that builds states (refer to
names, states, LOG_STORE, and LogQuery) accordingly.
- Around line 597-600: The sort currently clones strings via
.sorted_by_cached_key(|l| (l.0.to_string(), l.1.to_string())), causing
unnecessary allocations; change the key closure to borrow the existing Strings
instead (e.g. .sorted_by_cached_key(|l| (&l.0, &l.1))) so you compare by
reference without cloning—update the call on the variable out where
sorted_by_cached_key is used.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 049ed609-9f18-437c-a79e-b98970419d99
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (27)
Cargo.tomldocs/cli/commands.jsondocs/cli/logs.mddocs/guides/logs.mddocs/public/schema.jsondocs/reference/file-locations.mddocs/reference/settings.mdpitchfork.usage.kdlsettings.tomlsrc/cli/logs.rssrc/cli/mcp.rssrc/daemon.rssrc/lib.rssrc/log_store/mod.rssrc/log_store/sqlite.rssrc/main.rssrc/pitchfork_toml.rssrc/supervisor/lifecycle.rssrc/supervisor/watchers.rssrc/tui/app.rssrc/web/assets/logs.jssrc/web/routes/logs.rstest/test_helper/common_setup.bashtests/common/mod.rstests/test_e2e_logs.rstests/test_e2e_migration.rstests/test_e2e_namespace.rs
💤 Files with no reviewable changes (2)
- tests/test_e2e_logs.rs
- src/daemon.rs
| for msg in rows.flatten() { | ||
| if !out.is_empty() { | ||
| out.push('\n'); | ||
| } | ||
| out.push_str(&msg); | ||
| } | ||
| if out.is_empty() { None } else { Some(out) } |
There was a problem hiding this comment.
Don’t drop SQLite row errors silently.
At Line 187, rows.flatten() discards row-level errors and can return partial logs as if successful. This makes failures harder to diagnose and can produce false positives.
Proposed change
- for msg in rows.flatten() {
+ for msg in rows {
+ let msg = msg.ok()?;
if !out.is_empty() {
out.push('\n');
}
out.push_str(&msg);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for msg in rows.flatten() { | |
| if !out.is_empty() { | |
| out.push('\n'); | |
| } | |
| out.push_str(&msg); | |
| } | |
| if out.is_empty() { None } else { Some(out) } | |
| for msg in rows { | |
| let msg = msg.ok()?; | |
| if !out.is_empty() { | |
| out.push('\n'); | |
| } | |
| out.push_str(&msg); | |
| } | |
| if out.is_empty() { None } else { Some(out) } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/common/mod.rs` around lines 187 - 193, The current use of
rows.flatten() in the loop silently discards any row-level rusqlite errors and
can return partial/incorrect logs; change the code to explicitly handle
Result<Row, rusqlite::Error> from rows: iterate over rows (replace
rows.flatten()) and on Err(e) propagate the error (e.g., change the function
signature to return Result<Option<String>, rusqlite::Error> and return Err(e))
or collect rows via rows.collect::<Result<Vec<_>, _>>() before building out,
ensuring you either return the error or fail the test instead of dropping it;
update references to out, rows.flatten(), and the Option return accordingly.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/supervisor/watchers.rs (2)
331-334:⚠️ Potential issue | 🟠 MajorUse safe
std::time::Duration→chrono::Durationconversion (remove unchecked casts).
src/supervisor/watchers.rsconvertshumantime::parse_durationoutput viastd_dur.as_secs() as i64/subsec_nanos() as i64(lines 331-334 and 373-377). For oversizedtime_retentionvalues, those casts can overflow into a negativechrono::Duration, androtate_by_agecomputesLocal::now() - max_age, making a future cutoff and pruning almost everything. Switch tochrono::Duration::from_std(std_dur)and reject/ignore values that returnErr(OutOfRangeError)(with a warning) instead of doing unchecked casts.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/supervisor/watchers.rs` around lines 331 - 334, The current conversion of std::time::Duration to chrono::Duration (used for age/global_age and the other occurrence) does unchecked casts via as_secs() as i64 and as_nanos() which can overflow; replace those casts with chrono::Duration::from_std(std_dur) and handle the Result: on Ok(...) use that chrono::Duration, on Err(OutOfRangeError) log a warning (via the existing logger) and treat the value as invalid/ignore it (e.g., set age/global_age to None or skip applying it) so rotate_by_age/Local::now() won't compute a future cutoff; update both conversion sites that currently construct chrono::Duration from std_dur to follow this pattern and ensure any callers of age/global_age handle the None/ignored case.
308-409:⚠️ Potential issue | 🟠 MajorOffload log-retention pruning off the Tokio interval task
apply_log_retention()isasyncbut performs blocking work inline:PitchforkToml::all_merged()?does synchronous config reads, andLOG_STORE.apply_retention*locks astd::sync::Mutex<Connection>and runs synchronousrusqliteDELETEqueries (rotate_by_age/rotate_by_count) directly. Large pruning can therefore stall the supervisor interval task—wrap the whole retention pass (including config/policy building and SQLite calls) intokio::task::spawn_blocking(or add an async store API).Also, the
std::time::Duration→chrono::Durationconversion uses lossyas i64casts (as_secs() as i64), which can wrap for extremely large configured durations; add bounds/validation before casting.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/supervisor/watchers.rs` around lines 308 - 409, apply_log_retention performs blocking work on the Tokio task by calling PitchforkToml::all_merged() and LOG_STORE.apply_retention / apply_retention_for_daemon (which lock a std::sync::Mutex and run rusqlite DELETEs); move the entire retention pass (config collection, per-daemon policy construction, and all LOG_STORE SQLite calls) into tokio::task::spawn_blocking so it runs off the async runtime (or provide an async store API) and await its result in apply_log_retention; additionally validate/limit durations before converting std::time::Duration to chrono::Duration in both the global_policy and per-daemon policy construction (avoid direct as_secs() as i64 casting that can wrap—check bounds and return an error/warn for too-large durations).Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/supervisor/watchers.rs`:
- Around line 331-334: The current conversion of std::time::Duration to
chrono::Duration (used for age/global_age and the other occurrence) does
unchecked casts via as_secs() as i64 and as_nanos() which can overflow; replace
those casts with chrono::Duration::from_std(std_dur) and handle the Result: on
Ok(...) use that chrono::Duration, on Err(OutOfRangeError) log a warning (via
the existing logger) and treat the value as invalid/ignore it (e.g., set
age/global_age to None or skip applying it) so rotate_by_age/Local::now() won't
compute a future cutoff; update both conversion sites that currently construct
chrono::Duration from std_dur to follow this pattern and ensure any callers of
age/global_age handle the None/ignored case.
- Around line 308-409: apply_log_retention performs blocking work on the Tokio
task by calling PitchforkToml::all_merged() and LOG_STORE.apply_retention /
apply_retention_for_daemon (which lock a std::sync::Mutex and run rusqlite
DELETEs); move the entire retention pass (config collection, per-daemon policy
construction, and all LOG_STORE SQLite calls) into tokio::task::spawn_blocking
so it runs off the async runtime (or provide an async store API) and await its
result in apply_log_retention; additionally validate/limit durations before
converting std::time::Duration to chrono::Duration in both the global_policy and
per-daemon policy construction (avoid direct as_secs() as i64 casting that can
wrap—check bounds and return an error/warn for too-large durations).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 1a313d9e-0e01-4ffb-84ce-e5c92809804a
📒 Files selected for processing (5)
src/cli/logs.rssrc/log_store/sqlite.rssrc/supervisor/watchers.rssrc/web/routes/logs.rstests/common/mod.rs
🚧 Files skipped from review as they are similar to previous changes (4)
- tests/common/mod.rs
- src/log_store/sqlite.rs
- src/web/routes/logs.rs
- src/cli/logs.rs
44eb89a to
fe2b461
Compare
There was a problem hiding this comment.
♻️ Duplicate comments (4)
src/web/routes/logs.rs (4)
273-304:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
lines_partialperforms blocking SQLite query on async executor.Same issue as
fetch_latest_logs:LOG_STORE.query()is called directly withoutspawn_blocking. Wrap the query intokio::task::spawn_blockingto avoid stalling the Tokio runtime.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/web/routes/logs.rs` around lines 273 - 304, lines_partial currently calls LOG_STORE.query synchronously, which blocks the async runtime; wrap the query call in tokio::task::spawn_blocking and await its JoinHandle. Move or clone needed locals (daemon_id.qualified(), log_lines) into the blocking closure, perform LOG_STORE.query(&LogQuery { ... }) inside it, and map both Ok(entries) and Err(_) back to the same Html(String::new()) error path; then continue formatting the entries as before using the returned entries. Ensure you handle JoinError and query error uniformly and keep the function async.Source: Coding guidelines
326-365:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftBlocking SQLite calls inside SSE stream loop.
LOG_STORE.query()andLOG_STORE.last_clear_generation()are called directly inside the async stream. These block the Tokio runtime on every poll iteration. Wrap these calls intokio::task::spawn_blocking:let gen = tokio::task::spawn_blocking({ let d = daemon_id.clone(); move || LOG_STORE.last_clear_generation(&d) }) .await .ok() .and_then(|r| r.ok()) .flatten() .unwrap_or(0);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/web/routes/logs.rs` around lines 326 - 365, The SSE stream is calling blocking SQLite methods directly (LOG_STORE.query and LOG_STORE.last_clear_generation) which blocks the Tokio runtime; wrap each synchronous call in tokio::task::spawn_blocking and await the JoinHandle before using results (preserve the same return/error handling) for computing last_clear_gen, current_gen, and entries so last_id and BATCH_SIZE logic remains unchanged; ensure you clone or move daemon_id into the spawn_blocking closures and convert the nested Result/Option exactly as before (so last_clear_gen and entries are set using the spawned task's Ok values or fallback to 0/continue on error) and then yield the Event::default().event("clear").data("") or stream entries as before.Source: Coding guidelines
380-393:⚠️ Potential issue | 🟠 Major | ⚡ Quick winBlocking
LOG_STORE.clear()and silent UI failure.Two issues remain:
LOG_STORE.clear()is a blocking SQLite call not wrapped inspawn_blocking.The error handling returns an HTML error fragment, but the caller uses
hx-swap="none"(line 170), so the response body is discarded. The UI optimistically blanks the logs viaclearTabLogs, but won't know if the clear failed. Consider returning an error status code so HTMX can trigger an error event.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/web/routes/logs.rs` around lines 380 - 393, The clear handler currently calls the blocking LOG_STORE.clear(&[daemon_id]) directly and returns an HTML fragment that gets discarded by the caller's hx-swap="none"; change clear to run the blocking DB call inside tokio::task::spawn_blocking (await its JoinHandle and map the result) so the async runtime isn't blocked, and on error return an HTTP error status (e.g., StatusCode::INTERNAL_SERVER_ERROR) rather than only an HTML fragment so HTMX sees a non-2xx response and can trigger its error handler; refer to the clear function, LOG_STORE.clear, and the caller behavior (clearTabLogs / hx-swap="none") when making these changes.Source: Coding guidelines
64-90:⚠️ Potential issue | 🟠 Major | ⚡ Quick winBlocking SQLite call in sync helper used by async handlers.
fetch_latest_logscallsLOG_STORE.query()synchronously. Since this helper is invoked fromasync fn index()andasync fn show(), the blocking rusqlite work runs on a Tokio worker thread. Either make this an async function that usesspawn_blockinginternally, or wrap the calls at the call sites.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/web/routes/logs.rs` around lines 64 - 90, Convert fetch_latest_logs to async and move the synchronous DB call into a spawn_blocking closure: call tokio::task::spawn_blocking to run LOG_STORE.query(...) (using the same LogQuery built from settings().web.log_lines and daemon_id.qualified()), await the JoinHandle, map Err or query errors to an empty String, then perform the existing formatting (timestamp + message), reverse/join, strip ANSI via console::strip_ansi_codes and html_escape and return the String; update callers (async fn index and async fn show) to await fetch_latest_logs. Ensure error paths match the current behavior (return empty string).Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/web/routes/logs.rs`:
- Around line 273-304: lines_partial currently calls LOG_STORE.query
synchronously, which blocks the async runtime; wrap the query call in
tokio::task::spawn_blocking and await its JoinHandle. Move or clone needed
locals (daemon_id.qualified(), log_lines) into the blocking closure, perform
LOG_STORE.query(&LogQuery { ... }) inside it, and map both Ok(entries) and
Err(_) back to the same Html(String::new()) error path; then continue formatting
the entries as before using the returned entries. Ensure you handle JoinError
and query error uniformly and keep the function async.
- Around line 326-365: The SSE stream is calling blocking SQLite methods
directly (LOG_STORE.query and LOG_STORE.last_clear_generation) which blocks the
Tokio runtime; wrap each synchronous call in tokio::task::spawn_blocking and
await the JoinHandle before using results (preserve the same return/error
handling) for computing last_clear_gen, current_gen, and entries so last_id and
BATCH_SIZE logic remains unchanged; ensure you clone or move daemon_id into the
spawn_blocking closures and convert the nested Result/Option exactly as before
(so last_clear_gen and entries are set using the spawned task's Ok values or
fallback to 0/continue on error) and then yield the
Event::default().event("clear").data("") or stream entries as before.
- Around line 380-393: The clear handler currently calls the blocking
LOG_STORE.clear(&[daemon_id]) directly and returns an HTML fragment that gets
discarded by the caller's hx-swap="none"; change clear to run the blocking DB
call inside tokio::task::spawn_blocking (await its JoinHandle and map the
result) so the async runtime isn't blocked, and on error return an HTTP error
status (e.g., StatusCode::INTERNAL_SERVER_ERROR) rather than only an HTML
fragment so HTMX sees a non-2xx response and can trigger its error handler;
refer to the clear function, LOG_STORE.clear, and the caller behavior
(clearTabLogs / hx-swap="none") when making these changes.
- Around line 64-90: Convert fetch_latest_logs to async and move the synchronous
DB call into a spawn_blocking closure: call tokio::task::spawn_blocking to run
LOG_STORE.query(...) (using the same LogQuery built from
settings().web.log_lines and daemon_id.qualified()), await the JoinHandle, map
Err or query errors to an empty String, then perform the existing formatting
(timestamp + message), reverse/join, strip ANSI via console::strip_ansi_codes
and html_escape and return the String; update callers (async fn index and async
fn show) to await fetch_latest_logs. Ensure error paths match the current
behavior (return empty string).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: d0eee2f1-3e17-4aa5-8fc3-f4bad7b9bd68
📒 Files selected for processing (11)
docs/guides/logs.mddocs/reference/file-locations.mdsrc/cli/logs.rssrc/cli/mcp.rssrc/log_store/sqlite.rssrc/supervisor/watchers.rssrc/tui/app.rssrc/web/routes/logs.rstests/common/mod.rstests/test_e2e_migration.rstests/test_e2e_namespace.rs
✅ Files skipped from review due to trivial changes (1)
- docs/guides/logs.md
🚧 Files skipped from review as they are similar to previous changes (7)
- tests/test_e2e_namespace.rs
- src/tui/app.rs
- tests/common/mod.rs
- src/supervisor/watchers.rs
- tests/test_e2e_migration.rs
- src/cli/logs.rs
- src/log_store/sqlite.rs
…igration and SSE clear
- tui: reverse DESC-ordered entries so oldest logs appear at top - watchers: use chrono::Duration::from_std to avoid overflow, offload retention to spawn_blocking - api/logs: rewrite tail endpoint to use SQLite store with spawn_blocking and clear-gen tracking
fe2b461 to
48bc978
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/web/routes/api/logs.rs`:
- Around line 69-96: The code currently re-queries LOG_STORE to get last_id
after constructing `initial` which creates a race that can skip entries;
instead, capture the highest entry ID from the initial query results before you
reverse/map them (use the original `initial` iterator/collection or the query
result struct) and set `last_id` to that max ID (falling back to 0 if none)
rather than calling LOG_STORE.query again; update the `last_id` assignment logic
(the variable named `last_id` and the place where `initial: Vec<String>` is
constructed and reversed) so it derives the cursor from the initial query result
and removes the extra spawn_blocking/query call.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b3a7f1d1-554a-4713-9444-b84338c2365f
📒 Files selected for processing (20)
docs/guides/logs.mddocs/public/schema.jsondocs/reference/file-locations.mddocs/reference/settings.mdsettings.tomlsrc/cli/logs.rssrc/cli/mcp.rssrc/daemon.rssrc/log_store/mod.rssrc/log_store/sqlite.rssrc/pitchfork_toml.rssrc/supervisor/lifecycle.rssrc/supervisor/watchers.rssrc/tui/app.rssrc/web/assets/logs.jssrc/web/routes/api/logs.rssrc/web/routes/logs.rstests/common/mod.rstests/test_e2e_migration.rstests/test_e2e_namespace.rs
✅ Files skipped from review due to trivial changes (3)
- docs/reference/settings.md
- docs/reference/file-locations.md
- docs/guides/logs.md
🚧 Files skipped from review as they are similar to previous changes (15)
- src/web/assets/logs.js
- tests/test_e2e_migration.rs
- settings.toml
- tests/test_e2e_namespace.rs
- docs/public/schema.json
- src/supervisor/watchers.rs
- src/pitchfork_toml.rs
- tests/common/mod.rs
- src/daemon.rs
- src/log_store/mod.rs
- src/tui/app.rs
- src/cli/mcp.rs
- src/log_store/sqlite.rs
- src/supervisor/lifecycle.rs
- src/cli/logs.rs
…ition Remove redundant re-query for last_id after constructing initial history. Instead, capture the highest entry ID before reversing the DESC query result. This fixes a race where entries added between the two queries could be skipped. Suggested by CodeRabbit in PR jdx#470 review.
…ition, include timestamps in output Remove redundant re-query for last_id after constructing initial history. Instead, capture the highest entry ID before reversing the DESC query result. This fixes a race where entries added between the two queries could be skipped. Also prepend timestamp (%Y-%m-%d %H:%M:%S) to each log line in the HTTP tail stream, matching the format used by CLI, web UI SSE, and TUI. Fixes the missing-timestamp regression identified in PR jdx#470 review.
5a48c37 to
3829624
Compare
|
@coderabbitai review |
✅ Action performedReview finished.
|
- fetch_latest_logs: now async, wraps LOG_STORE.query in spawn_blocking - lines_partial: wraps LOG_STORE.query in spawn_blocking - stream_sse: wraps all LOG_STORE.query and last_clear_generation calls in spawn_blocking; also fixes race condition by using .first() instead of .last() for initial cursor capture - clear: wraps LOG_STORE.clear in spawn_blocking and handles errors explicitly instead of using if-let This brings consistency with the API logs handler which already wrapped every call correctly. Fixes Greptile feedback in PR jdx#470.
Summary by CodeRabbit
New Features
Behavior Changes
Documentation
Tests