Commit d1c481f

Analytics: Start real beta DAU flowing with a privacy-clean heartbeat + opt-out
Lands the desktop foundation for anonymous beta usage analytics (plan M3): true daily-active data starts flowing, and the consent + settings surface ships. No PostHog events yet.

- `install_id.rs` (new, neutral): mints two Rust-owned per-install ids that never meet by construction. `analytics_id()` → `anal_<uuid>` (heartbeat key, never on a report), `diagnostics_id()` → `diag_<uuid>` (reports only, never on analytics). Both persist to one `install-ids.json` via `config::durable_write_json`, resolve the data dir without an `AppHandle` (so they're callable from the panic hook / crash assembly / the analytics loop), and `init()` snapshots the diag id into a `OnceLock` for the signal-safe panic path.
- `analytics/` (new): hourly `/heartbeat` sender mirroring `space_poller`'s spawn pattern. `config_shape.rs` is the ONE place the PII-free rule lives: include every bool/number setting, plus a small `CATEGORICAL_STRING_KEYS` allowlist (theme, palettes, density, sort mode, AI provider, etc.), exclude every other string and all objects/arrays, add `fdaGranted`. Never redaction, always allowlist. A shared tri-state `analytics_consent_granted()` (`None`/`Some(true)` → send, `Some(false)` → fully silent) gates the loop; dev/CI suppressed unless `CMDR_ANALYTICS_FORCE=1`.
- `platform.rs` (new): promotes the duplicated `get_os_version()` out of the crash + error reporters into one shared `os_version()` helper, now also used by the heartbeat.
- Settings loader: adds `analytics_enabled` + `analytics_email` (manual dot-key extraction, struct + `Default`).
- Frontend: `analytics.enabled` (switch, default on) + `analytics.email` (text input) in the registry + `SettingsValues`; renames the Updates settings section to "Updates & privacy" and adds the opt-out switch + the email field with its "never sent with your usage data" note. The email persists locally; the beta-signup call lands separately.

Heartbeat payload (`HeartbeatPayload`, camelCase, `None` → `null`): `analId`, `appVersion`, `osVersion`, `arch`, `buildMode`, nested `config`. Matches the M2 Worker validator.

Test-first on the privacy invariants: install-id prefixes/stability/reload/regen, the config-shape allowlist (seeded PII-shaped strings produce a snapshot containing none of them), the consent tri-state, and the payload camelCase shape.

Docs: new `analytics/CLAUDE.md`, updated settings + architecture docs.
1 parent 801f1e4 commit d1c481f

24 files changed

Lines changed: 1135 additions & 201 deletions
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
# Analytics (beta usage stats)

Anonymous beta usage analytics. A background loop posts a `/heartbeat` (true daily-active signal +
a PII-free config snapshot) on launch and hourly. PostHog feature events, added later on top of this
foundation, will ride the same consent gate and the same install id.

The two install ids live in the neutral [`crate::install_id`] module (not here), so the crash and
error reporters can depend on it without pulling in `analytics`.

## Two ids that never meet

We mint two random per-install ids (both in `install_id.rs`, both written to one Rust-owned
`install-ids.json`):

- `anal_<uuid>` ([`install_id::analytics_id`]): the heartbeat key and the PostHog `distinct_id`.
  Never attached to a crash or error report.
- `diag_<uuid>` ([`install_id::diagnostics_id`]): attached only to crash and error reports so
  sequential reports from one install group together. Never sent through the analytics pipeline.

Why two: a tester can voluntarily attach their email to a report so we can reply. If reports carried
the analytics id, then email → analytics-id → the install's whole usage history would be joinable on
our servers, exactly the linkage we promise not to have. With a separate `diag_` id, an attached
email links only to the diagnostics stream; the analytics stream stays unjoinable to any identity.
The GDPR principle: if two datasets *can* be joined, treat them as joined, so we make them genuinely
unjoinable. The `anal_`/`diag_` prefixes are intentional and make the ids self-identifying in
payloads, PostHog, and the D1 tables.
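
One way to picture the "never meet" rule is with distinct types per stream. The sketch below is an illustration only: the `AnalyticsId`/`DiagnosticsId` newtypes and the `Heartbeat`/`Report` structs are hypothetical names, not the types in `install_id.rs`, and the UUID is passed in as a plain string rather than minted.

```rust
/// Hypothetical newtype for the analytics-stream id (`anal_<uuid>`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AnalyticsId(String);

/// Hypothetical newtype for the diagnostics-stream id (`diag_<uuid>`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DiagnosticsId(String);

impl AnalyticsId {
    /// Wraps a raw UUID string with the self-identifying `anal_` prefix.
    pub fn from_uuid(uuid: &str) -> Self {
        Self(format!("anal_{uuid}"))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl DiagnosticsId {
    /// Wraps a raw UUID string with the self-identifying `diag_` prefix.
    pub fn from_uuid(uuid: &str) -> Self {
        Self(format!("diag_{uuid}"))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Illustrative heartbeat: it can only carry an `AnalyticsId`.
pub struct Heartbeat {
    pub anal_id: AnalyticsId,
}

/// Illustrative report: it can only carry a `DiagnosticsId`, so an attached
/// email never sits next to the analytics id.
pub struct Report {
    pub diag_id: DiagnosticsId,
    pub email: Option<String>,
}
```

With types like these, passing an `AnalyticsId` where a `Report` expects a `DiagnosticsId` fails to compile, which mirrors the intent that an email can only ever link to the diagnostics stream.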

## Why the ids are Rust-owned, AppHandle-free files

The ids live in `install-ids.json`, not `settings.json`, and resolve their data dir WITHOUT an
`AppHandle` (mirroring `settings/loader.rs`'s `early_load_*` helpers: `CMDR_DATA_DIR` if set, else
the OS default for the bundle id).

- **Rust-owned, not a setting**: the frontend owns every `settings.json` write; Rust only reads it.
  Minting an id into `settings.json` from Rust would race the frontend's store ownership on first
  launch. A separate Rust-owned file sidesteps that.
- **AppHandle-free accessors**: `analytics_id()` / `diagnostics_id()` stay no-arg so they're
  callable from the panic hook, next-launch crash assembly, and the analytics loop alike, none of
  which always have an `AppHandle` at hand.

**Signal-safety**: the crash signal handler must stay async-signal-safe (no alloc, no locks), so it
must NOT call `diagnostics_id()`. `install_id::init()` (run at startup) snapshots the diag id into a
cheap `OnceLock<String>` that the panic-hook path reads via `diagnostics_id_snapshot()`; the signal
path attaches the diag id at next-launch report assembly (full stdlib), not inside the handler.
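
The snapshot pattern described above can be sketched with `std::sync::OnceLock`. This is a minimal illustration under assumptions: the static name `DIAG_ID_SNAPSHOT`, the `Option<&'static str>` return type, and the stubbed `diagnostics_id()` body (a fixed placeholder value instead of the real load-or-mint file I/O) are all hypothetical.

```rust
use std::sync::OnceLock;

/// Process-wide snapshot of the diag id, written once at startup.
static DIAG_ID_SNAPSHOT: OnceLock<String> = OnceLock::new();

/// Stand-in for the real accessor. The real one reads or mints the id on
/// disk; this stub returns a fixed placeholder so the sketch is runnable.
fn diagnostics_id() -> String {
    "diag_00000000-0000-4000-8000-000000000000".to_string()
}

/// Run once at startup, before the crash reporter installs its hooks.
pub fn init() {
    // `set` returns Err if a value is already stored; ignoring it keeps the
    // first snapshot and makes repeated `init()` calls harmless.
    let _ = DIAG_ID_SNAPSHOT.set(diagnostics_id());
}

/// Cheap read for the panic-hook path: no file I/O, just a look at the
/// already-initialized cell. Returns `None` if `init()` has not run yet.
pub fn diagnostics_id_snapshot() -> Option<&'static str> {
    DIAG_ID_SNAPSHOT.get().map(String::as_str)
}
```

Note that this covers the panic-hook path only. The signal handler still reads nothing here; per the text above, it leaves the diag id to next-launch report assembly.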

## Consent is tri-state, default-on, fully-silent opt-out

The opt-out is `analytics.enabled` in `settings.json`. The frontend store only persists non-default
values, so an opted-in install has NO key. The gate
([`analytics_consent_granted`](mod.rs)) treats:

- `None` (no key, the opted-in default) → granted.
- `Some(true)` → granted.
- `Some(false)` → opted out.

This single helper is the one consent gate for both the heartbeat loop and (later) `track_event`.
Opt-out is **fully silent**: an opted-out install sends nothing at all, not even an "I opted out"
bit (that would be collecting from someone who declined). So we can't measure the opt-out rate from
this channel; estimate it indirectly from the update-check denominator (everyone sends update
checks regardless of this toggle).
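
The tri-state mapping is small enough to write out. The sketch below assumes the gate receives the already-parsed `analytics.enabled` value as an `Option<bool>`; the real helper's signature in `mod.rs` may differ (for example, it may load the setting itself).

```rust
/// Sketch of the consent gate. `stored` is the parsed `analytics.enabled`
/// value: `None` when the key is absent (the opted-in default).
pub fn analytics_consent_granted(stored: Option<bool>) -> bool {
    match stored {
        // No key, or an explicit `true`: consent granted.
        None | Some(true) => true,
        // Only an explicit `false` opts out, and then nothing is sent.
        Some(false) => false,
    }
}
```

Treating the missing key as granted is what makes the default-on behavior work with a store that only persists non-default values.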

## Dev/CI suppression and the FORCE override

`suppressed()` returns true in debug builds and when `CI` is set, so dev runs and CI never pollute
production analytics. Set `CMDR_ANALYTICS_FORCE=1` to override and force beats even in a debug/CI
build, so an integration test can drive the loop against a localhost Worker.
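
A minimal sketch of that decision, assuming debug builds are detected with `cfg!(debug_assertions)` and that only the exact value `1` counts as the override. The pure helper `suppressed_for` is a hypothetical split introduced here so the logic can be tested without touching the process environment.

```rust
use std::env;

/// Pure decision: the FORCE override wins; otherwise debug builds and CI
/// runs are suppressed. (Hypothetical helper, not from the source.)
pub fn suppressed_for(debug_build: bool, ci_set: bool, force: bool) -> bool {
    if force {
        return false;
    }
    debug_build || ci_set
}

/// Environment-reading wrapper around the pure decision.
pub fn suppressed() -> bool {
    // Assumption: only the literal value "1" enables the override.
    let force = env::var("CMDR_ANALYTICS_FORCE")
        .map(|v| v == "1")
        .unwrap_or(false);
    // Assumption: any value of `CI`, even an empty one, counts as "set".
    let ci_set = env::var_os("CI").is_some();
    suppressed_for(cfg!(debug_assertions), ci_set, force)
}
```

Keeping the environment reads in a thin wrapper means the truth table can be unit-tested without mutating process-wide variables.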

## PII-free by allowlist, never by redaction

`config_shape.rs` builds the config snapshot by an explicit allowlist, never by redacting a free-form
blob. Settings hold SMB hostnames, paths, recent lists, AI key refs, and the beta email, all as
strings, so a denylist would eventually leak one. The rule (the ONE place it lives):

- Include every key whose JSON value is a boolean or number (auto-extends as new bool/number
  settings land, zero maintenance; bools/numbers are PII-free by nature).
- Plus the small `CATEGORICAL_STRING_KEYS` allowlist: categorical enum-strings (theme, app color,
  size/date palettes, date format, density, size display/unit, sort mode, AI provider, etc.) that
  are non-PII despite being strings.
- Exclude every other string, and all objects and arrays.
- Add `fdaGranted` (runtime state, not a setting) explicitly.

The `excludes_pii_shaped_strings` test is the privacy invariant: a seeded settings JSON with an
email, an SMB host, a recents list, and a path produces a snapshot containing none of them. When you
add a new categorical string setting worth shipping, add its id to `CATEGORICAL_STRING_KEYS`; never
loosen the bool/number rule to "include all strings."

Hard nevers across the whole pipeline: file names, contents, paths, search queries, AI prompts,
keystrokes, screenshots.

## Heartbeat payload

`HeartbeatPayload` (camelCase on the wire, `Option::None` → `null`) matches the M2 Worker's
validator:

- `analId` (required): `anal_` + lowercase hyphenated v4 UUID, `^anal_[0-9a-f-]{36}$`.
- `appVersion` (required): semver from `CARGO_PKG_VERSION`.
- `osVersion` (required): from the shared `crate::platform::os_version()`, always non-empty.
- `arch` (required): `std::env::consts::ARCH`.
- `buildMode` (optional): `"release"` / `"debug"`.
- `config` (optional): the config-shape object, stored verbatim.

Fire-and-forget POST mirroring the crash/error reporters (10 s timeout, errors logged at debug, the
next hourly tick retries). Endpoint: `http://localhost:8787/heartbeat` (debug) /
`https://api.getcmdr.com/heartbeat` (release).
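
For reference, the `analId` pattern above can be checked without a regex crate. The sketch below is a hand-written equivalent of `^anal_[0-9a-f-]{36}$`; the function name `is_valid_anal_id` is hypothetical, and the authoritative check is the Worker's validator, not this code.

```rust
/// Hand-rolled equivalent of `^anal_[0-9a-f-]{36}$`: the `anal_` prefix
/// followed by exactly 36 characters, each a lowercase hex digit or a hyphen.
pub fn is_valid_anal_id(id: &str) -> bool {
    match id.strip_prefix("anal_") {
        Some(rest) => {
            rest.len() == 36
                && rest
                    .bytes()
                    .all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f' | b'-'))
        }
        None => false,
    }
}
```

Like the pattern it mirrors, this is deliberately loose: it checks the character set and length, not the hyphen positions or the UUID version bits.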

## Files

- `mod.rs`: the heartbeat loop (launch beat + hourly), the consent gate, the payload struct, the
  fire-and-forget send. `init(app)` + `start()` mirror `space_poller`'s spawn pattern, wired from
  `lib.rs` setup.
- `config_shape.rs`: the pure, unit-tested config-shape builder and the `CATEGORICAL_STRING_KEYS`
  allowlist. The only place the PII-free rule lives.

## Wiring

`analytics::init(app.handle())` + `analytics::start()` run from `lib.rs` setup, alongside
`space_poller`. `install_id::init()` runs earlier (before the crash reporter) to snapshot the diag
id for the panic hook.
Lines changed: 175 additions & 0 deletions
@@ -0,0 +1,175 @@
//! Builds the PII-free config-shape snapshot shipped with each heartbeat (and, later, mirrored as
//! PostHog person properties).
//!
//! This module owns the ONE rule for what's in the snapshot, by allowlist, never by redaction (see
//! `analytics/CLAUDE.md` § "PII-free by allowlist"). Settings hold SMB hostnames, paths, recent
//! lists, AI key refs, and the beta email, all as strings, so a denylist would eventually leak one.
//!
//! The rule (David: "the whole config except string fields"):
//!
//! - Include every key whose JSON value is a boolean or a number. Bools and numbers are PII-free by
//!   nature, so this auto-extends as new bool/number settings land, zero maintenance.
//! - Plus the small [`CATEGORICAL_STRING_KEYS`] allowlist: categorical enum-strings (theme, view
//!   preferences, AI mode, sort mode, etc.) that are non-PII despite being strings.
//! - Exclude every other string, and all objects and arrays.
//! - Add `fdaGranted` explicitly (it's runtime state, not a setting).

use serde_json::{Map, Value};

/// The categorical enum-string settings worth keeping. These hold a fixed, non-PII vocabulary
/// (light/dark/system, off/cloud/local, etc.), so they're safe to ship even though they're strings.
///
/// Deliberately excludes free-text and identifier strings: `appearance.customDateTimeFormat` (user
/// free-text), `ai.cloudProviderConfigs` (a JSON blob with per-provider model/baseUrl),
/// `behavior.fileSystemWatching.globalGoToLatestShortcut.binding` (a key combo), and
/// `analytics.email` (PII). Those stay out by being absent from this list.
const CATEGORICAL_STRING_KEYS: &[&str] = &[
    "theme.mode",
    "appearance.appColor",
    "appearance.sizeColors",
    "appearance.dateColors",
    "appearance.dateTimeFormat",
    "appearance.uiDensity",
    "appearance.fileSizeFormat",
    "appearance.tintLocal",
    "appearance.tintSmb",
    "appearance.tintMtp",
    "listing.sizeDisplay",
    "listing.sizeUnit",
    "listing.directorySortMode",
    "listing.briefColumnWidthMode",
    "fileOperations.allowFileExtensionChanges",
    "behavior.fileSystemWatching.downloadsNotifications",
    "behavior.fileSystemWatching.lowDiskSpaceNotifications",
    "ai.provider",
    "ai.cloudProvider",
    "ai.localContextSize",
    "network.timeoutMode",
];

/// Builds the config-shape object from the raw `settings.json` value plus the runtime FDA-granted
/// flag. Pure: no I/O, so it's directly unit-testable against a seeded settings JSON.
///
/// `raw_settings` is the parsed `settings.json` (a flat object with dot-notation string keys). A
/// non-object value (missing/corrupt file) yields a snapshot with only `fdaGranted`.
pub fn build_config_shape(raw_settings: &Value, fda_granted: bool) -> Value {
    let mut shape = Map::new();

    if let Some(obj) = raw_settings.as_object() {
        for (key, value) in obj {
            if include_key(key, value) {
                shape.insert(key.clone(), value.clone());
            }
        }
    }

    // FDA-granted is runtime state, not a setting, so add it explicitly. Last so it can't be
    // shadowed by a (nonexistent) same-named setting.
    shape.insert("fdaGranted".to_string(), Value::Bool(fda_granted));

    Value::Object(shape)
}

/// The allowlist decision for one key/value pair: bools and numbers always pass; strings pass only
/// if categorical; everything else (objects, arrays, null) is excluded.
fn include_key(key: &str, value: &Value) -> bool {
    match value {
        Value::Bool(_) | Value::Number(_) => true,
        Value::String(_) => CATEGORICAL_STRING_KEYS.contains(&key),
        _ => false,
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use serde_json::json;

    #[test]
    fn includes_bools_and_numbers() {
        let settings = json!({
            "showHiddenFiles": true,
            "listing.briefColumnWidthMaxPx": 320,
            "appearance.textSize": 125.0,
        });
        let shape = build_config_shape(&settings, false);
        assert_eq!(shape["showHiddenFiles"], json!(true));
        assert_eq!(shape["listing.briefColumnWidthMaxPx"], json!(320));
        assert_eq!(shape["appearance.textSize"], json!(125.0));
    }

    #[test]
    fn includes_categorical_string_keys() {
        let settings = json!({
            "theme.mode": "dark",
            "ai.provider": "cloud",
            "listing.directorySortMode": "name",
        });
        let shape = build_config_shape(&settings, false);
        assert_eq!(shape["theme.mode"], json!("dark"));
        assert_eq!(shape["ai.provider"], json!("cloud"));
        assert_eq!(shape["listing.directorySortMode"], json!("name"));
    }

    #[test]
    fn excludes_pii_shaped_strings() {
        // This is the privacy invariant: PII-shaped string values must NOT appear in the snapshot.
        let settings = json!({
            "analytics.email": "person@example.com",
            "network.lastHost": "smb://192.168.1.42/share",
            "fileExplorer.recentPaths": "/Users/dave/secret",
            "appearance.customDateTimeFormat": "YYYY-MM-DD",
            "ai.cloudProviderConfigs": "{\"openai\":{\"baseUrl\":\"https://api.openai.com\"}}",
            "behavior.fileSystemWatching.globalGoToLatestShortcut.binding": "\u{2303}\u{2325}\u{2318}J",
        });
        let shape = build_config_shape(&settings, false);
        let obj = shape.as_object().expect("object");

        // None of the PII-shaped keys are present.
        assert!(!obj.contains_key("analytics.email"));
        assert!(!obj.contains_key("network.lastHost"));
        assert!(!obj.contains_key("fileExplorer.recentPaths"));
        assert!(!obj.contains_key("appearance.customDateTimeFormat"));
        assert!(!obj.contains_key("ai.cloudProviderConfigs"));
        assert!(!obj.contains_key("behavior.fileSystemWatching.globalGoToLatestShortcut.binding"));

        // And no value in the whole snapshot carries the PII substrings, by construction.
        let serialized = shape.to_string();
        assert!(!serialized.contains("person@example.com"), "email leaked: {serialized}");
        assert!(!serialized.contains("192.168.1.42"), "host leaked: {serialized}");
        assert!(!serialized.contains("/Users/dave"), "path leaked: {serialized}");
    }

    #[test]
    fn excludes_objects_and_arrays() {
        let settings = json!({
            "someObject": { "nested": true },
            "someArray": [1, 2, 3],
            "someNull": null,
        });
        let shape = build_config_shape(&settings, false);
        let obj = shape.as_object().expect("object");
        assert!(!obj.contains_key("someObject"));
        assert!(!obj.contains_key("someArray"));
        assert!(!obj.contains_key("someNull"));
    }

    #[test]
    fn adds_fda_granted_explicitly() {
        let shape = build_config_shape(&json!({}), true);
        assert_eq!(shape["fdaGranted"], json!(true));

        let shape_denied = build_config_shape(&json!({}), false);
        assert_eq!(shape_denied["fdaGranted"], json!(false));
    }

    #[test]
    fn non_object_settings_yields_only_fda() {
        // A missing/corrupt settings file parses to something non-object; the snapshot still has
        // a valid shape carrying just the runtime flag.
        let shape = build_config_shape(&json!("not an object"), true);
        let obj = shape.as_object().expect("object");
        assert_eq!(obj.len(), 1);
        assert_eq!(obj["fdaGranted"], json!(true));
    }
}
