Version: agent-browser 0.27.2 (macOS arm64, Chrome 149; also reproduced on 0.25.3 Linux/Docker)
Summary
Snapshot element refs (e1, e2, …) are positional within the latest snapshot's registry. When the DOM mutates and a new snapshot renumbers the registry, a ref remembered from the previous snapshot resolves against the new numbering — and silently clicks/fills whichever element now occupies that number, returning ✓ Done / exit 0.
Out-of-range refs already fail safely (✗ Unknown ref: e9 in cli/src/native/element.rs) — this ask is narrowly the remembered-ref-after-renumbering case: the ref still exists, but denotes a different element than when it was emitted. For LLM agents, that's the worst failure shape: not an error, not a no-op, but a wrong action reported as success.
Minimal repro (5 commands, no real site needed)
agent-browser open 'data:text/html,<button onclick="document.title=this.textContent">Alpha</button><button onclick="document.title=this.textContent">Beta</button><button onclick="document.title=this.textContent">Gamma</button>'
agent-browser snapshot -i
# - button "Alpha" [ref=e1]
# - button "Beta" [ref=e2]    <- agent remembers: e2 == Beta
# - button "Gamma" [ref=e3]
# Page inserts a node (what cmdk/Radix multi-selects do on every committed selection)
agent-browser eval "document.body.insertBefore(Object.assign(document.createElement('button'),{textContent:'NEW'}),document.body.firstChild); 'ok'"
agent-browser snapshot -i
# - button "NEW" [ref=e1]
# - button "Alpha" [ref=e2]    <- everything shifted +1
# - button "Beta" [ref=e3]
# - button "Gamma" [ref=e4]
agent-browser click @e2    # agent still believes e2 == Beta
# ✓ Done
agent-browser get title
# Alpha <- wrong element clicked, success reported
(Without the second snapshot, click @e2 correctly hits Beta — the old registry mapping is honored. The hazard is specifically remembered-ref + renumbering.)
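To make the mechanism concrete, here is a toy Rust model (Rust because the CLI source cited above, cli/src/native/element.rs, is Rust). It is not agent-browser's implementation: Node, Registry, backend_id and click_after_insert are hypothetical names. The only behavior carried over from the repro is that each snapshot rebuilds a positional eK → node table that replaces the previous one.

```rust
use std::collections::HashMap;

/// Toy DOM node: `backend_id` stands in for a stable node identity.
#[derive(Debug)]
struct Node {
    backend_id: u64,
    label: &'static str,
}

/// Positional registry rebuilt by every snapshot: "eK" is the K-th node in document order.
struct Registry {
    refs: HashMap<String, u64>, // ref -> backend id
}

impl Registry {
    fn snapshot(dom: &[Node]) -> Registry {
        let refs = dom
            .iter()
            .enumerate()
            .map(|(i, n)| (format!("e{}", i + 1), n.backend_id))
            .collect();
        Registry { refs }
    }

    /// Looks the ref up in this registry only; nothing records which node the caller meant.
    fn resolve<'a>(&self, dom: &'a [Node], r: &str) -> Result<&'a Node, String> {
        let id = self.refs.get(r).ok_or_else(|| format!("Unknown ref: {r}"))?;
        dom.iter()
            .find(|n| n.backend_id == *id)
            .ok_or_else(|| format!("node for {r} is gone"))
    }
}

/// Replays the repro: snapshot, insert "NEW" first, optionally re-snapshot, then click "e2".
fn click_after_insert(resnapshot: bool) -> Result<&'static str, String> {
    let mut dom = vec![
        Node { backend_id: 1, label: "Alpha" },
        Node { backend_id: 2, label: "Beta" },
        Node { backend_id: 3, label: "Gamma" },
    ];
    let mut registry = Registry::snapshot(&dom); // agent remembers e2 == Beta
    dom.insert(0, Node { backend_id: 4, label: "NEW" });
    if resnapshot {
        registry = Registry::snapshot(&dom); // the latest registry replaces the old one
    }
    registry.resolve(&dom, "e2").map(|n| n.label)
}

fn main() {
    // Without a second snapshot the old mapping is honored.
    println!("{:?}", click_after_insert(false)); // Ok("Beta")
    // After the re-snapshot, the remembered e2 silently means Alpha.
    println!("{:?}", click_after_insert(true)); // Ok("Alpha")
    // Out-of-range refs still fail loudly.
    let dom = [Node { backend_id: 1, label: "Alpha" }];
    println!("{:?}", Registry::snapshot(&dom).resolve(&dom, "e9").map(|n| n.label)); // Err("Unknown ref: e9")
}
```

Note that the re-snapshot case returns Ok, which is exactly the "✓ Done" shape: the lookup succeeds, it just succeeds against the wrong table.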
Real-world impact
Combobox/multi-select widgets (cmdk, Radix, HeadlessUI) insert a DOM node per committed selection, shifting every later ref +1 — an LLM agent selecting multiple options from one snapshot silently checks adjacent options on each subsequent click. In a large agent-automation audit of ours, this mechanism shipped wrong category selections to multiple live product-directory listings, each wrong click acknowledged with ✓ Done. #853 documents the same renumbering in the wild (Google Maps feed re-render: Hostel Siri e36→e28, sequential extraction grabbing wrong items), and #1351's measurement that ~30% of agent calls are defensive re-snapshots "because refs can become stale" quantifies the tax agents pay working around it.
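A rough sketch of the cumulative shift follows. It rests on two assumptions that no widget's documentation states: the widget inserts its selection chip ahead of the option list, and the agent re-snapshots between clicks (without a re-snapshot the old mapping holds, as noted above). The function replay and the option labels are hypothetical.

```rust
/// Returns the options actually clicked when refs planned from the first snapshot are
/// replayed one at a time, assuming the agent re-snapshots after every click.
fn replay(options: &[&str], planned: &[usize]) -> Vec<String> {
    // Document order as the latest snapshot sees it.
    let mut dom: Vec<String> = options.iter().map(|s| s.to_string()).collect();
    let mut clicked = Vec::new();
    for &ref_no in planned {
        // Positional lookup: "eK" is index K-1 of the latest snapshot.
        let hit = match ref_no.checked_sub(1).and_then(|i| dom.get(i)) {
            Some(label) => label.clone(),
            None => {
                clicked.push(format!("Unknown ref: e{ref_no}"));
                continue;
            }
        };
        // Hypothetical widget behavior: each committed selection inserts a chip ahead of
        // the option list, so every later ref shifts by one more position.
        dom.insert(0, format!("chip:{hit}"));
        clicked.push(hit);
    }
    clicked
}

fn main() {
    let options = ["Apps", "Books", "Games", "Music"];
    // Planned from the first snapshot: e2 == Books, e4 == Music.
    let clicked = replay(&options, &[2, 4]);
    println!("{clicked:?}"); // ["Books", "Games"]: the second click lands one option early
}
```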
Prior art
PR #1112 ("Make refs strict across snapshots") described this exact defect and fix shape, "Fail stale refs instead of silently rebinding them after DOM changes, and preserve ref ids for surviving nodes across repeated snapshots", but was self-closed seconds after opening and never reviewed; no successor exists.
PR #806 ("fix: re-query accessibility tree when backend_node_id is stale", merged in v0.20.6) handles the sibling failure mode: a ref whose cached backend_node_id died is re-queried by role+name. It doesn't apply here, though: after renumbering, the registry entry is perfectly valid, just bound to a different element.
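The difference between the two failure modes can be sketched as follows. This models the fallback only as this issue describes it (cached id first, role+name re-query when the id is dead); it is not #806's code, and Node, Entry and resolve are hypothetical names.

```rust
/// Toy node: `backend_id` changes when a framework re-renders (replaces) the element.
struct Node {
    backend_id: u64,
    role: &'static str,
    name: &'static str,
}

/// What a registry entry caches for a ref (hypothetical layout).
struct Entry {
    backend_id: u64,
    role: &'static str,
    name: &'static str,
}

/// Use the cached id if it is still attached; only if it died, re-query by role + name.
fn resolve<'a>(dom: &'a [Node], e: &Entry) -> Option<&'a Node> {
    dom.iter()
        .find(|n| n.backend_id == e.backend_id)
        .or_else(|| dom.iter().find(|n| n.role == e.role && n.name == e.name))
}

fn main() {
    // Sibling mode (handled): Beta was re-rendered with a new id, so the old id is dead.
    let rerendered = [
        Node { backend_id: 1, role: "button", name: "Alpha" },
        Node { backend_id: 20, role: "button", name: "Beta" },
    ];
    let old_e2 = Entry { backend_id: 2, role: "button", name: "Beta" };
    println!("{}", resolve(&rerendered, &old_e2).map_or("none", |n| n.name)); // Beta

    // Renumbering (this issue): the new snapshot's e2 entry points at Alpha, whose id is
    // alive, so the fallback never runs and nothing remembers that the agent meant Beta.
    let shifted = [
        Node { backend_id: 4, role: "button", name: "NEW" },
        Node { backend_id: 1, role: "button", name: "Alpha" },
        Node { backend_id: 2, role: "button", name: "Beta" },
    ];
    let new_e2 = Entry { backend_id: 1, role: "button", name: "Alpha" };
    println!("{}", resolve(&shifted, &new_e2).map_or("none", |n| n.name)); // Alpha
}
```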
… ref_map leakage across tab switches precisely because "it could click the wrong one": precedent that silent wrong-element actuation is treated as a bug.
… t1 convention explicitly mirrors, per PR #1250 ("feat(tabs): t<N> prefix for tab ids; --label for named tabs; drop --tab peek flag"), are the remaining positional handles.
Proposals (either resolves it)
Bind refs to stable node identity: mint refs against backendNodeId and preserve ids for surviving nodes across snapshots (PR #1112's shape). A remembered ref keeps meaning the node it was minted for; if that node is detached, fail the way Unknown ref does today: ✗ stale ref: node no longer in DOM — take a new snapshot. As a bonus, this directly reduces the defensive re-snapshot churn measured in #1351 ("High LLM turn count due to frequent snapshot calls when using agent-browser skills").
Snapshot epoch validation: emit a generation marker with each snapshot (in the ref, e.g. e42@s7, or as a header line) and have action commands reject refs minted under an older generation with a teaching error, mirroring the bare-integer tab id error: ✗ ref e2 is from snapshot 6; the registry renumbered (now 7) — re-snapshot and use fresh refs.
Happy to contribute a PR for either direction if maintainers indicate a preference.
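Both proposals can be sketched against the same toy DOM. This is a minimal illustration under stated assumptions, not a proposed patch: StableRegistry, EpochRegistry, the e<N>@s<G> parsing and the exact error strings are hypothetical (the epoch message is adapted from the one proposed above), and backend_id stands in for CDP's backendNodeId.

```rust
use std::collections::HashMap;

struct Node {
    backend_id: u64,
    name: &'static str,
}

/// Proposal 1 sketch: a ref is minted once per backend id and reused by later snapshots.
#[derive(Default)]
struct StableRegistry {
    by_backend: HashMap<u64, String>, // backend id -> ref
    by_ref: HashMap<String, u64>,     // ref -> backend id
    next: u32,
}

impl StableRegistry {
    fn snapshot(&mut self, dom: &[Node]) {
        for n in dom {
            // Surviving nodes keep their ref; only unseen nodes get a fresh number.
            if !self.by_backend.contains_key(&n.backend_id) {
                self.next += 1;
                let r = format!("e{}", self.next);
                self.by_backend.insert(n.backend_id, r.clone());
                self.by_ref.insert(r, n.backend_id);
            }
        }
    }

    fn resolve<'a>(&self, dom: &'a [Node], r: &str) -> Result<&'a Node, String> {
        let id = self.by_ref.get(r).ok_or_else(|| format!("Unknown ref: {r}"))?;
        dom.iter()
            .find(|n| n.backend_id == *id)
            .ok_or_else(|| "stale ref: node no longer in DOM; take a new snapshot".to_string())
    }
}

/// Proposal 2 sketch: refs stay positional but carry the generation, e.g. "e2@s1".
#[derive(Default)]
struct EpochRegistry {
    generation: u32,
    refs: Vec<u64>, // index K-1 holds the backend id for "eK"
}

impl EpochRegistry {
    fn snapshot(&mut self, dom: &[Node]) {
        self.generation += 1;
        self.refs = dom.iter().map(|n| n.backend_id).collect();
    }

    fn resolve<'a>(&self, dom: &'a [Node], r: &str) -> Result<&'a Node, String> {
        let (pos, generation) = r
            .strip_prefix('e')
            .and_then(|rest| rest.split_once("@s"))
            .and_then(|(p, g)| Some((p.parse::<usize>().ok()?, g.parse::<u32>().ok()?)))
            .ok_or_else(|| format!("malformed ref: {r}"))?;
        let now = self.generation;
        if generation != now {
            // Refuse refs minted under any other generation instead of reinterpreting them.
            return Err(format!("ref e{pos} is from snapshot {generation}; the registry renumbered (now {now}); re-snapshot and use fresh refs"));
        }
        let id = pos
            .checked_sub(1)
            .and_then(|i| self.refs.get(i))
            .ok_or_else(|| format!("Unknown ref: {r}"))?;
        dom.iter()
            .find(|n| n.backend_id == *id)
            .ok_or_else(|| format!("node for {r} is gone"))
    }
}

fn main() {
    let mut dom = vec![
        Node { backend_id: 1, name: "Alpha" },
        Node { backend_id: 2, name: "Beta" },
        Node { backend_id: 3, name: "Gamma" },
    ];
    let (mut stable, mut epoch) = (StableRegistry::default(), EpochRegistry::default());
    stable.snapshot(&dom);
    epoch.snapshot(&dom);
    dom.insert(0, Node { backend_id: 4, name: "NEW" });
    stable.snapshot(&dom);
    epoch.snapshot(&dom);

    println!("{:?}", stable.resolve(&dom, "e2").map(|n| n.name)); // Ok("Beta"): NEW became e4
    println!("{:?}", epoch.resolve(&dom, "e2@s1").map(|n| n.name)); // Err(...): snapshot 1 is stale
    println!("{:?}", epoch.resolve(&dom, "e2@s2").map(|n| n.name)); // Ok("Alpha"): minted under s2

    dom.retain(|n| n.name != "Beta");
    println!("{:?}", stable.resolve(&dom, "e2").map(|n| n.name)); // Err("stale ref: ...")
}
```

The trade-off shows in the last lines: proposal 1 keeps a remembered e2 pointing at Beta across the insertion and fails only once Beta is gone, while proposal 2 keeps refs positional and forces a re-snapshot whenever the generation moves.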