Stale snapshot refs silently actuate the wrong element after DOM-mutation renumbering #1443

@iddogino

Version: agent-browser 0.27.2 (macOS arm64, Chrome 149; also reproduced on 0.25.3 Linux/Docker)

Summary

Snapshot element refs (e1, e2, …) are positional within the latest snapshot's registry. When the DOM mutates and a new snapshot renumbers the registry, a ref remembered from the previous snapshot resolves against the new numbering — and silently clicks/fills whichever element now occupies that number, returning ✓ Done / exit 0.

Out-of-range refs already fail safely (✗ Unknown ref: e9 in cli/src/native/element.rs) — this ask is narrowly the remembered-ref-after-renumbering case: the ref still exists, but denotes a different element than when it was emitted. For LLM agents, that's the worst failure shape: not an error, not a no-op, but a wrong action reported as success.
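To make the mechanism concrete, here is a toy model of a purely positional registry. It is a sketch for illustration only, not agent-browser's actual code: `PositionalRegistry`, `snapshot`, `resolve`, and `replay` are hypothetical names, and elements are reduced to their labels.

```rust
// Toy model (assumption, not the real implementation): refs are just
// 1-based positions in whatever the latest snapshot recorded.
#[derive(Default)]
struct PositionalRegistry {
    // Index 0 corresponds to ref e1, index 1 to e2, and so on.
    labels: Vec<String>,
}

impl PositionalRegistry {
    /// Rebuild the registry from the current DOM order (a "snapshot").
    fn snapshot(&mut self, dom: &[&str]) {
        self.labels = dom.iter().map(|s| s.to_string()).collect();
    }

    /// Resolve a ref such as "e2" against the latest numbering only.
    fn resolve(&self, r: &str) -> Result<&str, String> {
        let n = r
            .strip_prefix('e')
            .and_then(|digits| digits.parse::<usize>().ok())
            .filter(|n| *n >= 1)
            .ok_or_else(|| format!("Unknown ref: {}", r))?;
        self.labels
            .get(n - 1)
            .map(String::as_str)
            // Out-of-range refs fail loudly; in-range stale refs do not.
            .ok_or_else(|| format!("Unknown ref: {}", r))
    }
}

/// Replays the sequence from this issue and returns what "e2" resolved to
/// before and after the second snapshot.
fn replay() -> (String, String) {
    let mut reg = PositionalRegistry::default();
    reg.snapshot(&["Alpha", "Beta", "Gamma"]);
    let before = reg.resolve("e2").unwrap().to_string();

    // The page inserts a node at the front, then a new snapshot renumbers.
    reg.snapshot(&["NEW", "Alpha", "Beta", "Gamma"]);
    let after = reg.resolve("e2").unwrap().to_string();

    (before, after)
}
```

In this model `replay()` yields `("Beta", "Alpha")`: the ref stays in range, so nothing errors, yet it now denotes a different element. Nothing in the registry records which snapshot a ref was minted under.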

Minimal repro (5 commands, no real site needed)

agent-browser open 'data:text/html,<button onclick="document.title=this.textContent">Alpha</button><button onclick="document.title=this.textContent">Beta</button><button onclick="document.title=this.textContent">Gamma</button>'

agent-browser snapshot -i
# - button "Alpha" [ref=e1]
# - button "Beta"  [ref=e2]   <- agent remembers: e2 == Beta
# - button "Gamma" [ref=e3]

# Page inserts a node (what cmdk/Radix multi-selects do on every committed selection)
agent-browser eval "document.body.insertBefore(Object.assign(document.createElement('button'),{textContent:'NEW'}),document.body.firstChild); 'ok'"

agent-browser snapshot -i
# - button "NEW"   [ref=e1]
# - button "Alpha" [ref=e2]   <- everything shifted +1
# - button "Beta"  [ref=e3]
# - button "Gamma" [ref=e4]

agent-browser click @e2      # agent still believes e2 == Beta
# ✓ Done
agent-browser get title
# Alpha                       <- wrong element clicked, success reported

(Without the second snapshot, click @e2 correctly hits Beta — the old registry mapping is honored. The hazard is specifically remembered-ref + renumbering.)

Real-world impact

Combobox/multi-select widgets (cmdk, Radix, HeadlessUI) insert a DOM node per committed selection, shifting every later ref by +1. An LLM agent that selects multiple options using refs from one snapshot therefore silently checks adjacent options on each subsequent click. In a large agent-automation audit of ours, this mechanism shipped wrong category selections to multiple live product-directory listings, each wrong click acknowledged with ✓ Done. #853 documents the same renumbering in the wild (Google Maps feed re-render: Hostel Siri e36→e28, with sequential extraction grabbing the wrong items). #1351 measures that ~30% of agent calls are defensive re-snapshots "because refs can become stale", which quantifies the tax agents pay working around it.

Prior art

Proposals (either resolves it)

  1. Bind refs to stable node identity: mint refs against backendNodeId and preserve ids for surviving nodes across snapshots (the shape of PR #1112, "Make refs strict across snapshots"). A remembered ref keeps meaning the node it was minted for; if the node is detached, fail like Unknown ref does today: ✗ stale ref: node no longer in DOM — take a new snapshot. As a bonus, this directly reduces the defensive re-snapshot churn measured in #1351 ("High LLM turn count due to frequent snapshot calls when using agent-browser skills").

  2. Snapshot epoch validation — emit a generation marker with each snapshot (in the ref, e.g. e42@s7, or a header line) and have action commands reject refs minted under an older generation with a teaching error, mirroring the bare-integer tab id error: ✗ ref e2 is from snapshot 6; the registry renumbered (now 7) — re-snapshot and use fresh refs.
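The two proposals can be compared with a small model. This is a sketch under stated assumptions, not a patch against cli/src/native/element.rs: `NodeId` stands in for a CDP backendNodeId, `StableRegistry` and `EpochRegistry` are hypothetical names, every snapshot is assumed to bump the generation, and the error strings are shortened versions of the ones suggested above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a CDP backendNodeId.
type NodeId = u64;

/// Proposal 1 (sketch): a ref number is minted once per node and reused
/// for as long as that node keeps appearing in snapshots.
#[derive(Default)]
struct StableRegistry {
    next_ref: u32,
    by_node: HashMap<NodeId, u32>, // node -> ref number, kept across snapshots
    live: HashMap<u32, NodeId>,    // refs whose node is in the latest snapshot
}

impl StableRegistry {
    /// Returns the ref numbers for `dom`, in DOM order.
    fn snapshot(&mut self, dom: &[NodeId]) -> Vec<u32> {
        self.live.clear();
        let mut refs = Vec::with_capacity(dom.len());
        for &node in dom {
            let next = &mut self.next_ref;
            let r = *self.by_node.entry(node).or_insert_with(|| {
                *next += 1;
                *next
            });
            self.live.insert(r, node);
            refs.push(r);
        }
        refs
    }

    fn resolve(&self, r: u32) -> Result<NodeId, String> {
        if let Some(&node) = self.live.get(&r) {
            Ok(node)
        } else if r >= 1 && r <= self.next_ref {
            // Minted earlier, but its node is absent from the latest snapshot.
            Err(format!("stale ref: e{} no longer in DOM, take a new snapshot", r))
        } else {
            Err(format!("Unknown ref: e{}", r))
        }
    }
}

/// Proposal 2 (sketch): refs stay positional, but each carries the
/// generation of the snapshot that minted it (as in e2@s1).
#[derive(Default)]
struct EpochRegistry {
    generation: u64,
    nodes: Vec<NodeId>, // index 0 is e1
}

impl EpochRegistry {
    /// Renumbers the registry and returns the new generation.
    fn snapshot(&mut self, dom: &[NodeId]) -> u64 {
        self.generation += 1;
        self.nodes = dom.to_vec();
        self.generation
    }

    fn resolve(&self, index: usize, minted: u64) -> Result<NodeId, String> {
        if minted != self.generation {
            return Err(format!(
                "ref e{} is from snapshot {}; the registry renumbered (now {})",
                index, minted, self.generation
            ));
        }
        index
            .checked_sub(1)
            .and_then(|i| self.nodes.get(i))
            .copied()
            .ok_or_else(|| format!("Unknown ref: e{}", index))
    }
}
```

Replaying the repro against this model, with Alpha, Beta, Gamma, and NEW as four distinct node ids: under `StableRegistry` the second snapshot numbers NEW as e4 and Beta stays e2, so a remembered e2 still resolves to Beta and only errors once Beta leaves the DOM. Under `EpochRegistry` the remembered e2 from generation 1 is rejected after the second snapshot, and the agent must use e3 from generation 2. One visible trade-off of the first shape is that ref numbers no longer follow document order.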

Happy to contribute a PR for either direction if maintainers indicate a preference.
