fix: scroll element into view before click, hover, tap, and drag#1073

Open
jin-2-kakaoent wants to merge 12 commits into
vercel-labs:mainfrom
hyunjinee:fix/scroll-before-click-1044

Conversation

@jin-2-kakaoent

@jin-2-kakaoent jin-2-kakaoent commented Mar 29, 2026

Contributor

Closes #1044

Summary

  • Automatically scroll elements into view before click, hover, tap, and drag using CDP DOM.scrollIntoViewIfNeeded with JS fallback
  • Return an error when an element cannot be scrolled into the viewport (eliminates silent failure)
  • Detect detached (stale) nodes lazily via isConnected check inside the existing getBoundingClientRect JS call — zero extra CDP round-trips on the happy path
  • Retry with a fresh accessibility tree lookup when a detached node is detected, preserving the existing stale-ref fallback behavior
  • Fix drag coordinate invalidation: mouseDown at source before scrolling to target, preventing viewport-relative coords from going stale
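
The drag-ordering fix in the last bullet can be illustrated with a toy model. This is a minimal sketch, not the PR's code: `Page`, `Element`, `scroll_into_view`, `viewport_point`, and `plan_drag` are hypothetical names, and scrolling is reduced to a single vertical offset.

```rust
/// Toy page model (hypothetical): a vertical scroll offset and a viewport height.
struct Page {
    scroll_y: f64,
    viewport_h: f64,
}

/// Element position in page-absolute coordinates (hypothetical type).
#[derive(Clone, Copy)]
struct Element {
    page_x: f64,
    page_y: f64,
}

impl Page {
    /// Scroll only if the element is outside the viewport.
    fn scroll_into_view(&mut self, el: Element) {
        if el.page_y < self.scroll_y {
            self.scroll_y = el.page_y;
        } else if el.page_y > self.scroll_y + self.viewport_h {
            // Center the element vertically.
            self.scroll_y = el.page_y - self.viewport_h / 2.0;
        }
    }

    /// Viewport-relative point; valid only until the next scroll.
    fn viewport_point(&self, el: Element) -> (f64, f64) {
        (el.page_x, el.page_y - self.scroll_y)
    }
}

/// Returns the (mouseDown, mouseUp) points in the order the PR describes.
fn plan_drag(page: &mut Page, source: Element, target: Element) -> ((f64, f64), (f64, f64)) {
    page.scroll_into_view(source);
    // mouseDown would be dispatched here, while this point is still valid.
    let down = page.viewport_point(source);
    // Scrolling to the target may move the source away from `down`.
    page.scroll_into_view(target);
    // The target point is measured after the final scroll.
    let up = page.viewport_point(target);
    (down, up)
}

fn main() {
    let mut page = Page { scroll_y: 0.0, viewport_h: 600.0 };
    let source = Element { page_x: 100.0, page_y: 200.0 };
    let target = Element { page_x: 100.0, page_y: 2000.0 };
    let (down, up) = plan_drag(&mut page, source, target);
    println!("mouseDown at {:?}, mouseUp at {:?}", down, up);
    // Re-measuring the source after the second scroll gives a different point.
    println!("source after scrolling to target: {:?}", page.viewport_point(source));
}
```

If both points were measured up front and both events dispatched afterwards, the source point would refer to a position that had already scrolled away, which is the invalidation the bullet refers to.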

Root cause

Two issues combined to produce the silent failure:

1. No auto-scroll before interactions

snapshot uses Accessibility.getFullAXTree, which returns all elements regardless of viewport position. When interacting with an off-screen element, Input.dispatchMouseEvent / Input.dispatchTouchEvent events were dispatched to coordinates outside the viewport. CDP does not return an error in this case, so the interaction silently had no effect.

2. Coordinate system mismatch in resolve_element_center

DOM.getBoxModel returns page-absolute coordinates while Input.dispatchMouseEvent expects viewport-relative coordinates. After scrolling, these diverge and interactions miss the target.
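
To make the mismatch described above concrete, here is a minimal sketch of how a page-absolute point and a viewport-relative point relate once the page has scrolled. `Point` and `page_to_viewport` are hypothetical names for illustration only and do not appear in the PR.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

/// Convert a page-absolute point to a viewport-relative one by
/// subtracting the current scroll offsets (hypothetical helper).
fn page_to_viewport(page: Point, scroll_x: f64, scroll_y: f64) -> Point {
    Point {
        x: page.x - scroll_x,
        y: page.y - scroll_y,
    }
}

fn main() {
    let button = Point { x: 120.0, y: 1800.0 };

    // With no scrolling, the two coordinate systems agree.
    let unscrolled = page_to_viewport(button, 0.0, 0.0);
    println!("unscrolled: {:?}", unscrolled);

    // After scrolling down 1500px they diverge: dispatching an event at the
    // page-absolute y (1800) would miss an element that now sits at y = 300.
    let scrolled = page_to_viewport(button, 0.0, 1500.0);
    println!("scrolled:   {:?}", scrolled);
}
```

Reading the position with getBoundingClientRect, as the PR does, avoids this conversion because that call already reports viewport-relative values.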

Changes

  • get_center_and_viewport: uses getBoundingClientRect for viewport-relative coordinates, returns CenterResult::Found or CenterResult::Detached
  • scroll_into_view_if_needed: CDP DOM.scrollIntoViewIfNeeded with JS scrollIntoView fallback (errors propagated)
  • assert_in_viewport: validates element is within viewport after scrolling
  • resolve_element_object_id_fresh: skips cached backend_node_id for retry after detached node detection
  • resolve_scroll_and_center: shared helper for click/hover/tap/drag — resolve, scroll, get center, retry on detach
  • handle_drag: reordered to mouseDown at source before scrolling to target, preventing source coordinate invalidation
  • Removed dead code: resolve_element_center, box_model_center
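
The flow of the helpers listed above might look roughly like the sketch below. The function and variant names come from the list, but every signature, the `Browser` trait, the `MockBrowser` type, and the error strings are assumptions made for illustration; the PR's actual code and ordering may differ.

```rust
/// Variant names are from the PR; the fields are assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CenterResult {
    Found { x: f64, y: f64, vw: f64, vh: f64 },
    Detached,
}

/// Hypothetical abstraction over the CDP calls (not the PR's real API).
trait Browser {
    /// `fresh = true` skips any cached node and looks the element up again.
    fn resolve(&mut self, fresh: bool) -> Result<String, String>;
    fn scroll_into_view_if_needed(&mut self, object_id: &str) -> Result<(), String>;
    fn get_center_and_viewport(&mut self, object_id: &str) -> Result<CenterResult, String>;
}

/// Error out instead of dispatching an event outside the viewport.
fn assert_in_viewport(x: f64, y: f64, vw: f64, vh: f64) -> Result<(), String> {
    if x >= 0.0 && y >= 0.0 && x <= vw && y <= vh {
        Ok(())
    } else {
        Err(format!("element center ({x}, {y}) is outside the {vw}x{vh} viewport"))
    }
}

/// Resolve, scroll, measure; retry once with a fresh lookup on detach.
fn resolve_scroll_and_center<B: Browser>(b: &mut B) -> Result<(f64, f64), String> {
    for fresh in [false, true] {
        let id = b.resolve(fresh)?;
        b.scroll_into_view_if_needed(&id)?;
        match b.get_center_and_viewport(&id)? {
            CenterResult::Found { x, y, vw, vh } => {
                assert_in_viewport(x, y, vw, vh)?;
                return Ok((x, y));
            }
            CenterResult::Detached => continue,
        }
    }
    Err("element is detached from the document".to_string())
}

/// Mock whose cached node is stale, to exercise the retry path.
struct MockBrowser {
    stale_cached: bool,
    resolves: u32,
}

impl Browser for MockBrowser {
    fn resolve(&mut self, fresh: bool) -> Result<String, String> {
        self.resolves += 1;
        Ok(if fresh { "fresh".to_string() } else { "cached".to_string() })
    }
    fn scroll_into_view_if_needed(&mut self, _object_id: &str) -> Result<(), String> {
        Ok(())
    }
    fn get_center_and_viewport(&mut self, object_id: &str) -> Result<CenterResult, String> {
        if object_id == "cached" && self.stale_cached {
            Ok(CenterResult::Detached)
        } else {
            Ok(CenterResult::Found { x: 50.0, y: 40.0, vw: 800.0, vh: 600.0 })
        }
    }
}

fn main() {
    let mut b = MockBrowser { stale_cached: true, resolves: 0 };
    let center = resolve_scroll_and_center(&mut b);
    println!("{:?} after {} lookups", center, b.resolves);
}
```

On the happy path the loop returns on its first iteration, so the retry costs nothing unless a detached node is actually reported.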

Test plan

  • E2E: e2e_offscreen_scroll_before_interactions — click, hover, click-by-ref, tap, and drag on off-screen elements in a single unified test
  • E2E: e2e_click_stale_ref_falls_back_to_role_name — stale ref fallback still works
  • Manual: off-screen button clicked, hovered, tapped, and dragged via CLI — all verified
  • cargo fmt + cargo clippy clean
  • All E2E tests pass

Clicking off-screen elements silently dispatched mouse events to
coordinates outside the viewport, causing the click to have no effect
without returning an error.

- Add `scroll_into_view_if_needed` using CDP `DOM.scrollIntoViewIfNeeded`
  with JS `scrollIntoView` fallback for unsupported environments
- Add `assert_in_viewport` to return an error when an element cannot be
  scrolled into view
- Unify `resolve_element_center` to always use `getBoundingClientRect`
  (viewport-relative) instead of `DOM.getBoxModel` (page-absolute)
- Extract `get_center_and_viewport` to fetch element center and viewport
  dimensions in a single CDP call

Closes vercel-labs#1044
@vercel

vercel Bot commented Mar 29, 2026

Contributor

@hyunjinee is attempting to deploy a commit to the Vercel Labs Team on Vercel.

A member of the Team first needs to authorize it.

…abs#1044)

Verify that click, hover, and click-by-ref on off-screen elements
auto-scroll into view before dispatching the interaction.

DOM.resolveNode succeeds for detached nodes, which broke the stale-ref
fallback path. Instead of adding an extra CDP round-trip to check
isConnected, fold the check into the existing getBoundingClientRect JS
call (zero overhead on the happy path) and retry with a fresh
accessibility tree lookup when a detached node is detected.
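
One way to fold the isConnected check into the measurement call is sketched below. The PR's exact script is not shown in this conversation, so `CENTER_JS`, the `null`-for-detached convention, and `classify` are assumptions; the script is written as something that could be passed to CDP `Runtime.callFunctionOn`, and the decoding of the returned JSON is left out.

```rust
/// Assumed script: one call both checks connectivity and measures.
/// Returns null for a detached node, else [centerX, centerY, vw, vh].
const CENTER_JS: &str = r#"function() {
  if (!this.isConnected) return null;
  const r = this.getBoundingClientRect();
  return [
    r.left + r.width / 2,
    r.top + r.height / 2,
    window.innerWidth,
    window.innerHeight
  ];
}"#;

/// Variant names are from the PR; the fields are assumed.
#[derive(Debug, PartialEq)]
enum CenterResult {
    Found { x: f64, y: f64, vw: f64, vh: f64 },
    Detached,
}

/// Map the already-decoded script result onto CenterResult (hypothetical helper).
fn classify(raw: Option<[f64; 4]>) -> CenterResult {
    match raw {
        None => CenterResult::Detached,
        Some([x, y, vw, vh]) => CenterResult::Found { x, y, vw, vh },
    }
}

fn main() {
    println!("script sent to the page:\n{}", CENTER_JS);
    println!("{:?}", classify(None));
    println!("{:?}", classify(Some([400.0, 300.0, 800.0, 600.0])));
}
```

Because the connectivity check rides along with a call that was already being made, a connected element costs no additional round-trip.
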
@ctate

ctate commented Mar 29, 2026

Collaborator

@jin-2-kakaoent LGTM! One minor follow-up to consider: tap_touch and drag still use the old resolve_element_center path without auto-scroll, so they'd benefit from the same treatment.

…ck-1044

# Conflicts:
#	cli/src/native/e2e_tests.rs
jin-2-kakaoent pushed a commit to hyunjinee/agent-browser that referenced this pull request Mar 30, 2026
…alidation

- tap_touch and handle_drag now use resolve_scroll_and_center for
  auto-scroll + detached-node retry (addresses review feedback on vercel-labs#1073)
- Fix drag ordering: mouseDown at source before scrolling to target,
  preventing source coordinates from being invalidated
- Remove unused resolve_element_center (dead code after migration)
- Unify e2e offscreen tests into a single e2e_offscreen_scroll_before_interactions
@jin-2-kakaoent jin-2-kakaoent force-pushed the fix/scroll-before-click-1044 branch from 2ee77ed to f2c4d7d Compare March 30, 2026 01:09
- Remove unused box_model_center and its test
- Propagate JS scrollIntoView fallback error instead of silently ignoring
@jin-2-kakaoent jin-2-kakaoent changed the title fix: scroll element into view before click and hover fix: scroll element into view before click, hover, tap, and drag Mar 30, 2026
@jin-2-kakaoent

Contributor Author

@ctate Thanks for the feedback! I've applied resolve_scroll_and_center (auto-scroll + detached node retry) to both tap_touch and handle_drag.

All e2e + manual tests pass.

Resolve conflict in actions.rs: keep resolve_scroll_and_center
call from branch with improved comment from main.
…ck-1044

# Conflicts:
#	cli/src/native/e2e_tests.rs
…lick-1044

# Conflicts:
#	cli/src/native/e2e_tests.rs
@Clarkkkk


Really need this! Cost me an hour to debug the issue.

@mutewinter


Agreed on this being essential. I just ran into this issue and made multiple changes trying to work around it, without realizing it was a fundamental difference between agent-browser and Playwright behavior.

@jin-2-kakaoent

Contributor Author

@ctate Could you take a look when you have a chance?

@ctate

ctate commented May 5, 2026

Collaborator

@jin-2-kakaoent Thanks for your patience. Implementation looks great. Can you please sign your commits before merging?

@jin-2-kakaoent jin-2-kakaoent force-pushed the fix/scroll-before-click-1044 branch from bcd4b70 to 82b72be Compare May 19, 2026 06:12
jin-2-kakaoent pushed a commit to hyunjinee/agent-browser that referenced this pull request May 19, 2026
…alidation

- tap_touch and handle_drag now use resolve_scroll_and_center for
  auto-scroll + detached-node retry (addresses review feedback on vercel-labs#1073)
- Fix drag ordering: mouseDown at source before scrolling to target,
  preventing source coordinates from being invalidated
- Remove unused resolve_element_center (dead code after migration)
- Unify e2e offscreen tests into a single e2e_offscreen_scroll_before_interactions
@jin-2-kakaoent jin-2-kakaoent force-pushed the fix/scroll-before-click-1044 branch from 82b72be to 1fd4226 Compare May 19, 2026 06:18
@hyunjinee

Contributor

@ctate Sorry for the delay!

All commits are now signed and you can confirm the Verified badges on the Commits tab:
https://github.com/vercel-labs/agent-browser/pull/1073/commits
I also synced the branch with the latest upstream/main, so the PR is mergeable and ready whenever you are.
Thanks again for the review.

@mutewinter


@ctate any chance we can get this merged now that the commits are signed? Our agents keep hitting this issue (we've got special guidance to help them avoid it as much as possible, but it'd be so much better encoded into the library).

Development

Successfully merging this pull request may close these issues.

Clicking an invisible element with agent-browser seems to have no effect, yet the command doesn't throw an error?

5 participants