
Commit 688e285

Detect click interception by overlays (#1434)
* Detect click interception by overlays
  - Hit-test the click point during element resolution and fail with a description of the covering element (e.g. `covered by <p inside div#consent-banner>`) instead of dispatching input that silently lands on a cookie banner, modal, or sticky header
  - Applies to the CSS-selector, snapshot-ref, and iframe resolve paths; the hit test descends through same-origin frames and starts from the top document so iframe content is checked in the right coordinate space
  - Shadow-DOM-aware relation checks plus label/control association prevent false positives on web components and custom checkboxes styled over hidden inputs
* Add covered click docs and regression test
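The relation rules in the third bullet can be modeled outside the browser. What follows is a toy sketch in Rust, not code from this commit. `Node`, `reaches`, `closest_label`, `lands_on`, and `demo_tree` are hypothetical names. The tree stores only the links that `BLOCKER_AT_JS` consults: `parentNode`, a shadow root's `host`, and a label's associated control. The ancestor walk crosses shadow boundaries through the host, because `elementFromPoint` reports a shadow host rather than the element inside it. The `closest('label')` and `contains()` stand-ins stay within one tree, as the DOM methods do.

```rust
/// Hypothetical toy node holding only the links the blocker check reads.
#[derive(Default)]
struct Node {
    parent: Option<usize>,      // parentNode (None for a document or a shadow root)
    shadow_host: Option<usize>, // `host`, set only when this node is a shadow root
    label_for: Option<usize>,   // stand-in for `label.control`
    is_label: bool,
}

/// Shadow-including parent: parentNode, else the shadow root's host.
fn up(nodes: &[Node], n: usize) -> Option<usize> {
    nodes[n].parent.or(nodes[n].shadow_host)
}

/// Walks from `start` using `step` and reports whether `target` is reached
/// (`start` itself counts).
fn reaches(start: usize, target: usize, step: impl Fn(usize) -> Option<usize>) -> bool {
    let mut cur = Some(start);
    while let Some(n) = cur {
        if n == target {
            return true;
        }
        cur = step(n);
    }
    false
}

/// Like `closest('label')`: nearest label at or above `start`, light DOM only.
fn closest_label(nodes: &[Node], start: usize) -> Option<usize> {
    let mut cur = Some(start);
    while let Some(n) = cur {
        if nodes[n].is_label {
            return Some(n);
        }
        cur = nodes[n].parent;
    }
    None
}

/// True when a click that the browser delivers to `hit` still reaches `el`.
fn lands_on(nodes: &[Node], hit: usize, el: usize) -> bool {
    let shadow_up = |n: usize| up(nodes, n);
    let light_up = |n: usize| nodes[n].parent;
    // `el` is `hit` or a shadow-including ancestor of it, or the reverse.
    if reaches(hit, el, &shadow_up) || reaches(el, hit, &shadow_up) {
        return true;
    }
    // The hit sits inside a label that controls or contains `el`.
    if let Some(label) = closest_label(nodes, hit) {
        if nodes[label].label_for == Some(el) || reaches(el, label, &light_up) {
            return true;
        }
    }
    // `el` sits inside a label that also contains the hit.
    if let Some(label) = closest_label(nodes, el) {
        if reaches(hit, label, &light_up) {
            return true;
        }
    }
    false
}

/// Hypothetical example page: custom checkbox, consent banner, web component.
fn demo_tree() -> Vec<Node> {
    let mut nodes: Vec<Node> = (0..9).map(|_| Node::default()).collect();
    nodes[1].parent = Some(0); // body
    nodes[2].parent = Some(1); // <label> wrapping a custom checkbox
    nodes[2].is_label = true;
    nodes[2].label_for = Some(3);
    nodes[3].parent = Some(2); // hidden <input type="checkbox">
    nodes[4].parent = Some(2); // styled <span> drawn over the input
    nodes[5].parent = Some(1); // consent banner overlay
    nodes[6].parent = Some(1); // custom element acting as a shadow host
    nodes[7].shadow_host = Some(6); // its shadow root
    nodes[8].parent = Some(7); // <button> inside the shadow root
    nodes
}
```

In `demo_tree`, a hit on the styled span still counts as a click on the hidden checkbox. A hit on the custom element's host counts as a click on the button inside its shadow root. A hit on the banner counts as neither, so only that case would be reported as covered.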
1 parent 4aa3368 commit 688e285

7 files changed

Lines changed: 256 additions & 4 deletions


README.md

Lines changed: 8 additions & 0 deletions
````diff
@@ -90,6 +90,10 @@ agent-browser screenshot page.png
 agent-browser close
 ```
 
+Clicks fail early when another element covers the target's click point,
+for example a consent banner or modal. Dismiss or interact with the reported
+covering element, then take a fresh snapshot before retrying the original ref.
+
 Headless Chromium screenshots hide native scrollbars for consistent image output.
 Pass `--hide-scrollbars false` when launching to keep native scrollbars visible.
 
@@ -883,6 +887,10 @@ agent-browser get text @e1 # Get heading text
 agent-browser hover @e4 # Hover the link
 ```
 
+When a ref click is blocked by an overlay, the error includes the covering
+element, such as `covered by <div#consent-banner>`. Click the banner or dialog
+control first, then run `snapshot` again before reusing refs.
+
 **Why use refs?**
 
 - **Deterministic**: Ref points to exact element from snapshot
````
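A script or agent that reacts to these errors needs the covering element out of the error text. The sketch below assumes the message keeps the `covered by <DESC>` shape produced by `intercepted_error` in this commit. `covering_element` is a hypothetical helper and not part of agent-browser.

```rust
/// Extracts DESC from an error containing `covered by <DESC>`.
/// Returns None when the error is not a covered-click error.
fn covering_element(error: &str) -> Option<&str> {
    const MARKER: &str = "covered by <";
    let start = error.find(MARKER)? + MARKER.len();
    let rest = &error[start..];
    // Prefer the full message's suffix; fall back to the first '>' for the
    // short `covered by <...>` form quoted in the docs.
    let end = rest
        .find("> at its click point")
        .or_else(|| rest.find('>'))?;
    Some(&rest[..end])
}
```

A caller could, for example, look up the returned description in a fresh snapshot to find the dismiss control before retrying the original ref.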

cli/src/native/e2e_tests.rs

Lines changed: 83 additions & 0 deletions
````diff
@@ -2702,6 +2702,89 @@ async fn e2e_error_handling() {
     assert_success(&resp);
 }
 
+#[tokio::test]
+#[ignore]
+async fn e2e_click_reports_covering_overlay() {
+    let mut state = DaemonState::new();
+
+    let resp = execute_command(
+        &json!({ "id": "1", "action": "launch", "headless": true }),
+        &mut state,
+    )
+    .await;
+    assert_success(&resp);
+
+    let html = r#"
+        <html>
+        <body>
+            <button id="target" onclick="document.getElementById('result').textContent = 'clicked'">
+                Target
+            </button>
+            <div id="consent-banner" style="position:fixed;inset:0;z-index:10;background:rgba(0,0,0,0.1)">
+                <button id="dismiss" style="position:absolute;right:20px;bottom:20px"
+                    onclick="document.getElementById('consent-banner').remove()">
+                    Dismiss
+                </button>
+            </div>
+            <div id="result">idle</div>
+        </body>
+        </html>
+    "#;
+    let url = format!("data:text/html;base64,{}", STANDARD.encode(html));
+    let resp = execute_command(
+        &json!({ "id": "2", "action": "navigate", "url": url }),
+        &mut state,
+    )
+    .await;
+    assert_success(&resp);
+
+    let resp = execute_command(
+        &json!({ "id": "3", "action": "click", "selector": "#target" }),
+        &mut state,
+    )
+    .await;
+    assert_eq!(resp["success"], false, "covered target should fail: {resp}");
+    let error = resp["error"].as_str().unwrap_or_default();
+    assert!(
+        error.contains("covered by <div#consent-banner>"),
+        "unexpected covered-click error: {}",
+        error
+    );
+
+    let resp = execute_command(
+        &json!({ "id": "4", "action": "gettext", "selector": "#result" }),
+        &mut state,
+    )
+    .await;
+    assert_success(&resp);
+    assert_eq!(get_data(&resp)["text"], "idle");
+
+    let resp = execute_command(
+        &json!({ "id": "5", "action": "click", "selector": "#dismiss" }),
+        &mut state,
+    )
+    .await;
+    assert_success(&resp);
+
+    let resp = execute_command(
+        &json!({ "id": "6", "action": "click", "selector": "#target" }),
+        &mut state,
+    )
+    .await;
+    assert_success(&resp);
+
+    let resp = execute_command(
+        &json!({ "id": "7", "action": "gettext", "selector": "#result" }),
+        &mut state,
+    )
+    .await;
+    assert_success(&resp);
+    assert_eq!(get_data(&resp)["text"], "clicked");
+
+    let resp = execute_command(&json!({ "id": "99", "action": "close" }), &mut state).await;
+    assert_success(&resp);
+}
+
 // ---------------------------------------------------------------------------
 // Profile cookie persistence across restarts
 // ---------------------------------------------------------------------------
````

cli/src/native/element.rs

Lines changed: 149 additions & 2 deletions
````diff
@@ -231,7 +231,9 @@ async fn resolve_center_in_same_process_frame(
         y += frameRect.y + win.frameElement.clientTop;
         win = win.parent;
     }}
-    return {{ x: x, y: y }};
+    const blockerAt = {BLOCKER_AT_JS};
+    const topDoc = win ? win.document : doc;
+    return {{ x: x, y: y, blocker: blockerAt(topDoc, el, x, y) }};
 }}"#,
     );
     let result = client
@@ -246,6 +248,12 @@ async fn resolve_center_in_same_process_frame(
         )
         .await?;
     let value = result.get("result").and_then(|r| r.get("value"));
+    if let Some(blocker) = value
+        .and_then(|v| v.get("blocker"))
+        .and_then(|v| v.as_str())
+    {
+        return Err(intercepted_error(selector, blocker));
+    }
     let x = value.and_then(|v| v.get("x")).and_then(|v| v.as_f64());
     let y = value.and_then(|v| v.get("y")).and_then(|v| v.as_f64());
     match (x, y) {
@@ -320,6 +328,15 @@ pub async fn resolve_element_center(
 
             if let Ok(r) = result {
                 let (x, y) = box_model_center(&r.model);
+                check_node_interception(
+                    client,
+                    effective_session_id,
+                    backend_node_id,
+                    selector_or_ref,
+                    x,
+                    y,
+                )
+                .await?;
                 return Ok((x, y, effective_session_id.to_string()));
             }
             // backend_node_id is stale; re-query the accessibility tree below
@@ -349,6 +366,15 @@ pub async fn resolve_element_center(
                 )
                 .await?;
             let (x, y) = box_model_center(&result.model);
+            check_node_interception(
+                client,
+                effective_session_id,
+                fresh_id,
+                selector_or_ref,
+                x,
+                y,
+            )
+            .await?;
             return Ok((x, y, effective_session_id.to_string()));
         }
 
@@ -369,6 +395,73 @@ pub async fn resolve_element_center(
     Ok((x, y, session_id.to_string()))
 }
 
+/// Hit-test a ref-resolved node at its computed click point and error if an
+/// unrelated element (overlay, banner, sticky header) would receive the input
+/// instead. Best effort: resolution failures skip the check rather than block
+/// the interaction.
+async fn check_node_interception(
+    client: &CdpClient,
+    session_id: &str,
+    backend_node_id: i64,
+    target: &str,
+    x: f64,
+    y: f64,
+) -> Result<(), String> {
+    let resolved: Result<DomResolveNodeResult, String> = client
+        .send_command_typed(
+            "DOM.resolveNode",
+            &DomResolveNodeParams {
+                backend_node_id: Some(backend_node_id),
+                node_id: None,
+                object_group: Some("agent-browser".to_string()),
+            },
+            Some(session_id),
+        )
+        .await;
+    let Ok(resolved) = resolved else {
+        return Ok(());
+    };
+    let Some(object_id) = resolved.object.object_id else {
+        return Ok(());
+    };
+    // Box-model coordinates are in the top-level viewport space, so the
+    // hit-test starts from the top document. For an OOPIF node the
+    // frameElement walk stops at the process boundary, where the frame's own
+    // document and session-local coordinates are already consistent.
+    let function = format!(
+        r#"function(x, y) {{
+    let topDoc = this.ownerDocument || document;
+    while (topDoc.defaultView && topDoc.defaultView.frameElement) {{
+        topDoc = topDoc.defaultView.frameElement.ownerDocument;
+    }}
+    const blockerAt = {BLOCKER_AT_JS};
+    return blockerAt(topDoc, this, x, y);
+}}"#,
+    );
+    let result = client
+        .send_command(
+            "Runtime.callFunctionOn",
+            Some(serde_json::json!({
+                "objectId": object_id,
+                "functionDeclaration": function,
+                "arguments": [{ "value": x }, { "value": y }],
+                "returnByValue": true,
+            })),
+            Some(session_id),
+        )
+        .await;
+    if let Ok(value) = result {
+        if let Some(blocker) = value
+            .get("result")
+            .and_then(|r| r.get("value"))
+            .and_then(|v| v.as_str())
+        {
+            return Err(intercepted_error(target, blocker));
+        }
+    }
+    Ok(())
+}
+
 /// Coordinates from DOM.getBoxModel are viewport-relative, and input events
 /// only land inside the viewport, so make sure the node is visible first.
 /// Best effort: a node that cannot be scrolled (display:none, detached) will
@@ -628,10 +721,51 @@ fn build_count_elements_js(selector: &str) -> String {
     }
 }
 
+/// JS function source for `blockerAt(doc, el, x, y)`: returns a short
+/// description of the element that would actually receive a click at (x, y)
+/// when that element is unrelated to `el`, or null when the click would land
+/// on `el` (or something that activates it). Relations that count as "lands
+/// on el": shadow-including ancestors/descendants in either direction, and
+/// label/control association (custom checkboxes hide the input under a styled
+/// sibling inside the same label).
+const BLOCKER_AT_JS: &str = r#"(doc, el, x, y) => {
+    // Descend from the given document through same-origin iframes so a point
+    // over a frame resolves to the element inside it, in that frame's space.
+    let d = doc, lx = x, ly = y;
+    let hit = d.elementFromPoint(lx, ly);
+    while (hit && (hit.tagName === 'IFRAME' || hit.tagName === 'FRAME') && hit.contentDocument && hit !== el) {
+        const r = hit.getBoundingClientRect();
+        lx -= r.x + hit.clientLeft;
+        ly -= r.y + hit.clientTop;
+        d = hit.contentDocument;
+        hit = d.elementFromPoint(lx, ly);
+    }
+    if (!hit || hit === el) return null;
+    const up = (n) => n.parentNode || n.host || (n.getRootNode && n.getRootNode().host) || null;
+    for (let n = hit; n; n = up(n)) { if (n === el) return null; }
+    for (let n = el; n; n = up(n)) { if (n === hit) return null; }
+    const hitLabel = hit.closest ? hit.closest('label') : null;
+    if (hitLabel && (hitLabel.control === el || hitLabel.contains(el))) return null;
+    const elLabel = el.closest ? el.closest('label') : null;
+    if (elLabel && elLabel.contains(hit)) return null;
+    let desc = hit.tagName.toLowerCase();
+    if (hit.id) desc += '#' + hit.id;
+    else if (typeof hit.className === 'string' && hit.className.trim())
+        desc += '.' + hit.className.trim().split(/\s+/).slice(0, 2).join('.');
+    if (!hit.id && hit.closest) {
+        const anchored = hit.closest('[id]');
+        if (anchored && anchored !== hit)
+            desc += ' inside ' + anchored.tagName.toLowerCase() + '#' + anchored.id;
+    }
+    return desc;
+}"#;
+
 fn build_selector_js(selector: &str) -> String {
     let find_expr = build_find_element_js(selector);
     // Input events dispatch at viewport coordinates, so an element outside the
     // viewport must be scrolled into view first or the click lands on nothing.
+    // The blocker check reports an overlay covering the click point instead of
+    // letting the input land on it and silently doing the wrong thing.
     format!(
         r#"(() => {{
     const el = {find_expr};
````
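Both frame walks in this diff use the same offset arithmetic, in opposite directions. `resolve_center_in_same_process_frame` adds each frame element's rect offset and border width while climbing to the top document (`frameRect.y + win.frameElement.clientTop` on the vertical axis). The `while` loop in `BLOCKER_AT_JS` subtracts them while descending. Below is a minimal sketch of the descending direction. `FrameBox` and `to_frame_local` are hypothetical names. Like the JS, the sketch accounts only for each frame's `getBoundingClientRect()` origin and its `clientLeft`/`clientTop` border widths.

```rust
/// Hypothetical stand-in for one iframe: its bounding-rect origin in the
/// parent document's viewport plus its border widths, in CSS pixels.
#[derive(Clone, Copy, Debug)]
struct FrameBox {
    x: f64,
    y: f64,
    client_left: f64,
    client_top: f64,
}

/// Converts a top-document viewport point into the innermost frame's
/// viewport coordinates. `frames` is ordered outermost first, matching the
/// order in which elementFromPoint keeps returning an iframe.
fn to_frame_local(x: f64, y: f64, frames: &[FrameBox]) -> (f64, f64) {
    frames.iter().fold((x, y), |(lx, ly), f| {
        (lx - (f.x + f.client_left), ly - (f.y + f.client_top))
    })
}
```

Applying the upward walk's additions to the result recovers the original point. This inverse relationship is why the blocker check can start from the top document and still agree with the click coordinates.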
````diff
@@ -645,7 +779,10 @@ fn build_selector_js(selector: &str) -> String {
         el.scrollIntoView({{ block: 'center', inline: 'center', behavior: 'instant' }});
         rect = el.getBoundingClientRect();
     }}
-    return {{ x: rect.x + rect.width / 2, y: rect.y + rect.height / 2 }};
+    const x = rect.x + rect.width / 2;
+    const y = rect.y + rect.height / 2;
+    const blockerAt = {BLOCKER_AT_JS};
+    return {{ x: x, y: y, blocker: blockerAt(document, el, x, y) }};
 }})()"#,
     )
 }
@@ -670,6 +807,9 @@ async fn resolve_by_selector(
         .await?;
 
     let val = result.result.value.unwrap_or(Value::Null);
+    if let Some(blocker) = val.get("blocker").and_then(|v| v.as_str()) {
+        return Err(intercepted_error(selector, blocker));
+    }
     let x = val.get("x").and_then(|v| v.as_f64());
     let y = val.get("y").and_then(|v| v.as_f64());
 
@@ -679,6 +819,13 @@ async fn resolve_by_selector(
     }
 }
 
+fn intercepted_error(target: &str, blocker: &str) -> String {
+    format!(
+        "Element '{}' is covered by <{}> at its click point, so the input would land on that element instead. Dismiss or interact with the covering element first (it is often a dialog, banner, or sticky header).",
+        target, blocker
+    )
+}
+
 fn box_model_center(model: &BoxModel) -> (f64, f64) {
     // content quad: [x1,y1, x2,y2, x3,y3, x4,y4]
     if model.content.len() >= 8 {
````
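The description inside `covered by <...>` follows a few fixed rules in `BLOCKER_AT_JS`:

- the tag name is lowercased;
- `#id` is appended when the hit element has an id;
- otherwise, up to two class names are appended, joined with `.`, plus ` inside tag#id` for the nearest ancestor that has an id.

The Rust sketch below restates those rules. It can help predict what a test such as `e2e_click_reports_covering_overlay` should match. `Hit` and `describe` are hypothetical names, and the sketch assumes the caller has already found the nearest id-bearing ancestor (what `closest('[id]')` returns).

```rust
/// Hypothetical snapshot of the DOM facts the description is built from.
struct Hit<'a> {
    tag_name: &'a str,                        // element.tagName, e.g. "DIV"
    id: &'a str,                              // element.id, "" when absent
    class_name: &'a str,                      // element.className as a string
    id_ancestor: Option<(&'a str, &'a str)>,  // nearest ancestor with an id: (tagName, id)
}

/// Mirrors the description rules at the end of BLOCKER_AT_JS.
fn describe(hit: &Hit<'_>) -> String {
    let mut desc = hit.tag_name.to_lowercase();
    if !hit.id.is_empty() {
        // An element with its own id needs no further context.
        desc.push('#');
        desc.push_str(hit.id);
        return desc;
    }
    // At most the first two classes, like `.split(/\s+/).slice(0, 2)`.
    let classes: Vec<&str> = hit.class_name.split_whitespace().take(2).collect();
    if !classes.is_empty() {
        desc.push('.');
        desc.push_str(&classes.join("."));
    }
    // Anchor an id-less element to its nearest id-bearing ancestor.
    if let Some((tag, id)) = hit.id_ancestor {
        desc.push_str(" inside ");
        desc.push_str(&tag.to_lowercase());
        desc.push('#');
        desc.push_str(id);
    }
    desc
}
```

Passing the result through a format string such as `format!("covered by <{}>", describe(&hit))` yields the fragment that the docs and the regression test look for.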

cli/src/output.rs

Lines changed: 3 additions & 0 deletions
````diff
@@ -1259,6 +1259,9 @@ Usage: agent-browser click <selector> [--new-tab]
 Clicks on the specified element. The selector can be a CSS selector,
 XPath, or an element reference from snapshot (e.g., @e1).
 
+If another element covers the click point, agent-browser reports the
+covering element instead of dispatching a click to the wrong target.
+
 Options:
   --new-tab    Open link in a new tab instead of navigating current tab
                (only works on elements with href attribute)
````

docs/src/app/commands/page.mdx

Lines changed: 5 additions & 0 deletions
````diff
@@ -38,6 +38,11 @@ agent-browser close # Close browser (aliases: quit, exit)
 agent-browser close --all # Close all active sessions
 ```
 
+Clicks fail before dispatch when another element covers the target's click
+point. The error names the covering element, for example
+`covered by <div#consent-banner>`. Dismiss or interact with that element,
+take a fresh snapshot, then retry the original action.
+
 Headless Chromium screenshots hide native scrollbars for consistent image output.
 Pass `--hide-scrollbars false` when launching to keep native scrollbars visible.
 
````

skill-data/core/SKILL.md

Lines changed: 3 additions & 2 deletions
````diff
@@ -367,8 +367,9 @@ agent-browser snapshot -i
 ```
 
 **Click does nothing / overlay swallows the click**
-Some modals and cookie banners block other clicks. Snapshot, find the
-dismiss/close button, click it, then re-snapshot.
+Some modals and cookie banners block other clicks. If `click` reports
+`covered by <...>`, interact with that covering element first. Otherwise,
+snapshot, find the dismiss/close button, click it, then re-snapshot.
 
 **Fill / type doesn't work**
 Some custom input components intercept key events. Try:
````

skill-data/core/references/commands.md

Lines changed: 5 additions & 0 deletions
````diff
@@ -71,6 +71,11 @@ agent-browser drag @e1 @e2 # Drag and drop
 agent-browser upload @e1 file.pdf # Upload files
 ```
 
+Clicks fail before dispatch when another element covers the target's click
+point. The error names the covering element, for example
+`covered by <div#consent-banner>`. Dismiss or interact with that element, run a
+fresh snapshot, then retry the original action.
+
 ## Get Information
 
 ```bash
````
