
source-ashby: implement snapshot and child streams#4309

Open
nicolaslazo wants to merge 2 commits into main from nlazo/fix-source-ashby-stream-types

Conversation

@nicolaslazo
Contributor

Description:

All Ashby streams were treated as incremental and as requiring no parent entity ID parameters. This change:

  • Implements snapshot streams
  • Sets flags to pull additional data when available
  • Implements a guard for the scenario where docs are yielded but Ashby provides no sync token at the end of pagination
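The guard in the last bullet could be sketched roughly as follows. This is a minimal illustration, not the connector's actual code: the page shape (`results`, `syncToken`) and the wrapper name are assumptions based on the description above.

```python
import asyncio
from typing import Any, AsyncGenerator


async def fetch_with_sync_token_guard(
    pages: AsyncGenerator[dict[str, Any], None],
) -> AsyncGenerator[dict[str, Any], None]:
    """Yield docs from paginated responses; fail loudly if documents were
    yielded but Ashby never returned a sync token (field names assumed)."""
    yielded_docs = False
    sync_token: str | None = None

    async for page in pages:
        for doc in page.get("results", []):
            yielded_docs = True
            yield doc
        # Keep the most recent sync token seen, if any.
        sync_token = page.get("syncToken", sync_token)

    if yielded_docs and sync_token is None:
        raise RuntimeError(
            "documents were yielded but Ashby returned no sync token"
        )
```

Failing loudly here is safer than silently checkpointing: a missing sync token would otherwise force a full re-snapshot on the next sync without anyone noticing.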

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

(anything that might help someone review this PR)

@nicolaslazo nicolaslazo requested a review from a team April 24, 2026 12:14
@nicolaslazo nicolaslazo self-assigned this Apr 24, 2026
Member

@Alex-Bair Alex-Bair left a comment


LGTM % the pagination question for snapshot resources and the timeout concern around concurrently fetching child records while streaming the response containing parent ids.

Comment on lines +135 to +146
return Resource(
name=entity_cls.name,
key=["/_meta/row_id"],
model=entity_cls,
open=open,
initial_state=ResourceState(),
initial_config=ResourceConfig(
name=entity_cls.name,
interval=timedelta(minutes=5),
),
schema_inference=True,
)
Member


question: Would it work to use the recently added SnapshotResource here instead of Resource? It's not necessary, but should remove the need to specify the key and initial_state during instantiation. Using Resource instead is fine, but if there's some rough edge that's preventing the use of SnapshotResource instead I'd like to know so I can smooth it out.

state,
task,
fetch_snapshot=functools.partial(snapshot_fn, entity_cls, http),
tombstone=BaseDocument(_meta=BaseDocument.Meta(op="d")),
Member


note: Similar to below, you shouldn't have to specify a tombstone here any longer as long as you're fine with the CDK defaulting to use BaseDocument for the tombstone like you are here. You can still specify it if you want, but it should work without it.

) -> AsyncGenerator[ChildEntity, None]:
url = f"{API_BASE_URL}/{entity_cls.path}"

async for parent in snapshot_entity(entity_cls.parent_entity, http, log):
Member


Sometimes we've seen that keeping a response open while we're iterating through parent ids & making requests for child resources can lead to TimeoutErrors (here's the spot in source-zendesk-support-native where we fixed that issue). That seems to be possible here too - the snapshot_entity call streams the API response for parent records, and we're keeping that response open while we make requests for child resources. We've handled this in other connectors by fetching an entire response's worth of parent records at a time, buffering the parent ids in a list, closing that response, then fetching the child records for those in-memory parent ids. That'd be a good pattern to use here too to avoid potential TimeoutErrors.

Comment on lines +43 to +66
async def snapshot_entity(
entity_cls: type[AshbyEntity],
http: HTTPSession,
log: Logger,
) -> AsyncGenerator[AshbyEntity, None]:
url = f"{API_BASE_URL}/{entity_cls.path}"

_, response = await http.request_stream(
log, url, method="POST", json={**entity_cls.extra_body}
)
processor = IncrementalJsonProcessor(
response(), "results.item", entity_cls, remainder_cls=ResponseMeta
)

async for item in processor:
yield item

meta = processor.get_remainder()
if not meta.success:
log.error(
"Ashby API error during snapshot",
extra={"entity": entity_cls.name, "errors": meta.errors},
)
raise RuntimeError(f"Ashby API error for {entity_cls.name}: {meta.errors}")
Member


Both here and in snapshot_child_entity, do we need to paginate through results using meta.nextCursor like we do in fetch_entity? The API docs' interviewEvent.list page seems to suggest pagination is necessary. It feels like the "paginate through all pages" logic could be abstracted to a helper and re-used across all three functions, although that's not always worth the extra effort if the code's fairly simple.
