Skip to content

Commit 861ba53

Browse files
kavpreetgrewalHeartSaVioR
authored andcommitted
[SPARK-55300][DOCS] Add timestamp offset option visualization in Kafka docs
### What changes were proposed in this pull request? Adds a diagram illustrating the behavior the the timestamp offset options for Kafka. ### Why are the changes needed? The text explanation of the timestamp offset options in the guide doc was not sufficient for many users to form the above illustration in their minds. Their feedback is something like “A picture is worth a thousand words”. ### Does this PR introduce _any_ user-facing change? No. Documentation updates only. ### How was this patch tested? N/A. Verified that the docs build as expected. <img width="595" height="526" alt="Screenshot 2026-01-30 at 4 24 13 PM" src="https://github.com/user-attachments/assets/dfad6db0-22f1-45dd-b4ea-a2b2592402ee" /> [site.zip](https://github.com/user-attachments/files/24975955/site.zip) - See `/streaming/structured-streaming-kafka-integration.html` ### Was this patch authored or co-authored using generative AI tooling? No. AI was only used for consultation purposes. Closes #54080 from kavpreetgrewal/update-kafka-offset-doc. Authored-by: Kavpreet Grewal <kavpreet.grewal@databricks.com> Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
1 parent f0d9f99 commit 861ba53

File tree

1 file changed

+40
-0
lines changed

1 file changed

+40
-0
lines changed

docs/streaming/structured-streaming-kafka-integration.md

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -587,6 +587,46 @@ Also, the meaning of <code>timestamp</code> here can be vary according to Kafka
587587

588588
Timestamp offset options require Kafka 0.10.1.0 or higher.
589589

590+
#### Timestamp offset behavior illustrated
591+
592+
The following diagrams illustrate how Kafka resolves offsets when querying by timestamp for any particular topic partition. Each diagram shows a timeline with records (r1, r2, r3, ...) and the provided timestamp (ts), which can be global or per partition. The returned offset points to the **first record whose timestamp is greater than or equal to the provided timestamp**.
593+
594+
**Scenario 1: Timestamp is before existing records**
595+
596+
```
597+
Timeline: ───────|────────|───|───|───|───|──▶
598+
ts r1 r2 r3 r4 r5
599+
600+
returned
601+
602+
Result: r1 is returned (first record with timestamp >= ts)
603+
```
604+
605+
**Scenario 2: Timestamp is after all existing records**
606+
607+
```
608+
Timeline: ──|───|───|───|───|─────────────|──▶
609+
r1 r2 r3 r4 r5 ts
610+
611+
no match found
612+
613+
Result: No offset returned by Kafka; behavior falls back to
614+
startingOffsetsByTimestampStrategy ("error" or "latest")
615+
```
616+
617+
**Scenario 3: Timestamp falls between records**
618+
619+
```
620+
Timeline: ─────|─────────|──────|──────────|──▶
621+
r1 ts r2 r3
622+
623+
returned
624+
625+
Result: r2 is returned (first record with timestamp >= ts)
626+
```
627+
628+
**Note:** For starting offsets, ingestion begins from the returned offset (inclusive), or follows <code>startingOffsetsByTimestampStrategy</code> in scenario 2. For ending offsets, ingestion stops at the returned offset (exclusive), or falls back to latest in scenario 2.
629+
590630
### Offset fetching
591631

592632
In Spark 3.0 and before Spark uses <code>KafkaConsumer</code> for offset fetching which could cause infinite wait in the driver.

0 commit comments

Comments
 (0)