Commit 61fb41f
[SPARK-55147][SS] Scope timestamp range for time-interval join retrieval in V4 state format
### What changes were proposed in this pull request?
This PR improves the retrieval operation in the V4 stream-stream join state manager by scoping the timestamp range for time-interval joins. Instead of scanning all timestamps for a given key during the prefix scan, V4 now extracts constant interval offsets from the join condition and computes a `(minTs, maxTs)` range per input row, so the prefix scan can skip entries before `minTs` and terminate early once it passes `maxTs`.
- Add `scanRangeOffsets` and `computeTimestampRange` to `OneSideHashJoiner`, using `StreamingJoinHelper.getStateValueWatermark(eventWatermark=0)` to extract interval bounds from the join condition
- Add `timestampRange` parameter to `getJoinedRows` in the state manager trait, V4 implementation, and V1-V3 base class (ignored by V1-V3)
- Add `getValuesInRange` to `KeyWithTsToValuesStore` that filters by range and stops early past the upper bound
- `getValues` now delegates to `getValuesInRange(Long.MinValue, Long.MaxValue)`
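To make the per-row range computation concrete, here is a minimal sketch. It is not the code from this patch: the `ScanRangeOffsets` case class, its field names, and the signature of `computeTimestampRange` are assumptions for illustration. The sketch assumes the join condition yields constant lower and upper offsets (in the same unit as the event timestamps) and that a missing bound leaves that side of the scan open.

```scala
// Hypothetical sketch; names and signatures are assumptions, not Spark's API.
// Offsets are constants derived once from the join condition. For a condition
// shaped like `other.ts BETWEEN row.ts - 10 AND row.ts + 5`, lower = -10 and
// upper = 5. None means that side of the interval is unbounded.
final case class ScanRangeOffsets(lower: Option[Long], upper: Option[Long])

object TimestampRange {
  // Saturating addition so extreme timestamps clamp instead of wrapping around.
  private def saturatingAdd(a: Long, b: Long): Long = {
    val r = a + b
    // Overflow happened iff a and b share a sign that differs from r's sign.
    if (((a ^ r) & (b ^ r)) < 0) {
      if (a < 0) Long.MinValue else Long.MaxValue
    } else {
      r
    }
  }

  // Returns the (minTs, maxTs) range to scan on the other side for one input row.
  def computeTimestampRange(rowTs: Long, offsets: ScanRangeOffsets): (Long, Long) = {
    val minTs = offsets.lower.map(saturatingAdd(rowTs, _)).getOrElse(Long.MinValue)
    val maxTs = offsets.upper.map(saturatingAdd(rowTs, _)).getOrElse(Long.MaxValue)
    (minTs, maxTs)
  }
}
```

With no offsets available, the sketch falls back to `(Long.MinValue, Long.MaxValue)`, which matches the full-range behavior that `getValues` delegates to.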
### Why are the changes needed?
For time-interval joins, the V4 state format stores values indexed by `(key, timestamp)`. Without range scoping, retrieving matches requires scanning all timestamps for a key via prefix scan, even though the join condition constrains matching to a specific time window. With this change, the scan is bounded to only the relevant timestamp range, reducing I/O in proportion to the ratio of the interval width to the total timestamp span held in state for that key.
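The bounded scan can be illustrated with a small in-memory sketch. This is an assumption-laden model rather than the actual `KeyWithTsToValuesStore` code: it stands in for the prefix scan with an iterator over one key's entries in ascending timestamp order, treats both bounds as inclusive, and returns a visited count purely to show the early exit. A real store could additionally seek straight to `minTs`; the sketch only filters.

```scala
// Hypothetical sketch; the real store works on state-store rows, not tuples.
object RangeScan {
  // `entries` models a prefix scan for one key: (timestamp, value) pairs in
  // ascending timestamp order. Returns the matching values and the number of
  // entries consumed from the iterator.
  def getValuesInRange[V](
      entries: Iterator[(Long, V)],
      minTs: Long,
      maxTs: Long): (Vector[V], Int) = {
    val out = Vector.newBuilder[V]
    var visited = 0
    var done = false
    while (!done && entries.hasNext) {
      val (ts, value) = entries.next()
      visited += 1
      if (ts > maxTs) {
        done = true // ordered by timestamp, so nothing later can match
      } else if (ts >= minTs) {
        out += value // inside the inclusive [minTs, maxTs] range
      } // entries below minTs are skipped
    }
    (out.result(), visited)
  }

  // Unbounded retrieval expressed as the full-range case.
  def getValues[V](entries: Iterator[(Long, V)]): Vector[V] =
    getValuesInRange(entries, Long.MinValue, Long.MaxValue)._1
}
```

For example, with entries at timestamps 10, 20, 20, 30, 40, and 50 and a range of `(20, 30)`, the sketch returns the three values at 20, 20, and 30 and stops after reading the entry at 40, never touching the one at 50.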
### Does this PR introduce _any_ user-facing change?
No. V4 state format is experimental and gated behind `spark.sql.streaming.join.stateFormatV4.enabled`.
### How was this patch tested?
New unit tests in `SymmetricHashJoinStateManagerEventTimeInValueSuite`:
- `getJoinedRows with timestampRange`: boundary conditions, exact matches, empty ranges, full range
- `timestampRange with multiple values per timestamp`: multiple values at the same timestamp
Existing V4 join suites (Inner, Outer, FullOuter, LeftSemi) all pass.
### Was this patch authored or co-authored using generative AI tooling?
Yes. (Claude Opus 4.6)
Closes #54879 from nicholaschew11/SPARK-55147-range-scan-v4.
Authored-by: Nicholas Chew <chew.nicky@gmail.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
1 parent e7a9976 commit 61fb41f
File tree
3 files changed, +151 -50 lines changed
- sql/core/src
  - main/scala/org/apache/spark/sql/execution/streaming/operators/stateful/join
  - test/scala/org/apache/spark/sql/execution/streaming/state
Lines changed: 48 additions & 2 deletions
Lines changed: 52 additions & 48 deletions
1051 | 1054 | | |
1052 | 1055 | | |
1053 | 1056 | | |
1054 | | - | |
| 1057 | + | |
| 1058 | + | |
1055 | 1059 | | |
1056 | 1060 | | |
1057 | 1061 | | |
| |||
Lines changed: 51 additions & 0 deletions