Commit cbcc7f8
[SPARK-55304][SS][PYTHON] Introduce support of Admission Control and Trigger.AvailableNow in Python data source - streaming reader
### What changes were proposed in this pull request?
This PR proposes to introduce support for Admission Control and Trigger.AvailableNow in the Python data source streaming reader.
To support Admission Control, we propose to change the `DataSourceStreamReader` interface as follows (side-by-side comparison):
| **Before** | **After** |
| :---: | :---: |
| `class DataSourceStreamReader(ABC):` | `class DataSourceStreamReader(ABC):` |
| `def initialOffset(self) -> dict` | `def initialOffset(self) -> dict` |
| `def latestOffset(self) -> dict` | `def latestOffset(self, start: dict, limit: ReadLimit) -> dict` |
| | `# NOTE: Optional to implement, default = ReadAllAvailable()` |
| | `def getDefaultReadLimit(self) -> ReadLimit` |
| | `# NOTE: Optional to implement, default = None` |
| | `def reportLatestOffset(self) -> Optional[dict]` |
| `def partitions(self, start: dict, end: dict) -> Sequence[InputPartition]` | `def partitions(self, start: dict, end: dict) -> Sequence[InputPartition]` |
| `@abstractmethod def read(self, partition: InputPartition) -> Union[Iterator[Tuple], Iterator["RecordBatch"]]` | `@abstractmethod def read(self, partition: InputPartition) -> Union[Iterator[Tuple], Iterator["RecordBatch"]]` |
| `def commit(self, end: dict) -> None` | `def commit(self, end: dict) -> None` |
| `def stop(self) -> None` | `def stop(self) -> None` |
The main changes are as follows:
* The signature of `latestOffset` changes to take `start` and `limit`. The method is mandatory.
* The optional method `getDefaultReadLimit` is added.
* The optional method `reportLatestOffset` is added.
This way, new implementations support Admission Control by default. The engine still handles implementations that use the old method signature: it checks the method via Python's built-in `inspect` module (similar to Java's reflection), and if `latestOffset` is implemented without parameters, it falls back to treating the source as one that does not enable Admission Control. For all new sources, implementing `latestOffset` with parameters is strongly recommended.
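The signature-based fallback described above can be sketched with `inspect.signature`. This is an illustrative approximation, not the engine's actual code; `uses_admission_control`, `LegacyReader` and `AdmissionControlReader` are hypothetical names introduced only for this example.

```python
import inspect


class LegacyReader:
    # Old signature: no parameters besides self.
    def latestOffset(self) -> dict:
        return {"offset": 0}


class AdmissionControlReader:
    # New signature: receives the start offset and the read limit.
    def latestOffset(self, start: dict, limit) -> dict:
        return start


def uses_admission_control(reader) -> bool:
    """Hypothetical helper: True if `latestOffset` declares any parameters."""
    # The signature of a bound method excludes `self`, so the legacy
    # override reports zero parameters and falls back to no admission control.
    params = inspect.signature(reader.latestOffset).parameters
    return len(params) > 0
```

With these definitions, `uses_admission_control(LegacyReader())` is `False` and `uses_admission_control(AdmissionControlReader())` is `True`. Checking the signature instead of calling the method avoids relying on a `TypeError` at runtime to discover which variant a source implements.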
The `ReadLimit` interface and its built-in implementations will be available for source implementations to leverage. The built-in implementations are `ReadAllAvailable`, `ReadMinRows`, `ReadMaxRows`, `ReadMaxFiles` and `ReadMaxBytes`. Custom implementations of the `ReadLimit` interface are not supported at this point, since supporting them requires major effort and we have not seen demand for it; we can plan for it if there is strong demand.
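A minimal sketch of a reader written against the new signature follows. It is self-contained and does not import PySpark: `ReadLimit`, `ReadAllAvailable` and `ReadMaxRows` are hypothetical local stand-ins named after the built-in implementations listed above, and the `max_rows` field, the `{"offset": n}` offset layout and the `CounterStreamReader` source are assumptions for illustration only. A real source would subclass `DataSourceStreamReader` and use the classes shipped with PySpark.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-ins for the PySpark ReadLimit classes; field names are assumptions.
class ReadLimit:
    pass


class ReadAllAvailable(ReadLimit):
    pass


@dataclass
class ReadMaxRows(ReadLimit):
    max_rows: int


class CounterStreamReader:
    """Illustrative reader over a growing counter (not a real PySpark class)."""

    def __init__(self, rows_available: int, max_rows_per_batch: Optional[int] = None):
        self.rows_available = rows_available  # pretend size of the external system
        self.max_rows_per_batch = max_rows_per_batch  # e.g. taken from a source option

    def initialOffset(self) -> dict:
        return {"offset": 0}

    def getDefaultReadLimit(self) -> ReadLimit:
        # Optional in the new interface; read everything when no cap is configured.
        if self.max_rows_per_batch is None:
            return ReadAllAvailable()
        return ReadMaxRows(self.max_rows_per_batch)

    def latestOffset(self, start: dict, limit: ReadLimit) -> dict:
        # Advance from `start`, but never past what the limit allows.
        end = self.rows_available
        if isinstance(limit, ReadMaxRows):
            end = min(end, start["offset"] + limit.max_rows)
        return {"offset": end}

    def reportLatestOffset(self) -> Optional[dict]:
        # Optional: the true latest offset regardless of the limit
        # (assumed to be used for progress reporting, as in the Scala API).
        return {"offset": self.rows_available}
```

For example, with 25 rows available and a cap of 10, successive calls starting from `initialOffset()` return `{"offset": 10}`, `{"offset": 20}` and then `{"offset": 25}`, while `reportLatestOffset()` reports `{"offset": 25}` throughout.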
We do not make any change to `SimpleDataSourceStreamReader` for Admission Control, since it is designed for small data fetches and can be considered as already limiting the data. We could still add `ReadLimit` support later if we see strong demand for limiting the fetch size via a source option.
To support `Trigger.AvailableNow`, we propose to introduce a new interface as follows:
```python
from abc import ABC, abstractmethod


class SupportsTriggerAvailableNow(ABC):
    @abstractmethod
    def prepareForTriggerAvailableNow(self) -> None:
        ...
```
The above interface can be mixed in with both `DataSourceStreamReader` and `SimpleDataSourceStreamReader`. As mentioned above, it won't work with `DataSourceStreamReader` implementations that use the old `latestOffset()` signature.
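A minimal sketch of how a reader might mix in the interface, assuming the Python contract mirrors the Scala `SupportsTriggerAvailableNow`: the source records the latest offset when `prepareForTriggerAvailableNow` is called and never reports offsets beyond it. The interface is redeclared locally so the example runs without PySpark, and `BoundedCounterReader` with its `rows_available` and `target` attributes is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Optional


class SupportsTriggerAvailableNow(ABC):
    # Local mirror of the interface above, so this sketch runs without PySpark.
    @abstractmethod
    def prepareForTriggerAvailableNow(self) -> None:
        ...


class BoundedCounterReader(SupportsTriggerAvailableNow):
    """Hypothetical reader that stops at the offset seen at preparation time."""

    def __init__(self) -> None:
        self.rows_available = 0  # grows as the (pretend) external system receives data
        self.target: Optional[int] = None  # set only when Trigger.AvailableNow is used

    def prepareForTriggerAvailableNow(self) -> None:
        # Snapshot the current latest offset; later batches must not go beyond it.
        self.target = self.rows_available

    def latestOffset(self, start: dict, limit) -> dict:
        # ReadLimit handling is omitted for brevity; only the AvailableNow cap is applied.
        end = self.rows_available
        if self.target is not None:
            end = min(end, self.target)
        return {"offset": end}
```

If 30 rows exist when `prepareForTriggerAvailableNow()` is called and 20 more arrive afterwards, `latestOffset` keeps returning `{"offset": 30}`, so the query can finish once it has processed the data that was available at start.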
### Why are the changes needed?
This is to catch up with features supported in the Scala DSv2 API; developers have reported that these missing features block them from implementing some data sources.
### Does this PR introduce _any_ user-facing change?
Yes. Users implementing a streaming reader via the Python data source API will be able to add support for Admission Control and Trigger.AvailableNow, two major features that were previously missing.
### How was this patch tested?
New UTs.
### Was this patch authored or co-authored using generative AI tooling?
Co-authored using claude-4.5-sonnet
Closes #54085 from HeartSaVioR/SPARK-55304.
Lead-authored-by: Jungtaek Lim <[email protected]>
Co-authored-by: Jitesh Soni <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
Parent commit: b75a329
10 files changed (+1276, -102 lines) in:
* `python/pyspark/sql` (including `streaming` and `tests`)
* `sql/core/src/main/scala/org/apache/spark/sql/execution` (`datasources/v2/python` and `python/streaming`)
* `sql/core/src/test/scala/org/apache/spark/sql/execution/python/streaming`