Add StreamProvider for configuring StreamTable#10600
Conversation
|
@metesynnada @mustafasrepo i believe you were both involved in the |
alamb
left a comment
There was a problem hiding this comment.
Thanks @matthewmturner -- this looks pretty neat.
I wonder if there is some example we could add to datafusion-examples that would show this new API in action? That might help motivate what the API changes are for / help this PR review along as well as help others see the feature
| } | ||
|
|
||
| /// The configuration for a [`StreamTable`] | ||
| pub trait StreamSource: std::fmt::Debug + Send + Sync { |
There was a problem hiding this comment.
Perhaps some documentation about what this trait is meant to represent would help
|
@alamb i would be happy to add example - for this PR it would likely just be copying from https://github.com/apache/datafusion/blob/main/datafusion/core/tests/fifo.rs. I will get to that shortly. Ultimately the motivation here is to start working towards a more generic interface where |
|
Im hoping to get to a similar API as
Where In the case of streams i had in mind:
Where there are different |
|
@alamb i have a working example now. i have idea to update it to show more of the streaming nature (i.e. write to the fifo and get batches multiple times) but wont have time for that today. Do you have thoughts in general on whether this type of API could be supported? |
alamb
left a comment
There was a problem hiding this comment.
I think this API looks pretty good @matthewmturner thank you
@metesynnada or @ozankabak or @devinjdangelo do you have any thoughts / suggestions on these patterns?
|
@berkaysynnada PTAL |
|
I will review it in detail tomorrow |
| /// The StreamProvider trait is used as a generic interface for reading and writing from streaming | ||
| /// data sources (such as FIFO, Websocket, Kafka, etc.). Implementations of the provider are | ||
| /// responsible for providing a `RecordBatchReader` and optionally a `RecordBatchWriter`. | ||
| pub trait StreamProvider: std::fmt::Debug + Send + Sync { |
There was a problem hiding this comment.
StreamFormat might be a better name here, to align with FileFormat
There was a problem hiding this comment.
I am not sure people associate location with 'Format'. FileFormat does not deal with location. I think StreamProvider is descriptive enough.
There was a problem hiding this comment.
i viewed location as specific to the FileStreamProvider and Format as a concept that provided reading and writing capabilities, similar to FileFormat. But i dont feel strongly either way so im fine to leave at StreamProvider.
berkaysynnada
left a comment
There was a problem hiding this comment.
After addressing @alamb's feedback, this PR looks good and separates responsibilities with an additional layer. However, FileStreamProvider currently handles only the CSV format. I suggest focusing more on file format abstraction. Thanks for the example; it makes understanding and usage easier.
| /// The StreamProvider trait is used as a generic interface for reading and writing from streaming | ||
| /// data sources (such as FIFO, Websocket, Kafka, etc.). Implementations of the provider are | ||
| /// responsible for providing a `RecordBatchReader` and optionally a `RecordBatchWriter`. | ||
| pub trait StreamProvider: std::fmt::Debug + Send + Sync { |
There was a problem hiding this comment.
I am not sure people associate location with 'Format'. FileFormat does not deal with location. I think StreamProvider is descriptive enough.
StreamProvider for configuring StreamTable
alamb
left a comment
There was a problem hiding this comment.
This PR looks good to me. Are we waiting on anything else? Otherwise I think it is good to go from my perspective.
Thank you @matthewmturner and @berkaysynnada for your code and review
|
thank you @alamb @berkaysynnada @ozankabak |
|
Thanks again everyone |
* Start setting up new StreamTable config * Cleanup * Cleanup * Fix some tests * Cleanup * Start adding example * Feedback
Which issue does this PR close?
Closes #10599
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Not yet, but I will add tests if there is agreement the API is going in the right direction
Are there any user-facing changes?
Yes, new interface for
StreamTable