Read CSV format text from stdin or memory#54
alamb
left a comment
I think this PR is good -- the only thing that is wonky to me is that if you Clone one of these readers it won't work. However, in this case the error will be clear so I think this PR can be merged and if anyone cares about the Clone behavior we can fix it in a follow on PR.
Thanks again @heymind
What do you think @Dandandan / @andygrove ?
Codecov Report
@@            Coverage Diff             @@
##           master      #54      +/-   ##
==========================================
- Coverage   76.24%   76.20%   -0.05%
==========================================
  Files         134      134
  Lines       23051    23199    +148
==========================================
+ Hits        17576    17679    +103
- Misses       5475     5520     +45
    Path(String),

    /// Read CSV data from a reader
    Reader(Mutex<Option<Box<dyn Read + Send + Sync + 'static>>>),
How would this look if we would support multiple readers/partitions?
Also, I am wondering: if we do it like this, would we need to do the same for json/xml/etc. sources? They should support essentially the same thing, and it would be good to avoid reimplementing it for each format.
I suggest we file a follow on ticket for this work (supporting Reader input for JSON / XML sources). As you say I don't think it needs to be part of this PR.
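For readers following the thread, here is a minimal sketch of why a `Mutex<Option<Box<dyn Read ...>>>` field behaves as a take-once source, which is what the note about Clone and the multiple-partition question both come back to. It uses only the standard library. `Source`, `from_reader`, `take_reader` and `read_all` are hypothetical names for illustration and are not the types or functions in this PR.

```rust
use std::io::Read;
use std::sync::Mutex;

/// Hypothetical stand-in for the enum in the diff above (not the PR's type).
pub enum Source {
    /// Read CSV data from a file path
    Path(String),
    /// Read CSV data from a reader that can be handed out only once
    Reader(Mutex<Option<Box<dyn Read + Send + Sync + 'static>>>),
}

impl Source {
    /// Wrap any owned reader (stdin, an in-memory cursor, a socket, ...).
    pub fn from_reader<R: Read + Send + Sync + 'static>(reader: R) -> Self {
        Source::Reader(Mutex::new(Some(Box::new(reader))))
    }

    /// Move the reader out of the slot. The slot is left as `None`,
    /// so every later call returns an explicit error instead of data.
    pub fn take_reader(&self) -> Result<Box<dyn Read + Send + Sync + 'static>, String> {
        match self {
            Source::Path(p) => Err(format!("source is a path ({}), not a reader", p)),
            Source::Reader(slot) => slot
                .lock()
                .map_err(|_| "reader mutex was poisoned".to_string())?
                .take()
                .ok_or_else(|| "reader was already consumed".to_string()),
        }
    }
}

/// Drain the source into a String; this consumes the reader.
pub fn read_all(source: &Source) -> Result<String, String> {
    let mut reader = source.take_reader()?;
    let mut text = String::new();
    reader.read_to_string(&mut text).map_err(|e| e.to_string())?;
    Ok(text)
}
```

In this sketch, calling `read_all` twice on the same `Source` succeeds the first time and returns the "already consumed" error the second time. A single boxed reader cannot be rewound or shared, so supporting several partitions would presumably need one reader per partition or a different abstraction.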
Looks good to me, thanks @heymind.
fix: More dangling references (apache#54)
* fix: More dangling references
* test: Add tests for remove_dangling_identifiers
UPSTREAM NOTE: An attempt was made to upstream this PR in apache#13405, but it was not accepted due to the complexity it brought. Phillip needs to figure out a good solution that solves our problem and can be upstreamed.
Migrate from apache/arrow#10066.
Which issue does this PR close?
Closes Jira issue ARROW-12306. Closes #198.
Rationale for this change
Let CsvFile and CsvExec support reading from a reader (not only files)
What changes are included in this PR?
This PR adds the following new pub functions:
* CsvFile::try_new_from_reader
* CsvFile::try_new_from_reader_infer_schema
* CsvExec::try_new_from_reader
* CsvStream::try_new_from_reader
Are there any user-facing changes?
No.
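To illustrate the idea in the rationale (reading CSV from a reader rather than only from a file), here is a small standard-library sketch. It does not call the functions listed above, since their signatures are not shown here. `read_csv_rows` is a hypothetical helper, and its comma splitting is deliberately naive: it handles no quoting or escaping.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical helper: parse comma-separated lines from any `Read` source.
/// Naive on purpose: no quoted fields, no escapes, no schema inference.
pub fn read_csv_rows<R: Read>(reader: R) -> io::Result<Vec<Vec<String>>> {
    let mut rows = Vec::new();
    for line in BufReader::new(reader).lines() {
        let line = line?;
        if line.is_empty() {
            continue; // skip blank lines
        }
        // Split on commas and trim surrounding whitespace from each field.
        rows.push(line.split(',').map(|f| f.trim().to_string()).collect());
    }
    Ok(rows)
}
```

Because the function is generic over `Read`, the same code accepts `std::io::stdin().lock()`, a `std::io::Cursor` over in-memory bytes, or a `std::fs::File`, which is the flexibility the PR is after.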