When retrieving data from S3, account for chunking #1085

Merged
dimitri-yatsenko merged 5 commits into datajoint:master from horsto:s3_chunked
May 15, 2023


horsto commented May 12, 2023

There seems to be an issue with retrieving larger ("external") files from S3-compatible buckets. I documented the error in #1083 and also raised it as an issue in the minio-py repository: minio/minio-py#1280.

The problem is that the minio API can return files in chunks, in which case .data cannot be read directly. Reading via .stream() and concatenating the resulting bytes seems to work in all cases. This behavior is already correctly implemented in fget(): `for d in data.stream(1 << 16):`
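As a rough illustration of the fget()-style pattern mentioned above, the sketch below writes a streamed response to a local file chunk by chunk. It is not the datajoint code: `ChunkedResponse` is a hypothetical stand-in for the response object returned by the minio client (assumed only to expose `stream(amt)`), and `download_to_file` is an illustrative name.

```python
import io


class ChunkedResponse:
    """Hypothetical stand-in for a response object exposing stream(amt)."""

    def __init__(self, payload: bytes):
        self._buffer = io.BytesIO(payload)

    def stream(self, amt: int = 1 << 16):
        # Yield the payload in pieces of at most `amt` bytes.
        while True:
            chunk = self._buffer.read(amt)
            if not chunk:
                break
            yield chunk


def download_to_file(response, local_path: str, chunk_size: int = 1 << 16) -> int:
    """Write a streamed response to local_path and return the bytes written."""
    written = 0
    with open(local_path, "wb") as f:
        for chunk in response.stream(chunk_size):
            f.write(chunk)
            written += len(chunk)
    return written
```

Because each chunk is written as it arrives, this variant never needs the whole object in memory at once.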


horsto commented May 12, 2023

This PR replaces the use of .data with .stream() in the get() method for S3 (external) objects. This accommodates chunked transfer of larger files, as opposed to assuming that the whole file is loaded at once.
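A minimal sketch of what such a get() could look like is shown below; the actual change in the PR may differ. The helper name `get_object_bytes` is hypothetical, and `client` is assumed to behave like a `minio.Minio` instance whose `get_object(bucket_name, object_name)` returns a response with `stream()`, `close()`, and `release_conn()`.

```python
def get_object_bytes(client, bucket: str, name: str, chunk_size: int = 1 << 16) -> bytes:
    """Return the full object contents by joining streamed chunks.

    Assumption: `client.get_object(bucket, name)` returns a response object
    that provides stream(amt), close(), and release_conn().
    """
    response = client.get_object(bucket, name)
    try:
        # Joining chunks works whether the object arrives in one piece or many.
        return b"".join(response.stream(chunk_size))
    finally:
        # Always hand the connection back, even if streaming fails.
        response.close()
        response.release_conn()
```

Collecting the chunks with `b"".join(...)` avoids repeated bytes concatenation, at the cost of holding the whole object in memory, which matches the in-memory semantics of get().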

@dimitri-yatsenko

@horsto Would you merge this PR: horsto#1?

pull from upstream and update changelog

horsto commented May 15, 2023

Done!

dimitri-yatsenko merged commit 97e34a6 into datajoint:master on May 15, 2023