Skip to content

Add bytes scanned metric to ParquetExec#2273

Merged
andygrove merged 1 commit intoapache:masterfrom
coralogix:metrics
Apr 21, 2022
Merged

Add bytes scanned metric to ParquetExec#2273
andygrove merged 1 commit intoapache:masterfrom
coralogix:metrics

Conversation

@thinkharderdev
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

Tracking total number of bytes read from storage is an important metric for query execution (especially for use cases where data is read from object storage or NAS rather than a local SSD). It's especially important for Parquet data sources where we have many levers we can pull to reduce the amount of data read.

What changes are included in this PR?

Add a bytes_scanned counter to ParquetFileMetrics and track total number of bytes read through ChunkObjectReader. This should be quite cheap since we pass the number of bytes to read as an argument.

Are there any user-facing changes?

ParquetExec will have a bytes_scanned metric.

No

fn get_read(&self, start: u64, length: usize) -> ParquetResult<Self::T> {
self.0
if let Some(m) = self.bytes_scanned.as_ref() {
m.add(length)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just curious .. the length here is the number of bytes requested and could potentially be different from the actual number of bytes read?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this interface actually specifies so it would really be up to the specific ObjectStore implementation. I know that that is the way the LocalFileSystem works and I believe that is the way the AWS S3 API works with file ranges.

@andygrove andygrove merged commit 4b8f1d5 into apache:master Apr 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants