df shape in lazy loading or query builder for optimised query #2639

@himsheda

Description

Is your feature request related to a problem? Please describe.
I want one simple operation. I have a huge dataset with a timestamp index, and I just want the number of rows matching a date-range query (and in future, possibly any other type of filter). I tried even with columns=[] or with just a single column, but the data is so large that a single year of it blew past 128 GB of RAM.

Describe the solution you'd like
Either a count or a shape function that aggregates the value while streaming over each segment, instead of materialising the whole index in memory and forcing me to call len(df).

Describe alternatives you've considered
I tried the following, but it still blew up memory:

import pandas as pd
from arcticdb import QueryBuilder

lazy_df = store_new["library"].read("symbol", columns=[], lazy=True)
print(lazy_df.collect().data.shape[0])

# or

q = QueryBuilder()
q = q.date_range((pd.Timestamp("2023-01-01"), pd.Timestamp("2024-01-01"))).optimise_for_speed()
df = store_new["library"].read("symbol", columns=["ID"], query_builder=q).data
print(len(df))

Labels: enhancement (New feature or request)
