Optimizations for the exponential backoff and for indexing the sqlite db #70
hoist adjective/noun lists and regex into module-level constants
speed up name generation and column simplification
improve numeric detection & row iteration for CSV import
create composite index on project/run/step
refactor project lookup using set and SQL
fix wait_until_space_exists retry logic
add downsample_df and use it in UI plots
ensure tests import local package and add utils tests
Fix formatting for ruff
@abidlabs I have fixed the minor ruff formatting issue: there needed to be a blank line between all functions. I am optimising the low-hanging fruit that I am spotting. I highly recommend checking out the current state of the pull request locally first to see if anything breaks, as you know this project best.
Hi @stabgan, I started looking through this, and while I appreciate the PR, it's a little hard to understand the rationale for all of the changes. For now, I'll just cherry-pick the specific changes that seem like the most impactful optimizations and that I understand clearly, and revert the rest. If there are other changes you'd like to see, it would be great if you would open individual PRs with those changes. Thanks!
Hi @stabgan, thanks again for the suggested changes. To keep this PR more atomic, I've only kept the optimizations for the exponential backoff and for indexing the sqlite db. I'm sure there's value in the other changes, but it was hard to parse and test all of them in a single PR, so feel free to make individual PRs with those changes. Will merge this in and do a release soon!
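For readers following along, here is a minimal sketch of what a composite index on project/run/step could look like with Python's built-in `sqlite3` module. The table name, column names, and index name are assumptions for illustration; the actual schema is not shown in this thread.

```python
import sqlite3

# Hypothetical schema: table, column, and index names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS metrics (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        project_name TEXT NOT NULL,
        run_name TEXT NOT NULL,
        step INTEGER NOT NULL,
        metrics TEXT NOT NULL
    )
    """
)

# One index covering all three columns, ordered from least to most specific,
# so lookups by project, by project + run, or by project + run + step can use it.
conn.execute(
    """
    CREATE INDEX IF NOT EXISTS idx_metrics_project_run_step
    ON metrics (project_name, run_name, step)
    """
)

conn.executemany(
    "INSERT INTO metrics (project_name, run_name, step, metrics) VALUES (?, ?, ?, ?)",
    [("demo", "run-1", step, "{}") for step in range(100)],
)

# Ask SQLite how it would execute a typical "latest step for this run" query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT MAX(step) FROM metrics WHERE project_name = ? AND run_name = ?",
    ("demo", "run-1"),
).fetchall()
plan_text = " ".join(row[3] for row in plan)  # the 4th column holds the plan detail
print(plan_text)
```

Printing the query plan is a quick way to check locally that a lookup searches the index rather than scanning the whole table.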
Word lists for generating readable names and the simplify regex are now module-level constants for faster access
A new downsample_df() helper trims large data sets before plotting, keeping dashboards responsive
CSV import uses numeric column detection with select_dtypes() and iterates rows efficiently with itertuples()
SQLite initialization creates a composite index to speed up project/run lookups
Project discovery scans databases using a set for unique names
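As a rough illustration of the first item in the summary above, the sketch below keeps word lists and a compiled regex at module level so they are built once at import time. The word lists, the regex pattern, and both function names are hypothetical, not the project's actual code.

```python
import random
import re

# Hypothetical word lists; the project's real lists are not shown in this thread.
ADJECTIVES = ("brave", "calm", "eager", "gentle")
NOUNS = ("falcon", "river", "maple", "comet")

# Compiled once at import time instead of inside each call.
# The pattern is an assumption: match runs of non-alphanumeric characters.
NON_ALNUM_RE = re.compile(r"[^a-zA-Z0-9]+")


def generate_readable_name(rng=random):
    """Return a name such as 'calm-river-42' built from the module-level lists."""
    return f"{rng.choice(ADJECTIVES)}-{rng.choice(NOUNS)}-{rng.randint(0, 99)}"


def simplify_column_name(name):
    """Replace runs of non-alphanumeric characters with '_' and trim the ends."""
    return NON_ALNUM_RE.sub("_", name).strip("_").lower()


print(generate_readable_name())
print(simplify_column_name("Train/Loss (avg)"))
```

Python's `re` module already caches recently compiled patterns, so the gain from hoisting the regex is usually modest. Avoiding rebuilding the lists on every call is likely the larger saving.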
optimize column simplification and name generation
add project index caching and metrics cache
improve update_last_steps and run caching
simplify downsample algorithm
revise wait_until_space_exists backoff
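The last commit above concerns retry backoff. The thread does not show how `wait_until_space_exists` is implemented, so the following is only a generic sketch of polling with exponential backoff. The function name `wait_until` and all of its parameters are assumptions.

```python
import time


def wait_until(check, max_attempts=6, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Poll check() until it returns True, doubling the delay after each failure.

    Returns the number of attempts used. Raises TimeoutError if check() never
    succeeds within max_attempts. `sleep` is injectable so tests need not wait.
    """
    delay = base_delay
    for attempt in range(max_attempts):
        if check():
            return attempt + 1
        # Skip the sleep after the final failed attempt; nothing follows it.
        if attempt < max_attempts - 1:
            sleep(delay)
            delay = min(delay * 2, max_delay)  # cap the delay at max_delay
    raise TimeoutError(f"condition not met after {max_attempts} attempts")


# Usage with a fake check that succeeds on the fourth call.
calls = {"n": 0}


def fake_check():
    calls["n"] += 1
    return calls["n"] >= 4


recorded = []
attempts_used = wait_until(fake_check, sleep=recorded.append)
print(attempts_used, recorded)
```

Passing `sleep` as a parameter keeps the retry logic testable without real delays, which makes a fix to this kind of loop easy to verify in isolation.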