Slow Dataset Query Updates#2534

Merged: phixMe merged 4 commits into main from update/datasets-sql on Jul 17, 2023
@phixMe phixMe commented Jun 30, 2023

Member

Problem

These queries were slow for Marquez instances with many datasets, dataset versions, and facets.

Solution

This PR updates the SQL behind the dataset get-by-name-and-namespace query and the list endpoint so that the nested facet queries are limited to the same scope as the outer query.

One-line summary: Scopes down nested facet queries to be the same scope as the outer query.
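
To illustrate what scoping a nested query down to the outer query's scope can look like, here is a minimal sketch using Python's built-in `sqlite3` module. The tables and columns (`datasets`, `dataset_facets`, `dataset_name`, `facet_count`) are hypothetical and are not Marquez's actual schema or the SQL changed in this PR; whether a given database pushes the outer filter into the subquery on its own depends on its planner.

```python
import sqlite3

# Hypothetical toy schema, for illustration only (not the Marquez schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE datasets (namespace TEXT, name TEXT);
    CREATE TABLE dataset_facets (namespace TEXT, dataset_name TEXT, facet TEXT);
    INSERT INTO datasets VALUES ('ns', 'a'), ('ns', 'b');
    INSERT INTO dataset_facets VALUES
        ('ns', 'a', 'schema'), ('ns', 'a', 'ownership'), ('ns', 'b', 'schema');
""")

# As written, the nested query ranges over the facet rows of every dataset;
# the filter on namespace and name appears only in the outer query.
UNSCOPED = """
    SELECT d.name, f.facet_count
    FROM datasets d
    LEFT JOIN (
        SELECT namespace, dataset_name, COUNT(*) AS facet_count
        FROM dataset_facets
        GROUP BY namespace, dataset_name
    ) f ON f.namespace = d.namespace AND f.dataset_name = d.name
    WHERE d.namespace = :ns AND d.name = :name
"""

# The nested query repeats the outer filter, so it only touches the facet
# rows of the dataset being requested.
SCOPED = """
    SELECT d.name, f.facet_count
    FROM datasets d
    LEFT JOIN (
        SELECT namespace, dataset_name, COUNT(*) AS facet_count
        FROM dataset_facets
        WHERE namespace = :ns AND dataset_name = :name
        GROUP BY namespace, dataset_name
    ) f ON f.namespace = d.namespace AND f.dataset_name = d.name
    WHERE d.namespace = :ns AND d.name = :name
"""

params = {"ns": "ns", "name": "a"}
unscoped_rows = conn.execute(UNSCOPED, params).fetchall()
scoped_rows = conn.execute(SCOPED, params).fetchall()
```

Both queries return the same rows; the scoped form just gives the nested query less data to aggregate, which matters most when the facet table is large.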

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've included a one-line summary of your change for the CHANGELOG.md (Depending on the change, this may not be necessary).
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg boring-cyborg Bot added the api API layer changes label Jun 30, 2023
codecov Bot commented Jun 30, 2023

Codecov Report

Merging #2534 (15aef67) into main (e99ebc9) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##               main    #2534   +/-   ##
=========================================
  Coverage     83.86%   83.86%           
  Complexity     1245     1245           
=========================================
  Files           238      238           
  Lines          5657     5657           
  Branches        271      271           
=========================================
  Hits           4744     4744           
  Misses          769      769           
  Partials        144      144           
Impacted Files Coverage Δ
api/src/main/java/marquez/db/DatasetDao.java 98.64% <ø> (ø)

@wslulciuc wslulciuc left a comment

Member

@phixMe, great work on identifying ways to optimize our dataset query! Would you mind adding a query plan, or an analysis of the query before and after? I agree with the changes, but I think an analysis would help us better understand them.

@wslulciuc wslulciuc left a comment

Member

We discussed this offline, and these changes look great! Thanks for the perf improvements 💯

@phixMe phixMe merged commit 52b70a7 into main Jul 17, 2023
@phixMe phixMe deleted the update/datasets-sql branch July 17, 2023 18:26
jonathanpmoraes referenced this pull request in nubank/NuMarquez Feb 6, 2025
* Updating the sql for dataset get by name and namespace, and list endpoint

* Update for test failure.

* Adding qualifiers back in to list query

---------

Co-authored-by: phix <peter.hicks@astronomer.io>
Co-authored-by: Willy Lulciuc <willy@datakin.com>