Perf/improve jobdao query by algorithmy1 · Pull Request #2609 · MarquezProject/marquez

algorithmy1 · 2023-09-06T20:38:57Z

Problem

Hello,

While working with Marquez, I noticed a significant performance bottleneck with a specific SQL query in the JobDao.java file. For the namespaceName "MyNameSpace", the query was originally taking 17 seconds to execute with a limit of 100, and 12 seconds with a limit of 25. Given that this query runs every time the Marquez web UI is accessed, this presented a major user experience challenge.
Test database instance: db.t4g.medium (vCPU: 2, RAM: 4 GB)

See: #2608

Solution

To address this, I've revised the query. The optimized query uses Common Table Expressions (CTEs) to filter by namespace and select the requested page of jobs before the joins, so the joins only touch the rows that are actually returned. Here's the optimized query:

    WITH jobs_view_page
    AS (
      SELECT
        *
      FROM 
        jobs_view AS j
      WHERE 
        j.namespace_name = :namespaceName
      ORDER BY
        j.name
      LIMIT 
        :limit
      OFFSET 
        :offset
    ),
    job_versions_temp AS (
      SELECT 
        *
      FROM 
        job_versions AS j
      WHERE 
        j.namespace_name = :namespaceName
    ),
    facets_temp AS (
      SELECT
        run_uuid,
        JSON_AGG(e.facet) AS facets
      FROM (
        SELECT
          jf.run_uuid,
          jf.facet
        FROM
          job_facets_view AS jf
        INNER JOIN job_versions_temp jv2
          ON jv2.latest_run_uuid = jf.run_uuid
        INNER JOIN jobs_view_page j2
          ON j2.current_version_uuid = jv2.uuid
        ORDER BY
          lineage_event_time ASC
      ) e
      GROUP BY e.run_uuid
    )
    SELECT
      j.*,
      f.facets
    FROM
      jobs_view_page AS j
    LEFT OUTER JOIN job_versions_temp AS jv
      ON jv.uuid = j.current_version_uuid
    LEFT OUTER JOIN facets_temp AS f
      ON f.run_uuid = jv.latest_run_uuid
    ORDER BY
      j.name

On the same cluster db.t4g.medium (vCPU: 2, RAM: 4 GB), the optimization reduced the execution time from 17 seconds with limit=100 to a mere 300ms. For limit=25, it dropped from 12 seconds to under 100ms.
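
The core idea of the rewrite, selecting one page of jobs first and joining afterwards, can be illustrated on a toy schema. This is only a minimal sketch using Python's sqlite3 module rather than Marquez's PostgreSQL database: the table names, columns, sample rows, and the join-then-limit baseline are all assumptions made for illustration (the original query is not quoted in this PR), and GROUP_CONCAT stands in for JSON_AGG. It checks that both shapes return the same page; it says nothing about timings.

```python
import sqlite3

# Hypothetical, simplified stand-ins for jobs_view, job_versions and job_facets_view.
SCHEMA = """
CREATE TABLE jobs (name TEXT, namespace_name TEXT, current_version_uuid TEXT);
CREATE TABLE job_versions (uuid TEXT, namespace_name TEXT, latest_run_uuid TEXT);
CREATE TABLE job_facets (run_uuid TEXT, facet TEXT);
"""

# Assumed baseline shape: join everything, then filter and limit at the end.
JOIN_THEN_LIMIT = """
SELECT j.name, f.facets
FROM jobs AS j
LEFT JOIN job_versions AS jv ON jv.uuid = j.current_version_uuid
LEFT JOIN (
  SELECT run_uuid, GROUP_CONCAT(facet) AS facets
  FROM job_facets
  GROUP BY run_uuid
) AS f ON f.run_uuid = jv.latest_run_uuid
WHERE j.namespace_name = :ns
ORDER BY j.name
LIMIT :limit OFFSET :offset
"""

# Shape used in this PR: pick the page in a CTE, then join only those rows.
PAGE_THEN_JOIN = """
WITH jobs_page AS (
  SELECT * FROM jobs
  WHERE namespace_name = :ns
  ORDER BY name
  LIMIT :limit OFFSET :offset
),
facets_page AS (
  SELECT jf.run_uuid, GROUP_CONCAT(jf.facet) AS facets
  FROM job_facets AS jf
  INNER JOIN job_versions AS jv2 ON jv2.latest_run_uuid = jf.run_uuid
  INNER JOIN jobs_page AS j2 ON j2.current_version_uuid = jv2.uuid
  GROUP BY jf.run_uuid
)
SELECT j.name, f.facets
FROM jobs_page AS j
LEFT JOIN job_versions AS jv ON jv.uuid = j.current_version_uuid
LEFT JOIN facets_page AS f ON f.run_uuid = jv.latest_run_uuid
ORDER BY j.name
"""


def build_db():
    """Create an in-memory database with six jobs split across two namespaces."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for i in range(6):
        ns = "ns_a" if i % 2 == 0 else "ns_b"
        conn.execute("INSERT INTO jobs VALUES (?, ?, ?)", (f"job_{i}", ns, f"v{i}"))
        conn.execute("INSERT INTO job_versions VALUES (?, ?, ?)", (f"v{i}", ns, f"r{i}"))
        if i != 4:  # job_4 deliberately has no facets, to exercise the LEFT JOIN
            conn.execute("INSERT INTO job_facets VALUES (?, ?)", (f"r{i}", f"facet_{i}"))
    return conn


def fetch_page(conn, sql, namespace, limit, offset):
    """Run one of the queries above with named parameters."""
    params = {"ns": namespace, "limit": limit, "offset": offset}
    return conn.execute(sql, params).fetchall()


conn = build_db()
join_first = fetch_page(conn, JOIN_THEN_LIMIT, "ns_a", 2, 1)
page_first = fetch_page(conn, PAGE_THEN_JOIN, "ns_a", 2, 1)
print(page_first)  # [('job_2', 'facet_2'), ('job_4', None)]
```

Both variants return the same rows here; the difference is only in how many rows the joins and the aggregation have to process before the limit is applied.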

Furthermore, I believe there's potential for even more optimization. If job_facets_view included the column namespace_name, it might allow for further refinements.

One-line summary: Optimized a critical SQL query in JobDao.java, resulting in a significant reduction in execution time.

Checklist

I've signed-off on my work.
My changes are accompanied by tests (if relevant).
The change contains a small diff and is self-contained.
I've updated the relevant documentation (if necessary).
I've included a one-line summary of my change for the CHANGELOG.md (Depending on the change, this may not be necessary).
I've versioned my .sql database schema migration according to Flyway's naming convention (if relevant).
I've included the appropriate header in any source code files (if relevant).

Signed-off-by: Abdallah Terrab <abdallah@terrab.me>

boring-cyborg · 2023-09-06T20:39:00Z

Thanks for opening your first pull request in the Marquez project! Please check out our contributing guidelines (https://github.com/MarquezProject/marquez/blob/main/CONTRIBUTING.md).

codecov · 2023-09-06T20:44:46Z

Codecov Report

Merging #2609 (0db03bc) into main (3f19508) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##               main    #2609   +/-   ##
=========================================
  Coverage     83.31%   83.31%           
  Complexity     1289     1289           
=========================================
  Files           243      243           
  Lines          5940     5940           
  Branches        280      280           
=========================================
  Hits           4949     4949           
  Misses          844      844           
  Partials        147      147

Files Changed	Coverage Δ
api/src/main/java/marquez/db/JobDao.java	`100.00% <ø> (ø)`


pawel-big-lebowski

The queries are equivalent and the new one is proven to be faster, so the PR deserves an approval 👍 🔢 🥇

I thought CTEs worked like copy-pasting the syntax to make a query more readable, without affecting the query plan or performance. In this case, it looks like the CTE allowed the limit to be applied before the joins. Great finding.

@algorithmy1 First-class work. Thank you.

algorithmy1 · 2023-09-07T12:41:11Z

Thank you for the compliment 🥰

By the way, we can improve the query without using CTEs, but we would need to use the limit two times (filtering on the page twice). Moreover, this way the query is more readable.
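
A rough sketch of what such a CTE-free variant could look like, with the page subquery (and therefore the limit) written out twice. This is not the query from the PR: it runs on Python's sqlite3 against a hypothetical toy schema, the table and column names are simplified assumptions, and GROUP_CONCAT stands in for JSON_AGG.

```python
import sqlite3

# Hypothetical CTE-free shape: the same paginated subquery appears twice,
# once as the driving table and once to restrict the facet aggregation.
NO_CTE_QUERY = """
SELECT j.name, f.facets
FROM (
  SELECT * FROM jobs
  WHERE namespace_name = :ns
  ORDER BY name
  LIMIT :limit OFFSET :offset
) AS j
LEFT JOIN job_versions AS jv ON jv.uuid = j.current_version_uuid
LEFT JOIN (
  SELECT jf.run_uuid, GROUP_CONCAT(jf.facet) AS facets
  FROM job_facets AS jf
  INNER JOIN job_versions AS jv2 ON jv2.latest_run_uuid = jf.run_uuid
  INNER JOIN (
    SELECT * FROM jobs
    WHERE namespace_name = :ns
    ORDER BY name
    LIMIT :limit OFFSET :offset
  ) AS j2 ON j2.current_version_uuid = jv2.uuid
  GROUP BY jf.run_uuid
) AS f ON f.run_uuid = jv.latest_run_uuid
ORDER BY j.name
"""

# Toy data (assumed): three jobs in namespace "ns", one in "other".
toy = sqlite3.connect(":memory:")
toy.executescript("""
CREATE TABLE jobs (name TEXT, namespace_name TEXT, current_version_uuid TEXT);
CREATE TABLE job_versions (uuid TEXT, namespace_name TEXT, latest_run_uuid TEXT);
CREATE TABLE job_facets (run_uuid TEXT, facet TEXT);
INSERT INTO jobs VALUES ('alpha', 'ns', 'v_alpha'), ('beta', 'ns', 'v_beta'),
                        ('gamma', 'ns', 'v_gamma'), ('delta', 'other', 'v_delta');
INSERT INTO job_versions VALUES ('v_alpha', 'ns', 'r_alpha'), ('v_beta', 'ns', 'r_beta'),
                                ('v_gamma', 'ns', 'r_gamma'), ('v_delta', 'other', 'r_delta');
INSERT INTO job_facets VALUES ('r_alpha', 'f_alpha'), ('r_gamma', 'f_gamma'),
                              ('r_delta', 'f_delta');
""")

# Named parameters may be referenced more than once in the same statement.
first_page = toy.execute(NO_CTE_QUERY, {"ns": "ns", "limit": 2, "offset": 0}).fetchall()
print(first_page)  # [('alpha', 'f_alpha'), ('beta', None)]
```

The duplication of the paginated subquery is the cost of dropping the CTE: both copies have to be kept in sync whenever the filter, ordering, or paging changes.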

boring-cyborg · 2023-09-07T12:42:24Z

Great job! Congrats on your first merged pull request in the Marquez project!

* perf/ dao query updated
  Signed-off-by: Abdallah Terrab <abdallah@terrab.me>
* ./gradlew spotlessJavaCheck
  Signed-off-by: Abdallah Terrab <abdallah@terrab.me>
---------
Signed-off-by: Abdallah Terrab <abdallah@terrab.me>

algorithmy1 added 2 commits September 6, 2023 21:16

perf/ dao query updated

6891fb3

Signed-off-by: Abdallah Terrab <abdallah@terrab.me>

./gradlew spotlessJavaCheck

0db03bc

Signed-off-by: Abdallah Terrab <abdallah@terrab.me>

boring-cyborg Bot added the api API layer changes label Sep 6, 2023

julienledem requested a review from pawel-big-lebowski September 6, 2023 22:29

pawel-big-lebowski approved these changes Sep 7, 2023


pawel-big-lebowski merged commit 3243a0f into MarquezProject:main Sep 7, 2023


pawel-big-lebowski merged 2 commits into MarquezProject:main from algorithmy1:perf/improve-jobdao-query
