Perf/improve jobdao query#2609
Signed-off-by: Abdallah Terrab <abdallah@terrab.me>
Thanks for opening your first pull request in the Marquez project! Please check out our contributing guidelines (https://github.com/MarquezProject/marquez/blob/main/CONTRIBUTING.md).
Codecov Report
@@ Coverage Diff @@
## main #2609 +/- ##
=========================================
Coverage 83.31% 83.31%
Complexity 1289 1289
=========================================
Files 243 243
Lines 5940 5940
Branches 280 280
=========================================
Hits 4949 4949
Misses 844 844
Partials 147 147
pawel-big-lebowski
left a comment
The queries are equivalent and the new one is proven to be faster, so the PR deserves an approval 👍 🔢 🥇
I thought that CTEs worked like copy-pasting the syntax to make a query more readable, without affecting the query plan or performance. In this case, it looks like the CTE allowed the limit to be applied before the joins. Great finding.
@algorithmy1 First-class work. Thank you.
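The "limit before the joins" effect mentioned in this review can be illustrated with a minimal sketch. It rests on several assumptions: it uses Python's built-in sqlite3 module rather than the PostgreSQL database Marquez runs on, the `jobs` and `job_versions` tables are hypothetical, heavily simplified stand-ins for `jobs_view` and `job_versions`, and the join-then-limit query is an illustrative baseline, not the original query from JobDao.java (which is not shown in this thread). It only checks that both shapes return the same page; it says nothing about timings.

```python
import sqlite3

# Hypothetical, simplified stand-ins for jobs_view and job_versions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (name TEXT, namespace_name TEXT, current_version_uuid TEXT);
    CREATE TABLE job_versions (uuid TEXT, namespace_name TEXT, latest_run_uuid TEXT);
""")
for i in range(50):
    conn.execute(
        "INSERT INTO jobs VALUES (?, ?, ?)",
        (f"job_{i:03d}", "MyNameSpace", f"v{i}"),
    )
    conn.execute(
        "INSERT INTO job_versions VALUES (?, ?, ?)",
        (f"v{i}", "MyNameSpace", f"run{i}"),
    )

params = {"namespaceName": "MyNameSpace", "limit": 5, "offset": 10}

# Baseline (assumed shape): join every job in the namespace, then paginate.
JOIN_THEN_LIMIT = """
    SELECT j.name, jv.latest_run_uuid
    FROM jobs AS j
    LEFT OUTER JOIN job_versions AS jv ON jv.uuid = j.current_version_uuid
    WHERE j.namespace_name = :namespaceName
    ORDER BY j.name
    LIMIT :limit OFFSET :offset
"""

# CTE shape: paginate first, then join only the rows on the requested page.
LIMIT_THEN_JOIN = """
    WITH jobs_page AS (
        SELECT * FROM jobs AS j
        WHERE j.namespace_name = :namespaceName
        ORDER BY j.name
        LIMIT :limit OFFSET :offset
    )
    SELECT j.name, jv.latest_run_uuid
    FROM jobs_page AS j
    LEFT OUTER JOIN job_versions AS jv ON jv.uuid = j.current_version_uuid
    ORDER BY j.name
"""

rows_before = conn.execute(JOIN_THEN_LIMIT, params).fetchall()
rows_after = conn.execute(LIMIT_THEN_JOIN, params).fetchall()
print(rows_before == rows_after, rows_after)
```

The two shapes agree here because each job row matches at most one `job_versions` row. If a join could produce several rows per job, limiting before the join and limiting after it would page over different row sets, so that property is worth checking before applying the same rewrite elsewhere.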
Thank you for the compliment 🥰 By the way, we can improve the query without using
Great job! Congrats on your first merged pull request in the Marquez project!
* perf/ dao query updated
  Signed-off-by: Abdallah Terrab <abdallah@terrab.me>
* ./gradlew spotlessJavaCheck
  Signed-off-by: Abdallah Terrab <abdallah@terrab.me>
---------
Signed-off-by: Abdallah Terrab <abdallah@terrab.me>
Problem
Hello,
While working with Marquez, I noticed a significant performance bottleneck with a specific SQL query in the JobDao.java file. For the namespaceName "MyNameSpace", the query was originally taking 17 seconds to execute with a limit of 100, and 12 seconds with a limit of 25. Given that this query runs every time the Marquez web UI is accessed, this presented a major user experience challenge.
Cluster: db.t4g.medium (vCPU: 2, RAM: 4 GB)
See: #2608
Solution
To address this, I've revised the query. The optimized query makes use of Common Table Expressions to fetch the required data more efficiently and before the join. Here's the optimized query:
WITH jobs_view_page AS (
  SELECT *
  FROM jobs_view AS j
  WHERE j.namespace_name = :namespaceName
  ORDER BY j.name
  LIMIT :limit OFFSET :offset
),
job_versions_temp AS (
  SELECT *
  FROM job_versions AS j
  WHERE j.namespace_name = :namespaceName
),
facets_temp AS (
  SELECT run_uuid, JSON_AGG(e.facet) AS facets
  FROM (
    SELECT jf.run_uuid, jf.facet
    FROM job_facets_view AS jf
    INNER JOIN job_versions_temp jv2 ON jv2.latest_run_uuid = jf.run_uuid
    INNER JOIN jobs_view_page j2 ON j2.current_version_uuid = jv2.uuid
    ORDER BY lineage_event_time ASC
  ) e
  GROUP BY e.run_uuid
)
SELECT j.*, f.facets
FROM jobs_view_page AS j
LEFT OUTER JOIN job_versions_temp AS jv ON jv.uuid = j.current_version_uuid
LEFT OUTER JOIN facets_temp AS f ON f.run_uuid = jv.latest_run_uuid
ORDER BY j.name

On the same cluster, db.t4g.medium (vCPU: 2, RAM: 4 GB), the optimization reduced the execution time from 17 seconds with limit=100 to a mere 300ms. For limit=25, it dropped from 12 seconds to under 100ms.
Furthermore, I believe there's potential for even more optimization. If job_facets_view included the column namespace_name, it might allow for further refinements.
One-line summary: Optimized a critical SQL query in JobDao.java, resulting in a significant reduction in execution time.
Checklist
CHANGELOG.md (Depending on the change, this may not be necessary).
.sql database schema migration according to Flyway's naming convention (if relevant).