Add col `current_run_uuid` to `jobs` by wslulciuc · Pull Request #2929 · MarquezProject/marquez

wslulciuc · 2024-10-17T07:10:03Z

This PR replaces #2928 with a slightly more optimized approach: it adds a `current_run_uuid` column to `jobs`.
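The idea behind the new column can be sketched as follows. This is a minimal illustration, not the actual Marquez schema or migration: it uses an in-memory SQLite database in place of PostgreSQL, the table definitions are heavily simplified, and the `start_run` helper and the `etl_orders` job name are hypothetical. Only the column name `current_run_uuid` and the join condition `r.uuid = j.current_run_uuid` come from this PR.

```python
import sqlite3
import uuid

# In-memory SQLite stands in for PostgreSQL; the schema is a simplified assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runs (
        uuid TEXT PRIMARY KEY,
        job_uuid TEXT NOT NULL,
        current_run_state TEXT,
        started_at TEXT
    );
    CREATE TABLE jobs (
        uuid TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        current_run_uuid TEXT REFERENCES runs(uuid)  -- the column this PR adds
    );
""")


def start_run(conn, job_uuid, started_at):
    """Hypothetical helper: record a run and point the job at it."""
    run_uuid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO runs (uuid, job_uuid, current_run_state, started_at) "
        "VALUES (?, ?, 'RUNNING', ?)",
        (run_uuid, job_uuid, started_at),
    )
    # Keep the denormalized pointer up to date at write time.
    conn.execute(
        "UPDATE jobs SET current_run_uuid = ? WHERE uuid = ?",
        (run_uuid, job_uuid),
    )
    return run_uuid


job_uuid = str(uuid.uuid4())
conn.execute("INSERT INTO jobs (uuid, name) VALUES (?, ?)", (job_uuid, "etl_orders"))
first_run = start_run(conn, job_uuid, "2024-10-17T07:00:00Z")
latest_run = start_run(conn, job_uuid, "2024-10-17T08:00:00Z")

# Reading the current run is now a join on the stored pointer,
# mirroring the plan's `Hash Cond: (r.uuid = j.current_run_uuid)`.
row = conn.execute(
    "SELECT j.name, r.uuid, r.current_run_state "
    "FROM jobs j LEFT JOIN runs r ON r.uuid = j.current_run_uuid "
    "WHERE j.uuid = ?",
    (job_uuid,),
).fetchone()
```

The trade-off in this sketch is an extra `UPDATE` on every run start in exchange for a read path that no longer has to search `runs` for the most recent entry.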

`SQL` Perf

EXPLAIN plan

Limit  (cost=77.73..77.74 rows=5 width=742)
  CTE jobs_view_page
    ->  Hash Left Join  (cost=1.11..2.23 rows=5 width=565)
          Hash Cond: (j_2.parent_job_uuid = p.uuid)
          ->  Seq Scan on jobs j_2  (cost=0.00..1.06 rows=5 width=385)
                Filter: ((is_hidden IS FALSE) AND (symlink_target_uuid IS NULL) AND ((namespace_name)::text = 'namespace1996501068'::text))
          ->  Hash  (cost=1.05..1.05 rows=5 width=29)
                ->  Seq Scan on jobs p  (cost=0.00..1.05 rows=5 width=29)
  CTE job_versions_temp
    ->  Seq Scan on job_versions j_3  (cost=0.00..2.61 rows=49 width=661)
          Filter: ((namespace_name)::text = 'namespace1996501068'::text)
  ->  Sort  (cost=72.89..72.90 rows=5 width=742)
        Sort Key: j.updated_at DESC
        ->  Nested Loop Left Join  (cost=68.61..72.83 rows=5 width=742)
              Join Filter: (j.uuid = jt.uuid)
              ->  Hash Left Join  (cost=8.48..12.24 rows=5 width=710)
                    Hash Cond: (jv.latest_run_uuid = f.run_uuid)
                    ->  Hash Right Join  (cost=1.44..5.20 rows=5 width=694)
                          Hash Cond: (r.uuid = j.current_run_uuid)
                          Filter: (((r.current_run_state)::text = ANY ('{RUNNING,COMPLETED,FAILED}'::text[])) OR (r.uuid IS NULL))
                          ->  Seq Scan on runs r  (cost=0.00..3.50 rows=50 width=24)
                          ->  Hash  (cost=1.38..1.38 rows=5 width=694)
                                ->  Hash Right Join  (cost=0.16..1.38 rows=5 width=694)
                                      Hash Cond: (jv.uuid = j.current_version_uuid)
                                      ->  CTE Scan on job_versions_temp jv  (cost=0.00..0.98 rows=49 width=32)
                                      ->  Hash  (cost=0.10..0.10 rows=5 width=678)
                                            ->  CTE Scan on jobs_view_page j  (cost=0.00..0.10 rows=5 width=678)
                    ->  Hash  (cost=6.97..6.97 rows=5 width=48)
                          ->  Subquery Scan on f  (cost=6.86..6.97 rows=5 width=48)
                                ->  HashAggregate  (cost=6.86..6.92 rows=5 width=48)
                                      Group Key: job_facets.run_uuid
                                      ->  Sort  (cost=6.77..6.79 rows=5 width=56)
                                            Sort Key: job_facets.lineage_event_time
                                            ->  Nested Loop  (cost=0.31..6.72 rows=5 width=56)
                                                  ->  Hash Join  (cost=0.16..1.38 rows=5 width=16)
                                                        Hash Cond: (jv2.uuid = j2.current_version_uuid)
                                                        ->  CTE Scan on job_versions_temp jv2  (cost=0.00..0.98 rows=49 width=32)
                                                        ->  Hash  (cost=0.10..0.10 rows=5 width=16)
                                                              ->  CTE Scan on jobs_view_page j2  (cost=0.00..0.10 rows=5 width=16)
                                                  ->  Index Scan using job_facets_run_uuid_index on job_facets  (cost=0.14..1.06 rows=1 width=56)
                                                        Index Cond: (run_uuid = jv2.latest_run_uuid)
              ->  Materialize  (cost=60.14..60.28 rows=5 width=48)
                    ->  Subquery Scan on jt  (cost=60.14..60.25 rows=5 width=48)
                          ->  HashAggregate  (cost=60.14..60.20 rows=5 width=48)
                                Group Key: j_1.uuid
                                ->  Hash Join  (cost=25.75..54.14 rows=1200 width=48)
                                      Hash Cond: (jtm.tag_uuid = t.uuid)
                                      ->  Hash Join  (cost=1.12..26.34 rows=1200 width=32)
                                            Hash Cond: (jtm.job_uuid = j_1.uuid)
                                            ->  Seq Scan on jobs_tag_mapping jtm  (cost=0.00..22.00 rows=1200 width=32)
                                            ->  Hash  (cost=1.06..1.06 rows=5 width=16)
                                                  ->  Seq Scan on jobs j_1  (cost=0.00..1.06 rows=5 width=16)
                                                        Filter: ((namespace_name)::text = 'namespace1996501068'::text)
                                      ->  Hash  (cost=16.50..16.50 rows=650 width=48)
                                            ->  Seq Scan on tags t  (cost=0.00..16.50 rows=650 width=48)

This PR also updates how the job type is displayed within the jobs list.

Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com>

netlify · 2024-10-17T07:10:27Z

✅ Deploy Preview for peppy-sprite-186812 canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | `caa49cf` |
| 🔍 Latest deploy log | https://app.netlify.com/sites/peppy-sprite-186812/deploys/67116955b41c6600082f01a4 |

codecov · 2024-10-17T07:29:31Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.13%. Comparing base (05d16aa) to head (caa49cf).
Report is 1 commit behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #2929      +/-   ##
============================================
+ Coverage     81.12%   81.13%   +0.01%     
- Complexity     1505     1507       +2     
============================================
  Files           268      268              
  Lines          7358     7364       +6     
  Branches        330      330              
============================================
+ Hits           5969     5975       +6     
  Misses         1228     1228              
  Partials        161      161


phixMe

🐑 it!

* Add col current_run_uuid to jobs
  Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com>
* Apply formatting
  Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com>

---------

Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com>

…roject#2987)

The findByLatestJob query was scanning millions of rows in run_facets and dataset_facets by joining runs_view with jobs_view in the WHERE clause. This caused 7+ minute query times per job, making the jobs list page unusable at scale.

Two changes:
- Add findCurrentRunByJob using jobs.current_run_uuid (from MarquezProject#2929) for the jobs list page, reducing it to a single indexed UUID lookup
- Optimize findByLatestJob to filter on runs.job_uuid (indexed) with symlink resolution, instead of the expensive runs_view/jobs_view join

Closes MarquezProject#2987

Signed-off-by: jni-bot <jni-bot@users.noreply.github.com>

…3091)

The findByLatestJob query was scanning millions of rows in run_facets and dataset_facets by joining runs_view with jobs_view in the WHERE clause. This caused 7+ minute query times per job, making the jobs list page unusable at scale.

Two changes:
- Add findCurrentRunByJob using jobs.current_run_uuid (from #2929) for the jobs list page, reducing it to a single indexed UUID lookup
- Optimize findByLatestJob to filter on runs.job_uuid (indexed) with symlink resolution, instead of the expensive runs_view/jobs_view join

Closes #2987

Signed-off-by: jni-bot <jni-bot@users.noreply.github.com>
Co-authored-by: jni-bot <jni-bot@users.noreply.github.com>
Co-authored-by: Michael Robinson <68482867+merobi-hub@users.noreply.github.com>
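The difference between the two lookup strategies described in that commit message can be illustrated with a small sketch. It assumes a simplified schema in an in-memory SQLite database rather than the real Marquez tables and views. The function names `find_latest_run_by_scan` and `find_current_run_by_job` are hypothetical Python stand-ins, loosely named after the DAO methods mentioned above, and do not reproduce the actual queries.

```python
import sqlite3

# Simplified, assumed schema; the sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE runs (uuid TEXT PRIMARY KEY, job_uuid TEXT, started_at TEXT);
    CREATE TABLE jobs (uuid TEXT PRIMARY KEY, current_run_uuid TEXT);
    INSERT INTO runs VALUES
        ('run-1', 'job-a', '2024-10-17T07:00:00Z'),
        ('run-2', 'job-a', '2024-10-17T08:00:00Z'),
        ('run-3', 'job-b', '2024-10-17T09:00:00Z');
    INSERT INTO jobs VALUES ('job-a', 'run-2'), ('job-b', 'run-3');
""")


def find_latest_run_by_scan(conn, job_uuid):
    """Search the job's runs and keep the most recently started one."""
    row = conn.execute(
        "SELECT uuid FROM runs WHERE job_uuid = ? "
        "ORDER BY started_at DESC LIMIT 1",
        (job_uuid,),
    ).fetchone()
    return row[0] if row else None


def find_current_run_by_job(conn, job_uuid):
    """Follow jobs.current_run_uuid: one primary-key lookup on runs."""
    row = conn.execute(
        "SELECT r.uuid FROM jobs j "
        "JOIN runs r ON r.uuid = j.current_run_uuid "
        "WHERE j.uuid = ?",
        (job_uuid,),
    ).fetchone()
    return row[0] if row else None


by_scan = find_latest_run_by_scan(conn, "job-a")
by_pointer = find_current_run_by_job(conn, "job-a")
```

Both functions return the same run here; the pointer-based version simply avoids ordering the job's run history at read time, which is the property the jobs list page relies on.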

Add col current_run_uuid to jobs

43d0e8a

Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com>

boring-cyborg Bot added the `api` (API layer changes) and `web` labels Oct 17, 2024

Apply formatting

0521b04

Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com>

wslulciuc enabled auto-merge (squash) October 17, 2024 07:21

wslulciuc added 2 commits October 17, 2024 12:02

Remove transaction in V74__alter_jobs_to_add_current_run_uuid.sql

a099140

Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com>

Order by started_at (desc)

caa49cf

Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com>

wslulciuc requested a review from phixMe October 17, 2024 19:46

wslulciuc added this to the 0.50.0 milestone Oct 17, 2024

phixMe approved these changes Oct 17, 2024

View reviewed changes

wslulciuc merged commit 90a2f65 into main Oct 17, 2024

wslulciuc deleted the feature/add-current-run-uuid-to-jobs branch October 17, 2024 19:59

jni-bot mentioned this pull request Jan 23, 2026

Very slow /api/v1/jobs endpoint after upgrading to 0.50.0 #2987

Closed

jni-bot mentioned this pull request Feb 12, 2026

Fix slow /api/v1/jobs endpoint by optimizing RunDao queries #3091

Merged

4 tasks


wslulciuc merged 4 commits into main from feature/add-current-run-uuid-to-jobs
