Add col current_run_uuid to jobs #2929

Merged
wslulciuc merged 4 commits into main from feature/add-current-run-uuid-to-jobs on Oct 17, 2024

@wslulciuc (Member) commented Oct 17, 2024

This PR replaces #2928 with a slightly more optimized approach: it adds the column jobs.current_run_uuid.
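To illustrate the idea behind the new column, here is a minimal sketch, not the PR's actual migration or DAO code. It assumes SQLite in place of PostgreSQL, a heavily simplified schema, and a hypothetical `start_run` helper; only `jobs.current_run_uuid` and `runs.current_run_state` are names taken from this PR. The assumption is that the pointer is updated in the same transaction that records the run, so readers never have to search `runs` for a job's current run.

```python
import sqlite3
import uuid


def connect():
    """Create an in-memory database with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE jobs (
            uuid TEXT PRIMARY KEY,
            name TEXT NOT NULL,
            current_run_uuid TEXT  -- denormalized pointer to the job's current run
        );
        CREATE TABLE runs (
            uuid TEXT PRIMARY KEY,
            job_uuid TEXT NOT NULL REFERENCES jobs (uuid),
            current_run_state TEXT NOT NULL
        );
        """
    )
    return conn


def start_run(conn, job_uuid, state="RUNNING"):
    """Hypothetical write path: insert a run and repoint the job at it."""
    run_uuid = str(uuid.uuid4())
    with conn:  # both statements commit together or not at all
        conn.execute(
            "INSERT INTO runs (uuid, job_uuid, current_run_state) VALUES (?, ?, ?)",
            (run_uuid, job_uuid, state),
        )
        conn.execute(
            "UPDATE jobs SET current_run_uuid = ? WHERE uuid = ?",
            (run_uuid, job_uuid),
        )
    return run_uuid
```

With this shape, reading a job's current run is a primary-key lookup on `runs` rather than a search over all of the job's runs.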

SQL Perf

EXPLAIN plan

Limit  (cost=77.73..77.74 rows=5 width=742)
  CTE jobs_view_page
    ->  Hash Left Join  (cost=1.11..2.23 rows=5 width=565)
          Hash Cond: (j_2.parent_job_uuid = p.uuid)
          ->  Seq Scan on jobs j_2  (cost=0.00..1.06 rows=5 width=385)
                Filter: ((is_hidden IS FALSE) AND (symlink_target_uuid IS NULL) AND ((namespace_name)::text = 'namespace1996501068'::text))
          ->  Hash  (cost=1.05..1.05 rows=5 width=29)
                ->  Seq Scan on jobs p  (cost=0.00..1.05 rows=5 width=29)
  CTE job_versions_temp
    ->  Seq Scan on job_versions j_3  (cost=0.00..2.61 rows=49 width=661)
          Filter: ((namespace_name)::text = 'namespace1996501068'::text)
  ->  Sort  (cost=72.89..72.90 rows=5 width=742)
        Sort Key: j.updated_at DESC
        ->  Nested Loop Left Join  (cost=68.61..72.83 rows=5 width=742)
              Join Filter: (j.uuid = jt.uuid)
              ->  Hash Left Join  (cost=8.48..12.24 rows=5 width=710)
                    Hash Cond: (jv.latest_run_uuid = f.run_uuid)
                    ->  Hash Right Join  (cost=1.44..5.20 rows=5 width=694)
                          Hash Cond: (r.uuid = j.current_run_uuid)
                          Filter: (((r.current_run_state)::text = ANY ('{RUNNING,COMPLETED,FAILED}'::text[])) OR (r.uuid IS NULL))
                          ->  Seq Scan on runs r  (cost=0.00..3.50 rows=50 width=24)
                          ->  Hash  (cost=1.38..1.38 rows=5 width=694)
                                ->  Hash Right Join  (cost=0.16..1.38 rows=5 width=694)
                                      Hash Cond: (jv.uuid = j.current_version_uuid)
                                      ->  CTE Scan on job_versions_temp jv  (cost=0.00..0.98 rows=49 width=32)
                                      ->  Hash  (cost=0.10..0.10 rows=5 width=678)
                                            ->  CTE Scan on jobs_view_page j  (cost=0.00..0.10 rows=5 width=678)
                    ->  Hash  (cost=6.97..6.97 rows=5 width=48)
                          ->  Subquery Scan on f  (cost=6.86..6.97 rows=5 width=48)
                                ->  HashAggregate  (cost=6.86..6.92 rows=5 width=48)
                                      Group Key: job_facets.run_uuid
                                      ->  Sort  (cost=6.77..6.79 rows=5 width=56)
                                            Sort Key: job_facets.lineage_event_time
                                            ->  Nested Loop  (cost=0.31..6.72 rows=5 width=56)
                                                  ->  Hash Join  (cost=0.16..1.38 rows=5 width=16)
                                                        Hash Cond: (jv2.uuid = j2.current_version_uuid)
                                                        ->  CTE Scan on job_versions_temp jv2  (cost=0.00..0.98 rows=49 width=32)
                                                        ->  Hash  (cost=0.10..0.10 rows=5 width=16)
                                                              ->  CTE Scan on jobs_view_page j2  (cost=0.00..0.10 rows=5 width=16)
                                                  ->  Index Scan using job_facets_run_uuid_index on job_facets  (cost=0.14..1.06 rows=1 width=56)
                                                        Index Cond: (run_uuid = jv2.latest_run_uuid)
              ->  Materialize  (cost=60.14..60.28 rows=5 width=48)
                    ->  Subquery Scan on jt  (cost=60.14..60.25 rows=5 width=48)
                          ->  HashAggregate  (cost=60.14..60.20 rows=5 width=48)
                                Group Key: j_1.uuid
                                ->  Hash Join  (cost=25.75..54.14 rows=1200 width=48)
                                      Hash Cond: (jtm.tag_uuid = t.uuid)
                                      ->  Hash Join  (cost=1.12..26.34 rows=1200 width=32)
                                            Hash Cond: (jtm.job_uuid = j_1.uuid)
                                            ->  Seq Scan on jobs_tag_mapping jtm  (cost=0.00..22.00 rows=1200 width=32)
                                            ->  Hash  (cost=1.06..1.06 rows=5 width=16)
                                                  ->  Seq Scan on jobs j_1  (cost=0.00..1.06 rows=5 width=16)
                                                        Filter: ((namespace_name)::text = 'namespace1996501068'::text)
                                      ->  Hash  (cost=16.50..16.50 rows=650 width=48)
                                            ->  Seq Scan on tags t  (cost=0.00..16.50 rows=650 width=48)
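
The join on `r.uuid = j.current_run_uuid` and its filter in the plan above can be reproduced in a small sketch. This assumes SQLite instead of PostgreSQL and a reduced schema, and it omits the parent-job, version, facet, and tag joins; the SQL text is reconstructed from the plan's `Hash Cond`, `Filter`, `Sort Key`, and `Limit` nodes, not copied from the PR. The state `NEW` in the sample data stands for any state outside the three listed in the filter.

```python
import sqlite3

# Reconstructed from the plan: outer join on the pointer column, then a filter
# that keeps jobs with no run at all or a run in one of three states.
LIST_JOBS_SQL = """
SELECT j.name, r.uuid AS run_uuid, r.current_run_state
FROM jobs j
LEFT JOIN runs r ON r.uuid = j.current_run_uuid
WHERE r.current_run_state IN ('RUNNING', 'COMPLETED', 'FAILED')
   OR r.uuid IS NULL
ORDER BY j.updated_at DESC
LIMIT ?
"""


def list_jobs(conn, limit=5):
    """Return one page of jobs with their current run, newest first."""
    return conn.execute(LIST_JOBS_SQL, (limit,)).fetchall()


def demo():
    """Build sample data (all values are made up) and run the list query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE runs (uuid TEXT PRIMARY KEY, current_run_state TEXT);
        CREATE TABLE jobs (
            uuid TEXT PRIMARY KEY,
            name TEXT,
            updated_at TEXT,
            current_run_uuid TEXT
        );
        INSERT INTO runs VALUES ('r1', 'COMPLETED'), ('r2', 'NEW');
        INSERT INTO jobs VALUES
            ('j1', 'has_completed_run', '2024-10-17T03:00:00Z', 'r1'),
            ('j2', 'never_ran',         '2024-10-17T02:00:00Z', NULL),
            ('j3', 'run_in_new_state',  '2024-10-17T01:00:00Z', 'r2');
        """
    )
    return list_jobs(conn)
```

Because the filter is applied after the outer join, a job whose current run is in any other state is dropped from the page, while a job that has never run is kept with NULL run columns.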

This PR also updates how the job type is displayed within the jobs list:

Screenshot 2024-10-17 at 12 19 23 AM

Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com>
@boring-cyborg (bot) added the api (API layer changes) and web labels Oct 17, 2024

netlify Bot commented Oct 17, 2024


Deploy Preview for peppy-sprite-186812 canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | caa49cf |
| 🔍 Latest deploy log | https://app.netlify.com/sites/peppy-sprite-186812/deploys/67116955b41c6600082f01a4 |

Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com>
@wslulciuc wslulciuc enabled auto-merge (squash) October 17, 2024 07:21

codecov Bot commented Oct 17, 2024


Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.13%. Comparing base (05d16aa) to head (caa49cf).
Report is 1 commit behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2929      +/-   ##
============================================
+ Coverage     81.12%   81.13%   +0.01%     
- Complexity     1505     1507       +2     
============================================
  Files           268      268              
  Lines          7358     7364       +6     
  Branches        330      330              
============================================
+ Hits           5969     5975       +6     
  Misses         1228     1228              
  Partials        161      161              

Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com>
Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com>
@wslulciuc wslulciuc requested a review from phixMe October 17, 2024 19:46
@wslulciuc wslulciuc added this to the 0.50.0 milestone Oct 17, 2024

@phixMe (Member) left a comment

🐑 it!

@wslulciuc wslulciuc merged commit 90a2f65 into main Oct 17, 2024
@wslulciuc wslulciuc deleted the feature/add-current-run-uuid-to-jobs branch October 17, 2024 19:59
jonathanpmoraes referenced this pull request in nubank/NuMarquez Feb 6, 2025
* Add col current_run_uuid to jobs

Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com>

* Apply formatting

Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com>

---------

Signed-off-by: Willy Lulciuc <willy.lulciuc@gmail.com>
jni-bot added a commit to jni-bot/marquez that referenced this pull request Feb 12, 2026
…roject#2987)

The findByLatestJob query was scanning millions of rows in run_facets
and dataset_facets by joining runs_view with jobs_view in the WHERE
clause. This caused 7+ minute query times per job, making the jobs
list page unusable at scale.

Two changes:
- Add findCurrentRunByJob using jobs.current_run_uuid (from MarquezProject#2929)
  for the jobs list page, reducing it to a single indexed UUID lookup
- Optimize findByLatestJob to filter on runs.job_uuid (indexed) with
  symlink resolution, instead of the expensive runs_view/jobs_view join

Closes MarquezProject#2987

Signed-off-by: jni-bot <jni-bot@users.noreply.github.com>
merobi-hub added a commit that referenced this pull request Apr 12, 2026
…3091)

The findByLatestJob query was scanning millions of rows in run_facets
and dataset_facets by joining runs_view with jobs_view in the WHERE
clause. This caused 7+ minute query times per job, making the jobs
list page unusable at scale.

Two changes:
- Add findCurrentRunByJob using jobs.current_run_uuid (from #2929)
  for the jobs list page, reducing it to a single indexed UUID lookup
- Optimize findByLatestJob to filter on runs.job_uuid (indexed) with
  symlink resolution, instead of the expensive runs_view/jobs_view join

Closes #2987

Signed-off-by: jni-bot <jni-bot@users.noreply.github.com>
Co-authored-by: jni-bot <jni-bot@users.noreply.github.com>
Co-authored-by: Michael Robinson <68482867+merobi-hub@users.noreply.github.com>
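
The two lookups described in the commit message above can be contrasted in a rough sketch. The function names mirror `findCurrentRunByJob` and `findByLatestJob`, but the SQL bodies, the `created_at` column, the index name, and the use of SQLite are assumptions for illustration; symlink resolution is omitted.

```python
import sqlite3


def find_current_run_by_job(conn, job_uuid):
    """Follow jobs.current_run_uuid: one primary-key lookup on runs."""
    return conn.execute(
        """
        SELECT r.uuid, r.current_run_state
        FROM jobs j
        JOIN runs r ON r.uuid = j.current_run_uuid
        WHERE j.uuid = ?
        """,
        (job_uuid,),
    ).fetchone()


def find_latest_run_by_job(conn, job_uuid):
    """Filter runs on the indexed job_uuid column and take the newest row."""
    return conn.execute(
        """
        SELECT uuid, current_run_state
        FROM runs
        WHERE job_uuid = ?
        ORDER BY created_at DESC
        LIMIT 1
        """,
        (job_uuid,),
    ).fetchone()


def build_demo_db():
    """Sample data (made up): job j1 has two runs, job j2 has none."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE jobs (uuid TEXT PRIMARY KEY, current_run_uuid TEXT);
        CREATE TABLE runs (
            uuid TEXT PRIMARY KEY,
            job_uuid TEXT NOT NULL,
            created_at TEXT NOT NULL,
            current_run_state TEXT NOT NULL
        );
        CREATE INDEX runs_job_uuid_idx ON runs (job_uuid);
        INSERT INTO runs VALUES
            ('r1', 'j1', '2024-10-16T00:00:00Z', 'COMPLETED'),
            ('r2', 'j1', '2024-10-17T00:00:00Z', 'RUNNING');
        INSERT INTO jobs VALUES ('j1', 'r2'), ('j2', NULL);
        """
    )
    return conn
```

As long as the pointer is kept up to date, both functions return the same run for a job; the first avoids sorting the job's runs entirely, which is why it suits the jobs list page.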