Runless events - consume job event #2661

Merged
pawel-big-lebowski merged 5 commits into main from static/job-event
Nov 16, 2023

@pawel-big-lebowski pawel-big-lebowski commented Oct 26, 2023

Collaborator

Problem

As a follow-up to #2641, this PR introduces support for JobEvent. It should be merged after #2641.

Solution

  • Schema changes: a job_version_uuid column is added to the job_facets table. This requires a DB migration to backfill existing job_facets entries.
  • JobDao: the findJobByName method should work without a join to the runs table.
  • Add a spec_event_type column to the lineage_events table to indicate which type of event is stored.

Limitations:

  • The listLineage endpoint filters for RunEvent only and does not support the newly introduced event types. Support can be added in a separate PR and issue.
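
A minimal sketch of how the stored event type could be derived, assuming the parsed event is available as a map of its top-level properties. It relies on the OpenLineage convention that a RunEvent carries a `run`, a JobEvent carries a `job` without a `run`, and a DatasetEvent carries a `dataset`. The class name, the method name and the returned labels are hypothetical and are not taken from this PR.

```java
import java.util.Map;

// Hypothetical helper, not part of Marquez: infers the event kind from top-level keys.
final class SpecEventTypeSketch {
  private SpecEventTypeSketch() {}

  // Returns an illustrative label for the kind of OpenLineage event.
  static String specEventTypeOf(Map<String, ?> event) {
    if (event.containsKey("run")) {
      // A run event also has a job, so "run" must be checked first.
      return "RUN_EVENT";
    }
    if (event.containsKey("job")) {
      return "JOB_EVENT";
    }
    if (event.containsKey("dataset")) {
      return "DATASET_EVENT";
    }
    throw new IllegalArgumentException("Unrecognized OpenLineage event: " + event.keySet());
  }
}
```

An endpoint limited to run events, as described in the limitation above, would then only return rows whose stored type corresponds to the first branch.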

Note: All database schema changes require discussion. Please link the issue for context.

One-line summary:

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've included a one-line summary of your change for the CHANGELOG.md (Depending on the change, this may not be necessary).
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg boring-cyborg Bot added the api API layer changes label Oct 26, 2023
@pawel-big-lebowski pawel-big-lebowski force-pushed the static/job-event branch 3 times, most recently from 5158fc2 to bdcfec7 Compare October 31, 2023 12:48
codecov Bot commented Oct 31, 2023

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (3a26e50) 83.76% compared to head (0a3f98a) 84.05%.

| Files | Patch % | Lines |
|---|---|---|
| api/src/main/java/marquez/db/OpenLineageDao.java | 92.10% | 4 Missing and 2 partials ⚠️ |
| .../migrations/V66_3_JobFacetsBackfillJobVersion.java | 81.81% | 2 Missing ⚠️ |
| ...src/main/java/marquez/api/OpenLineageResource.java | 75.00% | 0 Missing and 1 partial ⚠️ |
| api/src/main/java/marquez/db/models/ModelDaos.java | 75.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2661      +/-   ##
============================================
+ Coverage     83.76%   84.05%   +0.28%     
- Complexity     1338     1379      +41     
============================================
  Files           247      248       +1     
  Lines          6112     6297     +185     
  Branches        281      286       +5     
============================================
+ Hits           5120     5293     +173     
- Misses          843      851       +8     
- Partials        149      153       +4     


@pawel-big-lebowski pawel-big-lebowski force-pushed the static/job-event branch 4 times, most recently from 32deb1a to 151ed6b Compare November 1, 2023 11:09
@boring-cyborg boring-cyborg Bot added the docs label Nov 1, 2023
@pawel-big-lebowski pawel-big-lebowski marked this pull request as ready for review November 1, 2023 11:32
@pawel-big-lebowski pawel-big-lebowski marked this pull request as draft November 1, 2023 11:54
@pawel-big-lebowski pawel-big-lebowski marked this pull request as ready for review November 1, 2023 12:30
Base automatically changed from static/dataset-event to main November 6, 2023 07:16
Comment thread CHANGELOG.md Outdated
) e
GROUP BY e.run_uuid
) f ON f.run_uuid=jv.latest_run_uuid
LEFT OUTER JOIN job_versions_facets f ON j.current_version_uuid = f.job_version_uuid

Member

💯

* @param jobRow The job.
* @return A {@link BagOfJobVersionInfo} object.
*/
default BagOfJobVersionInfo upsertRunlessJobVersion(

Member

Minor: We can consider factoring out common code in JobVersionDao.upsertJobVersionOnRunTransition() and upsertRunlessJobVersion() to avoid duplication. But, not a major concern as we are aware we'll need to revisit some of this code later.

Collaborator Author

Yes, we can refactor this section further later.

Comment thread api/src/main/resources/marquez/db/migration/V66.1__job_facets_changes.sql Outdated
Comment thread api/src/main/resources/marquez/db/migration/V66.1__job_facets_changes.sql Outdated
CREATE INDEX job_facets_job_version_uuid ON job_facets (job_version_uuid);

ALTER TABLE lineage_events ADD COLUMN spec_event_type VARCHAR(64);
UPDATE lineage_events SET spec_event_type = 'RunEvent';

Member

Do we want to set all events to RunEvent? Also, I'd define the _event_type enum as RUN_EVENT, DATASET_EVENT, or JOB_EVENT.

Collaborator Author

I switched to an enum and set a default value instead of running an UPDATE on the table.

netlify Bot commented Nov 6, 2023

Deploy Preview for peppy-sprite-186812 canceled.

Name Link
🔨 Latest commit 0a3f98a
🔍 Latest deploy log https://app.netlify.com/sites/peppy-sprite-186812/deploys/65549cb91e31ca0008c63c38

@@ -0,0 +1,4 @@
CREATE TYPE EVENT_TYPE AS ENUM ('RUN_EVENT', 'DATASET_EVENT', 'JOB_EVENT');

@wslulciuc wslulciuc Nov 15, 2023

Member

Minor: We standardized on not defining enum types in the DB layer but rather enforcing them in the application layer. This avoids a DB migration every time we (might) add a new event type.

tl;dr, I'd just define _event_type as a string

Collaborator Author

Added an extra commit for that: 0a3f98a
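
For illustration, enforcing the event type in the application layer while storing it as a plain string could look roughly like the sketch below. The enum name `LineageEventType` and the method `fromColumn` are hypothetical and are not taken from the commit. Treating a missing value as `RUN_EVENT` is an assumption, mirroring the default mentioned earlier in this thread.

```java
import java.util.Locale;

// Hypothetical application-layer enum; the database column stays a plain VARCHAR.
enum LineageEventType {
  RUN_EVENT,
  DATASET_EVENT,
  JOB_EVENT;

  // Parses the stored column value. Assumption: null or blank means a legacy run event.
  static LineageEventType fromColumn(String value) {
    if (value == null || value.isBlank()) {
      return RUN_EVENT;
    }
    // valueOf throws IllegalArgumentException for unknown names, which rejects
    // values that the application does not recognize.
    return valueOf(value.trim().toUpperCase(Locale.ROOT));
  }
}
```

With this shape, adding a new event type would only mean adding an enum constant, with no schema migration.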


@SqlUpdate(
"INSERT INTO lineage_events ("
+ "event_type, "

@wslulciuc wslulciuc Nov 15, 2023

Member

Minor: We can consider defining a _run_state column and eventually dropping the event_type. That is, we can consider columns prefixed with _ to be "remappings" of OL properties to Marquez.

@wslulciuc wslulciuc left a comment

Member

Minor comments / suggestions, otherwise 💯 💯 🥇

Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
@pawel-big-lebowski pawel-big-lebowski merged commit 60d7d90 into main Nov 16, 2023
@pawel-big-lebowski pawel-big-lebowski deleted the static/job-event branch November 16, 2023 06:57
@wslulciuc wslulciuc added this to the 0.43.0 milestone Dec 13, 2023
jonathanpmoraes referenced this pull request in nubank/NuMarquez Feb 6, 2025
* Runless events - consume job event

Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>

* Runless events - fix listLineage API

Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>

* fix rebase

Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>

* modify event type column

Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>

* make _event_type varchar(64) instead of enum

Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>

---------

Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>

Labels

api API layer changes docs

Projects

Status: Done

2 participants