Flink fix terminal streaming events #2768

Merged
wslulciuc merged 1 commit into main from streaming-fix on Mar 15, 2024
Conversation

@pawel-big-lebowski pawel-big-lebowski commented Mar 14, 2024

Collaborator

Problem

Marquez creates a new job version for a streaming job whenever the hash of the job version changes. We introduced this assumption because it makes sense most of the time. It does not make sense for terminal events, however. A terminal event for a streaming job, such as COMPLETE, that contains no input or output datasets should mean only that the job has finished. It should not create a new job version, which is the current behaviour.
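To make the intended rule concrete, here is a minimal sketch of the decision for streaming jobs. It is not the code from this PR: the class `TerminalEventSketch`, the `Event` type, and the method names are hypothetical, and the sketch assumes that the terminal run states are COMPLETE, FAIL, and ABORT and that a terminal event which does carry datasets still follows the general hash-based rule.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Simplified, hypothetical model of a lineage event; not Marquez's actual classes.
final class TerminalEventSketch {

  // Run states assumed to end a run.
  private static final Set<String> TERMINAL_TYPES = Set.of("COMPLETE", "FAIL", "ABORT");

  static final class Event {
    final String eventType;      // e.g. "START", "RUNNING", "COMPLETE"
    final List<String> inputs;   // input dataset names
    final List<String> outputs;  // output dataset names

    Event(String eventType, List<String> inputs, List<String> outputs) {
      this.eventType = eventType;
      this.inputs = inputs;
      this.outputs = outputs;
    }
  }

  // True when the event type is one of the terminal run states.
  static boolean isTerminal(Event event) {
    return event.eventType != null
        && TERMINAL_TYPES.contains(event.eventType.toUpperCase(Locale.ROOT));
  }

  // True when the event carries neither input nor output datasets.
  static boolean hasNoDatasets(Event event) {
    return event.inputs.isEmpty() && event.outputs.isEmpty();
  }

  // Assumed rule for streaming jobs: a changed version hash triggers a new
  // job version, except for a terminal event that carries no datasets.
  static boolean shouldCreateStreamingJobVersion(Event event, boolean versionHashChanged) {
    if (isTerminal(event) && hasNoDatasets(event)) {
      return false; // the job has only finished; keep the current version
    }
    return versionHashChanged;
  }
}
```

Under these assumptions, a COMPLETE event with empty input and output lists never produces a new version, even if the computed hash differs, while a RUNNING event that reports datasets still does when the hash changes.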

Closes: #2767

Solution

Please describe your change as it relates to the problem, or bug fix, as well as any dependencies. If your change requires a database schema migration, please describe the schema modification(s) and whether it's a backwards-incompatible or backwards-compatible change.

Note: All database schema changes require discussion. Please link the issue for context.

One-line summary:

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've included a one-line summary of your change for the CHANGELOG.md (Depending on the change, this may not be necessary).
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg boring-cyborg Bot added api API layer changes docs labels Mar 14, 2024
netlify Bot commented Mar 14, 2024


Deploy Preview for peppy-sprite-186812 canceled.

Name Link
🔨 Latest commit 5a0434c
🔍 Latest deploy log https://app.netlify.com/sites/peppy-sprite-186812/deploys/65f41cf1b30d59000853ca8c

codecov Bot commented Mar 14, 2024


Codecov Report

Attention: Patch coverage is 90.00%, with 1 line in your changes missing coverage. Please review.

Project coverage is 84.47%. Comparing base (78a191b) to head (5a0434c).

| Files | Patch % | Lines |
|---|---|---|
| ...main/java/marquez/service/models/LineageEvent.java | 83.33% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff            @@
##               main    #2768   +/-   ##
=========================================
  Coverage     84.46%   84.47%           
- Complexity     1415     1429   +14     
=========================================
  Files           251      251           
  Lines          6450     6460   +10     
  Branches        292      299    +7     
=========================================
+ Hits           5448     5457    +9     
  Misses          850      850           
- Partials        152      153    +1     

☔ View full report in Codecov by Sentry.

Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>

@wslulciuc wslulciuc left a comment

Member

Thanks for improving our lineage support for streaming jobs, @pawel-big-lebowski! The inclusion of a terminal "state" provides a path to (hopefully) making better decisions about the current stage and all subsequent stages of a streaming job. I do feel we need to revisit this logic and document our reasoning, but let's first learn from real-world scenarios how the Marquez metadata model can be improved.

@wslulciuc wslulciuc merged commit 44bf397 into main Mar 15, 2024
@wslulciuc wslulciuc deleted the streaming-fix branch March 15, 2024 15:25
@pawel-big-lebowski

Collaborator Author

@wslulciuc I have the same feeling about this.

jonathanpmoraes referenced this pull request in nubank/NuMarquez Feb 6, 2025
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
Labels

api API layer changes docs

Development

Successfully merging this pull request may close these issues.

Streaming jobs do not cumulate datasets sent through a run

2 participants