
fix column lineage returning multiple entries for job run multiple times #2176

Merged
pawel-big-lebowski merged 1 commit into main from fix-column-lineaege-mulitple-runs on Oct 10, 2022

pawel-big-lebowski commented Oct 10, 2022


Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>

Problem

Each row of the column lineage table contains the columns output_dataset_version_uuid, output_dataset_field_uuid, input_dataset_version_uuid and input_dataset_field_uuid. Because each row references specific dataset versions, a single job that reads one column and writes to another produces two rows in the column lineage table when it is run twice, one per output dataset version. This is fine, as we want to track this information, but the endpoint should not return the same dependency twice.
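A minimal sketch of how the duplicate rows arise, assuming that field UUIDs stay stable across runs while every run writes a new output dataset version. The ColumnLineageRow record and the simulateRuns method are hypothetical illustrations, not Marquez classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class ColumnLineageRows {

  // Hypothetical mirror of one column lineage row: the four UUID columns
  // named in the description. Not a Marquez class.
  record ColumnLineageRow(
      UUID outputDatasetVersionUuid,
      UUID outputDatasetFieldUuid,
      UUID inputDatasetVersionUuid,
      UUID inputDatasetFieldUuid) {}

  // Simulates a job that reads one input column and writes one output column,
  // run `runs` times. Assumption: field UUIDs are stable across runs, while
  // each run writes a new output dataset version.
  static List<ColumnLineageRow> simulateRuns(
      int runs, UUID inputVersion, UUID inputField, UUID outputField) {
    List<ColumnLineageRow> rows = new ArrayList<>();
    for (int i = 0; i < runs; i++) {
      UUID outputVersion = UUID.randomUUID(); // new output version per run
      rows.add(new ColumnLineageRow(outputVersion, outputField, inputVersion, inputField));
    }
    return rows;
  }

  public static void main(String[] args) {
    List<ColumnLineageRow> rows =
        simulateRuns(2, UUID.randomUUID(), UUID.randomUUID(), UUID.randomUUID());
    // Two runs yield two distinct rows, although both describe the same
    // column-to-column dependency.
    System.out.println(rows.size()); // prints 2
  }
}
```

Returning these rows as-is would report the same input-to-output column dependency once per run.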

Solution

Return each column dependency only once, even if the job has been run several times.
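The idea behind the fix can be sketched in Java as collapsing rows that differ only in their dataset version UUIDs into a single (output field, input field) dependency, assuming a field UUID identifies a column independently of dataset version. This is a hypothetical in-memory illustration; the actual change lives in ColumnLineageDao.java and may be implemented differently, for example directly in SQL. The record and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

public class ColumnLineageDedup {

  // Hypothetical row shape mirroring the four UUID columns of a lineage row.
  record ColumnLineageRow(
      UUID outputDatasetVersionUuid,
      UUID outputDatasetFieldUuid,
      UUID inputDatasetVersionUuid,
      UUID inputDatasetFieldUuid) {}

  // A column dependency ignores dataset versions: it only states
  // "this output field depends on this input field".
  record ColumnDependency(UUID outputDatasetFieldUuid, UUID inputDatasetFieldUuid) {}

  // Collapses rows that differ only in their dataset version UUIDs,
  // preserving first-seen order.
  static List<ColumnDependency> distinctDependencies(List<ColumnLineageRow> rows) {
    Set<ColumnDependency> seen = new LinkedHashSet<>();
    for (ColumnLineageRow row : rows) {
      seen.add(new ColumnDependency(row.outputDatasetFieldUuid(), row.inputDatasetFieldUuid()));
    }
    return new ArrayList<>(seen);
  }

  public static void main(String[] args) {
    UUID inputVersion = UUID.randomUUID();
    UUID inputField = UUID.randomUUID();
    UUID outputField = UUID.randomUUID();
    // Same job run twice: each run writes a new output dataset version.
    List<ColumnLineageRow> rows = List.of(
        new ColumnLineageRow(UUID.randomUUID(), outputField, inputVersion, inputField),
        new ColumnLineageRow(UUID.randomUUID(), outputField, inputVersion, inputField));
    // prints "2 rows -> 1 dependency"
    System.out.println(rows.size() + " rows -> " + distinctDependencies(rows).size() + " dependency");
  }
}
```

A LinkedHashSet is used so that record equality on the two field UUIDs removes duplicates while keeping first-seen order; in SQL, a SELECT DISTINCT over the field columns would have a similar effect.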

Note: All database schema changes require discussion. Please link the issue for context.

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've updated the CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>

codecov Bot commented Oct 10, 2022


Codecov Report

Merging #2176 (5e496ec) into main (496566e) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##               main    #2176   +/-   ##
=========================================
  Coverage     76.33%   76.33%           
  Complexity     1099     1099           
=========================================
  Files           214      214           
  Lines          5139     5139           
  Branches        407      407           
=========================================
  Hits           3923     3923           
  Misses          762      762           
  Partials        454      454           
Impacted Files Coverage Δ
api/src/main/java/marquez/db/ColumnLineageDao.java 100.00% <ø> (ø)


@pawel-big-lebowski pawel-big-lebowski merged commit aa7a47d into main Oct 10, 2022
@pawel-big-lebowski pawel-big-lebowski deleted the fix-column-lineaege-mulitple-runs branch October 10, 2022 11:26
jonathanpmoraes referenced this pull request in nubank/NuMarquez Feb 6, 2025
…mes (#2176)

Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>