Model and store column lineage in Marquez DB #2096
Merged
Commits (30 in total; the diff below shows the changes from 11 of them). Commit authors include mzareba382 and pawel-big-lebowski.

- `9e2cd20` Create database representation, model classes
- `374bdd3` Implement ColumnLevelLineageDao
- `148836c` Instantiate ColumnLevelLineageDao in updateBaseMarquezModel
- `e5d9490` Merge branch 'main' into add-column-level-lineage
- `c28ea86` Merge branch 'main' into add-column-level-lineage
- `48b1ebf` Upsert ColumnLevelLineageRow to db, model representation in LineageEvent
- `3e6674e` Fix problems in OpenLineageDao, add a list of ColumnLevelLineageRow t…
- `b1eab3f` Change wildcard imports to single class imports
- `03e2963` Change wildcard imports to single class imports
- `2ad3bc4` Change wildcard imports to single class imports
- `4a5ec90` Apply spotless
- `d547fd6` Merge branch 'main' into add-column-level-lineage
- `153a19a` Merge branch 'main' into add-column-level-lineage
- `bce38ff` Check for ds.getFacets not null
- `a0251a0` Format fix
- `f17cdda` Update testUpdateMarquezModelDatasetWithColumnLineageFacet
- `dbabd08` Merge branch 'main' into add-column-level-lineage
- `2dff0f6` Test for column_level_lineage upsert.
- `b42a2ad` Apply spotless
- `95ba2e2` Merge branch 'main' into add-column-level-lineage
- `8e3fc65` switch to data field references
- `7ce339e` fix broken tests
- `bfd7555` test when dataset_field is missing
- `08539dd` add input_dataset_version_uuid field
- `6fce3df` Merge branch 'main' into add-column-level-lineage
- `bf8c84e` increase db file version
- `c816996` increase db file version
- `b6d37fe` Merge branch 'add-column-level-lineage' of github.com:MarquezProject/…
- `1aefca2` Merge branch 'main' into add-column-level-lineage
- `21dac22` rename ColumnLevelLineage -> ColumnLineage
New file in package `marquez.db` (79 additions, 0 deletions):

```java
package marquez.db;

/*
 * Copyright 2018-2022 contributors to the Marquez project
 * SPDX-License-Identifier: Apache-2.0
 */

import java.time.Instant;
import java.util.Optional;
import java.util.UUID;
import marquez.db.mappers.ColumnLevelLineageRowMapper;
import marquez.db.models.ColumnLevelLineageRow;
import org.jdbi.v3.sqlobject.config.RegisterRowMapper;
import org.jdbi.v3.sqlobject.statement.SqlQuery;
import org.jdbi.v3.sqlobject.statement.SqlUpdate;

@RegisterRowMapper(ColumnLevelLineageRowMapper.class)
public interface ColumnLevelLineageDao extends BaseDao {

  default ColumnLevelLineageRow upsertColumnLevelLineageRow(
      UUID uuid,
      UUID dataset_version_uuid,
      String output_column_name,
      String input_field,
      String transformation_description,
      String transformation_type,
      Instant now) {
    doUpsertColumnLevelLineageRow(
        uuid,
        dataset_version_uuid,
        output_column_name,
        input_field,
        transformation_description,
        transformation_type,
        now);
    return findColumnLevelLineageByDatasetVersionColumnAndInput(
            dataset_version_uuid, output_column_name, input_field)
        .orElseThrow();
  }

  // Filter on the full unique key (dataset_version_uuid, output_column_name, input_field);
  // filtering on dataset_version_uuid alone can match several rows of the same version.
  @SqlQuery(
      "SELECT * FROM column_level_lineage "
          + "WHERE dataset_version_uuid = :datasetVersionUuid "
          + "AND output_column_name = :outputColumnName "
          + "AND input_field = :inputField")
  Optional<ColumnLevelLineageRow> findColumnLevelLineageByDatasetVersionColumnAndInput(
      UUID datasetVersionUuid, String outputColumnName, String inputField);

  // On a conflict on the unique key, uuid and created_at keep their existing values;
  // the listed columns and updated_at are overwritten.
  @SqlUpdate(
      "INSERT INTO column_level_lineage ("
          + "uuid, "
          + "dataset_version_uuid, "
          + "output_column_name, "
          + "input_field, "
          + "transformation_description, "
          + "transformation_type, "
          + "created_at, "
          + "updated_at"
          + ") VALUES ( "
          + ":uuid, "
          + ":dataset_version_uuid, "
          + ":output_column_name, "
          + ":input_field, "
          + ":transformation_description, "
          + ":transformation_type, "
          + ":now, "
          + ":now) "
          + "ON CONFLICT (dataset_version_uuid, output_column_name, input_field) "
          + "DO UPDATE SET "
          + "input_field = EXCLUDED.input_field, "
          + "transformation_description = EXCLUDED.transformation_description, "
          + "transformation_type = EXCLUDED.transformation_type, "
          + "updated_at = EXCLUDED.updated_at "
          + "RETURNING *")
  void doUpsertColumnLevelLineageRow(
      UUID uuid,
      UUID dataset_version_uuid,
      String output_column_name,
      String input_field,
      String transformation_description,
      String transformation_type,
      Instant now);
}
```

Review note: mobuchowski marked the conversation on `doUpsertColumnLevelLineageRow` as resolved (outdated).
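The `ON CONFLICT (dataset_version_uuid, output_column_name, input_field) DO UPDATE` clause makes that triple the identity of a lineage row. The sketch below is a hypothetical in-memory model of those semantics, not Marquez code. The class `ColumnLineageUpsertSketch` and its `Key`/`Row` records are illustrative names. It relies on PostgreSQL's documented behavior that columns absent from `DO UPDATE SET` (here `uuid` and `created_at`) keep their existing values.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch (not Marquez code): mimics in memory what
// INSERT ... ON CONFLICT (dataset_version_uuid, output_column_name, input_field)
// DO UPDATE SET ... does for the column_level_lineage table.
final class ColumnLineageUpsertSketch {

  // Composite key mirroring the table's UNIQUE constraint.
  record Key(UUID datasetVersionUuid, String outputColumnName, String inputField) {}

  // Simplified stand-in for ColumnLevelLineageRow.
  record Row(
      UUID uuid,
      UUID datasetVersionUuid,
      String outputColumnName,
      String inputField,
      String transformationDescription,
      String transformationType,
      Instant createdAt,
      Instant updatedAt) {}

  private final Map<Key, Row> table = new HashMap<>();

  Row upsert(
      UUID uuid,
      UUID datasetVersionUuid,
      String outputColumnName,
      String inputField,
      String transformationDescription,
      String transformationType,
      Instant now) {
    Key key = new Key(datasetVersionUuid, outputColumnName, inputField);
    Row existing = table.get(key);
    Row row;
    if (existing == null) {
      // No conflict: insert a fresh row with created_at = updated_at = now.
      row = new Row(uuid, datasetVersionUuid, outputColumnName, inputField,
          transformationDescription, transformationType, now, now);
    } else {
      // Conflict: uuid and created_at are not in the SET list, so they are kept;
      // description, type and updated_at are overwritten with the new values.
      row = new Row(existing.uuid(), datasetVersionUuid, outputColumnName, inputField,
          transformationDescription, transformationType, existing.createdAt(), now);
    }
    table.put(key, row);
    // Like upsertColumnLevelLineageRow, return the row looked up by the full key.
    return table.get(key);
  }

  int size() {
    return table.size();
  }
}
```

Upserting the same triple twice leaves one row whose `uuid` and `createdAt` come from the first call and whose `updatedAt` and description come from the second. Changing only `inputField` produces a second row.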
api/src/main/java/marquez/db/mappers/ColumnLevelLineageRowMapper.java (33 changes: 33 additions, 0 deletions)
```java
package marquez.db.mappers;

import static marquez.db.Columns.INPUT_FIELD;
import static marquez.db.Columns.TRANSFORMATION_DESCRIPTION;
import static marquez.db.Columns.TRANSFORMATION_TYPE;
import static marquez.db.Columns.stringOrThrow;
import static marquez.db.Columns.timestampOrThrow;
import static marquez.db.Columns.uuidOrThrow;

import java.sql.ResultSet;
import java.sql.SQLException;
import lombok.NonNull;
import marquez.db.Columns;
import marquez.db.models.ColumnLevelLineageRow;
import org.jdbi.v3.core.mapper.RowMapper;
import org.jdbi.v3.core.statement.StatementContext;

public class ColumnLevelLineageRowMapper implements RowMapper<ColumnLevelLineageRow> {

  @Override
  public ColumnLevelLineageRow map(@NonNull ResultSet results, @NonNull StatementContext context)
      throws SQLException {
    return new ColumnLevelLineageRow(
        uuidOrThrow(results, Columns.ROW_UUID),
        uuidOrThrow(results, Columns.DATASET_VERSION_UUID),
        stringOrThrow(results, Columns.OUTPUT_COLUMN_NAME),
        stringOrThrow(results, INPUT_FIELD),
        stringOrThrow(results, TRANSFORMATION_DESCRIPTION),
        stringOrThrow(results, TRANSFORMATION_TYPE),
        timestampOrThrow(results, Columns.CREATED_AT),
        timestampOrThrow(results, Columns.UPDATED_AT));
  }
}
```
api/src/main/java/marquez/db/models/ColumnLevelLineageRow.java (23 changes: 23 additions, 0 deletions)
```java
package marquez.db.models;

import java.time.Instant;
import java.util.UUID;
import lombok.AllArgsConstructor;
import lombok.EqualsAndHashCode;
import lombok.Getter;
import lombok.NonNull;
import lombok.ToString;

@AllArgsConstructor
@EqualsAndHashCode
@ToString
public class ColumnLevelLineageRow {
  @Getter @NonNull private final UUID uuid;
  @Getter @NonNull private final UUID datasetVersionUuid;
  @Getter @NonNull private final String outputColumnName;
  @Getter @NonNull private final String inputField;
  @Getter @NonNull private final String transformationDescription;
  @Getter @NonNull private final String transformationType;
  @Getter @NonNull private final Instant createdAt;
  @Getter @NonNull private Instant updatedAt;
}
```
api/src/main/resources/marquez/db/migration/V47__column_level_lineage.sql (25 changes: 25 additions, 0 deletions)
```sql
/* SPDX-License-Identifier: Apache-2.0 */

-- DROP TABLE column_level_lineage;

CREATE TABLE column_level_lineage (
  uuid uuid primary key,
  dataset_version_uuid uuid REFERENCES dataset_versions(uuid),
  output_column_name VARCHAR(255) NOT NULL,
  input_field VARCHAR(255) NOT NULL, -- reference dataset_fields.uuid
  transformation_description VARCHAR(255) NOT NULL,
  transformation_type VARCHAR(255) NOT NULL,
  created_at TIMESTAMP NOT NULL,
  updated_at TIMESTAMP NOT NULL,
  UNIQUE (dataset_version_uuid, output_column_name, input_field)
);

-- INSERT INTO column_level_lineage (uuid, dataset_version_uuid, output_column_name, input_field,
--                                   transformation_description, transformation_type, created_at,
--                                   updated_at)
-- VALUES (md5('whatever')::uuid, md5('dataset_version_uuid_example')::uuid, 'column_a', 'input_field_a', 'Identity transformation', 'IDENTITY', current_timestamp, current_timestamp);
--
-- INSERT INTO column_level_lineage (uuid, dataset_version_uuid, output_column_name, input_field,
--                                   transformation_description, transformation_type, created_at,
--                                   updated_at)
-- VALUES (md5('whatever')::uuid, md5('dataset_version_uuid_example')::uuid, 'column_a', 'input_field_b', 'Identity transformation', 'IDENTITY', current_timestamp, current_timestamp);
```
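The UNIQUE constraint allows one output column to have several rows per dataset version, one per input field. The commented-out sample rows show this: `column_a` comes from both `input_field_a` and `input_field_b`. A consumer may therefore want the rows regrouped per output column. The sketch below is a hypothetical helper, not part of this PR. `ColumnLineageGrouping`, `Edge` and `byOutputColumn` are illustrative names, and it assumes the caller has already loaded the rows for a single dataset version.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper (not Marquez code): regroups column_level_lineage rows of one
// dataset version into "output column -> input fields".
final class ColumnLineageGrouping {

  // Minimal stand-in for the two row columns that matter for grouping.
  record Edge(String outputColumnName, String inputField) {}

  static Map<String, SortedSet<String>> byOutputColumn(List<Edge> edges) {
    // TreeMap and TreeSet keep the result sorted by name, so output is deterministic.
    Map<String, SortedSet<String>> result = new TreeMap<>();
    for (Edge edge : edges) {
      result
          .computeIfAbsent(edge.outputColumnName(), k -> new TreeSet<>())
          .add(edge.inputField());
    }
    return result;
  }
}
```

For the two sample rows, `byOutputColumn` returns `{column_a=[input_field_a, input_field_b]}`.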