Implement symlinks in Marquez #2066

**Problem:** 

We need the ability to store alternative dataset names. For example, Hive datasets can be identified either by their data files' location or by the metastore URI together with the database and table.

**Solution in Spec:** 

`SymlinksDatasetFacet` -> https://github.com/OpenLineage/OpenLineage/pull/936

Implementation in Marquez: 

**Model changes:**

 * Create an extra `dataset_symlink` table in Marquez with columns: (symlinkUid, name, namespaceUid, symlinkType)
 * Replace the `name` field in the `datasets` table with `symlinkUid`
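
A minimal sketch of how the proposed model could look, using an in-memory SQLite database from Python purely for illustration (Marquez itself runs on PostgreSQL). The column names come from the list above; the column types, the unique constraint, the `uuid` column on `datasets`, and the idea that several `dataset_symlink` rows share one `symlinkUid` are assumptions. The helpers `create_schema`, `add_dataset` and `find_dataset` are hypothetical names.

```python
import sqlite3
import uuid


def create_schema(conn):
    # Column types and constraints are assumptions; only the names come from the proposal.
    conn.executescript(
        """
        CREATE TABLE dataset_symlink (
            symlinkUid   TEXT NOT NULL,
            name         TEXT NOT NULL,
            namespaceUid TEXT NOT NULL,
            symlinkType  TEXT,
            UNIQUE (namespaceUid, name)
        );
        CREATE TABLE datasets (
            uuid       TEXT PRIMARY KEY,
            symlinkUid TEXT NOT NULL
        );
        """
    )


def add_dataset(conn, namespace_uid, names):
    """Insert one dataset plus one symlink row per (name, type) pair."""
    dataset_uuid = str(uuid.uuid4())
    symlink_uid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO datasets (uuid, symlinkUid) VALUES (?, ?)",
        (dataset_uuid, symlink_uid),
    )
    conn.executemany(
        "INSERT INTO dataset_symlink (symlinkUid, name, namespaceUid, symlinkType) "
        "VALUES (?, ?, ?, ?)",
        [(symlink_uid, name, namespace_uid, kind) for name, kind in names],
    )
    return dataset_uuid


def find_dataset(conn, namespace_uid, name):
    """Resolve any of a dataset's names to its uuid, or None if unknown."""
    row = conn.execute(
        "SELECT d.uuid FROM datasets d "
        "JOIN dataset_symlink s ON s.symlinkUid = d.symlinkUid "
        "WHERE s.namespaceUid = ? AND s.name = ?",
        (namespace_uid, name),
    ).fetchone()
    return row[0] if row else None
```

With this layout, a lookup by either the file location or the metastore-style name resolves to the same dataset row.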

**Implementation follows the proposed DB changes:**

First PR -> reflect current behaviour in the modified schema
 * Provide migration SQL for existing instances
 * Create a `dataset_symlink` row whenever a dataset is created
 * Modify the SQL queries in `dataset_version_dao`, etc.
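
The backfill step of the first PR could be sketched roughly as follows, again with SQLite from Python for illustration only. It assumes the existing `datasets` table has `uuid`, `namespace_uuid` and `name` columns, generates a fresh `symlinkUid` per dataset, and leaves `symlinkType` empty for migrated rows; all of these are assumptions, and `migrate` is a hypothetical helper. Dropping the old `name` column is left out of the sketch.

```python
import sqlite3
import uuid


def migrate(conn):
    """Backfill one dataset_symlink row per existing dataset; returns the row count."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS dataset_symlink (
            symlinkUid   TEXT NOT NULL,
            name         TEXT NOT NULL,
            namespaceUid TEXT NOT NULL,
            symlinkType  TEXT
        );
        ALTER TABLE datasets ADD COLUMN symlinkUid TEXT;
        """
    )
    # Read all rows up front so the updates below do not interfere with the cursor.
    rows = conn.execute(
        "SELECT uuid, namespace_uuid, name FROM datasets WHERE symlinkUid IS NULL"
    ).fetchall()
    for dataset_uuid, namespace_uuid, name in rows:
        symlink_uid = str(uuid.uuid4())
        # The current name becomes the dataset's first (and only) symlink.
        conn.execute(
            "INSERT INTO dataset_symlink (symlinkUid, name, namespaceUid, symlinkType) "
            "VALUES (?, ?, ?, ?)",
            (symlink_uid, name, namespace_uuid, None),
        )
        conn.execute(
            "UPDATE datasets SET symlinkUid = ? WHERE uuid = ?",
            (symlink_uid, dataset_uuid),
        )
    conn.commit()
    return len(rows)
```

After this step every existing dataset has exactly one symlink row carrying its current name, so behaviour is unchanged until the second PR starts adding more rows.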

Second PR
 * Extract the symlink facet when a new OpenLineage event is posted
 * Fill `dataset_symlink` with multiple entries per OL event
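
As a rough illustration of the extraction step, the sketch below pulls symlink identifiers out of an OpenLineage event represented as a plain Python dict. It assumes the facet sits under `facets.symlinks.identifiers` with `namespace`, `name` and `type` keys on each entry; that layout should be checked against the spec PR linked above. `extract_symlinks` and the sample event values are hypothetical.

```python
def extract_symlinks(event):
    """Collect symlink identifiers from the input and output datasets of an event."""
    found = []
    for key in ("inputs", "outputs"):
        for dataset in event.get(key) or []:
            facets = dataset.get("facets") or {}
            # Assumed facet layout: {"symlinks": {"identifiers": [...]}}
            identifiers = (facets.get("symlinks") or {}).get("identifiers") or []
            for ident in identifiers:
                found.append(
                    {
                        "dataset": (dataset["namespace"], dataset["name"]),
                        "namespace": ident["namespace"],
                        "name": ident["name"],
                        "type": ident["type"],
                    }
                )
    return found


# Hypothetical event: a Hive table identified by file location, with the
# metastore name supplied as a symlink.
example_event = {
    "inputs": [
        {
            "namespace": "hdfs://namenode:8020",
            "name": "/warehouse/db.db/t",
            "facets": {
                "symlinks": {
                    "identifiers": [
                        {"namespace": "hive://metastore:9083", "name": "db.t", "type": "TABLE"}
                    ]
                }
            },
        }
    ],
    "outputs": [{"namespace": "hdfs://namenode:8020", "name": "/tmp/out"}],
}
```

Each returned entry would translate into one additional `dataset_symlink` row for the dataset it belongs to; datasets without the facet contribute nothing.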
