Update OpenLineageDao to handle airflow run uuid conflicts#2097

Merged
collado-mike merged 2 commits into main from fix/airflow_run_uuid_conflicts
Sep 1, 2022
Conversation

@collado-mike

Collaborator

Problem

As described in OpenLineage/OpenLineage#1056, the OpenLineage Airflow integration has been generating conflicting UUIDs based on the DAG name and the DagRun id without accounting for different namespaces. In Marquez installations that have multiple Airflow deployments with duplicated DAG names, we generate jobs whose parents have the wrong namespace.
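
To make the collision concrete, here is a minimal sketch. It assumes the run id is a name-based UUID derived only from the DAG name and the DagRun id; the exact derivation used by the Airflow integration may differ, and `RunIdCollisionSketch` and both of its methods are hypothetical names used purely for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical illustration, not code from the OpenLineage or Marquez repos.
final class RunIdCollisionSketch {

  // Assumed behaviour of the older library: the namespace is not part of the input,
  // so two Airflow deployments with the same DAG name and DagRun id get the same UUID.
  static UUID runIdWithoutNamespace(String dagName, String dagRunId) {
    String key = dagName + "." + dagRunId;
    return UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8));
  }

  // Including the namespace in the input keeps the ids distinct per deployment.
  static UUID runIdWithNamespace(String namespace, String dagName, String dagRunId) {
    String key = namespace + "." + dagName + "." + dagRunId;
    return UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8));
  }
}
```

With two deployments that both run a DAG named `daily_etl` on the same schedule, the first method returns the same UUID for both, which is how a job in one namespace can end up pointing at a parent run recorded in another.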

Solution

While the root-cause fix belongs in the OpenLineage repo, this change alleviates the problem for Airflow installations that will continue to publish events with the older OpenLineage library. When an event arrives, OpenLineageDao checks the namespace of the existing parent run and verifies that it matches the namespace in the ParentRunFacet. If the two differ, it generates a new parent run id that will be written with the correct namespace. A new test verifies this behavior.
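
The check described above can be sketched roughly as follows. This is not the code in OpenLineageDao: the in-memory map stands in for the runs table, `ParentRunResolverSketch` and its methods are hypothetical, and deriving the replacement id deterministically from the original id plus namespace and job name is an assumption made so that repeated events resolve to the same parent.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of the namespace check; not the actual Marquez implementation.
final class ParentRunResolverSketch {

  // Stand-in for the runs table: run id -> namespace the run was written under.
  private final Map<UUID, String> namespaceByRunId = new HashMap<>();

  void recordRun(UUID runId, String namespace) {
    namespaceByRunId.put(runId, namespace);
  }

  // Returns the parent run id to use for an event whose ParentRunFacet carries
  // the given run id, namespace, and job name.
  UUID resolveParentRunId(UUID facetRunId, String facetNamespace, String facetJobName) {
    String existingNamespace = namespaceByRunId.get(facetRunId);
    if (existingNamespace == null || existingNamespace.equals(facetNamespace)) {
      // Unknown run, or the namespaces agree: keep the id from the facet.
      return facetRunId;
    }
    // Conflict: the id already belongs to a run in another namespace.
    // Assumption: derive a stable replacement id that includes the namespace.
    String key = facetRunId + "/" + facetNamespace + "/" + facetJobName;
    return UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8));
  }
}
```

A deterministic replacement is one possible design choice here: every later event from the same deployment maps to the same new parent run without any extra bookkeeping.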

Note: All database schema changes require discussion. Please link the issue for context.

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've updated the CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

Signed-off-by: Michael Collado <collado.mike@gmail.com>
… have parents

Signed-off-by: Michael Collado <collado.mike@gmail.com>
@boring-cyborg boring-cyborg Bot added the api API layer changes label Sep 1, 2022
@collado-mike collado-mike requested review from fm100 and removed request for wslulciuc September 1, 2022 17:03
@collado-mike collado-mike merged commit 27b54ed into main Sep 1, 2022
@collado-mike collado-mike deleted the fix/airflow_run_uuid_conflicts branch September 1, 2022 22:33
jonathanpmoraes referenced this pull request in nubank/NuMarquez Feb 6, 2025
* Update OpenLineageDao to handle airflow run uuid conflicts

Signed-off-by: Michael Collado <collado.mike@gmail.com>

* Update integration tests job names to stop conflicting with jobs that have parents

Signed-off-by: Michael Collado <collado.mike@gmail.com>

Signed-off-by: Michael Collado <collado.mike@gmail.com>