
Removed jobs_fqn table and moved FQN into jobs directly in order to enforce unique constraints #2448

Merged
wslulciuc merged 1 commit into main from fix/unique_job_fqn on Mar 14, 2023

@collado-mike (Collaborator)

Problem

The introduction of parent jobs and the jobs_fqn table was intended to let Marquez support jobs that share a name but are triggered by different parents (e.g., a Spark job fired by different Airflow DAGs). The jobs table tracked the simple name of the job, while the jobs_fqn table tracked the fully qualified name (FQN). The jobs_fqn table also became responsible for tracking the FQN of symlinked jobs, because determining a job's new FQN by following symlinks at query time was too expensive. Instead, the FQN of a symlinked job is updated when the symlink is created, so queries return the FQN of the symlink target rather than the FQN of the original job.
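To make the write-time FQN maintenance concrete, here is a minimal in-memory sketch. It is not Marquez code: the `JobStore` class, its method names, and the rule that a child's FQN is its parent's FQN joined to its simple name with "." are all hypothetical, chosen only to illustrate computing the FQN once on write and rewriting it when a symlink is created.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Job:
    id: int
    simple_name: str
    parent_id: Optional[int] = None


class JobStore:
    """Hypothetical store that keeps a denormalized FQN per job, computed on write."""

    def __init__(self) -> None:
        self.jobs: Dict[int, Job] = {}
        self.stored_fqn: Dict[int, str] = {}  # job id -> FQN computed at write time
        self._ids = itertools.count(1)

    def _compute_fqn(self, job: Job) -> str:
        # Assumed rule: a child's FQN is its parent's FQN + "." + its simple name.
        if job.parent_id is None:
            return job.simple_name
        return f"{self.stored_fqn[job.parent_id]}.{job.simple_name}"

    def create(self, simple_name: str, parent_id: Optional[int] = None) -> int:
        job = Job(next(self._ids), simple_name, parent_id)
        self.jobs[job.id] = job
        self.stored_fqn[job.id] = self._compute_fqn(job)
        return job.id

    def symlink(self, job_id: int, target_id: int) -> None:
        # Rewrite the stored FQN once, when the symlink is created, so reads
        # never have to follow the symlink chain.
        self.stored_fqn[job_id] = self.stored_fqn[target_id]


store = JobStore()
dag = store.create("my_dag")
task = store.create("my_task", parent_id=dag)
renamed = store.create("my_task_v2", parent_id=dag)
store.symlink(task, renamed)
print(store.stored_fqn[task], store.stored_fqn[renamed])
```

In this sketch, after the symlink both rows carry the stored FQN `my_dag.my_task_v2`, so a unique constraint could not be placed on that stored column.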

Unfortunately, this means that neither the jobs table nor the jobs_fqn table can enforce the uniqueness constraint we had on the fully qualified name of a job. Thus, in production, we see errors like the following when trying to load a job by its name:

java.lang.IllegalStateException: Multiple values for optional: ['JobRow(uuid=b971d547-ea9d-44c1-908f-9dcc14faba98, type=BATCH, createdAt=2022-12-10T10:01:56.653991Z, updatedAt=2022-12-10T10:01:56.653991Z, namespaceName=...']
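The exception above is raised by a lookup that expects zero or one row but finds several. The following Python/sqlite3 sketch mimics that zero-or-one behavior; the `find_one` helper, the table layout, and the `job_fqn` column name are hypothetical and only illustrate why a missing uniqueness constraint turns into a read-time failure.

```python
import sqlite3
from typing import Any, Optional, Sequence, Tuple


def find_one(conn: sqlite3.Connection, sql: str, params: Sequence[Any]) -> Optional[Tuple]:
    """Return the single matching row, None if nothing matched, or raise if several did."""
    # Fetching two rows is enough to detect a duplicate.
    rows = conn.execute(sql, params).fetchmany(2)
    if len(rows) > 1:
        raise RuntimeError(f"Multiple values for optional: {rows}")
    return rows[0] if rows else None


conn = sqlite3.connect(":memory:")
# Hypothetical table with no uniqueness constraint on the fully qualified name.
conn.execute("CREATE TABLE jobs_fqn (job_uuid TEXT, namespace_name TEXT, job_fqn TEXT)")
conn.executemany(
    "INSERT INTO jobs_fqn VALUES (?, ?, ?)",
    [("uuid-1", "airflow", "my_dag.my_task"), ("uuid-2", "airflow", "my_dag.my_task")],
)

try:
    find_one(
        conn,
        "SELECT job_uuid FROM jobs_fqn WHERE namespace_name = ? AND job_fqn = ?",
        ("airflow", "my_dag.my_task"),
    )
except RuntimeError as err:
    print(err)
```

Nothing stops the two inserts from succeeding, so the error only surfaces later, when a reader asks for "the" job with that name.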

In particular, this happens in two situations when receiving Airflow OpenLineage events:

  1. We receive a FAIL event with no START event: the parent facet of the run is omitted, so Marquez creates a job with no parent but with the same FQN.
  2. We receive a FAIL event before the START event: this usually happens when requests are queued by the load balancer, or when the START event is particularly large and takes longer to deserialize than the FAIL event.
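How the out-of-order case produces a duplicate can be sketched in a few lines of Python. This assumes, for illustration only, that the old write path identified a job by the tuple (namespace, name, parent) while lookups go by FQN, and that the Airflow task's FQN is the same whether or not the parent facet is present (as described above). The `ingest` and `rows_with_fqn` helpers are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical identity used by the old write path: (namespace, name, parent).
Key = Tuple[str, str, Optional[str]]


def ingest(jobs: Dict[Key, str], namespace: str, name: str, parent: Optional[str]) -> None:
    """Get-or-create a job row; the stored value is the job's FQN."""
    # Because the parent is part of the key, an event with no parent facet
    # creates a second row instead of matching the existing one.
    jobs.setdefault((namespace, name, parent), name)


def rows_with_fqn(jobs: Dict[Key, str], namespace: str, fqn: str) -> List[Key]:
    return [key for key, value in jobs.items() if key[0] == namespace and value == fqn]


jobs: Dict[Key, str] = {}
# Case 2: the FAIL event (parent facet omitted) is processed before START.
ingest(jobs, "airflow", "my_dag.my_task", parent=None)        # FAIL
ingest(jobs, "airflow", "my_dag.my_task", parent="my_dag")    # START
print(len(rows_with_fqn(jobs, "airflow", "my_dag.my_task")))  # 2
```

Case 1 follows the same path: a FAIL event with no START adds a parentless row next to the parented row created by earlier runs.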

Solution

This change eliminates the jobs_fqn table and reestablishes the uniqueness constraint on the jobs table's name column. It also adds a simple_name column to the jobs table, which the view uses to return its column of the same name. Tests for the two cases above are added to ensure Marquez can handle Airflow events that omit the parent facet.
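A minimal sketch of the idea, using SQLite (3.24+ for upsert) from Python rather than the PostgreSQL database Marquez actually runs on; PostgreSQL supports the same `INSERT ... ON CONFLICT ... DO UPDATE` form with `EXCLUDED`. The schema is simplified and hypothetical (`parent_job_uuid` and the exact columns are assumptions, not the migration in this PR): the FQN lives in `jobs.name`, is unique per namespace, and `simple_name` is stored alongside it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: the FQN is jobs.name and is unique per namespace.
conn.execute(
    """
    CREATE TABLE jobs (
        uuid            TEXT PRIMARY KEY,
        namespace_name  TEXT NOT NULL,
        name            TEXT NOT NULL,
        simple_name     TEXT NOT NULL,
        parent_job_uuid TEXT,
        UNIQUE (namespace_name, name)
    )
    """
)

UPSERT = """
    INSERT INTO jobs (uuid, namespace_name, name, simple_name, parent_job_uuid)
    VALUES (?, ?, ?, ?, ?)
    ON CONFLICT (namespace_name, name) DO UPDATE SET
        parent_job_uuid = COALESCE(excluded.parent_job_uuid, jobs.parent_job_uuid)
"""


def upsert_job(uuid, namespace, name, simple_name, parent_uuid=None):
    # An event without a parent facet no longer creates a new row, and it
    # does not erase a parent recorded by an earlier event.
    conn.execute(UPSERT, (uuid, namespace, name, simple_name, parent_uuid))


upsert_job("uuid-fail", "airflow", "my_dag.my_task", "my_task")               # FAIL, no parent
upsert_job("uuid-start", "airflow", "my_dag.my_task", "my_task", "uuid-dag")  # START, with parent
print(conn.execute("SELECT uuid, parent_job_uuid FROM jobs").fetchall())
```

Whatever order the FAIL and START events arrive in, the sketch ends with a single row whose parent is filled in by whichever event carried the parent facet.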

The jobs_view is also updated to exclude symlinked jobs, so read queries no longer have to filter them out. Aliases are moved from the jobs_fqn table to the jobs table so that old job names can still be found.
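The read side can be sketched the same way. In this hypothetical Python model (the `JobRow` fields, `jobs_view`, and `find_job` are illustrative names, not Marquez's), the view drops symlinked rows once, and a lookup matches either a job's current name or one of its aliases, so an old name still resolves to the symlink target.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class JobRow:
    uuid: str
    name: str
    symlink_target_uuid: Optional[str] = None
    aliases: List[str] = field(default_factory=list)  # old names that now point here


def jobs_view(rows: List[JobRow]) -> List[JobRow]:
    # Symlinked rows are filtered out once, in the view, so read queries
    # do not need their own "not symlinked" predicate.
    return [r for r in rows if r.symlink_target_uuid is None]


def find_job(rows: List[JobRow], name: str) -> Optional[JobRow]:
    # Match on the current name or any alias of a non-symlinked job.
    matches = [r for r in jobs_view(rows) if r.name == name or name in r.aliases]
    if len(matches) > 1:
        raise RuntimeError(f"Multiple values for optional: {matches}")
    return matches[0] if matches else None


rows = [
    JobRow("uuid-old", "my_dag.my_task", symlink_target_uuid="uuid-new"),
    JobRow("uuid-new", "my_dag.my_task_v2", aliases=["my_dag.my_task"]),
]
print(find_job(rows, "my_dag.my_task").uuid)  # uuid-new
```

Because the symlinked row never reaches the view, looking up the old name yields exactly one row instead of both the original job and its target.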

Checklist

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've updated the CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

collado-mike requested a review from wslulciuc on March 6, 2023 19:52
boring-cyborg (bot) added the api (API layer changes) label on Mar 6, 2023
collado-mike force-pushed the fix/unique_job_fqn branch 2 times, most recently from be4dc89 to 0e6f707 on March 6, 2023 23:48
…nforce unique name constraints

Signed-off-by: Michael Collado <collado.mike@gmail.com>
codecov (bot) commented Mar 7, 2023

Codecov Report

Merging #2448 (c81e4ed) into main (8d28ed5) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff              @@
##               main    #2448      +/-   ##
============================================
- Coverage     83.61%   83.60%   -0.01%     
+ Complexity     1214     1213       -1     
============================================
  Files           231      231              
  Lines          5522     5520       -2     
  Branches        266      266              
============================================
- Hits           4617     4615       -2     
  Misses          762      762              
  Partials        143      143              
Impacted Files | Coverage Δ
api/src/main/java/marquez/db/JobDao.java | 100.00% <ø> (ø)
api/src/main/java/marquez/db/RunDao.java | 92.40% <ø> (ø)
api/src/main/java/marquez/api/JobResource.java | 93.05% <100.00%> (ø)
api/src/main/java/marquez/db/OpenLineageDao.java | 96.29% <100.00%> (-0.02%) ⬇️


@wslulciuc (Member) left a comment

Great work, @collado-mike! Make sure to open a follow up issue to remove the jobs_fqn table, otherwise 👍

wslulciuc merged commit 402babd into main on Mar 14, 2023
wslulciuc deleted the fix/unique_job_fqn branch on March 14, 2023 22:37
jonathanpmoraes referenced this pull request in nubank/NuMarquez Feb 6, 2025
…nforce unique name constraints (#2448)

Signed-off-by: Michael Collado <collado.mike@gmail.com>

Labels

api API layer changes
