Removed jobs_fqn table and moved FQN into jobs directly in order to enforce unique constraints #2448
Merged
be4dc89 to 0e6f707
…nforce unique name constraints
Signed-off-by: Michael Collado <collado.mike@gmail.com>
0e6f707 to c81e4ed
Codecov Report
@@ Coverage Diff @@
## main #2448 +/- ##
============================================
- Coverage 83.61% 83.60% -0.01%
+ Complexity 1214 1213 -1
============================================
Files 231 231
Lines 5522 5520 -2
Branches 266 266
============================================
- Hits 4617 4615 -2
Misses 762 762
Partials 143 143
wslulciuc (Member) approved these changes on Mar 14, 2023
Great work, @collado-mike! Make sure to open a follow-up issue to remove the jobs_fqn table, otherwise 👍
jonathanpmoraes referenced this pull request in nubank/NuMarquez on Feb 6, 2025
…nforce unique name constraints (#2448)
Signed-off-by: Michael Collado <collado.mike@gmail.com>
Problem
The introduction of parent jobs and the jobs_fqn table was intended to allow Marquez to support jobs that had the same name but were triggered by different parents (e.g., a Spark job fired by different Airflow DAGs). The jobs table tracked the simple name of the job, while the jobs_fqn table tracked the fully qualified name (FQN). In addition, the jobs_fqn table became responsible for tracking the FQN of symlinked jobs, as it was too expensive to determine the new FQN of a job by following symlinks at query time. Instead, the FQN of a symlinked job is updated when the symlink is created, so we return only the FQN of the symlink target rather than the FQN of the original job.

Unfortunately, this means that neither the jobs table nor the jobs_fqn table can enforce the uniqueness constraint we had on the fully qualified name of a job. Thus, in production, we see errors like the following when trying to load a job by its name:

In particular, this happens on two occasions when receiving Airflow OpenLineage events:

- FAIL event with no START event: the parent facet of the run is omitted, so Marquez creates a job with no parent but with the same FQN.
- FAIL event prior to the START event: this usually happens when requests are queued by the load balancer, or sometimes when the START event itself is particularly large and takes longer to deserialize than the FAIL event.

Solution
This change eliminates the jobs_fqn table and reestablishes the uniqueness constraint on the jobs table's name column. It also adds a simple_name column to the table, which the view uses to return the column of the same name. Tests for the two cases mentioned above are added to ensure we can handle Airflow events that omit the parent facet.

The jobs_view is also updated to omit symlinked jobs, so the read queries no longer have to filter them out. The aliases are moved from the jobs_fqn table to the jobs table so that old job names can still be found.
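With the FQN stored in jobs.name under a uniqueness constraint, the database rejects a second job with the same fully qualified name. The sketch below shows how that plays out for the out-of-order Airflow events. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. Several pieces are hypothetical simplifications rather than Marquez's actual DDL or DAO code: the table layout, the per-namespace UNIQUE (namespace, name) constraint, the upsert_job helper (which fills in a missing parent with INSERT ... ON CONFLICT DO UPDATE), and the example names my_dag and my_task.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only (not Marquez's DDL).
# `name` stores the fully qualified name (FQN) and is unique per namespace;
# `simple_name` keeps the job's own, unqualified name.
SCHEMA = """
CREATE TABLE jobs (
    id          INTEGER PRIMARY KEY,
    namespace   TEXT NOT NULL,
    name        TEXT NOT NULL,
    simple_name TEXT NOT NULL,
    parent_id   INTEGER REFERENCES jobs(id),
    UNIQUE (namespace, name)
);
"""


def upsert_job(conn, namespace, fqn, simple_name, parent_id=None):
    """Create the job, or reuse the existing row with the same FQN.

    Hypothetical helper. If the incoming event carries a parent, it is
    recorded on the existing row; if it does not, an already-known parent
    is kept (COALESCE prefers the new value only when it is not NULL).
    """
    conn.execute(
        """
        INSERT INTO jobs (namespace, name, simple_name, parent_id)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (namespace, name) DO UPDATE
        SET parent_id = COALESCE(excluded.parent_id, parent_id)
        """,
        (namespace, fqn, simple_name, parent_id),
    )
    row = conn.execute(
        "SELECT id FROM jobs WHERE namespace = ? AND name = ?",
        (namespace, fqn),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    dag_id = upsert_job(conn, "airflow", "my_dag", "my_dag")

    # The FAIL event arrives first and omits the parent facet ...
    fail_id = upsert_job(conn, "airflow", "my_dag.my_task", "my_task")
    # ... then the START event arrives with the parent facet.
    start_id = upsert_job(conn, "airflow", "my_dag.my_task", "my_task", parent_id=dag_id)
    print(fail_id == start_id)  # True: both events map to one job row

    # A plain INSERT of a duplicate FQN is rejected by the database.
    try:
        conn.execute(
            "INSERT INTO jobs (namespace, name, simple_name) VALUES (?, ?, ?)",
            ("airflow", "my_dag.my_task", "my_task"),
        )
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
```

The conflict target is the FQN, not a (parent, simple name) pair. As a result, a FAIL event that arrives without the parent facet and the START event that carries it resolve to the same row. If the events arrive in the opposite order, COALESCE keeps the parent that is already known. SQLite's upsert syntax requires SQLite 3.24 or newer.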
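The jobs_view change and the relocated aliases can be sketched the same way, again with sqlite3 rather than Marquez's real schema. In this hypothetical model, a symlinked row keeps its old name but gains a symlink_target_id, and jobs_view filters such rows out. The old name is appended to the target's aliases, so a lookup by the retired name still resolves to the current job. The symlink and find_job helpers are illustrative only. SQLite has no array type, so aliases is stored here as a comma-separated string, which assumes job names never contain commas.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only (not Marquez's DDL).
# `aliases` is a comma-separated string because SQLite has no array type.
SCHEMA = """
CREATE TABLE jobs (
    id                INTEGER PRIMARY KEY,
    name              TEXT NOT NULL UNIQUE,
    symlink_target_id INTEGER REFERENCES jobs(id),
    aliases           TEXT NOT NULL DEFAULT ''
);
-- The view hides symlinked jobs, so read queries need no extra filter.
CREATE VIEW jobs_view AS
    SELECT id, name, aliases FROM jobs WHERE symlink_target_id IS NULL;
"""


def symlink(conn, old_name, new_name):
    """Redirect old_name to new_name and record old_name as an alias."""
    (target_id,) = conn.execute(
        "SELECT id FROM jobs WHERE name = ?", (new_name,)
    ).fetchone()
    conn.execute(
        "UPDATE jobs SET symlink_target_id = ? WHERE name = ?",
        (target_id, old_name),
    )
    conn.execute(
        """
        UPDATE jobs
        SET aliases = CASE WHEN aliases = '' THEN ? ELSE aliases || ',' || ? END
        WHERE id = ?
        """,
        (old_name, old_name, target_id),
    )


def find_job(conn, name):
    """Return the visible job matching `name` directly or via an alias."""
    return conn.execute(
        """
        SELECT name FROM jobs_view
        WHERE name = ? OR instr(',' || aliases || ',', ',' || ? || ',') > 0
        """,
        (name, name),
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO jobs (name) VALUES (?)", [("etl.old",), ("etl.new",)])
    symlink(conn, "etl.old", "etl.new")
    print([row[0] for row in conn.execute("SELECT name FROM jobs_view")])  # ['etl.new']
    print(find_job(conn, "etl.old"))  # ('etl.new',)
```

The filter lives in the view, so every read query that selects from jobs_view skips symlinked rows automatically. An alias lookup needs only the target row, so it never has to walk the symlink chain at query time.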