
Only the first job context is taken into consideration #2230

@JDarDagran


Problem

A job context is a structure that holds a job's code location and SQL so that they can be shown in the Marquez UI. The job context upsert uses only the checksum of the context body as its conflict key. This means that when the context differs between, e.g., the start and the end of a job, there are two different entries in the job_context table for that job. That alone might be acceptable; the problem is that the API exposes only the first captured context. So if you don't send SqlJobFacet in the START event, you won't see it, even if you send it in the COMPLETE event.
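The behavior described above can be reproduced with a simplified model. The sketch below uses Python and SQLite, not Marquez's actual code; the schema, the SHA-256 checksum over the JSON body, and the rule that a run keeps the job_context_uuid it was first inserted with are assumptions made for illustration.

```python
import hashlib
import json
import sqlite3
import uuid

# Hypothetical, simplified schema: job_contexts is deduplicated on the
# checksum of the context body, and each run points at exactly one context.
SCHEMA = """
CREATE TABLE job_contexts (
    uuid     TEXT PRIMARY KEY,
    context  TEXT NOT NULL,
    checksum TEXT NOT NULL UNIQUE
);
CREATE TABLE runs (
    uuid             TEXT PRIMARY KEY,
    job_context_uuid TEXT REFERENCES job_contexts(uuid)
);
"""


def upsert_job_context(conn: sqlite3.Connection, context: dict) -> str:
    """Insert the context unless one with the same checksum exists; return its uuid."""
    body = json.dumps(context, sort_keys=True)
    checksum = hashlib.sha256(body.encode()).hexdigest()
    conn.execute(
        "INSERT INTO job_contexts (uuid, context, checksum) VALUES (?, ?, ?) "
        "ON CONFLICT(checksum) DO NOTHING",
        (str(uuid.uuid4()), body, checksum),
    )
    row = conn.execute(
        "SELECT uuid FROM job_contexts WHERE checksum = ?", (checksum,)
    ).fetchone()
    return row[0]


def upsert_run(conn: sqlite3.Connection, run_id: str, context: dict) -> None:
    """Record an event; the run keeps the context uuid it was first inserted with."""
    context_uuid = upsert_job_context(conn, context)
    conn.execute(
        "INSERT INTO runs (uuid, job_context_uuid) VALUES (?, ?) "
        "ON CONFLICT(uuid) DO NOTHING",  # assumption: job_context_uuid is never updated
        (run_id, context_uuid),
    )


def exposed_context(conn: sqlite3.Connection, run_id: str) -> dict:
    """The context an API reading runs.job_context_uuid would return."""
    row = conn.execute(
        "SELECT c.context FROM runs r "
        "JOIN job_contexts c ON c.uuid = r.job_context_uuid WHERE r.uuid = ?",
        (run_id,),
    ).fetchone()
    return json.loads(row[0])


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
run_id = str(uuid.uuid4())
# START event without SQL, then COMPLETE event with SQL.
upsert_run(conn, run_id, {"code_location_url": "https://example.com/job.py"})
upsert_run(conn, run_id, {"code_location_url": "https://example.com/job.py", "sql": "SELECT 1"})

print(conn.execute("SELECT COUNT(*) FROM job_contexts").fetchone()[0])  # 2
print(exposed_context(conn, run_id))  # {'code_location_url': 'https://example.com/job.py'}
```

Two context rows are stored, but the run still resolves to the first one, so the SQL sent with COMPLETE never surfaces.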

Solutions

I see a couple of ways to solve this problem:

  1. Update job_context_uuid when upserting into the runs table. This would expose only the most recent context, which might be acceptable but probably is not.
  2. Add custom logic to merge the arrays when the contexts relate to the same run (or job?).
  3. Merge contexts in the API. This would change the run <--> job_context relation to one-to-many.
  4. Change the structure of the job_contexts table: replace the context column with three columns, code_location_type, code_location_url and sql, which would be filled on upsert. Some concatenation would probably still be needed.
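Options 2 to 4 all come down to combining several captured contexts for one run. A minimal sketch of one possible merge policy, assuming the contexts have already been loaded in event order as dicts; the function name merge_contexts, the concat_keys parameter and the ";\n" separator are hypothetical, not Marquez API.

```python
from typing import Any, Dict, FrozenSet, Iterable, List


def merge_contexts(
    contexts: Iterable[Dict[str, Any]],
    concat_keys: FrozenSet[str] = frozenset({"sql"}),
) -> Dict[str, Any]:
    """Merge a run's job contexts, given in event order (e.g. START, then COMPLETE).

    Hypothetical policy: for ordinary keys the latest non-None value wins and
    keys missing from later events are kept; for keys in concat_keys (string
    values assumed) distinct values are concatenated in the order first seen.
    """
    merged: Dict[str, Any] = {}
    concatenated: Dict[str, List[str]] = {}
    for context in contexts:
        for key, value in context.items():
            if value is None:
                continue
            if key in concat_keys:
                parts = concatenated.setdefault(key, [])
                if value not in parts:  # skip exact repeats across events
                    parts.append(value)
            else:
                merged[key] = value  # later events override earlier ones
    for key, parts in concatenated.items():
        merged[key] = ";\n".join(parts)
    return merged


start_ctx = {"code_location_url": "https://example.com/job.py"}
complete_ctx = {"code_location_url": "https://example.com/job.py", "sql": "SELECT 1"}
print(merge_contexts([start_ctx, complete_ctx]))
# {'code_location_url': 'https://example.com/job.py', 'sql': 'SELECT 1'}
```

Whether later values should replace earlier ones or all values should be kept is the open question behind the options above; this sketch just picks one answer per key type.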
