Skip to content

adding indexed created_at column to lineage events table#2299

Merged
collado-mike merged 5 commits into
MarquezProject:mainfrom
prachim-collab:feature/lineage_event_created_at_indexed
Dec 12, 2022
Merged

adding indexed created_at column to lineage events table#2299
collado-mike merged 5 commits into
MarquezProject:mainfrom
prachim-collab:feature/lineage_event_created_at_indexed

Conversation

@prachim-collab

@prachim-collab prachim-collab commented Dec 8, 2022

Copy link
Copy Markdown
Contributor

Problem

For analytics use case we need to incrementally copy and export lineage_events to DWH. For this use case there is no good way to identify incrementally created events in database. The current event_time in lineage_events table is client generated and can be back dated.

Closes: #2300

Solution

  1. Proposing to alter lineage_events table to add a new created_at timestamp column.
  2. Also index this new column.
  3. This is a backwards-compatible change
  4. Existing rows in the lineage events table will be assigned null values.
  5. New rows will have default value as DB server's UTC now() which determines the time when event was inserted in lineage_events table.

Note: All database schema changes require discussion. Please link the issue for context.

Checklist

I have tested by deploying the changes in my local database and migration script ran successfully on top of existing schema.

  • You've signed-off your work
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • You've updated the CHANGELOG.md with details about your change under the "Unreleased" section (if relevant, depending on the change, this may not be necessary)
  • You've versioned your .sql database schema migration according to Flyway's naming convention (if relevant)
  • You've included a header in any source code files (if relevant)

@boring-cyborg boring-cyborg Bot added the api API layer changes label Dec 8, 2022
@codecov

codecov Bot commented Dec 8, 2022

Copy link
Copy Markdown

Codecov Report

Merging #2299 (dc70917) into main (e501e36) will not change coverage.
The diff coverage is n/a.

@@            Coverage Diff            @@
##               main    #2299   +/-   ##
=========================================
  Coverage     77.01%   77.01%           
  Complexity     1166     1166           
=========================================
  Files           222      222           
  Lines          5307     5307           
  Branches        424      424           
=========================================
  Hits           4087     4087           
  Misses          747      747           
  Partials        473      473           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@prachim-collab prachim-collab force-pushed the feature/lineage_event_created_at_indexed branch from c495a52 to 5d3185c Compare December 8, 2022 06:08
@wslulciuc

wslulciuc commented Dec 8, 2022

Copy link
Copy Markdown
Member

@prachim-collab, having a timestamp for when an OL event was received on the server is going to be very helpful. Mind also opening an issues to link to your PR about the timestamp usage and any changes to relevant APIs?

@prachim-collab

Copy link
Copy Markdown
Contributor Author

@prachim-collab, having a timestamp for when an OL event was received on the server is going to be very helpful. Mind also opening an issues to link to your PR about the timestamp usage and any changes to relevant APIs?

@wslulciuc I have created this new issue #2304

@prachim-collab prachim-collab force-pushed the feature/lineage_event_created_at_indexed branch from 6c15266 to 4897ff1 Compare December 9, 2022 01:08
@prachim-collab prachim-collab marked this pull request as ready for review December 9, 2022 01:35
@@ -0,0 +1,9 @@
ALTER TABLE lineage_events ADD created_at TIMESTAMP;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIMESTAMP WITH TIME ZONE. I think it's good practice to just always include the time zone in the timestamp.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, i kept it as TIMESTAMP following other created_at columns in the schema. But i will update this one to TIMESTAMP WITH TIME ZONE.

Additionally , i was before backfilling old entries with event_time. I have removed that step now because it might be a costly operation during migration.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the type to TIMESTAMP WITH TIME ZONE

Signed-off-by: Prachi Mishra <prachi.mishra@astronomer.io>
Signed-off-by: Prachi Mishra <prachi.mishra@astronomer.io>
Signed-off-by: Prachi Mishra <prachi.mishra@astronomer.io>
Signed-off-by: Prachi Mishra <prachi.mishra@astronomer.io>
Signed-off-by: Prachi Mishra <prachi.mishra@astronomer.io>
@prachim-collab prachim-collab force-pushed the feature/lineage_event_created_at_indexed branch from 53b5cf7 to dc70917 Compare December 12, 2022 21:09
@collado-mike collado-mike merged commit b1ff80e into MarquezProject:main Dec 12, 2022
jonathanpmoraes referenced this pull request in nubank/NuMarquez Feb 6, 2025
* adding indexed created_at column to lineage events table

Signed-off-by: Prachi Mishra <prachi.mishra@astronomer.io>

* populate default values in lineage_events.created_at column

Signed-off-by: Prachi Mishra <prachi.mishra@astronomer.io>

* removing backfill step

Signed-off-by: Prachi Mishra <prachi.mishra@astronomer.io>

* altering type to TIMESTAMP WITH TIME ZONE

Signed-off-by: Prachi Mishra <prachi.mishra@astronomer.io>

* adding NULLS LAST in desc index on lineage_events.created_at

Signed-off-by: Prachi Mishra <prachi.mishra@astronomer.io>

Signed-off-by: Prachi Mishra <prachi.mishra@astronomer.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api API layer changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add created_at indexed column to Lineage events table

3 participants