Add db retention support #2486
Merged
60 commits
c61b581 Add db migration to add cascade deletion on `fk`s (wslulciuc)
4612850 Add `DbDataRetention` and `dataRetentionInDays` config (wslulciuc)
a9372bf Add `DbRetentionJob` (wslulciuc)
437a23c Add `DbRetentionCommand` (wslulciuc)
77db993 Add `frequencyMins` config for runs and rename `dbRetentionInDays` (wslulciuc)
6f3014b Add docs to `DbRetentionJob` and minor renaming (wslulciuc)
7943a52 Wrap `DbRetention.retentionOnDbOrError()` in `try/catch` (wslulciuc)
8dad10a Add docs to DbRetention (wslulciuc)
7be8ea1 continued: Add docs to `DbRetention` (wslulciuc)
bdf8e39 Add handling of `errorOnDbRetention` (wslulciuc)
944c948 Add docs to `DbException` and `DbRetentionException` (wslulciuc)
9fde898 `info` -> `debug` when inserting column lineage (wslulciuc)
d8e1194 Remove `dbRetention.enabled` (wslulciuc)
f6e4c46 Update handling of `StatementException` (wslulciuc)
63a85e0 Minor changes (wslulciuc)
cec3982 Add `docs/faq.md` (wslulciuc)
f52dea8 continued: `Add docs/faq.md` (wslulciuc)
71aa297 continued: Add `docs/faq.md` (wslulciuc)
8e51759 continued: Add `docs/faq.md` (wslulciuc)
01dc8b1 Define `DEFAULT_RETENTION_DAYS` constant in `DbRetention` (wslulciuc)
4933749 Make chunk size in retention query configurable (wslulciuc)
cd07192 Remove `DATA_RETENTION_IN_DAYS` from `MarquezConfig` (wslulciuc)
0f6f58d Update docs for chunk size config (wslulciuc)
5ad928a Remove error log from `DbRetention.retentionOnDbOrError()` (wslulciuc)
3aa24e9 Use `LOOP` for retention (wslulciuc)
5136789 continued: Use `LOOP` for retention (wslulciuc)
e074606 Use `numberOfRowsPerBatch` (wslulciuc)
f217710 Use `--number-of-rows-per-batch` (wslulciuc)
a290e05 Add pause to prevent lock timeouts (wslulciuc)
1785b27 Add `FOR UPDATE SKIP LOCKED` (wslulciuc)
127e1bd Add `sql()` (wslulciuc)
bd53f80 Merge branch 'main' into feature/db-retention
fd7aa35 Add `--dry-run` (wslulciuc)
a596291 Add `jdbi3-testcontainers` (wslulciuc)
1bf38ef Merge branch 'feature/db-retention' of github.com:MarquezProject/marq… (wslulciuc)
85ff9bf Remove shortened flag args (wslulciuc)
4c5c407 Use `marquez.db.DbRetention.DEFAULT_DRY_RUN` (wslulciuc)
84be539 Add DbRetention.retentionOnRuns() (wslulciuc)
0289277 Add `DbMigration.migrateDbOrError(DataSource)` (wslulciuc)
5a7cc75 Add `TestingDb` (wslulciuc)
4882570 Add `DbTest` (wslulciuc)
92e3e63 Add `testRetentionOnDbOrError_withDatasetsOlderThanXDays()` (wslulciuc)
4237853 Remove `jobs.DbRetentionConfig.dryRun` (wslulciuc)
8a83639 Add `--dry-run` option to `faq.md` (wslulciuc)
fe55341 continued: Add --dry-run option to faq.md (wslulciuc)
2aee600 continued: `Add testRetentionOnDbOrError_withDatasetsOlderThanXDays` (wslulciuc)
e58a8fa Fix retention query for datasets and dataset versions (wslulciuc)
53e3783 Add test for retention on dataset versions (wslulciuc)
b4c9c2d Add comments to tests (wslulciuc)
d9c7fc0 Add `testRetentionOnDbOrErrorWithDatasetVersionsOlderThanXDays_skipIf… (wslulciuc)
96735f8 Add `testRetentionOnDbOrErrorWithJobsOlderThanXDays()` (wslulciuc)
3ad251e Add `testRetentionOnDbOrErrorWithJobVersionsOlderThanXDays()` (wslulciuc)
f554e7e Add tests for dry run (wslulciuc)
eabdee9 Add testRetentionOnDbOrErrorWithRunsOlderThanXDays() (wslulciuc)
476dd2a Add `testRetentionOnDbOrErrorWithOlEventsOlderThanXDays()` (wslulciuc)
c2eda14 continued: `Add testRetentionOnDbOrErrorWithOlEventsOlderThanXDays()` (wslulciuc)
9dd1db3 Add `javadocs` to `DbRetention` (wslulciuc)
d8b31aa Run tests in order of retention (wslulciuc)
33d390a Merge branch 'main' into feature/db-retention (wslulciuc)
fd42414 Use `V63` for cascade delete migration (wslulciuc)
@@ -0,0 +1,114 @@

```java
/*
 * Copyright 2018-2023 contributors to the Marquez project
 * SPDX-License-Identifier: Apache-2.0
 */

package marquez.cli;

import static marquez.db.DbRetention.DEFAULT_DRY_RUN;
import static marquez.db.DbRetention.DEFAULT_NUMBER_OF_ROWS_PER_BATCH;
import static marquez.db.DbRetention.DEFAULT_RETENTION_DAYS;

import io.dropwizard.cli.ConfiguredCommand;
import io.dropwizard.db.DataSourceFactory;
import io.dropwizard.db.ManagedDataSource;
import io.dropwizard.setup.Bootstrap;
import lombok.NonNull;
import lombok.extern.slf4j.Slf4j;
import marquez.MarquezConfig;
import marquez.db.DbRetention;
import marquez.db.exceptions.DbRetentionException;
import net.sourceforge.argparse4j.impl.Arguments;
import net.sourceforge.argparse4j.inf.Namespace;
import org.jdbi.v3.core.Jdbi;
import org.jdbi.v3.postgres.PostgresPlugin;

/**
 * A command to apply a one-off ad-hoc retention policy directly to source, dataset, and job
 * metadata collected by Marquez.
 *
 * <h2>Usage</h2>
 *
 * For example, to override the {@code retention-days}:
 *
 * <pre>{@code
 * java -jar marquez-api.jar db-retention --retention-days 30 marquez.yml
 * }</pre>
 */
@Slf4j
public class DbRetentionCommand extends ConfiguredCommand<MarquezConfig> {
  private static final String DB_SOURCE_NAME = "ad-hoc-db-retention-source";

  /* Args for 'db-retention' command. */
  private static final String CMD_ARG_NUMBER_OF_ROWS_PER_BATCH = "numberOfRowsPerBatch";
  private static final String CMD_ARG_RETENTION_DAYS = "retentionDays";
  private static final String CMD_ARG_DRY_RUN = "dryRun";

  /* Define 'db-retention' command. */
  public DbRetentionCommand() {
    super("db-retention", "apply one-off ad-hoc retention policy directly to database");
  }

  @Override
  public void configure(@NonNull net.sourceforge.argparse4j.inf.Subparser subparser) {
    super.configure(subparser);
    // Arg '--number-of-rows-per-batch'
    subparser
        .addArgument("--number-of-rows-per-batch")
        .dest(CMD_ARG_NUMBER_OF_ROWS_PER_BATCH)
        .type(Integer.class)
        .required(false)
        .setDefault(DEFAULT_NUMBER_OF_ROWS_PER_BATCH)
        .help("the number of rows deleted per batch");
    // Arg '--retention-days'
    subparser
        .addArgument("--retention-days")
        .dest(CMD_ARG_RETENTION_DAYS)
        .type(Integer.class)
        .required(false)
        .setDefault(DEFAULT_RETENTION_DAYS)
        .help("the number of days to retain metadata");
    // Arg '--dry-run'
    subparser
        .addArgument("--dry-run")
        .dest(CMD_ARG_DRY_RUN)
        .type(Boolean.class)
        .required(false)
        .setDefault(DEFAULT_DRY_RUN)
        .action(Arguments.storeTrue())
        .help(
            "only output an estimate of metadata deleted by the retention policy, "
                + "without applying the policy on database");
  }

  @Override
  protected void run(
      @NonNull Bootstrap<MarquezConfig> bootstrap,
      @NonNull Namespace namespace,
      @NonNull MarquezConfig config)
      throws Exception {
    final int numberOfRowsPerBatch = namespace.getInt(CMD_ARG_NUMBER_OF_ROWS_PER_BATCH);
    final int retentionDays = namespace.getInt(CMD_ARG_RETENTION_DAYS);
    final boolean dryRun = namespace.getBoolean(CMD_ARG_DRY_RUN);

    // Configure connection.
    final DataSourceFactory sourceFactory = config.getDataSourceFactory();
    final ManagedDataSource source =
        sourceFactory.build(bootstrap.getMetricRegistry(), DB_SOURCE_NAME);

    // Open connection.
    final Jdbi jdbi = Jdbi.create(source);
    jdbi.installPlugin(new PostgresPlugin()); // Add postgres support.

    try {
      // Attempt to apply a database retention policy. An exception is thrown on failed retention
      // policy attempts requiring we handle the throwable and log the error.
      DbRetention.retentionOnDbOrError(jdbi, numberOfRowsPerBatch, retentionDays, dryRun);
    } catch (DbRetentionException errorOnDbRetention) {
      log.error(
          "Failed to apply retention policy of '{}' days to database!",
          retentionDays,
          errorOnDbRetention);
    }
  }
}
```
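The command above only parses `--number-of-rows-per-batch`, `--retention-days`, and `--dry-run` and forwards them to `DbRetention.retentionOnDbOrError()`, whose body is not part of this file. As a rough illustration of how those three values might interact, here is a minimal in-memory sketch. It is not the Marquez implementation: `RetentionSketch` and `applyRetention` are hypothetical names, a list of timestamps stands in for table rows, and the assumption that a row expires when it is strictly older than the cutoff is mine. The real retention runs as SQL against the database.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical in-memory stand-in for batched retention; not the Marquez implementation. */
final class RetentionSketch {
  private RetentionSketch() {}

  /**
   * Removes timestamps older than {@code retentionDays} (relative to {@code now}) in passes of at
   * most {@code rowsPerBatch} removals each. With {@code dryRun}, nothing is removed and the
   * number of rows that would be removed is returned instead.
   */
  static int applyRetention(
      List<Instant> updatedAt, Instant now, int rowsPerBatch, int retentionDays, boolean dryRun) {
    if (rowsPerBatch <= 0 || retentionDays < 0) {
      throw new IllegalArgumentException("rowsPerBatch must be > 0 and retentionDays >= 0");
    }
    // Assumption: a row is expired when it is strictly older than the cutoff.
    final Instant cutoff = now.minus(Duration.ofDays(retentionDays));

    if (dryRun) {
      // Dry run: only count what would be deleted, leave the data untouched.
      return (int) updatedAt.stream().filter(t -> t.isBefore(cutoff)).count();
    }

    int totalDeleted = 0;
    int deletedInBatch;
    do {
      // One batch: delete at most rowsPerBatch expired rows.
      deletedInBatch = 0;
      final Iterator<Instant> rows = updatedAt.iterator();
      while (rows.hasNext() && deletedInBatch < rowsPerBatch) {
        if (rows.next().isBefore(cutoff)) {
          rows.remove();
          deletedInBatch++;
        }
      }
      totalDeleted += deletedInBatch;
      // A full batch suggests more expired rows may remain; a partial one means we are done.
    } while (deletedInBatch == rowsPerBatch);
    return totalDeleted;
  }

  public static void main(String[] args) {
    final Instant now = Instant.parse("2023-05-01T00:00:00Z");
    final List<Instant> rows = new ArrayList<>();
    for (int daysOld = 0; daysOld < 10; daysOld++) {
      rows.add(now.minus(Duration.ofDays(daysOld)));
    }
    System.out.println("dry run would delete: " + applyRetention(rows, now, 1, 7, true));
    System.out.println("deleted: " + applyRetention(rows, now, 1, 7, false));
    System.out.println("rows remaining: " + rows.size());
  }
}
```

Deleting in small batches keeps each pass short, which is presumably why the PR exposes the batch size as an option; the commit messages also mention a pause between batches and `FOR UPDATE SKIP LOCKED`, neither of which is modeled here.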