Corrects the Log migrations, improves import/export by szoupanos · Pull Request #2447 · aiidateam/aiida-core

szoupanos · 2019-02-03T17:18:40Z

This PR improves the DbLog migrations for Django & SQLA that were created for issues #1102 and
#1759 with PR #2393.

More specifically the changes are described in issue #2423.
Also the objuuid was removed and objpk is used.
Import/export was adapted by @CasperWA and log is exported as Node records (with their pk which is recreated during export/import to avoid pk collisions).

We have to see what happens when a node is re-imported with more log entries - We should import the new log entries based on their UUID. A new ticket should be created for this.

coveralls · 2019-02-03T17:41:50Z

Coverage increased (+0.1%) to 69.71% when pulling f4a5757 on szoupanos:issue_1759_vb3 into 309b2f2 on aiidateam:provenance_redesign.

sphuber

A few minor changes and a question

sphuber · 2019-02-04T11:22:46Z

Wouldn't it be better to demand it only be passed in the kwargs or directly as metadata? Who is calling this constructor, only us correct? Think it might be better to make sure of this from the caller side.

sphuber · 2019-02-04T11:23:41Z

Question: why should one use connect here and begin in the actual tests?

sphuber · 2019-02-04T11:32:37Z

dbnode_id is required correct? So should not have None as default

sphuber · 2019-02-04T11:36:44Z

Same, dbnode_id should not have a default

sphuber · 2019-02-04T11:38:43Z

remove default for dbnode_id

sphuber · 2019-02-04T14:47:30Z

@giovannipizzi one thing I just realized is that the report method of the Process class prints the pk of its node in the header of the log message, e.g.:

[122|PwBandsWorkChain|run_relax]: launching PwRelaxWorkChain<124>

In addition, often people (like myself) will use pks in the message itself to reference other nodes. Of course both of these will no longer make sense when the reports are imported into another database. Of course we cannot migrate this since we do not know what part of the string is intended as a pk. However, we probably should make this clear to developers and maybe advice to use UUIDs instead. As for the prefix, that is added by us in the report method. Maybe we should already start using UUIDs there, or make it dynamic in the __str__ of a log record and upon formatting print the pk corresponding to the dbnode_id of the record. I don't think we can/should address these issues in this PR but wanted to have your thoughts

giovannipizzi

A few changes

giovannipizzi · 2019-02-04T16:54:51Z

Now that I get here, I suggest a different approach:

define two utility functions to get the 'Django query' object, and then call .count(), .delete(), .values, ... where appropriate. Now you define that methods for some of them (and you don't always use them) and here you have to write again the query

I can do this, even if I don't find it a big duplication of code and I did it like this in order to be very comparable to what happens at the SQLA migrations (where there are SQL queries and we can not perform what you propose).

giovannipizzi · 2019-02-04T17:27:16Z

Same change as suggested for Django

giovannipizzi · 2019-02-04T17:38:34Z

one thing I just realized is that the report method of the Process class prints the pk of its node in the header of the log message, e.g.:
[122|PwBandsWorkChain|run_relax]: launching PwRelaxWorkChain<124>
In addition, often people (like myself) will use pks in the message itself to reference other nodes. Of course both of these will no longer make sense when the reports are imported into another database. Of course we cannot migrate this since we do not know what part of the string is intended as a pk. However, we probably should make this clear to developers and maybe advice to use UUIDs instead.

Agreed. We can't migrate this and it's up to developers, and good to clarify this.

As for the prefix, that is added by us in the report method. Maybe we should already start using UUIDs there, or make it dynamic in the __str__ of a log record and upon formatting print the pk corresponding to the dbnode_id of the record. I don't think we can/should address these issues in this PR but wanted to have your thoughts

Agreed that we fix this in a later PR. In the DB now things are correct. If this is only about printing in the AiiDA log text file maybe having a UUID is not a big deal, though (and probably more convenient)?

sphuber · 2019-02-04T18:37:47Z

@szoupanos and @giovannipizzi I have addressed most of the comments of Giovanni.

@szoupanos please see if you can address the comments about deduplicating the code in the migrations and making sure that in the reverse operations, the objpk is populated with the dbnode_id to ensure that we do not lose any data. Also the comment about why a transaction is used in the deletion of the non-node log entries I did not answer nor changed the code, please verify that one as well.

Notice that since we have two commits that we want to keep separate, I had to amend the individual commits to affect the changes requested and could not add another commit. Unfortunately then this will make it more difficult to view incremental changes.

sphuber · 2019-02-04T22:13:58Z

@szoupanos if you want to adapt your commit tomorrow, you can use git rebase -i HEAD~2 and then mark your commit as edit and keep Casper's as keep. Then it will stop at your commit for you to make changes. You make the changes, call git add -u and then git commit --amend. When that is successful, finally call git rebase --continue and everything should be good. Then simple force push as usual

giovannipizzi · 2019-02-05T11:12:24Z

@szoupanos I've discussed with @sphuber yesterday. Feel free not to apply my comments where they apply only to code style, considering these are only migrations and not code that has to be maintained.

Instead for the backward migration: as I commented, now we don't prevent going back. Then, we do NOT want to be able to go back in time (not possible, as we delete data) but that if I go forward (applying the migration), then back and then forward again, the data is not destroyed.
This just requires putting back the data in the objpk column, hopefully it's a simple copy-paste and change of code.

szoupanos · 2019-02-05T12:44:25Z

@giovannipizzi I had a short discussion with @sphuber (in order to get the bigger picture because I was a bit lost with the comments)
I will create the backward data migrations. I can also re-create the data of the objname (approximately) by copying the node type which I am going to do.

For code, duplication, since we all agree that it is not a major issue, let's leave the code as it is in order to finish this PR ASAP.

In commit `c0fba2e38557fe1ea535d1dc24076cba99212616`, a migration was introduced for the `DbLog` table in order to support the export and import of log entries. However, this migration was flawed and would lose valuable information. The old `DbLog` was designed to support adding log messages to various entities. Because these entities were not restricted to just nodes, and therefore could live in different database tables, a foreign key could not be used for the relationship, but instead an `objpk` and `objname` column were used to allow multiple table references. However, for the import to know whether an exported log record is already present in the database and therefore should not be imported again, the object pk could not be used as these do remain the same between databases. To this end the migration added a UUID column to the `DbLog` table and dropped the `objpk` column. However, the data was not migrated before doing so and so the connection between logs and their entities was lost. After the faulty commit, we realized that the reason for not using the foreign key was that logs needed to be able to be associated with both nodes as well as legacy workflows, but since support for the latter had already been dropped this was no longer necessary. Given that now only nodes can have associated logs, we can revert the `objpk` to be a proper foreign key and the `objname` can be dropped. Since the logs for legacy workflows (and any other unexpected entities) will be dropped from the database, they should be exported to a file for archival purposed. In summary, the migrations perform the following logical actions: * Export existing logs for legacy worklows and other entities to file * Delete records from legacy workflows and other entities from `dblog` * Create a foreign key `dbnode_id` * Migrate data from `objpk` to `dbnode_id` * Create a new column `uuid` for `DbLog` that is unique * Delete the `objpk` and `ojbname` columns from `DbLog` * Delete `objpk` and `objname` from the `metadata` of `DbLog` records Note that the addition of the unique `uuid` column had to be done in three steps. The problem is that the default value for the UUID is generated by a python function which cannot therefore be done on a database level but the values have to be set after the creation of the column. However, as soon as the column is created the uniqueness constraint is violated since it will be empty for all the existing records. For that reason the migration is split in three steps. In the first, the column is created and set as nullable. The second step will populate the records using the python function to generate the UUIDs and finally, the third migration will alter the column to be unique. Note that for Django this migration could be written in a single file, but for SqlAlchemy it had to be broken into several migrations. This is because of Alembic, the framework used to perform SqlAlchemy migrations, which had trouble with setting the UUIDs after creation of the column in a single transaction. To circumvent this problem, the migration was split into three separate migrations and Alembic is now instructed to use a single transaction per revision, mirroring the migration behavior of Django.

To make the export of `Log` entities possible, the `QueryBuilder` had to be extended to support retrieving log records for a given `Node`. The following changes were applied: * Internal `log_model_class` has been removed and replaced by `orm.Log` * Added support to join node to log (`with_node`) * Added support to join log to node (`with_log`) The `verdi export create` has a new flag to include or exclude the export of logs for nodes that are to be exported, defaulting to include: `--include-logs/--exclude-logs` Documentation has been updated to include new `QueryBuilder` join args in table and the `metadata.json` example has been updated in documentation to include correct Log info.

giovannipizzi

I only added a comment, that I think is important but if you agree please open an issue/another PR and it's ok to merge this in the meantime

giovannipizzi · 2019-02-07T20:58:53Z

        except Exception:
            # Bring back the DB to the correct state if this setup part fails
            self._revert_database_schema()
-            raise


I think we should keep this raise (the same is also done in SQLA)

I will add it in PR #2402

szoupanos requested review from giovannipizzi and sphuber February 3, 2019 17:18

szoupanos force-pushed the issue_1759_vb3 branch from 92f9811 to 2503c72 Compare February 3, 2019 17:40

szoupanos mentioned this pull request Feb 3, 2019

Import of Log entries should be checked for existence based on their UUID #2448

Closed

szoupanos force-pushed the issue_1759_vb3 branch from 2503c72 to a9d0987 Compare February 4, 2019 10:13

sphuber requested changes Feb 4, 2019

View reviewed changes

sphuber force-pushed the issue_1759_vb3 branch 2 times, most recently from c228045 to 476f850 Compare February 4, 2019 14:19

giovannipizzi reviewed Feb 4, 2019

View reviewed changes

sphuber force-pushed the issue_1759_vb3 branch from 476f850 to 82bd001 Compare February 4, 2019 18:33

sphuber force-pushed the issue_1759_vb3 branch 3 times, most recently from 9b53ab4 to 4fea9ae Compare February 4, 2019 22:11

szoupanos force-pushed the issue_1759_vb3 branch 2 times, most recently from 4aae372 to 0c767de Compare February 6, 2019 17:34

szoupanos and others added 2 commits February 7, 2019 15:30

sphuber force-pushed the issue_1759_vb3 branch from 0c767de to f4a5757 Compare February 7, 2019 14:30

giovannipizzi approved these changes Feb 7, 2019

View reviewed changes

sphuber approved these changes Feb 7, 2019

View reviewed changes

sphuber merged commit 93bdb16 into aiidateam:provenance_redesign Feb 7, 2019

This was referenced Feb 7, 2019

Export of Log records #1759

Closed

Wrong migration for DbLog to be fixed #2423

Closed

szoupanos deleted the issue_1759_vb3 branch February 8, 2019 17:37

giovannipizzi mentioned this pull request Feb 13, 2019

Don't fail during the DbLog migration if there are Logs of deleted nodes #2476

Closed

Conversation

szoupanos commented Feb 3, 2019

Uh oh!

coveralls commented Feb 3, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sphuber left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sphuber commented Feb 4, 2019

Uh oh!

giovannipizzi left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

giovannipizzi commented Feb 4, 2019

Uh oh!

sphuber commented Feb 4, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sphuber commented Feb 4, 2019

Uh oh!

giovannipizzi commented Feb 5, 2019

Uh oh!

szoupanos commented Feb 5, 2019

Uh oh!

giovannipizzi left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

coveralls commented Feb 3, 2019 •

edited

Loading

sphuber commented Feb 4, 2019 •

edited

Loading