Corrects the Log migrations, improves import/export #2447
92f9811 to 2503c72
2503c72 to a9d0987
sphuber
left a comment
A few minor changes and a question
Wouldn't it be better to demand that it only be passed in the kwargs or directly as metadata? Who is calling this constructor? Only us, correct? I think it might be better to make sure of this from the caller side.
Question: why should one use connect here and begin in the actual tests?
dbnode_id is required, correct? Then it should not have None as default
Same, dbnode_id should not have a default
remove default for dbnode_id
c228045 to 476f850
@giovannipizzi one thing I just realized is that the In addition, often people (like myself) will use pks in the message itself to reference other nodes. Of course both of these will no longer make sense when the reports are imported into another database. Of course we cannot migrate this since we do not know what part of the string is intended as a pk. However, we probably should make this clear to developers and maybe advise them to use UUIDs instead. As for the prefix, that is added by us in the
Now that I get here, I suggest a different approach:
- define two utility functions to get the 'Django query' object, and then call `.count()`, `.delete()`, `.values`, ... where appropriate. Now you define such methods for only some of the queries (and you don't always use them), and here you have to write the query again
I can do this, even though I don't find it a big duplication of code. I did it like this so that it is directly comparable to what happens in the SQLA migrations (where there are SQL queries and we cannot do what you propose).
Same change as suggested for Django
Agreed. We can't migrate this and it's up to developers, and good to clarify this.
Agreed that we fix this in a later PR. In the DB now things are correct. If this is only about printing in the AiiDA log text file maybe having a UUID is not a big deal, though (and probably more convenient)?
476f850 to 82bd001
@szoupanos and @giovannipizzi I have addressed most of Giovanni's comments. @szoupanos please see if you can address the comments about deduplicating the code in the migrations and making sure that in the reverse operations, the Notice that since we have two commits that we want to keep separate, I had to amend the individual commits to effect the requested changes and could not add another commit. Unfortunately, this will make it more difficult to view incremental changes.
9b53ab4 to 4fea9ae
@szoupanos if you want to adapt your commit tomorrow, you can use
@szoupanos I discussed this with @sphuber yesterday. Feel free not to apply my comments where they concern only code style, considering these are only migrations and not code that has to be maintained. The backward migration is a different matter: as I commented, we currently don't prevent going back. We do NOT need to be able to truly go back in time (that is not possible, as we delete data), but we do want that if I go forward (applying the migration), then back, and then forward again, the data is not destroyed.
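A minimal sketch of the forward, back, forward requirement, using plain dictionaries instead of real database rows. The function names and the fixed `objname` value restored on the way back are assumptions for illustration only, not what the actual migrations do.

```python
def forward(record):
    """Forward step: move the node reference from objpk into dbnode_id."""
    migrated = dict(record)
    migrated["dbnode_id"] = migrated.pop("objpk")
    migrated.pop("objname")
    return migrated


def backward(record, objname="node"):
    """Reverse step: rebuild objpk and objname from dbnode_id.

    `objname` is a hypothetical placeholder value chosen for this sketch.
    """
    reverted = dict(record)
    reverted["objpk"] = reverted.pop("dbnode_id")
    reverted["objname"] = objname
    return reverted


original = {"id": 1, "objpk": 42, "objname": "node", "message": "report"}
once = forward(original)
# Going back and forward again must yield the same migrated record.
again = forward(backward(once))
```

The point is only that the reverse step keeps enough information for a second forward run to reproduce the same result.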
@giovannipizzi I had a short discussion with @sphuber (in order to get the bigger picture, because I was a bit lost with the comments). As for code duplication, since we all agree that it is not a major issue, let's leave the code as it is in order to finish this PR ASAP.
4aae372 to 0c767de
In commit `c0fba2e38557fe1ea535d1dc24076cba99212616`, a migration was introduced for the `DbLog` table in order to support the export and import of log entries. However, this migration was flawed and would lose valuable information.

The old `DbLog` was designed to support adding log messages to various entities. Because these entities were not restricted to just nodes, and therefore could live in different database tables, a foreign key could not be used for the relationship. Instead, an `objpk` and an `objname` column were used to allow references to multiple tables. However, for the import to know whether an exported log record is already present in the database and therefore should not be imported again, the object pk could not be used, as pks do not remain the same between databases. To this end the migration added a UUID column to the `DbLog` table and dropped the `objpk` column. However, the data was not migrated before doing so, and so the connection between logs and their entities was lost.

After the faulty commit, we realized that the reason for not using a foreign key was that logs needed to be associated with both nodes and legacy workflows, but since support for the latter had already been dropped this was no longer necessary. Given that now only nodes can have associated logs, we can turn the `objpk` into a proper foreign key and the `objname` can be dropped. Since the logs for legacy workflows (and any other unexpected entities) will be dropped from the database, they should be exported to a file for archival purposes.
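The archive-then-migrate idea can be sketched with the standard library `sqlite3` module rather than the Django or Alembic machinery used by the real migrations. The table layout, the `node.` prefix used to recognise node logs, and the `legacy_workflow` value are hypothetical and only serve the illustration.

```python
import json
import sqlite3


def archive_and_migrate(conn, node_prefix="node."):
    """Archive non-node logs as JSON, delete them, and copy objpk into dbnode_id.

    `node_prefix` is a hypothetical marker for "this log belongs to a node".
    """
    cur = conn.cursor()
    pattern = node_prefix + "%"
    # Collect logs that do not belong to a node (e.g. legacy workflows).
    cur.execute(
        "SELECT id, objpk, objname, message FROM dblog WHERE objname NOT LIKE ?",
        (pattern,),
    )
    keys = ("id", "objpk", "objname", "message")
    legacy = [dict(zip(keys, row)) for row in cur.fetchall()]
    archive = json.dumps(legacy, indent=2)  # would be written to a file in practice

    # Delete the archived records, then migrate the remaining references.
    cur.execute("DELETE FROM dblog WHERE objname NOT LIKE ?", (pattern,))
    cur.execute("ALTER TABLE dblog ADD COLUMN dbnode_id INTEGER REFERENCES dbnode(id)")
    cur.execute("UPDATE dblog SET dbnode_id = objpk")
    conn.commit()
    return archive


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dbnode (id INTEGER PRIMARY KEY);
    CREATE TABLE dblog (id INTEGER PRIMARY KEY, objpk INTEGER, objname TEXT, message TEXT);
    INSERT INTO dbnode (id) VALUES (1), (2);
    INSERT INTO dblog (objpk, objname, message) VALUES
        (1, 'node.calculation', 'started'),
        (7, 'legacy_workflow', 'old workflow report'),
        (2, 'node.data', 'stored');
    """
)
archived = json.loads(archive_and_migrate(conn))
rows = conn.execute("SELECT objpk, dbnode_id FROM dblog ORDER BY id").fetchall()
```

The important ordering is that the data is copied into `dbnode_id` before `objpk` is ever dropped, which is the step the faulty migration skipped.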
In summary, the migrations perform the following logical actions:

* Export existing logs for legacy workflows and other entities to file
* Delete records from legacy workflows and other entities from `dblog`
* Create a foreign key `dbnode_id`
* Migrate data from `objpk` to `dbnode_id`
* Create a new column `uuid` for `DbLog` that is unique
* Delete the `objpk` and `objname` columns from `DbLog`
* Delete `objpk` and `objname` from the `metadata` of `DbLog` records

Note that the addition of the unique `uuid` column had to be done in three steps. The problem is that the default value for the UUID is generated by a python function, so it cannot be set at the database level; the values have to be set after the creation of the column. However, as soon as the column is created the uniqueness constraint is violated, since it will be empty for all the existing records. For that reason the migration is split in three steps. In the first, the column is created and set as nullable. The second step populates the records, using the python function to generate the UUIDs. Finally, the third step alters the column to be unique.

Note that for Django this migration could be written in a single file, but for SqlAlchemy it had to be broken into several migrations. This is because Alembic, the framework used to perform SqlAlchemy migrations, had trouble with setting the UUIDs after creation of the column in a single transaction. To circumvent this problem, the migration was split into three separate migrations and Alembic is now instructed to use a single transaction per revision, mirroring the migration behavior of Django.
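The three-step addition of the unique `uuid` column can be sketched as follows. This uses the standard library `sqlite3` module and a toy `dblog` table as assumptions; the real migrations go through Django and Alembic, and SQLite enforces uniqueness through a unique index rather than by altering the column.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dblog (id INTEGER PRIMARY KEY, message TEXT);
    INSERT INTO dblog (message) VALUES ('first'), ('second'), ('third');
    """
)

# Step 1: add the column as nullable, so existing rows stay valid while empty.
conn.execute("ALTER TABLE dblog ADD COLUMN uuid TEXT")
conn.commit()

# Step 2: populate each existing row from Python, since the database
# cannot call the Python UUID generator itself.
ids = [row[0] for row in conn.execute("SELECT id FROM dblog")]
for pk in ids:
    conn.execute("UPDATE dblog SET uuid = ? WHERE id = ?", (str(uuid.uuid4()), pk))
conn.commit()

# Step 3: only now enforce uniqueness.
conn.execute("CREATE UNIQUE INDEX dblog_uuid_unique ON dblog (uuid)")
conn.commit()

uuids = [row[0] for row in conn.execute("SELECT uuid FROM dblog ORDER BY id")]

# A record that reuses an existing UUID is now rejected.
try:
    conn.execute("INSERT INTO dblog (message, uuid) VALUES (?, ?)", ("duplicate", uuids[0]))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Each step is committed separately here, which loosely mirrors the one-transaction-per-revision setup described above.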
To make the export of `Log` entities possible, the `QueryBuilder` had to be extended to support retrieving log records for a given `Node`. The following changes were applied:

* Internal `log_model_class` has been removed and replaced by `orm.Log`
* Added support to join node to log (`with_node`)
* Added support to join log to node (`with_log`)

The `verdi export create` command has a new flag to include or exclude the export of logs for nodes that are to be exported, defaulting to include: `--include-logs/--exclude-logs`

Documentation has been updated to include the new `QueryBuilder` join args in the table, and the `metadata.json` example has been updated in the documentation to include correct Log info.
0c767de to f4a5757
giovannipizzi
left a comment
I only added one comment, which I think is important. If you agree, please open an issue or another PR; it's ok to merge this in the meantime.
    except Exception:
        # Bring back the DB to the correct state if this setup part fails
        self._revert_database_schema()
        raise
I think we should keep this raise (the same is also done in SQLA)
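To illustrate why the bare `raise` matters, here is a small self-contained sketch. The class and its `_prepare` and `set_up` methods are hypothetical stand-ins for the real test case; only `_revert_database_schema` is a name taken from the snippet above, and here it merely records that it was called.

```python
class MigrationTestSetup:
    """Hypothetical stand-in for a migration test case."""

    def __init__(self):
        self.reverted = False

    def _revert_database_schema(self):
        # A real test would migrate the schema back; this sketch only records the call.
        self.reverted = True

    def _prepare(self):
        # Simulate a failure during the setup of the test data.
        raise RuntimeError("setup failed")

    def set_up(self):
        try:
            self._prepare()
        except Exception:
            # Bring the DB back to the correct state, then re-raise so that
            # the failure is reported instead of being silently swallowed.
            self._revert_database_schema()
            raise


case = MigrationTestSetup()
try:
    case.set_up()
    error = None
except RuntimeError as exc:
    error = exc
```

Without the `raise`, the cleanup would still run but the test would continue as if the setup had succeeded.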
This PR improves the DbLog migrations for Django & SQLA that were created for issues #1102 and #1759 with PR #2393.
More specifically the changes are described in issue #2423.
Also the objuuid was removed and objpk is used.
Import/export was adapted by @CasperWA and log is exported as Node records (with their pk which is recreated during export/import to avoid pk collisions).
We have to see what happens when a node is re-imported with more log entries: we should import the new log entries based on their UUID. A new ticket should be created for this.
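One possible shape for that follow-up, as a sketch only: compare the UUIDs of the incoming log records against those already in the database and keep the ones that are new. The function name and the record layout are assumptions, not part of this PR.

```python
def select_new_logs(existing_uuids, incoming_logs):
    """Return the incoming log records whose UUID is not yet present."""
    known = set(existing_uuids)
    new_logs = []
    for log in incoming_logs:
        if log["uuid"] not in known:
            new_logs.append(log)
            known.add(log["uuid"])  # also guards against duplicates within the archive
    return new_logs


existing = ["aaaa", "bbbb"]
incoming = [
    {"uuid": "bbbb", "message": "already imported"},
    {"uuid": "cccc", "message": "new entry"},
    {"uuid": "cccc", "message": "new entry"},
]
to_import = select_new_logs(existing, incoming)
```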