ORM: Use pydantic to specify a schema for each ORM entity#6255
Conversation
3848afd to
c253259
Compare
c253259 to
fc52f41
Compare
49379d9 to
d1659b4
Compare
a7a6065 to
069bd1f
Compare
655fcf0 to
d778da8
Compare
07f3001 to
054c43c
Compare
054c43c to
1c65856
Compare
7c7ef64 to
1e7f536
Compare
|
@sphuber just wondering, have you checked how this affects performance? |
This is implemented following the pydantic PR for aiida-core: aiidateam/aiida-core#6255
| def serialize_computer(self, computer: Computer, _info): | ||
| return computer.label | ||
|
|
||
| # @model_validator(mode='after') # type: ignore[misc] |
There was a problem hiding this comment.
Seems like some code that was forgotten
There was a problem hiding this comment.
Thanks for checking out this PR @agoscinski . For some background info: I commented this out on purpose for now. Ideally, this check should be there because when storing the code, this condition should be satisfied. Unfortunately, however, this now also prevents just instantiating this model that validates the check, even if we don't plan on storing it. This is problematic for the export behavior, because that would rely on instantiating a Model that violates this uniqueness rules, even though in this case it doesn't apply.
The old CLI verdi code create core.code.installed would also rely on something like this check to immediately return an error as soon as label was specified in interactive mode and that label already existed for the same computer. It would then reprompt the user asking for a different label. I haven't been able to reimplement this feature in this PR so was keeping the comment as a reminder.
Safe to say that this PR is not completely done yet 😅
There was a problem hiding this comment.
Is this still "in progress"?
There was a problem hiding this comment.
@sphuber I'm looking at this now. Trying to probe the issue you mention in this thread. I did the following:
- Create a
hello_world@localhostinstalled code using verdi - Load the code and serialize it
- Create an
InstalledCodeinstance from 2
I expected an issue here, asfrom_serializedleverages the model, so the validator should've been triggered, raising an exception, as the label + computer already exist! No exception though.
As for your comment regarding the CLI, are you saying a similar validation should be implemented there? Sounds like it, but not in this PR, right?
There was a problem hiding this comment.
Summarizing a recent meeting with @sphuber:
- The currently commented out validator on the
Code.Modelis intended as a check on the uniqueness ofCodelabel+computer.labeltuples - However, one use of the pydantic
Modelof a code, the dynamic creation of the interactive options ofverdi code create..., requires theModelonly for access to the available properties. Unfortunately, the validator will fire on instantiation from the serialized code, which will conflict with the uniqueness constraint, making it impossible to use theModelin this case - The move to pydantic
Models for this CLI application regresses the user experience in the sense that it does not fire the validator until after the user finishes the interactive part, even though the tuple is already available within the first three options (note that this is not tied to this PR, as the use of pydanticModels for this case was implemented in an earlier PR)
The CLI matter is not blocking for this PR, as I understand it. A solution can be explored in a follow-up PR.
There was a problem hiding this comment.
Okay. Thanks for clarifying this matter 🙏
There was a problem hiding this comment.
Hmm, I think maybe there's another potentially critical issue here.
I tried reconstructing an installed code instance. With the validation active on the model, I first need to make the above change (and related changes) to Entity.to_model(). This allows me to serialize the code. However, deserializing it using orm.InstalledCode.from_serialized(**code.serialize()) also raises the validation error. But here we shouldn't assume pre-validated input, so we can't use the model_construct solution in from_serialized.
I'm not sure how to get around this matter and allow the model validator. A REST API receiving a a serialized Code instance cannot assume the data is valid. In the case that the data came from the same AiiDA instance with which the REST API is linked, the validation will surely object.
@sphuber am I overlooking something?
update
Thinking on this some more, "real" use cases of REST API node transfer will likely be between AiiDA instances, so the collision of the same full code label should not happen? But even if this is true, a specialized warning would be better. Not sure where to implement it though.
There was a problem hiding this comment.
Following up on this, if no strong opinions regarding specialized warnings, I believe this matter is closed.
There was a problem hiding this comment.
I am not sure what your conclusion is @edan-bainglass but my understanding is that you would not replace the constructor to model_construct which I agree with. Since this would skip all validation as well as any resolving of nested members. It seems quite clear from the whole conversation that validations related to database relationships (like in this case uniqueness) should not be part of the pydantic model (at least in AiiDA which has so far separated the storage from the initialization) but should be moved to another level. If we agree on this we can remove this validation with referencing an issue about the CLI validation.
EDIT: Note that this PR establishes the serialization of objects through their unique identifier (node --serialization--> pk) which relies on the fact on deserialization that the node is present in the database. Such validation checks would break any of these cases. I think this just enforces the argument that such validations should not be in the pydantic model and rather be implemented on endpoints such as the CLI that are responsible for initialization and storing of nodes.
There was a problem hiding this comment.
@agoscinski getting a chance to look at this once more as part of the REST API work. Regarding the use of model_construct, as I see it now, it seems "safe" to use it in Entity.to_model, as there is no need for validation here. The entity is already in the DB. We SHOULD NOT use it for deserialization (construction from model), which we are not.
1e7f536 to
66f6096
Compare
There was a problem hiding this comment.
Is it really necessary to make orm pluggable, what is the use case of it?
There was a problem hiding this comment.
@sphuber correct me if I'm wrong, but this has to do with the use of orm_class, right? Under the implementation section of the AEP, it states that orm_class is a shortcut to model_to_orm used to define a model field that points to an ORM instance. So instead of
class Model(...):
user = MetadataField(
...,
model_to_orm=lambda pk: User.collection.get(id=pk),
)you can do
class Model(...):
user = MetadataField(
...,
orm_class=User,
)The various ORM classes that may be used in orm_class are then defined in project.entry-points.'aiida.orm', so that you can do in Entity.model_to_orm_field_values(...):
if orm_class := get_metadata(field, 'orm_class'):
if isinstance(orm_class, str):
orm_class = BaseFactory('aiida.orm', orm_class)
fields[key] = orm_class.collection.get(id=field_value)But then I guess my question is, do these not already have entry points? Or is the problem that you can't fetch them all generically using the BaseFactory?
a7c6c13 to
9816200
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6255 +/- ##
==========================================
+ Coverage 78.29% 78.44% +0.15%
==========================================
Files 566 567 +1
Lines 42766 43063 +297
==========================================
+ Hits 33481 33776 +295
- Misses 9285 9287 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
It would be nice to have some benchmarks, |
This is implemented following the pydantic PR for aiida-core: aiidateam/aiida-core#6255
try to fix filepath_files by introducing byte storage for (de)serialization make filepath_files default None also in pydantic model [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci wip different approach including repo export add tmp_path update regression test cleanup code, introduce now path as second variable put to_model, from_model private, fix missing repository_path add serialization tests mark serialize and from_serialized as private Revert "mark serialize and from_serialized as private" This reverts commit b09a7b9. Throw warning when serialization interface is used Add tests for codes apply review after talking with geigerj2 apply review after talking with geigerj2 fix regression tests Add path support for filepath_executable remove code that was purely for debugging add conversion of path to str to support path for filepath_executable use py3.9 compatible replacement for Path.walk
Issues has been solved in aiidateam#6824 and split to other issues since it needs a solution outside of the pydantic model.
aa70388 to
eb93bf2
Compare
For now only `FilePath` has been added and replaced everywhere.
This was a misunderstanding so far: while the object initializer supports path as input, we store the path in the database always as string. Since the pydantic model is used for query results to the database we need to specify the type of the object in the database.
for more information, see https://pre-commit.ci
|
We solved the problems with the CI I mentioned in my last post in #6824 by using the same approach as in the export to yaml functionality in While working on this PR I consistently find subtle issues that indicate to me that the feature is not stable yet, but do not prevent users to try it out out. Therefore I put a warning when the serialize function is used that this is an experimental feature and is likely to break in future releases. I created an issue to track the overall progress of the model integration #6842. From my side with these changes this PR is mergable. |
|
Thanks @agoscinski. This looks great! The new |
…6255) This PR addresses [AEP 010](https://github.com/aiidateam/AEP/blob/983a645c9285ba65c7cf07fe6064c23e7e994c06/010_orm_schema/readme.md) by introducing a schema interface for AiiDA's ORM entities using `pydantic`. The goal is to make the structure and content of ORM classes introspectable and serializable, enabling external applications—such as web APIs or CLIs—to interface with AiiDA in a programmatic, schema-driven way. Each core ORM class now defines a `Model` subclass that exposes its fields using rich metadata annotations. These models support conversion to and from ORM instances (`_from_model`, `_to_model`), as well as a user API yielding JSON-compatible formats (`from_serialized`, `serialize`). The serialization mechanism is experimental and is stated to the user as such on use of the serialization methods. In the process of implementing the original proposal, the `MetadataField` utility was extended with additional metadata such as `exclude_to_orm`, `exclude_from_cli`, `orm_class`, `orm_to_model`, and `model_to_orm`, enabling precise control over how fields behave across layers. A helper function `get_metadata` was added to facilitate metadata retrieval from `FieldInfo` objects. The PR refactors the `AuthInfo`, `Comment`, and `PortableCode` classes to adopt this new schema interface. For `AuthInfo` and `Comment`, equality checks are now defined based on the models to improve testability and object comparison. The `PortableCode` now also supports `PurePath` for the `filepath_executable` on initialization, though the type of its model's `filepath_files` is corrected to `str` only, reflecting how it is stored in the database. The serialization of repository files is unified for the `filepath_files` field and yaml-export functionality. For objects with repository-stored files, on serialization, files are written to a user-specified directory (or a temporary one if not provided) and the paths are stored in the serialized model. The `from_serialized` method can then be used to reconstruct the object, including its files, from the serialized data. Furthemore, CLI support was also expanded: dynamic option generation now respects `exclude_from_cli` and allows prioritizing or customizing CLI arguments using schema metadata. The `create_code` command and dynamic group listing logic were updated accordingly to use the new models for constructing instances. Additionally, entry points were registered for core ORM classes to improve plugin discoverability, and a new `FilePath` typing alias was introduced for shared use across the codebase. --------- Co-authored-by: Edan Bainglass <edan.bainglass@gmail.com> Co-authored-by: Alexander Goscinski <alex.goscinski@posteo.de> Co-authored-by: Alexander Goscinski <alexander.goscinski@posteo.de>
This PR addresses AEP 010 by introducing a schema interface for AiiDA's ORM entities using
pydantic. The goal is to make the structure and content of ORM classes introspectable and serializable, enabling external applications—such as web APIs or CLIs—to interface with AiiDA in a programmatic, schema-driven way. Each core ORM class now defines aModelsubclass that exposes its fields using rich metadata annotations. These models support conversion to and from ORM instances (_from_model,_to_model), as well as a user API yielding JSON-compatible formats (from_serialized,serialize). The serialization mechanism is experimental and is stated to the user as such on use of the serialization methods.In the process of implementing the original proposal, the
MetadataFieldutility was extended with additional metadata such asexclude_to_orm,exclude_from_cli,orm_class,orm_to_model, andmodel_to_orm, enabling precise control over how fields behave across layers. A helper functionget_metadatawas added to facilitate metadata retrieval fromFieldInfoobjects.The PR refactors the
AuthInfo,Comment, andPortableCodeclasses to adopt this new schema interface. ForAuthInfoandComment, equality checks are now defined based on the models to improve testability and object comparison. ThePortableCodenow also supportsPurePathfor thefilepath_executableon initialization, though the type of its model'sfilepath_filesis corrected tostronly, reflecting how it is stored in the database. The serialization of repository files is unified for thefilepath_filesfield and yaml-export functionality.For objects with repository-stored files, on serialization, files are written to a user-specified directory (or a temporary one if not provided) and the paths are stored in the serialized model. The
from_serializedmethod can then be used to reconstruct the object, including its files, from the serialized data.Furthemore, CLI support was also expanded: dynamic option generation now respects
exclude_from_cliand allows prioritizing or customizing CLI arguments using schema metadata. Thecreate_codecommand and dynamic group listing logic were updated accordingly to use the new models for constructing instances.Additionally, entry points were registered for core ORM classes to improve plugin discoverability, and a new
FilePathtyping alias was introduced for shared use across the codebase.