Speed up creation of Nodes in the AiiDA ORM#2214
Conversation
giovannipizzi
left a comment
There was a problem hiding this comment.
@ltalirz great!!
Can I suggest to add a step in travis (not necessarily a test, maybe an additional test) that runs the speed test (e.g. for creating 10k nodes, and for running say 10x a simple verdi command that a) require load_dbenv and b) does not require it, and output some timings? This might be great in the long term to go back in history and monitor when something broke speedup. Moreover, in Jenkins one can output "artifacts" that are stored, maybe also in travis there is something similar. If you don't have the time to do it but you're ok with this, feel free to copy this text in an issue (and even assign me).
|
I realize tests are falling, there is at least one relative import that has not been moved |
|
For performance tests, I opened #2215 This PR is still a little bit dirty and before I move forward I have a few questions, perhaps also for @sphuber or @muhrin :
|
|
Dear Leo, thanks for the work - looks good. My response:
|
|
@muhrin thanks a lot for your thoughts! Re 2.: Ok. So, the idea is that when you load a new profile (when this becomes possible...), the AiiDAManager cache will be reset, correct? I will add this to its docstring. Re 3.: Could you elaborate on this a bit? In 1. you wrote that the
Furthermore, is there currently anything that invalidates the cache? |
|
This holds even when calling with the default backend, correct? I.e. caching stuff inside So, for the moment, should we cache the default user inside the |
|
Hmm...I thought it was there already but you're right, it's not. The place for caching (the default) objects collection is here: It's quick to add, happy to do it because the performance implications could be great. If you get this PR through I can add the caching, you can track it here: #2216 I would avoid putting the caching in the |
Now if a new collection is created via .objects it is cached for the next time in a lazy store. Similarly the default user for a User collection is cached after being created for the first time. This may lead to issues if the default user can change for a given profile during runtime, however for now we don't support this.
There was a problem hiding this comment.
Do we really need a wildcard here?
Perhaps just import the functions we want (I guess the constants defined in profile.py are not needed here (?))
There was a problem hiding this comment.
Changed this to what we do in other prats of the code (which I borrowed from the asyncio codebase):
from .profile import *
__all__ = profile.__all__
This way we delegate what gets imported to profile itself.
There was a problem hiding this comment.
Do we need to introduce a different adjective (configured user) here?
Why not default user?
Edit: I see you just moved the function. Perhaps still a good time to rename it now?
There was a problem hiding this comment.
Changed to default_user_email
There was a problem hiding this comment.
Was this an exception before as well?
Should it be forbidden not to have a default user?
Edit: I see, it was...
There was a problem hiding this comment.
Yes, it's forbidden. Can't store any nodes without one.
There was a problem hiding this comment.
Is this stuff no longer necessary, this creation of a new_user object?
There was a problem hiding this comment.
It's now done at the ORM level in the aiida/backends/testimplbase.py insert_data so it's backend independent...this is the way, ideally, all these things would be but we can't delete everything from ORM function calls.
|
@muhrin added a few comments, otherwise looks good to me (+ tested that node creation is fast again :-) )! |
There was a problem hiding this comment.
maybe get rid of the print statements or make them debug/info logs
There was a problem hiding this comment.
left over debug print statements
There was a problem hiding this comment.
Is this comment still relevant if the accompanying line is commented out?
There was a problem hiding this comment.
Should we uncomment this assert, or if not just remove it
There was a problem hiding this comment.
wrong indentation in docstring
There was a problem hiding this comment.
This can be done more succinctly now
computer = Computer.get('deneb')
transport = computer.get_transport()
scheduler = computer.get_scheduler()
scheduler.set_transport(transport)
This way we have greater control over its creation and we don't have to implement a reset method which starts to get complicate. We just throw the object away and instsantiate a new one when we need to reset * Removed construct_backend() (replaced by get_manager().get_backend()) * Changed all tests to not specify the backend, they should just do what user facing code does which is use the default one and if we ever want to test with different backends we just implement a function to load a new backend from a profile globally * Moved getting default user email to the `Profile` class
PR aiidateam#2214 stripped of safeguards that prevented the default user from being deleted in between tests. Since AiiDA expects a default user in many operations, leaving the DB in a state with no default user is not ideal. This commit reintroduces safeguards that result in the default user remaining untouched.
aiida.worktoaiida.manageaiida.orm.entities.Collectionaiida.orm.user.User.Collection(fix Add caching of object collections #2216)Benchmark on MacBook Pro, python3
Node.__init__)Node.__init__)The timing of 58.5s is significantly worse than what we wrote down during the tests with @szoupanos and @giovannipizzi (which was ~10s for 10k nodes), i.e. probably someone broke the caching.
Anyhow, the target we set then was: 2s for 10k nodes, which we have reached now on python3 (it's the 1.7s that should scale linearly with the number of nodes).