StashCalculation: a new CalcJob plugin#6772
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6772 +/- ##
==========================================
+ Coverage 78.29% 78.31% +0.02%
==========================================
Files 566 567 +1
Lines 42766 42796 +30
==========================================
+ Hits 33481 33510 +29
- Misses 9285 9286 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| spec.inputs.pop('code', None) | ||
|
|
||
| # Ideally one could use the same computer as the one of the `source_node`. | ||
| # However, if another computer has access to the directory, we don't want to restrict.` |
There was a problem hiding this comment.
I am not sure how would I pass another computer here? Maybe you could also explain more scenario more detailed.
| # However, if another computer has access to the directory, we don't want to restrict.` | |
| # However if you cannot access the stash storage from the same computer anymore but you have access to it from another computer, you can can specify the computer with this commented block below | |
There was a problem hiding this comment.
Thanks I updated the comment, and moved it to docstring.
Passing computer appears to be a very routing procedure it's done via metadata.computer.
It should be very clear, now, even for the users.
|
|
||
| if calculation.process_type == 'aiida.calculations:core.stash': | ||
| remote_node = load_node(calculation.inputs.source_node.pk) | ||
| uuid = calculation.inputs.source_node.uuid |
There was a problem hiding this comment.
might increase readability
| uuid = calculation.inputs.source_node.uuid | |
| uuid = remote_node.uuid |
239ff8e to
c535928
Compare
applied! |
84f49a7 to
4c1504b
Compare
StashCalculation: new CalcJob pluginStashCalculation: a new CalcJob plugin
|
|
||
| spec.input( | ||
| 'source_node', | ||
| valid_type=(orm.RemoteData, orm.SinglefileData), |
There was a problem hiding this comment.
🤔 yeah, you're probably right
There was a problem hiding this comment.
Alright, I added orm.FolderData to the list.
There was a problem hiding this comment.
did you? I still don't see it here
|
|
||
| if stash_mode == StashMode.COPY.value: | ||
| target_basepath = Path(stash_options['target_base']) / uuid[:2] / uuid[2:4] / uuid[4:] | ||
| target_basepath = target_base / uuid[:2] / uuid[2:4] / uuid[4:] |
There was a problem hiding this comment.
Can you make the construction of thpa path uuid[:2] / uuid[2:4] / uuid[4:] part of the CalcJob or CalcInfo (not sure which makes more sense but I see that also the the uuid is used from the CalcInfo? This should make this path less magical and better understandable. Then you also need to use that function in engine.daemon.execmanager:upload_calculation (AFAIK this is where the calculation folder is created)
There was a problem hiding this comment.
Yeah, but that would only work if stashing is done as a separate calcjob.
If stashing is performed as part as a generic calcjob, e.g. qe , then this is not possible, And making the folder should be done in execmanager.
There was a problem hiding this comment.
Then do a util function that is used in both cases, that takes a uuid as input and in the docstring you can explain how it should be used. The current state is highly suboptimal.
There was a problem hiding this comment.
I understand, but this has been like this even before this PR.
And it's used only here, and in execmanager.py::upload_calculation --but not in the exact same way--
So I'd leave that for another PR, if we have to functionalize that.
There was a problem hiding this comment.
I understand but you are continuing in this PR the bad design. It should be simply fixable by creating an util function. Please explain then why this change needs a significant higher amount of effort beyond just creating an util function that would justifies a fix in a separate PR.
There was a problem hiding this comment.
So I made a function that is being used only once?
There was a problem hiding this comment.
Ok, I'm convinced. It's actually two places. :) also here:
workdir = Path(remote_working_directory).joinpath(calc_info.uuid[:2], calc_info.uuid[2:4])Line 135 execmanager
is it ok, if I do this in a separate PR? I'll open a sub-issue for this.
There was a problem hiding this comment.
yes issue is also okay or directly PR might be easier since it should be easy fix (if I am not mistaken)
There was a problem hiding this comment.
Just had the look, the logic in upload_calculation is a bit not straight forward. I'd leave this for another PR.
|
|
||
| elif self._command == STASH_COMMAND: | ||
| if node.get_option('stash') is not None: | ||
| if (node.get_option('stash') is not None) or (type(self.process).__name__ == 'StashCalculation'): |
There was a problem hiding this comment.
why not isinstance(self.process, StashCalculation)?
There was a problem hiding this comment.
don't remember why that didn't work,
I have to try it again 🤔
There was a problem hiding this comment.
still open, It could be that you want to explicitly check that it is a StashCalculation and not a child class but that would indicate to me also something being wrong
There was a problem hiding this comment.
Actually, now I realize the second condition is not needed at all.
The first condition would also serve for stashing via a calcjob. The instruction is similar:
inputs = {
'metadata': {
'computer': Computer.collection.get(label="localhost"),
'options': {
'resources': {'num_machines': 1},
'stash': {
'stash_mode': StashMode.COPY.value,
'target_base': '/scratch/my_stashing/',
'source_list': ['aiida.in', '_aiidasubmit.sh'],
},
},
},
'source_node': node_1,
}
Therefore node.get_option('stash') is not None will be True also in this case.
|
|
||
| elif self._command == STASH_COMMAND: | ||
| if node.get_option('stash') is not None: | ||
| if (node.get_option('stash') is not None) or (type(self.process).__name__ == 'StashCalculation'): |
There was a problem hiding this comment.
still open, It could be that you want to explicitly check that it is a StashCalculation and not a child class but that would indicate to me also something being wrong
|
|
||
| if stash_mode == StashMode.COPY.value: | ||
| target_basepath = Path(stash_options['target_base']) / uuid[:2] / uuid[2:4] / uuid[4:] | ||
| target_basepath = target_base / uuid[:2] / uuid[2:4] / uuid[4:] |
There was a problem hiding this comment.
Then do a util function that is used in both cases, that takes a uuid as input and in the docstring you can explain how it should be used. The current state is highly suboptimal.
|
|
||
| spec.input( | ||
| 'source_node', | ||
| valid_type=(orm.RemoteData, orm.SinglefileData), |
|
Hi @agoscinski Thanks a lot for the nice review. |
|
|
||
| spec.input( | ||
| 'source_node', | ||
| valid_type=(orm.RemoteData, orm.SinglefileData), |
There was a problem hiding this comment.
did you? I still don't see it here
|
|
||
| if stash_mode == StashMode.COPY.value: | ||
| target_basepath = Path(stash_options['target_base']) / uuid[:2] / uuid[2:4] / uuid[4:] | ||
| target_basepath = target_base / uuid[:2] / uuid[2:4] / uuid[4:] |
There was a problem hiding this comment.
I understand but you are continuing in this PR the bad design. It should be simply fixable by creating an util function. Please explain then why this change needs a significant higher amount of effort beyond just creating an util function that would justifies a fix in a separate PR.
|
|
||
| class StashCalculation(CalcJob): | ||
| """ | ||
| Utility to stash files from a remote folder. |
There was a problem hiding this comment.
not only remote since you also support SinglefileData (and hopefully FolderData)
| 'target_base': str(target_base), | ||
| 'source_list': ['*'], | ||
| }, | ||
| }, |
There was a problem hiding this comment.
what happens when I forget to add stash options for the StashCalculation? Would like to know if the error message is clear to the user in this case.
There was a problem hiding this comment.
very good point!!!
I totally, missed that. Now made those fields compulsory via:
spec.inputs['metadata']['options']['stash'].required = True
spec.inputs['metadata']['options']['stash']['stash_mode'].required = True
spec.inputs['metadata']['options']['stash']['target_base'].required = True
spec.inputs['metadata']['options']['stash']['source_list'].required = TrueThere was a problem hiding this comment.
This point is important and needs short discussion #6772 (comment) (maybe in person is easier) otherwise is almost there
|
|
||
| if stash_mode == StashMode.COPY.value: | ||
| target_basepath = Path(stash_options['target_base']) / uuid[:2] / uuid[2:4] / uuid[4:] | ||
| target_basepath = target_base / uuid[:2] / uuid[2:4] / uuid[4:] |
There was a problem hiding this comment.
yes issue is also okay or directly PR might be easier since it should be easy fix (if I am not mistaken)
Just had the look, the logic in upload_calculation is a bit not straight forward. I'd leave this for another PR. Opened an issue: |
|
@agoscinski has opened a PR #6837 that would allow a It's better and more intuitive if Update: |
|
@agoscinski please see #6837 (comment) |
There was a problem hiding this comment.
In PR #6837 we explored the possibility to input a CalcJobNode as input to a CalcJob, so we could use its path for stashing. In the discussion there it was concluded that a CalcJob already returns a RemoteData node that contains the calcjob working directory. This RemoteData node can be then passed to the StashCalculation implemented in this PR and allows a same usage of the stash options as one would do it when stashing in the first place when running the CalcJob. Therefore the usage of RemoteData as input is also a user friendly. This needs however to be documented in the readthedocs in a future PR as this is not clear for a user (and developers^^)
Historically, stashing was only possible, if it was instructed before running a generic calcjob. The instruction had to be "attached" to the original calcjob, like this for example:
However, if a user would realize they need to stash something only after running th calcjob, this would not be possible.
This PR, defines a new calcjob, that is able to perform a stashing operation even after a calculation is finished.
The usage is very similar, and for consistency and user-friendliness, we keep the instruction as part of the metadata. The only main input is obviously a source node, for example:
P.S. This PR is part of the upcoming changes, reference in #6764