Skip to content

Add built-in support and API for exit codes in WorkChains#1640

Merged
sphuber merged 2 commits into
aiidateam:developfrom
sphuber:fix_1639_process_spec_error_codes
Jun 22, 2018
Merged

Add built-in support and API for exit codes in WorkChains#1640
sphuber merged 2 commits into
aiidateam:developfrom
sphuber:fix_1639_process_spec_error_codes

Conversation

@sphuber
Copy link
Copy Markdown
Contributor

@sphuber sphuber commented Jun 11, 2018

Fixes #1639

Add built-in support and API for exit codes in WorkChains

Currently it is possible to return non-zero integers from a WorkChain
outline step to abort its execution and assign the exit status to the
node, however, there is no official API yet to make this easier.

Here we define the concept of an ExitCode, a named tuple that takes
an integer exit status and a descriptive message. Through the
ProcessSpec, a WorkChain developer can add exit codes that correspond
to errors that may crop up during the execution of the workchain. For
example the following spec definition:

@classmethod
def define(cls, spec):
        super(CifCleanWorkChain, cls).define(spec)
        spec.exit_code(418, 'ERROR_I_AM_A_TEAPOT',
        message='workchain found itself in an identity crisis')

In one of the outline steps, the exit code can be used by retrieving
it through either the integer exit status or the string label:

return self.exit_codes.ERROR_I_AM_A_TEAPOT

or

return self.exit_codes(418)

and return it. Returning an instance of ExitCode will trigger the exact
same mechanism as returning an integer from an outline step. The engine
will detect the ExitCode and return its exit status as the result,
triggering the abort and the exit status to be set on the Node.

Note that this addition required name changes in parts of the existing
API. For example the Calculation attribute finish_status was renamed
to exit_status. This is because it can now be accompanied by an string
exit_message. The exit_status and exit_message together form the
ExitCode.

@sphuber sphuber requested review from greschd and muhrin June 11, 2018 16:33
@sphuber
Copy link
Copy Markdown
Contributor Author

sphuber commented Jun 11, 2018

@muhrin and @greschd : let's discuss here about the API and the implementation details. I started implementing this in plumpy but I noticed that the WorkChain in aiida-core does not actually sub class the WorkChain class from plumpy. Was there a specific reason for this @muhrin , given that most of the tests and functionality seems to be present in plumpy.

@codecov-io
Copy link
Copy Markdown

codecov-io commented Jun 11, 2018

Codecov Report

Merging #1640 into develop will increase coverage by 0.03%.
The diff coverage is 76.34%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1640      +/-   ##
===========================================
+ Coverage     57.1%   57.13%   +0.03%     
===========================================
  Files          274      275       +1     
  Lines        33874    33911      +37     
===========================================
+ Hits         19343    19376      +33     
- Misses       14531    14535       +4
Impacted Files Coverage Δ
aiida/daemon/execmanager.py 8.22% <0%> (ø) ⬆️
aiida/cmdline/utils/common.py 22.41% <0%> (ø) ⬆️
aiida/work/process_spec.py 100% <100%> (ø) ⬆️
aiida/work/exceptions.py 100% <100%> (ø) ⬆️
aiida/orm/calculation/job/__init__.py 100% <100%> (ø) ⬆️
aiida/orm/implementation/calculation.py 92.3% <100%> (ø) ⬆️
aiida/cmdline/commands/work.py 30.59% <20%> (ø) ⬆️
...implementation/general/calculation/job/__init__.py 38.49% <28.57%> (+0.05%) ⬆️
aiida/cmdline/commands/calculation.py 13.77% <60%> (ø) ⬆️
aiida/work/exit_code.py 83.33% <83.33%> (ø)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update da10680...13810f9. Read the comment docs.

@sphuber
Copy link
Copy Markdown
Contributor Author

sphuber commented Jun 12, 2018

I am starting to rethink the nomenclature of not just ErrorCode but also of finish_status. I would like to add to the finish_status integer a string message like finish_message. This message could be defined through the spec and would be stored as an attribute on the node just like the finish status. However, writing all this new infrastructure I am starting to rethink the naming. The name finish_status has been in the workflows branch for a while and now in develop since the merge, so some people may already started working with it, but I am hoping that it is still okay to rename it.

My thinking is to use the more standard and general name "exit code", as used a lot in Linux. An exit code comes with an integer exit status and an optional exit message. I feel that using this nomenclature makes the interface more consistent. It would look something like this:

# Named tuple to return from Processes, e.g. from WorkChain
ExitCode = collections.namedtuple('ExitCode', 'status message')

# Exception to exit from FunctionProcesses
class Exit(Exception):

    def __init__(self, status, message):
        self._status = status
        self._message = message

# Attributes of the Calculation node
def exit_status(self):
def exit_message(self):

# Definition of exit code on ProcessSpec
ProcessSpec.exit_code(418, 'ERROR_I_AM_A_TEAPOT', 'the process found itself in an identity crisis')

@sphuber sphuber changed the title Add built-in support and API for error codes in WorkChains Add built-in support and API for exit codes in WorkChains Jun 12, 2018
@sphuber sphuber force-pushed the fix_1639_process_spec_error_codes branch 3 times, most recently from 6eeb442 to f3fa7cc Compare June 12, 2018 15:52
@muhrin
Copy link
Copy Markdown
Contributor

muhrin commented Jun 15, 2018

@sphuber Why is there a code, a string name for the exit status and the message. Do we really need the ERROR_I_AM_A_TEAPOT?

@greschd
Copy link
Copy Markdown
Member

greschd commented Jun 15, 2018

Concerning the integer exit code: What's the convention by which they are assigned? Is it the same as for Unix exit codes, or some other convention?

I think it might make sense to encode this convention for example in an IntEnum:

from enum import IntEnum

class ProcessExitCode(IntEnum):
    SUCCESS = 0
    ERROR_GENERIC = 1
    ERROR_I_AM_A_TEAPOT = 418

@greschd
Copy link
Copy Markdown
Member

greschd commented Jun 15, 2018

To clarify: In general I like the use of a descriptive name instead of just an integer code because it is less cryptic when the error is created. However, this doesn't give you any benefit unless the descriptive name is in some way coupled to the integer code.

@sphuber
Copy link
Copy Markdown
Contributor Author

sphuber commented Jun 15, 2018

@muhrin I added the label because I still want to be able to refer to the error code in the WorkChain definition by a label and not an integer. Imagine for example the following example WorkChain:

class CifCleanWorkChain(WorkChain):

    @classmethod
    def define(cls, spec):
        super(CifCleanWorkChain, cls).define(spec)
        spec.exit_code(401, 'ERROR_CIF_FILTER_FAILED',
            message='the CifFilterCalculation step failed')
        spec.outline(
            cls.run_filter_calculation,
            cls.inspect_filter_calculation,
        )

    def inspect_filter_calculation(self):
        try:
            self.ctx.cif = self.ctx.cif_filter.out.cif
        except AttributeError:
            self.report('aborting: the CifFilterCalculation did not return the required cif output')
            return self.exit_code('ERROR_CIF_FILTER_FAILED')

In the last line, where I am returning the exit code, I do not want to have to refer to it by the integer exit status. Imagine you have a more complex workchain with many exit codes, it is more difficult to understand what is going on if you see return self.exit_code(401). The label is merely for the developers.

@sphuber
Copy link
Copy Markdown
Contributor Author

sphuber commented Jun 15, 2018

@greschd concerning the integer exit status convention, we use exactly that of Unix, i.e. 0 means success any non-zero positive integer means failure. Now for the Process we do not have any standard error codes, but the JobCalculation already had the calculation status, which had several FAILED varietites. To homogenize the process states, I connect those to a predefined enum:

class JobCalculationFinishStatus(enum.Enum):
    """
    This enumeration maps specific calculation states to an integer. This integer can
    then be used to set the finish status of a JobCalculation node. The values defined
    here map directly on the failed calculation states, but the idea is that sub classes
    of AbstractJobCalculation can extend this enum with additional error codes
    """
    FINISHED = 0
    SUBMISSIONFAILED = 100
    RETRIEVALFAILED = 200
    PARSINGFAILED = 300
    FAILED = 400
    I_AM_A_TEAPOT = 418

Since we did not remove the calc_state I simply mapped those onto these default exit codes so those will automatically be assigned by the daemon. I am not sure how much of a convention we should enforce but I am open to suggestions

@sphuber sphuber force-pushed the fix_1639_process_spec_error_codes branch from f3fa7cc to bc0daf8 Compare June 20, 2018 08:59
@sphuber
Copy link
Copy Markdown
Contributor Author

sphuber commented Jun 20, 2018

After discussion with @muhrin the exit code collection is made callable to be able to retrieve a particular exit code from the WorkChain as follows

class CifCleanWorkChain(WorkChain):

    @classmethod
    def define(cls, spec):
        super(CifCleanWorkChain, cls).define(spec)
        spec.exit_code(418, 'ERROR_I_AM_A_TEAPOT',
            message='workchain found itself in an identity crisis')
        spec.outline(
            cls.run_filter_calculation,
            cls.inspect_filter_calculation,
        )

    def inspect_filter_calculation(self):
        try:
            self.ctx.cif = self.ctx.cif_filter.out.cif
        except AttributeError:
            exit_code = self.exit_codes.ERROR_I_AM_A_TEAPOT
            exit_code = self.exit_codes('ERROR_I_AM_A_TEAPOT')
            exit_code = self.exit_codes(418)
            return exit_code

I also updated the documentation to explain the exit code definition on the ProcessSpec as well as how to reference and use them within the outline code itself.

@sphuber sphuber force-pushed the fix_1639_process_spec_error_codes branch from b3e8cbd to 5a18f02 Compare June 20, 2018 20:08
sphuber added 2 commits June 21, 2018 18:44
Currently it is possible to return non-zero integers from a WorkChain
outline step to abort its execution and assign the exit status to the
node, however, there is no official API yet to make this easier.

Here we define the concept of an ExitCode, a named tuple that takes
an integer exit status and a descriptive message. Through the
ProcessSpec, a WorkChain developer can add exit codes that correspond
to errors that may crop up during the execution of the workchain. For
example the following spec definition:

	@classmethod
	def define(cls, spec):
   		super(CifCleanWorkChain, cls).define(spec)
    	spec.exit_code(418, 'ERROR_I_AM_A_TEAPOT',
        	message='workchain found itself in an identity crisis')

In one of the outline steps, the exit code can be used by retrieving
it through either the integer exit status or the string label:

	return self.exit_code('I_AM_A_TEAPOT')

or

	return self.exit_code(418)

and return it. Returning an instance of ExitCode will trigger the exact
same mechanism as returning an integer from an outline step. The engine
will detect the ExitCode and return its exit status as the result,
triggering the abort and the exit status to be set on the Node.

Note that this addition required name changes in parts of the existing
API. For example the Calculation attribute `finish_status` was renamed
to `exit_status`. This is because it can now be accompanied by an string
`exit_message`. The `exit_status` and `exit_message` together form the
`ExitCode`.
This allows one to both access any particular exit code from the
collection through attribute dereferencing:

	self.exit_codes.ERROR_I_AM_A_TEAPOT

as well as through calling it with the string label or integer
status, given that the ExitCodeNamespace implements the call method

	self.exit_codes(418)
	self.exit_codes('ERROR_I_AM_A_TEAPOT')
@sphuber sphuber force-pushed the fix_1639_process_spec_error_codes branch from 5a18f02 to 13810f9 Compare June 21, 2018 16:53
@sphuber
Copy link
Copy Markdown
Contributor Author

sphuber commented Jun 21, 2018

Alright @muhrin @greschd , she is all ready to go. Note that after a final discussion we replaced the method of raising the Exit exception to exit from a workfunction, by simply returning an ExitCode tuple. Since the later is exactly what is done in the WorkChain, that added consistency is nicer than raising, which offers no real benefits

Copy link
Copy Markdown
Contributor

@muhrin muhrin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good mate

@sphuber sphuber merged commit adefb22 into aiidateam:develop Jun 22, 2018
@sphuber sphuber deleted the fix_1639_process_spec_error_codes branch June 22, 2018 08:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants