54097cd to 406ad06
…ip-init' and 'origin/095-skip-init' into data4es-update
Since `--ignore` accepts only the exact stage IDs used in the code, it is not always obvious how to specify the desired ID (as in the case of '91_in'/'91_out'). The new option makes it possible to check the correct spelling.
5c464d3 to 99ce05f
Instead of rebasing this branch on the current version of …
Resolve [WIP] status for what's left to do does not really belong here -- it is up to every individual stage.
9f5b26b: this commit's description talks about changing "OC" to "09", but it also splits line 208 (unrelated to OC/09) into three lines (because its length exceeded 80 characters, I presume). Shouldn't this change be a separate commit, or an amendment to one of the previous ones?
0e4e8c9 and fb6c387: options `--help` and `--list` are implied, both by general logic and by their description, to merely output some info and exit. However, some work is actually done first; for example, the check of 095's credentials may fail and terminate the script without the info being displayed.
'OC' was a "temporary name" for the Oracle Connector (OC) stage; there is nothing wrong with it, but in view of the `--ignore` option it could be confusing. So now '09' is the regular ID for the given stage, and 'OC' is completely removed from the code.
Conflicts: Utils/Dataflow/run/data4es-start
99ce05f to 69d0fa0
The definition of stage commands was moved into a separate function. This makes it possible to define all the commands and get the status of the operation without exiting immediately in case of an error.
…cess. Now the process status is checked only when we actually want to run the process, not when we only need some information about the script usage.
This operation clearly belongs to the commands definition. Previously it was done separately because the commands had to be defined before the command-line arguments were parsed (for the `--list` option), but this is no longer the case.
Thank you, fixed now.
Good point, thank you; it also turned out that they were not handled properly when the process was already running. Fixed now.
Branches are still created (Line 195) when these options are used, which may lead to similar problems:
If the script `data4es-start` is run with the `--help` or `--list` parameter, it doesn't need to create pipes for branches -- and if it does, it may fail unexpectedly (without showing the expected usage message or list of stages). Now this operation is moved to where all other "actual actions" are performed (like `init_process`, etc.)
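The ordering described in this commit can be sketched as follows. This is a hypothetical Python illustration, not the code of `data4es-start`: the stage list and the `create_branch_pipes` helper are made-up stand-ins, and only the `--help` and `--list` options come from the discussion above.

```python
import argparse

# Illustrative stage IDs only; the real list is defined in `data4es-start`.
STAGES = ["09", "19", "91_in", "91_out", "95"]


def create_branch_pipes(log):
    """Hypothetical stand-in for a side-effecting setup step that may fail."""
    log.append("pipes created")


def main(argv, log):
    parser = argparse.ArgumentParser(prog="data4es-start", add_help=False)
    parser.add_argument("--help", action="store_true")
    parser.add_argument("--list", action="store_true")
    args = parser.parse_args(argv)

    # Informational options are answered first and return immediately,
    # so no step that can fail has been executed yet.
    if args.help:
        return parser.format_usage()
    if args.list:
        return "\n".join(STAGES)

    # "Actual actions" start only here.
    create_branch_pipes(log)
    return "started"
```

With this layout, a failure inside the setup step cannot prevent the usage message or the stage list from being shown.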
Should work properly now, please take a look.
Yes, works properly now, thank you.
Checked again: while … Obvious difference between … A similar problem arises if stage 095 is added to the ignore list. Also, the part of the error message about "certificate files: and " can be puzzling; maybe add a specific error message for the case when the files are not defined?
Comment to the PR text itself: it seems to me that the four boxes starting with "make stages that might ..." should be ticked (and the last one should also link to #263), because #260, #261, #262 and #263, which are responsible for them, were merged successfully. I'm writing this instead of fixing the text myself or ignoring it, in case I'm wrong and something is actually not finished there.
I suggest fixing it in a probably unexpected way: remove handling of the files required for Stage 095 from …
834ca4d to 56dfca5
Inconsistent verb forms after the last change (56dfca5).
Looks good to me.
Resolving conflicts: Utils/Dataflow/run/data4es-start
Main changes:

- `--ignore <stageID>[,<stageID>]` option to skip specific stages.
- `--list` option to show the list of stage IDs (to be used with `--ignore`).
- `--help` option to show usage info.

The goal is to allow `data4es` to utilize the ES `update` functionality when loading data. It will make it possible to "disable" some ETL process nodes for faster processing without losing information written to ES earlier. Of course, it puts a high load on the ES server, so it must be used with caution; still, in some situations it looks like a better solution (not loading external sources; not querying ES for already known information when an external source is unavailable or doesn't have the requested information).
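How a comma-separated `--ignore` value might be handled can be sketched as below. This is a Python illustration under stated assumptions, not the script's own code: the stage list is made up for the example, `parse_ignore` and `active_stages` are hypothetical names, and rejecting unknown IDs is an assumed behaviour that the PR text does not confirm.

```python
# Illustrative stage IDs only; the real list is defined in `data4es-start`.
STAGE_IDS = ["09", "19", "91_in", "91_out", "95"]


def parse_ignore(value, known=STAGE_IDS):
    """Split a comma-separated `--ignore` value and reject unknown IDs."""
    requested = [s for s in value.split(",") if s]
    # Only exact matches count: "91" is not accepted for "91_in"/"91_out".
    unknown = [s for s in requested if s not in known]
    if unknown:
        raise ValueError(
            "unknown stage ID(s): %s (use --list to see valid IDs)"
            % ", ".join(unknown)
        )
    return requested


def active_stages(ignored, known=STAGE_IDS):
    """Return the stages that remain after skipping the ignored ones."""
    return [s for s in known if s not in ignored]
```

For example, `active_stages(parse_ignore("91_in,95"))` leaves `["09", "19", "91_out"]`, while `parse_ignore("91")` raises an error because `91` is not an exact ID.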
ToDo:

- `--update` to Stage 19 (changing default action from `index` to `update`) (see DF/019: introduce 'update' action (ES updates, pt.1.1) #254);
- (make stages that might stumble over external sources mark output messages "for update":)
- (025_chicagoES (see DF/025: skip init (ES updates, pt.2.2a) #261);)
- (091_datasetsRucio (see DF/091: skip mode (ES updates, pt.2.2b) #262);)
- (095_datasetInfoAMI;)
- `data4es-start`:
- `--skip` option to stages (see pyDKB: processor `--skip` option (ES updates, pt.2.1). #260):
- `--ignore STAGE...` option to `data4es-start` (currently in this PR);
- master.