Commit 15bac20

feat(qa): cross check run list from trains and DSTs (#446)
1 parent af5da39 commit 15bac20

11 files changed

Lines changed: 147 additions & 24 deletions

bin/qtl

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,7 @@ usage() {
 physics Generate and analyze physics QA timelines (Step 2)
 error Scan for errors in Slurm logs (for Step 1)
 reheat Reproduce a data file, e.g., to rerun postprocessing
+xtrain Cross check run list from trains and DSTs
 
 OPTIONS: Each command has its own set of options; run a command with no
 additional options to see usage for that command.
@@ -41,6 +42,7 @@ case $cmd in
   ph*) exec $TIMELINESRC/bin/qtl-physics "$@" ;;
   er*) exec $TIMELINESRC/bin/qtl-error "$@" ;;
   re*) exec $TIMELINESRC/bin/qtl-reheat "$@" ;;
+  xt*) exec $TIMELINESRC/bin/qtl-xtrain "$@" ;;
   -v|--version)
     echo $(mvn -q help:evaluate -Dexpression=project.version -DforceStdout -f $TIMELINESRC/pom.xml || echo "UNKNOWN")
     exit 0
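
Reviewer note: the new `xt*)` arm follows the existing prefix-glob convention, so any subcommand that starts with `xt` reaches `qtl-xtrain`. The sketch below mimics that matching in Ruby with `File.fnmatch`. The `DISPATCH` table and the `resolve` helper are hypothetical names introduced for illustration, and only the four patterns visible in this hunk are listed, so it is not a full model of `bin/qtl`.

```ruby
# Hypothetical re-creation of the prefix dispatch shown in the hunk above.
# Only the patterns visible in this diff are listed.
DISPATCH = [
  ['ph*', 'qtl-physics'],
  ['er*', 'qtl-error'],
  ['re*', 'qtl-reheat'],
  ['xt*', 'qtl-xtrain']
].freeze

# Return the helper script name for a subcommand, or nil if no pattern matches.
# As in a shell `case`, the first matching pattern wins.
def resolve(cmd)
  _pattern, script = DISPATCH.find { |pat, _| File.fnmatch(pat, cmd) }
  script
end

puts resolve('xtrain')      # qtl-xtrain
puts resolve('xt')          # qtl-xtrain (any word starting with "xt" matches)
puts resolve('x').inspect   # nil (too short to match "xt*")
```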

bin/qtl-xtrain

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
+#!/usr/bin/env ruby
+
+require 'set'
+
+unless ARGV.length == 2
+  puts """
+  Verify that a directory of a train's skim files has the same list of
+  run numbers as a directory of DST-file run directories.
+
+  USAGE: qtl xtrain [TRAIN_DIR] [DST_DIR]
+
+  Both directories must be on /mss
+  """
+  exit 2
+end
+train_dir, dst_dir = ARGV
+
+# function to get a set of run numbers from one of the argument dirs
+def get_runnums(path, type)
+  runnums = Set.new
+  raise "#{type} dir `#{path}` is not on /mss" unless path.match? /^\/mss\//
+  raise "#{type} dir `#{path}` does not exist" unless Dir.exist? path
+
+  # get list of files/directories within
+  files = []
+  case type
+  when :train
+    files = Dir.glob File.join(path, '*.hipo')
+  when :DST
+    files = Dir.glob File.join(path, '*/')
+  else
+    raise 'bad type'
+  end
+  raise "no #{type} files found in #{type} dir `#{path}`" if files.empty?
+
+  # extract their run numbers
+  files.each do |file|
+    nums = File.basename(file).scan(/\d+/).map &:to_i
+    raise "failed to get run number from #{type} object `#{file}`" unless nums.length == 1
+    runnums << nums[0]
+  end
+  raise "failed to get run numbers from #{type} dir `#{path}`" if runnums.empty?
+  runnums
+end
+
+# get runnum lists
+train_runs = get_runnums train_dir, :train
+dst_runs = get_runnums dst_dir, :DST
+puts """----------------------------------------------------------------------------------
+train dir run list:
+#{train_runs}
+DST dir run list:
+#{dst_runs}
+----------------------------------------------------------------------------------"""
+
+# compare runnum sets
+only_in_trains = train_runs - dst_runs
+only_in_dsts = dst_runs - train_runs
+
+# return results
+code = 0
+unless only_in_trains.empty?
+  $stderr.puts "ERROR: there are runs with skim files, but no corresponding DST-file directories:"
+  $stderr.puts only_in_trains
+  code = 1
+end
+unless only_in_dsts.empty?
+  $stderr.puts "ERROR: there are runs with DST-file directories, but no corresponding skim files:"
+  $stderr.puts only_in_dsts
+  code = 1
+end
+puts "All good" if code == 0
+exit code
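
Reviewer note: the core of `qtl-xtrain` is "one run number per basename, then two set differences". The sketch below exercises that idea in a temporary directory instead of `/mss`, so it can be run anywhere. It is a minimal illustration, not the script itself: `run_numbers` and `cross_check` are hypothetical helper names, the `/mss` prefix check is left out, and the file and directory names are made up.

```ruby
require 'set'
require 'tmpdir'
require 'fileutils'

# Collect run numbers from the basenames of entries in `dir` matching `pattern`.
# Like the script, require exactly one number per basename.
def run_numbers(dir, pattern)
  Dir.glob(File.join(dir, pattern)).map do |entry|
    nums = File.basename(entry).scan(/\d+/).map(&:to_i)
    raise "cannot get a unique run number from `#{entry}`" unless nums.length == 1
    nums.first
  end.to_set
end

# Return [only_in_train, only_in_dst] as sorted arrays.
def cross_check(train_dir, dst_dir)
  train_runs = run_numbers(train_dir, '*.hipo')
  dst_runs   = run_numbers(dst_dir, '*/')
  [(train_runs - dst_runs).sort, (dst_runs - train_runs).sort]
end

Dir.mktmpdir do |tmp|
  train = File.join(tmp, 'train')
  dst   = File.join(tmp, 'dst')
  FileUtils.mkdir_p(train)
  # hypothetical skim-file and run-directory names, for illustration only
  %w[skim_005032.hipo skim_005036.hipo].each { |f| FileUtils.touch(File.join(train, f)) }
  %w[005032 005038].each { |d| FileUtils.mkdir_p(File.join(dst, d)) }

  only_train, only_dst = cross_check(train, dst)
  puts "only in train: #{only_train}"  # only in train: [5036]
  puts "only in DSTs:  #{only_dst}"    # only in DSTs:  [5038]
end
```

One consequence of the "exactly one number" rule is that a basename containing a second digit group (for example a skim index next to the run number) raises instead of being guessed at.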

doc/qa.md

Lines changed: 8 additions & 0 deletions
@@ -49,6 +49,14 @@ If you are performing a manual QA as part of a cross check, skip to the next sec
 - use the scripts in the [`prescaler/` directory](/qadb/prescaler)
 </details>
 
+<details>
+<summary>- [ ] cross check run list from trains and from DSTs</summary>
+
+- use `qtl xtrain` to make sure the list of DST runs is consistent with the list of runs from a train
+- sometimes there are missing train files
+- the script also checks for missing DST files (though that should be impossible)
+</details>
+
 <details>
 <summary>- [ ] make sure all data are cached</summary>
 
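
Reviewer note: if this checklist item is ever scripted, the exit status of `qtl xtrain` is what a wrapper would key on. From `bin/qtl-xtrain` in this commit, 0 means the lists agree, 2 means the wrong number of arguments, and 1 means a mismatch. Ruby also exits with 1 on an uncaught exception, so the script's `raise` paths share that code. The sketch below is an assumption-laden illustration: `interpret` and `check` are hypothetical helpers, and a `ruby -e 'exit 1'` stand-in replaces the real command, which needs `/mss` access.

```ruby
require 'rbconfig'

# Translate an exit status into a message, based on the exit codes visible in
# bin/qtl-xtrain: 0 = lists agree, 2 = wrong argument count, 1 = mismatch.
# Ruby also exits with 1 on an uncaught exception, so 1 covers both cases.
def interpret(status)
  case status
  when 0 then 'run lists agree'
  when 2 then 'usage error: expected TRAIN_DIR and DST_DIR'
  else 'mismatch or runtime error: read stderr before proceeding'
  end
end

# Run a command (program plus arguments) and describe how it exited.
def check(cmd)
  system(*cmd)
  interpret($?.exitstatus)
end

# Stand-in for `bin/qtl xtrain TRAIN_DIR DST_DIR` (assumption: not run here)
puts check([RbConfig.ruby, '-e', 'exit 1'])
```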

qadb/notes/rga_fa18.md

Lines changed: 11 additions & 5 deletions
@@ -7,15 +7,21 @@
 
 We will use the `nSidis` train.
 
-First make sure all skim files are cached:
+Cross check the train and DST run lists:
 ```bash
-qtl histogram -d rga_fa18_inbending_nSidis --check-cache --flatdir --focus-physics /cache/clas12/rg-a/production/recon/fall2018/torus-1/pass2/main/train/nSidis
-qtl histogram -d rga_fa18_outbending_nSidis --check-cache --flatdir --focus-physics /cache/clas12/rg-a/production/recon/fall2018/torus+1/pass2/train/nSidis
+bin/qtl xtrain /mss/clas12/rg-a/production/recon/fall2018/torus-1/pass2/main/train/nSidis /mss/clas12/rg-a/production/recon/fall2018/torus-1/pass2/main/dst/recon/
+bin/qtl xtrain /mss/clas12/rg-a/production/recon/fall2018/torus+1/pass2/train/nSidis /mss/clas12/rg-a/production/recon/fall2018/torus+1/pass2/dst/recon/
+```
+
+Make sure all skim files are cached:
+```bash
+bin/qtl histogram -d rga_fa18_inbending_nSidis --check-cache --flatdir --focus-physics /cache/clas12/rg-a/production/recon/fall2018/torus-1/pass2/main/train/nSidis
+bin/qtl histogram -d rga_fa18_outbending_nSidis --check-cache --flatdir --focus-physics /cache/clas12/rg-a/production/recon/fall2018/torus+1/pass2/train/nSidis
 ```
 then run monitoring
 ```bash
-qtl histogram -d rga_fa18_inbending_nSidis --submit --flatdir --focus-physics /cache/clas12/rg-a/production/recon/fall2018/torus-1/pass2/main/train/nSidis
-qtl histogram -d rga_fa18_outbending_nSidis --submit --flatdir --focus-physics /cache/clas12/rg-a/production/recon/fall2018/torus+1/pass2/train/nSidis
+bin/qtl histogram -d rga_fa18_inbending_nSidis --submit --flatdir --focus-physics /cache/clas12/rg-a/production/recon/fall2018/torus-1/pass2/main/train/nSidis
+bin/qtl histogram -d rga_fa18_outbending_nSidis --submit --flatdir --focus-physics /cache/clas12/rg-a/production/recon/fall2018/torus+1/pass2/train/nSidis
 ```
 
 ## Double check that we have all the runs

qadb/notes/rga_sp19.md

Lines changed: 7 additions & 3 deletions
@@ -26,22 +26,26 @@ start-workflow.sh rga-a-sp19*.json ## check that this is the correct JSON file
 
 For the prescaled train:
 ```bash
-qtl histogram -d rga_sp19_prescaled --submit --focus-physics PATH_TO_PRESCALED_TRAIN
+bin/qtl histogram -d rga_sp19_prescaled --submit --focus-physics PATH_TO_PRESCALED_TRAIN
 ```
 
 For the SIDIS train, `nSidis`, first make sure all skim files are cached:
 ```bash
-qtl histogram -d rga_sp19_nSidis --check-cache --flatdir --focus-physics /cache/clas12/rg-a/production/recon/spring2019/torus-1/pass2/dst/train/nSidis
+bin/qtl histogram -d rga_sp19_nSidis --check-cache --flatdir --focus-physics /cache/clas12/rg-a/production/recon/spring2019/torus-1/pass2/dst/train/nSidis
 ```
 If they are not:
 ```bash
 ls /mss/clas12/rg-a/production/recon/spring2019/torus-1/pass2/dst/train/nSidis/* | tee jlist.txt
 jcache get $(cat jlist.txt)
 # then wait for them to be cached
 ```
+Cross check the train and DST run lists:
+```bash
+bin/qtl xtrain /mss/clas12/rg-a/production/recon/spring2019/torus-1/pass2/dst/train/nSidis /mss/clas12/rg-a/production/recon/spring2019/torus-1/pass2/dst/recon
+```
 then run monitoring
 ```bash
-qtl histogram -d rga_sp19_nSidis --submit --flatdir --focus-physics /cache/clas12/rg-a/production/recon/spring2019/torus-1/pass2/dst/train/nSidis
+bin/qtl histogram -d rga_sp19_nSidis --submit --flatdir --focus-physics /cache/clas12/rg-a/production/recon/spring2019/torus-1/pass2/dst/train/nSidis
 ```
 
 ## Make timelines

qadb/notes/rgb_fa19.md

Lines changed: 9 additions & 3 deletions
@@ -8,15 +8,21 @@
 We will use the `sidisdvcs` train. There are inbending and outbending data, which we'll
 combine to one "dataset" in `qtl histogram`.
 
-First make sure all skim files are cached:
+Cross check the train and DST run lists:
 ```bash
-qtl histogram -d rgb_fa19_sidisdvcs --check-cache --flatdir --focus-physics \
+bin/qtl xtrain /mss/clas12/rg-b/production/recon/fall2019/torus+1/pass2/v1/dst/train/sidisdvcs /mss/clas12/rg-b/production/recon/fall2019/torus+1/pass2/v1/dst/recon
+bin/qtl xtrain /mss/clas12/rg-b/production/recon/fall2019/torus-1/pass2/v1/dst/train/sidisdvcs /mss/clas12/rg-b/production/recon/fall2019/torus-1/pass2/v1/dst/recon
+```
+
+Make sure all skim files are cached:
+```bash
+bin/qtl histogram -d rgb_fa19_sidisdvcs --check-cache --flatdir --focus-physics \
   /cache/clas12/rg-b/production/recon/fall2019/torus+1/pass2/v1/dst/train/sidisdvcs/ \
   /cache/clas12/rg-b/production/recon/fall2019/torus-1/pass2/v1/dst/train/sidisdvcs/
 ```
 then run monitoring
 ```bash
-qtl histogram -d rgb_fa19_sidisdvcs --submit --flatdir --focus-physics \
+bin/qtl histogram -d rgb_fa19_sidisdvcs --submit --flatdir --focus-physics \
   /cache/clas12/rg-b/production/recon/fall2019/torus+1/pass2/v1/dst/train/sidisdvcs/ \
   /cache/clas12/rg-b/production/recon/fall2019/torus-1/pass2/v1/dst/train/sidisdvcs/
 ```

qadb/notes/rgb_sp19.md

Lines changed: 8 additions & 3 deletions
@@ -7,13 +7,18 @@
 
 We will use the `sidisdvcs` train.
 
-First make sure all skim files are cached:
+Cross check the train and DST run lists:
 ```bash
-qtl histogram -d rgb_sp19_sidisdvcs --check-cache --flatdir --focus-physics /cache/clas12/rg-b/production/recon/spring2019/torus-1/pass2/v0/dst/train/sidisdvcs
+bin/qtl xtrain /mss/clas12/rg-b/production/recon/spring2019/torus-1/pass2/v0/dst/train/sidisdvcs /mss/clas12/rg-b/production/recon/spring2019/torus-1/pass2/v0/dst/recon/
+```
+
+Make sure all skim files are cached:
+```bash
+bin/qtl histogram -d rgb_sp19_sidisdvcs --check-cache --flatdir --focus-physics /cache/clas12/rg-b/production/recon/spring2019/torus-1/pass2/v0/dst/train/sidisdvcs
 ```
 then run monitoring
 ```bash
-qtl histogram -d rgb_sp19_sidisdvcs --submit --flatdir --focus-physics /cache/clas12/rg-b/production/recon/spring2019/torus-1/pass2/v0/dst/train/sidisdvcs
+bin/qtl histogram -d rgb_sp19_sidisdvcs --submit --flatdir --focus-physics /cache/clas12/rg-b/production/recon/spring2019/torus-1/pass2/v0/dst/train/sidisdvcs
 ```
 
 ## Double check that we have all the runs

qadb/notes/rgb_wi20.md

Lines changed: 8 additions & 3 deletions
@@ -7,13 +7,18 @@
 
 We will use the `sidisdvcs` train.
 
-First make sure all skim files are cached:
+Cross check the train and DST run lists:
 ```bash
-qtl histogram -d rgb_wi20_sidisdvcs --check-cache --flatdir --focus-physics /cache/clas12/rg-b/production/recon/spring2020/torus-1/pass2/v1/dst/train/sidisdvcs
+bin/qtl xtrain /mss/clas12/rg-b/production/recon/spring2020/torus-1/pass2/v1/dst/train/sidisdvcs /mss/clas12/rg-b/production/recon/spring2020/torus-1/pass2/v1/dst/recon
+```
+
+Make sure all skim files are cached:
+```bash
+bin/qtl histogram -d rgb_wi20_sidisdvcs --check-cache --flatdir --focus-physics /cache/clas12/rg-b/production/recon/spring2020/torus-1/pass2/v1/dst/train/sidisdvcs
 ```
 then run monitoring
 ```bash
-qtl histogram -d rgb_wi20_sidisdvcs --submit --flatdir --focus-physics /cache/clas12/rg-b/production/recon/spring2020/torus-1/pass2/v1/dst/train/sidisdvcs
+bin/qtl histogram -d rgb_wi20_sidisdvcs --submit --flatdir --focus-physics /cache/clas12/rg-b/production/recon/spring2020/torus-1/pass2/v1/dst/train/sidisdvcs
 ```
 
 ## Double check that we have all the runs

qadb/notes/rgc_fa22.md

Lines changed: 7 additions & 2 deletions
@@ -7,10 +7,15 @@
 
 We will use the `sidisdvcs` train.
 
+Cross check the train and DST run lists:
+```bash
+for d in $(ls -d /mss/clas12/rg-c/production/fall22/pass1/*/dst); do echo "===== $d ====="; bin/qtl xtrain $d/train/sidisdvcs $d/recon; done
+```
+
 We will combine the targets' data into a single dataset named `rgc_fa22_prescaled`.
 ```bash
-qtl histogram --check-cache -d rgc_fa22_sidisdvcs --flatdir --focus-physics $(ls -d /cache/clas12/rg-c/production/fall22/pass1/*/dst/train/sidisdvcs/)
-qtl histogram -d rgc_fa22_sidisdvcs --flatdir --focus-physics $(ls -d /cache/clas12/rg-c/production/fall22/pass1/*/dst/train/sidisdvcs/)
+bin/qtl histogram --check-cache -d rgc_fa22_sidisdvcs --flatdir --focus-physics $(ls -d /cache/clas12/rg-c/production/fall22/pass1/*/dst/train/sidisdvcs/)
+bin/qtl histogram -d rgc_fa22_sidisdvcs --flatdir --focus-physics $(ls -d /cache/clas12/rg-c/production/fall22/pass1/*/dst/train/sidisdvcs/)
 ```
 
 ## Double check that we have all the runs
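
Reviewer note: the RG-C notes call `qtl xtrain` once per target by looping over `*/dst` directories. The sketch below shows the same pairing in Ruby, using `Dir.glob` rather than parsing `ls` output. It is an illustration under assumptions: `xtrain_pairs` is a hypothetical helper, the target names are made up, and a temporary directory stands in for the `/mss` tree. The `train/sidisdvcs` and `recon` subdirectory names are taken from the loop in the notes.

```ruby
require 'tmpdir'
require 'fileutils'

# Build [train_dir, dst_dir] pairs for every `<target>/dst` directory under `root`.
def xtrain_pairs(root)
  Dir.glob(File.join(root, '*', 'dst')).sort.map do |d|
    [File.join(d, 'train', 'sidisdvcs'), File.join(d, 'recon')]
  end
end

Dir.mktmpdir do |root|
  # hypothetical target names, for illustration only
  %w[targetA targetB].each { |t| FileUtils.mkdir_p(File.join(root, t, 'dst')) }

  # Print the commands that the shell loop would run, one per target
  xtrain_pairs(root).each do |train, dst|
    puts "bin/qtl xtrain #{train} #{dst}"
  end
end
```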

qadb/notes/rgc_sp23.md

Lines changed: 7 additions & 2 deletions
@@ -7,10 +7,15 @@
 
 We will use the `sidisdvcs` train.
 
+Cross check the train and DST run lists:
+```bash
+for d in $(ls -d /mss/clas12/rg-c/production/spring23/pass1/*/dst); do echo "===== $d ====="; bin/qtl xtrain $d/train/sidisdvcs $d/recon; done
+```
+
 We will combine the targets' data into a single dataset named `rgc_sp23_prescaled`.
 ```bash
-qtl histogram --check-cache -d rgc_sp23_sidisdvcs --flatdir --focus-physics $(ls -d /cache/clas12/rg-c/production/spring23/pass1/*/dst/train/sidisdvcs/)
-qtl histogram -d rgc_sp23_sidisdvcs --flatdir --focus-physics $(ls -d /cache/clas12/rg-c/production/spring23/pass1/*/dst/train/sidisdvcs/)
+bin/qtl histogram --check-cache -d rgc_sp23_sidisdvcs --flatdir --focus-physics $(ls -d /cache/clas12/rg-c/production/spring23/pass1/*/dst/train/sidisdvcs/)
+bin/qtl histogram -d rgc_sp23_sidisdvcs --flatdir --focus-physics $(ls -d /cache/clas12/rg-c/production/spring23/pass1/*/dst/train/sidisdvcs/)
 ```
 
 ## Double check that we have all the runs
