DOCUMENTATION OF qpWave and qpAdm:
See also .examples/pdoc.pdf
qpWave:
qpWave requires that the input data is available in a Reichlab format such as EIGENSTRAT.
To convert to the appropriate format, one can use CONVERTF. See
README.CONVERTF for documentation of programs for converting file formats.
Executable and source code:
------------------------------------------------------------------------------
For information about installing the program, see README.ADMIXTOOLS.
After installing the programs, the executable for qpWave should be located in the bin directory.
To run qpWave, type the following on a linux machine.
$DIR/bin/qpWave -p parfile >logfile
$DIR: Path to the bin directory.
logfile: Name of the logfile. The logfile contains the output of the run.
parfile: Name of parameter file
qpWave gives evidence of the number of admixture flows between the left and right populations,
and should usually be run as a precursor to qpAdm (see below).
DESCRIPTION OF EACH PARAMETER in parfile:
genotypename: input genotype file (in eigenstrat or packedancestrymap format)
snpname: input snp file (in eigenstrat format)
indivname: input indiv file (in eigenstrat format)
popleft: left population list (1 per line)
popright: right population list (1 per line)
details: YES
allsnps: YES    ## default NO
When set to YES, the maximum number of snps available for each f4 statistic is used. By default,
only the intersection of snps across all Left and Right populations is used.
*** NEW ***
If allsnps: YES is set, qpfstats is called to calculate the relevant f-statistics. If you want
compatibility with older versions (VERSION < 1400), code:
allsnps: YES
inbreed: YES
oldallsnpsmode: NO
This is deprecated as the algorithm is inferior.
*** It is strongly recommended that inbreed be set to YES if possible.
oracle: NO
## see description below ; oracle: YES is default and recommended
The same feature is now also in qpAdm
chrom: Only use snps in the specified chromosome.
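The parameters above can be put together as in the following minimal sketch. All file names, the scratch directory and the population labels (PopA, Out1, ...) are placeholders invented for illustration, not files shipped with the package. The script only writes the population lists and the parfile; qpWave itself is run only if $DIR/bin/qpWave and the genotype file are actually present.

```shell
#!/bin/sh
# Sketch only: every path and population label below is a placeholder.
WORK="${TMPDIR:-/tmp}/qpwave_example"
mkdir -p "$WORK"

# Population lists: one population label per line.
printf '%s\n' PopA PopB PopC > "$WORK/left.txt"
printf '%s\n' Out1 Out2 Out3 Out4 > "$WORK/right.txt"

# Parameter file using the keys described in this README.
cat > "$WORK/parfile" <<EOF
genotypename: $WORK/data.geno
snpname:      $WORK/data.snp
indivname:    $WORK/data.ind
popleft:      $WORK/left.txt
popright:     $WORK/right.txt
details:      YES
allsnps:      YES
inbreed:      YES
EOF

# Run qpWave only when the binary and the input data really exist.
if [ -n "${DIR:-}" ] && [ -x "$DIR/bin/qpWave" ] && [ -f "$WORK/data.geno" ]; then
    "$DIR/bin/qpWave" -p "$WORK/parfile" > "$WORK/logfile"
else
    echo "qpWave or input data not found; wrote $WORK/parfile only"
fi
```

Keeping the population lists in separate files makes it easy to reuse the same right list across several runs.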
DESCRIPTION OF OUTPUT FILE:
The program will write all the output to stdout. The output file prints the parfile entered by the user,
number of snps and individuals, jackknife block size, number of blocks for jackknife and the results.
See the example in examples/qpWave.log
qpAdm:
To run qpAdm, type the following on a linux machine.
$DIR/bin/qpAdm -p parfile >logfile
$DIR: Path to the bin directory.
logfile: Name of the logfile. The logfile contains the output of the run.
parfile: Name of parameter file
qpAdm takes a parameter file in exactly the same format as qpWave.
The first population in popleft is the target population, and the
main purpose of the program is to provide admixture weights, writing the
target as a linear combination of the other populations in popleft.
details: YES/NO
is a new option (not available in qpWave)
.../examples/qpAdm.log has annotations ### manually added to explain output
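Because the target must be the first entry of popleft, it can help to build that list explicitly rather than by hand. The following is a minimal sketch under assumed names: TargetPop, SourceA, SourceB, the outgroup labels and all paths are placeholders. qpAdm is invoked only if the binary and the genotype file exist.

```shell
#!/bin/sh
# Sketch only: population labels and paths are placeholders.
TARGET="TargetPop"
SOURCES="SourceA SourceB"
WORK="${TMPDIR:-/tmp}/qpadm_example"
mkdir -p "$WORK"

# popleft: the target first, then the candidate source populations.
{
    echo "$TARGET"
    for s in $SOURCES; do
        echo "$s"
    done
} > "$WORK/left.txt"

# popright: the reference (right) populations, one per line.
printf '%s\n' Out1 Out2 Out3 Out4 > "$WORK/right.txt"

cat > "$WORK/parfile" <<EOF
genotypename: $WORK/data.geno
snpname:      $WORK/data.snp
indivname:    $WORK/data.ind
popleft:      $WORK/left.txt
popright:     $WORK/right.txt
details:      YES
EOF

# Run qpAdm only when the binary and the input data really exist.
if [ -n "${DIR:-}" ] && [ -x "$DIR/bin/qpAdm" ] && [ -f "$WORK/data.geno" ]; then
    "$DIR/bin/qpAdm" -p "$WORK/parfile" > "$WORK/logfile"
else
    echo "qpAdm or input data not found; wrote $WORK/parfile only"
fi
```

With this layout the admixture weights reported by qpAdm refer to SourceA and SourceB, in the order they appear after the target.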
CAVEATS
1) It is important to realize that the answers are invalid if there has
been post-admixture gene flow between left and right populations.
2) Weights may be negative. If significantly so, the admixture model is
probably wrong.
3) We recommend keeping popright small. If it is large, the covariance matrix
of f4-statistics is likely to be poorly estimated. I then don't trust
the computed p-values, although the admixture weights usually seem to be
reasonable.
4) *** NEW ***
The new version is more robust in boundary cases where
the mixing coefficients are not well determined. In
well-conditioned cases the difference from the old code should
be very small. There's a magic constant (diagplus) that gets added
along the diagonal of various matrices. To get the old answers with the new
program, code:
diagplus: 0
*** new feature (for both qpAdm and qpWave)***
Input can now be an fstats file -- see README.qpfstats
For qpAdm, there is a new parameter numboot: see README.qpfstats
*** (fairly) NEW ***
If you run qpWave or qpAdm with allsnps: YES but no fstats file, then the programs
call qpfstats with a system call. If this call fails it can be tricky to debug.
Options to make the process more transparent:
fstatslog: <your log file>
keeptmp: YES
## (Various temporary files are created in $STMP (see below) or /tmp. By default these are deleted.)
fstatsoutname: <myfstats.txt>
In some cases it may be useful to keep this file for future runs.
It is not always practical, but for some projects involving fewer than (say) 40 populations it will be
best to run qpfstats (with allsnps: YES) and then run many qpWave/qpAdm runs off the fstats file.
qpfstats may be slow, but the follow-on runs will be very fast.
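The options above can be sketched as two kinds of parameter file: a first run that logs the internal qpfstats call and keeps its output, and follow-on runs that read the saved f-statistics. All paths, list names and model names are placeholders. The key fstatsname: used for the follow-on runs is an assumption not stated in this README; check README.qpfstats for the authoritative input parameter. The follow-on models can presumably only use populations that are present in the saved file.

```shell
#!/bin/sh
# Sketch only: paths, list names and model names are placeholders.
WORK="${TMPDIR:-/tmp}/qpadm_fstats_example"
mkdir -p "$WORK"

# First run: allsnps: YES with no fstats input, so qpfstats is called
# internally. Log that call, keep temporary files, save the f-statistics.
cat > "$WORK/parfile.first" <<EOF
genotypename:  data.geno
snpname:       data.snp
indivname:     data.ind
popleft:       left.txt
popright:      right.txt
allsnps:       YES
fstatslog:     $WORK/qpfstats.log
keeptmp:       YES
fstatsoutname: $WORK/myfstats.txt
EOF

# Follow-on runs: one parameter file per model, all reading the saved file.
# ASSUMPTION: "fstatsname:" is the input key; confirm in README.qpfstats.
for model in model1 model2 model3; do
    cat > "$WORK/parfile.$model" <<EOF
fstatsname: $WORK/myfstats.txt
popleft:    left_$model.txt
popright:   right.txt
EOF
done

# Each parameter file is then passed to qpWave or qpAdm with -p as usual.
ls "$WORK"
```

If the first run fails, the file named by fstatslog is the natural place to start looking.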
By default /tmp is used for temporary files. To get more control set an environment variable STMP.
For instance, in bash:
export STMP=~/mytrashdir
The directory should exist when qpWave or qpAdm is run.
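Since the directory must already exist, a slightly more defensive variant creates it first and checks that it is writable. This is a sketch only; the location under ${TMPDIR:-/tmp} and the name qp_scratch are placeholders, and any writable directory will do.

```shell
#!/bin/sh
# Sketch only: the scratch location is a placeholder.
STMP="${TMPDIR:-/tmp}/qp_scratch"

# Create the directory if needed, then export it for qpWave/qpAdm.
mkdir -p "$STMP"
export STMP

# Stop early if the directory is missing or not writable.
if [ ! -d "$STMP" ] || [ ! -w "$STMP" ]; then
    echo "STMP is not usable: $STMP" >&2
    exit 1
fi

echo "temporary files will be written to $STMP"
```

Run this (or source it) in the same shell session that launches qpWave or qpAdm, so that the exported variable is visible to the programs.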
*** NEW ***
qpWave and qpAdm compute large covariance matrices. In some cases
the eigenvalues are poorly determined and some adjustment is needed.
We now implement an algorithm "Oracle Assisted Shrinkage"
[Wei and Zhao, IEEE Trans. on Sig. Processing (2023); Chen, Wiesel et al. (2009)]
In cases where we have good coverage and are processing the whole genome the effect
is small, but it is more noticeable in cases where we have rather little data.
For compatibility with older releases you can code
oracle: NO
If oracle mode is used, the parameter diagplus is ignored.
Nick Patterson
<nickp@broadinstitute.org>
------------------------------------------------------------------------------