Some genetic association tests include an unidentifiable nuisance parameter under the null hypothesis of no association. When the mode of inheritance (MOI) is not specified in a case-control design, the Cochran-Armitage (CA) trend test contains an unidentifiable nuisance parameter. The transmission disequilibrium test (TDT) in a family-based association study that includes the unaffected also contains an unidentifiable nuisance parameter. Hypothesis tests that include an unidentifiable nuisance parameter are typically performed by taking the supremum of the CA tests or TDTs over reasonable values of the parameter. The
A genome-wide association study is a powerful method for screening a high-dimensional genome data set and selecting candidate single nucleotide polymorphisms (SNPs) for genetic association. Genetic association studies are commonly conducted using a case-control design or a family-based design. Table 1 summarizes a case-control data set at a single biallelic SNP. The Cochran-Armitage (CA) linear trend test (Armitage, 1955; Cochran, 1954) is known to be a powerful test in a genetic case-control association study, but it requires the specification of a mode of inheritance (MOI). The MOI can be specified by selecting a weight vector (0,
The CA trend test statistic is a Rao’s score test statistic that can be derived from the logit model of
where
The family-based association study is known to be less powerful than the population-based association study, but the transmission disequilibrium test (TDT) (Spielman
where the nuisance parameter
Davies (1977) proposed supremum test statistics for the case in which a nuisance parameter is present only under the alternative hypothesis. In addition, he presented a general formula for an upper bound of the
Gaussian processes are widely used in many scientific fields (Choi and Lee, 2014; Lee and Park, 2017). A sine-cosine process is the simplest Gaussian wave process written as
where
In this work, we show that the supremum test of the CA linear trend test and the supremum test of the TDT can be written in terms of the supremum of a sine-cosine process, and we provide the exact asymptotic
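As an illustrative sketch only (not the authors' implementation), the following code assumes the usual form of a sine-cosine process, Z(θ) = Z1 cos θ + Z2 sin θ, with Z1 and Z2 independent standard normal variables. For a threshold c ≥ 0 and an interval of length at most π, a standard closed-form tail probability for the supremum of this process is 1 − Φ(c) + (θ_hi − θ_lo) exp(−c²/2) / (2π). The code compares that expression with a Monte Carlo estimate. The function names, the interval, and the threshold are hypothetical choices made for illustration, and the closed form shown here is not claimed to be identical to the formula derived in this paper.

```python
import math
import random


def sup_sine_cosine(z1, z2, theta_lo, theta_hi):
    """Supremum of z1*cos(t) + z2*sin(t) over t in [theta_lo, theta_hi]."""
    r = math.hypot(z1, z2)
    omega = math.atan2(z2, z1)  # z1*cos(t) + z2*sin(t) == r*cos(t - omega)
    # The peak value r is attained if omega (mod 2*pi) lies in the interval.
    shifted = (omega - theta_lo) % (2.0 * math.pi)
    if shifted <= theta_hi - theta_lo:
        return r
    # Otherwise there is no interior maximum, so an endpoint attains it.
    at_lo = z1 * math.cos(theta_lo) + z2 * math.sin(theta_lo)
    at_hi = z1 * math.cos(theta_hi) + z2 * math.sin(theta_hi)
    return max(at_lo, at_hi)


def tail_closed_form(c, theta_lo, theta_hi):
    """P(sup Z(theta) > c), assuming c >= 0 and interval length <= pi."""
    length = theta_hi - theta_lo
    if c < 0 or not 0.0 <= length <= math.pi:
        raise ValueError("requires c >= 0 and 0 <= interval length <= pi")
    upper_normal_tail = 0.5 * math.erfc(c / math.sqrt(2.0))
    return upper_normal_tail + length * math.exp(-0.5 * c * c) / (2.0 * math.pi)


def tail_monte_carlo(c, theta_lo, theta_hi, n_draws=200_000, seed=1):
    """Monte Carlo estimate of the same one-sided tail probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        if sup_sine_cosine(z1, z2, theta_lo, theta_hi) > c:
            hits += 1
    return hits / n_draws


if __name__ == "__main__":
    c, lo, hi = 2.0, 0.0, math.pi / 2.0  # hypothetical threshold and range
    print("closed form:", tail_closed_form(c, lo, hi))
    print("Monte Carlo:", tail_monte_carlo(c, lo, hi))
```

The sketch is one-sided; a two-sided test would take the supremum of |Z(θ)| instead, which changes both the simulation and the closed form.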
Suppose that the probability function
The score function is written as
Let
where
and we define the locally most powerful test given
Since
We define two independent standard normal random variables
Using these standard normal random variables, we can write
We define an angle
Here cos
Let
For a given
The proof of Theorem 1 is given in
In this section, we present two examples. The first is a case-control pharmacogenetics data set of anti-epileptic drug responses. The second is the case-parent trio data set of attention deficit hyperactivity disorder (ADHD) illustrated in Lunetta
Two hundred and eighty-eight patients with epilepsy were recruited from multiple epilepsy clinics in Korea, and their whole exomes were genotyped by next-generation sequencing. Study participants were eligible if they had drug-resistant (case group) or drug-responsive (control group) epilepsy according to the definitions and criteria explained in Kim
We performed the CA trend tests for recessive, additive, and dominant genetic models and the supremum test of the CA test for undetermined MOI but 0 ≤
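The fixed-MOI tests and the supremum over the weight can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the common convention that the weight vector is (0, x, 1), with x = 0, 0.5, and 1 for the recessive, additive, and dominant models, and it uses the standard closed form of the CA trend statistic. The genotype counts and the grid search over x are hypothetical stand-ins.

```python
import math


def ca_trend_z(cases, controls, weights):
    """Cochran-Armitage trend Z statistic from per-genotype counts.

    cases[i] and controls[i] are the counts for genotype i and
    weights[i] is the score assigned to that genotype.
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    n_cases = sum(cases)
    n_controls = n - n_cases
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * m for w, m in zip(weights, totals))
    sum_w2n = sum(w * w * m for w, m in zip(weights, totals))
    numerator = n * sum_wr - n_cases * sum_wn
    variance = n_cases * n_controls * (n * sum_w2n - sum_wn ** 2) / n
    # variance is zero when the weighted scores do not vary in the sample,
    # in which case the statistic is undefined (ZeroDivisionError here).
    return numerator / math.sqrt(variance)


def normal_two_sided_p(z):
    """Two-sided normal p-value; valid for a single pre-specified weight."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def sup_ca_trend(cases, controls, n_grid=1001):
    """Grid approximation to the supremum of |Z(x)| over 0 <= x <= 1."""
    best_x, best_abs_z = 0.0, -1.0
    for k in range(n_grid):
        x = k / (n_grid - 1)
        abs_z = abs(ca_trend_z(cases, controls, (0.0, x, 1.0)))
        if abs_z > best_abs_z:
            best_x, best_abs_z = x, abs_z
    return best_x, best_abs_z


if __name__ == "__main__":
    # Hypothetical counts for genotypes with 0, 1, and 2 risk alleles.
    cases = [30, 50, 20]
    controls = [50, 40, 10]
    for name, x in (("recessive", 0.0), ("additive", 0.5), ("dominant", 1.0)):
        z = ca_trend_z(cases, controls, (0.0, x, 1.0))
        print(name, z, normal_two_sided_p(z))
    print("supremum (x, |Z|):", sup_ca_trend(cases, controls))
```

The normal p-value applies only to a single pre-specified weight. The maximized statistic has to be referred to the null distribution of the supremum, because using the normal tail for it would overstate significance.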
Lunetta
In this paper, we derived simple formulas to calculate the
Table 5 shows the simulation results. We did not observe any pattern in which one method provides smaller
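For readers who want to reproduce a comparison of this kind, a permutation p-value for the supremum statistic can be sketched as below. This is a hedged illustration under our own assumptions, not the procedure used for the tables. It assumes genotypes coded 0/1/2, phenotypes coded 0/1 (control/case), weights (0, x, 1) searched on a coarse grid, and the (exceedances + 1) / (permutations + 1) convention, which never returns exactly zero. All names and the toy data are hypothetical.

```python
import math
import random


def sup_abs_ca(genotypes, phenotypes, n_grid=21):
    """Grid supremum of |CA trend Z| over weights (0, x, 1), 0 <= x <= 1."""
    cases = [0, 0, 0]
    totals = [0, 0, 0]
    for g, y in zip(genotypes, phenotypes):
        totals[g] += 1
        cases[g] += y
    n = len(genotypes)
    n_cases = sum(cases)
    best = 0.0
    for k in range(n_grid):
        w = (0.0, k / (n_grid - 1), 1.0)
        sum_wr = sum(wi * r for wi, r in zip(w, cases))
        sum_wn = sum(wi * m for wi, m in zip(w, totals))
        sum_w2n = sum(wi * wi * m for wi, m in zip(w, totals))
        variance = n_cases * (n - n_cases) * (n * sum_w2n - sum_wn ** 2) / n
        if variance > 0.0:  # skip weights for which the statistic is undefined
            z = (n * sum_wr - n_cases * sum_wn) / math.sqrt(variance)
            best = max(best, abs(z))
    return best


def permutation_p_value(genotypes, phenotypes, n_perm=2000, seed=0):
    """Permutation p-value obtained by shuffling the case-control labels."""
    rng = random.Random(seed)
    observed = sup_abs_ca(genotypes, phenotypes)
    shuffled = list(phenotypes)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if sup_abs_ca(genotypes, shuffled) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical toy data: 100 subjects with a strong genotype effect.
    genotypes = [0] * 40 + [1] * 40 + [2] * 20
    phenotypes = [0] * 40 + [0] * 20 + [1] * 20 + [1] * 20
    print(permutation_p_value(genotypes, phenotypes))
```

The smallest p-value a permutation test can resolve is on the order of one over the number of permutations, so p-values near genome-wide significance thresholds require very large permutation counts. That cost is the practical motivation for a closed-form asymptotic alternative.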
We also provided the unified sine-cosine process expression of the supremum tests for the CA linear trend without specifying the MOI and the TDT including the unaffected. This work focused on the case in which there is a one-dimensional unidentifiable nuisance parameter in a linear form. Davies (1987, 2002) considered a chi-square process and an
This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute, funded by the Ministry of Health & Welfare, Republic of Korea (Grant No. HI15C1559).
Data structure in a case-control association study
| | | | | Total |
---|---|---|---|---|
Control | | | | |
Case | | | | |
Total | | | | |
Data structure in a family-based association study
| | Transmitted | Not transmitted | Total |
---|---|---|---|
Unaffected | | | |
Affected | | | |
The
SNP | 0 ≤ | | | | | |
---|---|---|---|---|---|---|
| | Exact | 1,000,000 | 10,000,000 | | | |
rs16964316 | 5.42 × 10^{−6} | 1.00 × 10^{−6} | 2.60 × 10^{−6} | 4.68 × 10^{−6} | 2.17 × 10^{−6} | 9.55 × 10^{−5} |
rs17671352 | 1.34 × 10^{−5} | 7.00 × 10^{−6} | 8.70 × 10^{−6} | 1.42 × 10^{−1} | 1.66 × 10^{−4} | 2.02 × 10^{−6} |
rs16909651 | 6.09 × 10^{−5} | 5.60 × 10^{−5} | 5.92 × 10^{−5} | 3.14 × 10^{−5} | 2.14 × 10^{−5} | 1.61 × 10^{−3} |
rs12417255 | 6.33 × 10^{−5} | 5.50 × 10^{−5} | 6.24 × 10^{−5} | 2.75 × 10^{−3} | 1.96 × 10^{−5} | 2.22 × 10^{−5} |
rs12041477 | 8.84 × 10^{−5} | 7.41 × 10^{−5} | 7.57 × 10^{−5} | 3.25 × 10^{−2} | 7.54 × 10^{−5} | 1.61 × 10^{−5} |
The numbers in the second column are obtained from Equation (2.14). The numbers in the third and fourth columns of 0 ≤
Two-tailed
Allele | | | | | | Asymptotic | Permutation |
---|---|---|---|---|---|---|---|
DAT-480 | 17 | 10 | 6 | 13 | (0.000, 1.000) | 0.09099 | 0.083 |
| | | | | | (0.050, 0.100) | 0.14124 | 0.146 |
| | | | | | (0.114, 0.161) | 0.11762 | 0.146 |
DRD4-7 | 15 | 6 | 5 | 10 | (0.000, 1.000) | 0.05017 | 0.058 |
| | | | | | (0.050, 0.100) | 0.03970 | 0.040 |
| | | | | | (0.114, 0.161) | 0.03360 | 0.034 |
The results of the simulation study
SNP used for simulated data | 0 ≤ | | |
---|---|---|---|
| | Exact asymptotic | 1,000,000 | 10,000,000 |
rs16964316 | 4.39 × 10^{−8} | 0.00 | 1.00 × 10^{−7} |
rs17671352 | 3.12 × 10^{−6} | 2.00 × 10^{−6} | 2.50 × 10^{−6} |
rs16909651 | 5.05 × 10^{−7} | 0.00 | 4.00 × 10^{−7} |
rs12417255 | 9.66 × 10^{−7} | 1.00 × 10^{−6} | 9.00 × 10^{−7} |
rs12041477 | 2.23 × 10^{−6} | 2.00 × 10^{−6} | 2.70 × 10^{−6} |
Each data set was generated from three copies of the genotypes of one of the SNPs in Table 3. The phenotypes were generated by combining two copies of the phenotypes of the real data and a randomized copy of the phenotypes.
SNP = single nucleotide polymorphism.