2.1. Data structure
There are n subjects across G groups, each measured at K genes (positions), and each subject has a gene expression level at each gene. For the k^{th} gene we consider the linear model Y_{k} = Xβ_{k} + E_{k}, where Y_{k} = (Y_{1k}, …, Y_{nk})^{t} is the vector of gene expression levels collected from the n individuals across the G groups at the k^{th} gene and A^{t} denotes the transpose of a matrix A. The n × G design matrix is
$$\mathbf{X}=\left[\begin{array}{cccccc}1& 0& 0& 0& 0& \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1& 0& 0& 0& 0& \cdots \\ 1& 1& \cdots & 0& 0& \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1& 1& \cdots & 0& 0& \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1& 0& 0& 0& \cdots & 1\\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1& 0& 0& 0& \cdots & 1\end{array}\right].$$The G × 1 regression parameter vector is β_{k} = (μ_{1k}, δ_{2k}, δ_{3k}, …, δ_{Gk})^{t}, where μ_{1k} denotes the average gene expression level at the k^{th} gene in the first group and δ_{jk} denotes the difference between μ_{1k} and the average expression level at the k^{th} gene in the j^{th} group, j = 2, …, G. The n × 1 error vector is E_{k} = (ε_{1k}, ε_{2k}, …, ε_{nk})^{t}. We estimate the parameter of interest β_{k} for each k = 1, …, K with the following M-estimator.
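For concreteness, the design matrix above can be built as follows. This is a minimal sketch assuming a balanced layout with n_g subjects per group; the function name and group sizes are our own illustrative choices.

```python
import numpy as np

def design_matrix(n_g, G):
    """n x G design matrix for G groups with n_g subjects each.

    Column 1 is the intercept, so the coefficient vector is
    beta_k = (mu_1k, delta_2k, ..., delta_Gk): column j (j >= 2)
    is the indicator of membership in group j.
    """
    n = n_g * G
    X = np.zeros((n, G))
    X[:, 0] = 1.0                                # intercept column
    for j in range(1, G):
        X[j * n_g:(j + 1) * n_g, j] = 1.0        # indicator of group j+1
    return X

X = design_matrix(n_g=3, G=4)                    # 12 subjects, 4 groups
```

With this coding, Xβ_{k} gives μ_{1k} for subjects in the first group and μ_{1k} + δ_{jk} for subjects in the j^{th} group.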
2.2. M-estimator
Let ρ: ℜ_{1} → ℜ_{1} be a measurable function. We define an M-estimator M_{n} as a solution of the minimization, with respect to t ∈ ℜ_{G}, of
$$\sum _{i=1}^{n}\rho \hspace{0.17em}\left({Y}_{i}-{\mathbf{x}}_{i}^{t}\mathbf{t}\right),$$where Y_{i} is the i^{th} observation of Y_{k} and x_{i} = (x_{i1}, …, x_{iG})^{t}, i = 1, …, n, denotes the i^{th} row of X. M_{n} is regression equivariant if M_{n}(Y + Xb) = M_{n}(Y) + b for every b ∈ ℜ_{G}, and scale equivariant if M_{n}(cY) = cM_{n}(Y) for every c > 0. In general, the second condition is not satisfied; fortunately, studentization makes M_{n} both regression and scale equivariant. We therefore consider a studentized M-estimator M_{n} of β_{k} as a solution of the minimization of
$$\sum _{i=1}^{n}\rho \hspace{0.17em}\left(\frac{{Y}_{i}-{\mathbf{x}}_{i}^{t}\mathbf{t}}{{S}_{n}}\right)$$with respect to t (a G × 1 vector), where S_{n} = S_{n}(Y) is a scale statistic. The linear model above is the classical ANOVA model, but the distribution of Y_{1k}, …, Y_{nk} need not be Gaussian. Researchers may be interested in G groups that are stochastically ordered. For example, this applies to dose-response gene expression microarray data, introduced by Peddada et al. (2003), in which gene expression increases stochastically over time or dose. We define the null hypothesis H_{0k} to state that the G groups at the k^{th} gene are statistically homogeneous, and the alternative hypothesis H_{1k} to state that the G groups at the k^{th} gene are ordered with increasing dominance. H_{0k} and H_{1k} are formulated as follows.
$${H}_{0k}:{\delta}_{2k}={\delta}_{3k}=\cdots ={\delta}_{Gk}=0\mathrm{\hspace{0.17em}\u200a\u200a}\mathrm{\hspace{0.17em}\u200a\u200a}\mathrm{\hspace{0.17em}\u200a\u200a}\text{vs\hspace{0.28em}}\mathrm{\hspace{0.17em}\u200a\u200a}\mathrm{\hspace{0.17em}\u200a\u200a}{H}_{1k}:0\le {\delta}_{2k}\le {\delta}_{3k}\le \cdots \le {\delta}_{Gk}.$$These can be restated as the following hypotheses.
$${H}_{0k}:{\mathit{\theta}}_{k}=\mathbf{A}{\mathit{\beta}}_{k}=\mathbf{0}\quad \text{vs}\quad {H}_{1k}:{\mathit{\theta}}_{k}=\mathbf{A}{\mathit{\beta}}_{k}\ge \mathbf{0},$$where the (G − 1) × G matrix
$$\mathbf{A}=\left(\begin{array}{cccccc}0& 1& 0& 0& 0& \cdots \\ 0& -1& 1& 0& 0& \cdots \\ 0& 0& -1& 1& 0& \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 0& 0& 0& \cdots & -1& 1\end{array}\right).$$To test the null hypothesis, we consider alternatives under which the vector θ_{k} belongs to the positive orthant ℜ^{+(}^{G}^{−1)}. In the univariate case an optimal UMP test exists, but in the multivariate case it does not; for instance, Hotelling's T^{2} produces a very large confidence set and involves a loss of efficiency. It is therefore of interest to carry out statistical inference under the restricted setup. We can apply the union-intersection principle (UIP; Roy, 1953) to such inference under the positive orthant multivariate alternative. We denote the first derivative of the ρ function by ψ. M_{n} should be a median-unbiased estimator of β_{k}; symmetry of F and skew symmetry of ψ (i.e., ψ(−t) = −ψ(t)) are necessary conditions for the median-unbiasedness of M_{n}. The Huber loss function is a good candidate for ρ, and the resulting minimization makes the estimator regression and scale equivariant. We define the Huber function as follows.
$$\rho (t)=\{\begin{array}{ll}c\mid t\mid -\frac{1}{2}\times {c}^{2},\hfill & \text{if\hspace{0.17em}}\mid t\mid \hspace{0.17em}>c,\hfill \\ \frac{1}{2}\times {t}^{2},\hfill & \text{if\hspace{0.17em}}\mid t\mid \hspace{0.17em}\le c.\hfill \end{array}$$The derivative of the Huber function ψ is defined as follows.
$$\psi (t)=\{\begin{array}{ll}c\times \text{sign}(t),\hfill & \text{if\hspace{0.17em}}\mid t\mid \hspace{0.17em}>c,\hfill \\ t,\hfill & \text{if\hspace{0.17em}}\mid t\mid \hspace{0.17em}\le c.\hfill \end{array}$$The function ψ is decomposed into the sum
$$\psi ={\psi}_{a}+{\psi}_{b}+{\psi}_{c}.$$Here ψ_{a} is an absolutely continuous function with an absolutely continuous derivative, ψ_{c} is a continuous and piecewise linear function, and ψ_{b} is an increasing step function. For the Huber loss function, ψ coincides with ψ_{c}, so ψ_{a} = ψ_{b} = 0. This function satisfies the following conditions (Jurečková and Sen, 1996).
M1: S_{n} should be regression invariant and scale equivariant, with S_{n} > 0 a.s. and n^{1/2}(S_{n} − S) = O_{p}(1).
M2: H(t) = ∫ ρ((z − t)/S)dF(z) has a unique minimum at t = 0.
M3: For some δ > 0 and η > 1,
$${\int}_{-\infty}^{\infty}{\left[\mid z\mid \underset{\mid u\mid \le \delta ,\hspace{0.17em}\mid v\mid \le \delta}{\text{sup}}\left|{\psi}_{a}^{\prime \prime}\left(\frac{{e}^{-v}(z+u)}{S}\right)\right|\right]}^{\eta}dF(z)<\infty $$
and
$${\int}_{-\infty}^{\infty}{\left[{\mid z\mid}^{2}\underset{\mid u\mid \le \delta ,\hspace{0.17em}\mid v\mid \le \delta}{\text{sup}}\left|{\psi}_{a}^{\prime \prime}\left(\frac{{e}^{-v}(z+u)}{S}\right)\right|\right]}^{\eta}dF(z)<\infty ,$$
where ${\psi}_{a}^{\prime}(z)=(d/dz){\psi}_{a}(z)$ and ${\psi}_{a}^{\prime \prime}(z)=({d}^{2}/d{z}^{2}){\psi}_{a}(z)$.
M4: ψ_{c}(z) should be a continuous and piecewise linear function with knots at μ_{1}, …, μ_{r}. Hence the derivative ${\psi}_{c}^{\prime}(z)$ is a step function
$${\psi}_{c}^{\prime}(z)={\alpha}_{\nu},\mathrm{\hspace{0.17em}\u200a\u200a}\mathrm{\hspace{0.17em}\u200a\u200a}\mathrm{\hspace{0.17em}\u200a\u200a}{\mu}_{\nu}<z<{\mu}_{\nu +1},\mathrm{\hspace{0.17em}\u200a\u200a}\nu =0,1,\dots ,r,$$
where α_{0}, α_{1}, …, α_{r} ∈ ℜ_{1}, α_{0} = α_{r} = 0, and −∞ = μ_{0} < μ_{1} < ··· < μ_{r} < μ_{r+1} = ∞. f(z) is assumed to be bounded in a neighborhood of Sμ_{1}, …, Sμ_{r}.
M5: ψ_{b}(z) = λ_{ν} for q_{ν} < z ≤ q_{ν+1}, ν = 0, 1, …, m, where −∞ = q_{0} < q_{1} < ··· < q_{m+1} = ∞ and −∞ < λ_{0} < λ_{1} < ··· < λ_{m} < ∞. f(z) and f′(z) are assumed to be bounded in neighborhoods of Sq_{1}, …, Sq_{m}. M_{n} admits an asymptotic representation in terms of the following functionals:
$$\begin{array}{l}{\gamma}_{1}={S}^{-1}{\int}_{-\infty}^{\infty}\left({\psi}_{a}^{\prime}\left(\frac{z}{S}\right)+{\psi}_{c}^{\prime}\left(\frac{z}{S}\right)\right)dF(z),\\ {\gamma}_{2}={S}^{-1}{\int}_{-\infty}^{\infty}z\left({\psi}_{a}^{\prime}\left(\frac{z}{S}\right)+{\psi}_{c}^{\prime}\left(\frac{z}{S}\right)\right)dF(z).\end{array}$$
Moreover, the following conditions are satisfied.
$$\begin{array}{l}\text{X}1.\hspace{0.17em}{x}_{i1}=1,\mathrm{\hspace{0.17em}\u200a\u200a}i=1,\dots ,n,\\ \text{X}2.\hspace{0.17em}{n}^{-1}\sum _{i=1}^{n}{\Vert {\mathbf{x}}_{i}\Vert}^{4}={O}_{p}(1),\\ \text{X}3.\hspace{0.17em}\underset{n\to \infty}{\text{lim}}\hspace{0.17em}{\mathbf{Q}}_{n}=\mathbf{Q},\end{array}$$
where Q_{n} = n^{−1}X^{t}X and Q is a positive definite G × G matrix.
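The design conditions X1–X3 are easy to check numerically. A minimal sketch for a balanced one-way layout follows; the group size and the number of groups are arbitrary illustrative choices.

```python
import numpy as np

# Balanced one-way layout: G = 3 groups, n_g = 5 subjects per group (assumed values).
n_g, G = 5, 3
n = n_g * G
X = np.zeros((n, G))
X[:, 0] = 1.0                                    # intercept column
for j in range(1, G):
    X[j * n_g:(j + 1) * n_g, j] = 1.0            # group indicators

x1_holds = bool(np.all(X[:, 0] == 1.0))                          # X1: x_i1 = 1 for all i
fourth_moment = float(np.mean(np.linalg.norm(X, axis=1) ** 4))   # X2: bounded fourth moments
Q_n = X.T @ X / n                                                # X3: Q_n -> Q
pos_def = bool(np.all(np.linalg.eigvalsh(Q_n) > 0.0))            # Q_n positive definite
```

For this layout Q_n is positive definite for any fixed group proportions, so the limit Q in X3 inherits positive definiteness.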
Under these conditions, M_{n} is a solution of the equation
$$\sum _{i=1}^{n}{\mathbf{x}}_{i}\psi \hspace{0.17em}\left(\frac{{Y}_{i}-{\mathbf{x}}_{i}^{t}\mathbf{t}}{{S}_{n}}\right)=\mathbf{0}.$$We compute S_{n} as follows, so that it is regression invariant and scale equivariant, using the regression scores defined below.
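As an illustration, the estimating equation above can be solved by iteratively reweighted least squares. The sketch below uses the Huber ψ and, as a simplifying assumption, a MAD-based scale in place of the regression-score statistic S_{n}; the simulated data and tuning constants are our own choices.

```python
import numpy as np

def huber_psi(t, c=1.345):
    """Derivative of the Huber rho: psi(t) = t if |t| <= c, c*sign(t) otherwise."""
    return np.clip(t, -c, c)

def m_estimate(X, Y, S_n, c=1.345, n_iter=200, tol=1e-12):
    """Solve sum_i x_i psi((Y_i - x_i^t t) / S_n) = 0 by iteratively
    reweighted least squares, holding the scale statistic S_n fixed."""
    t = np.linalg.lstsq(X, Y, rcond=None)[0]             # least-squares start
    for _ in range(n_iter):
        r = (Y - X @ t) / S_n                            # studentized residuals
        w = np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))  # Huber weights psi(r)/r
        WX = X * w[:, None]
        t_new = np.linalg.solve(X.T @ WX, X.T @ (w * Y))  # weighted normal equations
        if np.max(np.abs(t_new - t)) < tol:
            t = t_new
            break
        t = t_new
    return t

# Simulated balanced one-way layout with heavy-tailed errors (all values assumed).
rng = np.random.default_rng(0)
G, n_g = 3, 20
n = G * n_g
X = np.zeros((n, G))
X[:, 0] = 1.0
for j in range(1, G):
    X[j * n_g:(j + 1) * n_g, j] = 1.0
Y = X @ np.array([1.0, 0.5, 1.0]) + rng.standard_t(df=3, size=n)

res = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
S_n = 1.4826 * np.median(np.abs(res - np.median(res)))   # MAD scale (stand-in for S_n)
t_hat = m_estimate(X, Y, S_n)
```

At convergence the weighted normal equations coincide with the estimating equation, since w_i r_i = ψ(r_i) for the Huber weights.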
For α ∈ (0, 1), let â_{n}(α) = (â_{n1}(α), …, â_{nn}(α))^{t} be the unique solution that maximizes
$$\sum _{i=1}^{n}{Y}_{i}{\widehat{a}}_{ni}(\alpha )$$with the constraint
$$\sum _{i=1}^{n}{x}_{ij}{\widehat{a}}_{ni}(\alpha )=(1-\alpha )\sum _{i=1}^{n}{x}_{ij},\mathrm{\hspace{0.17em}\u200a\u200a}\mathrm{\hspace{0.17em}\u200a\u200a}\mathrm{\hspace{0.17em}\u200a\u200a}j=1,\dots ,G.$$Hájek (1965) proposed the scores
$${a}_{n}^{*}({R}_{i},\alpha )=\{\begin{array}{ll}0,\hfill & \text{if\hspace{0.28em}}\frac{{R}_{i}}{n}<\alpha ,\hfill \\ {R}_{i}-n\alpha ,\hfill & \text{if\hspace{0.28em}}\frac{{R}_{i}-1}{n}<\alpha <\frac{{R}_{i}}{n},\hfill \\ 1,\hfill & \text{if\hspace{0.28em}}\alpha <\frac{{R}_{i}-1}{n}.\hfill \end{array}$$An increasing, square-integrable function φ: (0, 1) → ℜ_{1} with φ(α) = −φ(1 − α), 0 < α < 1, is chosen. For a number α_{0} (0 < α_{0} < 1/2), φ is assumed to be standardized so that
$${\int}_{{\alpha}_{0}}^{1-{\alpha}_{0}}{\phi}^{2}(\alpha )d\alpha =1.$$We define regression scores by φ as
$${\widehat{b}}_{ni}=-{\int}_{{\alpha}_{0}}^{1-{\alpha}_{0}}\phi (\alpha )d{\widehat{a}}_{ni}(\alpha ),\mathrm{\hspace{0.17em}\u200a\u200a}\mathrm{\hspace{0.17em}\u200a\u200a}\mathrm{\hspace{0.17em}\u200a\u200a}i=1,\dots ,n.$$That is,
$${\widehat{b}}_{ni}=\{\begin{array}{ll}n{\int}_{{\scriptstyle \frac{({R}_{i}-1)}{n}}}^{{\scriptstyle \frac{{R}_{i}}{n}}}\phi (\alpha ){a}_{n}^{*\prime}({R}_{i},\alpha )d\alpha ,\hfill & \text{if\hspace{0.28em}}{\alpha}_{0}\le \frac{({R}_{i}-1)}{n}\le 1-{\alpha}_{0},\hspace{0.17em}\frac{{R}_{i}}{n}\le 1-{\alpha}_{0},\hfill \\ n{\int}_{{\scriptstyle \frac{({R}_{i}-1)}{n}}}^{1-{\alpha}_{0}}\phi (\alpha ){a}_{n}^{*\prime}({R}_{i},\alpha )d\alpha ,\hfill & \text{if\hspace{0.28em}}{\alpha}_{0}\le \frac{({R}_{i}-1)}{n}\le 1-{\alpha}_{0},\hspace{0.17em}\frac{{R}_{i}}{n}>1-{\alpha}_{0},\hfill \\ n{\int}_{{\alpha}_{0}}^{{\scriptstyle \frac{{R}_{i}}{n}}}\phi (\alpha ){a}_{n}^{*\prime}({R}_{i},\alpha )d\alpha ,\hfill & \text{if\hspace{0.28em}}{\alpha}_{0}>\frac{({R}_{i}-1)}{n},\hspace{0.17em}\frac{{R}_{i}}{n}\le 1-{\alpha}_{0},\hfill \\ 0,\hfill & \text{if\hspace{0.28em}}1-{\alpha}_{0}<\frac{({R}_{i}-1)}{n},\hfill \\ n{\int}_{{\alpha}_{0}}^{1-{\alpha}_{0}}\phi (\alpha ){a}_{n}^{*\prime}({R}_{i},\alpha )d\alpha ,\hfill & \text{else.}\hfill \end{array}$$We define S_{n} as
$${S}_{n}={n}^{-1}\sum _{i=1}^{n}{Y}_{i}{\widehat{b}}_{ni}={n}^{-1}{\mathbf{Y}}^{t}{\widehat{\mathbf{b}}}_{n}.$$
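As a sketch of this score construction, the Hájek scores and the resulting S_{n} can be computed as follows. The score function φ (a Wilcoxon-type choice), the use of Hájek's location-model scores in place of the exact regression scores, the grid-based integration, and the toy data are all illustrative assumptions.

```python
import numpy as np

def hajek_score(R_i, n, alpha):
    """Hajek (1965) score a_n*(R_i, alpha): 1 below the rank interval,
    linear R_i - n*alpha inside it, and 0 above it."""
    if alpha <= (R_i - 1) / n:
        return 1.0
    if alpha >= R_i / n:
        return 0.0
    return R_i - n * alpha

def b_scores(Y, alpha0=0.05, grid=2001):
    """Scores b_ni = -int_{alpha0}^{1-alpha0} phi(alpha) d a_ni(alpha),
    with phi(alpha) = sqrt(12)(alpha - 1/2) (assumed, approximately
    standardized for small alpha0); the Stieltjes integral against the
    piecewise-linear Hajek score is evaluated on a grid."""
    n = len(Y)
    ranks = np.argsort(np.argsort(Y)) + 1          # ranks R_i of the observations
    alphas = np.linspace(alpha0, 1.0 - alpha0, grid)
    phi = np.sqrt(12.0) * (alphas - 0.5)
    b = np.empty(n)
    for i, Ri in enumerate(ranks):
        a = np.array([hajek_score(Ri, n, al) for al in alphas])
        # -int phi d a, trapezoidal rule in the increments of a
        b[i] = -np.sum(0.5 * (phi[1:] + phi[:-1]) * np.diff(a))
    return b

Y = np.array([2.1, -0.3, 0.7, 1.5, 3.2])           # toy observations (assumed)
b_n = b_scores(Y)
S_n = float(np.mean(Y * b_n))                      # S_n = n^{-1} sum_i Y_i b_ni
```

The scores b_ni increase with the rank of Y_i and are skew symmetric around the middle rank, so S_{n} behaves as a nonnegative rank-based measure of dispersion.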