^{a}Department of Statistics, Keimyung University, Korea
Correspondence to: Department of Statistics, Keimyung University, 1095 Dalgubeol-daero, Dalseo-gu, Daegu 42601, Korea.
E-mail: jschoi@kmu.ac.kr
Received January 22, 2018; Revised May 8, 2018; Accepted June 5, 2018.
Abstract
This paper discusses a method for obtaining nonnegative estimates of variance components in a random effects model. A variance component is nonnegative by definition; nevertheless, estimates of variance components sometimes come out negative, which is undesirable. The proposed method rests on two basic ideas: identifying the orthogonal vector subspaces associated with the factors, and finding the projection onto each orthogonal vector subspace. The observation vector can then be written as the sum of these projections, and the method always produces nonnegative estimates from them. Hartley's synthesis is used to calculate the expected values of the quadratic forms. The paper also discusses how to set up a residual model for each projection.
Keywords : projections, random effects, vector space, quadratic forms, synthesis
1. Introduction
When a linear model contains more than one random component, it is called a random effects model or a mixed effects model, depending on whether fixed effects are also present. A random effect is a random variable representing the effect of a randomly chosen level from the population of levels that a random factor can assume. Many papers have addressed the problem of negative estimates of the variance components of random effects in a linear model underlying the analysis of experimental data. Henderson (1953) suggested three methods for estimating variance components, which came to be known as Henderson's methods 1, 2, and 3; Searle (1971) treated them in detail. Many papers are devoted to the estimation of variance components; unfortunately, each of the methods they discuss can produce negative estimates of variances. Milliken and Johnson (1984) suggested a proposal for eliminating negative estimates. El-Leithy et al. (2016) and Searle et al. (2009) discussed the nonnegative estimation of variance components in mixed linear models. This paper proposes a method that always yields nonnegative estimates of variance components. It is based on the concept of projections defined on a vector space; projections and related concepts are well described in Johnson and Wichern (1988). Milliken and Johnson (1984) used the data below as an example for estimating variance components. The same data are used here to compare the estimates from Henderson's method 1 with those from the method proposed in this paper, which are always nonnegative.
2. Data and model
The data in Table 1 are from Milliken and Johnson (1984). R and C denote the random factors of a two-way treatment structure in a completely randomized design structure.
First, consider a general two-way random components model for analyzing such data. The model in its general form is
$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}, \quad i = 1, \ldots, a, ~ j = 1, \ldots, b, ~ k = 1, \ldots, n_{ij}, \quad\quad (2.1)$
where y_{ijk} is the response from the k^{th} experimental unit in the treatment combination of the i^{th} level of R and the j^{th} level of C, μ is the grand mean, α_{i} is the effect of the i^{th} level of the random factor R, assumed to be distributed i.i.d. $N(0,{\sigma}_{\alpha}^{2})$, and β_{j} is the effect of the j^{th} level of the random factor C, assumed to be distributed i.i.d. $N(0,{\sigma}_{\beta}^{2})$. Furthermore, (αβ)_{ij}, the effect of the R × C interaction, is assumed to be distributed i.i.d. $N(0,{\sigma}_{\alpha\beta}^{2})$, and ϵ_{ijk}, the random error associated with y_{ijk}, is assumed to be distributed i.i.d. $N(0,{\sigma}_{\epsilon}^{2})$. These assumptions allow the variances and covariances of the observations to be evaluated. The variance of an observation is
$\text{Var}(y_{ijk}) = \sigma_{\alpha}^{2} + \sigma_{\beta}^{2} + \sigma_{\alpha\beta}^{2} + \sigma_{\epsilon}^{2}.$
In matrix notation, model (2.1) is written as
$\mathbf{y} = \mathbf{j}\mu + X_1\alpha + X_2\beta + X_3(\alpha\beta) + \epsilon, \quad\quad (2.2)$
where j is an n × 1 vector of ones ($n={\sum}_{i=1}^{a}{\sum}_{j=1}^{b}{n}_{ij}$), X_{1} is the n × a coefficient matrix of α, α is the a × 1 random vector assumed to be distributed $N(\mathbf{0},{\sigma}_{\alpha}^{2}{\mathit{I}}_{a})$, X_{2} is the n × b coefficient matrix of β, β is the b × 1 random vector assumed to be distributed $N(\mathbf{0},{\sigma}_{\beta}^{2}{\mathit{I}}_{b})$, X_{3} is the n × ab coefficient matrix of (αβ), (αβ) is the ab × 1 random vector assumed to be distributed $N(\mathbf{0},{\sigma}_{\alpha\beta}^{2}{\mathit{I}}_{ab})$, and ϵ is the n × 1 random vector assumed to be distributed $N(\mathbf{0},{\sigma}_{\epsilon}^{2}{\mathit{I}}_{n})$. The covariance matrix of the observation vector y, denoted by Σ, is
$\Sigma = \sigma_{\alpha}^{2}X_1X_1' + \sigma_{\beta}^{2}X_2X_2' + \sigma_{\alpha\beta}^{2}X_3X_3' + \sigma_{\epsilon}^{2}I_n.$
The analysis of variance method known as Henderson's Method 1 can be used to estimate the variance components, but it sometimes yields negative values, which are unacceptable as estimates of a variance. This can be avoided by way of projections. Let x and y be vectors in an n-dimensional vector space. The projection of y on the vector x, denoted by p_{x}, is
$p_x = x(x'x)^{-1}x'y.$
More generally, the projection of y onto the vector subspace generated by a model matrix X is XX^{−}y, where X^{−} denotes (X′X)^{−}X′ and (X′X)^{−} is a generalized inverse of X′X; the terminology g-inverse is well defined in Graybill (1983). If X is of full column rank, (X′X)^{−1} is used instead of (X′X)^{−}.
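As a concrete illustration, the projection XX^{−}y can be computed numerically. The sketch below (NumPy, with an illustrative rank-deficient X not taken from the paper) shows that the projected vector is unique even though the g-inverse is not, and that a projection matrix is symmetric and idempotent.

```python
import numpy as np

# Sketch of the projection above (illustrative matrices, not from the paper):
# project y onto the column space of a rank-deficient X via the g-inverse
# construction P = X (X'X)^- X'.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(6),
                     [1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]])   # rank 2: column 1 = column 2 + column 3
y = rng.normal(size=6)

P = X @ np.linalg.pinv(X.T @ X) @ X.T       # projection matrix XX^-
p = P @ y                                   # projection of y onto col(X)

# The projection is unique even though the g-inverse is not: a least-squares
# fit gives the same projected vector.
p2 = X @ np.linalg.lstsq(X, y, rcond=None)[0]
assert np.allclose(p, p2)

# A projection matrix is symmetric and idempotent.
assert np.allclose(P, P.T) and np.allclose(P @ P, P)
```

Here `np.linalg.pinv` supplies one particular generalized inverse (the Moore-Penrose one); any other choice of (X′X)^{−} would give the same projection.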
3. Projection method
A linear model is called a random components model when, apart from μ, it contains only random components. Random components may arise from the treatment structure, the design structure, or both. Henderson (1953) suggested Method 1 for the estimation of variance components in a random model; because Method 1 allows impermissible (negative) estimates of variances, an alternative method that always produces nonnegative values is discussed here. An observation vector y in a vector space can be decomposed into projections defined on orthogonal vector subspaces generated by the model matrices from fitting sub-models of model (2.1). Each projection therefore has its own nonnegative variation, unaffected by the other projections, which makes nonnegative variance estimates possible. To find nonnegative estimates of the variance components in model (2.1), first fit the full model in its matrix form (2.2). After fitting the whole model we get the residual vector r,
$r = y - XX^{-}y = (I - XX^{-})(X\delta + \epsilon) = (I - XX^{-})\epsilon, \quad\quad (3.1)$
where δ denotes (μ, α, β, (αβ))′ and X = (j, X_{1}, X_{2}, X_{3}). From (3.1) we see that the error vector subspace is generated by the sub-model matrix I − XX^{−}; let this residual model matrix be X_{f}. The residual sum of squares is the squared length of the projection of r onto the error space. The residual random vector r contains only the single random component ϵ and therefore carries information about ${\sigma}_{\epsilon}^{2}$ alone. The projection of r, denoted by p_{e}, is given by
$p_e = (I - XX^{-})y. \quad\quad (3.2)$
XX^{−} is the projection matrix onto the vector subspace generated by the model matrix X. Second, consider the residual random model for the estimation of ${\sigma}_{\alpha\beta}^{2}$ and ${\sigma}_{\epsilon}^{2}$. The model is
$r_b = (I - X_bX_b^{-})y = (I - X_bX_b^{-})X_3(\alpha\beta) + \epsilon_b, \quad\quad (3.3)$
where X_{b} = (j, X_{1}, X_{2}), δ_{b} = (μ, α, β)′, and ${\mathit{\epsilon}}_{b}=(\mathit{I}-{\mathit{X}}_{b}{\mathit{X}}_{b}^{-})\mathit{\epsilon}$. Letting SS_{b} be the sum of squares adjusted for δ_{b}, SS_{b} = r_{b}′r_{b}. E(SS_{b}) reveals information about the two variance components ${\sigma}_{\alpha\beta}^{2}$ and ${\sigma}_{\epsilon}^{2}$. The projection of r_{b} onto the vector subspace generated by the sub-model matrix is
$p_{ab} = X_{\alpha\beta}X_{\alpha\beta}^{-}r_b,$
where ${\mathit{X}}_{\alpha \beta}=(\mathit{I}-{\mathit{X}}_{b}{\mathit{X}}_{b}^{-}){\mathit{X}}_{3}$. The sum of squares due to (αβ) is the squared length of the projection, ${\mathit{p}}_{ab}^{\prime}{\mathit{p}}_{ab}$, so that $S{S}_{b}={\mathit{p}}_{ab}^{\prime}{\mathit{p}}_{ab}+{\mathit{p}}_{e}^{\prime}{\mathit{p}}_{e}$. From (3.3) it can be seen that the residual random vector r_{b} has two variance components, ${\sigma}_{\alpha \beta}^{2}$ and ${\sigma}_{\epsilon}^{2}$; they are estimated separately from the corresponding vector subspaces, which are orthogonal to each other. Third, for the estimation of the variance component ${\sigma}_{\beta}^{2}$, the model to be fitted is
$r_a = (I - X_aX_a^{-})y = (I - X_aX_a^{-})X_2\beta + \epsilon_a, \quad\quad (3.4)$
where X_{a} = (j, X_{1}), δ_{a} = (μ, α)′, and ${\mathit{\epsilon}}_{a}=(\mathit{I}-{\mathit{X}}_{a}{\mathit{X}}_{a}^{-})({\mathit{X}}_{3}(\mathit{\alpha}\mathit{\beta})+\mathit{\epsilon})$. It should be noted that, from (3.4), r_{a} has three variance components: one is estimated from the estimation space and the other two from the error space. Hence, r_{a} carries information about ${\sigma}_{\beta}^{2}$, ${\sigma}_{\alpha \beta}^{2}$, and ${\sigma}_{\epsilon}^{2}$. Let SS_{a} be r_{a}′r_{a}. The projection, denoted by p_{b}, is given as
$p_b = X_{\beta}X_{\beta}^{-}r_a,$
where ${\mathit{X}}_{\beta}=(\mathit{I}-{\mathit{X}}_{a}{\mathit{X}}_{a}^{-}){\mathit{X}}_{2}$. The sum of squares due to β is the squared length of the projection, ${\mathit{p}}_{b}^{\prime}{\mathit{p}}_{b}$, so $S{S}_{a}={\mathit{p}}_{b}^{\prime}{\mathit{p}}_{b}+{\mathit{p}}_{ab}^{\prime}{\mathit{p}}_{ab}+{\mathit{p}}_{e}^{\prime}{\mathit{p}}_{e}$. Lastly, the model fitted for ${\sigma}_{\alpha}^{2}$ is
$r_j = (I - X_jX_j^{-})y = (I - X_jX_j^{-})X_1\alpha + \epsilon_j, \quad\quad (3.5)$
where X_{j} = j, δ_{j} = μ, and ${\mathit{\epsilon}}_{j}=(\mathit{I}-{\mathit{X}}_{j}{\mathit{X}}_{j}^{-})({\mathit{X}}_{2}\mathit{\beta}+{\mathit{X}}_{3}(\mathit{\alpha}\mathit{\beta})+\mathit{\epsilon})$. Let SS_{j} be ${\mathit{r}}_{j}^{\prime}{\mathit{r}}_{j}$. Since the residual random vector has all four random components, it carries information about ${\sigma}_{\alpha}^{2}$, ${\sigma}_{\beta}^{2}$, ${\sigma}_{\alpha \beta}^{2}$, and ${\sigma}_{\epsilon}^{2}$. From (3.5), ${\sigma}_{\alpha}^{2}$ can be estimated in the estimation space independently of the others. The projection of r_{j} onto the space generated by $(\mathit{I}-{\mathit{X}}_{j}{\mathit{X}}_{j}^{-}){\mathit{X}}_{1}$ is
$p_a = X_{\alpha}X_{\alpha}^{-}r_j,$
where X_{α} = (I − X_{j}X_{j}^{−})X_{1}. The sum of squares adjusted for α is ${\mathit{p}}_{a}^{\prime}{\mathit{p}}_{a}$. Hence, $S{S}_{j}={\mathit{p}}_{a}^{\prime}{\mathit{p}}_{a}+{\mathit{p}}_{b}^{\prime}{\mathit{p}}_{b}+{\mathit{p}}_{ab}^{\prime}{\mathit{p}}_{ab}+{\mathit{p}}_{e}^{\prime}{\mathit{p}}_{e}$. It should be noted that the residual random models are derived in reverse order using the residual vectors. After establishing the model at each step, identify the model matrix associated with the random effect of interest and use it to derive the corresponding projection; the sum of squares due to that random effect is then calculated in terms of the projection.
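The stepwise construction above can be checked numerically. The following sketch uses an illustrative balanced 2 × 3 layout with two replicates per cell (not the Table 1 data) and a helper `proj` of our own naming; it builds each projection and verifies the decomposition of SS_j.

```python
import numpy as np

# Numerical check of SS_j = p_a'p_a + p_b'p_b + p_ab'p_ab + p_e'p_e for an
# illustrative balanced 2x3 layout with two replicates per cell.
rng = np.random.default_rng(1)
a, b, r = 2, 3, 2
n = a * b * r
lev_R = np.repeat(np.arange(a), b * r)            # R level of each observation
lev_C = np.tile(np.repeat(np.arange(b), r), a)    # C level of each observation
j = np.ones((n, 1))
X1 = (lev_R[:, None] == np.arange(a)).astype(float)                    # R effects
X2 = (lev_C[:, None] == np.arange(b)).astype(float)                    # C effects
X3 = ((lev_R * b + lev_C)[:, None] == np.arange(a * b)).astype(float)  # cells

def proj(M):
    """Projection matrix M M^- onto the column space of M."""
    return M @ np.linalg.pinv(M.T @ M) @ M.T

I = np.eye(n)
X = np.hstack([j, X1, X2, X3])     # full model matrix
Xb = np.hstack([j, X1, X2])        # drops the interaction
Xa = np.hstack([j, X1])            # drops C as well
y = rng.normal(size=n)             # any observation vector will do here

# Applying each projector directly to y equals applying it to the stepwise
# residual, because each subspace is orthogonal to the preceding fits.
p_e = (I - proj(X)) @ y                    # error space
p_ab = proj((I - proj(Xb)) @ X3) @ y       # interaction space
p_b = proj((I - proj(Xa)) @ X2) @ y        # C space
p_a = proj((I - proj(j)) @ X1) @ y         # R space

SS_j = y @ (I - proj(j)) @ y               # total SS adjusted for the mean
assert np.isclose(SS_j, p_a @ p_a + p_b @ p_b + p_ab @ p_ab + p_e @ p_e)
```

The assertion holds exactly because the four subspaces are mutually orthogonal and together span the orthogonal complement of the mean space.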
4. Expectations of quadratic forms
Variance components are estimated by equating sums of squares to their expected values. The sums of squares are quadratic forms in the observations, and Hartley's synthesis (Hartley, 1967) is used to calculate their expected values; the method yields the coefficients of the variance components in the expected value of a quadratic form in the observations. Let q = y′Qy be a quadratic form in the observation vector y, where Q is a symmetric matrix of constants. The expectation of q is then given by
$E(q) = \mu^{2}j'Qj + \sigma_{\alpha}^{2}\text{Tr}(X_1'QX_1) + \sigma_{\beta}^{2}\text{Tr}(X_2'QX_2) + \sigma_{\alpha\beta}^{2}\text{Tr}(X_3'QX_3) + \sigma_{\epsilon}^{2}\text{Tr}(Q).$
The matrix Q of a quadratic form in y for the analysis of variance can be constructed such that μ^{2}j′Qj = 0. Hence, the expectation of a quadratic form that does not depend on μ is
$E(y'Qy) = \sigma_{\alpha}^{2}\text{Tr}(X_1'QX_1) + \sigma_{\beta}^{2}\text{Tr}(X_2'QX_2) + \sigma_{\alpha\beta}^{2}\text{Tr}(X_3'QX_3) + \sigma_{\epsilon}^{2}\text{Tr}(Q),$
where Tr(Q) denotes the trace of the square matrix Q. Note that $\text{Tr}({\mathit{X}}_{i}^{\prime}\mathit{Q}{\mathit{X}}_{i})$ can pick up unwanted contributions because the sub-model matrices X_{i} are not orthogonal to each other; for the calculation of the coefficients of the variance components, orthogonal sub-model matrices are therefore used in place of the X_{i}. First, consider the coefficients in E(SS_{e}). From equation (3.2), the expectation of SS_{e} is
$E(SS_e) = c_{e1}\sigma_{\epsilon}^{2} + c_{e2}\sigma_{\alpha\beta}^{2} + c_{e3}\sigma_{\beta}^{2} + c_{e4}\sigma_{\alpha}^{2}, \quad\quad (4.1)$
where y is assumed to consist of orthogonal component vectors. First, the coefficient of ${\sigma}_{\epsilon}^{2}$, c_{e1}, is obtained using X_{f}, the residual model matrix of (3.1). That is,
$c_{e1} = \text{Tr}(X_f'X_f) = n - ab.$
Second, the coefficient of ${\sigma}_{\alpha \beta}^{2}$, c_{e2}, is obtained using X_{αβ}, the model matrix of (3.3):
$c_{e2} = \text{Tr}(X_{\alpha\beta}'(I - XX^{-})X_{\alpha\beta}) = 0,$
since X_{αβ} and I − XX^{−} are orthogonal. Third, the coefficient of ${\sigma}_{\beta}^{2}$, c_{e3}, is
$c_{e3} = \text{Tr}(X_{\beta}'(I - XX^{-})X_{\beta}) = 0,$
where X_{β} denotes the model matrix of (3.4); it shows that X_{β} and I − XX^{−} are orthogonal matrices. Fourth, the coefficient of ${\sigma}_{\alpha}^{2}$, c_{e4}, is given as
$c_{e4} = \text{Tr}(X_{\alpha}'(I - XX^{-})X_{\alpha}) = 0,$
where X_{α} is the model matrix of (3.5); this expression also shows that X_{α} and I − XX^{−} are orthogonal matrices. All matrices associated with the variance components ${\sigma}_{\alpha \beta}^{2}$, ${\sigma}_{\beta}^{2}$, and ${\sigma}_{\alpha}^{2}$ are orthogonal to the coefficient matrix of the random error vector ϵ. Hence, (4.1) reduces to
$E(SS_e) = c_{e1}\sigma_{\epsilon}^{2} = (n - ab)\sigma_{\epsilon}^{2}.$
Next, we compute E(SS_{b}), which involves two variance components because the residual random vector r_{b} has random effects related to ${\sigma}_{\alpha \beta}^{2}$ and ${\sigma}_{\epsilon}^{2}$. E(SS_{b}) is
$E(SS_b) = c_{b1}\sigma_{\epsilon}^{2} + c_{b2}\sigma_{\alpha\beta}^{2} + c_{b3}\sigma_{\beta}^{2} + c_{b4}\sigma_{\alpha}^{2}.$
First, the coefficient of ${\sigma}_{\epsilon}^{2}$, c_{b1}, is the trace of the quadratic-form matrix: $c_{b1} = \text{Tr}(I - X_bX_b^{-})$.
Second, the coefficient of ${\sigma}_{\alpha \beta}^{2}$, c_{b2}, is obtained using the model matrix $(\mathit{I}-{\mathit{X}}_{b}{\mathit{X}}_{b}^{-}){\mathit{X}}_{3}$ of (3.3), denoted by X_{αβ}. This turns out to be
$c_{b2} = \text{Tr}(X_{\alpha\beta}'X_{\alpha\beta}).$
Third, the coefficient of ${\sigma}_{\beta}^{2}$, c_{b3}, is obtained using the model matrix ${\mathit{X}}_{\beta}=(\mathit{I}-{\mathit{X}}_{a}{\mathit{X}}_{a}^{-}){\mathit{X}}_{2}$ of (3.4). That is,
$c_{b3} = \text{Tr}(X_{\beta}'(I - X_bX_b^{-})X_{\beta}) = 0,$
where the two matrices X_{β} and $\mathit{I}-{\mathit{X}}_{b}{\mathit{X}}_{b}^{-}$ are orthogonal. Lastly, the coefficient of ${\sigma}_{\alpha}^{2}$, c_{b4}, is obtained using the model matrix $(\mathit{I}-{\mathit{X}}_{j}{\mathit{X}}_{j}^{-}){\mathit{X}}_{1}$ of (3.5), denoted by X_{α}. That is,
$c_{b4} = \text{Tr}(X_{\alpha}'(I - X_bX_b^{-})X_{\alpha}) = 0.$
Hence, E(SS_{b}) reduces to $E(SS_b) = c_{b1}\sigma_{\epsilon}^{2} + c_{b2}\sigma_{\alpha\beta}^{2}$.
Since r_{a} defined in (3.4) has three random effects, E(SS_{a}) carries information about the coefficients of ${\sigma}_{\beta}^{2}$, ${\sigma}_{\alpha \beta}^{2}$, and ${\sigma}_{\epsilon}^{2}$. E(SS_{a}) is
$E(SS_a) = c_{a1}\sigma_{\epsilon}^{2} + c_{a2}\sigma_{\alpha\beta}^{2} + c_{a3}\sigma_{\beta}^{2} + c_{a4}\sigma_{\alpha}^{2},$
where $c_{a1} = \text{Tr}(I - X_aX_a^{-})$, $c_{a2} = \text{Tr}(X_{\alpha\beta}'(I - X_aX_a^{-})X_{\alpha\beta})$, and $c_{a3} = \text{Tr}(X_{\beta}'(I - X_aX_a^{-})X_{\beta}) = \text{Tr}(X_{\beta}'X_{\beta})$.
Lastly, the coefficient of ${\sigma}_{\alpha}^{2}$, c_{a4}, is obtained using the model matrix of (3.5), denoted by X_{α}. That is,
$c_{a4} = \text{Tr}(X_{\alpha}'(I - X_aX_a^{-})X_{\alpha}) = 0,$
since X_{α} and $\mathit{I}-{\mathit{X}}_{a}{\mathit{X}}_{a}^{-}$ are orthogonal.
Since the residual vector r_{j} contains all four random components, E(SS_{j}) is
$E(SS_j) = c_{j1}\sigma_{\epsilon}^{2} + c_{j2}\sigma_{\alpha\beta}^{2} + c_{j3}\sigma_{\beta}^{2} + c_{j4}\sigma_{\alpha}^{2}. \quad\quad (4.2)$
The coefficients of the variance components in (4.2) are obtained as before with the model matrices X_{f}, X_{αβ}, X_{β}, and X_{α}; with X_{f}, this gives the coefficient of ${\sigma}_{\epsilon}^{2}$, c_{j1}.
Equating the observed sums of squares to their expectations yields the system of linear equations
$SS_e = c_{e1}\sigma_{\epsilon}^{2}, \quad SS_b = c_{b1}\sigma_{\epsilon}^{2} + c_{b2}\sigma_{\alpha\beta}^{2}, \quad SS_a = c_{a1}\sigma_{\epsilon}^{2} + c_{a2}\sigma_{\alpha\beta}^{2} + c_{a3}\sigma_{\beta}^{2}, \quad SS_j = c_{j1}\sigma_{\epsilon}^{2} + c_{j2}\sigma_{\alpha\beta}^{2} + c_{j3}\sigma_{\beta}^{2} + c_{j4}\sigma_{\alpha}^{2}. \quad\quad (4.3)$
The solutions of this system in the ${\sigma}_{i}^{2}$ are the nonnegative estimates of the variance components for the assumed model (2.1).
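A numerical sketch of these moment equations, using an illustrative balanced 2 × 3 layout with two replicates per cell (not the Table 1 data): following Hartley's synthesis, the coefficient of each variance component in E(y′Qy) is computed as Tr(X_i′QX_i), the random-effect coefficients in E(SS_e) are verified to vanish, and the resulting linear system is solved.

```python
import numpy as np

# Sketch of the moment equations for an illustrative balanced 2x3 layout with
# two replicates per cell. By Hartley's synthesis, the coefficient of each
# variance component in E(y'Qy) is Tr(Xi' Q Xi), and that of sigma_eps^2 is Tr(Q).
a, b, r = 2, 3, 2
n = a * b * r
lev_R = np.repeat(np.arange(a), b * r)            # R level of each observation
lev_C = np.tile(np.repeat(np.arange(b), r), a)    # C level of each observation
j = np.ones((n, 1))
X1 = (lev_R[:, None] == np.arange(a)).astype(float)                    # R effects
X2 = (lev_C[:, None] == np.arange(b)).astype(float)                    # C effects
X3 = ((lev_R * b + lev_C)[:, None] == np.arange(a * b)).astype(float)  # cells

def proj(M):
    """Projection matrix M M^- onto the column space of M."""
    return M @ np.linalg.pinv(M.T @ M) @ M.T

I = np.eye(n)
# Quadratic-form matrices of SS_e, SS_b, SS_a, SS_j (residual fits in turn).
Qs = [I - proj(np.hstack([j, X1, X2, X3])),
      I - proj(np.hstack([j, X1, X2])),
      I - proj(np.hstack([j, X1])),
      I - proj(j)]

# Coefficient matrix; columns are (sigma_a^2, sigma_b^2, sigma_ab^2, sigma_e^2).
C = np.array([[np.trace(X1.T @ Q @ X1),
               np.trace(X2.T @ Q @ X2),
               np.trace(X3.T @ Q @ X3),
               np.trace(Q)] for Q in Qs])

# E(SS_e) involves sigma_e^2 only: the random-effect coefficients vanish.
assert np.allclose(C[0, :3], 0.0)

# With expected sums of squares on the right-hand side, solving the system
# recovers the variance components exactly.
sigma = np.array([2.0, 1.0, 0.5, 1.0])   # illustrative true components
SS = C @ sigma                            # their expected sums of squares
s_hat = np.linalg.solve(C, SS)
assert np.allclose(s_hat, sigma)
```

In practice the right-hand side would be the observed sums of squares rather than their expectations; the triangular pattern of zero coefficients mirrors the system (4.3).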
5. Example
In this section, the data presented in Section 2 are used as an example of obtaining nonnegative estimates of the four variance components. From the model fitting procedure discussed previously we get SS_{e} = 30.66667, SS_{b} = 139.81176, SS_{a} = 154.57143, and SS_{j} = 155.2143. The expectations of these are given as
Nonnegative estimates are obtained by the projection method. These values are compared with the Type I estimates from Milliken and Johnson (1984), which are
6. Conclusion
This paper suggests a method for estimating variance components in a two-way random model. The method is based on the concept of the orthogonal projection of an observation vector in n-dimensional space onto vector subspaces of that space. Orthogonal projections are used to estimate the variance components associated with the random effects of the assumed model. Many methods are available in the literature; however, most still suffer from the problem of negative estimates of variance components. To avoid this, it is necessary to build proper models that yield projections onto vector subspaces that are orthogonal to each other. It is therefore important to identify the residual random vector at each step and then construct a residual random model for it.
The corresponding orthogonal projection is obtained by projecting the residual random vector onto the vector subspace spanned by the residual model matrix of the residual random model. Since the projections are defined on vector subspaces that are mutually orthogonal, the observation vector can be written as the sum of the projections.
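This reconstruction property is easy to verify numerically; a minimal sketch, assuming an illustrative balanced 2 × 3 layout with two replicates per cell (not the paper's data):

```python
import numpy as np

# Numerical check that the observation vector equals the sum of its
# projections, including the projection onto the mean space.
rng = np.random.default_rng(3)
a, b, r = 2, 3, 2
n = a * b * r
lev_R = np.repeat(np.arange(a), b * r)
lev_C = np.tile(np.repeat(np.arange(b), r), a)
j = np.ones((n, 1))
X1 = (lev_R[:, None] == np.arange(a)).astype(float)
X2 = (lev_C[:, None] == np.arange(b)).astype(float)
X3 = ((lev_R * b + lev_C)[:, None] == np.arange(a * b)).astype(float)

def proj(M):
    """Projection matrix M M^- onto the column space of M."""
    return M @ np.linalg.pinv(M.T @ M) @ M.T

I = np.eye(n)
X, Xb, Xa = np.hstack([j, X1, X2, X3]), np.hstack([j, X1, X2]), np.hstack([j, X1])
y = rng.normal(size=n)

p_mean = proj(j) @ y                     # mean space
p_a = proj((I - proj(j)) @ X1) @ y       # R space
p_b = proj((I - proj(Xa)) @ X2) @ y      # C space
p_ab = proj((I - proj(Xb)) @ X3) @ y     # interaction space
p_e = (I - proj(X)) @ y                  # error space

# The five mutually orthogonal subspaces span R^n, so the projections sum to y.
assert np.allclose(y, p_mean + p_a + p_b + p_ab + p_e)
```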
Hartley's synthesis was used to calculate the expectations of the sums of squares, which are expressed as sums of squared lengths of orthogonal projections. Because each projection is defined on its own vector subspace, orthogonal to the others, the expectation of the sum of squares of the residual vector at each stage has nonzero coefficients only for the variance components of the random effects it actually contains.
Variance, as a measure of the variation of data, can never be negative by definition. Nevertheless, a negative estimate is often accredited as evidence that the true value of the component is zero. The projection method, by contrast, always produces nonnegative estimates of the variance components.
Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (Grant No. 2018R1D1A1B07043021).
Table 1
Data for two-way design
Cells: RC11, RC12, RC13, RC21, RC22, RC23
Observations: 10, 13, 21, 16, 13, 11, 12, 15, 19, 18, 19, 13, 11, 14
References
El-Leithy AH, Abdel-Wahed AZ, and Abdallah SM (2016). On non-negative estimation of variance components in mixed linear models. Journal of Advanced Research, 7, 59-68.
Graybill FA (1983). Matrices with Applications in Statistics, California, Wadsworth.
Hartley HO (1967). Expectations, variances and covariances of ANOVA mean squares by “synthesis”. Biometrics, 23, 105-114.
Henderson CR (1953). Estimation of variance and covariance components. Biometrics, 9, 226-252.
Johnson RA and Wichern DW (1988). Applied Multivariate Statistical Analysis, New Jersey, Prentice-Hall.
Milliken GA and Johnson DE (1984). Analysis of Messy Data, New York, Van Nostrand Reinhold.
Searle SR (1971). Linear Models, New York, John Wiley & Sons.
Searle SR, Casella G, and McCulloch CE (2009). Variance Components, New York, John Wiley & Sons.