Restricted maximum likelihood estimation of a censored random effects panel regression model

Minah Lee^a, Seung-Chun Lee^{1,b}

aData Analysis Team, Samsung SDS, Korea;
bDepartment of Applied Statistics, Hanshin University, Korea
Correspondence to: 1Department of Applied Statistics, Hanshin University, 137 Hanshindae-gil, Osan, Gyeonggi-Do
18101, Korea. E-mail: seung@hs.ac.kr
Received January 2, 2019; Revised February 18, 2019; Accepted February 18, 2019.
Abstract

Panel data sets have been developed in various areas, and many recent studies have analyzed panel, or longitudinal, data sets. Maximum likelihood (ML) is perhaps the most common statistical method for analyzing panel data models; however, inference based on the ML estimate will have an inflated Type I error because the ML method tends to give a downwardly biased estimate of the variance components when the sample size is small. The underestimation can be severe when data are incomplete. This paper proposes the restricted maximum likelihood (REML) method for a random effects panel data model with a censored dependent variable. The likelihood function of the model is complex in that it includes a multidimensional integral. Many authors have proposed integral approximation methods for the computation of the likelihood function; however, it is well known that integral approximation methods are inadequate for high dimensional integrals in practice. This paper instead uses the moments of a truncated multivariate normal random vector to calculate the multidimensional integral. In addition, a proper asymptotic standard error of the REML estimate is given.

Keywords : restricted maximum likelihood, censored dependent variable, panel regression
1. Introduction

The panel regression model with individual specific effects has the following specification:

$w_{it} = x_{it}'\beta + \mu_i + \epsilon_{it}, \qquad i = 1, 2, \ldots, n;\ t = 1, 2, \ldots, T_i, \tag{1.1}$

where $w_{it}$ is the dependent variable, $x_{it}$ is the p×1 vector of predictors, β is the p×1 vector of regression coefficients, $\mu_i$ is the time-invariant individual specific effect, and $\epsilon_{it}$ is the remaining disturbance term. Here, subscripts i and t represent the individual and the time period, respectively. When $\mu_i$ is assumed to be constant over time, the model is referred to as the "fixed effects" model, which is known to have an incidental parameter problem (Lancaster, 2000), while the "random effects" model treats $\mu_i$ as a random variable. As McCulloch (1996) stated, a frequentist's decision to regard an effect as fixed or random is a complicated one, but we will assume that the individual specific effects $\mu_i$ are random and independent of the regressors $x_{it}$.

The standard assumption on the error term is that the $\epsilon_{it}$'s are independent and identically distributed normal random variables. This implies that the dependent variable can be any real number; however, in many statistical analyses the dependent variable is observable only on a limited range. For example, a dependent variable may be constrained to be positive, as in the case of wages or hours worked. Unemployed people are left-censored at zero since their wage or working hours would be zero. To incorporate the unemployed in the model, the dependent variable is commonly treated as an unobservable latent variable, which leads to the standard Tobit model (Tobin, 1958).

A generalization of the standard Tobit model allows the dependent variable to be censored from the left, the right, or both. In this Type I Tobit model, the observed values of the dependent variable are defined as

$y_i = \begin{cases} \ell, & \text{if } w_i \le \ell, \\ w_i, & \text{if } \ell < w_i < u, \\ u, & \text{if } w_i \ge u, \end{cases}$

where ℓ and u are known lower and upper censored points. Thus, the standard Tobit model is a special case of Type I Tobit model with ℓ = 0 and u = ∞.
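To make the censoring rule concrete, here is a minimal sketch in Python (the paper's own computations are in R; the helper name `censor` is ours):

```python
import numpy as np

def censor(w, lower=0.0, upper=np.inf):
    """Map latent responses w to observed y under Type I Tobit censoring."""
    # values below `lower` are recorded as `lower`; values above `upper` as `upper`
    return np.clip(w, lower, upper)

w = np.array([-1.2, 0.3, 2.5])           # latent responses
y = censor(w, lower=0.0, upper=2.0)      # -> [0.0, 0.3, 2.0]
```

With `lower=0.0` and `upper=np.inf` this reduces to the standard Tobit censoring at zero.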

Such censored dependent variables commonly occur in survival analysis and in biomedical and epidemiological studies. The usual estimation methods for a conventional regression model fail to provide consistent estimates, which has led to discussion of estimation methods for the censored regression model (Tobin, 1958; Maddala, 1983; Amemiya, 1984). Currently, the dominant method is maximum likelihood (ML), which is implemented in most statistical packages dealing with censored regression models. For example, R packages such as AER (Kleiber and Zeileis, 2009) and NADA (Lee, 2017) give ML estimates for the usual linear regression model; see also the "qlim" procedure in SAS (2011). In particular, censReg (Henningsen, 2017) and "xttobit" of Stata (2017) provide ML estimates for the random effects panel regression model.

The ML estimator in a nonlinear panel data model with fixed effects is widely understood to be biased and inconsistent when the length of the panel T is small; however, Green (2004) found in simulation studies that the finite sample bias of the ML method appears in the disturbance variance rather than in the slope parameters. It is known that when the sample size is small, the ML estimate of the disturbance variance is biased downward, and inferences on the regression coefficients will have an inflated Type I error rate because their precision is overstated. It is therefore desirable to consider other estimation methods when the sample size is small. We believe that Bayesian estimation could be an alternative (Lee, 2016); however, a natural frequentist substitute is the restricted maximum likelihood (REML) method (Patterson and Thompson, 1971).

Note that REML estimates variance components on the basis of residuals obtained after eliminating the fixed effects contained in the model. In effect, REML divides the mean squared deviation by the degrees of freedom instead of by the sample size, which remedies the downward bias of ML. It also has a Bayesian justification. Today, REML is widely used for the estimation of variance components in various mixed effects models with complete data, but surprisingly it is not well established for limited dependent variable models such as the binary or censored dependent variable model. Even the definition of the REML procedure is unclear. For example, Lee and Nelder (2001) regarded REML as an adjusted profile likelihood method, while Drum and McCullagh (1993) considered it an unbiased estimating equation method. See Noh and Lee (2007) for further details.

The difficulty of an ML-based approach for the limited dependent variable model lies mainly in computation rather than in theory. For instance, Hughes (1999) provided ML and REML estimates in a general mixed-effects linear model with censored data using a Monte Carlo EM algorithm and claimed that the approach can be used with an arbitrarily complex design matrix; however, such an EM-based method cannot provide standard errors of the variance components. He gave asymptotic standard errors for the fixed effects only, relying on maximum likelihood theory. Since that asymptotic standard error does not take account of the estimation of the variance components, the approximation may not be theoretically appealing. Like Hughes (1999), most works dealing with REML focus on the parameter estimation itself and do not mention the standard error of the REML estimates. The calculation of the asymptotic standard error of the REML estimate is believed to be another challenge.

The main problem of likelihood-based methods with a limited dependent variable is the computation of the likelihood function. It is difficult to maximize the function directly because it contains multidimensional integrals. Many authors have therefore proposed to approximate the integrals and then maximize the approximated likelihood function. Various integral approximation methods, such as Newton-Cotes and Gauss-Hermite quadrature, Monte Carlo integration, and Markov chain Monte Carlo, have been employed for this purpose. Approximation makes likelihood-based estimation feasible in the limited dependent variable model; however, it is well known that such integral approximation methods are inadequate for high dimensional integrals. It may be difficult to get a good approximation when the number of censored observations is large under the censored random effects panel model. Indeed, it is observed that statistical packages using different methods give quite different results. See Zhang et al. (2011) for further details.

The main objective of this paper is to present a REML procedure for a censored dependent variable model. Many authors have considered REML in binary dependent variable models, but the literature on REML estimation with censored data is limited. The censored dependent variable model resembles the normal-probit model for binary data; however, the methodology used in the normal-probit model is not directly applicable to the model considered here because we do not use an approximation method. It is also demonstrated through a simulation study that when the sample size is small, REML is a proper method in the sense that inferences based on it have a Type I error rate close to the nominal level.

2. Restricted maximum likelihood method

### 2.1. Estimation

Let $\mu \sim N(0, \sigma_\mu^2 I)$ and $\epsilon \sim N(0, \sigma_\epsilon^2 I)$, where μ and ϵ are independent random vectors of the $\mu_i$'s and $\epsilon_{it}$'s, respectively, and $N = \sum_{i=1}^n T_i$. Writing (1.1) as $w = X\beta + Z\mu + \epsilon$, where X is the N × p matrix of regressors and Z is the N × n incidence matrix for the $\mu_i$'s, the likelihood can be written as

$L(\beta, \sigma_\mu^2, \sigma_\epsilon^2; y) = \int_{\mathcal{R}} f(w)\,du, \tag{2.1}$

where f(w) is the probability density function of a multivariate normal random vector with mean Xβ and variance-covariance matrix $V = ZZ'\sigma_\mu^2 + I\sigma_\epsilon^2$, $\mathcal{R} = \{w : y(w) = y\}$ is the set of latent variables consistent with the observed data y, and u is the Lebesgue measure.

Note that when data is complete, i.e., no censored observations, the REML equations for variance components are shown to be

$w'P\frac{\partial V}{\partial \sigma_i^2}Pw = \operatorname{tr}\left(P\frac{\partial V}{\partial \sigma_i^2}\right), \quad i = \mu, \epsilon,$

where $P = V^{-1} - V^{-1}X(X'V^{-1}X)^{-}X'V^{-1}$ (Searle et al., 2006, p.251). For censored data, we take the conditional expectation of the REML equations, and hence estimates are obtained by solving the following equations:

$S_{\sigma_\mu^2} = E(w'PZZ'Pw \mid y) - \operatorname{tr}(PZZ') = 0, \tag{2.2}$
$S_{\sigma_\epsilon^2} = E(w'PPw \mid y) - \operatorname{tr}(P) = 0, \tag{2.3}$

where tr(A) denotes the trace of a square matrix A. As noted before, REML is not uniquely defined when data are incomplete. For instance, McCulloch (1994) gave an EM algorithm for REML estimation in a probit-normal model by treating the fixed effects as random effects whose variance tends to infinity. The EM algorithm solves the above estimating equations; therefore, our definition of REML coincides with the approach of McCulloch (1994). See also Hughes (1999).
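The projection matrix P above satisfies PX = 0 and PVP = P, which is easy to verify numerically. The sketch below is in Python with NumPy for illustration (the paper's computations are in R); the helper name `projection_P` is ours:

```python
import numpy as np

def projection_P(V, X):
    """P = V^{-1} - V^{-1} X (X' V^{-1} X)^- X' V^{-1} (generalized inverse via pinv)."""
    Vi = np.linalg.inv(V)
    XtViX = X.T @ Vi @ X
    return Vi - Vi @ X @ np.linalg.pinv(XtViX) @ X.T @ Vi

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 2))                 # N = 6, p = 2
Z = np.kron(np.eye(3), np.ones((2, 1)))         # incidence matrix: n = 3 panels, T_i = 2
V = Z @ Z.T * 0.5 + np.eye(6) * 1.0             # V = ZZ' sigma_mu^2 + I sigma_eps^2
P = projection_P(V, X)
print(np.allclose(P @ X, 0), np.allclose(P @ V @ P, P))   # True True
```

The properties PX = 0 and PVP = P are what make the complete-data REML equations free of β.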

In general, REML estimation includes no procedure for estimating the fixed effects. In the complete data case, the fixed effects are estimated with the estimated random components; here, however, the conditional expectations are determined by the variance components as well as by the slope parameter β, so the random components and fixed effects must be estimated simultaneously. As Searle et al. (2006) suggested, the log-likelihood equation for the ML of β will be used:

$S_\beta = \frac{\partial}{\partial\beta}\log\int_{\mathcal{R}} f(w)\,du = \frac{\int_{\mathcal{R}} X'V^{-1}(w - X\beta)f(w)\,du}{\int_{\mathcal{R}} f(w)\,du} = X'V^{-1}\left(E(w \mid y) - X\beta\right). \tag{2.4}$

To solve the equations, one needs to compute conditional expectations related to the moments of a multivariate truncated normal random vector. Thus, the calculation of the moments is essential and is the main problem of ML-based approaches. Many researchers have employed the Gibbs sampler or Gauss-Hermite numerical integration to approximate the moments and then used the EM algorithm or variations of it. However, such numerical methods are generally not recommended for high dimensional integrals. Since the dimensionality increases with the number of censored observations, they may not give a proper approximation when the number of censored observations is large. An EM-based method also cannot provide the standard error of the REML estimate: computing the standard error requires moments up to the fourth order, which may be hard to approximate numerically.

There is a long history of moment calculations for a multivariate truncated normal random vector. Using the moment generating function or recurrence relationships, many moment calculation methods have been proposed under various conditions; see Arismendi (2013). Among them, Kan and Robotti (2017) gave a method that meets our needs. In what follows, we safely assume that the necessary moments of the truncated variables are obtainable. It is worth mentioning that their algorithm requires computing 5m conditional expectations, where m is the number of censored observations, which may be huge in some applications; however, once we notice that in the panel regression model observations from different individuals are independent, the computational burden can be reduced greatly.

To compute the conditional expectations in (2.2), (2.3), and (2.4), we assume that the last m elements of y are censored observations. The vectors of uncensored and censored observations will be denoted by $y_1 = (y_{11}, \ldots, y_{1,N-m})'$ and $y_2 = (y_{21}, \ldots, y_{2m})'$, respectively. Likewise, the vector of latent variables and the matrix of regressors are partitioned as $w' = (w_1', w_2')$ and $X' = (X_1', X_2')$. Then $w_2 \mid w_1 = y_1$ is a multivariate normal random vector with mean $\mu_{w_2|y_1} = X_2\beta + V_{21}V_{11}^{-1}(y_1 - X_1\beta)$ and variance-covariance matrix $V_{w_2|y_1} = V_{22} - V_{21}V_{11}^{-1}V_{12}$, where $V_{ij}$, $i, j = 1, 2$, are the partitioned matrices of V according to $w_1$ and $w_2$. This shows that $w_2 \mid y$ is a multivariate truncated normal random vector. To be precise, notations related to a multivariate truncated normal distribution are defined as follows.

Suppose $z \sim N(\mu, V)$ and $R = \{(z_1, \ldots, z_n) : a_i \le z_i \le b_i,\ i = 1, \ldots, n\}$. Then the distribution of $z \mid z \in R$ is a multivariate truncated normal on R, and it will be represented by $z \mid z \in R \sim TN_{(a,b)}(\mu, V)$, where $a = \{a_i\}_{i=1}^n$ and $b = \{b_i\}_{i=1}^n$.

Note that each element of $y_2$ is either ℓ or u. For each i = 1, . . . , m, $y_{2i} = \ell$ indicates that $w_{2i}$ lies below ℓ, so we define $a_i^* = -\infty$ and $b_i^* = \ell$. Similarly, if $y_{2i} = u$, let $a_i^* = u$ and $b_i^* = \infty$. Then we have $w_2 \mid y \sim TN_{(a^*, b^*)}(\mu_{w_2|y_1}, V_{w_2|y_1})$.
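The two ingredients of this construction, the conditional mean and covariance of the partitioned normal and the truncation bounds (a*, b*), can be sketched as follows (Python/NumPy for illustration; both helper names are ours):

```python
import numpy as np

def conditional_normal(mu, V, y1, n1):
    """Mean and covariance of w2 | w1 = y1 for w ~ N(mu, V); w1 = first n1 coordinates."""
    V11, V12 = V[:n1, :n1], V[:n1, n1:]
    V21, V22 = V[n1:, :n1], V[n1:, n1:]
    m = mu[n1:] + V21 @ np.linalg.solve(V11, y1 - mu[:n1])   # mu_{w2|y1}
    C = V22 - V21 @ np.linalg.solve(V11, V12)                # V_{w2|y1}
    return m, C

def truncation_bounds(y2, lower, upper):
    """Bounds (a*, b*) so that w2 | y ~ TN_(a*,b*)(mu_{w2|y1}, V_{w2|y1})."""
    a = np.where(y2 == lower, -np.inf, upper)   # left-censored: (-inf, l]; right-censored: [u, inf)
    b = np.where(y2 == lower, lower, np.inf)
    return a, b

mu = np.array([0.0, 0.0])
V = np.array([[1.0, 0.5],
              [0.5, 2.0]])
m, C = conditional_normal(mu, V, y1=np.array([1.0]), n1=1)   # m = [0.5], C = [[1.75]]
a, b = truncation_bounds(np.array([0.0, 3.0]), lower=0.0, upper=3.0)
```

Here one observation is left-censored at 0 and one is right-censored at 3, giving truncation intervals (-inf, 0] and [3, inf).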

The conditional expectation of a quadratic form of w is equal to

$E(w'Aw \mid y) = y_1'A_{11}y_1 + 2y_1'A_{12}E(w_2 \mid y) + E(w_2'A_{22}w_2 \mid y). \tag{2.5}$

Here, the symmetric matrix A is partitioned in the obvious manner. Thus, one needs up to the second order moments of a multivariate truncated normal random vector to evaluate the conditional expectation. In fact, we need to compute every quantity $E(w_{2i_1}^{j_1}w_{2i_2}^{j_2}w_{2i_3}^{j_3}w_{2i_4}^{j_4} \mid y)$, where $i_k \in \{1, \ldots, m\}$, $k = 1, \ldots, 4$, and the $j_k$'s are nonnegative integers satisfying $\sum_{k=1}^{4} j_k \le 4$. Then the conditional expectation of a quadratic form of the latent variables shown in (2.2) or (2.3) can be calculated by (2.5) and $E(w_2'A_{22}w_2 \mid y) = \operatorname{tr}(A_{22}E(w_2w_2' \mid y))$.
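Given the first and second conditional moments of $w_2$, evaluating (2.5) is a few lines of linear algebra. A sketch in Python (illustrative only; the moment inputs would come from a truncated-normal moment algorithm such as Kan and Robotti's):

```python
import numpy as np

def cond_quad_expectation(A, y1, Ew2, Ew2w2):
    """E(w'Aw | y) = y1'A11 y1 + 2 y1'A12 E(w2|y) + tr(A22 E(w2 w2'|y)), i.e. eq. (2.5)."""
    n1 = y1.size
    A11, A12, A22 = A[:n1, :n1], A[:n1, n1:], A[n1:, n1:]
    return y1 @ A11 @ y1 + 2 * y1 @ A12 @ Ew2 + np.trace(A22 @ Ew2w2)

# toy inputs: 2 uncensored observations, 1 censored with given conditional moments
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 2.0, 0.0],
              [1.0, 0.0, 3.0]])
val = cond_quad_expectation(A, y1=np.array([1.0, 2.0]),
                            Ew2=np.array([0.5]), Ew2w2=np.array([[0.3]]))
```

With these toy numbers the three terms are 9, 1, and 0.9, so `val` is 10.9.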

The Newton-Raphson method is applicable to solve the estimating equations, which requires the derivatives of the equations to form a Jacobian matrix. One may consider numerical differentiation, which reduces some computational burden and hence may speed up obtaining the Jacobian; however, we found that numerical differentiation is quite unstable. Besides, the Jacobian can be obtained analytically.

Using well-known ML theory, the partial derivatives of $S_\beta$ with respect to $\beta'$ and $\sigma_i^2$, $i = \mu, \epsilon$, are shown to be

$\frac{\partial S_\beta}{\partial \beta'} = -X'V^{-1}X + X'V^{-1}V_{w|y}V^{-1}X$

and

$\frac{\partial S_\beta}{\partial \sigma_i^2} = -X'V^{-1}\frac{\partial V}{\partial \sigma_i^2}V^{-1}\left[E(w \mid y) - X\beta\right] - X'V^{-1}V_{w|y}V^{-1}\frac{\partial V}{\partial \sigma_i^2}V^{-1}X\beta + \frac{1}{2}X'V^{-1}\operatorname{Cov}\left(w,\ w'V^{-1}\frac{\partial V}{\partial \sigma_i^2}V^{-1}w \,\Big|\, y\right),$

where $V_{w|y}$ is the conditional variance of w given the observed y, $\partial V/\partial\sigma_\mu^2 = ZZ'$, and $\partial V/\partial\sigma_\epsilon^2 = I$. For the differentiation of $S_{\sigma_\mu^2}$ and $S_{\sigma_\epsilon^2}$, the following theorem is applicable.

Theorem 1

Let A be a symmetric matrix which depends on $\sigma_\mu^2$ and $\sigma_\epsilon^2$, but not on β. Then for i = μ or ϵ, we have

$\frac{\partial}{\partial \beta'}E(w'Aw \mid y) = \operatorname{Cov}(w'Aw, w \mid y)V^{-1}X,$

and

$\frac{\partial}{\partial \sigma_i^2}E(w'Aw \mid y) = E\left(w'\frac{\partial A}{\partial \sigma_i^2}w \,\Big|\, y\right) + \frac{1}{2}\operatorname{Cov}\left(w'Aw,\ w'V^{-1}\frac{\partial V}{\partial \sigma_i^2}V^{-1}w \,\Big|\, y\right) - \operatorname{Cov}(w'Aw, w \mid y)V^{-1}\frac{\partial V}{\partial \sigma_i^2}V^{-1}X\beta.$
Proof

The conditional expectation can be written as

$E(w'Aw \mid y) = \int_{\mathcal{R}} w'Aw f(w)\,du \Big/ \int_{\mathcal{R}} f(w)\,du,$

and since $(\partial/\partial\beta')f(w) = (w - X\beta)'V^{-1}X f(w)$, we have

$\frac{\partial E(w'Aw \mid y)}{\partial \beta'} = \frac{\int_{\mathcal{R}} w'Aw \frac{\partial}{\partial \beta'}f(w)\,du}{\int_{\mathcal{R}} f(w)\,du} - \frac{\int_{\mathcal{R}} w'Aw f(w)\,du \int_{\mathcal{R}} \frac{\partial}{\partial \beta'}f(w)\,du}{\left[\int_{\mathcal{R}} f(w)\,du\right]^2} = \left[E\left(w'Aw(w - X\beta)' \mid y\right) - E(w'Aw \mid y)E\left((w - X\beta)' \mid y\right)\right]V^{-1}X = \operatorname{Cov}(w'Aw, w \mid y)V^{-1}X.$

Likewise, $(\partial/\partial\sigma_i^2)f(w) = \frac{1}{2}\left[(w - X\beta)'V^{-1}\frac{\partial V}{\partial \sigma_i^2}V^{-1}(w - X\beta) - \operatorname{tr}\left(V^{-1}\frac{\partial V}{\partial \sigma_i^2}\right)\right]f(w)$ gives

$\frac{\partial E(w'Aw \mid y)}{\partial \sigma_i^2} = \frac{\int_{\mathcal{R}} \frac{\partial}{\partial \sigma_i^2}\left\{w'Aw f(w)\right\}\,du}{\int_{\mathcal{R}} f(w)\,du} - \frac{\int_{\mathcal{R}} w'Aw f(w)\,du \int_{\mathcal{R}} \frac{\partial}{\partial \sigma_i^2}f(w)\,du}{\left[\int_{\mathcal{R}} f(w)\,du\right]^2} = \frac{1}{2}E\left[w'Aw\left\{(w - X\beta)'V^{-1}\frac{\partial V}{\partial \sigma_i^2}V^{-1}(w - X\beta) - \operatorname{tr}\left(V^{-1}\frac{\partial V}{\partial \sigma_i^2}\right)\right\} \Big|\, y\right] - \frac{1}{2}E(w'Aw \mid y)\left\{E\left((w - X\beta)'V^{-1}\frac{\partial V}{\partial \sigma_i^2}V^{-1}(w - X\beta) \Big|\, y\right) - \operatorname{tr}\left(V^{-1}\frac{\partial V}{\partial \sigma_i^2}\right)\right\} + E\left(w'\frac{\partial A}{\partial \sigma_i^2}w \Big|\, y\right) = E\left(w'\frac{\partial A}{\partial \sigma_i^2}w \Big|\, y\right) + \frac{1}{2}\operatorname{Cov}\left\{w'Aw,\ (w - X\beta)'V^{-1}\frac{\partial V}{\partial \sigma_i^2}V^{-1}(w - X\beta) \Big|\, y\right\} = E\left(w'\frac{\partial A}{\partial \sigma_i^2}w \Big|\, y\right) + \frac{1}{2}\operatorname{Cov}\left(w'Aw,\ w'V^{-1}\frac{\partial V}{\partial \sigma_i^2}V^{-1}w \Big|\, y\right) - \operatorname{Cov}(w'Aw, w \mid y)V^{-1}\frac{\partial V}{\partial \sigma_i^2}V^{-1}X\beta.$

Theorem 1 is quite standard and is useful not only for REML but also for ML estimation. Perhaps it is well known, but we could not find a statement of the second order derivatives when data are incomplete, so we include the proof for completeness. The second order derivatives give an advantage to our definition of REML as the solution of (2.2), (2.3), and (2.4), in that the asymptotic standard error can be expressed in analytic rather than numerical form.

Once we note that $PZZ'P$ is a symmetric matrix, so that $w'PPZZ'Pw$ and $w'PZZ'PPw$ define the same quadratic form, and that $(\partial/\partial\sigma_i^2)P = -P(\partial V/\partial\sigma_i^2)P$, Theorem 1 gives the following partial derivatives:

$\frac{\partial S_{\sigma_\mu^2}}{\partial \beta'} = \operatorname{Cov}(w'PZZ'Pw, w \mid y)V^{-1}X,$
$\frac{\partial S_{\sigma_\mu^2}}{\partial \sigma_\mu^2} = -2E(w'PZZ'PZZ'Pw \mid y) + \frac{1}{2}\operatorname{Cov}(w'PZZ'Pw,\ w'V^{-1}ZZ'V^{-1}w \mid y) - \operatorname{Cov}(w'PZZ'Pw, w \mid y)V^{-1}ZZ'V^{-1}X\beta + \operatorname{tr}(PZZ'PZZ'),$
$\frac{\partial S_{\sigma_\mu^2}}{\partial \sigma_\epsilon^2} = -2E(w'PPZZ'Pw \mid y) + \frac{1}{2}\operatorname{Cov}(w'PZZ'Pw,\ w'V^{-1}V^{-1}w \mid y) - \operatorname{Cov}(w'PZZ'Pw, w \mid y)V^{-1}V^{-1}X\beta + \operatorname{tr}(PZZ'P),$
$\frac{\partial S_{\sigma_\epsilon^2}}{\partial \beta'} = \operatorname{Cov}(w'PPw, w \mid y)V^{-1}X,$
$\frac{\partial S_{\sigma_\epsilon^2}}{\partial \sigma_\mu^2} = -2E(w'PPZZ'Pw \mid y) + \frac{1}{2}\operatorname{Cov}(w'PPw,\ w'V^{-1}ZZ'V^{-1}w \mid y) - \operatorname{Cov}(w'PPw, w \mid y)V^{-1}ZZ'V^{-1}X\beta + \operatorname{tr}(PZZ'P),$
$\frac{\partial S_{\sigma_\epsilon^2}}{\partial \sigma_\epsilon^2} = -2E(w'PPPw \mid y) + \frac{1}{2}\operatorname{Cov}(w'PPw,\ w'V^{-1}V^{-1}w \mid y) - \operatorname{Cov}(w'PPw, w \mid y)V^{-1}V^{-1}X\beta + \operatorname{tr}(PP).$

Let S(θ) be the vector of $S_\beta$, $S_{\sigma_\mu^2}$, and $S_{\sigma_\epsilon^2}$, where $\theta = (\beta', \sigma_\mu^2, \sigma_\epsilon^2)'$. The partial derivatives and S(θ) consist of conditional moments of the forms E(w|y), Var(w|y), E(w'Aw|y), Cov(w, w'Aw|y), and Cov(w'Aw, w'Bw|y), where A and B are symmetric matrices. Let us consider the calculation of these quantities. The first two are

$E(w \mid y) = \begin{pmatrix} y_1 \\ E(w_2 \mid y) \end{pmatrix} \quad \text{and} \quad V_{w|y} = \operatorname{Var}(w \mid y) = \begin{pmatrix} 0 & 0 \\ 0 & V_{w_2|y} \end{pmatrix},$

where $V_{w_2|y} = E(w_2w_2' \mid y) - E(w_2 \mid y)E(w_2' \mid y)$, and the third type has been seen before. Also, the first N − m elements of the N × 1 vector Cov(w, w'Aw|y) are zero. Indeed, it can be shown that

$\operatorname{Cov}(w, w'Aw \mid y)' = \left(0',\ \left[\operatorname{Cov}(w_2, w_2'A_{22}w_2 \mid y) + 2V_{w_2|y}A_{21}y_1\right]'\right).$

Since the ith element of $\operatorname{Cov}(w_2, w_2'A_{22}w_2 \mid y)$ is equal to

$\operatorname{Cov}(w_{2i}, w_2'A_{22}w_2 \mid y) = E(w_{2i}w_2'A_{22}w_2 \mid y) - E(w_{2i} \mid y)E(w_2'A_{22}w_2 \mid y),$

and for each i = 1, 2, . . . , m,

$E(w_{2i}w_2'A_{22}w_2 \mid y) = \sum_{j=1}^m \sum_{k=1}^m a_{jk}^{22}E(w_{2i}w_{2j}w_{2k} \mid y) = a_{ii}^{22}E(w_{2i}^3 \mid y) + 2\sum_{j \ne i} a_{ij}^{22}E(w_{2i}^2w_{2j} \mid y) + \sum_{j \ne i} a_{jj}^{22}E(w_{2i}w_{2j}^2 \mid y) + 2\sum_{\substack{j < k \\ j,k \ne i}} a_{jk}^{22}E(w_{2i}w_{2j}w_{2k} \mid y),$

where $a_{ij}^{22}$ denotes the (i, j)th element of $A_{22}$. Thus Cov(w, w'Aw|y) is computable with conditional moments of a multivariate truncated normal random vector up to the third order. Finally, the conditional covariance of two quadratic forms of the latent variables is shown to be

$\operatorname{Cov}(w'Aw, w'Bw \mid y) = 4y_1'A_{12}V_{w_2|y}B_{12}'y_1 + 2y_1'A_{12}\operatorname{Cov}(w_2, w_2'B_{22}w_2 \mid y) + 2\operatorname{Cov}(w_2'A_{22}w_2, w_2 \mid y)B_{12}'y_1 + \operatorname{Cov}(w_2'A_{22}w_2, w_2'B_{22}w_2 \mid y),$

and the last term of the above equation is equal to $E(w_2'A_{22}w_2w_2'B_{22}w_2 \mid y) - E(w_2'A_{22}w_2 \mid y)E(w_2'B_{22}w_2 \mid y)$. Since

$E(w_2'A_{22}w_2w_2'B_{22}w_2 \mid y) = \operatorname{tr}\left[B_{22}E(w_2w_2'A_{22}w_2w_2' \mid y)\right],$

the conditional covariance can be calculated if $E(w_2w_2'A_{22}w_2w_2' \mid y)$ is given. Let $e_{ij}$ be the (i, j)th element of $E(w_2w_2'A_{22}w_2w_2' \mid y)$; then we have

$e_{ii} = a_{ii}^{22}E(w_{2i}^4 \mid y) + 2\sum_{k \ne i} a_{ik}^{22}E(w_{2i}^3w_{2k} \mid y) + \sum_{k \ne i} a_{kk}^{22}E(w_{2i}^2w_{2k}^2 \mid y) + 2\sum_{\substack{k < \ell \\ k,\ell \ne i}} a_{k\ell}^{22}E(w_{2i}^2w_{2k}w_{2\ell} \mid y),$

for i = 1, . . . , m, and

$e_{ij} = a_{ii}^{22}E(w_{2i}^3w_{2j} \mid y) + a_{jj}^{22}E(w_{2i}w_{2j}^3 \mid y) + 2a_{ij}^{22}E(w_{2i}^2w_{2j}^2 \mid y) + 2\sum_{k \ne i,j} a_{ik}^{22}E(w_{2i}^2w_{2j}w_{2k} \mid y) + 2\sum_{k \ne i,j} a_{jk}^{22}E(w_{2i}w_{2j}^2w_{2k} \mid y) + \sum_{k \ne i,j} a_{kk}^{22}E(w_{2i}w_{2j}w_{2k}^2 \mid y) + 2\sum_{\substack{k < \ell \\ k,\ell \ne i,j}} a_{k\ell}^{22}E(w_{2i}w_{2j}w_{2k}w_{2\ell} \mid y),$

for i, j = 1, . . . , m; i ≠ j.

Thus, the nonlinear simultaneous equations S(θ) = 0 and the Jacobian

$J(\theta) = \begin{pmatrix} \frac{\partial S_\beta}{\partial \beta'} & \frac{\partial S_\beta}{\partial \sigma_\mu^2} & \frac{\partial S_\beta}{\partial \sigma_\epsilon^2} \\ \frac{\partial S_{\sigma_\mu^2}}{\partial \beta'} & \frac{\partial S_{\sigma_\mu^2}}{\partial \sigma_\mu^2} & \frac{\partial S_{\sigma_\mu^2}}{\partial \sigma_\epsilon^2} \\ \frac{\partial S_{\sigma_\epsilon^2}}{\partial \beta'} & \frac{\partial S_{\sigma_\epsilon^2}}{\partial \sigma_\mu^2} & \frac{\partial S_{\sigma_\epsilon^2}}{\partial \sigma_\epsilon^2} \end{pmatrix}$

are computable with the conditional moments $E(w_{2i_1}^{j_1}w_{2i_2}^{j_2}w_{2i_3}^{j_3}w_{2i_4}^{j_4} \mid y)$, $\sum_{k=1}^{4} j_k \le 4$, which are assumed to be given by the algorithm of Kan and Robotti (2017).
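As a univariate analogue of these moment computations (not Kan and Robotti's multivariate algorithm, which handles general rectangular regions), SciPy's `truncnorm` can produce moments of a truncated normal directly. For a standard normal truncated to (0, inf), the mean has the closed form $\phi(0)/(1 - \Phi(0)) = \sqrt{2/\pi}$:

```python
import numpy as np
from scipy.stats import truncnorm

# truncnorm uses standardized bounds; (0, inf) gives the half-normal distribution
m1 = truncnorm.moment(1, 0.0, np.inf)   # E(Z | Z > 0) = sqrt(2/pi) ~ 0.7979
m2 = truncnorm.moment(2, 0.0, np.inf)   # E(Z^2 | Z > 0) = 1 for the half-normal
```

The multivariate problem is harder precisely because the moments no longer factor and every cross-moment up to order four is needed.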

The REML estimate of θ can be found by applying the Newton-Raphson method,

$\hat\theta^{(i+1)} = \hat\theta^{(i)} - J^{-1}\left(\hat\theta^{(i)}\right)S\left(\hat\theta^{(i)}\right)$

with an arbitrary initial value $\hat\theta^{(0)}$. If convergence is reached at step t, we set $\hat\theta^{(t+1)}$ as the REML estimate. For the initial value, one may use the estimate of θ based only on the uncensored observations; we have used the R package plm of Croissant and Millo (2008) for this purpose. With this initial value, we obtained convergence in almost all cases among the 1,000 replications shown in Section 4.
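The iteration itself is the standard multivariate Newton-Raphson scheme. A generic sketch (Python for illustration; `S` and `J` stand in for the estimating equations and Jacobian above, here replaced by a toy system):

```python
import numpy as np

def newton_raphson(S, J, theta0, tol=1e-8, max_iter=100):
    """Solve S(theta) = 0 by iterating theta <- theta - J(theta)^{-1} S(theta)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(J(theta), S(theta))   # J^{-1} S without forming the inverse
        theta = theta - step
        if np.linalg.norm(step) < tol:
            break
    return theta

# toy system with root (2, 1): S(t) = (t0^2 - 4, t0*t1 - 2)
S = lambda t: np.array([t[0]**2 - 4.0, t[0] * t[1] - 2.0])
J = lambda t: np.array([[2.0 * t[0], 0.0],
                        [t[1], t[0]]])
root = newton_raphson(S, J, [1.0, 1.0])   # -> approx [2.0, 1.0]
```

In the REML problem the analytic Jacobian of Section 2.1 plays the role of `J`, which the authors found more stable than numerical differentiation.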

### 2.2. Standard error

When data are incomplete, the standard error of REML is more complex than that of ML. For ML, Louis (1982) provided the incomplete data Hessian in terms of conditional expectations of the complete data Hessian, which makes it possible to work with incomplete data; since the Hessian is directly related to the variance of the ML estimate, there is no theoretical difficulty. Many alternatives to Louis's method have also been proposed (Meilijson, 1989; Meng and Rubin, 1991; Duan and Fulop, 2011). The problem in REML estimation is that, unlike the ML case, the Hessian does not give a variance. Moreover, it is hard to find literature mentioning the standard error, even though many researchers discuss REML estimation with incomplete data. Regardless, a slight variant of maximum likelihood theory can give an asymptotic variance of the REML estimate.

### Theorem 2

Let $\hat\theta$ be a solution to S(θ) = 0. Then an asymptotic variance of $\hat\theta$ is given by

$\operatorname{Var}(\hat\theta) \approx J^{-1}(\hat\theta)\left[\operatorname{Var}(S(\theta))\right]\left[J^{-1}(\hat\theta)\right]',$

where J(θ) is the Jacobian of S(θ).

Proof

The Taylor series expansion gives $S(\theta) \approx S(\hat\theta) + J(\hat\theta)(\theta - \hat\theta)$; since $S(\hat\theta) = 0$, we have $\hat\theta - \theta \approx -J^{-1}(\hat\theta)S(\theta)$, and the result follows.

In ML estimation, the Hessian is equal to the Jacobian of the score, and the variance of the score is obtained as the expected value of minus the Hessian, $-E\left[(\partial^2 \log L)/(\partial\theta\partial\theta')\right]$, called the Fisher information or expected information. Thus, if we regard S(θ) as the score of ML, Var(S(θ)) plays the role of the expected information. Efron and Hinkley (1978) stated that in most cases the observed information, which is equal to minus the Hessian, is a more appropriate measure of information than the expected information, and an asymptotic variance of an ML estimate is usually computed using the observed rather than the expected information. Thus, it may be desirable to replace Var(S(θ)) by some observed-information-like quantity.
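Theorem 2 is a sandwich formula, and its relation to the usual ML variance is easy to see numerically. A sketch (Python for illustration; `reml_sandwich_var` is our name):

```python
import numpy as np

def reml_sandwich_var(J, var_S):
    """Theorem 2: Var(theta_hat) ~= J^{-1} Var(S(theta)) (J^{-1})'."""
    Ji = np.linalg.inv(J)
    return Ji @ var_S @ Ji.T

# When S is a genuine ML score, Var(S) = -J (the information matrix),
# and the sandwich collapses to the usual inverse information -J^{-1}:
J = -2.0 * np.eye(2)
print(reml_sandwich_var(J, -J))   # 0.5 * identity, equal to -J^{-1}
```

For REML, Var(S) is not minus the Jacobian, which is exactly why the two bread slices and the filling must be computed separately.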

To use Theorem 2, Var(S(θ)) should be given. Since $\partial S_\beta/\partial\beta' = (\partial^2 \log L)/(\partial\beta\partial\beta')$, maximum likelihood theory gives

$\operatorname{Var}(S_\beta) = -E\left[\frac{\partial^2 \log L}{\partial\beta\partial\beta'}\right] = X'V^{-1}X - X'V^{-1}\operatorname{Var}(w \mid y)V^{-1}X,$

but $\partial S_{\sigma_i^2}/\partial\sigma_i^2$ does not give the variance of $S_{\sigma_i^2}$ for i = μ, ϵ. To compute $\operatorname{Var}(S_{\sigma_i^2})$, we can use the relationship $\operatorname{Var}(E(X \mid Y)) = \operatorname{Var}(X) - E(\operatorname{Var}(X \mid Y))$. That is, the variance of the conditional expectation of a quadratic form can be obtained by

$\operatorname{Var}\left[E(w'Aw \mid y)\right] = \operatorname{Var}(w'Aw) - E\left[\operatorname{Var}(w'Aw \mid y)\right] = 2\operatorname{tr}(AVAV) + 4\beta'X'AVAX\beta - E\left[\operatorname{Var}(2y_1'A_{12}w_2 + w_2'A_{22}w_2 \mid y)\right]. \tag{2.7}$
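The law of total variance used here can be checked exactly on a small discrete joint distribution (Python for illustration; the toy pmf is ours):

```python
import numpy as np

# joint pmf over (X, Y) on a 2x2 grid; verify Var(E(X|Y)) = Var(X) - E(Var(X|Y))
p = np.array([[0.1, 0.2],    # rows: X in {0, 1}
              [0.3, 0.4]])   # cols: Y in {0, 1}
x = np.array([0.0, 1.0])
py = p.sum(axis=0)                               # marginal of Y
ex_y = (x[:, None] * p).sum(axis=0) / py         # E(X | Y)
ex2_y = (x[:, None]**2 * p).sum(axis=0) / py     # E(X^2 | Y)
var_x_y = ex2_y - ex_y**2                        # Var(X | Y)
ex = (x[:, None] * p).sum()
var_x = (x[:, None]**2 * p).sum() - ex**2
lhs = (py * (ex_y - ex)**2).sum()                # Var(E(X|Y))
rhs = var_x - (py * var_x_y).sum()               # Var(X) - E(Var(X|Y))
print(np.isclose(lhs, rhs))   # True
```

Equation (2.7) is exactly this identity with X = w'Aw and Y = y, combined with the standard formulas for the variance of a normal quadratic form.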

As stated before, the variance given in (2.7) plays the role of the expected information, but in view of Efron and Hinkley (1978), it is better to use the non-expected value of (2.7). Thus, we replace the expectation part in (2.7) by

$\operatorname{Var}(2y_1'A_{12}w_2 + w_2'A_{22}w_2 \mid y) = 4y_1'A_{12}V_{w_2|y}A_{12}'y_1 + 4y_1'A_{12}\operatorname{Cov}(w_2, w_2'A_{22}w_2 \mid y) + \operatorname{Var}(w_2'A_{22}w_2 \mid y).$

Since $\operatorname{Var}(w_2'A_{22}w_2 \mid y) = \operatorname{Cov}(w_2'A_{22}w_2, w_2'A_{22}w_2 \mid y)$, we can use the previous results to compute the conditional variance. Similarly, for the covariance of $S_\beta$ and $S_{\sigma_i^2}$,

$\operatorname{Cov}\left[X'V^{-1}(E(w \mid y) - X\beta),\ E(w'Aw \mid y)\right] = X'V^{-1}\operatorname{Cov}(w, w'Aw) - X'V^{-1}E\left[\operatorname{Cov}(w, w'Aw \mid y)\right] = 2X'AX\beta - X'V^{-1}E\left[\operatorname{Cov}(w, w'Aw \mid y)\right] \tag{2.8}$

is applicable. Finally, the covariance of $S_{\sigma_\mu^2}$ and $S_{\sigma_\epsilon^2}$ is equal to

$\operatorname{Cov}\left(S_{\sigma_\mu^2}, S_{\sigma_\epsilon^2}\right) = \operatorname{Cov}(w'PZZ'Pw, w'PPw) - E\left[\operatorname{Cov}(w'PZZ'Pw, w'PPw \mid y)\right] = 2\operatorname{tr}(PZZ'PVPPV) + 4\beta'X'PZZ'PVPPX\beta - E\left[\operatorname{Cov}(w'PZZ'Pw, w'PPw \mid y)\right]. \tag{2.9}$

For calculating the asymptotic standard errors, the expected values in (2.8) and (2.9) are replaced by their non-expected counterparts, $\operatorname{Cov}(w, w'Aw \mid y)$ and $\operatorname{Cov}(w'PZZ'Pw, w'PPw \mid y)$, respectively.

3. Examples

In this section, we present two examples to illustrate some of the main features of the REML and ML estimates. In addition, we discuss some computational issues.

### Example 1

The first data set, "EmpUK", presented by Arellano and Bond (1991), is an unbalanced panel of 140 firms observed from 1976 to 1984; it consists of 1031 observations on 7 variables. The dependent variable "emp" has a heavily right-skewed distribution, and some observations are quite large compared with most. In this example, we treat all values of "emp" larger than 30 as identical; that is, we assume the dependent variable is right-censored at 30, so 57 of the 1031 observations are treated as censored. Because of this assumption, the estimation itself may not be meaningful, but the data are well suited to showing the large sample behavior of the REML and ML estimates.

Table 1 shows the REML and ML estimates. All computations were done in R (R Core Team, 2017) with RcppArmadillo (Eddelbuettel and Sanderson, 2014) and Rcpp (Eddelbuettel et al., 2018). The two estimation methods make little difference to the estimates of the variance components or the slope parameters. The ML estimates of the random components are only slightly smaller and tend to have slightly smaller standard errors than REML, but it is hard to attach any statistical meaning to this. Because the downward bias is a small sample property, this result is predictable; in a sample this large, ML is the theoretically preferable method, and REML offers little advantage.

Both censReg (Henningsen, 2017) and xttobit of Stata (2017) are capable of ML estimation for a censored random-effects panel regression model. We first used censReg to obtain the ML estimates for Table 1, but found that it is quite unstable and failed to give estimates. It mainly relies on numerical methods, Gauss-Hermite quadrature (GHQ) for approximating the log-likelihood and numerical differentiation, but GHQ is generally not recommended for high dimensional integrals. Since the dimensionality increases with the number of censored observations, it cannot give a proper approximation when the number of censored observations is large. Generally, increasing the number of quadrature points, nGHQ, increases the accuracy of the computation, but increasing nGHQ was not helpful for the ML estimation of the "EmpUK" data. Even when nGHQ was sufficiently large, small changes in nGHQ produced quite different estimates.

We did not examine xttobit, but it uses the same algorithm and thus carries an inherent possibility of suffering the same problem. It seems that censReg is reliable only when the number of censored observations is small, and even then one must be careful about the value of nGHQ. In censReg, the default value of nGHQ is 8, which may not be large enough: Lesaffre and Spiessens (2001) gave a simple example of a logistic random-intercepts model in the context of a longitudinal clinical trial where GHQ gives valid results only for a high number of quadrature points. The author of censReg appears to recognize this point, as he gave an example showing the effect of nGHQ.
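To see what GHQ does in the simplest case, a one-dimensional sketch (Python for illustration; `ghq_expectation` is our name). For $Z \sim N(0,1)$, $E[g(Z)] \approx \frac{1}{\sqrt{\pi}}\sum_i w_i\, g(\sqrt{2}\,x_i)$ with Gauss-Hermite nodes $x_i$ and weights $w_i$:

```python
import numpy as np

def ghq_expectation(g, n_ghq):
    """Approximate E[g(Z)], Z ~ N(0,1), by n_ghq-point Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite.hermgauss(n_ghq)
    return (w * g(np.sqrt(2.0) * x)).sum() / np.sqrt(np.pi)

print(ghq_expectation(lambda z: z**2, 8))   # ~1.0 (exact for polynomials of low degree)
print(ghq_expectation(np.cos, 8))           # ~exp(-1/2) ~ 0.6065
```

In one dimension 8 nodes are extremely accurate; the trouble is that an m-dimensional censored integral needs a product grid of $n_{GHQ}^m$ nodes, which is why the approximation degrades as the number of censored observations grows.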

### Example 2

For the second example, we borrow the artificial panel data set, comprising 60 observations on 15 panels, shown in Henningsen (2017). The dependent variable y is modeled by two independent variables, x1 and x2. Among the 60 values of y, 20 observations are left-censored at 0. REML generally differs from ML when the sample size is small; therefore, the data are adequate for demonstrating the small sample properties of the two estimates. Table 2 shows the estimates and their standard errors.

It can be observed that the estimates are nearly the same for all slope parameters. For instance, the relative distances between the REML and ML estimates, $(\hat\theta_{ML} - \hat\theta_{REML})/\operatorname{std}(\hat\theta_{ML})$, are nearly equal to zero for all slope parameters. But the two methods differ in the estimates of the random components: ML gives slightly smaller estimates of the variance components, and the standard errors of ML are smaller than those of REML for all parameters. Since ML is known to underestimate variance components, which leads to underestimated standard errors, we believe REML remedies this underestimation in this small data set.

4. Simulation study and conclusion

REML was distinguished from ML in the estimation of the variance components and standard errors. When the sample size is small, inference based on ML estimates may not be appropriate because of the inflated Type I error due to the underestimated standard error. REML seems more adequate for inference in a censored regression model since it reports a larger standard error than ML. To see this, we performed a simulation under the study design used in Example 2. That is, we generated data sets according to

$y_{it} = \max\{\beta_0 + \beta_1 x_{1it} + \beta_2 x_{2it} + \mu_i + \epsilon_{it},\ 0\}, \quad \mu_i \overset{iid}{\sim} N(0, \sigma_\mu^2) \text{ and } \epsilon_{it} \overset{iid}{\sim} N(0, \sigma_\epsilon^2),$

where $y_{it}$ is the observed dependent variable. The regressors x1 and x2 are drawn from a standard normal distribution and a uniform distribution on (0, 1), respectively. The parameters are given by $(\beta_0, \beta_1, \beta_2, \sigma_\mu^2, \sigma_\epsilon^2) = (-1, 2, 3, 1, 1)$.
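The data generation step above can be sketched as follows (Python for illustration; the authors' simulation is in R, and the helper name `simulate_panel` is ours):

```python
import numpy as np

def simulate_panel(n, T, beta, sig2_mu, sig2_eps, seed=0):
    """Generate a left-censored-at-zero random effects panel as in the simulation design."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal((n, T))                        # x1 ~ N(0, 1)
    x2 = rng.uniform(0.0, 1.0, size=(n, T))                 # x2 ~ U(0, 1)
    mu = rng.normal(0.0, np.sqrt(sig2_mu), size=(n, 1))     # individual effects, broadcast over t
    eps = rng.normal(0.0, np.sqrt(sig2_eps), size=(n, T))   # idiosyncratic errors
    w = beta[0] + beta[1] * x1 + beta[2] * x2 + mu + eps    # latent response
    y = np.maximum(w, 0.0)                                  # left-censor at zero
    return y, x1, x2

y, x1, x2 = simulate_panel(n=10, T=4, beta=(-1.0, 2.0, 3.0),
                           sig2_mu=1.0, sig2_eps=1.0)
print(y.shape)   # (10, 4)
```

Each replication of the simulation draws a fresh data set like this and refits both estimators.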

The data generation process was repeated 1,000 times. Some estimates of the variance components may become negative; in particular, the ML method often estimates $\sigma_\mu^2$ negatively. Negative estimates were constrained to zero. We considered only T = 4 and n = 10, 15, since the difference between the two methods is statistically meaningful when the sample size is small and the algorithm is computationally expensive. Based on the 1,000 replications for each case, the average of the estimates and the empirical mean squared error were computed (Table 3). Also, a 95% Wald confidence interval for each slope parameter was constructed for each replication, and the empirical coverage probability of the interval was calculated to measure the adequacy of the calculated standard error. If the standard error is asymptotically correct, the coverage probability should be close to 0.95.

Note that the coverage probabilities of ML are smaller than the nominal level for all sample sizes. In particular, when n = 10, the coverage probability is far below the nominal level. The coverage probabilities of REML are also below the nominal level, but closer to 0.95 than those of ML. Thus, it may be concluded that REML provides a proper standard error of the estimate.

However, ML tends to have a smaller empirical mean squared error. Thus, ML is appealing when we wish to estimate the parameters themselves; however, inference, such as interval estimation or hypothesis testing on the slope parameters, based on ML may not be adequate when the sample size is small. Perhaps this is well known in the general linear model.

We have considered the methodology under a censored random effects panel regression model, but it is applicable to a general mixed-effects linear model with limited dependent variables.

TABLES

### Table 1

Estimates of EmpUK data

| Parameter | REML Estimate | REML Std. err | REML t-value | ML Estimate | ML Std. err | ML t-value |
|---|---|---|---|---|---|---|
| (Intercept) | 2.3431 | 0.7921 | 2.958 | 2.3423 | 0.7901 | 2.965 |
| wage | −0.0814 | 0.0164 | −4.970 | −0.0814 | 0.0164 | −4.977 |
| capital | 0.1245 | 0.0424 | 2.939 | 0.1248 | 0.0422 | 2.956 |
| output | 0.0424 | 0.0039 | 10.971 | 0.0424 | 0.0039 | 10.986 |
| $\sigma_\mu^2$ | 35.1454 | 4.5396 | | 34.8675 | 4.4131 | |
| $\sigma_\epsilon^2$ | 1.1414 | 0.0562 | | 1.1382 | 0.0558 | |

REML = restricted maximum likelihood; ML = maximum likelihood.

### Table 2

Estimates of an artificial panel data

| Parameter | REML Estimate | REML Std. err | REML t-value | ML Estimate | ML Std. err | ML t-value |
|---|---|---|---|---|---|---|
| (Intercept) | −0.3921 | 0.4782 | −0.820 | −0.3655 | 0.4612 | −0.792 |
| x1 | 1.7020 | 0.2186 | 7.787 | 1.6838 | 0.2124 | 7.927 |
| x2 | 2.2875 | 0.6919 | 3.306 | 2.2636 | 0.6739 | 3.359 |
| $\sigma_\mu^2$ | 0.9005 | 0.5109 | | 0.7961 | 0.4474 | |
| $\sigma_\epsilon^2$ | 1.0175 | 0.2720 | | 0.9734 | 0.2534 | |

REML = restricted maximum likelihood; ML = maximum likelihood.

### Table 3

Summary of simulation

| n | θ | REML Aver. Est | REML Aver. std | REML MSE | REML Coverage | ML Aver. Est | ML Aver. std | ML MSE | ML Coverage |
|---|---|---|---|---|---|---|---|---|---|
| 10 | β0 | −1.0630 | 0.5933 | 0.3953 | 0.937 | −1.0156 | 0.5604 | 0.3748 | 0.922 |
| | β1 | 2.0481 | 0.2918 | 0.0930 | 0.941 | 2.0225 | 0.2785 | 0.0891 | 0.936 |
| | β2 | 3.0525 | 0.7827 | 0.6436 | 0.937 | 3.0156 | 0.6763 | 0.6852 | 0.925 |
| | $\sigma_\mu^2$ | 1.0732 | 0.7353 | 0.5264 | | 0.9133 | 0.6058 | 0.4005 | |
| | $\sigma_\epsilon^2$ | 1.0027 | 0.3657 | 0.1363 | | 0.9258 | 0.3218 | 0.1194 | |
| 15 | β0 | −1.0579 | 0.5354 | 0.3125 | 0.944 | −1.0195 | 0.5131 | 0.3077 | 0.924 |
| | β1 | 2.0184 | 0.2216 | 0.0758 | 0.949 | 2.0122 | 0.2550 | 0.0538 | 0.940 |
| | β2 | 3.0500 | 0.7061 | 0.5191 | 0.940 | 3.0095 | 0.7608 | 0.5048 | 0.930 |
| | $\sigma_\mu^2$ | 1.0483 | 0.5884 | 0.3474 | | 0.9506 | 0.5532 | 0.3316 | |
| | $\sigma_\epsilon^2$ | 0.9901 | 0.2982 | 0.0947 | | 0.9408 | 0.2919 | 0.1008 | |

$(\beta_0, \beta_1, \beta_2, \sigma_\mu^2, \sigma_\epsilon^2) = (-1, 2, 3, 1, 1)$. REML = restricted maximum likelihood; ML = maximum likelihood; MSE = mean squared error.

References
1. Amemiya T (1984). Tobit models: a survey. Journal of Econometrics, 24, 3-61.
2. Arellano M and Bond S (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. The Review of Economic Studies, 58, 277-297.
3. Arismendi JC (2013). Multivariate truncated moments. Journal of Multivariate Analysis, 117, 41-75.
4. Croissant Y and Millo G (2008). Panel data econometrics in R: The plm package. Journal of Statistical Software, 27, 1-43.
5. Drum ML and McCullagh P (1993). REML estimation with exact covariance in the logistic mixed model. Biometrics, 49, 677-689.
6. Duan JC and Fulop A (2011). A stable estimator of the information matrix under EM for dependent data. Statistics and Computing, 21, 83-91.
7. Eddelbuettel D, Francois R, Allaire J, Ushey K, Kou Q, Russell N, Bates D, and Chambers J (2018). Rcpp: Seamless R and C++ Integration. R package.
8. Eddelbuettel D and Sanderson C (2014). RcppArmadillo: accelerating R with high-performance C++ linear algebra. Computational Statistics and Data Analysis, 71, 1054-1063.
9. Efron B and Hinkley DV (1978). The observed versus expected information. Biometrika, 65, 457-487.
10. Greene W (2004). Fixed effects and bias due to the incidental parameters problem in the Tobit model. Econometric Reviews, 23, 125-147.
11. Henningsen A (2017). censReg: Censored Regression (Tobit) Models. R package version 0.5.
12. Hughes JP (1999). Mixed effects models with censored data with application to HIV RNA levels. Biometrics, 55, 625-629.
13. Kan R and Robotti C (2017). On moments of folded and truncated multivariate normal distributions. Unpublished manuscript.
14. Kleiber C and Zeileis A (2009). AER: Applied Econometrics with R. R package version 1.1.
15. Lancaster T (2000). The incidental parameter problem since 1948. Journal of Econometrics, 95, 391-413.
16. Lee L (2017). NADA: Nondetects and Data Analysis for Environmental Data. R package.
17. Lee SC (2016). A Bayesian inference for fixed effects panel probit model. Communications for Statistical Applications and Methods, 23, 179-187.
18. Lesaffre E and Spiessens B (2001). On the effect of the number of quadrature points in a logistic random effects model: an example. Journal of the Royal Statistical Society, Series C (Applied Statistics), 50, 325-335.
19. Lee Y and Nelder JA (2001). Hierarchical generalized linear models: a synthesis of generalized linear models, random-effect models and structured dispersion. Biometrika, 88, 987-1006.
20. Louis TA (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B, 44, 226-233.
21. Maddala GS (1983). Limited-Dependent and Qualitative Variables in Econometrics, New York, Cambridge University Press.
22. Meilijson I (1989). A fast improvement to the EM algorithm on its own terms. Journal of the Royal Statistical Society, Series B, 51, 127-138.
23. Meng XL and Rubin DB (1991). Using EM to obtain asymptotic variance-covariance matrices: the SEM algorithm. Journal of the American Statistical Association, 86, 899-909.
24. McCulloch CE (1994). Maximum likelihood variance components estimation for binary data. Journal of the American Statistical Association, 89, 330-335.
25. McCulloch CE (1996). Fixed and random effects and best prediction. Proceedings of the Kansas State Conference on Applied Statistics in Agriculture.
26. Noh M and Lee Y (2007). REML estimation for binary data in GLMMs. Journal of Multivariate Analysis, 98, 896-915.
27. Patterson HD and Thompson R (1971). Recovery of inter-block information when block sizes are unequal. Biometrika, 58, 545-554.
28. R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
29. SAS (2011). SAS/ETS 9.3 User's Guide.
30. Searle SR, Casella G, and McCulloch CE (2006). Variance Components, New York, John Wiley & Sons.
31. Stata (2017). Finite Mixture Models Reference Manual, Stata press.
32. Tobin J (1958). Estimation of relationships for limited dependent variables. Econometrica, 26, 24-36.
33. Zhang H, Lu N, Feng C, Thurston SW, Xia Y, Zhu L, and Tu XM (2011). On fitting generalized linear mixed-effects models for binary responses using different statistical packages. Statistics in Medicine, 30, 2562-2572.