search for

CrossRef (0)
A two-step approach for variable selection in linear regression with measurement error
Communications for Statistical Applications and Methods 2019;26:47-55
Published online January 31, 2019
© 2019 Korean Statistical Society.

Jiyeon Songa, and Seung Jun Shin1,b

aDepartment of Statistics, University of Connecticut, USA, bDepartment of Statistics, Korea University, Korea
Correspondence to: 1Corresponding author: Department of Statistics, Korea University, 145 Anam-Ro, Sungbuk-Gu, Seoul 02841, Korea. E-mail: sjshin@korea.ac.kr
Received October 16, 2018; Revised December 27, 2018; Accepted January 7, 2019.

It is important to identify informative variables in high dimensional data analysis; however, it becomes a challenging task when covariates are contaminated by measurement error due to the bias induced by measurement error. In this article, we present a two-step approach for variable selection in the presence of measurement error. In the first step, we directly select important variables from the contaminated covariates as if there is no measurement error. We then apply, in the following step, orthogonal regression to obtain the unbiased estimates of regression coefficients identified in the previous step. In addition, we propose a modification of the two-step approach to further enhance the variable selection performance. Various simulation studies demonstrate the promising performance of the proposed method.

Keywords : measurement error, penalized orthogonal regression, SIMEX
1. Introduction

With the growth of high dimensional data, variable selection becomes a primal task in statistical learning. Since the prediction accuracy of final models relies heavily on selected variables. Regularization has become one of the canonical approaches for variable selection due to its fast and promising performance under high-dimensional setups since the proposal of the least absolute shrinkage and selection operator (LASSO) (Tibshirani, 1996). In addition to the L1 penalty for LASSO, nonconvex penalties such as the smoothly clipped absolute deviation (SCAD) (Fan and Li, 2001) and the minimax concave plus penalty (MCP) (Zhang, 2010) have been widely employed as alternatives.

Measurement error in variables is commonly observed in practice. Let the dataset be (yi, xi) ∈ ℝ × ℝp for i = 1, …, n generated from yi = xiβ+ εi where β = (β1, …, βp) and εi is the random error with mean 0 and variance σ2, and independent of xi, i = 1, …, n. Without measurement error, the least squares estimate would be an obvious choice. However, suppose that instead of xi, we observe the contaminated predictor wi by the measurement error ui. That is,

wi=xi+ui,         i=1,,n,

where ui denotes the measurement error with mean zero and covariance u, and independent of both xi and εi. Assuming E(x) = 0, Cov(x)=σx2Ip and Cov(u)=σu2Ip where σx and σu are scalars, we compute the population regression coefficient of y on w denoted by β* in order to illustrate the effect of the measurement error:


where β = {Cov(x)}1Cov(x, y) denotes the true target of interest. This shows that β* does not consider a measurement error biased toward zero and this bias is called attenuation (Fuller, 1987). Therefore the implication of ignoring a measurement error could be considerable in inferential procedure.

Numerous methods have been developed for measurement error models (Carroll et al., 2006, and references); however, is limited research on variable selection considering the measurement error effect. Liang and Li (2009) proposed SCAD-penalized orthogonal regression for a partially linear model and Ma and Li (2010) elegantly developed the penalized version of an estimating equation that can be applied to more broad families of semi-parametric models. Both ideas attempt to achieve variable selection and estimation simultaneously.

In this paper, we propose a two-step procedure for variable selection in linear regression with measurement errors. The proposed process conducts selection and estimation separately. We firstly identify important variables without considering measurement error by applying conventional regularized methods directly to (yi,wi). We then obtain unbiased coefficient estimates only for the selected variables via orthogonal regression. Our idea shares a conceptual similarity with the least angle regression - ordinary least squares (LARS-OLS) hybrid (Efron et al., 2004) and the relaxed LASSO (Meinshausen, 2007) that separate variable selection and estimation steps in linear regression without measurement error. In addition, we propose a simple modification in the first step to enhance the variable selection performance by reducing false positives via random partitioning.

2. Orthogonal regression

Orthogonal regression has been regarded as one of the popular choices for bias correction. The orthogonal regression assumes that X = (x1, …, xn) is an unknown fixed constant to be estimated. Consequently, the orthogonal regression minimizes the following objective function with respect to the unknown parameter β and the unobservable covariate X.


where y = (y1, …, yn), γ = σ2 which is unknown. Here we used a new notation γ to denote σ2 to emphasize that it will be treated as a tuning parameter instead of a model parameter. In this regards, we develop a data-adaptive method to choose a proper value of γ, which will be discussed in Section 4. The measurement error variance u is often assumed to be known in practice since it can be estimated by repeatedly measuring the predictors (or is sometimes available from external sources). When γ = 1, the objective function becomes the orthogonal distance between (yi,wi) and (xiβ, xi), which explains its name. Here we let β̂ be the orthogonal regression estimator where (β̂, ) = argminβ,XL(β,X).

To illustrate this, we consider a set of data (yi,wi), i = 1, …, n from the following linear regression model with the measurement error.


where β = (0, 1), xiiidN2(0,I),ɛiiidN(0,1), and uiiidN2(0,I) with n = 100. We use the three estimation methods: least squares regression on X (denoted by Oracle) and least squares regression on W (OLS), and the orthogonal regression on W (OR). Figure 1 shows the boxplots of the coefficient estimates obtained by the three methods over 100 independent repetitions. Notice that β1 is uninformative and there is no attenuation observed even for OLS (subpanel (a)). However, subpanel (b) for the informative variable β2 depicts that attenuation is clearly observed in OLS. The orthogonal regression corrects the attenuation and its distribution is similar to the oracle estimates without the measurement error.

The least squares estimate for (yi,wi) is biased; however, it still contains a significant amount of information for identifying informative variables since the goal of the variable selection is not to estimate exact value of βj, j = 1, …, p but to check whether βj = 0 or not. This motivates us to develop a two-step approach that separates selection step and estimation step.

3. Two-step procedure

We develop a two-step procedure for variable selection in linear regression with measurement error. The key idea of the two-step procedure is to separate selection and estimation. In the first step we consider the following regularized linear regression for (yi,wi)


where W = (w1, …,wn), pλ denotes the sparsity-inducing penalty function such as L1 norm penalty for LASSO and non-convex penalties including SCAD and MCP and λ > 0 is a tuning parameter. The role of λ is to control the degree of sparsity of the solution β̂(1) that can be data-adaptively chosen via cross-validation.

Notice that β̂(1) is obviously biased. However, the goal of the first step is not to estimate β but to estimate . Therefore, we have

S^={jβ^j(1)0,   j=1,,p}

in the first step.

Given , we can estimate by applying the orthogonal regression of yi on the selected predictors only. Finally we have the final estimate denoted by that is defined by


where , , , and is the covariance matrix of .

The proposed two-step method shows promising performance; however, we empirically observe that uninformative variables are often selected in the first step due to the additional variability of wi compared to xi. We suggest a simple modification for the selection step as follows. In order to reduce such false positives in the first step. We randomly split data into two subsets with the same size and apply regularized methods on the two subsets. Then we set where and denote the selected sets from the two subsets respectively. Simulation studies show that the simple modification helps reduce false positives in the selection step.

However, we also remark that this random partitioning approach may not work well when the sample size is low and/or the signal of informative variables are not strong because the signal is too weak to detect from half of the data. Therefore we recommend in practice to employ this modification only when the sample size is large enough.

4. Tuning γ: SIMEX estimator of σ2

In the orthogonal regression (2.1), γ which is in fact σ2 is an unknown but crucial quantity, and thus it should be estimated before applying the proposed method. Without the measurement error, we can estimate σ2 by


where β̂* denotes the least square estimate of y on X. In the presence of measurement error, we can consider σ̃2(γ), a naive estimate of error variance from the orthogonal regression as:


where β̂(γ) denotes the orthogonal regression estimate and it is a function of γ. However, σ̃2(γ) in (4.1) clearly over-estimates σ2 since Ware contaminated. Let


be the difference between the two estimates. We propose to exploit the simulation-extrapolation (SIMEX) (Cook and Stefanski, 1994) to estimate Δ(γ) in (4.2). For the SIMEX method, we consider independent copies of ui, ui*iidN(0,Σu), i = 1, …, n and define W*=(w1*,,wn*) with wi*=xi+ui*. SIMEX is motivated because the relation between X and W is similar to that between Wand W*.

In particular, we denote Wb*=(w1,b*,,wn,b*) with be wi,b*=xi+ui,b where ui,b for i = 1, …, n is the independently simulated measurement error of the bth iteration for b = 1, …, B. Given γ, we obtain the solution β^b*(γ) as the orthogonal regression estimator of (y, Wb*)


since Cov(wi,b*-xi)=2Σu. We then compute

σ˜b*2(γ)=1n-p-1(y-W*β^*(γ))(y-W*β*(γ)),         b=1,,B,

and finally, the SIMEX estimator of ΔSIMEX(γ) is given by


and the estimated variance σ^SIMEX2(γ) is


Finally, we can find γ̂ such that γ^=argminγσ^SIMEX2(γ)-γ via grid search.

5. Simulation

We conduct a simulation to investigate the performance of the two-step variable selection procedure and σ2 estimation. We consider a SCAD penalty function and perform ten-fold cross validation for λ selection in the first step of the variable selection. We remark that we have two versions, the original and the modified version, of the two-step procedure method, denoted by TS1 and TS2, respectively. Competing methods considered in the simulation include the least squares regression of y on X as an oracle estimator (denoted by Oracle), the least squares regression of y on W (denoted by OLS), the orthogonal regression of y on W (denoted by OR) and the penalized least squares regression of y on W (denoted by PLS). We also consider the penalized orthogonal regression (denoted by POR) by directly adding a penalty term on (2.1), which can be viewed as a one-step procedure. For all orthogonal regressions considered here, we assume γ = σ2 is known for simplicity.

Setting (n, p) ∈ {200, 500} × {20}, we consider the following three regression models:


where xiiidNp(0,Ip) and ɛiiidN(0,1). (M1) and (M2) are regarded as the models with relatively week signals while (M3) is the complex model with strong signals. When generating the measurement error uiiidNp(0,Σu), we consider the two different structures of u as:

  • Independent (IND): Σu=σu2Ip

  • Autoregressive (AR): Σu=σu2Ap where Ap denotes a p-dimensional symmetric matrix whose (i, j)th element is ρ|ij|, i, j = 1, …, p.

For σu, we consider {0.5, 1} for IND and 0.1, 0.25} for AR respectively.

For performance evaluation, we report the three values with their standard errors:

  • TP: averaged true positives: the # of important variables selected in the first step.

  • FP: averaged false positive: the # of unimportant variables selected in the first step.

  • MSE: the median of squared error ||ββ̂||2.

The first two measures, TP and FP quantify the performance of the variable selection. MSE measures the accuracy of coefficient estimates. In our simulation study, perfect methods have 5 TP, 0 FP and the lowest MSE.

Table 1 and Table 2 report the simulation results when the measurement errors have independent and autoregressive structures, respectively. It is observed that our two-step approach outperforms the POR which is a one-step approach. Comparing the two versions of the two-step approach, the modified version substantially reduces false positives. We also note that our two-step approach using LASSO and MCP performs better than the one-step approach under the all scenarios in considered.

We next conduct additional simulations to examine the performance of our method to estimate γ = σ2, the variance of error. We investigate the case where the data is generated from (M1) having the independent form ∑u with σu × n × p = {0.5, 1} × 1000 × 20. While σu is fixed, we vary the true values of γ = σ2 between 0.5 to 1.5 to show the performance of the proposed method with B = 100.

Table 3 describes the average estimated γ and its standard error over 100 repetitions. As the true γ is increased, the proposed method performs well on average even though the noise strength σu is strong. High averages with low standard error shows the consistency of the estimator and the accuracy of our algorithm. Thus, the proposed estimation method based on the SIMEX idea shows promising performance to select a proper γ.

6. Real data illustration

For the real data illustration, we use the Boston housing data (Harrison and Rubinfeld, 1978) available in R. The response is the logarithm of the median value of owner-occupied homes in the Boston areas. The data originally contains thirteen predictors which are not contaminated. In order to check the performance of the proposed method under the presence of measurement error, we first exclude two discrete predictors and marginally standardize the eleven continuous predictors. We then generate (p – 11) noise variables from the standard normal distribution so that we have p predictors in total where p ∈ {20, 30}. Finally, the measurement errors from the normal distribution with mean 0 and variance 0.5 which we assume to be known are generated and added to all predictors. We apply the proposed two-step procedure (TS) to this artificially contaminated data and compare its performance to the penalized least regression of the response on the contaminated predictors. We repeat these steps 100 times independently; consequently, Table 4 reports the averaged root mean squared error (RMSE) of regression coefficients over the 100 independent repetitions. The last column reports the p-values for the pairwise mean difference between the RMSEs of TS and PLS. When computing the RMSE we treat the OLS estimators based on the eleven predictors before the contamination as true parameter values for informative predictors and 0 for all noise variables. It is observed that the proposed TS outperforms PLS that ignores measurement error which is concordant to what we have seen in Section 4.

7. Conclusion

In this paper, we develop a two-step variable selection method for measurement error models. The proposed method is based on the idea of separating selection and estimation; it first selects significant variables from contaminated covariates and then obtains the orthogonal regression estimates of the selected variables. Furthermore, we suggested a SIMEX-based method to estimate γ which is unknown in practice. Our simulation results illustrate that the proposed method works well under various scenarios with different types of variance-covariance matrices. A tactic assumption in both the variable section methods and the SIMEX method is that u is known. The estimation of the error covariance matrix is still challenging in error-in-variable models; in addition, most research assumes that u is diagonal such as Σu=σu2IpAmemiya and Fuller (1984) and σu can be estimated from repeated measurements. Despite the drawback, the proposed methods are valuable for improving estimation in the measurement errors model; consequently, the estimation of u remains a topic for future work.


This work is supported by National Research Foundation of Korea Grant (NRF-2018R1D1A1B07043 034).


This work is supported by National Research Foundation of Korea Grant (NRF-2018R1D1A1B07043 034).

Fig. 1. Boxplots for coefficient estimates of (a) uninformative variable (1) and (b) informative variable (2).

Table 1

Simulation result for independent predictors

ModelMethodσu = 0.5σu = 1

M1Oracle5.00(0.00)0.00(0.00)0.010 (0.013)5.00(0.00)0.00(0.00)0.010 (0.013)
OLS5.00(0.00)15.00(0.00)0.254 (0.052)5.00(0.00)15.00(0.00)1.301 (0.122)
OR5.00(0.00)15.00(0.00)0.115 (0.038)5.00(0.00)15.00(0.00)0.507 (0.235)
PLS5.00(0.00)0.74(1.32)0.206 (0.051)5.00(0.00)1.21(1.72)1.254 (0.120)
POR5.00(0.00)1.12(2.55)0.046 (0.043)5.00(0.00)1.84(1.74)0.236 (0.178)
TS15.00(0.00)0.74(1.50)0.037 (0.035)5.00(0.00)1.21(2.08)0.182 (0.159)
TS25.00(0.00)0.54(1.04)0.035 (0.027)5.00(0.00)0.93(1.23)0.142 (0.130)

M2Oracle5.00(0.00)0.00(0.00)0.010 (0.013)5.00(0.00)0.00(0.00)0.010 (0.013)
OLS5.00(0.00)15.00(0.00)0.270 (0.053)5.00(0.00)15.00(0.00)1.351 (0.128)
OR5.00(0.00)15.00(0.00)0.124 (0.039)5.00(0.00)15.00(0.00)0.560 (0.201)
PLS5.00(0.00)0.78(1.88)0.224 (0.051)5.00(0.00)1.52(2.00)1.296 (0.126)
POR5.00(0.00)1.58(2.64)0.055 (0.048)5.00(0.00)2.04(2.38)0.281 (0.203)
TS15.00(0.00)0.78(1.75)0.037 (0.034)5.00(0.00)1.52(2.48)0.192 (0.165)
TS25.00(0.00)0.68(1.36)0.034 (0.031)5.00(0.00)1.08(2.00)0.136 (0.147)

M3Oracle5.00(0.00)0.00(0.00)0.010 (0.013)5.00(0.00)0.00(0.00)0.010 (0.013)
OLS5.00(0.00)15.00(0.00)1.817 (0.301)5.00(0.00)15.00(0.00)9.922 (0.909)
OR5.00(0.00)15.00(0.00)0.534 (0.165)5.00(0.00)15.00(0.00)2.695 (1.189)
PLS5.00(0.00)1.45(1.62)1.641 (0.314)4.95(0.20)2.94(2.55)9.871 (0.951)
POR5.00(0.00)1.86(1.81)0.267 (0.173)4.94(0.20)3.41(2.68)1.604 (1.086)
TS15.00(0.00)1.45(1.66)0.236 (0.159)4.95(0.20)2.94(2.56)1.361 (0.900)
TS25.00(0.00)1.00(1.44)0.194 (0.135)4.85(0.34)2.33(2.30)1.179 (0.988)

Averaged TP, FP, and MSE over 100 independent repetitions are reported under (M1)–(M3) where ∑u takes IND structure with n = 500 and p = 20. The SCAD penalty is used for a shrinkage method. The standard errors of TP, FP, and MSE are given in parenthesis. TP = true positives; FP = false positive; MSE = median of squared error.

Table 2

Simulation result for autoregressive predictors

ModelMethodσu = 0.5σu = 1

M1Oracle5.00(0.00)0.00(0.00)0.011 (0.013)5.00(0.00)0.00(0.00)0.011 (0.013)
OLS5.00(0.00)15.00(0.00)0.048 (0.014)5.00(0.00)15.00(0.00)0.134 (0.034)
OR5.00(0.00)15.00(0.00)0.047 (0.014)5.00(0.00)15.00(0.00)0.078 (0.023)
PLS5.00(0.00)0.85(1.42)0.013 (0.010)5.00(0.00)0.86(2.11)0.085 (0.035)
POR5.00(0.00)0.50(2.12)0.012 (0.013)5.00(0.00)1.21(2.16)0.025 (0.023)
TS15.00(0.00)0.85(1.50)0.015 (0.015)5.00(0.00)0.86(1.72)0.025 (0.024)
TS25.00(0.00)0.65(1.44)0.014 (0.011)5.00(0.00)0.65(2.04)0.023 (0.021)

M2Oracle5.00(0.00)0.00(0.00)0.010 (0.013)5.00(0.00)0.00(0.00)0.010 (0.013)
OLS5.00(0.00)15.00(0.00)0.045 (0.014)5.00(0.00)15.00(0.00)0.086 (0.026)
OR5.00(0.00)15.00(0.00)0.046 (0.014)5.00(0.00)15.00(0.00)0.065 (0.020)
PLS5.00(0.00)1.08(1.57)0.014 (0.010)5.00(0.00)1.73(2.92)0.045 (0.026)
POR5.00(0.00)0.47(1.61)0.012 (0.009)5.00(0.00)1.17(2.68)0.018 (0.018)
TS15.00(0.00)1.08(1.66)0.017 (0.016)5.00(0.00)1.73(2.78)0.023 (0.020)
TS25.00(0.00)0.74(1.46)0.015 (0.013)5.00(0.00)1.44(2.11)0.022 (0.019)

M3Oracle5.00(0.00)0.00(0.00)0.010 (0.013)5.00(0.00)0.00(0.00)0.010 (0.013)
OLS5.00(0.00)15.00(0.00)0.064 (0.019)5.00(0.00)15.00(0.00)0.343 (0.074)
OR5.00(0.00)15.00(0.00)0.059 (0.017)5.00(0.00)15.00(0.00)0.158 (0.044)
PLS5.00(0.00)1.00(1.74)0.022 (0.015)5.00(0.00)4.27(4.01)0.223 (0.089)
POR5.00(0.00)1.22(2.13)0.016 (0.015)5.00(0.00)3.46(3.52)0.094 (0.055)
TS15.00(0.00)1.00(1.71)0.021 (0.020)5.00(0.00)4.27(4.24)0.083 (0.050)
TS25.00(0.00)0.77(1.62)0.019 (0.016)5.00(0.00)3.43(3.64)0.078 (0.047)

Averaged TP, FP, and MSE over 100 independent repetitions are reported under (M1)–(M3) where ∑u takes AR structure with n = 500, p = 20, and ρ = 0.5. The SCAD penalty is used for a shrinkage method. The standard errors of TP, FP, and MSE are given in parenthesis. TP = true positives; FP = false positive; MSE = median of squared error.

Table 3

Simulation result for γ estimation

γσu = 0.5σu = 1


The averaged estimated γ and its standard error over 100 repetitions are reported under (M1) with IND structure, n = 1000 and p = 20.

Table 4

Real-data-based comparison results



Averaged root mean square error over 100 independent repetitions along with the corresponding standard deviations. The last column contains the p-values for the (one-sided) pairwise t-test between the RMSEs of PLS and TS.

  1. Amemiya, Y, and Fuller, WA (1984). Estimation for the multivariate errors-in-variables model with estimated error covariance matrix. The Annals of Statistics. 12, 497-509.
  2. Carroll, RJ, Ruppert, D, Stefanski, LA, and Crainiceanu, C (2006). Measurement Error in Nonlinear Models: A Modern Perspective: CRC Press
  3. Cook, JR, and Stefanski, LA (1994). Simulation-extrapolation estimation in parametric measurement error models. Journal of the American Statistical Association. 89, 1314-1328.
  4. Efron, B, Hastie, T, Johnstone, I, and Tibshirani, R (2004). Least angle regression. The Annals of Statistics. 32, 407-499.
  5. Fan, J, and Li, R (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American statistical Association. 96, 1348-1360.
  6. Fuller, WA (1987). Measurement Error Models. New York: John Willey
  7. Harrison, D, and Rubinfeld, DL (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management. 5, 81-102.
  8. Liang, H, and Li, R (2009). Variable selection for partially linear models with measurement errors. Journal of the American Statistical Association. 104, 234-248.
    KoreaMed CrossRef
  9. Ma, Y, and Li, R (2010). Variable selection in measurement error models. Bernoulli: official journal of the Bernoulli Society for Mathematical Statistics and Probability. 16, 274.
    Pubmed KoreaMed CrossRef
  10. Meinshausen, N (2007). Relaxed lasso. Computational Statistics & Data Analysis. 52, 374-393.
  11. Tibshirani, R (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological). 58, 267-288.
  12. Zhang, CH (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics. 38, 894-942.