Simultaneous outlier detection and variable selection via difference-based regression model and stochastic search variable selection

Jong Suk Parka, Chun Gun Parkb, Kyeong Eun Lee1,a

aDepartment of Statistics, Kyungpook National University, Korea,
bDepartment of Mathematics, Kyonggi University, Korea
Correspondence to: 1Department of Statistics, Kyungpook National University, 80 Daehakro, Bukgu, Daegu 41566, Korea. E-mail: artlee@knu.ac.kr
Received August 14, 2018; Revised November 19, 2018; Accepted February 8, 2019.
Abstract

In this article, we suggest an approach to simultaneous variable selection and outlier detection. First, we determine possible outlier candidates using properties of an intercept estimator in a difference-based regression model, and reflect the information on outliers in a multiple regression model by adding mean-shift parameters. Second, we select the best model from the model that includes the outlier candidates as predictors using stochastic search variable selection. Finally, simulations and a real data analysis show that our method yields promising results. Future work includes developing robust estimation and extending the approach to the nonparametric regression model for simultaneous outlier detection and variable selection.

Keywords : Bayesian variable selection, difference-based regression model, mean-shift outlier model, stochastic search variable selection
1. Introduction

In multiple linear regression models, a data point whose response is well separated from the rest of the data in the direction of the vertical axis is called an outlier (Weisberg, 2004). Such outliers can have serious effects on inference and model selection (Kahng et al., 2016). The selection of predictors is likewise a crucial problem in building a multiple linear regression model (George and McCulloch, 1993).

Many approaches to outlier detection and variable selection have been proposed. Most authors have treated these problems separately; however, some have proposed methods that perform outlier detection and variable selection simultaneously. For example, Hoeting et al. (1996) considered a model with all variables together with outlier candidates determined by least median of squares regression (Rousseeuw, 1984) and computed posterior probabilities for all possible subset models using a Markov chain Monte Carlo approach. This approach is efficient both in detecting masked outliers and in selecting variables (Kim et al., 2008).

Subsequently, Kim et al. (2008) proposed a similar two-step procedure whose details differ from the approach of Hoeting et al. (1996): the first step determines outlier candidates using a multiple outlier identification procedure, and the second step applies all possible subset regressions of the mean-shift outlier model to select the best model. Compared with frequentist model selection methods, Bayesian model selection methods have several advantages, chief among them the ability to incorporate prior knowledge into the selection process. To exploit this advantage, we provide an alternative approach to simultaneous variable selection and outlier detection using stochastic search variable selection (SSVS) (George and McCulloch, 1993) in a multiple regression model.

We use the mean-shift outlier model to accommodate outlier candidates. To determine these candidates, we use the properties of an intercept estimator in the difference-based regression model (DBRM) (Choi et al., 2018; Park and Kim, 2018b). This type of model was originally used in time series analysis to remove trends in the mean function (Park and Kim, 2018a; Park, 2018; Park et al., 2012). The method performs well and is simple to use: it relies only on the intercept and does not require estimating the mean function.

The remainder of this paper is organized as follows. In Section 2, after introducing the notation, we describe the mean-shift outlier model and the difference-based regression model (Park and Kim, 2018b) used to determine the outlier candidates. In Section 3, we introduce Bayesian variable selection and the SSVS method proposed by George and McCulloch (1993), and then propose a method that simultaneously performs outlier detection and variable selection by applying SSVS to a regression model that includes the outlier candidates. In Sections 4 and 5, we present simulations and a real data example, respectively. Finally, we provide conclusions and recommendations in Section 6.

2. Determining outlier candidates

In this section, we explain how to determine outlier candidates and set up a regression model that includes outlier candidates. We then introduce the difference-based regression model (Choi et al., 2018; Park and Kim, 2018b) in Section 2.2.

### 2.1. Linear model with information about outliers

Let X = [1n : X1] be an n × (p + 1) matrix with rank(X) = p + 1, and let β = (β0, β1, … , βp)′ be a ((p + 1) × 1) coefficient vector, where β0 is an intercept. Assume that the response vector is Y = (y1, y2, … , yn)′.

We consider the mean-shift outlier model (Belsley et al., 1980):

$$Y = 1_n\beta_0 + X_1\beta_1 + \gamma + \epsilon = X\beta + \gamma + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I_n), \tag{2.1}$$

where γ = (γ1, γ2, … , γn)′ is an n × 1 vector and γj is nonzero only when the jth observation is an outlier.

In this paper, we use the mean-shift outlier model described above with only q (0 ≤ q < n) potential outliers rather than all n observations. To identify the potential outliers, we use the properties of an intercept estimator in the DBRM (Park and Kim, 2018b). Since the variance of an outlier indicator variable is (n − 1)/n² ≈ 1/n, we weight the indicator columns by $\sqrt{n}$. Without loss of generality, assume the dataset is sorted in descending order of the absolute intercept estimates from the DBRM, so that the q candidates come first. We then write

$$Z = \left[X : \sqrt{n}\, I_{n \times q}\right],$$

where $I_{n \times q}$ consists of the first q columns of $I_n$.

Then, we can rewrite the model under the above assumptions:

$$Y = \sum_{j=1}^{p+q+1} z_j\theta_j + \epsilon = Z_{n \times (p+q+1)}\,\theta_{(p+q+1) \times 1} + \epsilon, \tag{2.2}$$

where ε ~ N(0, σ²In) and θ = (β′, γ1, … , γq)′. The new model has p + q + 1 parameters and rank(Z) = p + q + 1. The first p + 1 elements of θ are the βj, j = 0, 1, … , p, and the remaining q coefficients represent the effects of the outlier candidates.
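As a concrete illustration, the augmented design matrix Z can be assembled as follows. This is a minimal NumPy sketch under our own naming (`augment_design`) and toy dimensions; the rows are assumed to be pre-sorted so that the q candidates come first.

```python
import numpy as np

def augment_design(X, q):
    """Append sqrt(n)-scaled indicator columns for the q leading outlier
    candidates (rows assumed sorted so candidates occupy the first q rows)."""
    n = X.shape[0]
    I_nq = np.eye(n)[:, :q]            # first q columns of the n x n identity
    return np.hstack([X, np.sqrt(n) * I_nq])

# toy check: n = 6 observations, intercept + 2 predictors, q = 2 candidates
rng = np.random.default_rng(0)
X = np.hstack([np.ones((6, 1)), rng.normal(size=(6, 2))])
Z = augment_design(X, q=2)
```

The resulting Z has p + q + 1 columns, with the last q columns equal to √n times the indicator of each candidate observation.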

### 2.2. Difference-based regression model

Park and Kim (2018b) propose an outlier-detection approach that uses the properties of an intercept estimator in the difference-based regression model. The method uses only the estimated intercepts; it does not require estimating the other parameters in the DBRM. To identify whether observations are outliers, the DBRM builds on a mean-shift outlier model.

Here we describe the DBRM defined by Park and Kim (2018b), starting from equation (2.1). Let Y(i) and X1(i) denote Y and X1 with the ith row removed, for i = 1, 2, … , n, and let D(i)Y denote the difference between Y(i) and yi1n−1. The difference-based regression model can then be written as

$$D^{(i)}Y = 1_{n-1}(-\gamma_i) + D^{(i)}X_1\beta_1 + A^{(i)}\gamma + D^{(i)}\epsilon = \left[1_{n-1} : D^{(i)}X_1\right]\begin{pmatrix}-\gamma_i \\ \beta_1\end{pmatrix} + A^{(i)}\gamma + D^{(i)}\epsilon, \quad i = 1, \ldots, n, \tag{2.3}$$

where −γi is the intercept and D(i) and A(i) are the (n − 1) × n matrices as follows:

$$D^{(i)} = \begin{pmatrix} I_{i-1} & -1_{i-1} & 0_{(i-1),(n-i)} \\ 0_{(n-i),(i-1)} & -1_{n-i} & I_{n-i} \end{pmatrix} \quad \text{and} \quad A^{(i)} = \begin{pmatrix} I_{i-1} & 0_{i-1} & 0_{(i-1),(n-i)} \\ 0_{(n-i),(i-1)} & 0_{n-i} & I_{n-i} \end{pmatrix},$$

where Ia is the a × a identity matrix and 0a,b is the a × b null matrix.
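To make the definitions concrete, D(i) and A(i) can be built by inserting a column of −1's (respectively 0's) into I_{n−1} at position i. A sketch with 0-based indexing (function names are ours):

```python
import numpy as np

def D_mat(n, i):
    """(n-1) x n differencing matrix D(i): column of -1's at position i,
    so that D(i) @ Y == (Y with row i removed) - Y[i]."""
    return np.insert(np.eye(n - 1), i, -1.0, axis=1)

def A_mat(n, i):
    """(n-1) x n selection matrix A(i): zero column at position i,
    so that A(i) @ g == g with entry i removed."""
    return np.insert(np.eye(n - 1), i, 0.0, axis=1)

# check the defining identity D(i)Y = Y^{(i)} - y_i 1_{n-1}
Y = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
i = 2
```

The assertion `D_mat(5, i) @ Y == np.delete(Y, i) - Y[i]` holds elementwise, matching the definition of D(i)Y above.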

Park and Kim (2018b) estimate the intercepts −γi in the DBRM (2.3) by least squares. The estimated intercepts are as follows:

$$-\hat{\gamma}_i = \begin{cases} \dfrac{1_{n-1}'(I_{n-1}-H^{(i)})D^{(i)}\epsilon}{1_{n-1}'(I_{n-1}-H^{(i)})1_{n-1}}, & \text{no outlier}, \\[2.5ex] \dfrac{1_{n-1}'(I_{n-1}-H^{(i)})A^{(i)}\gamma}{1_{n-1}'(I_{n-1}-H^{(i)})1_{n-1}} + \dfrac{1_{n-1}'(I_{n-1}-H^{(i)})D^{(i)}\epsilon}{1_{n-1}'(I_{n-1}-H^{(i)})1_{n-1}}, & \text{several outliers}, \end{cases}$$

where the ith observation is not an outlier and $H(i)=D(i)X1(X1′D(i)′D(i)X1)-1X1′D(i)′$.

$$-\hat{\gamma}_k = \begin{cases} -\gamma_k + \dfrac{1_{n-1}'(I_{n-1}-H^{(k)})D^{(k)}\epsilon}{1_{n-1}'(I_{n-1}-H^{(k)})1_{n-1}}, & \text{no outlier}, \\[2.5ex] -\gamma_k + \dfrac{1_{n-1}'(I_{n-1}-H^{(k)})A^{(k)}\gamma}{1_{n-1}'(I_{n-1}-H^{(k)})1_{n-1}} + \dfrac{1_{n-1}'(I_{n-1}-H^{(k)})D^{(k)}\epsilon}{1_{n-1}'(I_{n-1}-H^{(k)})1_{n-1}}, & \text{several outliers}, \end{cases}$$

where the kth observation is an outlier and $H(k)=D(k)X1(X1′D(k)′D(k)X1)-1X1′D(k)′$.

The ith OLS intercept estimator is strongly affected by the ith outlier effect (Park and Kim, 2018b); that is, the intercept estimators corresponding to observations carrying an outlier effect take large values. We can therefore roughly discriminate outliers among the observations: we compute the absolute values of the OLS intercept estimates and regard observations with large absolute intercept estimates as possible outliers, using the triangular outlier-detection approach proposed by Park and Kim (2018b). The procedure is as follows:

• D(i)Y = Y(i)yi1n−1 = 1n−1(−γi) + D(i)X1β1 + A(i)γ + D(i)ϵ, for i = 1, … , n;

• Estimate the intercepts −γ̂i, i = 1, … , n, and set $\gamma_i^* = |{-\hat{\gamma}_i}|$;

• Sort them in ascending order: $\gamma_{(1)}^* \le \gamma_{(2)}^* \le \cdots \le \gamma_{(n)}^*$.

We take q to be the greatest integer less than one fifth of the sample size; the value of q can be adjusted by inspecting the DBRM results. That is, the outlier candidates are delimited by a point of rapid change in the trend of the estimated intercepts.
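The candidate-determination steps above can be sketched directly. In this sketch (function names and the q = ⌊n/5⌋ default are ours) each D(i) product is formed implicitly by row deletion and subtraction rather than by building the matrix:

```python
import numpy as np

def dbrm_intercepts(Y, X1):
    """|intercept| estimate for each i from OLS of D(i)Y on [1 : D(i)X1]."""
    n = len(Y)
    est = np.empty(n)
    for i in range(n):
        dY = np.delete(Y, i) - Y[i]                # D(i) Y
        dX = np.delete(X1, i, axis=0) - X1[i]      # D(i) X1
        W = np.hstack([np.ones((n - 1, 1)), dX])
        coef, *_ = np.linalg.lstsq(W, dY, rcond=None)
        est[i] = abs(coef[0])                      # |-gamma_i hat|
    return est

def outlier_candidates(Y, X1, frac=0.2):
    """Indices of the q = floor(frac * n) largest |intercept| estimates."""
    q = int(frac * len(Y))
    return np.argsort(dbrm_intercepts(Y, X1))[::-1][:q]

# noise-free toy data with a single shift of +100 at observation 7
X1 = np.arange(20.0).reshape(-1, 1)
Y = 1.0 + 2.0 * X1[:, 0]
Y[7] += 100.0
cands = outlier_candidates(Y, X1)
```

In the noise-free toy example, the regression with observation 7 deleted fits exactly with intercept −100, so observation 7 heads the candidate list.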

3. Simultaneous outlier detection and variable selection

In this section, we briefly describe Bayesian variable selection and introduce stochastic search variable selection (George and McCulloch, 1993). We then explain how to simultaneously detect outliers and select variables in the regression model (2.2), which includes the outlier-candidate indicators as additional predictors.

### 3.1. Stochastic search variable selection

George and McCulloch (1993, 1997) assumed that the prior distributions of the regression coefficients are independent and expressed the prior distribution of each coefficient as a mixture of two normal distributions, both centered at 0, one with a very small variance and the other with a very large variance.

Then we can describe stochastic search variable selection using equation (2.2). The prior distribution of the coefficient θj given the indicator δj is

$$\theta_j \mid \delta_j \sim (1-\delta_j)\,N(0, \tau_j^2) + \delta_j\,N(0, c_j^2\tau_j^2), \quad j = 1, 2, \ldots, p+q+1.$$

The value of $\tau_j^2$ is set small, and $N(0, \tau_j^2)$ is the prior distribution of the coefficient θj when the variable zj is not selected. The value of cj is set large (cj > 1) so that if δj = 1, a non-zero estimate of θj is included in the final model. The prior distribution of δj is P(δj = 1) = 1 − P(δj = 0) = pj. The prior distribution of (δj, θj) is independent of the prior distribution of σ², which is an inverse gamma (IG) distribution, IG(νδ/2, νδλδ/2).

### 3.2. Simultaneous outlier detection and variable selection

Our approach to simultaneous variable selection and outlier detection consists of two steps. The first step determines a set of outlier candidates using the properties of the intercept in the DBRM described in Section 2.2. The second step performs SSVS on the mean-shift outlier model with the outlier candidates detected in step 1. Our approach is similar to those of Hoeting et al. (1996) and Kim et al. (2008), but differs in how the outlier candidates are determined and in the variable selection method.

To perform outlier detection and variable selection simultaneously, we use a hierarchical model based on equation (2.2) and, as the prior for θj conditional on δj and $\tau_j^2$, a mixture of two normal densities:

$$\theta_j \mid \delta_j, \tau_j^2 \sim (1-\delta_j)\,N(0, \tau_j^2) + \delta_j\,N(0, c^2\tau_j^2), \quad j = 1, 2, \ldots, p+q+1,$$

where c2 and $τj2$ are variance components. We assume an inverse gamma prior for σ2 and $τj2$ and that δj is distributed as Bernoulli with inclusion probability pj, j = 1, 2, … , (p+q+1). Thus, we have the following multilevel model:

$$\begin{aligned} Y \mid \theta, \sigma^2 &\sim N(Z\theta, \sigma^2 I_n), \\ \theta \mid \delta, \tau^2 &\sim N_{p+q+1}(0, D_{\delta\tau} R D_{\delta\tau}), \\ \delta_j &\overset{\text{ind}}{\sim} \text{Bernoulli}(p_j), \\ \sigma^2 &\sim IG\!\left(\frac{a_1}{2}, \frac{b_1}{2}\right), \\ \tau_j^2 &\sim IG\!\left(\frac{a_2}{2}, \frac{b_2}{2}\right), \end{aligned}$$

where R is the prior correlation matrix, and we assume R = I. Also, Dδτ = diag[d1τ1, … , dp+q+1τp+q+1] where $dj2=1$ if δj = 0 and $dj2=c2$ if δj = 1.

With these conjugate priors, the full conditional posterior distributions are available in closed form, so a Gibbs sampling procedure is easily implemented. The full conditionals are as follows:

$$\begin{aligned} \theta \mid Y, \sigma^2, \tau^2, \delta &\sim N\!\left(\frac{1}{\sigma^2} A_{\delta\tau} Z'Y,\; A_{\delta\tau}\right), \\ \sigma^2 \mid Y, \tau^2, \delta, \theta &\sim IG\!\left(\frac{n+a_1}{2}, \frac{1}{2}\!\left[(Y-Z\theta)'(Y-Z\theta)+b_1\right]\right), \\ \tau_j^2 \mid Y, \sigma^2, \theta, \delta &\sim IG\!\left(\frac{a_2+1}{2}, \frac{\theta_j^2/d_j^2 + b_2}{2}\right), \\ \delta_j \mid \delta_{(j)}, \theta, \tau^2 &\sim \text{Bernoulli}(\hat{\delta}_j), \end{aligned}$$

where $A_{\delta\tau} = \left(\sigma^{-2}Z'Z + (D_{\delta\tau}RD_{\delta\tau})^{-1}\right)^{-1}$, $\delta_{(j)} = (\delta_1, \ldots, \delta_{j-1}, \delta_{j+1}, \ldots, \delta_{p+q+1})'$, $d_j^2 = 1$ if $\delta_j = 0$ and $d_j^2 = c^2$ if $\delta_j = 1$, and

$$\hat{\delta}_j = \frac{P(\theta_j \mid \delta_j = 1)\,p_j}{P(\theta_j \mid \delta_j = 1)\,p_j + P(\theta_j \mid \delta_j = 0)(1 - p_j)}.$$

Therefore, the best subset of variables is selected according to the information contained in δ. The posterior probability p(δj = 1 | Y) that the regressor zj, j = 1, … , p + q + 1, is included in the model can be estimated by the mean of the δ̂j or, alternatively, by the mean of the sampled δj over the MCMC draws.
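The full conditionals above translate directly into a Gibbs sampler. The following is a minimal sketch, assuming R = I; the hyperparameter defaults a1 = b1 = a2 = b2 = 1 and the function names are our choices, not values from the paper:

```python
import numpy as np

def npdf(x, sd):
    """Density of N(0, sd^2) at x (elementwise)."""
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def ssvs_gibbs(Y, Z, n_iter=2000, c2=10.0, pj=0.5,
               a1=1.0, b1=1.0, a2=1.0, b2=1.0, seed=1):
    """Gibbs sampler cycling through the four full conditionals (R = I);
    returns estimated posterior inclusion probabilities per column of Z."""
    rng = np.random.default_rng(seed)
    n, k = Z.shape
    theta, tau2, sigma2 = np.zeros(k), np.ones(k), 1.0
    delta = rng.binomial(1, pj, size=k)
    keep, burn = np.zeros(k), n_iter // 2
    ZtZ, ZtY = Z.T @ Z, Z.T @ Y
    for it in range(n_iter):
        d2 = np.where(delta == 1, c2, 1.0)
        # theta | . ~ N(sigma^{-2} A Z'Y, A), A = (sigma^{-2}Z'Z + diag(1/(d^2 tau^2)))^{-1}
        A = np.linalg.inv(ZtZ / sigma2 + np.diag(1.0 / (d2 * tau2)))
        A = (A + A.T) / 2.0                       # enforce symmetry for the sampler
        theta = rng.multivariate_normal(A @ ZtY / sigma2, A)
        # sigma^2 | . ~ IG((n + a1)/2, ((Y - Z theta)'(Y - Z theta) + b1)/2)
        rss = float((Y - Z @ theta) @ (Y - Z @ theta))
        sigma2 = 1.0 / rng.gamma((n + a1) / 2.0, 2.0 / (rss + b1))
        # tau_j^2 | . ~ IG((a2 + 1)/2, (theta_j^2 / d_j^2 + b2)/2)
        tau2 = 1.0 / rng.gamma((a2 + 1.0) / 2.0, 2.0 / (theta ** 2 / d2 + b2))
        # delta_j | . ~ Bernoulli(slab p_j / (slab p_j + spike (1 - p_j)))
        slab = npdf(theta, np.sqrt(c2 * tau2)) * pj
        spike = npdf(theta, np.sqrt(tau2)) * (1.0 - pj)
        delta = rng.binomial(1, slab / (slab + spike))
        if it >= burn:
            keep += delta
    return keep / (n_iter - burn)

# demo: one strong predictor among three candidate columns
rng = np.random.default_rng(0)
Z = rng.normal(size=(60, 3))
Y = 3.0 * Z[:, 0] + 0.5 * rng.normal(size=60)
probs = ssvs_gibbs(Y, Z, n_iter=1500, seed=2)
```

Averaging the sampled δ after burn-in estimates the posterior inclusion probabilities p(δj = 1 | Y); in the demo, the active first column receives a much higher inclusion probability than the two null columns.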

4. Simulation studies

We conduct simulations to evaluate the performance of our approach against an existing alternative, the BayesVarSel package, which we introduce first.

Donato and Forte (2017) introduced the R package BayesVarSel, which implements Bayesian methodology for hypothesis testing and variable selection in linear models. For the comparison with our method, we use the variable selection functions in this package: Bvs, PBvs, and GibbsBvs. Except for a few arguments, the usage of the three functions is very similar.

This package implements the criteria-based priors on the regression coefficients proposed by Bayarri et al. (2012), but the advanced user can also use several other popular priors from the literature. These priors are shown in Table 1.

### 4.1. Simulation setting

We consider two cases of multiple regression, p = 4 and p = 9, to demonstrate the performance of the proposed method. To study the behavior of regressor selection, the coefficient vector β1 in (2.1) is set to β1 = (1, 1, 0, 0)′ and β1 = (1, 1, 1, 0, 0, 0, 0, 0, 0)′, respectively. We consider three sample sizes (n = 30, 50, 100), each with 10% randomly assigned outliers whose size is randomly set to 7 or 10 and whose sign is + or −. We set the model matrix X = {xij} ~ N(0, 3²C), where C = {ρ^{|i−j|}}, ρ = 0.5, i = 1, … , n, j = 1, … , p. Errors are generated from N(0, 1), and 100 data sets are generated for each case.
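The data-generating design above can be sketched as follows. The intercept value β0 = 1 used below is our assumption (the paper does not state it), as is the function name:

```python
import numpy as np

def simulate_data(n=50, p=4, beta1=(1.0, 1.0, 0.0, 0.0), rho=0.5, seed=0):
    """One replicate of the simulation design: X ~ N(0, 3^2 C) with
    C = {rho^|i-j|}, N(0, 1) errors, 10% outliers of size 7 or 10
    with random sign (beta0 = 1 is an assumed value)."""
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    C = rho ** np.abs(np.subtract.outer(idx, idx))    # AR(1)-type correlation
    X1 = rng.multivariate_normal(np.zeros(p), 9.0 * C, size=n)
    gamma = np.zeros(n)
    out = rng.choice(n, size=n // 10, replace=False)  # 10% outliers
    gamma[out] = (rng.choice([7.0, 10.0], size=out.size)
                  * rng.choice([-1.0, 1.0], size=out.size))
    Y = 1.0 + X1 @ np.asarray(beta1) + gamma + rng.normal(size=n)
    return Y, X1, out

Y, X1, out = simulate_data()
```

Each replicate returns the response, the design, and the true outlier locations so the detection criteria of Section 4.2 can be evaluated.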

We apply SSVS with the following hyperparameter settings: prior correlation matrix R = I, and the δ prior (3.5) with pj = 0.5, which yields π(δ) ≡ 1/2^{p+q+1}. We use three values of $c_j^2 = c^2$: c² = 10, 25, 100. For $\tau_j^2$, we consider two cases, $\tau_\beta^2 = \tau_\gamma^2 (= \tau^2)$ and $\tau_\beta^2 \ne \tau_\gamma^2$, where $\tau_\beta^2$ denotes a common prior variance for the regression coefficients and $\tau_\gamma^2$ a common prior variance for the outlier effects. For each setting, initial values are randomly generated from the prior distributions discussed in Section 3.2.

In all cases, 30,000 samples from the MCMC simulation are used to estimate the parameters, where the first 10,000 samples are discarded as burn-in. In addition, we confirm the convergence of the Markov chain by using Gelman-Rubin diagnostic (Gelman and Rubin, 1992); all the values are close to one.

For each case, we compare our approach with Bvs (p + q + 1 < 20) and GibbsBvs (p + q + 1 ≥ 20) function in BayesVarSel. For choosing prior probabilities of models and θ, we use two prior probabilities of models (ScottBerger, Constant) and five prior probabilities of θ in Table 1. We use default values in the package for other conditions.

### 4.2. Criteria

The performances of our proposed procedure are evaluated in two parts: outlier detection and variable selection. In the first part, we use three criteria proposed by Choi et al. (2018) to detect outliers. Let nO be the number of true outliers, nD be the number of detected outliers, nCD be the number of correctly detected outliers, nID be the number of incorrectly detected outliers, nIU be the number of incorrectly undetected outliers, and nCU be the number of correctly undetected non-outliers (Table 2). Then, we use the relative frequency of perfect detection (PD), the relative frequency of only-swamping with detection (overdetection) (OS), and the average number of detected outliers (AN) to compare performance.

$$\begin{aligned} PD &= \frac{1}{n_{sim}} \sum_{s=1}^{n_{sim}} I\!\left(n_{CD}^{(s)} = n_O^{(s)} = n_D^{(s)}\right), \\ OS &= \frac{1}{n_{sim}} \sum_{s=1}^{n_{sim}} I\!\left(n_{IU}^{(s)} = 0,\; n_{ID}^{(s)} > 0,\; n_{CD}^{(s)} > 0\right), \\ AN &= \frac{1}{n_{sim}} \sum_{s=1}^{n_{sim}} \#\{j : \hat{\gamma}_{j,s} \ne 0\}. \end{aligned}$$

In the second part, for variable selection, we use two criteria (Choi et al., 2018): CS, the relative frequency of correct selection, and AN, the average number of selected variables.

$$CS = \frac{1}{n_{sim}} \sum_{s=1}^{n_{sim}} I\!\left(\{j : \hat{\beta}_{j,s} \ne 0\} = \{j : \beta_j \ne 0\}\right), \qquad AN = \frac{1}{n_{sim}} \sum_{s=1}^{n_{sim}} \#\{j : \hat{\beta}_{j,s} \ne 0\}.$$

Also, to compare the performance of our approach with BayesVarSel, we consider the following two models: the highest probability model (HPM) and the median probability model (MPM), which consists of the variables with inclusion probability greater than 0.5 (Barbieri and Berger, 2004).
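With the detected and true index sets in hand, the criteria can be computed directly. A sketch (representing each replicate by a Python set of indices is our choice):

```python
def outlier_criteria(detected, true):
    """PD, OS and AN over replicates; detected[s] and true[s] are the
    sets of flagged and true outlier indices for replicate s."""
    nsim = len(detected)
    pd_ = sum(d == t for d, t in zip(detected, true)) / nsim                 # perfect detection
    os_ = sum(len(t) > 0 and t < d for d, t in zip(detected, true)) / nsim   # only swamping
    an = sum(len(d) for d in detected) / nsim                                # avg. # detected
    return pd_, os_, an

def selection_criteria(selected, true_set):
    """CS and AN for variable selection; selected[s] is the set of selected
    variable indices, true_set the truly nonzero coefficients."""
    nsim = len(selected)
    cs = sum(s == true_set for s in selected) / nsim
    an = sum(len(s) for s in selected) / nsim
    return cs, an

det = [{1, 2}, {1, 2, 3}, {1}]
tru = [{1, 2}, {1, 2}, {1, 2}]
pd_, os_, an = outlier_criteria(det, tru)
```

In the toy example, the first replicate is a perfect detection, the second detects all true outliers plus one spurious point (only swamping), and the third misses a true outlier, so PD = OS = 1/3 and AN = 2.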

### 4.3. Simulation results

For BayesVarSel, the prior probability π(δ) is assigned as “Constant”, which stands for π(δ) = 1/2^{p+q}, and the intercept (β0) is present in all models. Because the MPM results are similar to the HPM results, we present the HPM results in tabular form. The results over the 100 simulated data sets are summarized in Table 3 and Table 4. Except for a few cases discussed below, the simulation results show that SSVS performs better than BayesVarSel regardless of sample size. Our method using SSVS gives the best results at c² = 10 and $\tau_\beta^2 = \tau_\gamma^2$. In contrast, the results using BayesVarSel depend on the prior for θ. Among the prior probabilities for models in Table 1, “Constant” is better than “ScottBerger”. Among the prior probabilities for the coefficients, gZellner and FLS are better than the other three priors for θ.

With regard to variable selection, Table 3 (p = 4) indicates that the CS of our method is better than that of BayesVarSel, and the AN of our method tends to be closer to the number of true variables. With regard to outlier detection, when c2 = 10 and $τβ2=τγ2$, our method performs best (PD is over 0.95 and PD + OS ≈ 1). However, BayesVarSel tends to overdetect outliers more than our method. The results of p = 9 (Table 4) are similar to the results of p = 4 (Table 3).

We now examine one data set in order to show the detailed procedure that leads to the final results. Consider the following model: n = 50, p = 4, and β1 = (1, 1, 0, 0)′. To identify outlier candidates, we calculate the absolute values of the intercept estimators ($\gamma_i^*$) and sort them by magnitude. Up to 20% of the total data is taken as outlier candidates. Figure 1 displays the estimated intercepts.

Accordingly, the dataset (Y and Z) is sorted by the order of $\gamma_i^*$, and observations 13, 25, 33, 2, 15, 24, 44, 39, 32, and 50 become the outlier candidates in the final model. We then simultaneously perform outlier detection and variable selection using SSVS with c² = 10 and $\tau_\beta^2 = \tau_\gamma^2$. Table 5 and Table 6 summarize the high frequency models and the estimation results on the simulated data. In conclusion, we select the model that includes variables β1 and β2 and determine that observations (2, 13, 15, 25, 33) are multiple outliers.

5. Real data analysis

This section illustrates the performance of our method on the Scottish Hill Racing data (Atkinson, 1986). This data set was used by Hoeting et al. (1996), Kim et al. (2008), and Menjoge and Welsch (2010) to evaluate their respective methods. It contains the record winning times for 35 hill races in Scotland and two independent variables (distance and climb).

To identify outlier candidates, we calculate the absolute values of the intercept estimators ($γi*$) and sort them in order of their magnitude. Up to 20% of the data is determined to be the outlier candidates. Figure 2 displays the estimated intercepts in our example data.

Accordingly, the dataset is sorted by the order of $\gamma_i^*$, and observations 11, 18, 31, 26, 33, 17, and 5 become the outlier candidates in the final model. Then, we simultaneously perform outlier detection and variable selection using SSVS with c² = 10 and $\tau_\beta^2 = \tau_\gamma^2$.

High frequency models and estimation results for our example data are summarized in Table 7 and Table 8. BayesVarSel gives the same result: β1 and β2 are included in the final model, and observation 18 is determined to be an outlier.

6. Discussion

In this paper, we have adopted the mean-shift outlier model in order to include information on outliers. This approach to modeling outliers is used by Kim et al. (2008) and Menjoge and Welsch (2010). The first step in these methods determines outlier candidates and the second step classifies outliers among them.

Accordingly, we suggest an alternative approach for simultaneous outlier detection and variable selection. First, by using properties of an intercept estimator in the DBRM (Park and Kim, 2018b), outlier candidates are determined and the information on outliers is reflected in the multiple regression model. Second, we select the best model from the model containing all variables including outlier candidates by using SSVS.

As shown in the simulation results and real data analysis, the proposed method performs well under proper conditions. A further advantage is that the relative sizes of the outliers can be read off from the resulting statistics. However, our method is sensitive to constants such as c² and the prior variance τ². Therefore, we plan to develop robust estimation for our method and to extend it to the nonparametric regression model for simultaneous outlier detection and variable selection.

Acknowledgement

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2018R1D1A1B070).

Figures
Fig. 1. Intercept estimates in DBRM on simulated data with n = 50, p = 4, β1 = (1, 1, 0, 0)′, true outlier location = (2, 13, 15, 25, 33), and size of true outlier = (10, 7, 10, 7, 7).
Fig. 2. Intercept estimates in DBRM in our example data.
TABLES

### Table 1

Priors used in BayesVarSel package

(a) Prior probabilities for models
- prior.models = “ScottBerger” (default)
- prior.models = “Constant”
- prior.models = “User”, priorprobs

(b) Prior probabilities for the coefficients
- prior.betas = “Robust” (default)
- prior.betas = “ZellnerSiow”
- prior.betas = “gZellner”
- prior.betas = “FLS”
- prior.betas = “Liangetal”

(c) Null model contains just the intercept
- fixed.cov = c(“Intercept”) (default)
- fixed.cov = NULL

### Table 2

Criteria of outlier detection

| Detection \ True | Outlier | Non-outlier | Sum |
|---|---|---|---|
| Outlier | nCD | nID | nD |
| Non-outlier | nIU | nCU | nUD |
| Sum | nO | nNO | n |

### Table 3

Results on simulated data with p = 4 and HPM on 100 replicates

n Method Variable selection Outlier detection

CS AN Max Mode PD PD + OS AN Max Mode
30 SSVS c2 = 100 $τβ2=τγ2$ 0.90 2.13 4 2 0.88 0.99 3.12 5 3
$τβ2≠τγ2$ 0.88 2.15 4 2 0.76 0.98 3.26 5 3

c2 = 25 $τβ2=τγ2$ 0.91 2.07 4 2 0.93 0.99 3.08 6 3
$τβ2≠τγ2$ 0.86 2.20 4 2 0.88 0.99 3.16 6 3

c2 = 10 $τβ2=τγ2$ 0.93 2.07 4 2 0.98 0.99 3.00 4 3
$τβ2≠τγ2$ 0.90 2.13 4 2 0.93 0.99 3.08 6 3

BayesVarSel FLS 0.91 2.10 4 2 0.79 0.99 3.22 5 3
gZellner 0.93 2.08 4 2 0.82 0.99 3.18 5 3
Liangetal 0.91 2.10 4 2 0.80 0.99 3.21 5 3
Robust (default) 0.91 2.10 4 2 0.76 0.99 3.26 5 3
ZellnerSiow 0.91 2.10 4 2 0.80 0.99 3.21 5 3

50 SSVS c2 = 100 $τβ2=τγ2$ 0.94 2.07 4 2 0.69 0.95 5.30 8 5
$τβ2≠τγ2$ 0.90 2.13 4 2 0.53 0.95 5.58 9 5

c2 = 25 $τβ2=τγ2$ 0.95 2.05 3 2 0.96 1.00 5.04 6 5
$τβ2≠τγ2$ 0.93 2.08 4 2 0.91 0.99 5.10 7 5

c2 = 10 $τβ2=τγ2$ 0.97 2.02 4 2 0.95 0.95 4.95 5 5
$τβ2≠τγ2$ 0.95 2.05 3 2 0.92 0.95 4.98 6 5

BayesVarSel FLS 0.95 2.05 3 2 0.49 0.95 5.64 9 5
gZellner 0.94 2.07 4 2 0.49 0.95 5.64 9 5
Liangetal 0.94 2.07 4 2 0.48 0.95 5.66 9 5
Robust (default) 0.94 2.07 4 2 0.46 0.95 5.70 9 5
ZellnerSiow 0.94 2.07 4 2 0.47 0.95 5.67 9 5

100 SSVS c2 = 100 $τβ2=τγ2$ 0.93 2.08 4 2 0.70 0.99 10.31 12 10
$τβ2≠τγ2$ 0.96 2.04 3 2 0.65 0.99 10.41 12 10

c2 = 25 $τβ2=τγ2$ 0.97 2.03 3 2 0.93 0.97 10.02 11 10
$τβ2≠τγ2$ 0.99 2.01 3 2 0.92 0.97 10.03 11 10

c2 = 10 $τβ2=τγ2$ 0.96 2.04 3 2 0.99 0.99 9.99 10 10
$τβ2≠τγ2$ 0.97 2.04 4 2 0.98 0.99 10.09 20 10

BayesVarSel FLS 0.97 2.03 3 2 0.41 0.99 10.95 14 10
gZellner 0.89 2.11 3 2 0.21 0.99 11.60 16 11
Liangetal 0.90 2.10 3 2 0.27 0.99 11.42 16 10
Robust (default) 0.90 2.10 3 2 0.25 0.99 11.51 16 11
ZellnerSiow 0.90 2.10 3 2 0.27 0.99 11.42 16 10

### Table 4

Results on simulated data with p = 9 and HPM on 100 replicates

n Method Variable selection Outlier detection

CS AN Max Mode PD PD + OS AN Max Mode
30 SSVS c2 = 100 $τβ2=τγ2$ 0.73 3.44 7 3 0.73 0.91 3.14 6 3
$τβ2≠τγ2$ 0.71 3.47 6 3 0.55 0.91 3.40 6 3

c2 = 25 $τβ2=τγ2$ 0.70 3.27 7 3 0.81 0.87 2.94 4 3
$τβ2≠τγ2$ 0.66 3.47 6 3 0.76 0.87 3.09 6 3

c2 = 10 $τβ2=τγ2$ 0.78 3.20 6 3 0.89 0.91 2.94 4 3
$τβ2≠τγ2$ 0.70 3.48 9 3 0.85 0.90 3.00 4 3

BayesVarSel FLS 0.82 3.12 7 3 0.66 0.91 3.20 6 3
gZellner 0.88 3.01 4 3 0.82 0.91 3.02 5 3
Liangetal 0.84 3.06 5 3 0.68 0.91 3.18 6 3
Robust (default) 0.81 3.14 7 3 0.66 0.91 3.22 6 3
ZellnerSiow 0.84 3.06 5 3 0.68 0.91 3.19 6 3

50 SSVS c2 = 100 $τβ2=τγ2$ 0.87 3.20 6 3 0.79 0.94 5.10 7 5
$τβ2≠τγ2$ 0.77 3.37 8 3 0.57 0.94 5.43 7 5

c2 = 25 $τβ2=τγ2$ 0.87 3.17 6 3 0.92 0.93 4.95 6 5
$τβ2≠τγ2$ 0.79 3.28 6 3 0.90 0.93 5.00 7 5

c2 = 10 $τβ2=τγ2$ 0.89 3.16 6 3 0.94 0.94 4.93 5 5
$τβ2≠τγ2$ 0.87 3.23 7 3 0.92 0.94 4.95 6 5

BayesVarSel FLS 0.88 3.10 5 3 0.64 0.94 5.27 7 5
gZellner 0.90 3.10 6 3 0.66 0.94 5.26 7 5
Liangetal 0.88 3.12 6 3 0.62 0.94 5.31 7 5
Robust (default) 0.85 3.15 6 3 0.62 0.94 5.32 7 5
ZellnerSiow 0.88 3.12 6 3 0.62 0.94 5.31 7 5

100 SSVS c2 = 100 $τβ2=τγ2$ 0.89 3.14 5 3 0.73 0.96 10.21 13 10
$τβ2≠τγ2$ 0.82 3.29 9 3 0.58 0.95 10.35 12 10

c2 = 25 $τβ2=τγ2$ 0.87 3.20 6 3 0.95 0.97 9.99 11 10
$τβ2≠τγ2$ 0.91 3.13 6 3 0.95 0.97 9.99 11 10

c2 = 10 $τβ2=τγ2$ 0.92 3.09 5 3 0.96 0.96 9.96 10 10
$τβ2≠τγ2$ 0.94 3.07 5 3 0.95 0.96 9.97 11 10

BayesVarSel FLS 0.92 3.08 4 3 0.45 0.96 10.73 15 10
gZellner 0.85 3.17 5 3 0.31 0.96 11.08 15 11
Liangetal 0.85 3.17 5 3 0.31 0.96 11.09 15 11
Robust (default) 0.85 3.17 5 3 0.31 0.96 11.10 15 11
ZellnerSiow 0.85 3.17 5 3 0.31 0.96 11.09 15 11

### Table 5

High frequency models simulated data using SSVS

Model Index set Probability

Model selection Outlier detection
1 {β1, β2} {γ13, γ25, γ33, γ2, γ15} 0.99965
2 {β1} {γ13, γ25, γ33, γ2, γ15} 0.00005
3 {β1, β2} {γ13, γ25, γ2, γ15} 0.00010
4 {β2} {γ13, γ25, γ33, γ2, γ15} 0.00010
5 {β1, β2} {γ25, γ33, γ2, γ15} 0.00005
6 {β1, β2} {γ13, γ25, γ33, γ2} 0.00005

### Table 6

Estimation results simulated data using SSVS

Parameter Quantile (95%) Median Mean sd Inclusion probability

2.5% 97.5%
Variable selection β1 0.816 1.589 1.201 1.202 0.196 0.9999
β2 1.175 2.067 1.623 1.623 0.227 0.99995
β3 −0.071 0.820 0.354 0.359 0.227 0
β4 −0.487 0.227 −0.121 −0.122 0.182 0

Outlier detection γ13 −1.149 −0.639 −0.896 −0.895 0.129 0.99995
γ25 −1.180 −0.681 −0.932 −0.932 0.125 1
γ33 −1.173 −0.682 −0.927 −0.928 0.125 0.9999
γ2 −1.612 −1.123 −1.368 −1.368 0.124 1
γ15 1.140 1.691 1.418 1.418 0.140 0.99995
γ24 −0.151 0.316 0.085 0.084 0.119 0
γ44 −0.097 0.397 0.154 0.153 0.126 0
γ39 −0.433 0.068 −0.185 −0.183 0.127 0
γ32 0.000 0.480 0.239 0.240 0.122 0
γ50 0.090 0.614 0.353 0.352 0.134 0

### Table 7

High frequency models in our example data using SSVS

Model Index set Probability

Model selection Outlier detection
1 {β1, β2} {γ18} 0.9996
2 {β1, β2} { } 0.0003
3 {β2} {γ18} 0.0001

### Table 8

Estimation Results in our example data using SSVS

Parameter Quantile (95%) Median Mean Sd Inclusion probability

2.5% 97.5%
Variable selection β1 21.045 35.307 28.155 28.177 3.621 0.9999
β2 18.938 29.414 24.186 24.178 2.643 1

Outlier detection γ5 −5.043 0.273 −2.429 −2.414 1.354 0
γ17 −2.277 3.543 0.620 0.609 1.474
γ33 −1.299 4.622 1.697 1.684 1.502 0
γ26 −5.038 0.324 −2.361 −2.362 1.365 0
γ31 −6.267 −0.638 −3.527 −3.512 1.419 0
γ18 8.725 14.084 11.377 11.386 1.358 0.9997
γ11 −0.476 9.577 4.590 4.574 2.558 0

References
1. Atkinson AC (1986). [Influential observations, high leverage points, and outliers in linear regression]: comment: aspects of diagnostic regression analysis. Statistical Science, 1, 397-402.
2. Barbieri MM and Berger JO (2004). Optimal predictive model selection. The Annals of Statistics, 32, 870-897.
3. Bayarri MJ, Berger JO, Forte A, and Donato GG (2012). Criteria for Bayesian model choice with application to variable selection. The Annals of Statistics, 40, 1550-1577.
4. Belsley DA, Kuh E, and Welsch RE (1980). Regression Diagnostics, New York, Wiley.
5. Choi IH, Park CG, and Lee KE (2018). Outlier detection and variable selection via difference based regression model and penalized regression. Journal of the Korean Data & Information Science Society, 29, 815-825.
6. Donato GG and Forte A (2017). BayesVarSel : Bayes factors, model choice and variable selection in linear models R package version 1.8. 0 . Available on line access from https://cran.rproject.org/web/packages/BayesVarSel/BayesVarSel.pdf
7. George EI and McCulloch RE (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88, 881-889.
8. George EI and McCulloch RE (1997). Approaches for Bayesian variable selection. Statistica Sinica, 7, 339-373.
9. Gelman A and Rubin DB (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457-511.
10. Hoeting J, Raftery AE, and Madigan D (1996). A method for simultaneous variable selection and outlier identification in linear regression. Computational Statistics and Data Analysis, 22, 251-270.
11. Kahng MW, Kim YI, Ahn CH, and Lee YG (2016). Regression Analysis (2nd ed), Seoul, Yulgok.
12. Kim S, Park SH, and Krzanowski WJ (2008). Simultaneous variable selection and outlier identification in linear regression using the mean-shift outlier model. Journal of Applied Statistics, 35, 283-291.
13. Menjoge RS and Welsch RE (2010). A diagnostic method for simultaneous feature selection and outlier identification in linear regression. Computational Statistics and Data Analysis, 54, 3181-3193.
14. Park CG (2018). A study on robust regression estimators in heteroscedastic error models. Journal of the Korean Data & Information Science Society, 29, 339-350.
15. Park CG and Kim I (2018a). Outlier detection using difference-based variance estimators in multiple regression. Communications in Statistics - Theory and Methods, 47, 5986-6001.
16. Park CG and Kim I (2018b). Outlier detection using difference based regression model. Communications in Statistics - Theory and Methods. Under review.
17. Park CG, Kim I, and Lee Y (2012). Error variance estimation in nonparametric regression under Lipschitz condition and small sample size. Journal of Statistical Planning and Inference, 142, 2369-2385.
18. Rousseeuw PJ (1984). Least median of squares regression. Journal of the American Statistical Association, 79, 871-888.
19. Weisberg S (2004). Applied Linear Regression (3rd ed), New York, Wiley.