Penalized variable selection for accelerated failure time models

Eunyoung Park$^{a}$, Il Do Ha$^{1,a}$

$^{a}$Department of Statistics, Pukyong National University, Korea
Correspondence to: $^{1}$Department of Statistics, Pukyong National University, 45, Yongso-ro, Nam-gu, Busan 48513, Korea. E-mail: idha1353@pknu.ac.kr
Received March 19, 2018; Revised June 29, 2018; Accepted September 15, 2018.
Abstract

The accelerated failure time (AFT) model is a linear model under the log-transformation of survival time that has been introduced as a useful alternative to the proportional hazards (PH) model. In this paper we propose variable-selection procedures for fixed effects in a parametric AFT model using penalized likelihood approaches. We use three popular penalty functions: least absolute shrinkage and selection operator (LASSO), adaptive LASSO, and smoothly clipped absolute deviation (SCAD). With these procedures we can select important variables and estimate the fixed effects at the same time. The performance of the proposed method is evaluated using simulation studies, including an investigation of the impact of misspecifying the assumed distribution. The proposed method is illustrated with a primary biliary cirrhosis (PBC) data set.

Keywords : AFT model, LASSO, penalized likelihood, SCAD, variable selection
1. Introduction

In survival analysis, the accelerated failure time (AFT) model has been introduced as a useful alternative to the proportional hazards (PH) model (Lawless, 1982). In the PH model, fixed effects (e.g., regression coefficients) act multiplicatively on the hazard rate of the individual survival time. In the AFT model, however, the fixed effects act linearly on the logarithm of the individual survival time, which makes their interpretation easier than in the PH model. The AFT model is also robust against misspecification of the assumed model because of its log-linear form (Hutton and Monaghan, 2002; Ha et al., 2002). In this paper, we are interested in developing a variable-selection procedure for the AFT model. Recently, variable-selection methods using a penalized likelihood with penalty functions have been widely studied in various statistical models, such as linear models, generalized linear models (GLMs), and Cox’s (1972) PH models (Tibshirani, 1996; Fan and Li, 2001). The advantage of these methods is that they select important variables and estimate the regression coefficients of the covariates simultaneously. Selecting relevant variables from a regression model with many covariates is important in data analysis, including survival analysis.

Various penalized variable-selection methods for the semiparametric AFT model with an unspecified error distribution have been studied (Huang et al., 2006; Cai et al., 2009; Huang and Ma, 2010; Xu et al., 2010; Wang and Song, 2011; Zhang et al., 2018). Parametric survival models and their functional forms (e.g., the survival function) are simple, and they are useful in survival analysis when the model assumption is correct or when the inference is not sensitive to it. The fixed effects (i.e., regression coefficients) in a parametric AFT model with a specified distribution (e.g., lognormal or Weibull) are relatively robust against misspecification of the assumed distribution, as compared to the nuisance parameters in the random error terms (Hutton and Monaghan, 2002; Ha et al., 2002). Thus, we are interested in studying the behavior of variable selection for fixed effects under the parametric AFT model.

In this paper, we develop variable-selection procedures for fixed effects in the parametric AFT model using a penalized likelihood approach. We consider two parametric distributions that are useful in survival analysis, the lognormal and Weibull distributions. For the variable selection, we use three popular penalty functions: least absolute shrinkage and selection operator (LASSO) (Tibshirani, 1996), adaptive LASSO (ALASSO) (Zou, 2006), and smoothly clipped absolute deviation (SCAD) (Fan and Li, 2001). We also show how to derive the penalized likelihood procedure. The performance of the proposed method is evaluated using simulation studies. In particular, the simulation shows that the proposed variable-selection method is somewhat robust against misspecification of the assumed model. The proposed method is illustrated with a primary biliary cirrhosis (PBC) data set (Tibshirani, 1997), which is well known in the literature.

This paper is organized as follows. In Section 2, we briefly review the AFT model, and propose a penalized variable-selection method using AFT model, including the derivations of the estimation procedures. In Section 3, the results of simulation studies are presented to evaluate the validity of the proposed method. The proposed method is illustrated with the PBC data in Section 4. Discussion is given in Section 5. Finally, technical details are given in the Appendix.

2. Variable selection for accelerated failure time models

### 2.1. Accelerated failure time model

Let $T_i$ be the survival time (failure time) for subject $i$ ($i = 1, \ldots, n$) and let $C_i$ be the corresponding random censoring time. The AFT model describes a linear relationship between the logarithm of survival time and covariates:

$\log T_i = x_i^T \beta + \epsilon_i, \qquad (2.1)$

where $x_i = (1, x_{i1}, \ldots, x_{i,p-1})^T$ is the covariate vector of the $i$th subject, $\beta = (\beta_0, \beta_1, \ldots, \beta_{p-1})^T$ is a $p \times 1$ vector of regression coefficients corresponding to $x_i$, and $\epsilon_i$ is a random error.

For the distribution of $\epsilon_i$, we consider two popular parametric distributions, i.e., the normal and extreme value (EV) distributions. If $\epsilon_i \sim N(0, \sigma_\epsilon^2)$, having the density

$f(\epsilon_i) = (2\pi\sigma_\epsilon^2)^{-1/2} \exp\left(-\frac{\epsilon_i^2}{2\sigma_\epsilon^2}\right),$

then $T_i$ has the lognormal (LN) distribution with location parameter $x_i^T\beta$ and scale parameter $\psi = \sigma_\epsilon^2$. If $\epsilon_i$ follows an EV distribution with scale parameter $\sigma$, having the density

$f(\epsilon_i) = \sigma^{-1} \exp\left\{\left(\frac{\epsilon_i}{\sigma}\right) - \exp\left(\frac{\epsilon_i}{\sigma}\right)\right\},$

then $T_i$ follows the Weibull distribution with scale parameter $\lambda_0 = \exp\{-(x_i^T\beta)\psi\}$ and shape parameter $\psi = 1/\sigma$. In particular, the Weibull distribution is a flexible model because it is the unique distribution that satisfies both the AFT and PH models (Lawless, 1982).
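The two error distributions above can be illustrated with a short simulation. The following is a minimal sketch in Python (the function names `simulate_aft` and `weibull_survival` and the argument `scale` are illustrative assumptions, not taken from the paper). It uses the fact that if $E$ is a unit exponential variable then $\log E$ has the standard EV density $\exp\{w - \exp(w)\}$, so $\epsilon_i = \sigma \log E$ has the EV density given above. Here `scale` denotes $\sigma_\epsilon$ in the LN case and $\sigma$ in the Weibull case.

```python
import math
import random

def simulate_aft(x, beta, scale, dist="lognormal", rng=random):
    """Draw one survival time T from log T = x'beta + eps.

    dist="lognormal": eps ~ N(0, scale^2), so T is lognormal.
    dist="weibull":   eps = scale * log(E) with E ~ Exp(1), so eps/scale has
                      the standard extreme value density and T is Weibull
                      with shape parameter 1/scale.
    """
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    if dist == "lognormal":
        eps = rng.gauss(0.0, scale)
    elif dist == "weibull":
        # Guard against the (very unlikely) draw of exactly zero.
        e = max(rng.expovariate(1.0), 1e-300)
        eps = scale * math.log(e)
    else:
        raise ValueError("dist must be 'lognormal' or 'weibull'")
    return math.exp(eta + eps)

def weibull_survival(t, x, beta, sigma):
    """S(t) = exp(-lambda0 * t^psi), lambda0 = exp(-(x'beta) * psi), psi = 1/sigma."""
    psi = 1.0 / sigma
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    lam0 = math.exp(-eta * psi)
    return math.exp(-lam0 * t ** psi)
```

Comparing the empirical survival fraction of many simulated Weibull times with `weibull_survival` is one way to check the stated scale and shape parameterization numerically.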

In this paper, if $T_i$ in the AFT model (2.1) follows the LN distribution, we call the model the LN AFT model; if $T_i$ follows the Weibull distribution, we call it the Weibull AFT model. We make two usual assumptions under non-informative censoring (Ha et al., 2002; Zhou, 2005; Zhang et al., 2018):

Assumption 1

Given the covariates $x_i$, the $T_i$’s and $C_i$’s are conditionally independent, and the pairs $(T_i, C_i)$ are also conditionally independent for $i = 1, \ldots, n$.

Assumption 2

Given the covariates $x_i$, the $C_i$’s are conditionally non-informative about the $T_i$’s.

Based on these two assumptions, we make inferences as shown below.

### 2.2. Variable selection procedure

We now show how to derive a variable-selection procedure using a penalized likelihood. In survival analysis with random censoring, the observable random variables are

$Y_i = \min(\log T_i, \log C_i) \quad \text{and} \quad \delta_i = I(T_i \le C_i).$

Let $\lambda(t)$ be the hazard function of $T_i$, and let $\Lambda(t) = \int_0^t \lambda(k)\,dk$ be the corresponding cumulative hazard function. Under Assumptions 1 and 2, the log-likelihood for the AFT model (2.1) is defined by

$\ell = \ell(\theta) = \sum_{i=1}^{n} \{\delta_i \log \lambda_\theta(y_i) - \Lambda_\theta(y_i)\},$

where $\theta = (\beta^T, \psi)^T$ and $\psi$ is the parameter of the random error term $\epsilon_i$. Note that $\psi = 1/\sigma$ when $\epsilon_i$ follows the EV distribution and $\psi = \sigma_\epsilon^2$ when it follows the normal distribution.
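To make the censored log-likelihood concrete, the following is a minimal Python sketch (the function name `aft_loglik` and its arguments are assumptions for illustration). It uses the identity $\delta_i \log \lambda(y_i) - \Lambda(y_i) = \delta_i \log f(y_i) + (1 - \delta_i) \log S(y_i)$, with $f$ and $S$ the density and survival function of $Y_i$ on the log-time scale. The argument `scale` is $\sigma_\epsilon$ in the LN case (so that $\psi$ equals `scale` squared) and $\sigma$ in the Weibull case (so that $\psi$ equals one over `scale`).

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)

def aft_loglik(y, delta, X, beta, scale, dist="lognormal"):
    """Censored AFT log-likelihood on the log-time scale.

    y     : observed log-times, y_i = min(log T_i, log C_i)
    delta : event indicators (1 = event observed, 0 = censored)
    X     : list of covariate rows (first entry 1 for the intercept)
    """
    total = 0.0
    for yi, di, xi in zip(y, delta, X):
        eta = sum(xj * bj for xj, bj in zip(xi, beta))
        m = (yi - eta) / scale  # standardized residual
        if dist == "lognormal":
            log_f = -0.5 * math.log(2.0 * math.pi * scale ** 2) - 0.5 * m * m
            log_s = math.log(_STD_NORMAL.cdf(-m))  # log{1 - Phi(m)}
        elif dist == "weibull":
            log_f = m - math.log(scale) - math.exp(m)
            log_s = -math.exp(m)
        else:
            raise ValueError("dist must be 'lognormal' or 'weibull'")
        total += di * log_f + (1 - di) * log_s
    return total
```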

For variable selection of the fixed effects $\beta$ in model (2.1), we use the following penalized log-likelihood (Fan and Li, 2001), denoted by $\ell_p$:

$\ell_p = \ell_p(\theta) = \ell(\theta) - n \sum_{k=0}^{p-1} J_\gamma(|\beta_k|), \qquad (2.5)$

where $J_\gamma(\cdot)$ is a penalty function with a tuning parameter $\gamma$. A larger value of $\gamma$ tends to choose a simple model, whereas a smaller value of $\gamma$ tends to choose a complex model. We use three penalty functions, LASSO, ALASSO, and SCAD, whose forms are:

• LASSO (Tibshirani, 1996):

$J_\gamma(|\beta|) = \gamma|\beta|.$

• ALASSO (Zou, 2006):

$J_\gamma(|\beta|) = \gamma|\beta| w, \qquad (2.7)$

where $w$ is a known weight vector.

• SCAD (Fan and Li, 2001):

$J'_\gamma(|\beta|) = \gamma I(|\beta| \le \gamma) + \frac{(a\gamma - |\beta|)_+}{a - 1} I(|\beta| > \gamma),$

where $a = 3.7$ and $x_+$ denotes the positive part of $x$.
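The three penalties can be written down directly. The sketch below (function names are illustrative assumptions) codes the derivative $J'_\gamma(|\beta|)$ for each penalty, since the derivative is what enters the estimating equations, together with the SCAD penalty itself, obtained by integrating the derivative above from 0 to $|\beta|$.

```python
def lasso_deriv(b, gamma):
    """J'_gamma(|b|) for LASSO: constant in |b|."""
    return gamma

def alasso_deriv(b, gamma, w):
    """J'_gamma(|b|) for ALASSO with a known weight w for this coefficient."""
    return gamma * w

def scad_deriv(b, gamma, a=3.7):
    """J'_gamma(|b|) for SCAD: gamma near zero, tapering to 0 at |b| >= a*gamma."""
    b = abs(b)
    if b <= gamma:
        return gamma
    return max(a * gamma - b, 0.0) / (a - 1.0)

def scad_penalty(b, gamma, a=3.7):
    """SCAD penalty J_gamma(|b|), the integral of scad_deriv from 0 to |b|."""
    b = abs(b)
    if b <= gamma:
        return gamma * b
    if b <= a * gamma:
        return -(b * b - 2.0 * a * gamma * b + gamma * gamma) / (2.0 * (a - 1.0))
    return (a + 1.0) * gamma * gamma / 2.0
```

The SCAD derivative is zero for $|\beta| \ge a\gamma$, so large coefficients are not shrunk, whereas the LASSO derivative stays at $\gamma$ for every $|\beta|$.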

Figure 1 displays the shapes of the LASSO and SCAD functions under γ = 1. A good penalty function should produce estimates that satisfy unbiasedness, sparsity, and continuity (Fan and Li, 2001, 2002). The LASSO is a well-known penalty, but it does not satisfy all three properties. In contrast, Fan and Li (2001, 2002) and Zou (2006) have shown that SCAD and ALASSO satisfy the three properties and that they perform as well as the oracle procedure in terms of selecting the correct subset model and estimating the true non-zero coefficients simultaneously.

For the variable selection, we want to find the estimator $\hat\beta$ that maximizes the penalized log-likelihood $\ell_p$ in (2.5), given by

$\hat\beta = \arg\max_\beta \ell_p.$

We call the resulting estimators penalized maximum likelihood estimators (PMLEs). The PMLEs are obtained by solving the following estimating equations:

$\frac{\partial \ell_p}{\partial \beta_k} = \frac{\partial \ell}{\partial \beta_k} - n[J_\gamma(|\beta_k|)]' = 0, \quad (k = 0, 1, \ldots, p-1).$

Here we use $[J_\gamma(|\beta_k|)]' = J'_\gamma(|\beta_k|)\,\mathrm{sgn}(\beta_k) \approx \{J'_\gamma(|\beta_k^{(0)}|)/|\beta_k^{(0)}|\}\beta_k$ for $\beta_k \approx \beta_k^{(0)}$ by the local quadratic approximation (LQA) (Fan and Li, 2001), where sgn(·) is the sign function. It can be shown that the negative Hessian matrix for $\beta$ based on $\ell_p$ can be written explicitly in a simple matrix form:

$H_p = H_p(\ell_p; \beta) = -\frac{\partial^2 \ell_p}{\partial\beta\,\partial\beta^T} = X^T W X + n\Sigma_\gamma,$

where $X$ is the $n \times p$ model matrix of the covariates $x_i$ and $W$ is a weight matrix with diagonal elements $w_i$, i.e.,

$W = -\frac{\partial^2 \ell}{\partial\eta\,\partial\eta^T} = \mathrm{diag}(w_i),$

with the linear predictor $\eta = X\beta$ in the AFT model (2.1) and $\Sigma_\gamma = \mathrm{diag}\{J'_\gamma(|\beta_j|)/|\beta_j|\}$. Let $U(\cdot) = \phi(\cdot)/\{1 - \Phi(\cdot)\}$ be the hazard function of N(0, 1), where $\phi(\cdot)$ and $\Phi(\cdot)$ are the density and cumulative distribution functions of N(0, 1), respectively. In the LN AFT model, $w_i = \{\delta_i + (1 - \delta_i)\xi(m_i)\}/\sigma_\epsilon^2$, where $\xi(m_i) = U(m_i)\{U(m_i) - m_i\}$ and $m_i = (y_i - x_i^T\beta)/\sigma_\epsilon$. In the Weibull AFT model, $w_i = \Lambda_i/\sigma^2$, where $\Lambda_i = \exp(m_i)$ and $m_i = (y_i - x_i^T\beta)/\sigma$. We can obtain the PMLEs of $\beta$ by the Newton-Raphson method; its one-step formula is given by

$\hat\beta^{(1)} = \hat\beta^{(0)} + [-\ell_p''(\hat\beta^{(0)})]^{-1}\,\ell_p'(\hat\beta^{(0)}),$

where $\hat\beta^{(0)}$ is the vector of initial values of $\beta$, $\ell_p'(\beta) = \partial\ell_p(\beta)/\partial\beta$, and $-\ell_p''(\beta) = H_p(\beta)$. The nuisance parameter $\psi$ in the error term of model (2.1) is obtained from the following estimating equation:
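A minimal sketch of one LQA Newton-Raphson step for the LN AFT model is given below, with $\sigma_\epsilon$ held fixed. The function names, the small constant `eps` that stabilizes the division by $|\beta_j^{(0)}|$, and the pure-Python linear solver are assumptions for illustration; the paper's own R implementation is not reproduced here. The score with respect to $\eta_i$ used in the code, $\{\delta_i m_i + (1 - \delta_i) U(m_i)\}/\sigma_\epsilon$, follows from differentiating the LN log-likelihood, and the weights $w_i$ are those stated above.

```python
from statistics import NormalDist

N01 = NormalDist()  # standard normal N(0, 1)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def lqa_newton_step(y, delta, X, beta0, sigma, pen_deriv, eps=1e-8):
    """One step beta1 = beta0 + H_p^{-1} * penalized score (LN AFT, sigma fixed).

    pen_deriv(b) must return J'_gamma(|b|) for a single coefficient b.
    """
    n, p = len(y), len(beta0)
    score = [0.0] * p
    H = [[0.0] * p for _ in range(p)]
    for yi, di, xi in zip(y, delta, X):
        m = (yi - sum(a * b for a, b in zip(xi, beta0))) / sigma
        U = N01.pdf(m) / N01.cdf(-m)              # normal hazard U(m)
        s_i = (di * m + (1 - di) * U) / sigma      # d l / d eta_i
        w_i = (di + (1 - di) * U * (U - m)) / sigma ** 2
        for j in range(p):
            score[j] += xi[j] * s_i
            for k in range(p):
                H[j][k] += xi[j] * w_i * xi[k]      # X' W X
    for j in range(p):
        s_jj = pen_deriv(beta0[j]) / (abs(beta0[j]) + eps)  # diagonal of Sigma_gamma
        score[j] -= n * s_jj * beta0[j]
        H[j][j] += n * s_jj
    step = solve(H, score)
    return [b + d for b, d in zip(beta0, step)]
```

With no censoring and a zero penalty, the LN log-likelihood is quadratic in $\beta$, so a single step from any starting value lands on the ordinary least squares solution; this gives a simple sanity check of the update.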

$\frac{\partial \ell_p}{\partial \psi} = \frac{\partial \ell}{\partial \psi} = 0$

since the penalty function does not depend on $\psi$. More details on the estimating equations are given in the Appendix. We then compute the sandwich standard errors (Fan and Li, 2001; Ha et al., 2014) for $\hat\beta$ from the variance-covariance matrix

$\mathrm{cov}(\hat\beta) = (H_{\beta\beta} + n\Sigma_\gamma)^{-1} H_{\beta\beta} (H_{\beta\beta} + n\Sigma_\gamma)^{-1},$

where $H_{\beta\beta} = -\partial^2\ell/\partial\beta\,\partial\beta^T = X^T W X$.

Wang et al. (2007) showed that the generalized cross validation (GCV) approach cannot select the tuning parameters satisfactorily, with a nonignorable overfitting effect in the resulting model. For the selection of the tuning parameter $\gamma$, we use a Bayesian information criterion type (BIC-type) criterion (Ha et al., 2014), given by

$\mathrm{BIC}^*(\gamma) = -2\ell(\hat\beta, \hat\psi) + \log(n)\,\mathrm{df}, \qquad (2.14)$

where $\mathrm{df} = \mathrm{tr}[(H_{\beta\beta} + n\Sigma_\gamma)^{-1} H_{\beta\beta}]$ is the effective degrees of freedom.
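As a small worked case (an illustrative simplification, not part of the original derivation), suppose that $H_{\beta\beta}$ is diagonal, $H_{\beta\beta} = \mathrm{diag}(h_j)$ with $h_j > 0$, and write $s_j = J'_\gamma(|\hat\beta_j|)/|\hat\beta_j|$ for the $j$th diagonal element of $\Sigma_\gamma$. The sandwich variance and the effective degrees of freedom then reduce to

$\mathrm{var}(\hat\beta_j) = \frac{h_j}{(h_j + n s_j)^2}, \qquad \mathrm{df} = \sum_{j=0}^{p-1} \frac{h_j}{h_j + n s_j}.$

Each term of df lies in (0, 1]. A coefficient with $s_j = 0$ (for example, under SCAD when $|\hat\beta_j| \ge a\gamma$) contributes a full degree of freedom and has the unpenalized variance $1/h_j$, whereas $s_j$ grows without bound as $|\hat\beta_j| \to 0$ under LASSO or SCAD, so the contribution of a coefficient shrunk to zero vanishes.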

In summary, an outline of the proposed variable-selection algorithm is described as follows.

• Step 1. Find initial values of β and ψ.

• Step 2. In the inner loop, we maximize $\ell_p$ in (2.5) for β and ψ.

• Step 3. In the outer loop, we find γ that minimizes BIC*(γ) in (2.14).

After convergence, we compute the estimated standard errors for $\hat\beta$. For the initial values of β in LASSO and ALASSO, we use the estimates from the AFT model without penalty. For the weights $w$ of the ALASSO in (2.7), following Zhang and Lu (2007) and Wang and Song (2011), we use

$w = \frac{1}{|\tilde\beta|},$

where $\tilde\beta$ are the non-penalized coefficient estimates. Following Ha et al. (2014, 2017), for the initial values of β in SCAD, we use the LASSO solutions. Our procedures were implemented in R.
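The outer loop of the algorithm above can be sketched as a grid search over $\gamma$. In this Python sketch, `fit` is a hypothetical user-supplied callable standing in for the inner loop (Step 2); it is assumed to return the estimate, the unpenalized log-likelihood at the estimate, and the effective degrees of freedom. The helper names and the small constant `eps` are likewise assumptions for illustration.

```python
import math

def alasso_weights(beta_tilde, eps=1e-8):
    """ALASSO weights w = 1/|beta_tilde| from non-penalized estimates.

    eps (an assumption of this sketch) avoids division by zero.
    """
    return [1.0 / (abs(b) + eps) for b in beta_tilde]

def bic_star(loglik, df, n):
    """BIC*(gamma) = -2 * loglik + log(n) * df."""
    return -2.0 * loglik + math.log(n) * df

def select_gamma(gammas, fit, n):
    """Outer loop: return (BIC*, gamma, beta_hat) minimizing BIC* over a grid.

    fit(gamma) is a hypothetical inner-loop routine returning
    (beta_hat, loglik, df) for the given tuning parameter.
    """
    best = None
    for g in gammas:
        beta_hat, loglik, df = fit(g)
        crit = bic_star(loglik, df, n)
        if best is None or crit < best[0]:
            best = (crit, g, beta_hat)
    return best
```

Passing the inner loop as a callable keeps the grid search independent of the error distribution and the penalty, so the same outer loop serves the LN and Weibull models with any of the three penalties.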

3. Simulation study

Simulation studies, based upon 100 replications of simulated data, are presented to evaluate the performance of the proposed variable-selection procedure for AFT models. We compare the performance of the variable-selection methods using LASSO, ALASSO, and SCAD, and we consider the two distributions (LN and Weibull) for this purpose. Following the simulation scheme of Fan and Li (2001), we generate the data from the AFT model (2.1) with the true regression parameters

$\beta = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6, \beta_7, \beta_8)^T = (1, 0.8, 0, 0, 1, 0, 0, 0.6, 0)^T.$

Here, the covariate vector is $x = (1, x^{*T})^T$, where $x^* = (x_1, \ldots, x_8)^T$ is generated with an AR(1) structure with correlation coefficient $\rho = 0.5$. Note that $x_1$, $x_4$, and $x_7$ are the important covariates. We consider three sample sizes, n = 100, 300, and 500. The corresponding censoring times $C_i$ are generated from a uniform distribution with a parameter determined empirically to achieve a right-censoring rate of approximately 45%. As measures of variable selection, we consider the average number of zero coefficients (C and IC), the probability of choosing the true model (PT), and the mean squared error (MSE). Following Zhang and Lu (2007) and Wang and Song (2011), we summarize the median of the MSEs over 100 replications to measure prediction accuracy; it is defined by $\mathrm{MSE}(\hat\beta) = (\hat\beta - \beta)^T \Sigma (\hat\beta - \beta)$, where $\Sigma$ is the population covariance matrix of the covariates. Here, “C” (5 is the best) is the average number of the five true zero coefficients correctly set to zero, and “IC” (0 is the best) is the average number of the four true non-zero coefficients incorrectly set to zero.
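The covariate design and the selection measures can be sketched as follows. The text does not state the marginal distribution of the covariates, so standard normal margins are an assumption of this sketch, as are the function names. Per replication, `selection_counts` returns the counts that are averaged into “C” and “IC”, and the indicator that is averaged into “PT”.

```python
import math
import random

def ar1_covariates(p, rho, rng=random):
    """Draw (x_1, ..., x_p) with corr(x_j, x_k) = rho^|j-k|.

    Assumes standard normal margins (not stated in the text).
    """
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(p - 1):
        x.append(rho * x[-1] + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
    return x

def selection_counts(beta_hat, beta_true, tol=0.0):
    """Return (C, IC, true_model) for one fitted coefficient vector.

    C  : number of true zeros estimated as zero
    IC : number of true non-zeros estimated as zero
    true_model : True if exactly the true zeros are set to zero
    """
    pairs = list(zip(beta_hat, beta_true))
    c = sum(1 for bh, bt in pairs if bt == 0 and abs(bh) <= tol)
    ic = sum(1 for bh, bt in pairs if bt != 0 and abs(bh) <= tol)
    n_true_zero = sum(1 for _, bt in pairs if bt == 0)
    return c, ic, (c == n_true_zero and ic == 0)
```

For example, with the true parameter vector above, an estimate that zeroes out four of the five true zeros and none of the true non-zeros gives C = 4, IC = 0, and does not count towards PT.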

For the LN case we consider $\sigma_\epsilon^2 = 1$, and for the Weibull case σ = 0.5 (i.e., ψ = 1/σ = 2; increasing hazard), σ = 1 (i.e., ψ = 1/σ = 1; exponential distribution with constant hazard), and σ = 2 (i.e., ψ = 1/σ = 0.5; decreasing hazard). Table 1 (LN case) and Tables 2–4 (Weibull case) summarize the simulation results. Tables 1–4 indicate that the ALASSO and SCAD overall perform well as compared to the LASSO. The ALASSO and SCAD methods improve further as n increases, while the LASSO method does not. In particular, the SCAD method outperforms the LASSO and ALASSO in terms of “C”, “PT”, and “MSE”.

We also investigated the robustness of the proposed method when the true distribution of ϵ in the AFT model (2.1) is misspecified. Following Xu et al. (2010), we considered two misspecified distributions, a t-distribution with 3 degrees of freedom (denoted by t3) and a mixture distribution 0.5N(0, 1) + 0.5N(0, 9) (denoted by Mix). Here, t3 and Mix are non-normal distributions with a common mean of 0, but their variances are 3 and 5, respectively. For this purpose, the LN AFT model is fitted when the distribution of ϵ is N(0, 1), t3, or Mix. Investigating the behavior of the fitted LN AFT model is interesting because it becomes a classical normal regression model under the log-transformation of survival time, and its covariate effects are estimated without bias under no censoring even if the baseline distribution is misspecified (Hutton and Monaghan, 2002). The simulation scheme is the same as before, except that we also consider a high censoring rate of 70%. Table 5 summarizes the results for a moderate sample size, n = 300; it shows that fitting the proposed LN AFT model is overall robust against the misspecified distributions t3 and Mix. As expected, the MSEs increase as the censoring rate rises from 45% to 70%. We find that the proposed method is still robust when the censoring rate is as high as 70%, except for a higher IC under SCAD with the Mix distribution.
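The two misspecified error distributions are easy to generate, and their stated variances follow directly: Var(t3) = 3/(3 − 2) = 3 and Var(Mix) = 0.5 × 1 + 0.5 × 9 = 5. The following Python sketch (function names are assumptions) draws t3 as a standard normal divided by the square root of an independent chi-squared variable with 3 degrees of freedom over 3, and draws Mix by choosing a standard deviation of 1 or 3 with equal probability.

```python
import math
import random

def draw_t3(rng=random):
    """One draw from the t-distribution with 3 degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))  # chi-squared, 3 df
    return z / math.sqrt(chi2 / 3.0)

def draw_mix(rng=random):
    """One draw from the mixture 0.5 N(0, 1) + 0.5 N(0, 9)."""
    sd = 1.0 if rng.random() < 0.5 else 3.0  # N(0, 9) has standard deviation 3
    return rng.gauss(0.0, sd)
```

Because t3 has an infinite fourth moment, its sample variance converges slowly; checking a tail probability is a more stable way to validate the generator than checking the variance.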

In addition, we investigated the robustness of the Weibull AFT model against misspecification of the distribution. Here, the Weibull AFT model is fitted when the distribution of ϵ is non-Weibull (i.e., t3 or Mix) under the same simulation scheme as above. We again find that the simulation results (not shown) are similar to those evident in

4. Illustration

To illustrate the proposed method in Section 2, we consider the PBC data of the liver (Tibshirani, 1997). A total of 424 PBC patients met the eligibility criteria for the randomized placebo-controlled trial of the drug D-penicillamine. Here we consider the 312 patients who participated in the randomized trial. The censoring rate due to survival was 59.8%. Table 6 summarizes the variables used in the analysis. For the analyses, all covariates (i.e., all variables except for Id, Futime, and Status in Table 6) are standardized.

As presented above, we consider the two AFT models (i.e., the LN and Weibull cases) with the covariates in Table 6. First, we use two standard criteria for model selection, the Akaike information criterion (AIC) and the BIC, given by $\mathrm{AIC} = -2\ell + 2p$ and $\mathrm{BIC} = -2\ell + \log(n)\,p$. We conduct model selection under no penalty and choose the model with the lower AIC and BIC values. Table 7 shows the results.
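As a worked check of these criteria against the values reported in Table 7 (the parameter count below is inferred from the reported numbers, not stated in the text), the LN model has $\ell = -195.41$, so

$-2\ell_{\mathrm{LN}} = 390.82, \qquad \mathrm{AIC}_{\mathrm{LN}} = 390.82 + 2p = 426.82 \;\Rightarrow\; p = 18,$

and the Weibull model gives $-2\ell_{\mathrm{W}} = 395.82$ and $\mathrm{AIC}_{\mathrm{W}} = 395.82 + 36 = 431.82$. Because both models have the same $p$, the penalty terms cancel in the comparison and both criteria differ by the same amount:

$\mathrm{AIC}_{\mathrm{W}} - \mathrm{AIC}_{\mathrm{LN}} = \mathrm{BIC}_{\mathrm{W}} - \mathrm{BIC}_{\mathrm{LN}} = -2(\ell_{\mathrm{W}} - \ell_{\mathrm{LN}}) = 5.00,$

which agrees with the reported differences (5.00 for AIC and 4.99 for BIC, the latter up to rounding).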

From Table 7, we select the LN AFT model because its AIC and BIC values are both smaller than those of the Weibull model. We also checked the adequacy of the lognormal assumption for survival time. This can be checked by a normal hazard plot (Klein and Moeschberger, 2003, p.410), i.e., we plot $\Phi^{-1}(1 - \hat S_0(t))$ versus $\log t$, as shown in Figure 2. Here, $\Phi^{-1}(\cdot)$ is the inverse of the standard normal cumulative distribution function (i.e., the probit function) and $\hat S_0(t)$ is the Kaplan-Meier estimate of the baseline survival function $S_0(t)$. The plot is expected to be approximately a straight line if the assumption of a lognormal distribution is appropriate. Figure 2 shows an approximately linear trend for the probit survival against the log of time. Therefore, the assumption of the lognormal as the baseline distribution seems appropriate; see also Royston (2001) for the usefulness of the lognormal AFT model. Accordingly, we use the LN AFT model for the variable selection.
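The coordinates of such a normal hazard plot can be computed as in the following Python sketch (function names are assumptions for illustration). It computes the Kaplan-Meier estimate at each distinct event time and applies the probit transform to one minus the estimate; points where the estimate equals 0 or 1 are dropped because the probit is not finite there.

```python
import math
from statistics import NormalDist

N01 = NormalDist()  # standard normal N(0, 1)

def kaplan_meier(times, events):
    """Return [(t, S_hat(t))] at the distinct event times.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0  # events and total subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            c += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            out.append((t, surv))
        n_at_risk -= c
    return out

def normal_hazard_plot_points(times, events):
    """Points (log t, probit(1 - S_hat(t))); roughly linear under lognormality."""
    pts = []
    for t, s in kaplan_meier(times, events):
        if 0.0 < s < 1.0:
            pts.append((math.log(t), N01.inv_cdf(1.0 - s)))
    return pts
```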

Table 8 shows the estimated coefficients and SEs for the PBC data in the LN case. In the penalized variable selection, the values of the tuning parameter γ that minimize the BIC* in (2.14) are 0.073 for LASSO, 0.013 for ALASSO, and 0.110 for SCAD. The estimates of $\sigma_\epsilon^2$ are 0.850, 0.629, 0.697, and 0.727 under no penalty (γ = 0), LASSO, ALASSO, and SCAD, respectively. The LASSO chooses eleven covariates (Age, Sex, Ascites, Spiders, Edema, Bili, Albumin, Copper, Sgot, Protime, and Stage) out of the 17 covariates, excluding the intercept. We also confirm that these variable-selection results are similar to the LASSO results in Cox’s PH model by Tibshirani (1997), although the signs of the estimates are opposite. The SCAD chooses eight covariates (Age, Edema, Bili, Albumin, Copper, Sgot, Protime, and Stage). The ALASSO selects one more variable than SCAD (i.e., Ascites), which is not significant under no penalty. In particular, the non-zero estimates by the SCAD are generally similar to the corresponding estimates under no penalty. The LASSO selects many covariates that are not significant under no penalty. This may be because the LASSO selects more unimportant variables than ALASSO and SCAD, as is evident in the lower “C” values of the LASSO in Table 1. The findings indicate that the LASSO might not properly identify important variables in AFT models; for frailty models, see Ha et al. (2014).

5. Discussion

Using a penalized likelihood approach, we have presented procedures that select important variables in the AFT model. We have demonstrated via simulation studies and an illustration that the proposed variable-selection methods generally work well. We have found that the SCAD method performs better than the LASSO and ALASSO methods. These results confirm those for semi-parametric frailty hazard models by Ha et al. (2014).

The AFT model has some advantages over Cox’s PH model (Ha et al., 2017, pp.31–32): (i) the AFT model does not require the PH assumption (i.e., a strong assumption) of Cox’s model; (ii) the interpretation of regression coefficients is easier in the AFT model than in Cox’s model; (iii) the estimated regression parameters in the AFT model are relatively robust against misspecification of the model assumption, while those in Cox’s model can be biased. In addition, as reported by Reid (1994), Cox pointed out that “AFT models are in many ways more appealing” than the PH models “because of their quite direct physical interpretation”.

We have also demonstrated via a simulation study that the proposed method is somewhat robust against misspecification of the assumed distribution. It would also be interesting to investigate the robustness of the LN or Weibull AFT model against further misspecified distributions, for example, when the true distribution of the survival time T is not smooth and has change points. In addition, a comparison with existing variable-selection procedures for the semiparametric AFT model would be more informative about the settings in which the proposed method is useful; this would be interesting future work.

We have developed the variable-selection methods in AFT models with low-dimensional covariates (n > p). Developing the penalized AFT models with high-dimensional covariates (n < p) would be an interesting topic. The proposed methods are based on parametric penalized-likelihood approaches that allow for LN and Weibull distributions. Therefore, an extension to semi-parametric AFT models (Huang et al., 2006; Huang and Ma, 2010) with an unspecified error distribution would be suitable for further work.

Furthermore, the proposed method can be extended to AFT models allowing for random effects that can be useful for analyzing correlated survival data (Ha et al., 2017).

Figures
Fig. 1. Penalty functions of LASSO and SCAD. LASSO = least absolute shrinkage and selection operator; SCAD = smoothly clipped absolute deviation.
Fig. 2. Plot of probit(1 – survival) against log of days.
TABLES

### Table 1

Simulation results under LN AFT model ($σϵ2=1$)

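| n | Method | C | IC | PT | MSE |
|---|---|---|---|---|---|
| 100 | LASSO | 2.62 | 0.00 | 0.02 | 0.132 |
| | ALASSO | 4.18 | 0.00 | 0.41 | 0.079 |
| 300 | LASSO | 2.42 | 0.00 | 0.00 | 0.052 |
| | ALASSO | 4.39 | 0.00 | 0.45 | 0.021 |
| 500 | LASSO | 2.68 | 0.00 | 0.03 | 0.032 |
| | ALASSO | 4.50 | 0.00 | 0.59 | 0.015 |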

LN= lognormal; AFT = accelerated failure time; MSE= mean squared error; LASSO = least absolute shrinkage and selection operator; ALASSO = adaptive LASSO; SCAD = smoothly clipped absolute deviation.

### Table 2

Simulation results under Weibull AFT model (σ = 0.5)

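| n | Method | C | IC | PT | MSE |
|---|---|---|---|---|---|
| 100 | LASSO | 2.18 | 0 | 0.00 | 0.569 |
| | ALASSO | 4.29 | 0 | 0.51 | 0.027 |
| 300 | LASSO | 2.43 | 0 | 0.02 | 0.017 |
| | ALASSO | 4.53 | 0 | 0.63 | 0.008 |
| 500 | LASSO | 2.57 | 0 | 0.02 | 0.011 |
| | ALASSO | 4.66 | 0 | 0.72 | 0.005 |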

AFT = accelerated failure time; MSE= mean squared error; LASSO = least absolute shrinkage and selection operator; ALASSO = adaptive LASSO; SCAD = smoothly clipped absolute deviation.

### Table 3

Simulation results under Weibull AFT model (σ = 1)

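| n | Method | C | IC | PT | MSE |
|---|---|---|---|---|---|
| 100 | LASSO | 2.41 | 0 | 0.02 | 0.203 |
| | ALASSO | 4.02 | 0 | 0.35 | 0.112 |
| 300 | LASSO | 2.59 | 0 | 0.03 | 0.074 |
| | ALASSO | 4.47 | 0 | 0.52 | 0.034 |
| 500 | LASSO | 2.51 | 0 | 0.04 | 0.052 |
| | ALASSO | 4.52 | 0 | 0.58 | 0.018 |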

AFT = accelerated failure time; MSE= mean squared error; LASSO = least absolute shrinkage and selection operator; ALASSO = adaptive LASSO; SCAD = smoothly clipped absolute deviation.

### Table 4

Simulation results under Weibull AFT model (σ = 2)

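| n | Method | C | IC | PT | MSE |
|---|---|---|---|---|---|
| 100 | LASSO | 2.95 | 0.12 | 0.05 | 0.746 |
| | ALASSO | 4.05 | 0.32 | 0.30 | 0.596 |
| 300 | LASSO | 2.84 | 0.00 | 0.06 | 0.268 |
| | ALASSO | 4.42 | 0.01 | 0.50 | 0.166 |
| 500 | LASSO | 2.92 | 0.00 | 0.02 | 0.212 |
| | ALASSO | 4.48 | 0.00 | 0.55 | 0.091 |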

AFT = accelerated failure time; MSE= mean squared error; LASSO = least absolute shrinkage and selection operator; ALASSO = adaptive LASSO; SCAD = smoothly clipped absolute deviation.

### Table 5

Simulation results of fitting LN AFT model (n = 300) when the normal assumption is satisfied or violated

| Error | Censoring | Method | C | IC | PT | MSE |
|---|---|---|---|---|---|---|
| N(0, 1) | 45% | LASSO | 2.42 | 0.00 | 0.00 | 0.052 |
| | | ALASSO | 4.39 | 0.00 | 0.45 | 0.021 |
| | 70% | LASSO | 2.40 | 0.00 | 0.02 | 0.128 |
| | | ALASSO | 4.38 | 0.00 | 0.45 | 0.045 |
| t3 | 45% | LASSO | 2.80 | 0.00 | 0.00 | 0.080 |
| | | ALASSO | 4.51 | 0.00 | 0.56 | 0.046 |
| | 70% | LASSO | 2.61 | 0.00 | 0.00 | 0.176 |
| | | ALASSO | 4.32 | 0.01 | 0.43 | 0.109 |
| Mix | 45% | LASSO | 2.90 | 0.00 | 0.04 | 0.139 |
| | | ALASSO | 4.58 | 0.01 | 0.65 | 0.097 |
| | 70% | LASSO | 2.88 | 0.00 | 0.02 | 0.252 |
| | | ALASSO | 4.42 | 0.02 | 0.52 | 0.192 |

Note: t3, t-distribution with 3 degrees of freedom; Mix, mixture distribution 0.5N(0, 1) + 0.5N(0, 9).

LN= lognormal; AFT = accelerated failure time; MSE= mean squared error; LASSO = least absolute shrinkage and selection operator; ALASSO = adaptive LASSO; SCAD = smoothly clipped absolute deviation.

### Table 6

Explanation of variables for primary biliary cirrhosis data

| Variable | Explanation |
|---|---|
| Id | Case number |
| Futime | Number of days from registration to death |
| Status | Status at endpoint (0: survival (59.8%), 1: death) |
| Drug | Type of drug (1: D-penicillamine, 2: placebo) |
| Age | In years |
| Sex | Sex (0: male, 1: female) |
| Ascites | Presence of ascites (0: no, 1: yes) |
| Hepato | Presence of hepatomegaly or enlarged liver (0: no, 1: yes) |
| Spiders | Blood vessel malformations in the skin (0: no, 1: yes) |
| Edema | Presence of edema (0: no edema, 0.5: untreated or successfully treated, 1: edema despite diuretic therapy) |
| Bili | Serum bilirubin (mg/dl) |
| Chol | Serum cholesterol (mg/dl) |
| Albumin | Serum albumin (g/dl) |
| Copper | Urine copper (ug/day) |
| Alk_phos | Alkaline phosphatase (U/liter) |
| Sgot | SGOT (U/ml) |
| Trig | Triglycerides (mg/dl) |
| Platelet | Platelets per cubic (ml/1000) |
| Protime | Prothrombin time |
| Stage | Histologic stage of disease |

### Table 7

Model selection for AFT model with primary biliary cirrhosis data

| Model | $\ell$ | AIC | BIC |
|---|---|---|---|
| LN | −195.41 | 426.82 | 492.00 |
| Weibull | −197.91 | 431.82 | 496.99 |

AFT = accelerated failure time; AIC = Akaike information criterion; BIC = Bayesian information criterion; LN = lognormal.

### Table 8

Variable selection using LN AFT model for primary biliary cirrhosis data

| Variable | No penalty | LASSO | LASSO† | ALASSO | SCAD |
|---|---|---|---|---|---|
| Intercept | 8.073(0.086) | 7.885(0.060) | - | 7.994(0.065) | 7.989(0.066) |
| Drug | −0.002(0.069) | 0.000(0.000) | 0.00(0.00) | 0.000(0.000) | 0.000(0.000) |
| Age | −0.221(0.080) | −0.139(0.039) | 0.17(0.09) | −0.179(0.047) | −0.099(0.028) |
| Sex | 0.091(0.068) | 0.016(0.011) | −0.01(0.03) | 0.000(0.000) | 0.000(0.000) |
| Ascites | −0.112(0.076) | −0.092(0.032) | 0.04(0.07) | −0.023(0.009) | 0.000(0.000) |
| Hepato | −0.005(0.080) | 0.000(0.000) | 0.00(0.00) | 0.000(0.000) | 0.000(0.000) |
| Spiders | −0.116(0.072) | −0.051(0.024) | 0.02(0.05) | 0.000(0.000) | 0.000(0.000) |
| Edema | −0.185(0.081) | −0.191(0.042) | 0.18(0.11) | −0.246(0.046) | −0.304(0.053) |
| Bili | −0.202(0.086) | −0.204(0.043) | 0.35(0.12) | −0.244(0.047) | −0.306(0.053) |
| Chol | −0.048(0.074) | 0.000(0.000) | 0.00(0.00) | 0.000(0.000) | 0.000(0.000) |
| Albumin | 0.106(0.077) | 0.100(0.034) | −0.22(0.10) | 0.029(0.011) | 0.051(0.018) |
| Copper | −0.148(0.073) | −0.152(0.040) | 0.21(0.11) | −0.143(0.037) | −0.116(0.031) |
| Alk_phos | −0.040(0.061) | 0.000(0.000) | 0.00(0.00) | 0.000(0.000) | 0.000(0.000) |
| Sgot | −0.187(0.075) | −0.103(0.035) | 0.09(0.08) | −0.118(0.038) | −0.030(0.012) |
| Trig | 0.022(0.072) | 0.000(0.000) | 0.00(0.00) | 0.000(0.000) | 0.000(0.000) |
| Platelet | 0.004(0.072) | 0.000(0.000) | 0.00(0.00) | 0.000(0.000) | 0.000(0.000) |
| Protime | −0.167(0.073) | −0.123(0.038) | 0.09(0.09) | −0.133(0.038) | −0.080(0.024) |
| Stage | −0.244(0.091) | −0.181(0.044) | 0.21(0.09) | −0.259(0.055) | −0.275(0.057) |

† indicates the results of variable selection from Cox’s PH model by Tibshirani (1997).

LN = lognormal; AFT = accelerated failure time; LASSO = least absolute shrinkage and selection operator; ALASSO = adaptive LASSO; SCAD = smoothly clipped absolute deviation.

References
1. Cai, T, Huang, J, and Tian, L (2009). Regularized estimation for the accelerated failure time model. Biometrics. 65, 394-404.
2. Cox, DR (1972). Regression models and life-tables. Journal of the Royal Statistical Society-Series B. 34, 187-220.
3. Fan, J, and Li, R (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association. 96, 1348-1360.
4. Fan, J, and Li, R (2002). Variable selection for Cox’s proportional hazards model and frailty model. The Annals of Statistics. 30, 74-99.
5. Ha, ID, Jeong, JH, and Lee, Y (2017). Statistical Modelling of Survival Data with Random Effects: h-Likelihood Approach. Singapore: Springer.
6. Ha, ID, Lee, Y, and Song, JK (2002). Hierarchical likelihood approach for mixed linear models with censored data. Lifetime Data Analysis. 8, 163-176.
7. Ha, ID, Pan, J, Oh, S, and Lee, Y (2014). Variable selection in general frailty models using penalized h-likelihood. Journal of Computational and Graphical Statistics. 23, 1044-1060.
8. Huang, J, and Ma, S (2010). Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Analysis. 16, 176-195.
9. Huang, J, Ma, S, and Xie, H (2006). Regularized estimation in the accelerated failure time model with high-dimensional covariates. Biometrics. 62, 813-820.
10. Hutton, JL, and Monaghan, PF (2002). Choice of parametric accelerated life and proportional hazard models for survival data: asymptotic results. Lifetime Data Analysis. 8, 375-393.
11. Klein, JP, and Moeschberger, ML (2003). Survival Analysis: Techniques for Censored and Truncated Data. Berlin: Springer.
12. Lawless, JF (1982). Statistical Models and Methods for Lifetime Data. New York: Wiley.
13. Reid, N (1994). A conversation with Sir David Cox. Statistical Science. 9, 439-455.
14. Royston, P (2001). The lognormal distribution as a model for survival time in cancer, with an emphasis on prognostic factors. Statistica Neerlandica. 55, 89-104.
15. Tibshirani, R (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society B. 58, 267-288.
16. Tibshirani, R (1997). The LASSO method for variable selection in the Cox model. Statistics in Medicine. 16, 385-395.
17. Wang, H, Li, R, and Tsai, CL (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika. 94, 553-568.
18. Wang, X, and Song, L (2011). Adaptive lasso variable selection for the accelerated failure models. Communications in Statistics - Theory and Methods. 40, 4372-4386.
19. Xu, J, Leng, C, and Ying, Z (2010). Rank-based variable selection with censored data. Statistics and Computing. 20, 165-176.
20. Zhang, HH, and Lu, W (2007). Adaptive LASSO for Cox’s proportional hazards model. Biometrika. 94, 691-703.
21. Zhang, Z, Sinha, S, Maiti, T, and Shipp, E (2018). Bayesian variable selection in the accelerated failure time model with an application to the surveillance, epidemiology, and end results breast cancer data. Statistical Methods in Medical Research. 27, 971-990.
22. Zhou, M (2005). Empirical likelihood analysis of the rank estimator for the censored accelerated failure time model. Biometrika. 92, 492-498.
23. Zou, H (2006). The adaptive Lasso and its oracle properties. Journal of the American Statistical Association. 101, 1418-1429.