
Linear discriminant analysis (LDA) requires an estimate of the inverse of the conditional covariance matrix, for which the pooled sample covariance matrix is a natural choice. However, the pooled sample covariance matrix is singular when the model is high-dimensional, that is, when the number of predictive variables exceeds the sample size, and the performance of LDA deteriorates (Krzanowski
Meanwhile, there is a direct connection between the LDA and linear regression (Hastie
In this paper, we propose the penalized LDA with the moderately clipped LASSO (MCL) (Kwon
We prove that, with probability tending to one, the MCL penalized LDA coincides with the oracle LASSO, a theoretically optimal LASSO obtained by using only the relevant predictive variables. A similar equivalence for the LASSO penalized LDA was proved by Mai
The rest of the paper is organized as follows. Section 2 introduces the penalized LDA. Section 3 introduces the MCL and its statistical properties. Section 4 presents the results of the numerical studies. Relevant proofs are given in the
Let
which minimizes the misclassification error,
where
where
is the Bayes direction vector with
Let
where
where the class probabilities are estimated by the sample class proportions,
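With these plug-in estimates, the resulting classification rule takes the familiar discriminant form. The sketch below is illustrative only: the function name, the two-class coding, and the identity-covariance example are assumptions, while the paper's own displays define the exact quantities.

```python
import numpy as np

def lda_classify(x, mu1, mu2, Sigma_inv, pi1, pi2):
    """Plug-in LDA rule: assign class 1 when the discriminant score is positive."""
    beta = Sigma_inv @ (mu1 - mu2)                      # estimated direction vector
    score = (x - (mu1 + mu2) / 2) @ beta + np.log(pi1 / pi2)
    return 1 if score > 0 else 2

# toy example with identity covariance and equal class proportions
mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
print(lda_classify(np.array([2.0, 0.0]), mu1, mu2, np.eye(2), 0.5, 0.5))  # → 1
```

Replacing the population quantities with the sample means, the pooled covariance matrix, and the sample class proportions gives the sample LDA rule discussed above.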
There is an intimate connection (Hastie
for some constant
ary and
which implies that we can cast the LDA into the framework of the LSE.
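This connection can be verified numerically: regressing a two-class indicator on the predictors by least squares produces a coefficient vector exactly proportional to the estimated LDA direction. The sketch below (sample sizes, dimensions, and the label coding are illustrative assumptions) checks the proportionality with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, p = 60, 40, 5
X1 = rng.normal(1.0, 1.0, size=(n1, p))   # class 1 sample
X2 = rng.normal(0.0, 1.0, size=(n2, p))   # class 2 sample
X = np.vstack([X1, X2])
y = np.concatenate([np.ones(n1), -np.ones(n2)])  # any two-value coding works

# least-squares fit with an intercept; keep the slope part only
Z = np.column_stack([np.ones(n1 + n2), X])
beta = np.linalg.lstsq(Z, y, rcond=None)[0][1:]

# LDA direction: inverse pooled covariance times the mean difference
S = ((n1 - 1) * np.cov(X1, rowvar=False)
     + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
d = np.linalg.solve(S, X1.mean(axis=0) - X2.mean(axis=0))

# the two vectors are proportional, so their cosine similarity is 1
cos = beta @ d / (np.linalg.norm(beta) * np.linalg.norm(d))
print(round(abs(cos), 6))  # → 1.0
```

The proportionality is exact in finite samples (a Sherman-Morrison argument relates the total and pooled scatter matrices), which is what justifies casting the LDA into the LSE framework.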
The equations in (
for some
In addition, Mai
whenever
A natural extension of the LASSO in (
where
and
for some
Note that ∇
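The defining feature of the MCL can be sketched numerically. In the usual parametrization (stated here as an assumption, since the paper's display is not reproduced above), the MCL derivative is the MCP derivative clipped from below at a smaller LASSO level λ2 ≤ λ1, so the penalty selects like the MCP near zero while applying a constant LASSO-type shrinkage to large coefficients:

```python
import numpy as np

def mcl_deriv(t, lam1, lam2, gamma):
    """Derivative of the MCL penalty in |t| (assumed parametrization):
    the MCP derivative (lam1 - |t|/gamma)_+ clipped below at lam2 <= lam1."""
    mcp = np.maximum(lam1 - np.abs(t) / gamma, 0.0)
    return np.maximum(mcp, lam2)

print(mcl_deriv(0.0, 1.0, 0.2, 3.0))   # MCP-like level near zero: 1.0
print(mcl_deriv(10.0, 1.0, 0.2, 3.0))  # constant LASSO shrinkage far from zero: 0.2
```

Under this parametrization, setting λ2 = 0 recovers the MCP and letting γ → ∞ recovers the LASSO with level λ1, which matches the description of the MCL as interpolating between the two penalties.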
In this subsection, we provide some asymptotic properties for MCL. We assume that there is a nonempty subset such that
and
. The main results imply that MCL is asymptotically equivalent to a theoretical estimator, the oracle LASSO:
for some , which is unknown in practice. However, the oracle LASSO plays an important role in developing the asymptotic properties of the MCL, as studied in Mai
Before proceeding, we define some notation. For any vector , let
. For any matrix
and
, let
, and
the cardinality of
.
We first introduce a lemma that gives sufficient conditions for the uniqueness of a minimizer of
.
Lemma 1 is a slight modification of the second order Karush-Kuhn-Tucker sufficient conditions for
Note that
Lemma 1 implies that
,
.
Lemma 2 gives sufficient conditions for a minimizer to be unique in
.
Note that, under the first condition in Lemma 3, the second condition in Lemma 3 is equivalent to the first condition in Lemma 2, so that Lemma 3 is a corollary of Lemma 2.
We now present the main results, which show that the oracle LASSO asymptotically satisfies the conditions in Lemma 3, so that the oracle LASSO is equivalent to the MCL defined in (
(C1) There are positive constants,
(C2) The model and tuning parameters satisfy
as and
(C3) There exists a sequence
as
Condition (C1) is given by Mai of
which corresponds to the sparse Riesz condition in Zhang (2010) imposed on the design matrix in the linear regression.
Theorem 1 holds for
The conditions (C2) and (C3) can be simplified as
if
for the high-dimensional linear regression (Kwon
In this section, we present the results of numerical studies, including simulations and a real data analysis. We obtained all the penalized estimators using the R package
The simulation studies were based on the conditional distribution in (
We set
For each method, we used
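Tables 1 and 2 report a Bayes error rate of 0.2. A two-class Gaussian design calibrated to that error can be sketched as follows; the dimensions, sparsity level, and identity covariance here are illustrative assumptions, not the paper's exact simulation setting.

```python
import numpy as np
from statistics import NormalDist

p, s, n = 1000, 5, 300                     # illustrative sizes (assumed)
delta = 2 * NormalDist().inv_cdf(0.8)      # mean separation giving Bayes error 0.2

mu = np.zeros(p)
mu[:s] = delta / np.sqrt(s)                # sparse mean-difference vector

rng = np.random.default_rng(0)
labels = rng.integers(1, 3, size=n)        # class labels in {1, 2}
signs = np.where(labels == 1, 0.5, -0.5)[:, None]
X = rng.standard_normal((n, p)) + signs * mu   # identity covariance assumed

# with identity covariance, the Bayes error is Phi(-||mu||/2) = 0.2
print(round(NormalDist().cdf(-float(np.linalg.norm(mu)) / 2), 4))  # → 0.2
```

Only the first s coordinates of the mean-difference vector are nonzero, so the Bayes direction vector is sparse, which is the setting the TPS/FPS/CMI measures in the tables evaluate.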
Table 1 shows the results when
Table 2 shows the results when
We conclude that the MCL is an attractive alternative to the LASSO for high-dimensional penalized LDA. The MCL correctly identifies the sparse Bayes direction vector while keeping almost the same prediction accuracy as the LASSO. MCL1 performed well across all simulation designs considered in this paper, which supports the recommendation for the heuristic choice of
The R package
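The 400/800/1600 entries in Table 3 appear to be the numbers of variables retained before fitting; a typical preprocessing step in this literature (an assumption here, not a transcription of the paper's procedure) keeps the genes with the largest absolute two-sample t-statistics:

```python
import numpy as np

def screen_top_m(X, y, m):
    """Keep the m columns with the largest absolute two-sample t-statistics
    (a common screening step before penalized LDA; assumed here)."""
    X1, X2 = X[y == 1], X[y == 2]
    n1, n2 = len(X1), len(X2)
    s2 = ((n1 - 1) * X1.var(0, ddof=1)
          + (n2 - 1) * X2.var(0, ddof=1)) / (n1 + n2 - 2)  # pooled variances
    t = (X1.mean(0) - X2.mean(0)) / np.sqrt(s2 * (1 / n1 + 1 / n2))
    keep = np.argsort(-np.abs(t))[:m]
    return np.sort(keep)

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 2000))
y = np.repeat([1, 2], 20)
X[:20, :3] += 2.0                       # make the first three genes informative
print(screen_top_m(X, y, 400)[:3])      # the informative genes survive screening
```

The penalized estimators are then fit on the retained columns only, which keeps the working dimension at a manageable size.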
Table 3 summarizes the results. In most cases, the LASSO had the best prediction accuracy while selecting the most variables. The MCPs had the worst prediction accuracy but selected the fewest variables. The best prediction accuracy occurred when
In this paper, we studied high-dimensional penalized LDA with the MCL. By construction, the MCL produces the same shrinkage effect as the LASSO while making the selection process the same as that of the MCP. Therefore, the MCL attains prediction accuracy similar to or better than the LASSO while correctly recovering the sparsity of the direction vector. We proved that the MCL is selection consistent under reasonable regularity conditions, which was supported by various numerical experiments. One disadvantage of the MCL compared with the LASSO may be the additional tuning parameter
Averages of the four measures when
 | | 300 | 600 | 900 | 300 | 600 | 900 | 300 | 600 | 900 | 300 | 600 | 900
---|---|---|---|---|---|---|---|---|---|---|---|---|---
TPS | Bayes | 5 | 5 | 5 | 5 | 5 | 5 | 10 | 10 | 10 | 10 | 10 | 10 |
Lasso | 5.000 | 5.000 | 5.000 | 4.990 | 5.000 | 5.000 | 9.760 | 10.000 | 10.000 | 9.140 | 9.985 | 10.000 | |
MCL1 | 4.995 | 5.000 | 5.000 | 4.945 | 5.000 | 5.000 | 9.025 | 9.970 | 10.000 | 8.110 | 9.790 | 9.995 | |
MCL2 | 4.905 | 5.000 | 5.000 | 4.610 | 4.995 | 5.000 | 7.785 | 9.855 | 9.985 | 5.615 | 9.285 | 9.915 | |
MCL3 | 3.975 | 4.980 | 5.000 | 2.645 | 4.795 | 4.995 | 3.405 | 8.205 | 9.640 | 1.225 | 5.210 | 8.255 | |
MCP1 | 4.865 | 5.000 | 5.000 | 4.580 | 5.000 | 5.000 | 6.595 | 9.580 | 9.970 | 5.200 | 9.050 | 9.855 | |
MCP2 | 4.965 | 5.000 | 5.000 | 4.900 | 5.000 | 5.000 | 8.450 | 9.940 | 9.995 | 7.440 | 9.710 | 9.975 | |
MCP3 | 4.980 | 5.000 | 5.000 | 4.940 | 5.000 | 5.000 | 9.155 | 9.970 | 10.000 | 8.250 | 9.860 | 9.995 | |
FPS | Bayes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Lasso | 22.620 | 22.535 | 22.770 | 24.500 | 24.930 | 24.565 | 41.920 | 44.765 | 42.535 | 41.700 | 45.615 | 45.305 | |
MCL1 | 1.655 | 1.555 | 1.605 | 2.170 | 1.895 | 2.435 | 5.060 | 1.590 | 1.110 | 9.765 | 2.670 | 1.200 | |
MCL2 | 0.065 | 0.015 | 0.005 | 0.100 | 0.015 | 0.015 | 0.425 | 0.100 | 0.020 | 0.430 | 0.105 | 0.025 | |
MCL3 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.015 | 0.000 | 0.000 | 0.010 | 0.000 | 0.000 | |
MCP1 | 0.705 | 0.385 | 0.370 | 0.780 | 0.410 | 0.355 | 1.420 | 1.050 | 0.410 | 1.275 | 1.320 | 0.735 | |
MCP2 | 6.720 | 3.795 | 1.985 | 7.180 | 5.175 | 2.835 | 8.560 | 12.285 | 9.380 | 8.885 | 12.270 | 10.625 | |
MCP3 | 8.790 | 7.670 | 5.885 | 10.090 | 9.010 | 7.335 | 15.790 | 16.195 | 13.510 | 16.385 | 18.165 | 15.685 | |
CMI | Bayes | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Lasso | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
MCL1 | 0.610 | 0.550 | 0.585 | 0.455 | 0.550 | 0.590 | 0.020 | 0.435 | 0.590 | 0.000 | 0.300 | 0.595 | |
MCL2 | 0.855 | 0.985 | 0.995 | 0.620 | 0.980 | 0.985 | 0.060 | 0.790 | 0.965 | 0.000 | 0.430 | 0.900 | |
MCL3 | 0.390 | 0.980 | 1.000 | 0.080 | 0.815 | 0.995 | 0.000 | 0.155 | 0.715 | 0.000 | 0.000 | 0.170 | |
MCP1 | 0.475 | 0.725 | 0.725 | 0.235 | 0.695 | 0.735 | 0.000 | 0.230 | 0.670 | 0.000 | 0.100 | 0.415 | |
MCP2 | 0.025 | 0.090 | 0.385 | 0.010 | 0.075 | 0.190 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.005 | |
MCP3 | 0.000 | 0.025 | 0.065 | 0.005 | 0.025 | 0.065 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
ERR | Bayes | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
Lasso | 0.2169 | 0.2064 | 0.2046 | 0.2204 | 0.2082 | 0.2036 | 0.2442 | 0.2168 | 0.2102 | 0.2571 | 0.2223 | 0.2116 | |
MCL1 | 0.2085 | 0.2030 | 0.2031 | 0.2114 | 0.2043 | 0.2018 | 0.2372 | 0.2098 | 0.2056 | 0.2521 | 0.2153 | 0.2059 | |
MCL2 | 0.2188 | 0.2054 | 0.2037 | 0.2261 | 0.2073 | 0.2027 | 0.2653 | 0.2202 | 0.2101 | 0.2769 | 0.2306 | 0.2132 | |
MCL3 | 0.2576 | 0.2125 | 0.2059 | 0.2739 | 0.2213 | 0.2072 | 0.3466 | 0.2582 | 0.2259 | 0.3235 | 0.2816 | 0.2433 | |
MCP1 | 0.2107 | 0.2020 | 0.2028 | 0.2159 | 0.2029 | 0.2008 | 0.2644 | 0.2127 | 0.2045 | 0.2706 | 0.2196 | 0.2051 | |
MCP2 | 0.2163 | 0.2032 | 0.2027 | 0.2201 | 0.2048 | 0.2015 | 0.2559 | 0.2167 | 0.2078 | 0.2647 | 0.2240 | 0.2095 | |
MCP3 | 0.2159 | 0.2052 | 0.2034 | 0.2189 | 0.2066 | 0.2028 | 0.2498 | 0.2165 | 0.2094 | 0.2600 | 0.2232 | 0.2102 |
TPS = number of true positive selections; FPS = number of false positive selections; CMI = correct model identification; ERR = classification error rate.
Averages of the four measures when
 | | 300 | 600 | 900 | 300 | 600 | 900 | 300 | 600 | 900 | 300 | 600 | 900
---|---|---|---|---|---|---|---|---|---|---|---|---|---
TPS | Bayes | 5 | 5 | 5 | 5 | 5 | 5 | 10 | 10 | 10 | 10 | 10 | 10 |
Lasso | 4.960 | 5.000 | 5.000 | 4.750 | 5.000 | 5.000 | 7.945 | 9.970 | 10.000 | 5.310 | 9.785 | 9.995 | |
MCL1 | 4.860 | 5.000 | 5.000 | 4.430 | 4.990 | 5.000 | 6.145 | 9.815 | 9.985 | 4.100 | 8.915 | 9.940 | |
MCL2 | 3.590 | 4.930 | 5.000 | 2.875 | 4.585 | 4.985 | 3.615 | 8.305 | 9.825 | 1.690 | 5.960 | 8.700 | |
MCL3 | 2.225 | 3.125 | 4.435 | 1.785 | 2.585 | 3.395 | 1.620 | 3.460 | 5.385 | 0.355 | 2.410 | 3.425 | |
MCP1 | 4.875 | 5.000 | 5.000 | 4.480 | 4.990 | 5.000 | 5.200 | 9.715 | 9.995 | 3.195 | 8.810 | 9.935 | |
MCP2 | 4.960 | 5.000 | 5.000 | 4.850 | 5.000 | 5.000 | 6.675 | 9.960 | 10.000 | 4.570 | 9.635 | 9.995 | |
MCP3 | 4.965 | 5.000 | 5.000 | 4.870 | 5.000 | 5.000 | 7.220 | 9.930 | 10.000 | 5.215 | 9.675 | 9.995 | |
FPS | Bayes | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Lasso | 42.995 | 43.620 | 44.890 | 43.260 | 49.450 | 47.895 | 64.935 | 89.360 | 90.535 | 39.150 | 92.305 | 97.670 | |
MCL1 | 1.655 | 0.595 | 1.015 | 3.860 | 0.860 | 0.780 | 10.120 | 2.315 | 0.705 | 10.955 | 6.495 | 1.555 | |
MCL2 | 0.780 | 0.165 | 0.150 | 0.675 | 0.390 | 0.140 | 2.360 | 2.040 | 0.595 | 0.860 | 2.585 | 1.680 | |
MCL3 | 0.055 | 0.005 | 0.005 | 0.045 | 0.000 | 0.005 | 0.215 | 0.050 | 0.015 | 0.040 | 0.010 | 0.005 | |
MCP1 | 0.790 | 0.400 | 0.320 | 1.310 | 0.435 | 0.375 | 1.570 | 1.265 | 0.440 | 1.340 | 2.010 | 0.830 | |
MCP2 | 4.160 | 1.475 | 1.805 | 6.835 | 2.210 | 1.445 | 12.175 | 7.875 | 3.140 | 10.275 | 14.315 | 4.255 | |
MCP3 | 13.330 | 4.360 | 2.815 | 17.530 | 6.785 | 3.180 | 27.175 | 26.075 | 11.665 | 22.630 | 36.215 | 19.085 | |
CMI | Bayes | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Lasso | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |
MCL1 | 0.340 | 0.665 | 0.630 | 0.110 | 0.605 | 0.620 | 0.000 | 0.205 | 0.605 | 0.000 | 0.010 | 0.340 | |
MCL2 | 0.060 | 0.800 | 0.895 | 0.010 | 0.435 | 0.890 | 0.000 | 0.025 | 0.505 | 0.000 | 0.000 | 0.090 | |
MCL3 | 0.000 | 0.080 | 0.605 | 0.000 | 0.015 | 0.145 | 0.000 | 0.000 | 0.015 | 0.000 | 0.000 | 0.000 | |
MCP1 | 0.475 | 0.665 | 0.755 | 0.210 | 0.690 | 0.735 | 0.000 | 0.295 | 0.700 | 0.000 | 0.080 | 0.475 | |
MCP2 | 0.100 | 0.480 | 0.595 | 0.050 | 0.400 | 0.555 | 0.000 | 0.030 | 0.255 | 0.000 | 0.000 | 0.095 | |
MCP3 | 0.000 | 0.105 | 0.410 | 0.005 | 0.045 | 0.215 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | |
ERR | Bayes | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
Lasso | 0.2600 | 0.2283 | 0.2215 | 0.2662 | 0.2326 | 0.2199 | 0.3353 | 0.2536 | 0.2317 | 0.3346 | 0.2681 | 0.2387 | |
MCL1 | 0.2364 | 0.2176 | 0.2157 | 0.2465 | 0.2171 | 0.2125 | 0.3191 | 0.2283 | 0.2153 | 0.3234 | 0.2480 | 0.2176 | |
MCL2 | 0.2896 | 0.2455 | 0.2302 | 0.2800 | 0.2503 | 0.2306 | 0.3570 | 0.2881 | 0.2504 | 0.3267 | 0.2909 | 0.2631 | |
MCL3 | 0.3163 | 0.2988 | 0.2754 | 0.2965 | 0.2795 | 0.2717 | 0.4002 | 0.3480 | 0.3313 | 0.3320 | 0.3127 | 0.3048 | |
MCP1 | 0.2214 | 0.2125 | 0.2121 | 0.2298 | 0.2105 | 0.2085 | 0.3044 | 0.2175 | 0.2095 | 0.3081 | 0.2311 | 0.2097 | |
MCP2 | 0.2192 | 0.2127 | 0.2121 | 0.2226 | 0.2107 | 0.2085 | 0.2843 | 0.2134 | 0.2094 | 0.3030 | 0.2228 | 0.2081 | |
MCP3 | 0.2272 | 0.2135 | 0.2120 | 0.2314 | 0.2122 | 0.2090 | 0.2903 | 0.2199 | 0.2114 | 0.3095 | 0.2307 | 0.2115 |
TPS = number of true positive selections; FPS = number of false positive selections; CMI = correct model identification; ERR = classification error rate.
Number of incorrectly classified samples and averages of the model sizes
 | | | LASSO | MCL1 | MCL2 | MCL3 | MCP1 | MCP2 | MCP3
---|---|---|---|---|---|---|---|---|---
Chowdary | Errors | 400 | 1 | 1 | 2 | 2 | 4 | 4 | 5 |
800 | 2 | 2 | 2 | 4 | 3 | 6 | 3 | ||
1600 | 3 | 4 | 2 | 2 | 7 | 7 | 6 | ||
Sizes | 400 | 36.28 | 28.50 | 24.78 | 21.84 | 11.44 | 11.60 | 12.33 | |
800 | 35.20 | 25.46 | 23.10 | 20.17 | 10.10 | 10.78 | 13.38 | ||
1600 | 37.29 | 27.85 | 24.75 | 21.06 | 9.31 | 10.63 | 10.53 | ||
Gordon | Errors | 400 | 2 | 2 | 2 | 1 | 3 | 3 | 3 |
800 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | ||
1600 | 2 | 2 | 3 | 3 | 3 | 3 | 2 | ||
Sizes | 400 | 41.28 | 37.33 | 24.41 | 16.71 | 7.17 | 6.79 | 7.03 | |
800 | 44.31 | 37.72 | 22.46 | 15.33 | 11.71 | 12.22 | 13.08 | ||
1600 | 49.06 | 42.38 | 28.46 | 25.50 | 19.31 | 16.50 | 15.75 | ||
Burczynski | Errors | 400 | 6 | 6 | 7 | 9 | 15 | 13 | 15 |
800 | 8 | 9 | 13 | 16 | 13 | 14 | 17 | ||
1600 | 8 | 9 | 10 | 13 | 17 | 16 | 16 | ||
Sizes | 400 | 47.67 | 41.50 | 26.19 | 24.06 | 13.70 | 13.81 | 13.13 | |
800 | 50.04 | 43.92 | 25.45 | 19.13 | 12.02 | 11.81 | 10.75 | ||
1600 | 49.35 | 43.69 | 22.50 | 18.59 | 17.14 | 17.83 | 14.77 | ||
Chin | Errors | 400 | 12 | 11 | 13 | 15 | 21 | 22 | 22
800 | 15 | 16 | 16 | 18 | 21 | 21 | 20 | ||
1600 | 14 | 18 | 15 | 14 | 13 | 13 | 13 | ||
Sizes | 400 | 33.62 | 27.28 | 21.53 | 15.16 | 9.74 | 9.69 | 9.61 | |
800 | 41.48 | 31.98 | 23.98 | 16.36 | 5.22 | 5.04 | 4.97 | ||
1600 | 33.22 | 18.94 | 13.81 | 12.80 | 4.00 | 4.11 | 4.80 |