Binary classification is frequently encountered in machine learning applications. Wahba (2002) categorizes binary classification methods into hard and soft classification. Hard classification predicts a class label directly, and the support vector machine (SVM) (Vapnik, 1995), which learns the decision boundary, falls into this category. In contrast, soft classification seeks the class probability denoted by
Recent applications often regard the predictor as a function,
We propose a model-free soft classification method with a functional predictor. In particular, we extend the idea of Wang
The rest of the article is organized as follows. In Section 2, we review the probability estimation scheme based on the WSVM that serves as a building block of our proposal. In Section 3, we propose a class probability estimator for binary classification with a functional predictor and then extend the idea to multi-class classification via a pairwise coupling algorithm. In Section 4, we conduct simulation studies to evaluate the finite-sample performance of the proposed method, and we illustrate the method on real data in Section 5. Finally, concluding remarks are given in Section 6.
We start with a brief review of a model-free class probability estimation scheme based on the WSVM proposed by Wang
Suppose we are given a set of training samples
where
Plugging (2.2) into (2.1), the WSVM (2.1) is equivalently rewritten as the finite-dimensional optimization problem:
A connection between decision function
where
Fisher consistency (2.4) provides a natural way to recover
where
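The probability estimation scheme reviewed above can be sketched in code. The snippet below is a minimal illustration, not the authors' implementation: it assumes the weighting convention of Wang et al., in which examples with y = +1 receive weight 1 − π and examples with y = −1 receive weight π, so that by Fisher consistency sign(f_π(x)) = sign(p(x) − π). Refitting the weighted SVM over a grid of π values and locating where the predicted label flips then brackets p(x); the function name, the grid, and the RBF kernel are illustrative choices.

```python
import numpy as np
from sklearn.svm import SVC

def wsvm_prob(X, y, X_new, pis=np.linspace(0.05, 0.95, 19), C=1.0):
    """Estimate P(y = +1 | x) via weighted SVMs over a grid of pi values.

    Since sign(f_pi(x)) = sign(p(x) - pi), p(x) lies between the largest pi
    whose fitted classifier predicts +1 and the smallest pi predicting -1;
    we return the midpoint of that bracketing interval.
    """
    preds = np.empty((len(pis), len(X_new)))
    for i, pi in enumerate(pis):
        # Weighted hinge loss: weight 1 - pi on class +1, pi on class -1.
        w = np.where(y == 1, 1.0 - pi, pi)
        clf = SVC(C=C, kernel="rbf", gamma="scale")
        clf.fit(X, y, sample_weight=w)
        preds[i] = clf.predict(X_new)
    p_hat = np.empty(len(X_new))
    for j in range(len(X_new)):
        pos = pis[preds[:, j] == 1]
        neg = pis[preds[:, j] == -1]
        lo = pos.max() if pos.size else 0.0   # largest pi still predicting +1
        hi = neg.min() if neg.size else 1.0   # smallest pi predicting -1
        p_hat[j] = (lo + hi) / 2.0
    return p_hat
```

In practice the solution-path property discussed later avoids refitting the WSVM for every π; the naive grid refit here is only for exposition.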
We are given a set of binary responses
Park
where
Now, it is straightforward to define the FWSVM as the WSVM (2.3) with the kernel given in (3.1). In this regard, the FWSVM is a version of the WSVM (2.3). A notable feature shared by the WSVM and the FWSVM is the piecewise linearity of (
Figure 1 illustrates the piecewise linear solution paths of (a)
We extend the proposed method to the multi-class problem with a
where
where
Finally, we estimate
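The pairwise coupling step can be made concrete with a short sketch. The exact algorithm in the text is not reproduced here, so the snippet below instead implements the standard quadratic coupling criterion of Wu, Lin, and Weng (2004): given pairwise estimates r_ij of P(y = i | y ∈ {i, j}), it minimizes Σ_{i≠j} (r_ji p_i − r_ij p_j)² subject to Σ p_i = 1 by solving the KKT linear system. The function name and argument layout are illustrative.

```python
import numpy as np

def pairwise_coupling(R):
    """Combine pairwise class-probability estimates into K class probabilities.

    R[i, j] estimates P(y = i | y in {i, j}), with R[i, j] + R[j, i] = 1.
    Solves min_p sum_{i != j} (R[j, i] p_i - R[i, j] p_j)^2, s.t. sum(p) = 1.
    """
    K = R.shape[0]
    Q = np.zeros((K, K))
    for i in range(K):
        for j in range(K):
            if i == j:
                Q[i, i] = sum(R[k, i] ** 2 for k in range(K) if k != i)
            else:
                Q[i, j] = -R[j, i] * R[i, j]
    # KKT system of the equality-constrained quadratic program:
    # [Q 1; 1' 0] [p; lambda] = [0; 1]
    A = np.zeros((K + 1, K + 1))
    A[:K, :K] = Q
    A[:K, K] = 1.0
    A[K, :K] = 1.0
    b = np.zeros(K + 1)
    b[K] = 1.0
    return np.linalg.solve(A, b)[:K]
```

When the pairwise estimates are mutually consistent, i.e., r_ij = p_i/(p_i + p_j) for some probability vector p, the criterion attains zero and recovers p exactly; otherwise it returns the best compromise in the least-squares sense.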
We conduct a simulation study to evaluate the finite-sample performance of the proposed probability estimator. We first generate a binary response
where
We compute three quantities as performance measures on the independent test set (
CE:
PD:
WD:
Smaller values of all three performance measures indicate better class probability estimation. Finally, we generated
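For concreteness, the first two measures can be computed as below. The displayed definitions are truncated above, so the standard forms are assumed here: CE as the cross entropy between the true and estimated class probabilities, and PD as the mean absolute probability difference; WD is omitted because its weights are not reproduced in this excerpt.

```python
import numpy as np

def cross_entropy(p_true, p_hat, eps=1e-12):
    """CE: cross entropy between true and estimated probabilities (binary)."""
    p_hat = np.clip(p_hat, eps, 1.0 - eps)  # guard against log(0)
    return -np.mean(p_true * np.log(p_hat) + (1.0 - p_true) * np.log(1.0 - p_hat))

def abs_prob_diff(p_true, p_hat):
    """PD: mean absolute difference between true and estimated probabilities."""
    return np.mean(np.abs(p_true - p_hat))
```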
For training the FWSVM, the B-spline basis system with 10 equally spaced knots is employed and
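The B-spline preprocessing step can be sketched as follows, assuming cubic splines with 10 equally spaced interior knots; whether the 10 knots reported in the text include the boundary is not specified, so this is an illustrative choice.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

def bspline_smooth(t_obs, curve, n_knots=10, degree=3):
    """Least-squares cubic B-spline fit of a discretely observed curve."""
    a, b = t_obs[0], t_obs[-1]
    # n_knots equally spaced interior knots, boundary knots repeated degree+1 times
    interior = np.linspace(a, b, n_knots + 2)[1:-1]
    knots = np.r_[[a] * (degree + 1), interior, [b] * (degree + 1)]
    return make_lsq_spline(t_obs, curve, knots, k=degree)

# Example: smooth a noisy discretized curve and extract its derivative
t = np.linspace(0.0, 1.0, 100)
rng = np.random.default_rng(0)
y = np.sin(2.0 * np.pi * t) + 0.1 * rng.standard_normal(100)
spl = bspline_smooth(t, y)
deriv = spl.derivative()  # derivative curve, as used for the tecator data later
```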
In binary classification, we consider four different mean functions as follows.
(B1)
(B2)
(B3)
(B4)
Models (B1) and (B2) are linear, while (B3) and (B4) are nonlinear. The mean functions of the two classes are parallel in (B1) and (B3) and crossed in (B2) and (B4). Table 1 reports the comparison with FLR under the independent covariance kernel. The proposed method outperforms FLR in all scenarios under consideration. The results are similar for the other two covariance kernels and are relegated to the Supplementary Materials to avoid redundancy.
We consider three-class classification with four different mean functions:
(M1)
(M2)
(M3)
(M4)
Analogous to the binary classification models (B1)–(B4), Models (M1) and (M2) have linear and (M3) and (M4) have nonlinear mean functions for each class. The mean functions are parallel in (M1) and (M3) and crossed in (M2) and (M4). Table 2 reports the comparison results under the independent covariance kernel. For the multi-class cases, the WD measure is not considered due to the ambiguity in determining suitable weights. As in the binary cases, our method outperforms FLR in all scenarios under consideration. Again, the results for the other covariance structures are relegated to the Supplementary Materials.
Our method shows advantageous performance in estimating the class probability in both binary and multi-class classification.
In this section, we apply our method to two real data sets: the tecator data for binary classification and the phoneme data for multi-class classification.
The tecator data consist of 215 meat samples. The response variable is the percentage of fat content in the meat, which we dichotomize: 1 if it is greater than 15 and −1 otherwise. The functional predictor consists of absorbance values measured at 100 wavelengths. Figure 2 depicts (a) the functional predictors and (b) their derivatives, both obtained by employing a B-spline basis with 10 equally spaced knots. We use the derivatives as predictors because the two classes are more clearly separated in terms of the derivatives. We employ a grid search to find an optimal
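The tuning step can be sketched generically. The snippet below illustrates a cross-validated grid search, with an RBF-kernel SVM standing in for the FWSVM; the random feature matrix (a stand-in for the fitted derivative curves), the grid values, and the fold count are placeholders, not the values used in the paper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Stand-in for the derivative curves evaluated on a grid (215 samples x 100
# wavelengths); the real features come from the fitted B-spline derivatives.
rng = np.random.default_rng(0)
X_deriv = rng.standard_normal((215, 100))
y = np.where(rng.standard_normal(215) > 0, 1, -1)

# 5-fold cross-validated grid search over the SVM tuning parameters;
# the candidate values below are placeholders.
search = GridSearchCV(SVC(kernel="rbf"),
                      {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1]},
                      cv=5)
search.fit(X_deriv, y)
```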
The phoneme data consist of 500 samples in five classes, with 100 samples per class. The response variable represents five phonemes: /sh/, /dcl/, /iy/, /aa/, and /ao/. The predictor is functional and composed of log-periodogram values observed at 150 discretized frequency points. Figure 4 depicts the functional predictors in each class of the phoneme data, estimated from the B-spline system with 10 equally spaced knots.
We obtain an optimal
In this article, we propose a model-free approach to estimate class probability when the predictor is functional, by extending the idea of Wang
This work was supported by National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIT) (Nos. 2018R1D1A1B07043034 and 2019R1A4A1028134).
Piecewise linear solution paths of the FWSVM solution as a function of
Tecator data: functional predictors obtained by employing the B-spline basis system. The derivatives look more informative for classification.
Tecator data: (a) depicts the cross-validated CE for different values of
Phoneme data: illustrations of the functional predictors obtained by employing the B-spline basis. Classes 4 and 5 look difficult to distinguish.
Phoneme data: (a) depicts the cross-validated CE, and (b) compares the cross-validated class probability estimates of
Simulation results for binary classification with the independent covariance kernel
| Model | n |  | CE (FWSVM) | CE (FLR) | PD (FWSVM) | PD (FLR) | WD (FWSVM) | WD (FLR) |
|---|---|---|---|---|---|---|---|---|
| (B1) | 100 | 0.3 | 0.001(0.000) | 0.031(0.004) | 0.001(0.000) | 0.029(0.004) | 0.000(0.000) | 0.000(0.000) |
|  |  | 0.5 | 0.000(0.000) | 0.041(0.005) | 0.000(0.000) | 0.039(0.005) | 0.000(0.000) | 0.000(0.000) |
|  | 300 | 0.3 | 0.000(0.000) | 0.029(0.002) | 0.000(0.000) | 0.028(0.002) | 0.000(0.000) | 0.000(0.000) |
|  |  | 0.5 | 0.000(0.000) | 0.039(0.003) | 0.000(0.000) | 0.037(0.003) | 0.000(0.000) | 0.000(0.000) |
| (B2) | 100 | 0.3 | 0.042(0.011) | 0.130(0.011) | 0.032(0.004) | 0.068(0.007) | 0.010(0.001) | 0.017(0.002) |
|  |  | 0.5 | 0.110(0.018) | 0.182(0.014) | 0.046(0.006) | 0.087(0.008) | 0.016(0.002) | 0.023(0.002) |
|  | 300 | 0.3 | 0.025(0.007) | 0.125(0.007) | 0.039(0.003) | 0.063(0.004) | 0.012(0.001) | 0.016(0.001) |
|  |  | 0.5 | 0.086(0.015) | 0.174(0.009) | 0.054(0.019) | 0.099(0.030) | 0.018(0.005) | 0.024(0.004) |
| (B3) | 100 | 0.3 | 0.001(0.000) | 0.031(0.004) | 0.001(0.000) | 0.029(0.004) | 0.000(0.000) | 0.000(0.000) |
|  |  | 0.5 | 0.000(0.000) | 0.041(0.005) | 0.000(0.000) | 0.039(0.005) | 0.000(0.000) | 0.000(0.000) |
|  | 300 | 0.3 | 0.000(0.000) | 0.029(0.002) | 0.000(0.000) | 0.028(0.002) | 0.000(0.000) | 0.000(0.000) |
|  |  | 0.5 | 0.000(0.000) | 0.039(0.003) | 0.000(0.000) | 0.037(0.003) | 0.000(0.000) | 0.000(0.000) |
| (B4) | 100 | 0.3 | 0.034(0.007) | 0.124(0.010) | 0.031(0.007) | 0.080(0.018) | 0.009(0.002) | 0.017(0.002) |
|  |  | 0.5 | 0.095(0.016) | 0.174(0.013) | 0.041(0.005) | 0.089(0.008) | 0.014(0.002) | 0.022(0.002) |
|  | 300 | 0.3 | 0.020(0.006) | 0.120(0.007) | 0.032(0.002) | 0.066(0.004) | 0.009(0.001) | 0.016(0.001) |
|  |  | 0.5 | 0.074(0.014) | 0.166(0.008) | 0.038(0.004) | 0.083(0.005) | 0.013(0.001) | 0.021(0.001) |
Averaged values of CE, PD, and WD over 100 independent repetitions are reported along with the corresponding standard errors in parentheses.
CE = cross entropy; PD = absolute probability difference; WD = weighted absolute probability difference; FWSVM = functional weighted support vector machines; FLR = functional logistic regression.
Simulation results for multi-class classification with the independent covariance kernel
| Model | n |  | CE (FWSVM) | CE (FLR) | PD (FWSVM) | PD (FLR) |
|---|---|---|---|---|---|---|
| (M1) | 200 | 0.3 | 0.002(0.001) | 0.100(0.005) | 0.002(0.001) | 0.090(0.005) |
|  |  | 0.5 | 0.007(0.003) | 0.138(0.007) | 0.005(0.001) | 0.117(0.006) |
|  | 500 | 0.3 | 0.000(0.000) | 0.097(0.004) | 0.001(0.000) | 0.087(0.004) |
|  |  | 0.5 | 0.005(0.003) | 0.134(0.006) | 0.003(0.001) | 0.114(0.005) |
| (M2) | 200 | 0.3 | 0.005(0.002) | 0.094(0.006) | 0.005(0.001) | 0.079(0.005) |
|  |  | 0.5 | 0.020(0.006) | 0.129(0.007) | 0.010(0.002) | 0.100(0.006) |
|  | 500 | 0.3 | 0.002(0.002) | 0.092(0.005) | 0.005(0.001) | 0.078(0.004) |
|  |  | 0.5 | 0.015(0.005) | 0.127(0.006) | 0.009(0.002) | 0.099(0.005) |
| (M3) | 200 | 0.3 | 0.002(0.001) | 0.100(0.005) | 0.002(0.001) | 0.090(0.005) |
|  |  | 0.5 | 0.007(0.003) | 0.138(0.007) | 0.005(0.001) | 0.117(0.006) |
|  | 500 | 0.3 | 0.000(0.000) | 0.097(0.004) | 0.001(0.000) | 0.087(0.004) |
|  |  | 0.5 | 0.005(0.003) | 0.134(0.006) | 0.003(0.001) | 0.114(0.005) |
| (M4) | 200 | 0.3 | 0.002(0.001) | 0.108(0.007) | 0.002(0.001) | 0.096(0.006) |
|  |  | 0.5 | 0.007(0.003) | 0.149(0.009) | 0.005(0.001) | 0.125(0.007) |
|  | 500 | 0.3 | 0.000(0.000) | 0.106(0.005) | 0.001(0.000) | 0.094(0.004) |
|  |  | 0.5 | 0.004(0.003) | 0.146(0.007) | 0.003(0.001) | 0.123(0.005) |
Averaged values of CE and PD over 100 independent repetitions are reported along with the corresponding standard errors in parentheses.
CE = cross entropy; PD = absolute probability difference; FWSVM = functional weighted support vector machines; FLR = functional logistic regression.