The shifted Chebyshev series-based plug-in for bandwidth selection in kernel density estimation
Communications for Statistical Applications and Methods 2024;31:337-347
Published online May 31, 2024
© 2024 Korean Statistical Society.

Soratja Klaichim^{a,b}, Juthaphorn Sinsomboonthong^{a}, Thidaporn Supapakorn^{1,a}

^{a} Department of Statistics, Faculty of Science, Kasetsart University, Thailand;
^{b} Faculty of Liberal Arts, Rajamangala University of Technology Rattanakosin, Thailand
^{1} Correspondence to: Department of Statistics, Faculty of Science, Kasetsart University, Bangkok, Thailand 10903. E-mail: fscitdps@ku.ac.th
Received September 17, 2023; Revised January 26, 2024; Accepted February 12, 2024.
Kernel density estimation is a prevalent technique for nonparametric density estimation, enabling estimation directly from the data. It involves two crucial elements: the selection of the kernel function and the determination of an appropriate bandwidth. Bandwidth selection plays an important role in kernel density estimation and has been an active topic of development over the past decade. A range of methods is available for selecting the bandwidth, including the plug-in approach. This article proposes a plug-in bandwidth that uses a shifted Chebyshev series-based approximation to determine the optimal bandwidth. A simulation study shows that the proposed bandwidth performs favorably across a wide range of distributions and sample sizes compared to alternative bandwidths. The proposed bandwidth is also applied to kernel density estimation on a real dataset, where it again yields a favorable selection. Hence, this article serves as motivation to explore additional plug-in bandwidths that rely on function approximations using alternative series expansions.
Keywords : kernel density estimation, Chebyshev, shifted Chebyshev, plug-in, bandwidth
1. Introduction

Density estimation is the process of constructing an estimate of the probability density function (pdf) from available data. The estimate not only represents the data distribution but also yields summary statistics such as the mean, median, variance, moments, and quantiles. Furthermore, density estimates reveal distributional characteristics, including skewness, kurtosis, and multimodality. The estimation of the pdf is a fundamental concept in statistics and a widely researched topic. There are two common approaches to density estimation: parametric and nonparametric. The parametric approach assumes that the data are drawn from a known family of distributions, whereas the nonparametric approach estimates the density function directly from the data. Common nonparametric density estimation techniques include the histogram, the naïve density estimator, the nearest neighbor method, and the orthogonal series estimator. Kernel density estimation (KDE) is a widely used nonparametric density estimator. It relies on the kernel function, which determines the weight assigned to each data point, and the bandwidth, which controls the smoothness of the estimate. Hence, the selection of the bandwidth is the most crucial step in kernel density estimation.

The selection of the bandwidth is a critical issue in KDE, as the performance of the estimator depends on the chosen bandwidth. A small bandwidth results in an undersmoothed density, while a large bandwidth leads to an oversmoothed density (Gramacki, 2018). There are various methods to determine a bandwidth for KDE. The primary categories are rules-of-thumb (ROT), cross-validation (CV), and plug-in (PI). The plug-in method has been demonstrated to offer excellent performance in many cases and is therefore the common first choice in practical applications. However, there is still room for further improvement in its implementation (Wand and Jones, 1995). Plug-in bandwidths are based on the straightforward concept of substituting estimates of the unknown quantities into the formula for the asymptotically optimal bandwidth.

To address the problem of selecting an optimal bandwidth, this study introduces a plug-in bandwidth based on the first kind shifted Chebyshev polynomials. The effectiveness of plug-in methods relies on the estimation of integrated squared density derivative functionals, a subject that has been explored by many researchers. Silverman (1986) provided a comprehensive overview of density estimation techniques, including discussions on bandwidth selection and the role of integrated squared density derivative functionals. Sheather and Jones (1991) discussed a data-driven method for bandwidth selection in kernel density estimation, which relies on integrated squared density derivative functionals. Raykar and Duraiswami (2006) developed algorithms for estimating density derivatives using the univariate Gaussian kernel; these algorithms are used to calculate the optimal bandwidth for kernel density estimation. Tenreiro (2011, 2020) proposed direct plug-in bandwidths for the KDE based on the Fourier series and the Hermite series. In a recent study, Dharmani (2022) introduced a bandwidth selector based on a near-Gaussian assumption, which enables the use of the Gram-Charlier A series as an approximation to the density for the purpose of estimating its derivatives. The objective of this paper is to derive a bandwidth by using the first kind shifted Chebyshev polynomials as an approximation to the density function, with the aim of estimating the integrated squared density derivative functionals.

The remaining sections of this article are organized as follows. Section 2 provides an overview of the fundamental properties of kernel density estimation. Section 3 discusses several methods for bandwidth selection: the least squares cross-validation bandwidth, an improved version of the rule-of-thumb bandwidth, and the Sheather and Jones plug-in bandwidth. Section 4 gives a brief definition of the first kind shifted Chebyshev polynomials, uses them to approximate the underlying density function, and presents the proposed plug-in bandwidth based on this approximation. Section 5 presents a simulation study of the proposed bandwidth, examining its performance under different distributions and sample sizes using the R programming language. Section 6 applies the proposed bandwidth to a real dataset. Finally, Section 7 concludes the article with a summary of the findings.

2. Kernel density estimation

The kernel density estimator for a random sample $X_1, X_2, \ldots, X_n$ drawn from a common and typically unknown density $f(x)$, as defined by Rosenblatt (1956) and Parzen (1962), is expressed as

$$
\hat{f}(x; h) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right), \tag{2.1}
$$

where $K(x)$ is the kernel function and $h > 0$ is the bandwidth.
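The estimator above can be sketched in a few lines of Python. This is a minimal illustration assuming a Gaussian kernel; the function names `gaussian_kernel` and `kde` and the toy sample are hypothetical and are not taken from the article, whose computations were carried out in R.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, sample, h):
    """Kernel density estimate (1 / (n h)) * sum_i K((x - X_i) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)

# Hypothetical toy sample, for illustration only.
sample = [-1.2, -0.4, 0.1, 0.3, 1.5]
print(kde(0.0, sample, h=0.5))
```

A smaller `h` makes each observation contribute a narrower bump, which is the undersmoothing effect described in the introduction.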

The kernel function $K(x)$ acts as a weight function and satisfies the following properties: $K(x) \ge 0$, $K(x) = K(-x)$, $\int K(x)\,dx = 1$, $\int xK(x)\,dx = 0$, and $k_2 = \int x^2 K(x)\,dx \ne 0$. The bandwidth $h$ determines the level of smoothness of the density estimate.

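In practice, it is common to consider a global error criterion that measures the distance between the estimated density function $\hat{f}(x; h)$ and the true density function $f(x)$. One such criterion is the integrated squared error (ISE), given by $\mathrm{ISE}(\hat{f}(x;h)) = \int_{-\infty}^{\infty} [\hat{f}(x;h) - f(x)]^2\,dx$. Because the ISE depends on the particular sample, it is more appropriate to analyze its expected value, known as the mean integrated squared error (MISE), which is defined as

$$
\begin{aligned}
\mathrm{MISE}(\hat{f}(x;h)) &= E\int_{-\infty}^{\infty} [\hat{f}(x;h) - f(x)]^2\,dx \\
&= \int_{-\infty}^{\infty} \mathrm{Bias}^2(\hat{f}(x;h))\,dx + \int_{-\infty}^{\infty} \mathrm{Var}(\hat{f}(x;h))\,dx \\
&= \frac{1}{4} h^4 k_2^2 \int (f''(x))^2\,dx + \frac{1}{nh} k_0 + o\{(nh)^{-1} + h^4\},
\end{aligned} \tag{2.2}
$$

where $k_0 = \int (K(x))^2\,dx$ and $k_2 = \int x^2 K(x)\,dx$ (Gramacki, 2018).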

Here $f(x)$ is assumed to be sufficiently smooth: its second derivative $f''(x)$ is bounded, continuous, and square integrable. Moreover, if $h \to 0$ and $(nh)^{-1} \to 0$ as $n \to \infty$, then $\mathrm{MISE}(\hat{f}(x;h)) \to 0$. Dropping the remainder term yields the asymptotic mean integrated squared error (AMISE):

$$
\mathrm{AMISE}(\hat{f}(x;h)) = \frac{1}{4} h^4 k_2^2 \int (f''(x))^2\,dx + \frac{1}{nh} k_0 = \frac{1}{4} h^4 k_2^2 \theta_2 + \frac{1}{nh} k_0, \tag{2.3}
$$

where $\theta_2 = \int (f''(x))^2\,dx$, $k_0 = \int (K(x))^2\,dx$, and $k_2 = \int x^2 K(x)\,dx$.

3. Bandwidth selection

Several methods are available for determining an appropriate bandwidth for KDE. The three main types are cross-validation (CV), rules-of-thumb (ROT), and plug-in (PI) (Gramacki, 2018). Cross-validation includes techniques such as least squares cross-validation (LSCV), biased cross-validation (BCV), and smoothed cross-validation (SCV). Rules-of-thumb include Silverman’s rule of thumb and its improved version. Plug-in methods have been explored by various authors, including Park and Marron (1990), Sheather and Jones (1991), and Hall et al. (1991), among others. This article provides concise explanations of the methods chosen for study: least squares cross-validation, the improved version of Silverman’s rule of thumb, and the Sheather and Jones plug-in bandwidth.

3.1. Least squares cross validation

A well-known method for selecting the bandwidth is least squares cross-validation (LSCV), proposed by Rudemo (1982) and Bowman (1984). The main objective is to find the bandwidth $h$ that minimizes the ISE of the estimator $\hat{f}(x; h)$ of the density $f(x)$. The integrated squared error of $\hat{f}(x; h)$ can be expanded as

$$
\mathrm{ISE}(\hat{f}(x;h)) = \int [\hat{f}(x;h) - f(x)]^2\,dx = \int \hat{f}(x;h)^2\,dx - 2\int \hat{f}(x;h) f(x)\,dx + \int f(x)^2\,dx \tag{3.1}
$$

(Silverman, 1986; Wand and Jones, 1995).

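The first term, $\int \hat{f}(x; h)^2\,dx$, of (3.1) can be calculated from the data, as shown by Härdle (1991):

$$
\int \hat{f}(x;h)^2\,dx = \frac{1}{n^2 h} \sum_{i=1}^{n} \sum_{j=1}^{n} K * K\left(\frac{X_j - X_i}{h}\right), \tag{3.2}
$$

where $K * K(x)$ is the convolution of $K(x)$ with itself.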

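The second term, $\int \hat{f}(x; h) f(x)\,dx$, of (3.1) depends on $h$ and involves the unknown density $f(x)$, so it has to be estimated. Notice that $\int \hat{f}(x; h) f(x)\,dx$ is the expected value of $\hat{f}(x; h)$ under $f$, which can be estimated by the leave-one-out average

$$
\frac{1}{n} \sum_{i=1}^{n} \hat{f}_{-i}(X_i; h) = \frac{1}{n(n-1)h} \sum_{i=1}^{n} \sum_{j \ne i} K\left(\frac{X_i - X_j}{h}\right), \tag{3.3}
$$

where $\hat{f}_{-i}(X_i; h) = \frac{1}{(n-1)h} \sum_{j \ne i} K\left(\frac{X_i - X_j}{h}\right)$ is the estimator based on the sample with $X_i$ deleted.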

Finally, the last term, $\int f(x)^2\,dx$, of (3.1) does not depend on the bandwidth $h$. It can therefore be moved to the left side of the equation, giving

$$
\mathrm{ISE}(\hat{f}(x;h)) - \int f(x)^2\,dx = \int \hat{f}(x;h)^2\,dx - 2\int \hat{f}(x;h) f(x)\,dx. \tag{3.4}
$$

Inserting Equations (3.2) and (3.3) into Equation (3.4) leads to the least squares cross-validation function

$$
\mathrm{LSCV}(h) = \frac{1}{n^2 h} \sum_{i=1}^{n} \sum_{j=1}^{n} K * K\left(\frac{X_j - X_i}{h}\right) - \frac{2}{n(n-1)h} \sum_{i=1}^{n} \sum_{j \ne i} K\left(\frac{X_i - X_j}{h}\right) \tag{3.5}
$$

(Härdle et al., 2004). The bandwidth that minimizes the function $\mathrm{LSCV}(h)$ is denoted by $h_{\mathrm{LSCV}}$.
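As an illustration, the LSCV function can be evaluated directly for the Gaussian kernel, for which $K * K$ is the $N(0, 2)$ density. The sketch below is an assumption-laden example rather than the authors' implementation: the names `lscv` and `h_lscv`, the toy sample, and the choice of a simple grid search as the minimizer are all hypothetical.

```python
import math

def gauss(u):
    """Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def gauss_conv(u):
    """K*K for the Gaussian kernel, i.e. the N(0, 2) density."""
    return math.exp(-0.25 * u * u) / (2.0 * math.sqrt(math.pi))

def lscv(h, sample):
    """LSCV(h): estimate of ISE minus the term that does not depend on h."""
    n = len(sample)
    term1 = 0.0  # double sum of K*K over all pairs (i, j)
    term2 = 0.0  # double sum of K over pairs with j != i
    for i, xi in enumerate(sample):
        for j, xj in enumerate(sample):
            term1 += gauss_conv((xj - xi) / h)
            if j != i:
                term2 += gauss((xi - xj) / h)
    return term1 / (n * n * h) - 2.0 * term2 / (n * (n - 1) * h)

def h_lscv(sample, grid):
    """Grid-search minimizer of LSCV(h); the grid is a user choice."""
    return min(grid, key=lambda h: lscv(h, sample))

# Hypothetical toy sample and bandwidth grid, for illustration only.
sample = [-1.9, -1.1, -0.6, -0.2, 0.0, 0.4, 0.9, 1.3, 2.2]
grid = [0.05 * k for k in range(1, 41)]
print(h_lscv(sample, grid))
```

The double loop costs $O(n^2)$ per bandwidth, which is acceptable for the sample sizes considered here; a finer grid or a numerical optimizer could replace the grid search.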

3.2. Rules of thumb

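The optimal bandwidth minimizes $\mathrm{AMISE}(\hat{f}(x; h))$ with respect to $h$. Setting the first derivative with respect to $h$ equal to zero and solving yields

$$
h_{\mathrm{AMISE}} = \left[\frac{k_0}{k_2^2 \theta_2 n}\right]^{1/5}, \tag{3.6}
$$

where $\theta_2 = \int (f''(x))^2\,dx$, $k_0 = \int (K(x))^2\,dx$, and $k_2 = \int x^2 K(x)\,dx$.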
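For completeness, the minimization can be written out as a short worked derivation, using only the AMISE expression from Section 2. The last line, the value of the AMISE at the optimum, is not stated in the text and is included here only as a consistency check.

$$
\begin{aligned}
\frac{\partial}{\partial h}\,\mathrm{AMISE}(\hat{f}(x;h)) &= h^3 k_2^2 \theta_2 - \frac{k_0}{n h^2} = 0 \\
\Longrightarrow\quad h^5 &= \frac{k_0}{k_2^2 \theta_2 n}, \\
\mathrm{AMISE}(\hat{f}(x;h_{\mathrm{AMISE}})) &= \frac{5}{4}\,\left(k_2^2 \theta_2\right)^{1/5} k_0^{4/5}\, n^{-4/5}.
\end{aligned}
$$

The second derivative, $3h^2 k_2^2 \theta_2 + 2k_0/(nh^3)$, is positive for $h > 0$, so the stationary point is a minimum.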

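The rule-of-thumb bandwidth is obtained by replacing the density function $f(x)$ in Equation (3.6) with the normal density having zero mean and variance $\sigma^2$ and by using the Gaussian kernel function. The rule-of-thumb bandwidth, denoted by $h_{\mathrm{ROT1}}$, is calculated using the formula

$$
h_{\mathrm{ROT1}} = 1.06\,\hat{\sigma}\, n^{-1/5}, \tag{3.7}
$$

where $\hat{\sigma}$ is the sample standard deviation.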
The rule-of-thumb bandwidth is sensitive to outliers, which cause an overestimation of $\sigma$ and lead to a larger bandwidth. To make the estimator more robust, the interquartile range (IQR) is used. The improved version of the rule-of-thumb bandwidth, denoted as $h_{\mathrm{ROT2}}$, is defined as

$$
h_{\mathrm{ROT2}} = 1.06\, n^{-1/5} \min\left(\hat{\sigma}, \frac{\mathrm{IQR}}{1.34}\right) \tag{3.8}
$$

(Härdle et al., 2004).
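Both rules of thumb are simple to compute from a sample. The following sketch uses only the Python standard library; the function name `rot_bandwidths` and the toy sample are hypothetical, and the quartiles come from `statistics.quantiles` with its default method, which is an assumption and may differ slightly from the quantile convention used in R.

```python
import statistics

def rot_bandwidths(sample):
    """Return (h_ROT1, h_ROT2) for a Gaussian kernel."""
    n = len(sample)
    sigma = statistics.stdev(sample)                  # sample standard deviation
    q1, _, q3 = statistics.quantiles(sample, n=4)     # quartile cut points
    iqr = q3 - q1
    h_rot1 = 1.06 * sigma * n ** (-1 / 5)
    h_rot2 = 1.06 * min(sigma, iqr / 1.34) * n ** (-1 / 5)
    return h_rot1, h_rot2

# Hypothetical sample with one outlier: the outlier inflates sigma,
# so the IQR-based scale is the one selected by h_ROT2.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(rot_bandwidths(sample))
```

In the outlier example the standard deviation is far larger than IQR/1.34, which is the situation the improved rule is designed to handle.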

3.3. Plug-in

The concept of plug-in bandwidth was originally introduced by Woodroofe (1970). It is based on using the bandwidth that minimizes $\mathrm{AMISE}(\hat{f}(x; h))$, with the unknown quantity $\theta_2 = \int (f''(x))^2\,dx$ replaced by an estimate. Sheather and Jones (1991) proposed a plug-in bandwidth of this type, denoted as $h_{\mathrm{SJDP}}$, which is the solution of the equation

$$
h_{\mathrm{SJDP}} = \left[\frac{k_0}{k_2^2\, \hat{\psi}_4(\gamma(h_{\mathrm{SJDP}}))\, n}\right]^{1/5}. \tag{3.9}
$$
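Here $\psi_r = \int f^{(r)}(x) f(x)\,dx$, so that $\psi_4 = \theta_2$. The pilot bandwidth for the estimation of $\psi_4$ is a function $\gamma$ of $h$. The choice of $\gamma$ is defined by

$$
\gamma(h) = \left[\frac{2 K^{(4)}(0)\, k_2\, \hat{\psi}_4(g_1)}{-\hat{\psi}_6(g_2)\, k_0}\right]^{1/7} h^{5/7}, \tag{3.10}
$$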
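where $\hat{\psi}_4(g_1)$ and $\hat{\psi}_6(g_2)$ are kernel estimates of $\psi_4$ and $\psi_6$. The pilot bandwidths $g_1$ and $g_2$ are given by

$$
g_1 = \left(\frac{-2K^{(4)}(0)}{\hat{\psi}_6 k_2 n}\right)^{1/7} \qquad \text{and} \qquad g_2 = \left(\frac{-2K^{(6)}(0)}{\hat{\psi}_8 k_2 n}\right)^{1/9}, \tag{3.11}
$$

where $\hat{\psi}_6 = -15/(16\sqrt{\pi}\,\hat{\sigma}^7)$, $\hat{\psi}_8 = 105/(32\sqrt{\pi}\,\hat{\sigma}^9)$, $K^{(4)}(0) = 3/\sqrt{2\pi}$, and $K^{(6)}(0) = -15/\sqrt{2\pi}$ (Wand and Jones, 1995).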
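The pilot bandwidths $g_1$ and $g_2$ follow directly from the normal-scale quantities above. The sketch below evaluates them for the Gaussian kernel, for which $k_2 = 1$; the function name `sj_pilot_bandwidths` is hypothetical, and only this first stage of the Sheather and Jones procedure is illustrated, not the full solve-the-equation step.

```python
import math

def sj_pilot_bandwidths(sigma_hat, n):
    """Pilot bandwidths g1 and g2 from the normal-scale estimates of psi_6 and psi_8."""
    k2 = 1.0  # second moment of the Gaussian kernel
    psi6 = -15.0 / (16.0 * math.sqrt(math.pi) * sigma_hat ** 7)
    psi8 = 105.0 / (32.0 * math.sqrt(math.pi) * sigma_hat ** 9)
    k4_at_0 = 3.0 / math.sqrt(2.0 * math.pi)     # K^(4)(0) for the Gaussian kernel
    k6_at_0 = -15.0 / math.sqrt(2.0 * math.pi)   # K^(6)(0) for the Gaussian kernel
    g1 = (-2.0 * k4_at_0 / (psi6 * k2 * n)) ** (1.0 / 7.0)
    g2 = (-2.0 * k6_at_0 / (psi8 * k2 * n)) ** (1.0 / 9.0)
    return g1, g2

# Hypothetical inputs: unit scale estimate and n = 100 observations.
print(sj_pilot_bandwidths(sigma_hat=1.0, n=100))
```

The signs of $\hat{\psi}_6$ and $K^{(6)}(0)$ are negative, so both bases are positive and the fractional powers are well defined. Both pilot bandwidths scale linearly with $\hat{\sigma}$.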

4. The shifted Chebyshev series based plug-in bandwidth

Bandwidth selection in KDE using the AMISE criterion involves estimating the second-order derivative of the unknown density. The first kind shifted Chebyshev series expansion can be used to approximate an unknown density function. This section covers the necessary background on the first kind shifted Chebyshev polynomials and derives the bandwidth.

4.1. The shifted Chebyshev polynomials

The first kind Chebyshev polynomials of degree $m$, where $m \in \{0, 1, 2, \ldots\}$, are denoted as $T_m(x)$ and defined on the interval $[-1, 1]$. They satisfy the recurrence relation

$$
T_{m+1}(x) = 2x\,T_m(x) - T_{m-1}(x) \qquad \text{with} \qquad T_0(x) = 1 \qquad \text{and} \qquad T_1(x) = x. \tag{4.1}
$$

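To use the first kind Chebyshev polynomials on a finite interval $[a, b]$, a linear change of variable is applied, which generates the so-called first kind shifted Chebyshev polynomials. The transformation maps $x \in [-1, 1]$ to $y \in [a, b]$ through

$$
y = \left(\frac{b-a}{2}\right)x + \left(\frac{b+a}{2}\right), \qquad \text{then} \qquad x = \left(\frac{2}{b-a}\right)y - \left(\frac{b+a}{b-a}\right). \tag{4.2}
$$

Afterward, the first kind shifted Chebyshev polynomials are generated by

$$
T_m^*(y) = T_m\left(\frac{2}{b-a}\,y - \frac{b+a}{b-a}\right), \qquad \text{for} \quad y \in [a, b]. \tag{4.3}
$$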
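The recurrence and the change of variable can be sketched together in Python. The function names below are hypothetical, and the comparison against the trigonometric identity $T_m(x) = \cos(m \arccos x)$ is included only as a sanity check.

```python
import math

def chebyshev_T(m, x):
    """First kind Chebyshev polynomial T_m(x) via the three-term recurrence."""
    if m == 0:
        return 1.0
    t_prev, t_curr = 1.0, x
    for _ in range(m - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr

def to_unit_interval(y, a, b):
    """Map y in [a, b] to x in [-1, 1]."""
    return (2.0 * y - (b + a)) / (b - a)

def shifted_chebyshev_T(m, y, a, b):
    """First kind shifted Chebyshev polynomial on [a, b]."""
    return chebyshev_T(m, to_unit_interval(y, a, b))

# Sanity check against T_m(x) = cos(m * arccos(x)) on [-1, 1].
x = 0.3
print(chebyshev_T(4, x), math.cos(4 * math.acos(x)))

# On [0, 1], the degree-2 shifted polynomial is 8y^2 - 8y + 1.
y = 0.25
print(shifted_chebyshev_T(2, y, 0.0, 1.0), 8 * y * y - 8 * y + 1)
```

The recurrence avoids the inverse cosine and therefore also works slightly outside $[-1, 1]$, which can matter when rounding pushes a mapped endpoint past the boundary.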

On the interval $[a, b]$, a function can be approximated by the first kind shifted Chebyshev series as


where the coefficients are defined via the formula



and the points

$$
\tilde{x}_k = \cos\left(\frac{2k-1}{2m}\,\pi\right), \qquad k = 0, 1, \ldots, m-1, \tag{4.6}
$$

are the Chebyshev zero nodes.
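To make the idea of a node-based Chebyshev approximation concrete, the sketch below uses the standard textbook construction: coefficients $c_j = \frac{2}{m}\sum_k g(y_k)\,T_j(\tilde{x}_k)$ and the approximation $\sum_{j=0}^{m-1} c_j T_j^*(y) - c_0/2$. This form is an assumption and may differ from the article's own series and coefficient formulas. In particular, the sketch indexes the nodes by $k = 1, \ldots, m$ so that all $m$ zeros of $T_m$ are used, and it approximates a known function rather than an unknown density.

```python
import math

def cheb_nodes(m):
    """Zeros of T_m, indexed k = 1, ..., m (indexing is an assumption of this sketch)."""
    return [math.cos((2 * k - 1) * math.pi / (2 * m)) for k in range(1, m + 1)]

def cheb_coeffs(g, a, b, m):
    """Coefficients c_0, ..., c_{m-1} of the m-term approximation of g on [a, b]."""
    nodes = cheb_nodes(m)
    gvals = [g(0.5 * (b - a) * x + 0.5 * (b + a)) for x in nodes]
    coeffs = []
    for j in range(m):
        s = sum(gv * math.cos(j * math.acos(x)) for gv, x in zip(gvals, nodes))
        coeffs.append(2.0 * s / m)
    return coeffs

def cheb_eval(coeffs, y, a, b):
    """Evaluate sum_j c_j T_j*(y) - c_0 / 2 for y in [a, b]."""
    x = (2.0 * y - (b + a)) / (b - a)
    x = max(-1.0, min(1.0, x))  # guard against rounding at the endpoints
    total = -0.5 * coeffs[0]
    for j, c in enumerate(coeffs):
        total += c * math.cos(j * math.acos(x))
    return total

# Illustration: approximate the standard normal density on [-4, 4].
phi = lambda y: math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
coeffs = cheb_coeffs(phi, -4.0, 4.0, 30)
print(cheb_eval(coeffs, 0.5, -4.0, 4.0), phi(0.5))
```

With $m$ nodes this construction reproduces any polynomial of degree below $m$ exactly, which gives a convenient correctness check.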

The integral of the squared second-order derivative of the first kind shifted Chebyshev series expansion, $\theta_2 = \int (f''(y))^2\,dy$, can be expressed as


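Substituting the value of $\theta_2$ from (4.7) into the expression for the optimal bandwidth in (3.6) gives the first kind shifted Chebyshev series-based bandwidth:

$$
h_{\mathrm{SCBS}} = \left[\frac{k_0}{k_2^2\, \theta_{2,m}\, n}\right]^{1/5}, \tag{4.8}
$$

where $\theta_{2,m}$ denotes the value of $\theta_2$ obtained from the $m$-term expansion.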
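The plug-in step itself, inserting a value of $\theta_2$ into (3.6), is a one-line computation once the kernel constants are fixed. The sketch below assumes the Gaussian kernel, for which $k_0 = 1/(2\sqrt{\pi})$ and $k_2 = 1$; the function names are hypothetical. As a check, it uses the known value $\theta_2 = 3/(8\sqrt{\pi}\sigma^5)$ for a normal density, which should recover the constant $(4/3)^{1/5} \approx 1.06$ of the rule of thumb. It does not implement the Chebyshev-based estimate of $\theta_2$.

```python
import math

K0_GAUSS = 1.0 / (2.0 * math.sqrt(math.pi))  # integral of K(x)^2, Gaussian kernel
K2_GAUSS = 1.0                               # integral of x^2 K(x), Gaussian kernel

def plug_in_bandwidth(theta2, n, k0=K0_GAUSS, k2=K2_GAUSS):
    """AMISE-optimal bandwidth for a supplied value (or estimate) of theta_2."""
    if theta2 <= 0 or n <= 0:
        raise ValueError("theta2 and n must be positive")
    return (k0 / (k2 ** 2 * theta2 * n)) ** 0.2

def theta2_normal(sigma):
    """Integral of (f'')^2 when f is the N(mu, sigma^2) density."""
    return 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)

# With a normal reference density, sigma = 1 and n = 1, the result is
# (4/3)^(1/5), the constant rounded to 1.06 in the rule of thumb.
print(plug_in_bandwidth(theta2_normal(1.0), n=1))
```

Any estimator of $\theta_2$, including a series-based one, can be passed in as `theta2` without changing the rest of the computation.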


4.2. The optimal value of m

The bandwidth $h_{\mathrm{SCBS}}$ depends on the number of terms $m$ in the first kind shifted Chebyshev expansion used to obtain $\theta_{2,m}$. The number of terms acts as a smoothing parameter and therefore also affects the performance and smoothness of the expansion. The best choice for the number of terms is the smallest $m$ that results in the lowest ISE. The error of the first kind shifted Chebyshev expansion is then assessed by $\mathrm{MISE}(\hat{f}(x; h_{\mathrm{SCBS}})) = E\int [\hat{f}(x; h_{\mathrm{SCBS}}) - f(x)]^2\,dx$.

5. Simulation study

This section evaluates the performance of the proposed bandwidth $h_{\mathrm{SCBS}}$ and compares it with three other bandwidths used for density estimation: the least squares cross-validation bandwidth ($h_{\mathrm{LSCV}}$), the improved version of the rule-of-thumb bandwidth ($h_{\mathrm{ROT2}}$), and the Sheather and Jones plug-in bandwidth ($h_{\mathrm{SJDP}}$). The comparison uses the fifteen normal mixture densities constructed by Marron and Wand (1992). These densities cover various shapes, such as unimodal, bimodal, trimodal, and multimodal, each defined and visualized in Marron and Wand’s work. For each distribution, sample sizes of $n$ = 25, 50, 100, 150, and 200 are considered, and the MISE of the estimator is computed over 500 replications. The Gaussian kernel function is used in all cases.
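The structure of such a simulation can be sketched as follows. The example draws samples from the bimodal Marron and Wand density #6, $\frac{1}{2}N(-1, (2/3)^2) + \frac{1}{2}N(1, (2/3)^2)$, and averages a numerically integrated ISE over replications. Everything else is an assumption made for illustration: the function names, the integration range and grid, the small number of replications, and the use of the simple rule-of-thumb bandwidth as the rule under study. The article's own study was run in R with 500 replications.

```python
import math
import random
import statistics

# Marron and Wand (1992) density #6 (bimodal): (weight, mean, sd) triples.
BIMODAL = [(0.5, -1.0, 2.0 / 3.0), (0.5, 1.0, 2.0 / 3.0)]

def normal_pdf(x, mu=0.0, sd=1.0):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, components):
    return sum(w * normal_pdf(x, mu, sd) for w, mu, sd in components)

def mixture_sample(n, components, rng):
    """Draw n observations: pick a component by weight, then sample from it."""
    weights = [w for w, _, _ in components]
    chosen = rng.choices(components, weights=weights, k=n)
    return [rng.gauss(mu, sd) for _, mu, sd in chosen]

def kde(x, sample, h):
    return sum(normal_pdf((x - xi) / h) for xi in sample) / (len(sample) * h)

def ise(sample, h, components, lo=-5.0, hi=5.0, steps=200):
    """Trapezoidal approximation of the integrated squared error on [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * dx
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * (kde(x, sample, h) - mixture_pdf(x, components)) ** 2
    return total * dx

def monte_carlo_mise(bandwidth_rule, components, n, reps, seed=0):
    """Average ISE over independent replications, as an estimate of MISE."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = mixture_sample(n, components, rng)
        total += ise(sample, bandwidth_rule(sample), components)
    return total / reps

def rot1(sample):
    """Simple rule-of-thumb bandwidth, used here only as a stand-in rule."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** (-0.2)

print(monte_carlo_mise(rot1, BIMODAL, n=25, reps=20))
```

Passing a different function as `bandwidth_rule` is enough to compare selectors under the same seed and the same simulated samples.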

The main idea is to find the number of terms $m$ in the expansion that gives a good approximation of $f(x)$ by the first kind shifted Chebyshev series, in the sense of the mean integrated squared error (MISE). The results are presented as box plots of the estimated MISE against the number of terms in the expansion, as shown in Figure 1. This figure illustrates the influence of $m$ on the MISE for three densities (#4, #6, and #12). The x-axis labels give the number of terms in the expansion, and the medians of the MISE are displayed in the box plots. A solid circle indicates the optimal number of terms, which corresponds to the smallest MISE. Figure 1(a) displays the MISE for Density #4 with sample size $n$ = 50, while Figure 1(b) shows the MISE for Density #6 with sample size $n$ = 50. Figure 1(c) and Figure 1(d) show the MISE for Density #12 with sample sizes $n$ = 25 and $n$ = 200, respectively. Performance depends on the combination of density and sample size, with some combinations yielding better results than others.

The simulation evaluates the performance of each bandwidth through the mean integrated squared error of the resulting estimator, with smaller values of $\mathrm{MISE}(\hat{f}(x; h))$ indicating better performance. Table 1 shows the MISE values for the different bandwidths; bold text indicates the smallest MISE among the bandwidths $h_{\mathrm{LSCV}}$, $h_{\mathrm{ROT2}}$, $h_{\mathrm{SJDP}}$, and $h_{\mathrm{SCBS}}$. Overall, the proposed bandwidth $h_{\mathrm{SCBS}}$ performs well compared to the other bandwidths, except for Density #6 ($n$ = 50) and Density #11 ($n$ = 50), where the improved rule-of-thumb bandwidth ($h_{\mathrm{ROT2}}$) performs better. The simulation results indicate that the suggested bandwidth is a good choice across a variety of scenarios, although its calculation is quite complex.

6. Real data analysis

In this section, kernel density estimation is applied to a real dataset. The performance of the proposed bandwidth $h_{\mathrm{SCBS}}$ is compared against the other bandwidths. All calculations are performed using the R programming language.

The real dataset, named “flywheels” and taken from Anderson-Cook (1999), comprises 60 observations on flywheel imbalance angles. The analysis focuses on how different bandwidth choices influence the kernel density estimate, with a histogram serving as a reference for the shape of the data distribution. Figure 2 displays a histogram with 14 bins for this dataset. The density appears to be asymmetric and bimodal. The kernel density estimates obtained with the bandwidths $h_{\mathrm{LSCV}}$, $h_{\mathrm{ROT2}}$, $h_{\mathrm{SJDP}}$, and the proposed $h_{\mathrm{SCBS}}$ are overlaid on Figure 2 for comparison. The kernel density estimate using the suggested bandwidth $h_{\mathrm{SCBS}}$ fits well across the dataset, as confirmed in Table 2 by the lowest mean square error (MSE) value for $h_{\mathrm{SCBS}}$.

7. Conclusion

When selecting a bandwidth for kernel density estimation, the direct plug-in method is the common initial approach, but there is room for enhancement. Estimating the bandwidth requires the integral of the squared second-order derivative of the unknown density. This article introduces a bandwidth selection technique that estimates $\theta_{2,m} = \int (f''(x))^2\,dx$ by using the first kind shifted Chebyshev polynomials as an approximation to the function $f(x)$.

The simulation studies revealed that the first kind shifted Chebyshev series-based plug-in bandwidth ($h_{\mathrm{SCBS}}$) performs well in comparison with the least squares cross-validation bandwidth, the improved rule-of-thumb bandwidth, and the Sheather and Jones plug-in bandwidth. When kernel density estimation is applied to the “flywheels” dataset, the proposed bandwidth ($h_{\mathrm{SCBS}}$) attains the lowest MSE and thus offers the best performance among the bandwidth methods considered. Even though obtaining the proposed bandwidth may be complex, it is still a favorable choice for practical application.


The authors would like to thank Kasetsart University and Rajamangala University of Technology Rattanakosin for the support.

Fig. 1. Box plots of $\mathrm{MISE}(\hat{f}(x; h_{\mathrm{SCBS}}))$ against the number of terms $m$.
Fig. 2. Histogram and kernel density estimates for the “flywheels” dataset.

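Table 1

$\mathrm{MISE}(\hat{f}(x; h)) \times 10^{-3}$ based on the bandwidths $h_{\mathrm{LSCV}}$, $h_{\mathrm{ROT2}}$, $h_{\mathrm{SJDP}}$ and $h_{\mathrm{SCBS}}$ with 500 replications for each case

| Density | $n$ | $h_{\mathrm{LSCV}}$ | $h_{\mathrm{ROT2}}$ | $h_{\mathrm{SJDP}}$ | $h_{\mathrm{SCBS}}$ |
|---|---|---|---|---|---|
| Density #1 | 25 | 10.7819 | 4.8937 | 7.7200 | 2.9581 |
| Density #2 | 25 | 15.8366 | 9.6809 | 14.2788 | 6.8853 |
| Density #3 | 25 | 129.2468 | 154.7279 | 90.0448 | 72.9098 |
| Density #4 | 25 | 219.2996 | 172.4827 | 136.0272 | 84.3708 |
| Density #5 | 25 | 827.9669 | 471.6216 | 753.9018 | 292.4579 |
| Density #6 | 25 | 7.9374 | 3.9941 | 6.1431 | 3.7115 |
| Density #7 | 25 | 16.2093 | 18.0853 | 8.5167 | 7.8264 |
| Density #8 | 25 | 12.2326 | 6.2189 | 8.5853 | 5.4875 |
| Density #9 | 25 | 10.4642 | 4.1865 | 5.6758 | 4.2808 |
| Density #10 | 25 | 36.8391 | 23.5191 | 26.6129 | 22.3816 |
| Density #11 | 25 | 8.0731 | 4.5490 | 6.4799 | 4.1913 |
| Density #12 | 25 | 16.7257 | 9.6699 | 12.1060 | 8.6525 |
| Density #13 | 25 | 10.6791 | 5.8031 | 7.6241 | 5.7830 |
| Density #14 | 25 | 32.0359 | 24.3535 | 15.4503 | 13.6719 |
| Density #15 | 25 | 20.8581 | 28.7160 | 19.9445 | 15.7466 |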

Table 2

$\mathrm{MSE}(\hat{f}(x; h)) \times 10^{-3}$ for kernel density estimates with varying bandwidths


  1. Anderson-Cook CM (1999). A tutorial on one-way analysis of circular-linear data. Journal of Quality Technology, 31, 109-119.
  2. Bowman AW (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 71, 353-360.
  3. Dharmani B (2022). Gram-Charlier A series based extended rule-of-thumb for bandwidth selection in univariate kernel density estimation. Austrian Journal of Statistics, 51, 141-163.
  4. Gramacki A (2018). Nonparametric Kernel Density Estimation and Its Computational Aspects, Springer, Switzerland.
  5. Hall P, Sheather SJ, Jones MC, and Marron JS (1991). On optimal data-based bandwidth selection in kernel density estimation. Biometrika, 78, 263-269.
  6. Härdle W (1991). Smoothing Techniques: With Implementation in S, Springer, New York.
  7. Härdle W, Müller M, Sperlich S, and Werwatz A (2004). Nonparametric and Semiparametric Models, Springer, Berlin.
  8. Marron JS and Wand MP (1992). Exact mean integrated squared error. The Annals of Statistics, 20, 712-736.
  9. Park BU and Marron JS (1990). Comparison of data-driven bandwidth selectors. Journal of the American Statistical Association, 85, 66-72.
  10. Parzen E (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33, 1065-1076.
  11. Raykar VC and Duraiswami R (2006). Fast optimal bandwidth selection for kernel density estimation. Proceedings of the 2006 SIAM International Conference on Data Mining, Bethesda, MD, 524-528.
  12. Rosenblatt M (1956). A central limit theorem and a strong mixing condition. Proceedings of the National Academy of Sciences of the United States of America, 42, 43-47.
  13. Rudemo M (1982). Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9, 65-78.
  14. Silverman BW (1986). Density Estimation for Statistics and Data Analysis, Chapman & Hall, London.
  15. Sheather SJ and Jones MC (1991). A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society: Series B (Methodological), 53, 683-690.
  16. Tenreiro C (2011). Fourier series-based direct plug-in bandwidth selectors for kernel density estimation. Journal of Nonparametric Statistics, 23, 533-545.
  17. Tenreiro C (2020). Bandwidth selection for kernel density estimation: A Hermite series-based direct plug-in approach. Journal of Statistical Computation and Simulation, 90, 3433-3453.
  18. Wand MP and Jones MC (1995). Kernel Smoothing, Chapman & Hall/CRC, New York.
  19. Woodroofe M (1970). On choosing a delta-sequence. The Annals of Mathematical Statistics, 41, 1665-1671.