Modified information criterion for testing changes in generalized lambda distribution model based on confidence distribution
Communications for Statistical Applications and Methods 2022;29:301-317
Published online May 31, 2022
© 2022 Korean Statistical Society.

Suthakaran Ratnasingam^a, Elena Buzaianu^b, Wei Ning^{1,c}

^a Department of Mathematics, California State University San Bernardino, USA;
^b Department of Mathematics and Statistics, University of North Florida, USA;
^c Department of Mathematics and Statistics, Bowling Green State University, USA
^1 Correspondence to: Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403, USA. E-mail: wning@bgsu.edu
Received September 25, 2021; Revised November 12, 2021; Accepted November 22, 2021.
 Abstract
In this paper, we propose a change point detection procedure based on the modified information criterion (MIC) in a generalized lambda distribution (GLD) model. Simulations are conducted to obtain empirical critical values of the proposed test statistic. We also conduct simulations to evaluate the performance of the proposed method compared with the log-likelihood method in terms of power, coverage probability, and confidence sets. Our results indicate that, under various conditions, the proposed MIC approach shows good finite sample properties. Furthermore, we propose a new goodness-of-fit testing procedure based on the energy distance to evaluate the asymptotic null distribution of our test statistic. Two real data applications are provided to illustrate the use of the proposed method.
Keywords : change-point, modified information criterion, generalized lambda distribution
1. Introduction

Pearson (1985) gave a four-parameter system of probability density functions and fitted the parameters by the method of moments (MME). Tukey (1960) proposed the one-parameter lambda distribution. For the purpose of generating random variables in Monte Carlo simulation studies, Tukey's lambda was generalized to the four-parameter generalized lambda distribution (GLD) by Ramberg and Schmeiser (1972) and Ramberg and Schmeiser (1974). Ramberg et al. (1979) developed a four-parameter model together with the necessary tables for fitting a wide variety of curves. Since the early 1970s, the GLD has been applied in many fields that involve continuous probability density functions. The generalized lambda distribution family with four parameters λ1, λ2, λ3, and λ4, denoted by GLD(λ1, λ2, λ3, λ4), has the density function

\[ f(x) = \frac{\lambda_2}{\lambda_3 y^{\lambda_3 - 1} + \lambda_4 (1 - y)^{\lambda_4 - 1}}, \qquad \text{at } x = Q(y), \]

where Q(y) is the percentile function defined as

\[ Q(y) = \lambda_1 + \frac{y^{\lambda_3} - (1 - y)^{\lambda_4}}{\lambda_2}, \]

where 0 ≤ y ≤ 1. Here λ1 and λ2 are the location and scale parameters respectively, while λ3 and λ4 determine the skewness and kurtosis. We note here that not all choices of λ1, λ2, λ3, λ4 lead to a valid distribution, as described in the following theorem.

Theorem 1

The GLD(λ1, λ2, λ3, λ4) specifies a valid distribution if and only if

\[ g(y, \lambda_3, \lambda_4) = \lambda_3 y^{\lambda_3 - 1} + \lambda_4 (1 - y)^{\lambda_4 - 1} \]

has the same sign for all y in [0, 1], provided that λ2 takes that sign. In particular, the GLD(λ1, λ2, λ3, λ4) specifies a valid distribution if λ2, λ3, and λ4 all have the same sign.
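The percentile function and the density formula can be evaluated directly. The following is a minimal Python sketch, not the GLDEX implementation; all function names are illustrative assumptions. It evaluates Q(y), the density at x = Q(y), the sufficient same-sign condition of Theorem 1, and draws samples by inverse transform (X = Q(U) with U uniform on (0, 1)).

```python
import random

def gld_quantile(y, lam1, lam2, lam3, lam4):
    """Percentile function Q(y) = lam1 + (y**lam3 - (1 - y)**lam4) / lam2."""
    return lam1 + (y ** lam3 - (1.0 - y) ** lam4) / lam2

def gld_density_at_quantile(y, lam2, lam3, lam4):
    """Density f(x) evaluated at x = Q(y), for 0 < y < 1."""
    g = lam3 * y ** (lam3 - 1.0) + lam4 * (1.0 - y) ** (lam4 - 1.0)
    return lam2 / g

def same_sign_sufficient(lam2, lam3, lam4):
    """Sufficient validity condition: lam2, lam3, lam4 share one sign."""
    values = (lam2, lam3, lam4)
    return all(v > 0 for v in values) or all(v < 0 for v in values)

def gld_sample(n, lam1, lam2, lam3, lam4, rng=random):
    """Inverse-transform sampling: X = Q(U), U ~ Uniform(0, 1)."""
    return [gld_quantile(rng.random(), lam1, lam2, lam3, lam4) for _ in range(n)]

# GLD(2, 1, 0.19, 0.19) is the null model used later in the simulations.
params = (2.0, 1.0, 0.19, 0.19)
print(same_sign_sufficient(*params[1:]))
print(gld_quantile(0.5, *params))  # median; equals lam1 when lam3 == lam4
sample = gld_sample(5, *params, rng=random.Random(0))
```

Because the GLD is defined through Q(y), sampling needs no numerical inversion, which is one reason the family is convenient for Monte Carlo studies.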

For more details on the GLD family, we refer to Karian and Dudewicz (2000). This paper is organized as follows. In Section 2, we develop the change point detection procedure for the GLD model based on the MIC and provide preliminaries on how to construct a confidence curve and confidence set for a specified level. The simulation results are presented in Section 3. Our new method for the goodness-of-fit of the asymptotic null distribution of Sn is discussed in Section 4. Two real data applications are provided in Section 5. A summary of the results and some discussion are given in Section 6.

2. Methodology

2.1. Modified information approach

In this section, we use the modified information criterion (MIC) approach to detect changes in a GLD model. Chen et al. (2006) proposed the MIC as a modification of the BIC approach. In the context of change point problems, it adjusts the model complexity term to include the contribution of the location of the change point.

In general, a data set may contain multiple changes. Vostrikova (1981) proposed the binary segmentation method, which detects multiple structural changes recursively, with at most one change point found at each step. In the first step, the method scans the whole data set by testing the null hypothesis of no change against the alternative hypothesis of one change. Once the first change (if there is any) has been located, the data are divided into two subsequences, one before the change point and one after it. The second step repeats the same scanning procedure on each of these two subsequences, assuming at most one change in each. The process is repeated until no subsequence contains a further change point. In this way, we can find all the possible changes and estimate their locations. Vostrikova (1981) also showed that the binary segmentation procedure is consistent. Consequently, the multiple change point problem may be viewed as a sequence of single change point problems combined with binary segmentation. Therefore, in this paper, we establish the testing procedure for a single change point; it can be easily generalized to the multiple change point problem if needed.
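The recursion described above can be sketched in a few lines of Python. This is only an illustration: `detect` is a hypothetical callable standing in for any single change point test (such as the MIC test developed below), and `toy_detect` is a simple mean-gap stand-in used here so that the example runs on its own.

```python
def binary_segmentation(data, detect, min_size=2, offset=0):
    """Collect change points recursively.

    detect(segment) returns k (the change occurs after the k-th
    observation of the segment) or None when no change is detected.
    """
    if len(data) < 2 * min_size:
        return []
    k = detect(data)
    if k is None:
        return []
    left = binary_segmentation(data[:k], detect, min_size, offset)
    right = binary_segmentation(data[k:], detect, min_size, offset + k)
    return left + [offset + k] + right

def toy_detect(segment, threshold=3.0):
    """Stand-in detector (assumption): largest gap between segment means."""
    best_k, best_gap = None, threshold
    for k in range(2, len(segment) - 1):
        left, right = segment[:k], segment[k:]
        gap = abs(sum(left) / len(left) - sum(right) / len(right))
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k

data = [0.0] * 10 + [5.0] * 10 + [10.0] * 10
change_points = binary_segmentation(data, toy_detect)
print(change_points)
```

Each recursive call sees only its own subsequence, so the `offset` argument is what maps local positions back to positions in the full data set.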

Let X1, . . . , Xn be a sequence of independent random variables from a GLD model with parameters,

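\[ \left(\lambda_1^{(1)}, \lambda_2^{(1)}, \lambda_3^{(1)}, \lambda_4^{(1)}\right), \left(\lambda_1^{(2)}, \lambda_2^{(2)}, \lambda_3^{(2)}, \lambda_4^{(2)}\right), \ldots, \left(\lambda_1^{(n)}, \lambda_2^{(n)}, \lambda_3^{(n)}, \lambda_4^{(n)}\right). \]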

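We are interested in testing for changes in all parameters simultaneously. Thus, we test the hypothesis

\[ H_0: \lambda_i^{(1)} = \lambda_i^{(2)} = \cdots = \lambda_i^{(n)}, \qquad i = 1, 2, 3, 4, \]

versus

\[ H_1: \lambda_i^{(1)} = \lambda_i^{(2)} = \cdots = \lambda_i^{(k)} \neq \lambda_i^{(k+1)} = \cdots = \lambda_i^{(n)}, \qquad i = 1, 2, 3, 4, \]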

where 1 ≤ k < n. Then we can define the modified information criteria MIC(n) under H0 and MIC(k) under H1 respectively according to Chen et al. (2006) as follows,

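\[
\begin{aligned}
\mathrm{MIC}(n) &= -2 \ln L_{H_0}\left(\hat\lambda_1, \hat\lambda_2, \hat\lambda_3, \hat\lambda_4\right) + 4\log(n) \\
&= -2\left[ n \log(\hat\lambda_2) - \sum_{i=1}^{n} \log\left( \hat\lambda_3 y_i^{\hat\lambda_3 - 1} + \hat\lambda_4 (1 - y_i)^{\hat\lambda_4 - 1} \right) \right] + 4\log(n), \\
\mathrm{MIC}(k) &= -2 \ln L_{H_1}\left(\hat\lambda_1^{(1)}, \hat\lambda_2^{(1)}, \hat\lambda_3^{(1)}, \hat\lambda_4^{(1)}, \hat\lambda_1^{(n)}, \hat\lambda_2^{(n)}, \hat\lambda_3^{(n)}, \hat\lambda_4^{(n)}\right) + \left\{ 8 + \left(\frac{2k}{n} - 1\right)^2 \right\} \log(n) \\
&= -2\left[ k \log\left(\hat\lambda_2^{(1)}\right) + (n - k) \log\left(\hat\lambda_2^{(n)}\right) - \sum_{i=1}^{k} \log\left( \hat\lambda_3^{(1)} y_i^{\hat\lambda_3^{(1)} - 1} + \hat\lambda_4^{(1)} (1 - y_i)^{\hat\lambda_4^{(1)} - 1} \right) \right. \\
&\qquad \left. - \sum_{i=k+1}^{n} \log\left( \hat\lambda_3^{(n)} y_i^{\hat\lambda_3^{(n)} - 1} + \hat\lambda_4^{(n)} (1 - y_i)^{\hat\lambda_4^{(n)} - 1} \right) \right] + \left\{ 8 + \left(\frac{2k}{n} - 1\right)^2 \right\} \log(n),
\end{aligned}
\]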

where $\hat\lambda_i$, i = 1, 2, 3, 4, are the MLEs under H0, and $\hat\lambda_i^{(1)}, \hat\lambda_i^{(n)}$, i = 1, 2, 3, 4, are the MLEs under H1, which are estimated with the R package GLDEX (Su, 2016). Let k be the possible change location in the range 1 ≤ k < n. Then we accept H0 if

\[ \mathrm{MIC}(n) \le \min_{1 \le k < n} \mathrm{MIC}(k), \]

which indicates there is no change point in the data set, and we reject H0 if

\[ \mathrm{MIC}(n) > \min_{1 \le k < n} \mathrm{MIC}(k), \]

which indicates that there exists a change point in the data set. Consequently, we can estimate the change point location $\hat k$ by

\[ \mathrm{MIC}(\hat k) = \min_{1 \le k < n} \mathrm{MIC}(k). \]

Further, we define the test statistic Sn based on MIC(n) and MIC(k) to test the null hypothesis of no change versus the alternative hypothesis of one change as follows,

\[ S_n = \mathrm{MIC}(n) - \min_{1 \le k < n} \mathrm{MIC}(k) + 4\log(n). \tag{2.1} \]

We reject the null hypothesis for a sufficiently large value of Sn. The standardization term 4 log(n) removes the constant term in the difference of MIC(n) and MIC(k).
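To make the penalty structure concrete, here is a hedged Python sketch of the scan over k. It is written for a generic model with d parameters, so that d = 4 reproduces the penalties 4 log(n) and {8 + (2k/n − 1)²} log(n) above. Fitting a GLD by maximum likelihood is outside the scope of the sketch, so a normal model (d = 2) is used as a stand-in; `max_loglik`, `normal_max_loglik` and `min_seg` are assumptions made for illustration.

```python
import math
import random

def normal_max_loglik(segment):
    """Maximized normal log-likelihood (stand-in for the GLD fit)."""
    m = len(segment)
    mean = sum(segment) / m
    var = sum((x - mean) ** 2 for x in segment) / m
    return -0.5 * m * (math.log(2.0 * math.pi * var) + 1.0)

def mic_statistic(data, max_loglik, d=4, min_seg=2):
    """Return (S_n, k_hat, MIC(n), min_k MIC(k)) for a d-parameter model."""
    n = len(data)
    log_n = math.log(n)
    mic_null = -2.0 * max_loglik(data) + d * log_n
    k_hat, mic_min = None, math.inf
    for k in range(min_seg, n - min_seg + 1):
        loglik = max_loglik(data[:k]) + max_loglik(data[k:])
        penalty = (2 * d + (2.0 * k / n - 1.0) ** 2) * log_n
        mic_k = -2.0 * loglik + penalty
        if mic_k < mic_min:
            k_hat, mic_min = k, mic_k
    s_n = mic_null - mic_min + d * log_n
    return s_n, k_hat, mic_null, mic_min

rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(30)] + [rng.gauss(5.0, 1.0) for _ in range(30)]
s_n, k_hat, mic_null, mic_min = mic_statistic(data, normal_max_loglik, d=2, min_seg=5)
print(k_hat, mic_null > mic_min)
```

The location-dependent term (2k/n − 1)² is zero at k = n/2 and largest near the ends, so candidate change points close to the boundaries pay a slightly higher complexity penalty.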

2.2. Confidence distribution, profile log-likelihood and deviance functions

Confidence distributions (CD) are distribution estimates to be interpreted as distributions of epistemic probabilities. The concept of a CD is analogous to that of a point estimator: it is a sample-dependent distribution that can represent confidence intervals of all levels for a parameter of interest. A formal definition of the CD can be found in Schweder and Hjort (2002). Furthermore, Schweder and Hjort (2016) systematically studied the theoretical properties of the CD. A detailed analysis of recent developments of the CD has been given by Xie and Singh (2013). More applications of the CD can be found in the literature, including bootstrap distributions, p-value functions, normalized likelihood functions, and Bayesian posteriors, among others; see Schweder and Hjort (2002), Singh et al. (2005), Singh et al. (2007), Singh and Xie (2012), and Shen et al. (2018).

The CD for change point analysis has been investigated by Cunen et al. (2018) and they construct confidence curves for change locations using the log-likelihood approach. Ratnasingam and Ning (2020) studied the change point detection procedure based on the CD combining with the MIC to construct the confidence set for the change estimate for a skew normal change point model. They also investigated the confidence distribution for detecting and estimating changes in a three-parameter Weibull distribution (Ratnasingam and Ning, 2021). In this paper, we study the CD-based detection procedure along with MIC for a GLD change point model. Next, we describe a procedure to construct a confidence curve for a four-parameter GLD change point model.

Suppose X1, . . . , Xk is a sequence of independent random variables with density function f(x, ΘL), and Xk+1, . . . , Xn come from a population with density function f(x, ΘR). The log-likelihood function is defined as

\[ \ell(k, \Theta_L, \Theta_R) = \sum_{i=1}^{k} \log f(x_i, \Theta_L) + \sum_{i=k+1}^{n} \log f(x_i, \Theta_R), \]

where ΘL and ΘR are the parameters of the pre-change and post-change distributions respectively. By maximizing the log-likelihood function above for a given k, we obtain the profile log-likelihood function

\[ \ell_{\mathrm{prof}}(k) = \max_{\Theta_L, \Theta_R} \ell(k, \Theta_L, \Theta_R) = \ell(k, \hat\Theta_L, \hat\Theta_R), \]

where $\hat\Theta_L$ and $\hat\Theta_R$ are the MLEs of ΘL and ΘR respectively. The estimated change point location is $\hat k = \arg\max_k \ell_{\mathrm{prof}}(k)$. The deviance function is defined as

\[ D(k, x) = 2\left[ \ell_{\mathrm{prof}}(\hat k) - \ell_{\mathrm{prof}}(k) \right], \]

where x = (x1, x2, . . . , xn). The confidence curve for k based on the deviance function can be obtained through simulation.

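\[ cc(k, x_{\mathrm{obs}}) = \phi_k\left(D(k, x_{\mathrm{obs}})\right) = P_{k, \hat\Theta_L, \hat\Theta_R}\left( D(k, x) < D(k, x_{\mathrm{obs}}) \right), \]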

where the cc(k, xobs) < α under the true value of k. By simulation, we compute

\[ cc(k, x_{\mathrm{obs}}) = \frac{1}{B} \sum_{j=1}^{B} I\left( D(k, x_j^*) < D(k, x_{\mathrm{obs}}) \right), \]

for a large number B of simulated copies of the data set xobs. For each possible value of k, we simulate data $x_j^*$, j = 1, . . . , B, from $f(x, \hat\Theta_L)$ and $f(x, \hat\Theta_R)$ to the left and right side of k, respectively. Furthermore, the change point location is estimated by (2.7). More precisely, our approach depends on the location of the change point, which distinguishes it from the method used in Cunen et al. (2018). For more details, we refer the readers to Cunen et al. (2018) and Ratnasingam and Ning (2020).

3. Simulation results

In this section, because the analytic properties of the proposed test statistic Sn are difficult to derive, we use simulations to investigate its critical values and performance.

3.1. Critical values

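We now describe how to obtain empirical critical values for our test statistic Sn and for Tn, the test statistic proposed by Ning and Gupta (2009), who considered the classical Bayesian information criterion (BIC) to detect multiple changes in a GLD model. We then compare their performances. The statistic Tn is defined as

\[ T_n = \mathrm{BIC}(n) - \min_{1 \le k < n} \mathrm{BIC}(k) + 4\log n, \tag{3.1} \]

where BIC(n) under H0 and BIC(k) under H1 are given by

\[
\begin{aligned}
\mathrm{BIC}(n) &= -2 \ln L_{H_0}\left(\hat\lambda_1, \hat\lambda_2, \hat\lambda_3, \hat\lambda_4\right) + 4\log(n), \\
\mathrm{BIC}(k) &= -2 \ln L_{H_1}\left(\hat\lambda_1^{(1)}, \hat\lambda_2^{(1)}, \hat\lambda_3^{(1)}, \hat\lambda_4^{(1)}, \hat\lambda_1^{(n)}, \hat\lambda_2^{(n)}, \hat\lambda_3^{(n)}, \hat\lambda_4^{(n)}\right) + 8\log n.
\end{aligned}
\]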

The major difference between Sn and Tn is that the penalty term in the MIC(k) incorporates the contribution of the location of the change point k associated with the complexity of the model, which is not accounted for by the BIC(k). In order to make a fair power comparison between Sn and Tn, we simulate the critical values for both test statistics under the same null distributions with the same sample sizes for given significance levels.

There are two general approaches to computing critical values: simulation-based and bootstrap-based. The simulation-based approach computes the test statistics Sn and Tn under the null hypothesis over a certain number of repetitions, and the critical values are the percentiles of the sorted values of the test statistics from the simulations. The second approach uses the bootstrap to obtain approximate critical values for the test statistics. In this method, samples are repeatedly drawn with replacement from data generated under a null distribution. These are called bootstrap samples. The critical values for a given significance level correspond to the percentiles of the sorted test statistic values.

When using the bootstrap method to obtain simulated critical values of a test statistic, we need to ensure that the bootstrap samples are re-sampled from data under the null distribution. In the simulation-based approach, the distribution under the null hypothesis is specified before sampling, so the generated samples are known to satisfy H0. Thus, in simulations, both approaches give similar critical values. For real data, however, the bootstrap method faces a difficulty: we do not know whether the data satisfy H0 or H1, so we cannot re-sample directly from the data. We therefore adopt the following strategy. We first assume the data satisfy H0, which means they should be fitted by a single GLD, $\mathrm{GLD}_0 = \mathrm{GLD}(\hat\lambda_1, \hat\lambda_2, \hat\lambda_3, \hat\lambda_4)$, where $\hat\lambda_i$, i = 1, 2, 3, 4, can be obtained with the R package GLDEX. Then we generate a random sample x1, x2, . . . , xn from GLD0. Bootstrap samples are drawn from this generated sample with replacement, denoted by $y_1^{(i)}, y_2^{(i)}, \ldots, y_n^{(i)}$, i = 1, 2, . . . , B. For each bootstrap sample, we calculate Sn, denoted by $S_n^{(i)}$, i = 1, 2, . . . , B. The p-value can then be approximated by

\[ P\text{-value} = \frac{1}{B} \sum_{i=1}^{B} I\left( S_n^{(i)} \ge S_n^{(*)} \right), \]

where I(·) is the indicator function and Sn(*) is the value of Sn calculated from the original real data.
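The resampling and the p-value approximation can be sketched as follows. The sketch assumes the p-value is the proportion of bootstrap statistics at least as large as the observed one; `range_stat` is a hypothetical stand-in for Sn, and a normal sample stands in for the sample generated from the fitted GLD0.

```python
import random

def bootstrap_resamples(sample, n_boot, rng):
    """Yield n_boot resamples drawn with replacement from `sample`."""
    for _ in range(n_boot):
        yield rng.choices(sample, k=len(sample))

def bootstrap_p_value(boot_stats, observed):
    """Proportion of bootstrap statistics >= the observed statistic."""
    return sum(1 for s in boot_stats if s >= observed) / len(boot_stats)

def range_stat(xs):
    """Stand-in statistic (assumption); the paper uses S_n here."""
    return max(xs) - min(xs)

rng = random.Random(1)
generated = [rng.gauss(0.0, 1.0) for _ in range(40)]  # stand-in for a GLD0 sample
boot_stats = [range_stat(y) for y in bootstrap_resamples(generated, 200, rng)]
p = bootstrap_p_value(boot_stats, observed=range_stat(generated))
print(p)
```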

3.2. Critical values of Sn and Tn

In our simulation study, we set up the null distribution to be GLD(2, 1, 0.19, 0.19) and choose sample sizes n = 50, 60, 80, 100, 150 with significance levels α = 0.01, 0.05, 0.1. We obtain the empirical critical values for Sn (2.1) and Tn (3.1) through the simulations as follows,

  • Step 1: We generate data with various sample sizes n = 50, 60, 80, 100, 150 from GLD(2, 1, 0.19, 0.19).

  • Step 2: For each generated sample, we calculate Sn and Tn respectively.

  • Step 3: We repeat the above steps M = 1,000 times. Then the corresponding percentiles of these Sn and Tn values are the critical values at the given significance levels α = 0.01, 0.05, and 0.1.
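The three steps above can be sketched generically in Python. The sketch assumes an upper-tail test (reject for large values), so it takes the (1 − α) percentile of the sorted statistics; `statistic` and `sampler` are placeholders for the computation of Sn (or Tn) and for a GLD(2, 1, 0.19, 0.19) generator, and the toy functions below exist only to make the example runnable.

```python
import random

def empirical_critical_values(statistic, sampler, n, reps=1000,
                              alphas=(0.01, 0.05, 0.1), seed=0):
    """Simulate `reps` null samples and return upper-tail percentiles."""
    rng = random.Random(seed)
    values = sorted(statistic(sampler(n, rng)) for _ in range(reps))
    crit = {}
    for alpha in alphas:
        # Index of the (1 - alpha) empirical percentile, clamped to range.
        idx = min(reps - 1, max(0, int(round((1.0 - alpha) * reps)) - 1))
        crit[alpha] = values[idx]
    return crit

def uniform_sampler(n, rng):
    """Toy null sampler (assumption): n draws from Uniform(0, 1)."""
    return [rng.random() for _ in range(n)]

def range_statistic(sample):
    """Toy statistic (assumption) used in place of S_n."""
    return max(sample) - min(sample)

crit = empirical_critical_values(range_statistic, uniform_sampler, n=50, reps=200)
print(crit)
```

Passing the same seed when simulating Sn and Tn evaluates both statistics on identical null samples, in line with the fair comparison described earlier.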

The empirical critical values are provided in Table 1.

We should note here that, for the real data application, we should follow the bootstrap method proposed in Section 3.1 to calculate the p-value since whether the true distribution of the data satisfies H0 or H1 is unknown.

In this subsection, we provide results of the simulation study for the coverage probability, confidence sets, and consistency of the change point estimator. Three different sample sizes n = {50, 100, 150} are considered, each with various change point locations. For instance, for sample size n = 50, the change point positions are set to k = 10, 15, and 25. The pre-change data are obtained from GLD(2, 1, 0.19, 0.19). One method is considered better than another if it produces smaller confidence sets while retaining the right coverage for a given level.

3.3. Power comparison

In this section, we conduct simulations under different scenarios to investigate the performance of the test procedures based on Sn and Tn in terms of power. We set the distribution before the change to be GLD(2, 1, 0.19, 0.19), and the distribution after the change to be $\mathrm{GLD}(\lambda_1^{(n)}, \lambda_2^{(n)}, \lambda_3^{(n)}, \lambda_4^{(n)})$ with the various values listed in Table 2.

Further, various sample sizes n = {50, 100, 150} and different change point locations are considered for each sample size. We consider the change point locations {10, 15, 25} for sample size n = 50, {15, 25, 50} for sample size n = 100, and {35, 50, 75} for sample size n = 150. Note that the cases {40, 35, 25} for sample size n = 50, {85, 75, 50} for sample size n = 100, and {115, 100, 75} for sample size n = 150 are fully symmetrical and thus have the same power, coverage probability, and average size. Our test statistics (2.1) and (3.1) correspond to the MIC and BIC procedures respectively. The critical values at a given significance level are obtained through the simulations described in Section 3.2. The simulation results are recorded in Table 2.

Regardless of the method used, the power of the test increases as the sample size becomes larger. Moreover, the MIC based method gives power at least as large as the BIC based method in the cases reported in Table 2. This may be because the MIC penalty depends on the location of the change point. We also notice that the power of the test increases as the difference between the pre-change and post-change parameters increases.

3.4. Coverage probability, confidence sets & consistency of the estimator comparison

First, we examine whether each method attains the right coverage for the specified level. We compare our method, the MIC based method, with the log-likelihood based approach proposed in Cunen et al. (2018). The simulation results are listed in Table 3 for α = {0.50, 0.90, 0.95, 0.99}. The MIC based method outperforms the log-likelihood based method in most cases. However, both methods provide slight over-coverage for the level α = 0.5, and this is more apparent as the sample size and the difference among the parameters increase. For example, for sample size n = 100, change point location k = 15, post-change distribution GLD(2.5, 1.5, 0.69, 0.69), and level α = 0.90, the coverage based on the MIC method is 0.79, compared with only 0.74 for the log-likelihood based method.

Next, we compute the average sizes of the confidence sets for the MIC and log-likelihood based methods. The results are summarized in Table 4. It can be seen that the MIC based method gives smaller confidence sets than the log-likelihood based approach proposed in Cunen et al. (2018). We observe that the size of the confidence set becomes smaller when the difference between the parameters increases. For example, for n = 50, level α = 0.5, and post-change distribution GLD(2.5, 1.5, 0.69, 0.69), the average size of the confidence set based on the MIC is 7.342, whereas the log-likelihood based approach gives a slightly larger confidence set with average size 7.556.

We also investigate the consistency of the estimator $\hat k$ of the actual change location k through a numerical study. Let δ denote the allowed difference between the estimated location and the actual change point location. Further, we compute the bias and mean squared error (MSE) as well. Table 5 summarizes the simulation results. The simulations show that $P(|\hat k_{\mathrm{MIC}} - k| \le \delta)$ is in most cases at least as large as $P(|\hat k_{\mathrm{BIC}} - k| \le \delta)$. However, the difference between these two probabilities decreases as the sample size n increases.

4. Goodness-of-fit test for Sn

According to Chen et al. (2006), under certain conditions, the asymptotic null distribution of the proposed test statistic Sn is the χ² distribution with d degrees of freedom, where d is the number of parameters in the null model. In this section, we develop a goodness-of-fit test to evaluate the asymptotic null distribution of Sn using the energy distance.

Energy distance, as described in Székely (2000), is a statistical distance between probability distributions. The associated statistics, named energy statistics, are functions of energy distances. The concept is motivated by Newton's gravitational potential energy, which is a function of the distance between two objects. The idea of energy statistics is thus to consider statistical observations as heavenly bodies governed by a statistical potential energy, which is zero if and only if an underlying statistical null hypothesis is true. Székely and Rizzo (2013) defined the energy distance ℰ(X, Y) between two independent d-dimensional random variables X and Y as

\[ \mathcal{E}(X, Y) = 2E|X - Y|_d - E|X - X'|_d - E|Y - Y'|_d, \tag{4.1} \]

provided E|X|, E|Y| < ∞. Here X′ is an i.i.d. copy of X and Y′ is an i.i.d. copy of Y. We have ℰ(X, Y) ≥ 0, and ℰ(X, Y) = 0 if and only if X and Y are identically distributed. There have been numerous studies based on the energy distance. For example, Székely and Rizzo (2005) proposed a test based on the energy distance for multivariate normality. Rizzo (2009), Yang (2012), and Rizzo and Haman (2016) considered one-sample goodness-of-fit tests for Pareto distributions, univariate stable distributions, and asymmetric Laplace distributions based on the energy distance. Székely and Rizzo (2004) constructed an energy-distance-based test for testing equality of distributions in high dimensional settings. Baringhaus and Franz (2004) also proposed a new multivariate two-sample test based on the energy distance. Kim et al. (2009) and Matteson and James (2014) studied change point problems incorporating the energy distance. In the univariate case, d = 1, the energy distance (4.1) becomes

\[ \mathcal{E}(X, Y) = 2E|X - Y| - E|X - X'| - E|Y - Y'|. \]

Then the one-sample energy statistic for the goodness-of-fit test based on the energy distance is given by the following definition.

Definition 1

Let X1, . . . , Xn be a random sample from a univariate population with distribution F and let x1, . . . , xn be the observed values of the random variables in the sample. Then the single sample energy statistic for testing the hypotheses H0 : F = F0 vs H1 : F ≠ F0 is

\[ \Psi_n = n\,\mathcal{E}_n(x_1, \ldots, x_n; X) = n\left( \frac{2}{n} \sum_{i=1}^{n} E|x_i - X| - E|X - X'| - \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} |x_i - x_j| \right), \tag{4.2} \]

where X and X′ are independent and identically distributed with distribution F0, and the expectations are taken with respect to the null distribution F0.

As we mentioned earlier, according to Chen et al. (2006), the Sn defined in (2.1) follows the $\chi^2_4$ distribution. Thus, our testing hypothesis becomes $H_0: F = \chi^2_4$ vs $H_1: F \neq \chi^2_4$. Therefore, we only need to derive the energy statistic formula for the $\chi^2_4$ distribution.

Theorem 2

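Let $Y \sim \chi^2_4$. Then for any fixed x ≥ 0,

\[ E|x - Y| = 2xF_Y(x) - x + E(Y) - 2\left( 4 - \frac{1}{2} e^{-x/2}\left[ x(x + 4) + 8 \right] \right). \]

Proof.

\[
\begin{aligned}
E|x - Y| &= \int_{y \le x} (x - y) f_Y(y)\,dy + \int_{y > x} (y - x) f_Y(y)\,dy \\
&= x\left(2F_Y(x) - 1\right) - E(Y) + 2\int_{y > x} y f_Y(y)\,dy \\
&= x\left(2F_Y(x) - 1\right) + E(Y) - 2\int_{-\infty}^{x} y f_Y(y)\,dy \\
&= x\left(2F_Y(x) - 1\right) + E(Y) - 2\int_{0}^{x} y\,\frac{1}{2^2\,\Gamma(2)}\, y^{\frac{4}{2} - 1} e^{-y/2}\,dy \\
&= 2xF_Y(x) - x + E(Y) - 2\left( 4 - \frac{1}{2} e^{-x/2}\left[ x(x + 4) + 8 \right] \right).
\end{aligned}
\]

The third line uses $\int_{y > x} y f_Y(y)\,dy = E(Y) - \int_{-\infty}^{x} y f_Y(y)\,dy$. At x = 0 the expression reduces to E(Y) = 4, as it should.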
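A closed form for E|x − Y| is easy to check numerically. The Python sketch below compares the expression obtained by splitting the integral at x, namely 2xF_Y(x) − x + E(Y) − 2∫₀ˣ y f_Y(y) dy with ∫₀ˣ y f_Y(y) dy = 4 − ½e^{−x/2}[x(x + 4) + 8], against a midpoint-rule evaluation of the defining integral. It uses the closed-form CDF of the chi-square distribution with 4 degrees of freedom, F_Y(x) = 1 − e^{−x/2}(1 + x/2); the function names and the integration settings are illustrative assumptions.

```python
import math

def chi2_4_cdf(x):
    """CDF of the chi-square distribution with 4 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0) if x > 0 else 0.0

def chi2_4_pdf(y):
    """Density y * exp(-y/2) / 4 for y > 0."""
    return 0.25 * y * math.exp(-y / 2.0) if y > 0 else 0.0

def expected_abs_dev_closed(x):
    """E|x - Y| for Y ~ chi-square(4) and x >= 0, with E(Y) = 4."""
    # partial = integral of y * f_Y(y) over [0, x]
    partial = 4.0 - 0.5 * math.exp(-x / 2.0) * (x * (x + 4.0) + 8.0)
    return 2.0 * x * chi2_4_cdf(x) - x + 4.0 - 2.0 * partial

def expected_abs_dev_numeric(x, upper=80.0, steps=100000):
    """Midpoint-rule approximation of the defining integral on [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += abs(x - y) * chi2_4_pdf(y)
    return total * h

for x in (0.0, 1.0, 3.5, 10.0):
    print(x, expected_abs_dev_closed(x), expected_abs_dev_numeric(x))
```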

Theorem 3

Let X and X′ be independent identically distributed random variables. Then

\[ E|X - X'| \approx \frac{4}{n} \sum_{i=1}^{n} y_i F_X^{-1}(y_i) - \frac{2}{n} \sum_{i=1}^{n} F_X^{-1}(y_i), \]

where $F_X^{-1}$ is the inverse CDF of X, n is the number of equally sized sub-intervals of [0, 1] and yi is chosen from the ith sub-interval.
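The approximation in Theorem 3 is a Riemann sum for 4∫₀¹ u F⁻¹(u) du − 2∫₀¹ F⁻¹(u) du. A minimal Python sketch follows, assuming midpoints of the sub-intervals for yi and a bisection-based inverse CDF for the chi-square distribution with 4 degrees of freedom (both choices are assumptions, not taken from the paper). Two reference values are used as checks: E|X − X′| = 1 for the standard exponential and E|X − X′| = 3 for the chi-square distribution with 4 degrees of freedom.

```python
import math

def chi2_4_cdf(x):
    """CDF of the chi-square distribution with 4 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0) if x > 0 else 0.0

def chi2_4_quantile(p, tol=1e-10):
    """Inverse CDF by bisection on [0, 200]."""
    lo, hi = 0.0, 200.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_4_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exp_quantile(p):
    """Inverse CDF of the standard exponential distribution."""
    return -math.log(1.0 - p)

def mean_abs_difference(quantile, n_sub=2000):
    """Approximate E|X - X'| with y_i at the midpoint of each sub-interval."""
    total = 0.0
    for i in range(n_sub):
        y = (i + 0.5) / n_sub
        total += (4.0 * y - 2.0) * quantile(y)
    return total / n_sub

print(mean_abs_difference(exp_quantile))     # close to 1
print(mean_abs_difference(chi2_4_quantile))  # close to 3
```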

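The proof of Theorem 3 is similar to that in Rizzo and Haman (2016), so details are omitted to conserve space. According to Rizzo (2002), the last term of (4.2) can be written as

\[ \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} |x_i - x_j| = \frac{2}{n^2} \sum_{k=1}^{n} (2k - 1 - n)\, y_{(k)}, \]

where $y_{(1)} \le \cdots \le y_{(n)}$ is the ordered sample. Hence, with $X \sim \chi^2_4$ under the null hypothesis, the one sample energy statistic based on Definition 1 can be re-written as follows.

\[
\begin{aligned}
\Psi_n = n\,\mathcal{E}_n(x_1, \ldots, x_n; X) = n\Bigg\{ & \frac{2}{n} \sum_{i=1}^{n} \left[ 2x_i F_X(x_i) - x_i + E(X) - 2\left( 4 - \frac{1}{2} e^{-x_i/2}\left[ x_i(x_i + 4) + 8 \right] \right) \right] \\
& - \left( \frac{4}{n^*} \sum_{i=1}^{n^*} y_i F_X^{-1}(y_i) - \frac{2}{n^*} \sum_{i=1}^{n^*} F_X^{-1}(y_i) \right) - \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} |x_i - x_j| \Bigg\},
\end{aligned}
\]

where n* is the number of equally sized sub-intervals of [0, 1] and yi is chosen to be in the ith sub-interval. The one sample energy goodness-of-fit test procedure is described below.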

  • Step 1: Generate data x1, x2, . . . , xn from the $\chi^2_4$ distribution.

  • Step 2: Compute the energy statistic of the data x1, x2, . . . , xn using the formula above.

  • Step 3: Repeat Steps 1 and 2 5000 times and obtain $\Psi_n^{(1)}, \Psi_n^{(2)}, \ldots, \Psi_n^{(5000)}$.

  • Step 4: Obtain the critical value as the 95% quantile of these energy statistics.

  • Step 5: Simulate Sn values using equation (2.1) and denote them by $S_n^{(1)}, S_n^{(2)}, \ldots, S_n^{(B)}$.

  • Step 6: Compute the energy statistic of the data $S_n^{(1)}, S_n^{(2)}, \ldots, S_n^{(B)}$.

  • Step 7: Compare the energy statistic in Step 6 with the critical value found in Step 4. If the critical value exceeds the energy statistic, we conclude that the Sn values come from the $\chi^2_4$ distribution; otherwise, we conclude that they do not.
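Steps 1 to 4 of the procedure can be sketched in Python as follows. The sketch is self-contained and makes several assumptions that are not from the paper: the closed form used for E|x − Y| is 2xF(x) − x + E(Y) − 2∫₀ˣ y f(y) dy, E|X − X′| is approximated with midpoints on 2000 sub-intervals, the pairwise term uses the ordered-sample identity, and null samples are drawn with `random.gammavariate(2, 2)`, since the chi-square distribution with 4 degrees of freedom is a gamma distribution with shape 2 and scale 2. The repetition count is reduced for the demonstration.

```python
import math
import random

def chi2_4_cdf(x):
    return 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0) if x > 0 else 0.0

def chi2_4_quantile(p, tol=1e-10):
    lo, hi = 0.0, 200.0
    while hi - lo > tol:  # bisection on the CDF
        mid = 0.5 * (lo + hi)
        if chi2_4_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def expected_abs_dev(x):
    """E|x - Y| for Y ~ chi-square(4), valid for x >= 0."""
    partial = 4.0 - 0.5 * math.exp(-x / 2.0) * (x * (x + 4.0) + 8.0)
    return 2.0 * x * chi2_4_cdf(x) - x + 4.0 - 2.0 * partial

def mean_abs_difference(n_sub=2000):
    """Midpoint approximation of E|X - X'| under the null."""
    return sum((4.0 * (i + 0.5) / n_sub - 2.0) * chi2_4_quantile((i + 0.5) / n_sub)
               for i in range(n_sub)) / n_sub

E_ABS_DIFF = mean_abs_difference()

def energy_statistic(xs):
    """One-sample energy statistic for H0: F = chi-square(4)."""
    n = len(xs)
    ys = sorted(xs)
    term1 = 2.0 / n * sum(expected_abs_dev(x) for x in xs)
    term3 = 2.0 / n ** 2 * sum((2 * k - 1 - n) * y for k, y in enumerate(ys, start=1))
    return n * (term1 - E_ABS_DIFF - term3)

def critical_value(n, reps=5000, level=0.95, seed=0):
    """Steps 1-4: simulate null statistics and take the `level` quantile."""
    rng = random.Random(seed)
    stats = sorted(energy_statistic([rng.gammavariate(2.0, 2.0) for _ in range(n)])
                   for _ in range(reps))
    return stats[min(reps - 1, int(level * reps))]

rng = random.Random(42)
null_sample = [rng.gammavariate(2.0, 2.0) for _ in range(200)]
shifted_sample = [x + 5.0 for x in null_sample]
stat_null = energy_statistic(null_sample)
stat_shifted = energy_statistic(shifted_sample)
crit = critical_value(n=100, reps=200)  # reduced reps for a quick run
print(stat_null, stat_shifted, crit)
```

For Steps 5 to 7, `energy_statistic` would be applied to the simulated Sn values and compared with the critical value computed for the same sample size.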

We follow the above procedure to conduct the one sample energy goodness-of-fit test. The critical value at the 5% significance level is 9.5037. The energy statistic for the Sn data is 11.60781, so we reject the null hypothesis at the 5% significance level. This suggests that the asymptotic null distribution of Sn does not follow a $\chi^2_4$ distribution. Below we construct the chi-square Q-Q plot for the Sn test statistic values.

Figure 1 shows that there is a significant deviation from the reference line. This confirms our previous result that the asymptotic null distribution of Sn does not follow a $\chi^2_4$ distribution. Thus, the MIC statistic derived from the GLD change point model does not comply with the asymptotic properties established in Chen et al. (2006).

5. Application

In this section, the proposed method is applied to analyze two stock market return data sets, from the Brazilian and Chilean markets. These data sets were previously used in Ngunkeng and Ning (2014) and Ratnasingam and Ning (2020). We assume that changes occur simultaneously across all four parameters. The stock return ratio is obtained through the following transformation, where Pt denotes the price at time t.

\[ R_t = \frac{P_{t+1} - P_t}{P_t}, \qquad t = 1, 2, \ldots, n - 1. \]
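As a small illustration of this transformation, the following Python sketch converts a price series of length n into n − 1 return ratios. The function name and the example prices are made up for illustration and are not taken from the Brazilian or Chilean data.

```python
def return_ratios(prices):
    """R_t = (P_{t+1} - P_t) / P_t for consecutive prices."""
    return [(prices[t + 1] - prices[t]) / prices[t] for t in range(len(prices) - 1)]

prices = [100.0, 110.0, 99.0]  # hypothetical prices
ratios = return_ratios(prices)
print(ratios)
```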

5.1. Brazilian market return ratio data

In order to identify multiple changes and to create the appropriate confidence sets, we use the proposed method along with the binary segmentation procedure. First, MIC(262) = −807.3228. Then min1≤k<262 MIC(k) = MIC(87) = −824.2035. Thus the estimated change point location is 87, which corresponds to the 88th location in the data set. The MLEs of the pre-change and post-change parameters are (0.0089, 64.4382, 0.0156, −0.0085) and (0.0010, 32.8107, −0.0777, −0.0389) respectively, and the 95% confidence set for the change point estimate is {87, 88, 89}. We then split the data set into two subsets, below k (≤ 87) and above k (> 87), and apply the proposed method recursively to each subset in order to detect all changes in the data set. This iterative process continues until no further changes are detected. The change points obtained from our procedure are {58, 88, 144, 240, 254}. Compared with the model used in Ratnasingam and Ning (2020), we identify one additional change point at 253. When compared with Ngunkeng and Ning (2014), we locate an additional change at the 58th data point. Figures 2 and 3 show the confidence curves for all change point estimates, and the 95% confidence sets are marked with red dashed lines. All change points are graphed in Figure 4.

5.2. Chilean market return ratio data

First, we compute MIC(n) = −1032.118 and min1≤k<262 MIC(k) = MIC(111) = −1047.216. Thus the change point estimate is 111. The corresponding change point in the data is 112. The MLEs of the pre-change and post-change parameters are (−0.0042, 58.2023, 0.3994, 0.1338) and (0.0012, 48.6707, −0.0726, −0.0061) respectively. Further, the 95% confidence set for the change point is {110, 111, 112, 113, 114}. Next we apply the binary segmentation procedure to detect all the changes in the data set. They are {98, 112, 170, 181}. When compared with Ratnasingam and Ning (2020), we found an additional change point in the Chilean market data and only one change point. Further, we found an additional change point at 98 when compared to Ngunkeng and Ning (2014). The confidence curves for all change point estimates are shown in Figure 5, with the 95% confidence sets marked with red dashed lines.

6. Discussion

In this paper, we propose a change point detection procedure for a GLD model based on the modified information criterion. In order to use as much information about the change point location as possible, the proposed procedure takes into account the effect of the location of the change point on model complexity. We provide confidence sets for the change point location for a specified level α. We also obtain empirical critical values of the test statistics. Simulations conducted with different sample sizes and various change point locations show that our method performs well in terms of larger power, smaller confidence sets, and smaller MSE when compared to other methods. Furthermore, we introduce a new goodness-of-fit test based on the energy distance to assess the asymptotic null distribution of Sn. The use and the advantage of the proposed method are illustrated via two stock market data sets.

Figures
Fig. 1. The Chi-square Q-Q plot for Sn values.
Fig. 2. (a): Confidence curve for the data 1 ≤ i ≤ 262, the change point at k = 87, (b): Confidence curve for the first subset below (k ≤ 87), the change point at k = 57, (c): Confidence curve for the sequence 88 ≤ i ≤ 262, the change point at k = 240, and (d): Confidence curve for the sequence 241 ≤ i ≤ 262, the change point at k = 254.
Fig. 3. (e): Confidence curve for the data 88 ≤ i ≤ 153, the change point at k = 55.
Fig. 4. The weekly stock return data for Brazil with change point estimates.
Fig. 5. (a): Confidence curve for the data 1 ≤ i ≤ 261, the change point at k = 111, (b): Confidence curve for the data 1 ≤ i ≤ 111, the change point at k = 97, (c): Confidence curve for the data 112 ≤ i ≤ 261, the change point at k = 57, and (d): Confidence curve for the data 170 ≤ i ≤ 261, the change point at k = 10.
Fig. 6. The weekly stock return data for Chile with change point estimates.
TABLES

Table 1

Critical values of Sn and Tn

n     Method   α = 0.01   α = 0.05   α = 0.1
50    Sn       13.2672    15.6640    19.0798
      Tn        9.3552    11.7520    15.1678
60    Sn       12.5919    14.5480    18.9868
      Tn        8.4976    10.4537    14.8925
80    Sn       11.2876    13.1865    18.5648
      Tn        6.9056     8.8045    14.1827
100   Sn       11.4198    13.6665    17.8261
      Tn        6.8146     9.0613    13.2209
150   Sn        9.6414    11.5773    15.7964
      Tn        4.6307     6.5666    10.7858

Table 2

Power comparison between MIC and BIC for α = 0.05

Post-change parameters (λ1(n), λ2(n), λ3(n), λ4(n)) are given in the column headers.

n     k    Model   (2.5, 1.5, 0.69, 0.69)   (3, 2, 1.19, 1.19)   (4, 3, 2.19, 2.19)
50    10   MIC     0.796                    0.882                0.930
           BIC     0.726                    0.874                0.926
      15   MIC     0.862                    0.930                0.960
           BIC     0.840                    0.924                0.950
      25   MIC     0.952                    0.988                0.976
           BIC     0.934                    0.988                0.972
100   15   MIC     0.892                    0.972                0.998
           BIC     0.882                    0.968                0.996
      25   MIC     0.914                    0.982                1.000
           BIC     0.908                    0.978                0.998
      50   MIC     0.972                    0.992                1.000
           BIC     0.964                    0.992                1.000
150   35   MIC     0.948                    0.980                1.000
           BIC     0.938                    0.978                1.000
      50   MIC     0.974                    0.998                1.000
           BIC     0.972                    0.998                1.000
      75   MIC     0.994                    1.000                1.000
           BIC     0.986                    1.000                1.000

Table 3

Coverage probability comparison between MIC and log-likelihood methods

Post-change parameters (λ1(n), λ2(n), λ3(n), λ4(n)) are given in the column headers.

                 (2.5, 1.5, 0.69, 0.69)   (3, 2, 1.19, 1.19)   (4, 3, 2.19, 2.19)
n     k    α     MIC     loglik           MIC     loglik       MIC     loglik
50    10   0.50  0.38    0.35             0.40    0.38         0.45    0.42
           0.90  0.80    0.78             0.88    0.87         0.90    0.88
           0.95  0.86    0.84             0.93    0.92         0.95    0.95
           0.99  0.98    0.96             0.99    0.98         0.99    1.00
      15   0.50  0.42    0.39             0.42    0.40         0.52    0.52
           0.90  0.82    0.82             0.89    0.88         0.91    0.90
           0.95  0.91    0.89             0.95    0.94         0.95    0.95
           0.99  0.98    0.98             0.99    0.98         1.00    0.99
      25   0.50  0.46    0.44             0.44    0.43         0.52    0.52
           0.90  0.88    0.87             0.89    0.89         0.89    0.89
           0.95  0.93    0.92             0.95    0.94         0.95    0.95
           0.99  0.98    0.98             1.00    0.99         0.99    1.00
100   15   0.50  0.44    0.42             0.49    0.48         0.53    0.521
           0.90  0.83    0.80             0.89    0.86         0.89    0.88
           0.95  0.89    0.87             0.94    0.91         0.93    0.92
           0.99  0.96    0.95             0.99    0.97         0.99    0.99
      25   0.50  0.48    0.47             0.49    0.48         0.53    0.53
           0.90  0.88    0.85             0.90    0.89         0.91    0.90
           0.95  0.93    0.91             0.95    0.93         0.95    0.94
           0.99  0.97    0.98             0.99    0.99         1.00    0.99
      50   0.50  0.48    0.47             0.51    0.50         0.54    0.55
           0.90  0.89    0.87             0.91    0.88         0.90    0.91
           0.95  0.95    0.94             0.96    0.94         0.96    0.95
           0.99  0.98    1.00             1.00    0.99         0.99    0.99
150   35   0.50  0.49    0.47             0.51    0.50         0.57    0.56
           0.90  0.88    0.86             0.89    0.87         0.90    0.88
           0.95  0.93    0.91             0.94    0.92         0.95    0.95
           0.99  0.98    0.99             0.99    0.99         0.99    1.00
      50   0.50  0.50    0.49             0.55    0.54         0.58    0.57
           0.90  0.88    0.86             0.90    0.89         0.91    0.91
           0.95  0.94    0.93             0.95    0.94         0.95    0.95
           0.99  0.97    0.95             0.99    0.98         0.99    1.00
      75   0.50  0.53    0.51             0.58    0.56         0.58    0.58
           0.90  0.89    0.88             0.91    0.90         0.93    0.93
           0.95  0.94    0.93             0.95    0.94         0.96    0.95
           0.99  0.99    0.99             0.99    0.99         1.00    1.00

Table 4

Average size comparison between MIC and log-likelihood methods

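Column groups give the parameter setting (λ1(n), λ2(n), λ3(n), λ4(n)).

| n | k | α | (2.5, 1.5, 0.69, 0.69) loglik | (2.5, 1.5, 0.69, 0.69) MIC | (3, 2, 1.19, 1.19) loglik | (3, 2, 1.19, 1.19) MIC | (4, 3, 2.19, 2.19) loglik | (4, 3, 2.19, 2.19) MIC |
|---|---|---|---|---|---|---|---|---|
| 50 | 10 | 0.50 | 7.556 | 7.342 | 3.390 | 3.074 | 2.936 | 2.810 |
| | | 0.90 | 9.554 | 9.398 | 3.990 | 3.320 | 3.246 | 3.134 |
| | | 0.95 | 10.692 | 10.606 | 4.430 | 3.990 | 3.906 | 3.758 |
| | | 0.99 | 13.794 | 12.734 | 5.686 | 4.634 | 4.368 | 4.146 |
| | 15 | 0.50 | 6.774 | 6.526 | 2.996 | 2.620 | 2.788 | 2.584 |
| | | 0.90 | 8.436 | 8.282 | 3.450 | 2.838 | 2.882 | 2.674 |
| | | 0.95 | 9.462 | 9.364 | 3.902 | 3.604 | 3.566 | 3.350 |
| | | 0.99 | 11.666 | 11.212 | 4.922 | 4.310 | 3.950 | 3.742 |
| | 25 | 0.50 | 5.084 | 4.514 | 2.994 | 2.836 | 2.276 | 2.136 |
| | | 0.90 | 6.852 | 6.338 | 3.422 | 3.392 | 2.366 | 2.242 |
| | | 0.95 | 7.930 | 7.340 | 3.874 | 3.600 | 2.442 | 2.328 |
| | | 0.99 | 8.828 | 8.112 | 4.834 | 4.772 | 2.968 | 2.726 |
| 100 | 15 | 0.50 | 6.878 | 6.100 | 2.828 | 2.628 | 2.546 | 2.460 |
| | | 0.90 | 8.488 | 8.282 | 3.668 | 3.466 | 2.604 | 2.564 |
| | | 0.95 | 9.414 | 8.852 | 4.144 | 3.928 | 2.694 | 2.250 |
| | | 0.99 | 11.960 | 11.640 | 5.052 | 4.436 | 3.386 | 2.918 |
| | 25 | 0.50 | 4.986 | 4.810 | 2.392 | 2.096 | 2.230 | 2.166 |
| | | 0.90 | 6.414 | 6.372 | 2.666 | 2.366 | 2.398 | 2.208 |
| | | 0.95 | 7.250 | 6.814 | 3.120 | 2.840 | 2.516 | 2.268 |
| | | 0.99 | 9.576 | 8.802 | 3.954 | 3.604 | 2.846 | 2.738 |
| | 50 | 0.50 | 4.202 | 4.166 | 2.288 | 2.134 | 2.126 | 2.070 |
| | | 0.90 | 5.616 | 5.310 | 2.538 | 2.340 | 2.354 | 2.214 |
| | | 0.95 | 6.540 | 6.108 | 3.032 | 2.894 | 2.412 | 2.360 |
| | | 0.99 | 7.882 | 7.312 | 3.800 | 3.674 | 2.614 | 2.592 |
| 150 | 35 | 0.50 | 3.506 | 2.736 | 2.570 | 2.192 | 2.440 | 2.210 |
| | | 0.90 | 4.884 | 4.064 | 2.898 | 2.728 | 2.574 | 2.340 |
| | | 0.95 | 5.750 | 4.888 | 3.034 | 2.812 | 2.826 | 2.580 |
| | | 0.99 | 7.972 | 6.950 | 3.808 | 3.596 | 3.468 | 2.874 |
| | 50 | 0.50 | 2.524 | 2.288 | 2.268 | 2.146 | 2.218 | 2.122 |
| | | 0.90 | 3.910 | 3.618 | 2.746 | 2.522 | 2.240 | 2.194 |
| | | 0.95 | 4.716 | 4.480 | 2.842 | 2.766 | 2.774 | 2.476 |
| | | 0.99 | 6.880 | 6.562 | 3.584 | 3.428 | 2.946 | 2.712 |
| | 75 | 0.50 | 2.836 | 2.164 | 2.212 | 2.046 | 2.134 | 1.898 |
| | | 0.90 | 4.162 | 3.710 | 2.494 | 2.254 | 2.358 | 2.234 |
| | | 0.95 | 4.744 | 4.580 | 2.804 | 2.528 | 2.592 | 2.392 |
| | | 0.99 | 5.514 | 5.102 | 3.384 | 3.136 | 2.878 | 2.650 |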
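
Tables 3 and 4 summarize, for each nominal level α, how often the confidence set for the change location contains the true location and how many candidate locations it holds on average. A minimal sketch of how such summaries could be computed from simulation output is given below. It assumes each replicate yields a confidence set stored as a Python set of candidate locations; the function name `coverage_and_average_size` and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Set, Tuple


def coverage_and_average_size(
    conf_sets: Sequence[Set[int]], true_k: int
) -> Tuple[float, float]:
    """Return (empirical coverage, average set size) over replicates.

    Assumption: conf_sets[i] is the level-alpha confidence set for the
    change location obtained in the i-th simulated replicate.
    """
    if not conf_sets:
        raise ValueError("need at least one replicate")
    n_rep = len(conf_sets)
    # Coverage: fraction of replicates whose set contains the true location.
    hits = sum(1 for s in conf_sets if true_k in s)
    # Average size: mean number of candidate locations per set.
    total_size = sum(len(s) for s in conf_sets)
    return hits / n_rep, total_size / n_rep


# Toy illustration with four made-up replicates and true location k = 10.
toy_sets = [{9, 10, 11}, {10}, {11, 12}, {8, 9, 10, 11}]
coverage, avg_size = coverage_and_average_size(toy_sets, true_k=10)
print(coverage, avg_size)
```

Under this reading, a method is preferable when its coverage is close to the nominal level α while its average size is small, which is the comparison the two tables support.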

Table 5

The consistency of change location estimator

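| δ | n | k | P(\|k̂ − k\| ≤ δ) MIC | P(\|k̂ − k\| ≤ δ) BIC | Bias(k̂) MIC | Bias(k̂) BIC | MSE(k̂) MIC | MSE(k̂) BIC |
|---|---|---|---|---|---|---|---|---|
| 1 | 50 | 12 | 0.851 | 0.854 | 0.205 | 0.182 | 0.205 | 0.182 |
| | | 25 | 0.870 | 0.860 | 0.187 | 0.194 | 0.187 | 0.194 |
| | 100 | 25 | 0.883 | 0.879 | 0.196 | 0.199 | 0.196 | 0.199 |
| | | 50 | 0.887 | 0.886 | 0.206 | 0.205 | 0.206 | 0.205 |
| | 150 | 37 | 0.914 | 0.887 | 0.201 | 0.185 | 0.201 | 0.185 |
| | | 75 | 0.910 | 0.909 | 0.173 | 0.173 | 0.173 | 0.173 |
| | 200 | 50 | 0.895 | 0.910 | 0.191 | 0.199 | 0.191 | 0.199 |
| | | 100 | 0.913 | 0.913 | 0.210 | 0.210 | 0.210 | 0.210 |
| | 300 | 75 | 0.892 | 0.904 | 0.169 | 0.196 | 0.169 | 0.196 |
| | | 150 | 0.910 | 0.910 | 0.223 | 0.223 | 0.223 | 0.223 |
| 2 | 50 | 12 | 0.922 | 0.923 | 0.347 | 0.320 | 0.489 | 0.458 |
| | | 25 | 0.940 | 0.930 | 0.327 | 0.334 | 0.467 | 0.474 |
| | 100 | 25 | 0.956 | 0.952 | 0.342 | 0.345 | 0.488 | 0.491 |
| | | 50 | 0.957 | 0.955 | 0.346 | 0.343 | 0.486 | 0.481 |
| | 150 | 37 | 0.972 | 0.956 | 0.317 | 0.323 | 0.433 | 0.461 |
| | | 75 | 0.969 | 0.969 | 0.291 | 0.293 | 0.409 | 0.413 |
| | 200 | 50 | 0.966 | 0.962 | 0.333 | 0.303 | 0.475 | 0.407 |
| | | 100 | 0.961 | 0.961 | 0.306 | 0.306 | 0.402 | 0.402 |
| | 300 | 75 | 0.964 | 0.966 | 0.303 | 0.320 | 0.457 | 0.444 |
| | | 150 | 0.975 | 0.975 | 0.353 | 0.353 | 0.483 | 0.483 |
| 3 | 50 | 12 | 0.962 | 0.965 | 0.467 | 0.446 | 0.849 | 0.836 |
| | | 25 | 0.966 | 0.961 | 0.405 | 0.427 | 0.701 | 0.753 |
| | 100 | 25 | 0.973 | 0.976 | 0.393 | 0.417 | 0.641 | 0.707 |
| | | 50 | 0.979 | 0.979 | 0.412 | 0.415 | 0.684 | 0.697 |
| | 150 | 37 | 0.985 | 0.981 | 0.356 | 0.398 | 0.550 | 0.686 |
| | | 75 | 0.988 | 0.988 | 0.348 | 0.350 | 0.580 | 0.584 |
| | 200 | 50 | 0.988 | 0.989 | 0.399 | 0.384 | 0.673 | 0.650 |
| | | 100 | 0.987 | 0.987 | 0.384 | 0.384 | 0.636 | 0.636 |
| | 300 | 75 | 0.985 | 0.981 | 0.376 | 0.365 | 0.646 | 0.579 |
| | | 150 | 0.989 | 0.989 | 0.395 | 0.395 | 0.609 | 0.609 |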
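
The three summaries in Table 5 can be illustrated with a short sketch. The definitions used here are assumptions, not the paper's stated formulas: the probability is taken as the fraction of replicates whose estimated location falls within δ of the true location, and the bias and MSE are taken as the mean absolute and mean squared deviation among those replicates. This is one reading that is consistent with the bias and MSE columns coinciding when δ = 1. The function name `location_summary` and the toy estimates are hypothetical.

```python
from typing import Sequence, Tuple


def location_summary(
    estimates: Sequence[int], true_k: int, delta: int
) -> Tuple[float, float, float]:
    """Return (P(|k_hat - k| <= delta), bias, mse) over replicates.

    Assumption: bias and mse are computed only over replicates whose
    estimate lies within delta of the true location, using absolute
    and squared deviations respectively.
    """
    if not estimates:
        raise ValueError("need at least one replicate")
    errors = [k_hat - true_k for k_hat in estimates]
    within = [e for e in errors if abs(e) <= delta]
    prob = len(within) / len(errors)
    if not within:
        # No replicate landed within delta, so the conditional
        # summaries are undefined.
        return prob, float("nan"), float("nan")
    bias = sum(abs(e) for e in within) / len(within)
    mse = sum(e * e for e in within) / len(within)
    return prob, bias, mse


# Toy illustration with eight made-up estimates and true location k = 25.
toy_estimates = [25, 26, 24, 25, 30, 25, 27, 25]
prob, bias, mse = location_summary(toy_estimates, true_k=25, delta=2)
print(prob, bias, mse)
```

With δ = 1 every retained deviation is −1, 0, or 1, so the absolute and squared deviations agree and the two summaries are equal under these assumed definitions.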

References
  1. Baringhaus L and Franz C (2004). On a new multivariate two-sample test. Journal of Multivariate Analysis, 88, 190-206.
  2. Cunen C, Hermansen G, and Hjort NL (2018). Confidence distributions for change-points and regime shifts. Journal of Statistical Planning and Inference, 195, 14-34.
  3. Chen J, Gupta AK, and Pan J (2006). Information criterion and change point problem for regular models. The Indian Journal of Statistics, 68, 252-282.
  4. Karian ZA and Dudewicz EJ (2000). Fitting Statistical Distributions, The Generalized Lambda Distribution and Generalized Bootstrap Methods, Boca Raton, CRC Press.
  5. Kim AY, Marzban C, Percival DB, and Stuetzle W (2009). Using labeled data to evaluate change detectors in a multivariate streaming environment. Signal Processing, 89, 2529-2536.
  6. Matteson DS and James NA (2014). A nonparametric approach for multiple change point analysis of multivariate data. Journal of the American Statistical Association, 109, 334-345.
  7. Ngunkeng G and Ning W (2014). Information approach for the change-point detection in the skew normal distribution and its applications. Sequential Analysis, 33, 475-490.
  8. Ning W and Gupta AK (2009). Change point analysis for generalized lambda distribution. Communications in Statistics-Simulation and Computation, 38, 1789-1802.
  9. Pearson K (1985). Contributions to the mathematical theory of evolution. Philosophical Transactions of the Royal Society of London A, 185, 71-110.
  10. Ramberg JS and Schmeiser BW (1972). An approximate method for generating symmetric random variables. Communications of the ACM, 15, 987-990.
  11. Ramberg JS and Schmeiser BW (1974). An approximate method for generating asymmetric random variables. Communications of the ACM, 17, 78-82.
  12. Ramberg JS, Tadikamalla PR, Dudewicz EJ, and Mykytka EF (1979). A probability distribution and its uses in fitting data. Technometrics, 21, 201-204.
  13. Ratnasingam S and Ning W (2020). Confidence distributions for skew normal change-point model based on modified information criterion. Journal of Statistical Theory and Practice, 4, 1-21.
  14. Ratnasingam S and Ning W (2021). Modified information criterion for regular change point models based on confidence distribution. Environmental and Ecological Statistics, 28, 303-322.
  15. Rizzo ML (2002). A new rotation invariant goodness-of-fit test (PhD thesis), Bowling Green State University.
  16. Rizzo ML (2009). New goodness-of-fit tests for Pareto distributions. Journal of the International Association of Actuaries, 39, 691-715.
  17. Rizzo ML and Haman JT (2016). Expected distances and goodness-of-fit for the asymmetric Laplace distribution. Statistics and Probability Letters, 117, 158-164.
  18. Schweder T and Hjort N (2002). Confidence and likelihood. Scandinavian Journal of Statistics, 29, 309-332.
  19. Schweder T and Hjort NL (2016). Confidence, Likelihood, Probability: Statistical Inference with Confidence Distributions, Cambridge University Press.
  20. Shen J, Liu R, and Xie M (2018). Prediction with confidence-a general framework for predictive inference. Journal of Statistical Planning and Inference, 195, 126-140.
  21. Singh K, Xie M, and Strawderman WE (2005). Combining information from independent sources through confidence distributions. Annals of Statistics, 33, 159-183.
  22. Singh K, Xie M, and Strawderman WE (2007). Confidence distribution (cd): distribution estimator of a parameter. Lecture Notes-Monograph Series Complex Datasets and Inverse Problems: Tomography, Networks and Beyond, 54, 132-150.
  23. Singh K and Xie M (2012). CD posterior-combining prior and data through confidence distributions. Contemporary Developments in Bayesian Analysis and Statistical Decision Theory: A Festschrift in Honor of William E. Strawderman (D. Fourdrinier et al. eds.), IMS Collection, 8, 200-214.
  24. Su S (2016). Fitting flexible parametric regression models with GLDreg in R. Journal of Modern Applied Statistical Methods, 15.
  25. Székely GJ (2000). Technical report 0305: E-statistics: energy of statistical samples, Department of Mathematics and Statistics, Bowling Green State University.
  26. Székely GJ and Rizzo ML (2004). Testing for equal distributions in high dimension. Mathematics, 1-6.
  27. Székely GJ and Rizzo ML (2005). A new test for multivariate normality. Journal of Multivariate Analysis, 93, 58-80.
  28. Székely GJ and Rizzo ML (2013). Energy statistics: A class of statistics based on distances. Journal of Statistical Planning and Inference, 143, 1249-1272.
  29. Tukey JW (1960). The practical relationship between the common transformations of percentages of counts and of amounts, Technical Report 36, Statistical Techniques Research Group, Princeton University.
  30. Vostrikova (1981). Detecting “disorder” in multidimensional random processes. Soviet Mathematics Doklady, 24, 55-59.
  31. Xie M and Singh K (2013). Confidence distribution, the frequentist distribution estimator of a parameter: A review. International Statistical Review, 81, 2-39.
  32. Yang G (2012). The energy goodness-of-fit test for univariate stable distributions (PhD thesis), Bowling Green State University.