This article deals with threshold-asymmetric volatility models for over-dispersed and zero-inflated time series of count data. We introduce several threshold integer-valued autoregressive conditional heteroscedasticity (ARCH) models that incorporate over-dispersion and zero-inflation via conditional Poisson and negative binomial distributions. The EM algorithm is used to estimate the parameters. Cholera data from Kolkata, India, from 2006 to 2011 are analyzed as a real application. Two threshold variables are adopted: a time-varying local constant mean and the grand mean. The data application indicates that the threshold model, as an asymmetric version, is useful in modeling the volatility of count time series.
Over the past three decades, there has been increasing interest in modeling integer-valued time series because of the broad range of potential applicability to epidemiology (Cardinal
The first-order integer-valued autoregressive model (INAR(1)) with Poisson distribution was introduced by McKenzie (1985). Alzaid and Al-Osh (1990) extended it to
Although modeling time series of count data with the Poisson distribution is a useful tool, in practice over-dispersion and zero-inflation in the series easily lead to a violation of the major assumptions that the variance is equal to the mean and that the parameters are positive. Ferland
In this paper we study the conditional variance (volatility) of count time series that exhibit over-dispersion, zero-inflation, and serial dependence. The paper is organized as follows. Section 2 re-introduces existing models as threshold integer-valued analogues of the autoregressive conditional heteroscedastic (ARCH) model by adding threshold-asymmetric effects to the models. The innovation follows either a Poisson or a negative binomial distribution. Estimation is discussed in Section 3. Section 4 illustrates threshold model building strategies by applying the proposed threshold models to a highly skewed, zero-inflated, and serially correlated data set of cholera cases in Kolkata, India, from 2006 to 2011 (Ali
The first-order integer-valued ARCH (INARCH(1), for short) model (Ferland, 2006) is a conditional Poisson model defined by
where
where the parameters
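To make the INARCH(1) recursion concrete, the following is a minimal simulation sketch. It assumes the standard form in which the count at time t, given the past, is Poisson with intensity `alpha0 + alpha1 * x_prev`; the names `alpha0`, `alpha1`, `poisson_draw`, and `simulate_inarch1` are illustrative choices and are not taken from the paper.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small intensities used here; exp(-lam) underflows for very large lam.
    """
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_inarch1(alpha0, alpha1, n, x0=0, seed=0):
    """Simulate X_t | past ~ Poisson(alpha0 + alpha1 * X_{t-1}) for n steps."""
    rng = random.Random(seed)
    path, prev = [], x0
    for _ in range(n):
        lam = alpha0 + alpha1 * prev  # conditional mean, equal to the conditional variance
        prev = poisson_draw(lam, rng)
        path.append(prev)
    return path


# Hypothetical parameter values, chosen only for illustration.
series = simulate_inarch1(alpha0=0.8, alpha1=0.7, n=260)
print(sum(series) / len(series))  # compare with the stationary mean alpha0 / (1 - alpha1)
```

Because the conditional variance equals the conditional mean, large counts are followed by more volatile counts, which is the ARCH-type behavior the model is meant to capture.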
To accommodate over-dispersion in the data, one may model the process with a negative binomial distribution. The first-order integer-valued negative binomial ARCH (NB-INARCH(1), for short) model (Zhu, 2011; Yoon and Hwang, 2015a) is defined as
where the parameter
where
To capture zero-inflation in the count data, Zhu (2012a) and Yoon and Hwang (2015b) investigated the following first-order zero-inflated Poisson ARCH (ZIP-INARCH(1)) model, formulated as
where ZIP(
where
where
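As a small numerical illustration of how zero-inflation changes the Poisson distribution, the sketch below evaluates the usual zero-inflated Poisson probability mass function, in which an extra point mass at zero is mixed with a Poisson law. The symbols `lam` (Poisson intensity) and `rho` (zero-inflation probability) are assumed names, not necessarily the paper's notation.

```python
import math


def zip_pmf(k, lam, rho):
    """P(X = k) for a zero-inflated Poisson: extra zeros occur with probability rho."""
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return rho * (k == 0) + (1.0 - rho) * poisson


# Hypothetical values, for illustration only.
lam, rho = 2.0, 0.3
probs = [zip_pmf(k, lam, rho) for k in range(60)]  # mass beyond 60 is negligible for lam = 2
mean = sum(k * p for k, p in enumerate(probs))
var = sum(k * k * p for k, p in enumerate(probs)) - mean ** 2
print(mean, (1 - rho) * lam)                   # both close to 1.4
print(var, (1 - rho) * lam * (1 + rho * lam))  # both close to 2.24
```

The variance exceeds the mean whenever `rho > 0`, so zero-inflation by itself already induces over-dispersion.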
Replacing the Poisson by the negative binomial distribution in (
where ZINB(
where 0 <
Here
For each model discussed in Section 2, we use the EM algorithm to estimate the parameters, following the method proposed by Zhu (2012a). See also Yoon and Hwang (2015a, 2015b) for applications of the EM algorithm to count time series. Since the general steps are the same except for the conditional log-likelihood function and its first- and second-order derivatives with respect to the parameters, we discuss only the INTARCH(1) case, defined by (
The likelihood function of the Poisson INTARCH(1) model is
The conditional log-likelihood function is
The first derivative of the log-likelihood with respect to
while the second derivative is
The iterative EM procedure estimates the parameter
where
The estimates
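The quantity that drives the estimation above is the conditional Poisson log-likelihood. A minimal sketch of how it can be evaluated follows. Since the INTARCH(1) intensity equation is not reproduced here, the example plugs in the linear INARCH(1) intensity as a stand-in; a threshold intensity could be passed through the same `intensity` argument. All function and parameter names are hypothetical, and the sketch only evaluates the likelihood, it does not implement the EM iterations.

```python
import math


def poisson_cond_loglik(x, intensity):
    """Sum over t >= 1 of x_t*log(lam_t) - lam_t - log(x_t!), conditioning on x_0."""
    total = 0.0
    for prev, cur in zip(x[:-1], x[1:]):
        lam = intensity(prev)  # lam_t depends on the previous count only
        total += cur * math.log(lam) - lam - math.lgamma(cur + 1)
    return total


def inarch1_intensity(alpha0, alpha1):
    """Linear intensity lam_t = alpha0 + alpha1 * x_{t-1} (no threshold effect)."""
    return lambda prev: alpha0 + alpha1 * prev


# Toy series and parameter values, for illustration only.
x = [1, 2, 0]
ll = poisson_cond_loglik(x, inarch1_intensity(1.0, 0.5))
print(ll)
```

Keeping the intensity as a separate function means the same likelihood code can serve both the symmetric and the threshold-asymmetric specifications.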
In this section, we illustrate the INTARCH, NB-INTARCH, ZIP-INTARCH, and ZINB-INTARCH models defined in Section 2 through a real data application. We consider the time series of weekly cholera cases from Kolkata, India, consisting of 260 observations starting from 39
A histogram of the series shows that there are 94 zeros, which is 36% of the series. The zero-inflation index (
where
The sample autocorrelation and partial autocorrelation function of the series in entire period, 1
The initial values
where [
The results of model fitting for entire period and two subset periods are summarized in Tables 1
where log
In each subset period (see Tables 2 and 3), the negative binomial and zero-inflated negative binomial models appear appropriate overall. For the ZINB1-INTARCH(1) model fitted to the 1st half period series, a substantial improvement is obtained when using the local constant threshold rather than the grand mean threshold. However, in Table 3 (2
In this paper we have discussed various threshold-asymmetric (ARCH-type conditionally heteroscedastic) volatility models for analyzing integer-valued count time series. Over-dispersion and zero-inflation are accommodated using negative binomial and zero-inflated distributions. The EM method is adopted to estimate the parameters. Two threshold variables, viz., the grand mean and the local constant mean, are considered in the various threshold models. The local constant mean usually works better than the grand mean, while the grand mean seems better when a high peak season is prominent within a short time period (see 2
We are grateful to the International Vaccine Institute (IVI), Seoul, Korea, and the National Institute of Cholera and Enteric Diseases (NICED), Kolkata, India, which jointly own the data, for sharing the data used in this paper. We thank the two anonymous referees for their careful reading of the paper. SY Hwang’s work was supported by a grant from the National Research Foundation of Korea (NRF-2018R1A2B2004157).
Parameter estimates: entire period
Models | Threshold value | Zero-inflation parameter | Intercept | ARCH coefficient 1 | ARCH coefficient 2 | NB parameter | AIC | BIC | -2 log L
---|---|---|---|---|---|---|---|---|---
INARCH(1) |  |  | 0.79603 | 0.69163 |  |  | 1129.8 | 1136.9 | 1125.8
INTARCH(1) | Grand mean |  | 0.79851 | 0.69258 | 0.68577 |  | 1131.8 | 1136.9 | 1125.8
INTARCH(1) | Local constant |  | 0.76621 | 0.66631 | 0.77497 |  | 1130.0 | 1135.2 | 1124.0
NB1-INARCH(1) |  |  | 0.73410 | 0.68364 |  | 0.99999 | 959.0 | 964.1 | 953.0
NB1-INTARCH(1) | Grand mean |  | 0.73374 | 0.68350 | 0.68450 | 0.99999 | 961.0 | 964.1 | 953.0
NB1-INTARCH(1) | Local constant |  | 0.70456 | 0.65637 | 0.77028 | 0.99999 | 959.2 | 962.3 | 951.2
NB2-INARCH(1) |  |  | 0.81757 | 0.66694 |  | 0.69629 | 957.5 | 962.6 | 951.5
NB2-INTARCH(1) | Grand mean |  | 0.80786 | 0.65285 | 0.69834 | 0.69632 | 959.4 | 962.5 | 951.4
NB2-INTARCH(1) | Local constant |  | 0.78486 | 0.58835 | 0.80545 | 0.68720 | 957.7 | 960.8 | 949.7
ZIP-INARCH(1) |  | 0.19694 | 1.15342 | 0.69644 |  |  | 1097.1 | 1102.2 | 1091.1
ZIP-INTARCH(1) | Grand mean | 0.19639 | 1.13052 | 0.69351 | 0.73603 |  | 1098.9 | 1102.0 | 1090.9
ZIP-INTARCH(1) | Local constant | 0.18967 | 1.07684 | 0.67218 | 0.81824 |  | 1096.7 | 1099.8 | 1088.7
ZNB1-INARCH(1) |  | 0.15709 | 0.99536 | 0.70046 |  | 0.99999 | 971.8 | 974.9 | 963.8
ZNB1-INTARCH(1) | Grand mean | 0.15634 | 0.96485 | 0.69485 | 0.75873 | 0.99999 | 973.2 | 974.3 | 963.2
ZNB1-INTARCH(1) | Local constant | 0.15112 | 0.92517 | 0.66899 | 0.83502 | 0.99999 | 970.2 | 971.3 | 960.2
ZNB2-INARCH(1) |  | 0.00001 | 0.81759 | 0.66694 |  | 0.69626 | 959.5 | 962.6 | 951.5
ZNB2-INTARCH(1) | Grand mean | 0.00001 | 0.80787 | 0.65285 | 0.69835 | 0.69630 | 961.4 | 962.5 | 951.4
ZNB2-INTARCH(1) | Local constant | 0.00001 | 0.78487 | 0.58835 | 0.80546 | 0.68717 | 959.7 | 960.8 | 949.7
AIC = Akaike information criterion; BIC = Bayesian information criterion.
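As a quick arithmetic check of the criteria, the sketch below recomputes AIC and BIC for the INARCH(1) row of the entire-period table from the usual definitions AIC = -2 log L + 2k and BIC = -2 log L + k log n. It assumes that the last numeric entry of the row (1125.8) is -2 log L, that k = 2 parameters are estimated, and that n = 260 observations are used; the function names are illustrative.

```python
import math


def aic(neg2loglik, k):
    """Akaike information criterion from -2 log L and the number of parameters k."""
    return neg2loglik + 2 * k


def bic(neg2loglik, k, n):
    """Bayesian information criterion from -2 log L, k parameters, and n observations."""
    return neg2loglik + k * math.log(n)


# INARCH(1), entire period: assumed -2 log L = 1125.8, k = 2, n = 260.
print(round(aic(1125.8, 2), 1))       # 1129.8
print(round(bic(1125.8, 2, 260), 1))  # 1136.9
```

Both values agree with the AIC and BIC entries reported for that row.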
Parameter estimates: 1
Models | Threshold value | Zero-inflation parameter | Intercept | ARCH coefficient 1 | ARCH coefficient 2 | NB parameter | AIC | BIC | -2 log L
---|---|---|---|---|---|---|---|---|---
INARCH(1) |  |  | 0.74651 | 0.59011 |  |  | 516.7 | 522.4 | 512.7
INTARCH(1) | Grand mean |  | 0.65236 | 0.56924 | 0.96502 |  | 514.6 | 518.3 | 508.6
INTARCH(1) | Local constant |  | 0.67205 | 0.52123 | 0.84128 |  | 513.2 | 516.9 | 507.2
NB1-INARCH(1) |  |  | 0.59222 | 0.59579 |  | 0.99999 | 431.2 | 434.9 | 425.2
NB1-INTARCH(1) | Grand mean |  | 0.57416 | 0.58583 | 0.69974 | 0.99999 | 432.9 | 434.6 | 424.9
NB1-INTARCH(1) | Local constant |  | 0.56399 | 0.54626 | 0.74637 | 0.99999 | 431.9 | 433.6 | 423.9
NB2-INARCH(1) |  |  | 0.61644 | 0.76296 |  | 0.99596 | 438.5 | 442.2 | 432.5
NB2-INTARCH(1) | Grand mean |  | 0.56781 | 0.66614 | 1.16379 | 0.96519 | 438.6 | 440.3 | 430.6
NB2-INTARCH(1) | Local constant |  | 0.57547 | 0.59934 | 1.07206 | 0.95667 | 438.1 | 439.8 | 430.1
ZIP-INARCH(1) |  | 0.39696 | 2.15690 | 0.33124 |  |  | 488.1 | 491.8 | 482.1
ZIP-INTARCH(1) | Grand mean | 0.36911 | 1.68879 | 0.40966 | 0.90624 |  | 488.4 | 490.1 | 480.4
ZIP-INTARCH(1) | Local constant | 0.36896 | 1.73160 | 0.35980 | 0.69876 |  | 486.9 | 488.6 | 478.9
ZNB1-INARCH(1) |  | 0.02876 | 0.61859 | 0.61160 |  | 0.99999 | 433.1 | 434.8 | 425.1
ZNB1-INTARCH(1) | Grand mean | 0.03345 | 0.60215 | 0.60188 | 0.73599 | 0.99999 | 434.8 | 434.5 | 424.8
ZNB1-INTARCH(1) | Local constant | 0.05860 | 0.60291 | 0.55509 | 0.87566 | 0.99999 | 432.9 | 432.6 | 422.9
ZNB2-INARCH(1) |  | 0.00001 | 0.61646 | 0.76294 |  | 0.99593 | 440.5 | 442.2 | 432.5
ZNB2-INTARCH(1) | Grand mean | 0.00002 | 0.56783 | 0.66614 | 1.16382 | 0.96514 | 440.6 | 440.3 | 430.6
ZNB2-INTARCH(1) | Local constant | 0.00003 | 0.57550 | 0.59934 | 1.07207 | 0.95660 | 440.1 | 439.8 | 430.1
AIC = Akaike information criterion; BIC = Bayesian information criterion.
Parameter estimates: 2
Models | Threshold value | Zero-inflation parameter | Intercept | ARCH coefficient 1 | ARCH coefficient 2 | NB parameter | AIC | BIC | -2 log L
---|---|---|---|---|---|---|---|---|---
INARCH(1) |  |  | 0.79802 | 0.75262 |  |  | 585.8 | 591.5 | 581.8
INTARCH(1) | Grand mean |  | 1.06586 | 0.80077 | 0.26593 |  | 571.6 | 575.3 | 565.6
INTARCH(1) | Local constant |  | 0.79866 | 0.75308 | 0.75105 |  | 587.8 | 591.5 | 581.8
NB1-INARCH(1) |  |  | 0.76593 | 0.74109 |  | 0.99999 | 513.6 | 517.3 | 507.6
NB1-INTARCH(1) | Grand mean |  | 1.02377 | 0.78768 | 0.28480 | 0.99999 | 511.4 | 513.1 | 503.4
NB1-INTARCH(1) | Local constant |  | 0.76067 | 0.73735 | 0.75371 | 0.99999 | 515.5 | 517.2 | 507.5
NB2-INARCH(1) |  |  | 0.87897 | 0.66871 |  | 0.46770 | 500.5 | 504.3 | 494.5
NB2-INTARCH(1) | Grand mean |  | 1.06572 | 0.79207 | 0.27063 | 0.40640 | 495.7 | 497.4 | 487.7
NB2-INTARCH(1) | Local constant |  | 0.86367 | 0.64001 | 0.71802 | 0.46731 | 502.4 | 504.1 | 494.4
ZIP-INARCH(1) |  | 0.11934 | 0.97824 | 0.77646 |  |  | 575.9 | 579.6 | 569.9
ZIP-INTARCH(1) | Grand mean | 0.10699 | 1.27958 | 0.80541 | 0.24493 |  | 565.5 | 567.2 | 557.5
ZIP-INTARCH(1) | Local constant | 0.12072 | 0.97821 | 0.77481 | 0.78243 |  | 577.9 | 579.6 | 569.9
ZNB1-INARCH(1) |  | 0.10186 | 0.91877 | 0.76517 |  | 0.99999 | 519.5 | 521.2 | 511.5
ZNB1-INTARCH(1) | Grand mean | 0.08183 | 1.18010 | 0.79731 | 0.27344 | 0.99999 | 516.6 | 516.3 | 506.6
ZNB1-INTARCH(1) | Local constant | 0.10163 | 0.90772 | 0.75933 | 0.78765 | 0.99999 | 521.3 | 521.0 | 511.3
ZNB2-INARCH(1) |  | 0.00001 | 0.87897 | 0.66873 |  | 0.46768 | 502.5 | 504.3 | 494.5
ZNB2-INTARCH(1) | Grand mean | 0.00002 | 1.06575 | 0.79208 | 0.27063 | 0.40637 | 497.7 | 497.4 | 487.7
ZNB2-INTARCH(1) | Local constant | 0.00001 | 0.86366 | 0.64004 | 0.71803 | 0.46729 | 504.4 | 504.1 | 494.4
AIC = Akaike information criterion; BIC = Bayesian information criterion.