search for

CrossRef (0)
Computational explosion in the frequency estimation of sinusoidal data
Communications for Statistical Applications and Methods 2018;25:431-442
Published online July 31, 2018
© 2018 Korean Statistical Society.

Kaimeng Zhanga, Chi Tim Ng1,a, and Myunghwan Naa

aDepartment of Statistics, Chonnam National University, Korea
Correspondence to: 1Department of Statistics, Chonnam National University, 77, Yongbong-ro, Buk-gu, Gwangju 61186, Korea. E-mail: easterlyng@gmail.com
Received April 20, 2018; Revised June 1, 2018; Accepted June 8, 2018.

This paper highlights the computational explosion issues in the autoregressive moving average approach of frequency estimation of sinusoidal data with a large sample size. A new algorithm is proposed to circumvent the computational explosion difficulty in the conditional least-square estimation method. Notice that sinusoidal pattern can be generated by a non-invertible non-stationary autoregressive moving average (ARMA) model. The computational explosion is shown to be closely related to the non-invertibility of the equivalent ARMA model. Simulation studies illustrate the computational explosion phenomenon and show that the proposed algorithm can efficiently overcome computational explosion difficulty. Real data example of sunspot number is provided to illustrate the application of the proposed algorithm to the time series data exhibiting sinusoidal pattern.

Keywords : ARMA, sinusoidal data, computational explosion
1. Introduction

Time series exhibiting sinusoidal pattern is ubiquitous in digital signal processing, econometrics, agriculture, and biomedicine. For example, Stroebel et al. (2010) summarize statistical methods for detecting biological rhythms. Human breathing data collected during radiation therapy are studied in Choi et al. (2014). White et al. (2011) suggests that the location of the tumor that moves periodically due to the breathing motion during the therapy can be predicted based on autoregressive moving average (ARMA) model. Classical treatment of ARMA model based time series methods is Brockwell and Davis (1991). Unlike the applications in econometrics and agriculture where the period can be naturally considered to be one-year, the frequency and periodicity of certain biological data (e.g., breathing cycle) depend on the physical conditions of the subjects. In such situations, frequency has to be estimated. In addition, the seasonal autoregressive integrated moving average (SARIMA) model requires known periodicity fail to handle the unknown-frequency cases. Periodogram is another important frequency estimation method for the time series data with unknown frequency. However, interpretation of estimation results from periodogram can be difficult when the time series deviates from the sinusoidal pattern, for example, a linear combination of a sinusoidal function and a linear function. In such a situation, non-zero Fourier frequencies are no longer sparse.

The feasibility of analyzing sinusoidal data via ARMA model is studied in Platonod et al. (1992). It is illustrated that sine/cosine model perturbed with error terms can be rewritten as an ARMA model with unit complex characteristic roots. After fitting the ARMA model, the frequency can be inferred indirectly through these characteristic roots. In addition, the combined linear trend and sinusoidal pattern can be described via the multiplicity of complex characteristic roots. In the simulation study, Platonod et al. (1992) only consider examples with small sample size n = 100. As noted in p.292 of Ozaki (2012), most existing numerical nonlinear optimization procedure of ARMA model estimation face the risk of “computational explosion” due to the violation of invertibility condition of MA parameters during the iteration. Such a computational explosion difficulty becomes more serious in the sinusoidal data because the unit complex roots provide non-invertible MA coefficients. This hinders the applications of ARMA model to biomedicine data exhibiting periodic patterns.

Both stationarity condition and invertibility condition are violated in the sinusoidal data and make the situation more difficult. The maximum likelihood estimate cannot be obtained unless the initial values of the innovation terms are supplied by the user since the stationary distribution is not well-defined. The initial values are unknown, but is necessary as the starting point of the recursive procedure as described in Brockwell and Davis (1991). Without the invertibility condition, it is not guaranteed that the influence of such initial values on the maximum likelihood estimate is negligible.

In this paper, a new algorithm is proposed to estimate the ARMA model violating both stationarity and invertibility conditions. The important idea is that the univariate ARMA model always has multivariate vector state-space representation, see Chapter 12 of Brockwell and Davis (1991). By storing the normalized vector and the logarithm of the norm of the vector only, we are able to circumvent the “computational explosion” difficulties.

The paper is organized as follows. In Section 2, the relationship between sinusoidal data and non-invertible ARMA model is discussed. In Section 3, a procedure allowing estimation under noninvertibility is presented. The computational explosion difficulty is further dealt with in Section 4. Simulation studies are given in Section 5, followed by concluding remarks in Section 6.

2. Preliminaries: ARMA versus sinusoidal pattern

Suppose that the model for the data is generated by

yt=k=1KAksin 2πωkt+k=1KBkcos 2πωkt+t,

where εt, t = 1, 2, . . . , T are independent and identically-distributed N(0, 1) random variables. Direct estimation based on the above model is extremely difficult because the local minimum of

t=1T(yt-k=1KAksin 2πωkt-k=1KBkcos 2πωkt)2

is not unique due to the periodicity of sine/cosine function.

Can the ARMA model be used to handle the sinusoidal pattern? The answer is positive, see Platonod et al. (1992). The crucial point is that sine/cosine function is the solution to a recursive relationship. To illustrate some basic ideas, consider a special case where the coefficients are constants of time,

yt=Artsin 2πωt+Brtcos 2πωt+t.

Here, A, B, r, ω are unknown parameters. Note that ytεt = Art sin 2πωt + Brt cos 2πωt fulfills


when the characteristic equation 1 − 1z2z2 = 0 has complex roots r-1cos 2πω±r-1sin 2πω-1. See Brockwell and Davis (1991) for the application of recursive relationships to time series analysis. Transposing terms, Equation (2.1) can be rewritten as


where θ1 = 1 and θ2 = 2. This is a special case of ARMA model with MA coefficients equaling to the AR coefficients. Then, r and ω can be obtained from the roots of the characteristic equation 1 − 1z2z2 = 0. Given r and ω, the coefficients A and B can further be obtained using the usual regression method and the amplitude is estimated as (A2 + B2)1/2.

If the true model is

yt=k=1KAkrktsin 2πωkt+k=1KBkrktcos 2πωkt+t.

One can consider the following model fitting procedure

Procedure 1

(Finding frequency via ARMA model)

  • Fit the ARMA(2K, 2K) model.

  • Find all roots z1, . . . , zK (excluding the conjugates z1, . . . , zK) of the AR characteristic polynomial.

  • Estimate ωK = Arg(1/zK)/2π and rk = |1/zk|.

  • Estimate Ak and Bk, k = 1, 2, . . . , K by regressing yt against r^ktsin 2πω^ktand r^ktcos 2πω^kt, k = 1, 2, . . . , K.

3. Obtaining t, = 1,2, . . . under non-invertible cases

Consider the model


where t, t = 1, 2, . . . , n are independent N(0, σ2) random variables.

Consider the special cases with p = q = 2K and

yt-1yt-1--pyt-p=k=1K(1-2rkcos 2πωkL+rk2L2)yt=k=1K(1-2rkcos 2πωkL+rk2L2)t=t-θ1t-1--θqt-p,

where L is the back-shrift operator. Note that yt exhibits periodic pattern. In such a situation, the MA polynomial has complex roots and the model is non-invertible when rk ≥ 1.

Let − (q −1) = − (q −2) = · · · = 0 = 0 be the initial values. If the invertibility condition holds, the innovation terms for t = 1, 2, . . . can be obtained recursively as

^t( , θ)=yt-1yt-1--pyt-p+θ1^t-1( , θ)++θq^t-q( , θ).

However, if the invertibility condition is violated, the influence of the misspecified initial values − (q −1) = − (q −2) = · · · = 0 = 0 on t(, θ) is not guaranteed to diminish as t →∞. Our solution to the above-mentioned non-invertibility problem is based on the idea that the influences of the unknown true initial values (ε− (q −1), ε− (q −2), . . . , ε0) can be eliminated by regressing ηt(, θ) against q covariates ηt(, θ) = (ηt1(, θ), ηt2(, θ), . . . , ηtq(, θ)) defined in Procedure 2. For i = 1, 2, . . . , q, the ith component ηti(, θ) measures the sensitivity of t(, θ) against unit change in the ith misspecified initial value iq. Denote the vector of regression coefficients by α. The details are shown in the following procedure and the rigorous justifications are given after the presentation of the procedure.

Procedure 2

(New definition of conditional sum of squares)

  • Initialize 溝− (q −1) = − (q −2) = · · · = 0 = 0 and [η− (q −1), η(q −2), . . . , η0] = −Iq.

  • Compute 溝 and η via the recursive relationships,^t( , θ)=yt-1yt-1--pyt-p+θ1^t-1( , θ)++θq^t-q( , θ), η^t( , θ)=θ1 η^t-1( , θ)++θq η^t-q( , θ).

  • Minimize the sum of squaresQ( , θ, α)=t=1T(^t( , θ)- αT η^t( , θ))2.

    Given (, θ), the optimalα is α=(t=1T η^t( , θ) η^tT( , θ))-1(t=1T^t( , θ) η^t( , θ)).

Note that the usual conditional sum of squares method minimizes t=1T^t( , θ)instead.

To illustrate the rationale underlying the procedure, notice that the model can be rewritten as








One can consider this as a regression model with a left-hand-side response, the coefficients are the initial values (ε− (q −1), ε− (q −2), . . . , ε0), and the design matrix is ΘL-1ΘU. It is possible to eliminate the influence of the initial guesses via the above regression model. The computation of the quantities ΘL-1Z and ΘL-1ΘU can be simplified by noting that

[ΘU,ΘL](0ΘL-1Z)=Z 듼 듼 듼and 듼 듼 듼[Θ,ΘL](-IqΘL-1ΘU)=0.

Set − (q −1) = − (q −2) = · · · = 0 = 0 and [η− (q −1), η− (q −2), . . . , η0] =−Iq. Denote the elements of the response vector ΘL-1Z by t(, θ) and the rows of the design matrix by ηt(, θ). Equations (3.3) is equivalent to the recursive formulas in Step 2 of the procedure since all rows in [ΘU,ΘL] are the same excepting zeros entries. The conditional sum of squares defined in Step 3 can be minimized by using any suitable numerical optimization routines. In the simulation studies presented in Section 5, this is performed using the “unconstrained optimization by quadratic approximation (UObyQA)” algorithm in Powell (2002). Note that in invertible cases, ηt(, θ) has the exponentially decaying pattern. However, this is not true in non-invertible cases. Neglecting such terms gives the usual maximum conditional likelihood estimation method. They are not negligible under non-invertible cases.

4. Computational explosion issues

To avoid the difficulties related to the exploding t(, θ) and ηt(, θ), consider an algorithm that stores only the logarithms of the conditional sum of squares Q(, θ, α). Denote

Et=(1,^t( , θ),,^t-q+1( , θ), η^t( , θ),, η^t-q+1( , θ))

and et = Et/||Et||2 be the normalized vector. Let




Procedure 3

(Algorithm storing the logarithm of conditional sum of squares)

  • Set g0= 0 and S0 = 0.

  • Define recursively for t = 1, 2, . . . , Tct=Btet-12,et=Btet-1ct,gt=gt-1+logct2,St=etet+St-1ct2.

  • Compute log Q(, θ, α) = gT + log bTSTb. Here, bT=(0,1,0,,0,- αT,0qT,,0qT)

    andα can be obtained as S[3:(p+q+2),3:(p+q+2)]-1S[3:(p+q+2),2].

The equivalence between Procedures 2 and 3 can be verified easily by mathematical induction arguments.

5. Simulation study

The simulation study illustrates: (i) the computational explosion phenomenon, (ii) the necessity of introducing ηt in Procedure 2, and (iii) the feasibility of frequency estimation. The proposed methods are implemented in CRAN R language. The optimization is carried out using the R package “powell” that implements the “UObyQA” algorithm in Powell (2002).

5.1. Computational explosion phenomenon

Consider the frequency representations of the ARMA model like Equation (2.2)

yt=k=1KAkrktsin 2πωkt+k=1KBkrktcos 2πωkt+0.5t,

, with the following three settings,

  • Setting 1: p = q = 2K = 4, r1 = r2 = 1, (A1, A2) = (2.5, −0.8), (B1, B2) = (1.3, −0.5), and 2πω = (0.25, 0.5).

  • Setting 2: p = q = 2K = 6, r1 = r2 = r3 = 1, (A1, A2, A3) = (2.5, −0.8, 1.2), (B1, B2, B3) = (1.3, −0.5, 0.3), and 2πω = (0.25, 0.5, 0.75).

The time series are then standardized so that the standard deviations always equal to one. The coefficients * = θ* are obtained from the expansion

1-1L--pLt-p=k=1K(1-2rkcos 2πωkL+rk2L2).

To illustrate the computational explosion phenomenon, the critical sample sizes for the stack overflow are investigated. The programs are implemented in CRAN R language. Procedure 2 requires the storage of the values of sums of squares Q. There are stack-overflow-related errors if Q is greater than 10308 since double precision is used in R to store numerical values. The experiment is conducted as follows. Perturb each component of the coefficients * and θ* by N(0, 0.032) random variables. For each T = 10, 20, 30, . . . , 2500, the conditional sum of squares Q is evaluated at such perturbed coefficients using Procedures 2 and 3. The experiment is repeated for 100 times. Different perturbed coefficients are used in different trials. For each T, if there is no stack-overflow-related error in all 100 trials, the average log(Q) value is reported. For Procedure 2, the results of Settings 1 and 2 are presented in Figure 1, panels (a) and (b) respectively. Under Setting 1, computation of average log(Q) results in stack overflow problem after T ≈ 750 and under Setting 2, the critical sample size of the explosion is ≈ 600. Before the stack overflow occurs, average log(Q) is close to 400, or equivalently, the value of Q exceeds 10308, the limit of double precision. This means that Procedure 2 alone can result in computational explosion problem. Therefore, Procedure 3 is necessary. For Procedure 3, the results of average log(Q) under Settings 1 and 2 are reported in Figure 1, panels (c) and (d). Procedure 3 results in no computational explosion problem.

5.2. Necessity of introducing ηt in Procedures 2 and 3

To see the necessity of introducing ηt, Procedures 2 and 3 based on Q( , θ, α)=t=1T(^t( , θ)-α η^t( , θ))2 is compared to the usual conditional sum of squares t=1T^t( , θ). In the simulation, Procedure 3 is implemented in CRAN R language and conditional sum of squares based estimation is performed using the arima function provided by R. Consider the following three settings

  • Setting 1: p = q = 2K = 4, r1 = r2 = 1, (A1, A2) = (2.5, −0.8), (B1, B2) = (1.3, −0.5), and 2πω = (0.25, 0.5).

  • Setting 2: p = q = 2K = 6, r1 = r2 = r3 = 1, (A1, A2, A3) = (2.5, −0.8, 1.2), (B1, B2, B3) = (1.3, −0.5, 0.3), and 2πω = (0.25, 0.5, 0.75).

  • Setting 3: p = q = 2K = 8, r1 = r2 = r3 = r4 = 1, (A1, A2, A3, A4) = (2.5, −0.8, 1.2, 0.7), (B1, B2, B3, B4) = (1.3, −0.5, 0.3, −0.5), and 2πω = (0.25, 0.5, 0.75, 0.9).

For each setting, the experiment is repeated for N = 100 times with different sample sizes T = 100, 300, 600, 1500. The performance is evaluated via the 2% trimmed relative mean square error (RMSE) of the coefficients , θ,

RMSE =(10.8Nn=10.8N (n)- * *2)12 듼 듼 듼and 듼 듼 듼RMSE θ=(10.8Nn=10.8N θ(n)- θ* θ*2)12.

Here, (n) and θ(n) refers to the estimates obtained in the trial with nth smallest mean relative errors and * and θ* refer to the true parameters. The coefficients * = θ* are obtained from the expansion

1-1L--pLt-p=k=1K(1-2rkcos 2πωkL+rk2L2).

Table 1 shows the results of the RMSE. It can be seen that ARMA models fitted by R function arima tend to have greater RMSE than those fitted by Procedure 3, especially for large p cases with p = 6, 8.

5.3. Feasibility of frequency estimation

The same settings as in Section 5.2 are considered. To study the performance of the frequency estimation under ARMAeta model, 2πω is estimated via Procedure 1. Step (1) of Procedure 1 is carried out by minimizing the sum of squares in Procedure 3 using UObyQA optimization algorithm. The computational explosion difficulty is avoided since the sum of squares is obtained from Procedure 3 and the performance is evaluated via the 2% trimmed RMSE of 2πω,


where the ωk* is the true value and ωk(n) is the estimated value with nth smallest mean relative errors.

Table 2 shows the results of RMSE of ω1, . . . , ωk. It can be see that the RMSE increases as k, T, and p increase.

6. Real data analysis

Consider the hysteresis of the solar cycles, that can be measured by the number of sunspots, see Dmitriev et al. (2002) and Suyal et al. (2012). We use the monthly Wolf’s sunspot number data from 1749 to March 2018, with 3230 observations in total, see Figure 2. The data can be obtained from the website of Royal Observatory of Belgium; similar data is also used in the literature of time series analysis, for example, Li et al. (2012) and Tong (1990).

The ARMAeta is applied for the sunspot data and the optimal model is selected by minimizing the Bayesian information criterion (BIC). For ARMAeta(p, p) model, including α, there are a total of 3 × p parameters. The BICs can be defined as


where ln(L) is the fitted sum of squares of the ARMAeta model.

Table 3 shows the BIC values when p = 2, 4, 6, 8 and the coefficients (, θ,ω) of each model are shown in Table 4. From Table 3, the ARMAeta(2, 2) model is chosen because it has the smallest BIC. Table 4 indicates the estimated frequency is 2πω = 0.0511 or ω ≈ 123. This suggests that there are 500/123 ≈ 4 cycles per 500 units of time and agrees with the pattern shown in Figure 2.

7. Conclusions

The estimation problem of ARMA model is discussed under settings where the invertibility is not fulfilled. New algorithms are proposed to deal with non-invertibility difficulty and computational explosion difficulty. The methods developed are useful for the sinusoidal time series data where the frequency is unknown. This facilitates the application of the ARMA method to biomedicine and meteorology such as breathing motion data obtained during radiation therapy.


Chi Tim, Ng’s work is supported by National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2017R1C1B2011652).

Fig. 1. The log sum of squares values against sample size T. (a) p = 4 case and (b) p = 6 case with sum of squares computed using Procedure 2. (c) p = 4 case and (d) p = 6 case with sum of squares computed using Procedure 3.
Fig. 2. The monthly sunspot number data.

Table 1

The RMSE of coefficients (, θ) of ARMA and ARMAeta models


4 100 1.2098 0.9529 0.0714 0.0572
300 1.2292 0.9467 0.0413 0.0350
600 1.2227 0.9464 0.0418 0.0286
1500 1.1796 0.9663 0.0390 0.0495

6 100 1.0972 0.9697 0.0297 0.0038
300 1.1268 0.9651 0.0183 0.0027
600 1.1357 0.9638 0.0342 0.0029
1500 1.1204 0.9624 0.0609 0.0024

8 100 1.0189 1.0308 0.0035 0.0019
300 1.0055 1.0227 0.0055 0.0014
600 1.0614 0.9825 0.0090 0.0015
1500 1.1555 0.9217 0.0095 0.0014

ARMA is obtained by arima function in R and ARMAeta is obtained by Procedure 3. RMSE = relative mean square error.

Table 2

The RMSE of coefficients 2πω

p T ARMAeta

4 100 0.0079 0.0149
300 0.0096 0.0199
600 0.0639 0.0416
1500 0.1019 0.0613

6 100 0.0723 0.0688 0.0498
300 0.1605 0.1197 0.1184
600 0.1687 0.1152 0.0850
1500 0.1836 0.1409 0.1101

8 100 0.1724 0.0836 0.8350 0.0656
300 0.2106 0.1068 0.1007 0.0737
600 0.2078 0.1068 0.0989 0.0695
1500 0.2104 0.1119 0.0969 0.0617

ARMAeta is fitted by Procedure 3 and 2πω is obtained by Procedure 1. RMSE = relative mean square error.

Table 3

The BIC values of ARMAeta models for the sunspot data

Models ARMAeta(2, 2) ARMAeta(4, 4) ARMAeta(6, 6) ARMAeta(8, 8)
BIC 55112.43 55498.78 69445.98 176840.6

BIC = Bayesian information criterion.

Table 4

The coefficients (, θ, ω) of ARMAeta models for the sunspot data


1 2 3 4 5 6 7 8
ARMAeta(2, 2) p −1.9967 0.9993
θp −1.9986 1.0019
ωk 0.0511

ARMAeta(4, 4) p −3.8922 5.7861 −3.8929 1.0004
θp −3.8927 5.7862 −3.8929 1.0001
ωk 0.0565 0.3249

ARMAeta(6, 6) p −5.6313 13.5533 −17.8440 13.6633 −5.6313 0.9990
θp −5.6313 13.5533 −17.8439 13.5533 −5.6312 0.9990
ωk 2.051 × 10−16 0.3334 0.5162

ARMAeta(8, 8) p −7.0733 22.6740 −43.0201 52.8390 −43.0201 22.6739 −7.0733 1
θp −7.0733 22.6740 −43.0201 52.8390 −43.0201 22.6739 −7.0733 1
ωk 0.0498 0.3247 0.5174 0.7655

  1. Brockwell, PJ, and Davis, RA (1991). Time Series: Theory and Method. New York: Springer
  2. Choi, S, Chang, Y, Kim, N, Park, S, Song, S, and Kang, H (2014). Performance enhancement of respiratory tumor motion prediction using adaptive support vector regression: Comparison with adaptive neural network method. International Journal of Imaging Systems and Technology. 24, 8-15.
  3. Dmitriev, AV, Suvorova, AV, and Veselovsky, IS (2002). Expected hysteresis of the 23-rd solar cycle in the heliosphere. Advances in Space Research. 29, 475-479.
  4. Li, GD, Guan, B, Li, WK, and Yu, PLH (2012). Hysteretic autoregressive time series models. Biometrika. 99, 1-8.
  5. Ozaki, T (2012). Time Series Modeling of Neuroscience Data.
  6. Platonod, AA, Gajo, ZK, and Szabatin, J (1992). General method for sinusoidal frequencies estimation using ARMA algorithms with nonlinear prediction error transformation. IEEE. 5, 441-444.
  7. Powell, MJD (2002). UOBYQA: unconstrained optimization by quadratic approximation. Mathematical Programming, Series B. 92, 555-582.
  8. Stroebel, AM, Bergner, M, Reulbach, U, Biermann, T, Groemer, TW, Klein, I, and Kornhuber, J (2010). Statistical methods for detecting and comparing periodic data and their application to the nycthemeral rhythm of bodily harm: a population based study. Journal of Circadian Rhythms. 8, 1-10.
    Pubmed KoreaMed CrossRef
  9. Suyal, V, Prasad, A, and Singh, HP (2012). Hysteresis in a solar activity cycle. Solar Physics. 276, 407-414.
  10. Tong, G (1990). A Dynamical System Approach.
  11. White, BM, Low, DA, and Zhao, T (2011). Investigation of a breathing surrogate prediction algorithm for prospective pulmonary gating. Medical Physics. 38, 1587-1595.
    Pubmed KoreaMed CrossRef