DR-LSTM: Dimension reduction based deep learning approach to predict stock price
Communications for Statistical Applications and Methods 2024;31:213-234
Published online March 31, 2024
© 2024 Korean Statistical Society.

Ah-ram Leea, Jae Youn Ahna, Ji Eun Choib, Kyongwon Kim1,a

aDepartment of Statistics, Ewha Womans University, Korea;
bDepartment of Statistics and Data Science, Pukyong National University, Korea
Correspondence to: 1 Department of Statistics, Ewha Womans University, 52 Ewhayeodae-gil, Seodaemun-gu, Seoul 03760, Korea. E-mail: kimk@ewha.ac.kr
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No.2021R1A6A1A10039823, RS-2023-00219212).
Received January 18, 2024; Revised January 30, 2024; Accepted January 30, 2024.
 Abstract
In recent decades, increasing research attention has been directed toward predicting stock prices in financial markets using deep learning methods. For instance, the recurrent neural network (RNN) is known to be competitive for time-series data. Long short term memory (LSTM) further improves on RNN by providing an alternative approach to the vanishing gradient problem, and it gains predictive accuracy by retaining memory over longer horizons. In this paper, we combine both supervised and unsupervised dimension reduction methods with LSTM to enhance forecasting performance and refer to this as a dimension reduction based LSTM (DR-LSTM) approach. For supervised dimension reduction, we use sliced inverse regression (SIR), sparse SIR, and kernel SIR. Principal component analysis (PCA), sparse PCA, and kernel PCA are used as unsupervised dimension reduction methods. Using datasets of real stock market indices (S&P 500, STOXX Europe 600, and KOSPI), we present a comparative study of predictive accuracy between the six DR-LSTM methods and time series modeling.
Keywords : dimension reduction, sufficient dimension reduction, long short term memory, recurrent neural network, time series data analysis
1. Introduction

The stock price is one of the most important indexes in the economic and financial sectors, as it reflects the fundamental status and prospects of a business or even of an economic policy. Thus, individual investors, financial institutions, and governments have a vested interest in predicting stock market prices to determine how best to allocate a variety of resources. However, given the high uncertainty and volatility of stock values, forecasting and analyzing them is highly challenging. Numerous studies have been performed to predict the stock market, ranging from conventional statistical approaches such as time series modeling to machine learning and deep learning methods. For example, linear time series methodologies such as the moving average model (MA), autoregressive model (AR), autoregressive moving average model (ARMA), autoregressive integrated moving average model (ARIMA), and autoregressive distributed lag model (ADL) have been widely implemented to predict stock prices. For instance, the ARIMA model was used by Idrees et al. (2019) to forecast daily movements in the Indian stock market. The ADL model was applied to forecast annual gold prices by Madziwa et al. (2022) and daily cryptocurrency returns by Malladi and Dheeriya (2021).

However, because linear time series models rely on a linearity assumption, they face limitations when applied to nonlinear data. In such cases, alternatives such as the threshold autoregressive model (TAR) and the Markov switching autoregressive model (MS-AR) are available. The TAR and MS-AR models accommodate nonlinear structure by changing the AR coefficients over time, either according to the relation between the previous value and a threshold (TAR) or according to the hidden state of the previous time point (MS-AR). For instance, Matias and Reboredo (2012) considered the MS-AR model for forecasting the S&P 500 index. These statistical models impose specific nonlinear structures, for example asymmetry as in the TAR and MS-AR models. Recently, other nonlinear approaches imposing no specific structure have been widely considered using machine learning and genetic network algorithms. For example, Tsai et al. (2011) used an ensemble classifier to forecast stock returns. A further approach for handling the nonlinearity of stock market data is deep learning. Widely used methodologies within this field include the multi-layer perceptron (MLP), convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory (LSTM), and gated recurrent unit (GRU). In Ramezanian et al. (2019), a comparative analysis between genetic network programming and MLP for stock market returns was performed. Wang et al. (2018) further proposed a deep CNN to predict market movement. In this context, more recent studies include Gao et al. (2021), Xiao et al. (2022), and Wang et al. (2022).

RNN, LSTM, and GRU are known to achieve competitive predictive performance on sequential data such as time series. Specifically, LSTM works well on this type of data because its structure resolves the vanishing gradient problem of RNN and retains memory over longer periods of time. Selvin et al. (2017) investigated stock prices using LSTM, RNN, and a CNN-sliding window model. Chen et al. (2015) further used LSTM to model and predict stock returns in the Chinese stock market. The predictive accuracy for stock market data can also be improved by combining a dimension reduction method with MLP or LSTM. In such approaches, principal component analysis (PCA) is first used to extract the main components as linear combinations of technical indicators, which has been shown to reduce training time and improve predictive performance. In addition, PCA-LSTM was presented by Wen et al. (2020) to predict trends in the fluctuation of stock prices. Moreover, better accuracy in forecasting stock prices was achieved with MLP by solving multicollinearity problems among independent variables using PCA (Sujatha and Sundaram, 2010). A number of previous studies have also explored the use of feature extraction from neural networks in statistical inference. For instance, Tran et al. (2020) utilized features extracted from deep neural networks in generalized linear models and generalized linear mixed models. Additionally, Fong and Xu (2021) proposed a nonlinear dimension reduction method using deep autoencoders, aimed at extracting features from high-dimensional data.

In this paper, we combine dimension reduction methods with LSTM to achieve better accuracy in predicting stock market indexes such as the S&P 500 (USA), STOXX Europe 600 (EU), and KOSPI (South Korea). In particular, we use both supervised and unsupervised dimension reduction methods to extract the principal information from the explanatory variables. For dimension reduction, we use linear methods, namely SIR and PCA, nonlinear methods, namely kernel SIR (KSIR) and kernel PCA (KPCA), and parsimonious methods, namely sparse SIR (SSIR) and sparse PCA (SPCA). We demonstrate that our hybrid methods outperform LSTM with the original input variables and that they are computationally efficient.

The rest of the paper is organized as follows. Section 2 introduces six unsupervised and supervised dimension reduction methods and LSTM. The results of the application of DR-LSTM models to the real dataset are presented in Section 3. This paper ends with a discussion in Section 4.

2. Methods

We propose a two-step approach to predict the stock price by incorporating dimension reduction methods into LSTM. In particular, a variety of dimension reduction methods are discussed, including both supervised and unsupervised techniques. This yields six combinations of forecasting procedures that pair a dimension reduction method with LSTM; we refer to these collectively as the dimension reduction based LSTM (DR-LSTM) framework. We anticipate that DR-LSTM can significantly improve the accuracy of predicting stock price data compared with conventional time series modeling by avoiding the “curse of dimensionality”. Figure 1 summarizes the data analysis process of the DR-LSTM framework.

2.1. Prediction method

2.1.1. Recurrent neural network

With the development of computational power and the availability of large training datasets, neural network-based methods have become a particularly active research area in the last decade. In particular, RNN was introduced for situations in which the input variable has a sequential structure. For example, RNN is highly regarded for its performance in document classification, handwriting recognition, speech recognition, and the analysis of time series datasets. Let X = {X_1, . . ., X_T} be an input sequence with X_t = (X_{t1}, . . ., X_{tp}), H = (H_1, . . ., H_T) a sequence of hidden layers with H_t = (H_{t1}, . . ., H_{tq}), and O = {o_1, . . ., o_T} an output sequence.

As shown in Figure 2, RNN fully utilizes the sequential structure of the input. The hidden layer from the previous step, H_{t-1}, and the input variable of the current step, X_t, determine H_t through the weight matrices W_X, W_H, and W_O. This can be presented as

H_{tj} = \sigma\Big( \sum_{k=1}^{p} W_{X,jk} X_{tk} + \sum_{l=1}^{q} W_{H,jl} H_{t-1,l} + b_1 \Big), \qquad o_t = W_O H_t + b_2,

where σ is an activation function, W_O ∈ ℝ^q, and b_1, b_2 are biases. Note that the final product is o_T, while o_1, . . ., o_{T-1} are not used in the prediction step.
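To make the recurrence concrete, the following minimal NumPy sketch computes the hidden states and the final output o_T for one sequence. The array shapes, variable names, and the choice of tanh as activation are illustrative assumptions, not the authors' code.

import numpy as np

def rnn_forward(X, W_X, W_H, W_O, b1, b2, sigma=np.tanh):
    """Plain RNN forward pass for one sequence X of shape (T, p).

    W_X: (q, p) input-to-hidden weights, W_H: (q, q) hidden-to-hidden weights,
    W_O: (q,) hidden-to-output weights, b1: (q,), b2: scalar.
    Only the last output o_T is used for prediction, as described in the text.
    """
    T, p = X.shape
    q = W_X.shape[0]
    H = np.zeros(q)                               # H_0 initialized to zeros
    for t in range(T):
        H = sigma(W_X @ X[t] + W_H @ H + b1)      # hidden state recurrence
    return W_O @ H + b2                           # o_T = W_O H_T + b_2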

Backpropagation through time was proposed for the learning process of RNN. However, for a long time series, the influence of X_1 diminishes as the RNN is trained sequentially. In other words, X_1 has a negligible impact on the long-term learning process, implying that the gradients of the weight matrices vanish over long sequences. This long-term dependency issue is one of the major limitations of RNN and is referred to as the “vanishing gradient problem.”

2.1.2. Long short term memory

LSTM (Hochreiter and Schmidhuber, 1997) was introduced to circumvent the vanishing gradient problem. The primary objective of LSTM is to preserve short-term memory over long time intervals, so that it can provide competitive prediction performance for relatively long time series. In particular, because the stock market dataset, which encapsulates a large amount of data, depends on long-term history, LSTM is a suitable approach for preventing the learning process from degrading over long horizons.

As shown in Figure 3, LSTM consists of forget, cell, and output states. The forget state determines the information that will be discarded based on the hidden layer of the previous time step and an input value of the current time step. This can be presented as

f_t = \sigma\big( W_{Xf} X_t + W_{hf} h_{t-1} + b_f \big),

where σ is an activation function (for the gates, typically the logistic sigmoid, so that the output lies in [0, 1]), and b_f is the bias of the forget state. The cell state can then be updated as

c_t = f_t \otimes c_{t-1} + i_t \otimes \tilde{c}_t,

where ⊗ is a Hadamard product and

i_t = \sigma\big( W_{Xi} X_t + W_{hi} h_{t-1} + b_i \big), \qquad \tilde{c}_t = \tanh\big( W_c \cdot (X_t, h_{t-1}) + b_c \big),

where b_i and b_c are biases. Note that i_t and \tilde{c}_t represent the amount of updated information and the candidate content, respectively. The cell state thus determines how much information is maintained from the cell state of the previous time step and how much updated information from the current step is added. Finally, the output state and hidden state at time t can be formulated as

o_t = \sigma\big( W_{Xo} X_t + W_{ho} h_{t-1} + b_o \big), \qquad h_t = o_t \otimes \tanh(c_t),

where bo is the bias.
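As a concrete illustration of the gate equations above, the following NumPy sketch implements a single LSTM cell step. The weight layout (one matrix per gate acting on the concatenated input and previous hidden state) and the names are our own assumptions; a practical model would use a library implementation such as Keras.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W[g] has shape (q, p + q), acting on the concatenation
    of x_t and h_prev; b[g] has shape (q,), for g in {'f', 'i', 'c', 'o'}."""
    z = np.concatenate([x_t, h_prev])
    f_t = sigmoid(W['f'] @ z + b['f'])            # forget gate
    i_t = sigmoid(W['i'] @ z + b['i'])            # input gate
    c_tilde = np.tanh(W['c'] @ z + b['c'])        # candidate cell content
    c_t = f_t * c_prev + i_t * c_tilde            # cell state update (Hadamard products)
    o_t = sigmoid(W['o'] @ z + b['o'])            # output gate
    h_t = o_t * np.tanh(c_t)                      # new hidden state
    return h_t, c_t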

2.2. Time series approach

In classical stock forecasting, two statistical methodologies play a significant role: the autoregressive (AR) and autoregressive distributed lag (ADL) models, both of which fit the data to a pre-determined model. However, the presence of latent features in the data is known to limit their applicability. In the next section, we compare the predictive accuracy of the DR-LSTM methods, the AR model, the ADL model, and the original LSTM on real stock market data.

2.3. Dimension reduction methodologies

2.3.1. Principal component analysis

PCA (Jolliffe, 1986) is one of the most widely used dimension reduction methods and finds application in fields such as biology, medicine, and economics. The principal idea of PCA is to find the direction in the variable space that explains the most variation. This can be presented through the following sequential maximization at the kth step:

\text{maximize } \operatorname{var}(\alpha^{\top} X) \quad \text{with } \|\alpha\| = 1 \ \text{ such that } \langle \alpha, \alpha_1 \rangle = \cdots = \langle \alpha, \alpha_{k-1} \rangle = 0,

where X, α_k ∈ ℝ^p and ⟨·, ·⟩ is an inner product. When k = 1, the orthogonality constraint is not needed. The kth principal component is then given by α_k^{\top} X. PCA is a convenient way to achieve dimension reduction because it is easy to implement and its result is intuitive.
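At the sample level, the sequential maximization above is solved by the eigendecomposition of the sample covariance matrix; a brief NumPy sketch (variable names are illustrative) is:

import numpy as np

def pca_components(X, k):
    """Return the first k principal component loadings and scores of X (n x p)."""
    Xc = X - X.mean(axis=0)                      # center the data
    S = np.cov(Xc, rowvar=False)                 # sample covariance matrix
    evals, evecs = np.linalg.eigh(S)             # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]          # indices of the k largest eigenvalues
    loadings = evecs[:, order]                   # alpha_1, ..., alpha_k
    scores = Xc @ loadings                       # principal components alpha_k^T X
    return loadings, scores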

2.3.2. Sparse PCA

Because the final product of PCA consists of linear combinations of the variables, it is difficult to interpret. That is, if our goal is to select the most important variable that maximizes the variance of X, PCA is not an appropriate choice. To overcome this limitation, SPCA was introduced by Zou et al. (2006). It achieves sparsity via the addition of penalty terms to a regression type optimization problem. This can be formulated as

\underset{Q,\,P}{\arg\min}\ \big\| X - X P Q^{\top} \big\|^2 + \lambda \sum_{i=1}^{k} \|\alpha_i\|^2 + \sum_{i=1}^{k} \lambda_{1,i} \|\alpha_i\|_1 \quad \text{subject to } Q^{\top} Q = I_k, \qquad (2.1)

where X ∈ ℝ^{n×p}, Q ∈ ℝ^{p×k}, and P = (α_1, . . ., α_k) ∈ ℝ^{p×k}. We obtain α_1, . . ., α_k, and Xα_i, i = 1, . . ., k, are the sparse principal components. The ridge-type penalty in (2.1) keeps the problem well-posed for any X, ensuring that the principal components can be reconstructed accurately. Meanwhile, the lasso-type penalty in (2.1) penalizes the loadings of the individual principal components. Here, the L1 penalty substantially improves the interpretability of the principal components. For a comprehensive review of SPCA, see Zou et al. (2006).
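In practice, a sparse principal component decomposition can be obtained with scikit-learn's SparsePCA, which solves a closely related penalized reconstruction problem. The sketch below is only an illustration: the data matrix is simulated and the regularization parameters are not the values used in this paper.

import numpy as np
from sklearn.decomposition import SparsePCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))            # placeholder for the n x p predictor matrix

spca = SparsePCA(n_components=2, alpha=1.0, ridge_alpha=0.01, random_state=0)
Z_sparse = spca.fit_transform(X)         # sparse principal components, usable as LSTM inputs
print(spca.components_)                  # sparse loadings; many entries are exactly zero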

2.3.3. Kernel PCA

In contemporary data analysis, datasets with nonlinear structure are prevalent. However, PCA and SPCA are limited when nonlinear data are involved. Schölkopf et al. (1997) introduced the KPCA method, which can be presented as follows. First, KPCA maps the original dataset to a higher dimensional feature space as

X \mapsto \Phi(X) \quad \text{with } \Phi: \Omega \rightarrow \mathcal{H},

where ℋ is a reproducing kernel Hilbert space. Note that the feature space ℋ must accommodate the nonlinear information of the original data. Then, similar to PCA, KPCA can be formulated as the maximization problem

\max \operatorname{var}\big( \langle f, \Phi(X) \rangle \big) \quad \text{with } \|f\| = 1,

where f ∈ ℋ. Aizerman (1964) and Boser et al. (1992) showed that explicitly determining the mapping function Φ is unnecessary. Instead, a positive definite kernel such as the Gaussian radial basis function kernel is used to compute the inner products, exploiting the reproducing property of the reproducing kernel Hilbert space. A more detailed algorithm for KPCA is provided in Schölkopf et al. (1997).
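A sample-level KPCA with a Gaussian radial basis function kernel is available in scikit-learn; a minimal sketch follows, where the simulated data and the kernel bandwidth gamma are illustrative choices only.

import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                     # placeholder predictor matrix

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.5)
Z_kernel = kpca.fit_transform(X)                  # nonlinear principal components
# Z_kernel can then be fed to LSTM in place of the original predictors.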

2.3.4. Sliced inverse regression

Sufficient dimension reduction (SDR) is one of the most widely used supervised approaches to dimension reduction. The goal of SDR is to study the relationship between X and Y while replacing X ∈ ℝ^p by β^{\top}X, where β ∈ ℝ^{p×d} with d ≤ p. This can be formulated as finding β such that

Y \perp\!\!\!\perp X \mid \beta^{\top} X.

β is not an identifiable parameter as it is true that

Y \perp\!\!\!\perp X \mid (\beta A)^{\top} X

for any nonsingular matrix A ∈ ℝ^{d×d}. The identifiable parameter in SDR is the SDR subspace, span(β). However, span(β) is not unique because any subspace containing an SDR subspace is still an SDR subspace. Under mild assumptions (Yin et al., 2008), the intersection of all SDR subspaces is again an SDR subspace. It is then the smallest SDR subspace and is called the central subspace, denoted 𝒮_{Y|X}. This is the target of estimation in SDR.

A number of SDR methods use an inverse regression approach, which can be easily reformulated as a generalized eigendecomposition problem. One of the most well-known approaches is SIR (Li, 1991), which uses the property that, under the linear conditional mean assumption, span(Σ^{-1} var(E[X|Y])) is contained in the central subspace. The goal of SIR is therefore to recover span(Σ^{-1} var(E[X|Y])). This can be done by finding the first k eigenvectors corresponding to the nonzero eigenvalues of Σ^{-1/2} var(E[X|Y]) Σ^{-1/2}. It then holds that Σ^{-1/2}v_1, . . ., Σ^{-1/2}v_k lie in 𝒮_{Y|X}, where v_1, . . ., v_k are the first k eigenvectors of Σ^{-1/2} var(E[X|Y]) Σ^{-1/2}. SIR is by far one of the most popular SDR methods because it is easy to implement, computationally efficient, and has high estimation accuracy. For more details about implementing SIR, see Li (2018).
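The sample version of SIR amounts to slicing the response, averaging the standardized predictors within each slice, and eigendecomposing the resulting candidate matrix. The NumPy sketch below illustrates this; the slice count, function name, and return value are our own choices, not the authors' implementation.

import numpy as np

def sir_directions(X, y, n_slices=10, d=2):
    """Estimate d SIR directions from the eigenvectors of
    Sigma^{-1/2} var(E[X|Y]) Sigma^{-1/2}, mapped back to the original scale."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ Sigma_inv_sqrt                            # standardized predictors
    slices = np.array_split(np.argsort(y), n_slices)   # slice by the ordered response
    M = np.zeros((p, p))
    for idx in slices:
        m_h = Z[idx].mean(axis=0)                      # slice mean of Z given y in J_h
        M += (len(idx) / n) * np.outer(m_h, m_h)       # weighted candidate matrix
    w, v = np.linalg.eigh(M)
    beta = Sigma_inv_sqrt @ v[:, np.argsort(w)[::-1][:d]]  # back to the X scale
    return Xc @ beta                                   # sufficient predictors for LSTM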

2.3.5. Sparse SIR

Similar to PCA, the result of SIR is based on linear combinations of the original predictor variables. Therefore, a lack of interpretability is one of the main disadvantages of SIR when it is applied to real data. Li (2007) introduced a sparse estimation approach to recover the central subspace, exploiting the fact that SIR can be reformulated as an optimization problem in a regression setting. A LASSO-type penalty term (Zou et al., 2006) was then combined with the regression-type SIR, providing a more efficient algorithm (Jolliffe, 1986). This idea can be formulated as follows. Let β_i be the columns of (var(E[X|Y]))^{1/2} and let η, v ∈ ℝ^{p×d}. Then the outcome of SSIR, v, can be presented as the solution of

\underset{\eta,\,v}{\arg\min}\ \sum_{k=1}^{p} \big\| \Sigma^{-1} \beta_k - \eta v^{\top} \beta_k \big\|^2 + \lambda\, \operatorname{tr}(v^{\top} \Sigma v) + \sum_{i=1}^{d} \lambda_i \|v_i\|_1 \quad \text{such that } \eta^{\top} \Sigma \eta = I_d, \qquad (2.3)

where λ_i is non-negative and v_i is the ith column of v. The LASSO constraint in (2.3) induces sparsity in the estimator and enhances the interpretability of the result.

2.3.6. Kernel SIR

Detecting nonlinear relationships is also an important issue in supervised dimension reduction. As with KPCA, if we map the original sample space to a higher dimensional feature space, we can convert a nonlinear problem into a linear one, which enables us to use well-developed linear methods. The kernel trick (Aronszajn, 1950) links the inner product to a kernel, which implies that we do not need to specify the mapping function: every inner product between mapped observations can be represented through the kernel matrix of the original dataset. To highlight the parallel structure between SIR and KSIR, we briefly revisit the SIR method. The sample-level candidate matrix of SIR can be presented as

\sum_{k=1}^{h} E_n(y \in J_k)\, E_n[X \mid y \in J_k]\, E_n[X \mid y \in J_k]^{\top}, \qquad (2.4)

where J_1, . . ., J_h are the intervals of a partition of the range of Y, h is the number of slices for Y, and E_n denotes the empirical average. Using the same notation as in KPCA, the candidate matrix for KSIR can be presented as

\sum_{k=1}^{h} E_n(y \in J_k)\, E_n[\Phi(X) \mid y \in J_k]\, E_n[\Phi(X) \mid y \in J_k]^{\top}, \qquad (2.5)

where Φ(X) is the result of mapping the original input space X to the feature space. Comparing (2.5) to (2.4) clearly reveals that KSIR is a direct nonlinear extension of SIR. Wu (2008) demonstrates that KSIR achieves a meaningful gain in estimation accuracy with a negligible loss in computational efficiency.
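Several implementations of KSIR exist. One simple way to mimic the construction in (2.5) at the sample level, without writing the full kernel algorithm, is to first map the predictors to a finite number of kernel principal components (a truncated surrogate for Φ(X)) and then apply linear SIR in that feature space. The sketch below is only an approximation of KSIR under that two-stage assumption; it reuses the sir_directions function sketched in Section 2.3.4 and simulated placeholder data.

import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))                 # placeholder predictors
y = rng.normal(size=500)                      # placeholder response (returns)

# Step 1: nonlinear feature map via kernel PCA (truncated surrogate for Phi(X))
Phi_X = KernelPCA(n_components=20, kernel="rbf", gamma=0.5).fit_transform(X)

# Step 2: linear SIR on the mapped features (sir_directions from the SIR sketch above)
sufficient_predictors = sir_directions(Phi_X, y, n_slices=10, d=2)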

3. Data analysis

3.1. Dataset

To demonstrate the efficiency of the DR-LSTM framework, we apply our methods to three stock market indices: S&P 500 (USA), STOXX Europe 600 (EU), and KOSPI (South Korea). The datasets cover the period from January 2015 to November 2021. We use the six DR-LSTM methods to predict stock market returns. Here, the original predictor variables are the daily open price, close price, low price, high price, and trading volume, which are visualized in Figures 4-6.

We reduce these predictor variables to one or two dimensions using the six dimension reduction methods described in Section 2. We use 60% of the data as the training set, 20% as the validation set, and the remaining 20% as the test set, as described in Figure 7.

For S&P500, the training dataset includes the period from January 2, 2015 to May 30, 2019, and the validation dataset includes data ranging from May 31, 2019 to July 15, 2020. The test dataset contains the period from July 16, 2020 to November 29, 2021. In STOXX Europe 600, we consider the data from January 5, 2015 to June 2, 2019 as training data. The data from June 3, 2019 to July 19, 2020 are validation data. The data from July 20, 2020 to November 29, 2021 are utilized as test data. The KOSPI data from January 2, 2015 to June 4, 2019 are used as the training dataset, the data from June 5, 2019 to July 15, 2020 are used as the validation dataset, and those from July 16, 2020 to November 30, 2021 are considered as the test dataset.
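The chronological 60/20/20 split (no shuffling, so that the temporal order is preserved) can be reproduced with a few lines of pandas; the function below is a sketch, and the DataFrame name and file are placeholders.

import pandas as pd

def chronological_split(df, train_frac=0.6, val_frac=0.2):
    """Split a time-indexed DataFrame into contiguous train/validation/test blocks."""
    n = len(df)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = df.iloc[:n_train]
    val = df.iloc[n_train:n_train + n_val]
    test = df.iloc[n_train + n_val:]
    return train, val, test

# Example usage with a hypothetical file:
# data = pd.read_csv("sp500.csv", index_col="date", parse_dates=True)
# train, val, test = chronological_split(data)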

3.1.1. Exploratory data analysis

We transformed the stock indices to log returns for the data analysis:

r_t = \log(p_t / p_{t-1}) = \log(p_t) - \log(p_{t-1}),

where p_t represents the daily price of the stock on day t. Before applying the SDR methods, we implement a Box-Cox transformation so that X approximately follows a multivariate Gaussian distribution. This transformation is useful because the linear conditional mean assumption is known to be satisfied when X follows an elliptical distribution. As shown in Figures 8, 9, and 10, the transformed predictor variables follow an elliptical distribution, which implies that the linear conditional mean assumption is satisfied.
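The log return and the Box-Cox step can be computed as in the following sketch; the column names and toy values are placeholders, and scipy's boxcox estimates the transformation parameter by maximum likelihood for each positive-valued predictor.

import numpy as np
import pandas as pd
from scipy.stats import boxcox

# prices: DataFrame of daily prices and volume (placeholder values for illustration)
prices = pd.DataFrame({"close": [100.0, 101.5, 100.8, 102.3],
                       "volume": [1.20e6, 1.10e6, 1.30e6, 1.25e6]})

# log return r_t = log(p_t) - log(p_{t-1}) computed from the closing price
returns = np.log(prices["close"]).diff().dropna()

# Box-Cox transform each positive predictor column toward normality
transformed = prices.apply(lambda col: pd.Series(boxcox(col.to_numpy())[0], index=col.index))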

To demonstrate the competitiveness of the DR-LSTM method, we compare it with two statistical models, the AR and ADL models. AR and ADL models have been widely applied to stock return prediction; see Idrees et al. (2019), Madziwa et al. (2022), Malladi and Dheeriya (2021), and many others for more details. The AR model is

Y_t = \varphi_0 + \sum_{k=1}^{l} \varphi_k Y_{t-k} + a_t,

where a_t is a serially uncorrelated error process with zero mean and finite variance \sigma_a^2 = \operatorname{var}(a_t). The lag l is specified by minimizing the BIC (Bayesian information criterion) for the AR(l) model on the training data. The chosen orders are l = 4 for S&P 500, l = 1 for STOXX Europe 600, and l = 2 for KOSPI. The ADL model is

Y_t = \varphi_0 + \sum_{k=1}^{l} \varphi_k Y_{t-k} + \sum_{k=1}^{q_1} \beta_{1k} X_{t-k,1} + \cdots + \sum_{k=1}^{q_p} \beta_{pk} X_{t-k,p} + a_t,

which additionally includes the p predictor variables (X_{t1}, . . ., X_{tp}). As predictor variables, we consider the original predictor variables (p = 4). We also consider the reduced variables (p = 2) obtained by the six dimension reduction methods described in Section 2, referred to as ‘PCA-ADL’, ‘SIR-ADL’, ‘SPCA-ADL’, ‘SSIR-ADL’, ‘KPCA-ADL’, and ‘KSIR-ADL’. The lags (l, q_1, . . ., q_p) of the ADL models are selected by the BIC; the selected lags are provided in Table A.1 in Appendix A. The AR and ADL models capture the linear structure between the stock return and the predictor variables. Because they neglect any nonlinear structure in the data, the linear AR and ADL models have worse prediction performance than the LSTM models with hidden nonlinearities, as will be shown in Tables 1 and 2.
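For the AR benchmark, statsmodels provides BIC-based lag selection, and an ADL model can then be fitted by ordinary least squares on lagged response and predictor columns. The sketch below outlines both steps; the simulated series, column names, illustrative ADL lags, and the maximum AR lag are our own assumptions.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.tsa.ar_model import AutoReg, ar_select_order

rng = np.random.default_rng(0)
y = pd.Series(rng.normal(size=400))                                  # placeholder return series
x = pd.DataFrame(rng.normal(size=(400, 2)), columns=["z1", "z2"])    # reduced predictors

# AR(l): choose the lag l by BIC and fit
sel = ar_select_order(y, maxlag=10, ic="bic")
ar_fit = AutoReg(y, lags=sel.ar_lags or 1).fit()

# ADL with fixed illustrative lags (l = q1 = q2 = 1), fitted by OLS
lagged = pd.concat({"y_lag1": y.shift(1),
                    "z1_lag1": x["z1"].shift(1),
                    "z2_lag1": x["z2"].shift(1)}, axis=1).dropna()
adl_fit = sm.OLS(y.loc[lagged.index], sm.add_constant(lagged)).fit()
print(ar_fit.bic, adl_fit.bic)                                       # compare models by BIC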

To use the AR and ADL models, the stationarity condition must be satisfied. A stationary time series implies that its statistical properties do not depend on the timeframe; in other words, stationary time series data have no trend or seasonality and exhibit a constant mean and variance over time. As presented in Figure 11, the return variable appears to be stationary, except for March 2020 when the stock market suffered from turbulence due to Covid-19. Several statistical tests are available to evaluate the stationarity assumption. One of the most widely used is the KPSS (Kwiatkowski-Phillips-Schmidt-Shin) test (Kwiatkowski et al., 1992), which we employ in this paper. The null hypothesis of this test is that the time series is stationary. In our datasets, since the p-value of the KPSS test for the return variable is greater than 0.1, we cannot reject the stationarity assumption and proceed to further analysis under the stationarity condition.
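The KPSS test is available in statsmodels; a minimal usage sketch is shown below, with a simulated return series as a placeholder. Note that statsmodels interpolates p-values only within a tabulated range and caps the reported value at 0.1, which corresponds to the "greater than 0.1" statement above.

import numpy as np
from statsmodels.tsa.stattools import kpss

rng = np.random.default_rng(0)
returns = rng.normal(scale=0.01, size=1000)      # placeholder stationary return series

stat, p_value, lags, crit = kpss(returns, regression="c", nlags="auto")
# Null hypothesis: the series is level-stationary; fail to reject when p_value > 0.1
print(stat, p_value)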

3.1.2. Data analysis

After applying linear PCA, the first two principal components explain 79.98% of the variation in KOSPI, 78.59% of that in S&P500, and 77.56% of that in STOXX Europe 600. Thus, we use the first two principal components of PCA, SPCA, and KPCA to predict the returns using LSTM. The correlation plots between the first two principal components and the returns for S&P 500, STOXX Europe 600, and KOSPI are presented in Figures 12 to 14.

For the supervised DR methods, we extract two sufficient predictors from SIR, SSIR, and KSIR and use these as predictors for LSTM to predict the returns. The relationships between the first two sufficient predictors and the returns for S&P 500, STOXX Europe 600, and KOSPI are visualized in Figures 15 to 17.

Several hyperparameters govern the predictive performance of LSTM. We constructed three-layer LSTM networks with a learning rate of 0.001, a timestep of 5, an input dimension of 2, a hidden dimension of 16, and an output dimension of 1. We use mean squared error as the loss function. Adaptive moment estimation (Adam), which combines RMSprop with bias-corrected momentum, was used as the optimizer. Early stopping was used to avoid overfitting: model training was terminated when the loss on the validation data did not decrease for more than 20 epochs. We used Keras to implement LSTM.
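A Keras specification consistent with the configuration described above (three stacked LSTM layers with 16 hidden units, two-dimensional inputs over five time steps, MSE loss, the Adam optimizer with learning rate 0.001, and early stopping with patience 20) could look as follows. This is a sketch rather than the authors' code; in particular, the dropout rate of 0.2 and the epoch and batch-size settings are illustrative assumptions, as the paper does not report them.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

timesteps, n_features = 5, 2                       # five lags of the two reduced predictors

model = keras.Sequential([
    layers.Input(shape=(timesteps, n_features)),
    layers.LSTM(16, return_sequences=True),
    layers.Dropout(0.2),                           # dropout after each LSTM layer (rate assumed)
    layers.LSTM(16, return_sequences=True),
    layers.Dropout(0.2),
    layers.LSTM(16),
    layers.Dropout(0.2),
    layers.Dense(1),                               # fully connected output layer
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="mse")

early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=20,
                                           restore_best_weights=True)
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=500, batch_size=32, callbacks=[early_stop])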

The performance of the prediction was evaluated using root mean squared error (RMSE) and mean absolute error (MAE). These can be formulated as

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}, \qquad \mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|,

where \hat{y}_i is a predicted value, y_i is the corresponding true observation, and n = 343.
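These two criteria can be computed directly; a short NumPy sketch with placeholder arrays is given below.

import numpy as np

def rmse(y_hat, y):
    return np.sqrt(np.mean((y_hat - y) ** 2))

def mae(y_hat, y):
    return np.mean(np.abs(y_hat - y))

# Placeholder arrays of length n = 343 (the test-set size in the paper)
y_true = np.zeros(343)
y_pred = np.full(343, 0.001)
print(rmse(y_pred, y_true), mae(y_pred, y_true))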

Figures 18, 19, and 20 compare the prediction performance of the six DR-LSTM methods. Figure 18 shows the predicted values for S&P 500 from July 15, 2020 to November 29, 2021. Figure 19 illustrates the forecast of STOXX Europe 600 from July 20, 2020 to November 29, 2021. Figure 20 represents the prediction of KOSPI from July 16, 2020 to November 30, 2021. Here, the yellow line indicates the test dataset, and the green line represents the prediction based on the DR-LSTM framework. These figures indicate that the returns forecasted by our methods show low volatility while maintaining high accuracy.

Table 1 presents the RMSE for each of the dimension reduction methods. Bold type marks the best predictive model for the stock returns of S&P500, STOXX Europe 600, and KOSPI, respectively. As shown in Table 1, our six DR-LSTM methods have a lower RMSE than the original LSTM for all of S&P500, STOXX Europe 600, and KOSPI. This implies that a variety of dimension reduction methods can be used as supplemental tools to substantially improve the predictive power of LSTM. In particular, S&P500 exhibits the best prediction results when the PCA-LSTM method is used. The SPCA-LSTM method yields the smallest RMSE for STOXX Europe 600, and the SIR-LSTM method exhibits the most accurate results for KOSPI. A similar pattern is observed for the ADL models: ADL models with the reduced predictor variables give better predictive performance than the ADL model with the original predictor variables for all considered stock returns. Moreover, the DR-LSTM methods outperform the AR model and all ADL models with either the original or the reduced predictor variables. These findings indicate that addressing the nonlinear structure between the return and the predictors via the DR-LSTM methods yields better predictive performance than neglecting the nonlinear structure with the linear AR and ADL models.

Table 2 shows the MAE results for the dimension reduction methods used. Our DR-LSTM framework reports remarkably lower MAE values than the AR and ADL models, and it can exhibit a significant gain in accuracy compared to the original LSTM. With regard to S&P500, the SIR-LSTM method exhibits the lowest MAE values. STOXX Europe 600 yields the best prediction results when using the SSIR-LSTM method, and KOSPI shows the most promising result with the SIR-LSTM method.

4. Discussion

In recent decades, there has been growing interest in predicting stock returns using artificial neural networks. In this context, because of the sequential nature of such datasets, the use of RNN has been widely discussed. However, RNNs are limited in their effectiveness due to the long-term vanishing gradient problem. To overcome this challenge, researchers developed LSTM, which has found widespread application in forecasting financial datasets. In this paper, we incorporate the idea of dimension reduction into LSTM to enhance predictive performance. The dimension reduction method first simplifies the input data to prevent overfitting. The model’s configuration, including the number and size of layers, is determined through empirical analysis. To further prevent overfitting and improve generalizability, a dropout technique is employed after each LSTM layer. Note that LSTM also conducts dimension reduction in its hidden layer. The model concludes with a fully connected layer. The DR-LSTM model effectively filters and updates the relevant information in the data, leveraging the forget and input gates of LSTM for enhanced prediction accuracy. In particular, we use PCA and SIR as linear, SPCA and SSIR as sparse, and KPCA and KSIR as nonlinear dimension reduction methods. Using the S&P 500, STOXX Europe 600, and KOSPI datasets from January 2015 to November 2021, we demonstrate the superior predictive accuracy of our DR-LSTM approaches over traditional linear time series methods (AR and ADL) and the cutting-edge LSTM. This is because the output of the dimension reduction methods captures most of the variation in the predictor variables or retains all of the information in the conditional distribution of Y given X. Reducing the dimension from 5 variables to 1 or 2 may not significantly enhance efficiency in typical neural network settings where data are abundant. Despite efforts to gather more predictors, our data source is limited in the diversity of predictors, restricting us to only 5. However, in our specific case, this dimension reduction is beneficial. We are working with a relatively small dataset with closely correlated variables, which further reduces the effective size of our dataset; this limitation prevents us from using large-scale recurrent neural networks. By implementing dimension reduction techniques, we are able to work with an input size of 1 to 2 variables, which significantly reduces the number of parameters needed for reasonable predictive accuracy compared to using an input size of 5. Furthermore, repeated dimension reduction steps in the DR-LSTM framework might lead to over-smoothing. Because our DR-LSTM predicts the averaged index value, the DR-LSTM models show improved prediction accuracy in terms of RMSE and MAE compared to the traditional LSTM, but they are insufficient to effectively capture market volatility for use in real-world investments.

In future research, we can consider broader dimension reduction methods such as principal Hessian directions (Li, 1992), directional regression (Li and Wang, 2007), and minimum average variance estimation (Xia et al., 2009). We can further consider post dimension reduction inference (Kim et al., 2020; Kim, 2023). Variable selection methods such as the LASSO (Tibshirani, 1996) and SCAD (Fan and Li, 2001) can also be used to select predictor variables that improve the performance of LSTM. Furthermore, instead of LSTM, we could use a nonlinear time series model or an ensemble approach in the prediction step, but we leave these possibilities for future research.

Appendix A: The lags of ADL models

Table A.1 provides the lags (l, q1, . . ., qp) of the ADL models which are selected by the BIC.

Table A.1 : Lags (l, q1, . . ., qp) of the ADL models

Methods     S&P500           STOXX600         KOSPI
ADL         (2, 1, 1, 1, 1)  (1, 1, 1, 1, 1)  (1, 5, 1, 1, 1)
PCA-ADL     (1, 2, 1)        (1, 1, 1)        (1, 1, 1)
SIR-ADL     (1, 1, 1)        (1, 1, 1)        (2, 2, 1)
SPCA-ADL    (4, 3, 1)        (1, 1, 1)        (3, 3, 1)
SSIR-ADL    (1, 1, 1)        (1, 1, 1)        (1, 1, 1)
KPCA-ADL    (1, 1, 1)        (1, 1, 1)        (2, 1, 1)
KSIR-ADL    (2, 1, 1)        (1, 1, 1)        (1, 1, 1)
Figures
Fig. 1. Forecasting procedure of DR-LSTM framework.
Fig. 2. Description of an RNN process.
Fig. 3. Description of a sequential LSTM process.
Fig. 4. Predictor variables in S&P500 dataset.
Fig. 5. Predictor variables in STOXX Europe 600 dataset.
Fig. 6. Predictor variables in KOSPI dataset.
Fig. 7. Structures of the dataset.
Fig. 8. Scatterplot of Box-Cox transformed predictor variables of S&P500.
Fig. 9. Scatterplot of Box-Cox transformed predictor variables of STOXX Europe 600.
Fig. 10. Scatterplot of Box-Cox transformed predictor variables of KOSPI.
Fig. 11. Time series plot of the response variable (return) for stock indicators.
Fig. 12. Correlation plots between the first two principal components from PCA and a return for stock indicators.
Fig. 13. Correlation plots between the first two principal components from KPCA and a return for stock indicators.
Fig. 14. Correlation plots between the first two principal components from SPCA and a return for stock indicators.
Fig. 15. Correlation plots between the first two sufficient predictors from SIR for stock indicators.
Fig. 16. Correlation plots between the first two sufficient predictors from KSIR for stock indicators.
Fig. 17. Correlation plots between the first two sufficient predictors from SSIR for stock indicators.
Fig. 18. A prediction result of S&P500.
Fig. 19. A prediction result of STOXX Europe 600.
Fig. 20. A prediction result of KOSPI.
TABLES


Table 1

Comparison of RMSE among the DR-LSTM framework, the AR and DR-ADL models, and the original LSTM

Methods     S&P500    STOXX600   KOSPI
AR          0.00905   0.00870    0.01079
ADL         0.00889   0.00868    0.01094
PCA-ADL     0.00894   0.00870    0.01078
SIR-ADL     0.00902   0.00872    0.01085
SPCA-ADL    0.00889   0.00866    0.01077
SSIR-ADL    0.00902   0.00872    0.01070
KPCA-ADL    0.00894   0.00870    0.01082
KSIR-ADL    0.00893   0.00867    0.01092
LSTM        0.00889   0.00869    0.01074
PCA-LSTM    0.00878   0.00868    0.01067
SIR-LSTM    0.00880   0.00862    0.01054
SPCA-LSTM   0.00879   0.00861    0.01057
SSIR-LSTM   0.00888   0.00863    0.01065
KPCA-LSTM   0.00887   0.00865    0.01071
KSIR-LSTM   0.00888   0.00864    0.01072

Table 2

Comparison of MAE among the DR-LSTM framework, the AR and ADL models, and the original LSTM

Methods     S&P500    STOXX600   KOSPI
AR          0.00677   0.00626    0.00840
ADL         0.00667   0.00629    0.00855
PCA-ADL     0.00671   0.00624    0.00845
SIR-ADL     0.00675   0.00627    0.00859
SPCA-ADL    0.00670   0.00626    0.00837
SSIR-ADL    0.00674   0.00627    0.00842
KPCA-ADL    0.00666   0.00625    0.00841
KSIR-ADL    0.00665   0.00623    0.00858
LSTM        0.00661   0.00619    0.00830
PCA-LSTM    0.00657   0.00622    0.00822
SIR-LSTM    0.00621   0.00624    0.00811
SPCA-LSTM   0.00651   0.00613    0.00817
SSIR-LSTM   0.00662   0.00611    0.00816
KPCA-LSTM   0.00659   0.00616    0.00828
KSIR-LSTM   0.00661   0.00618    0.00827

References
1. Aizerman A (1964). Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25, 821-837.
2. Aronszajn N (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68, 337-404.
3. Boser BE, Guyon IM, and Vapnik VN (1992). A training algorithm for optimal margin classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory, 144-152.
4. Chen K, Zhou Y, and Dai F (2015). A LSTM-based method for stock returns prediction: A case study of China stock market. Proceedings of 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, 2823-2824.
5. Fan J and Li R (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360.
6. Fong Y and Xu J (2021). Forward stepwise deep autoencoder-based monotone nonlinear dimensionality reduction methods. Journal of Computational and Graphical Statistics, 30, 519-529.
7. Gao S, Han L, Luo D, Liu G, Xiao Z, Shan G, Zhang Y, and Zhou W (2021). Modeling drug mechanism of action with large scale gene-expression profiles using GPAR, an artificial intelligence platform. BMC Bioinformatics, 22, 1-13.
8. Hochreiter S and Schmidhuber J (1997). Long short-term memory. Neural Computation, 9, 1735-1780.
9. Idrees SM, Alam MA, and Agarwal P (2019). A prediction approach for stock market volatility based on time series data. IEEE Access, 7, 17287-17298.
10. Jolliffe IT (1986). Generalizations and adaptations of principal component analysis. Principal Component Analysis, (pp. 223-234), Springer.
11. Kwiatkowski D, Phillips PC, Schmidt P, and Shin Y (1992). Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root?. Journal of Econometrics, 54, 159-178.
12. Kim K, Li B, Yu Z, and Li L (2020). On post dimension reduction statistical inference. The Annals of Statistics, 48, 1567-1592.
13. Kim K (2023). A note on sufficient dimension reduction with post dimension reduction statistical inference. AStA Advances in Statistical Analysis, 1-21.
14. Li KC (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86, 316-327.
15. Li B (2018). Sufficient Dimension Reduction: Methods and Applications with R, New York, CRC Press.
16. Li L (2007). Sparse sufficient dimension reduction. Biometrika, 94, 603-613.
17. Li KC (1992). On principal Hessian directions for data visualization and dimension reduction: Another application of Stein's lemma. Journal of the American Statistical Association, 87, 1025-1039.
18. Li B and Wang S (2007). On directional regression for dimension reduction. Journal of the American Statistical Association, 102, 997-1008.
19. Madziwa L, Pillalamarry M, and Chatterjee S (2022). Gold price forecasting using multivariate stochastic model. Resources Policy, 76, 102544.
20. Malladi RK and Dheeriya PL (2021). Time series analysis of cryptocurrency returns and volatilities. Journal of Economics and Finance, 45, 75-94.
21. Matias JM and Reboredo JC (2012). Forecasting performance of nonlinear models for intraday stock returns. Journal of Forecasting, 31, 172-188.
22. Ramezanian R, Peymanfar A, and Ebrahimi SB (2019). An integrated framework of genetic network programming and multi-layer perceptron neural network for prediction of daily stock return: An application in Tehran stock exchange market. Applied Soft Computing, 82, 105551.
23. Selvin S, Vinayakumar R, Gopalakrishnan EA, Menon VK, and Soman KP (2017). Stock price prediction using LSTM, RNN and CNN-sliding window model. Proceedings of 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, 1643-1647.
24. Sujatha KV and Sundaram SM (2010). A combined PCA-MLP model for predicting stock index. Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India, 1-6.
25. Schölkopf B, Smola A, and Müller KR (1997). Kernel principal component analysis. Proceedings of International Conference on Artificial Neural Networks, Berlin, Heidelberg, 583-588.
26. Tsai CF, Lin YC, Yen DC, and Chen YM (2011). Predicting stock returns by classifier ensembles. Applied Soft Computing, 11, 2452-2459.
27. Tibshirani R (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B: Statistical Methodology, 58, 267-288.
28. Tran MN, Nguyen N, Nott D, and Kohn R (2020). Bayesian deep net GLM and GLMM. Journal of Computational and Graphical Statistics, 29, 97-113.
29. Wang T, Lei S, Jiang Y, Chang C, Snoussi H, Shan G, and Fu Y (2022). Accelerating temporal action proposal generation via high performance computing. Frontiers of Computer Science, 16, 1-10.
30. Wang J, Sun T, Liu B, Cao Y, and Wang D (2018). Financial markets prediction with deep learning. Proceedings of 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, 97-104.
31. Wen Y, Lin P, and Nie X (2020). Research of stock price prediction based on PCA-LSTM model. Proceedings of IOP Conference Series: Materials Science and Engineering, Guangzhou, 012109.
32. Wu HM (2008). Kernel sliced inverse regression with applications to classification. Journal of Computational and Graphical Statistics, 17, 590-610.
33. Xiao X, Mo H, Zhang Y, and Shan G (2022). Meta-ANN - A dynamic artificial neural network refined by meta-learning for short-term load forecasting. Energy, 246, 123418.
34. Xia Y, Tong H, Li WK, and Zhu L-X (2009). An adaptive estimation of dimension reduction space. Journal of the Royal Statistical Society Series B: Statistical Methodology, 64, 299-346.
35. Yin X, Li B, and Cook RD (2008). Successive direction extraction for estimating the central subspace in a multiple-index regression. Journal of Multivariate Analysis, 99, 1733-1757.
36. Zou H, Hastie T, and Tibshirani R (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15, 265-286.