Time series prediction is a field that has attracted significant attention. Algorithms for time series prediction are widely used in many fields, such as weather, electricity, and financial markets. They have been developed using well-known classical models such as the autoregressive moving average (McLeod and Li, 1983) as well as the recurrent neural network (RNN) (Rumelhart
The NARX model has been studied continuously, for example through an improved hybrid prediction model (Pham
Research on introducing attention mechanisms has also been active, including studies of new RNN methods such as an attention-based encoder-decoder network (Liu and Lane, 2016). Attention has also been introduced into the NARX model, for example a hierarchical attention network that selects relevant encoder hidden states using two attention mechanisms (Yang
In this paper, we propose a two-dimensional attention-based multi-input LSTM (2DA-MILSTM) model that combines the advantages of DA-RNN, 2D-LSTM, and MI-LSTM. The model computes the correlation between the target variable and the exogenous variables, divides the exogenous variables accordingly, and inputs them to MI-LSTM. It then applies two separate attention layers simultaneously, using the MI-LSTM output values and the hidden state of the previous layer. Two kinds of weights are created by applying the attentions to each hidden state and each time step. By integrating these two weights, the predicted value is calculated, and the importance of both the hidden state and the time step is taken into account, which alleviates the long-term dependency problem and improves prediction performance. We compare the performance of the proposed model with existing models through prediction experiments on stock price, room temperature, and energy data.
The rest of this paper is organized as follows. Section 2 reviews existing models; Section 3 explains the components and configuration of the proposed model in detail; Section 4 describes the datasets used in the experiments and presents the results; and Section 5 concludes the paper.
Among neural networks, RNNs and LSTMs are the most popular models for time series data, including language and speech (Huang
Then the values in the hidden and output layers are computed by
LSTM is similar to RNN, but it captures long-term dependencies in time series data better than RNN by using three gates (input, forget, and output) in the LSTM cell. In Figure 2,
The procedure of the LSTM cell, in matrix notation, is:
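As a generic illustration of these gate updates, the standard LSTM cell can be sketched in NumPy with toy dimensions (this is a minimal sketch in our own notation, not the paper's exact formulation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W, U, b hold the stacked parameters for the input (i), forget (f),
    output (o) gates and the candidate cell state (g), in that order.
    """
    m = h_prev.shape[0]
    z = W @ x + U @ h_prev + b     # stacked pre-activations, shape (4m,)
    i = sigmoid(z[0*m:1*m])        # input gate
    f = sigmoid(z[1*m:2*m])        # forget gate
    o = sigmoid(z[2*m:3*m])        # output gate
    g = np.tanh(z[3*m:4*m])        # candidate cell state
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c

# toy dimensions: input size n = 3, hidden size m = 4
rng = np.random.default_rng(0)
n, m = 3, 4
W = rng.normal(size=(4*m, n))
U = rng.normal(size=(4*m, m))
b = np.zeros(4*m)
h, c = lstm_cell(rng.normal(size=n), np.zeros(m), np.zeros(m), W, U, b)
```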
For explanation,
The target variable is defined as
The Pearson correlation coefficient is used to measure the correlation between the exogenous variables and the target variable, and
We define “Self”, “Index”, “Positive”, and “Negative” based on
Self:
Index:
where
Positive:
Negative:
where
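The division of the exogenous series into positively and negatively correlated groups can be sketched as follows, assuming the target and exogenous series are plain NumPy arrays; the helper name and the split-at-zero rule are our illustration:

```python
import numpy as np

def split_by_correlation(target, exog):
    """Split exogenous series into positively and negatively correlated
    groups by their Pearson correlation with the target series.

    target: shape (T,); exog: shape (T, K). Returns two lists of column
    indices (positive group, negative group).
    """
    pos, neg = [], []
    for k in range(exog.shape[1]):
        r = np.corrcoef(target, exog[:, k])[0, 1]  # Pearson correlation
        (pos if r >= 0 else neg).append(k)
    return pos, neg

# Toy example: one series rises with the target, one falls with it.
t = np.arange(100, dtype=float)
target = t + np.random.default_rng(1).normal(scale=0.1, size=100)
exog = np.column_stack([t, -t, np.sin(t)])
pos, neg = split_by_correlation(target, exog)
```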
The MI-LSTM is a model that can extract valuable information from weakly correlated factors and discard negative noise using extra input gates controlled by convincing factors. This model departs from existing LSTM methods and shows that feeding multiple inputs to the LSTM can improve prediction performance. As defined above, MI-LSTM uses four input values: the past values of the target variable, the exogenous variables with positive and negative correlation, and an index exogenous variable. The input values are expressed as
and an output value of
Figure 3 is the structure of MI-LSTM. The following attention is applied to
where the matrix
DA-RNN consists of two stages. In the first stage, it uses an input attention mechanism to select the relevant exogenous variables at each step, based on the previous encoder hidden state. In the second stage, the model selects the relevant encoder output values using a temporal attention mechanism. This model can select the most relevant exogenous variables as well as adequately capture the long-term dependencies of the time series. Figures 4 and 5 show the structure of DA-RNN.
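A minimal sketch of this two-stage idea is shown below, with randomly initialized stand-in parameters in place of the learned scoring networks (all names and shapes here are our illustration):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy shapes: T time steps, K exogenous series, m hidden units.
T, K, m = 10, 5, 8
rng = np.random.default_rng(0)
X = rng.normal(size=(T, K))       # exogenous inputs
enc_h = rng.normal(size=(T, m))   # encoder hidden states (precomputed here)

# Stage 1 (input attention): weight the K exogenous series at each step.
# DA-RNN learns this scoring; a random projection stands in for it here.
V_in = rng.normal(size=(K, m))
for t in range(T):
    alpha = softmax(V_in @ enc_h[t])  # attention over the K input series
    x_tilde = alpha * X[t]            # re-weighted driving input at step t

# Stage 2 (temporal attention): weight encoder states across time.
v_tmp = rng.normal(size=m)
beta = softmax(enc_h @ v_tmp)     # attention over the T encoder states
context = beta @ enc_h            # context vector for the decoder
```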
The
where
where
In this section, we present the details of the two-dimensional attention-based multi-input LSTM (2DA-MILSTM) model, a new model that combines the advantages of the MI-LSTM and DA-RNN models described in Section 2 with those of 2D-LSTM. Figure 6 shows the structure of the 2DA-MILSTM model.
The exogenous variables are divided using the value of the correlation coefficient, and the output value after inputting it to MI-LSTM is expressed as
where
In this model, as in the existing MI-LSTM model, “Self”, “Positive”, and “Negative” are used as input values of MI-LSTM. However, unlike the existing model, only these three factors are input, excluding the index variable, because there is no index exogenous variable in the data we consider in this paper. Therefore, we use the
The forget gate and the output gate of the MI-LSTM remain the same when compared to the original LSTM as shown in the following equations:
where
All input gates are determined by the “Self” variable and the previous hidden state to control the auxiliary factors as in the following equations:
where
where
where
where
where
where
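The gating scheme described above can be sketched as follows; the parameter names and the way streams are combined are our own illustration of the idea (every input gate driven by the “Self” input and the previous hidden state), not the paper's exact equations:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mi_lstm_input_gates(x_self, x_pos, x_neg, h_prev, params):
    """Each input stream gets its own candidate state, but every input
    gate is computed from the "Self" input and the previous hidden state,
    so the auxiliary (Positive/Negative) streams are controlled by Self.
    """
    gated = 0.0
    for name, x in (("self", x_self), ("pos", x_pos), ("neg", x_neg)):
        Wg, Ug, bg, Wc, Uc, bc = params[name]
        i = sigmoid(Wg @ x_self + Ug @ h_prev + bg)  # gate from Self + h_prev
        c = np.tanh(Wc @ x + Uc @ h_prev + bc)       # candidate from this stream
        gated = gated + i * c
    return gated  # combined contribution to the cell state

rng = np.random.default_rng(0)
n, m = 3, 4  # toy input and hidden sizes

def make_params():
    return (rng.normal(size=(m, n)), rng.normal(size=(m, m)), np.zeros(m),
            rng.normal(size=(m, n)), rng.normal(size=(m, m)), np.zeros(m))

params = {k: make_params() for k in ("self", "pos", "neg")}
out = mi_lstm_input_gates(rng.normal(size=n), rng.normal(size=n),
                          rng.normal(size=n), np.zeros(m), params)
```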
The structure of MI-LSTM is given by the above equations. In this paper, the
For simplicity, this can also be written as:
and we get an output value
Considering the
where
A new input vector can be extracted by multiplying the weight derived through the H-attention mechanism by the input value.
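This re-weighting step can be sketched as follows; the scoring form and the stand-in parameters `Wq` and `w` are our own illustration, not the paper's exact parameterization:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def h_attention(y, h_prev, Wq, w):
    """Score each component of the MI-LSTM output y against a query
    built from the previous hidden state, then re-weight y elementwise."""
    q = Wq @ h_prev              # query vector, same length as y
    scores = np.tanh(w * y + q)  # one score per component of y
    alpha = softmax(scores)      # attention weights over the components
    return alpha, alpha * y      # weights and re-weighted input vector

rng = np.random.default_rng(0)
m = 4
y, h_prev = rng.normal(size=m), rng.normal(size=m)
alpha, y_tilde = h_attention(y, h_prev, rng.normal(size=(m, m)), 0.5)
```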
Using this new input vector, the hidden state at time
where
In order to predict the output
where
Note that the context vector
Once we get the context vector, we can obtain
where [
The newly calculated
where
where [
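The temporal-attention (T-attention) step and the final combination of the context vector with the last hidden state can be sketched as follows; the parameters here are random stand-ins for the learned weights, and the names are ours:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy setup: T steps of m-dimensional MI-LSTM outputs (random stand-ins).
T, m = 6, 4
rng = np.random.default_rng(2)
Y = rng.normal(size=(T, m))  # MI-LSTM outputs over time

# T-attention: one weight per time step, giving a temporal context vector.
v = rng.normal(size=m)       # stand-in scoring vector
beta = softmax(Y @ v)        # weights over the T time steps
context = beta @ Y           # temporal context vector, shape (m,)

# Combine the context with the last hidden state for the final prediction.
h_T = Y[-1]
W_out = rng.normal(size=(1, 2 * m))
y_hat = W_out @ np.concatenate([context, h_T])
```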
We conduct experiments using two types of datasets to check whether the proposed model performs well on stock price data as well as on time series data from other fields. A stock price prediction experiment is conducted using KOSPI 200 data, and a general prediction experiment on time series data from fields other than stock prices is conducted using temperature data and energy data.
KOSPI 200: Among the companies listed on KOSPI, the 200 stocks considered representative of KOSPI due to their large market capitalizations and trading volumes are selected and used to compute the stock index. The data are daily, measured over a total of 2158 days from August 23, 2011 to May 29, 2020. In our experiment, the KOSPI 200 index is used as the target variable, and the stock prices of 165 companies, including Samsung Electronics, SK Hynix, Samsung Biologics, Naver, Hyundai Motor Company, and LG Chem, are used as exogenous variables (Table 1). These companies are selected so that there are no missing values in their data.
SML 2010 (Zamora-Martinez
Appliances energy (Candanedo
The parameters of our model include the LSTM dimension
In our experiment, in the case of
We consider three different metrics to evaluate the performance of the models in time series prediction experiments. Assuming
Mean absolute percentage error (MAPE): a value obtained by taking the average of the absolute percentage of the residual
Root mean square error (RMSE): a value obtained by taking the root of the average of the squares of the residual
Mean absolute error (MAE): average of all absolute errors
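The three metrics can be implemented directly, for example:

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percentage error, in percent."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    """Mean absolute error."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return np.mean(np.abs(y - y_hat))

# illustrative values (not from the paper's experiments)
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 420.0]
```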
Note that the MAPE, RMSE, and MAE are calculated using normalized data due to the large differences in scale among the different time series.
To evaluate the performance of 2DA-MILSTM, we conduct a comparative experiment with two neural network models, DA-RNN and MI-LSTM, as well as with ARIMA, a classical statistical time series model. Designed by Hyndman and Khandakar, the auto.arima function in the forecast package for R (Hyndman and Benítez, 2016) automatically determines the order of the ARIMA model using a step-wise algorithm for forecasting. The experiment was conducted using KOSPI 200 data (stock prices), SML 2010 data (room temperature), and appliances energy data (energy).
The time series prediction results of 2DA-MILSTM and the three existing methods over the three datasets are shown in Table 4. In Table 4, we observe that the MAPE, RMSE, and MAE of ARIMA are larger than those of the other, neural net-based methods. There are two interesting exceptions to this observation, one of which is the MAPE for the KOSPI 200 dataset, where ARIMA outperforms the other methods. Two possible reasons are that (1) the stock price by nature shows higher volatility as its level rises, and (2) the MAPE puts a heavier penalty on negative errors than on positive errors. Because of the high volatility at high price levels, it is difficult to predict accurately there, and there are likely more positive errors at high levels. Consequently, the MAPE can be small even when a model does not predict well at high levels, since it puts a lighter penalty on positive errors.
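A quick numeric illustration of this asymmetry (hypothetical numbers, not from the experiments): for the same absolute error, an over-prediction (negative error) is divided by a smaller actual value and can exceed 100%, while an under-prediction (positive error) is bounded by 100%.

```python
# Same absolute error (8), opposite directions:
mape_over = abs(2.0 - 10.0) / 2.0 * 100    # actual 2, forecast 10 (negative error) -> 400.0 %
mape_under = abs(10.0 - 2.0) / 10.0 * 100  # actual 10, forecast 2 (positive error) -> 80.0 %
```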
We can also see that the proposed model outperforms the other methods on all three datasets in terms of all three criteria, except for the case described above. This suggests that it is beneficial to consider the correlation between the exogenous variables and the target variable by introducing MI-LSTM. The input attention of DA-RNN can consider the importance of exogenous variables, but it merely downgrades the importance of variables negatively correlated with the target. By integrating MI-LSTM with the two attention mechanisms, the proposed 2DA-MILSTM achieves the best MAPE, RMSE, and MAE across the three datasets: it uses MI-LSTM to consider the correlations of the exogenous variables effectively, and it employs the H- and T-attention mechanisms to capture relevant hidden features across all time steps as well as potential output information.
For visual comparison, we show the prediction results of the three neural net-based methods over certain ranges of the three datasets in Figures 8
Figures 8
Figures 11
Figures 14
In this paper, we proposed a two-dimensional attention-based MI-LSTM (2DA-MILSTM) motivated by the advantages of MI-LSTM, which effectively uses exogenous variables with low correlation, and DA-RNN and 2D-ALSTM, in which an attention mechanism is introduced twice. The proposed method was applied to KOSPI 200, SML 2010 and appliance energy datasets. The results show that our proposed 2DA-MILSTM can outperform existing methods for time series prediction.
The proposed model, 2DA-MILSTM, can selectively use output value information and temporal information by introducing H-attention and T-attention. H-attention better captures the potential information of the output value, and T-attention captures temporal information. By using the two attention mechanisms, 2DA-MILSTM selects output values with accurate information as well as capturing temporal information appropriately, improving predictive performance. Better forecasting depends on capturing long- and short-term dependencies, and a longer forecast horizon increases variability. While MI-LSTM captures these dependencies, T-attention and H-attention stabilize the MI-LSTM outputs by considering temporal and potential information. This means the model can control the increasing variability of the time series data, because the two attention weights control both temporal and potential information.
Stock prices are highly volatile at high levels. Therefore, if one uses the log-return of the stock price rather than the original scale, predictions at high levels might be more accurate, and the performance of the model could consequently improve. However, in the NARX literature there are several papers in which stock prices are used on the original scale, two of which, DA-RNN and MI-LSTM, motivated our research. We therefore believe that using the same (original) scale for the stock price makes it easier for readers to compare our proposed model with these two models.
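For reference, the log-return transformation mentioned above is computed as follows (illustrative prices, not data from our experiments):

```python
import numpy as np

prices = np.array([100.0, 102.0, 101.0, 105.0])
log_returns = np.diff(np.log(prices))  # log(p_t / p_{t-1})
# Log-returns are roughly level-free, so their variability does not grow
# with the price level the way raw-price errors do.
```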
As a follow-up study, visualization method for H-attention and T-attention weights could be considered to check where the two attention mechanisms concentrate among exogenous variables, time stamp and logits from MI-LSTM each. In addition, non-linear relation between exogenous variables and target variable should be considered. When applying non-linear relations, it can be expected that the proposed 2DA-MILSTM model receives additional information between exogenous variables and the target variables. Also, various ranges of the hyperparameters should be experimented with while considering the relationship between hyperparameters.
Variables of KOSPI 200 data (https://kr.investing.com/)
Target variable | KOSPI200 index |
---|---|
Exogenous variable | 165 among 200 companies such as Samsung Electronics, SK Hynix, Samsung Biologics, Naver, Hyundai Motors, and LG Chem. |
Variables of SML 2010 data (https://archive.ics.uci.edu/ml/datasets/SML2010)
Target variable | Indoor temperature (dining room) |
---|---|
Exogenous variable | Indoor temperature (room) |
 | Weather forecast temperature |
 | Carbon dioxide in ppm (dining room) |
 | Carbon dioxide in ppm (room) |
 | Relative humidity (dining room) |
 | Relative humidity (room) |
 | Lighting (dining room) |
 | Lighting (room) |
 | Rain (percentage of the last 15 minutes in which rain was detected) |
 | Sun dusk |
 | Wind (m/s) |
 | Sunlight on the west facade |
 | Sunlight on the east facade |
 | Enthalpic motor 2, 0 or 1 |
 | Enthalpic motor turbo, 0 or 1 |
 | Outdoor temperature |
Variables of Appliances energy data (https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction)
Target variable | Appliances, energy use in Wh |
---|---|
Exogenous variable | lights, energy use of light fixtures in the house in Wh |
 | T1, Temperature in kitchen area, in Celsius |
 | RH 1, Humidity in kitchen area, in % |
 | T2, Temperature in living room area, in Celsius |
 | RH 2, Humidity in living room area, in % |
 | T3, Temperature in laundry room area |
 | RH 3, Humidity in laundry room area, in % |
 | T4, Temperature in office room, in Celsius |
 | RH 4, Humidity in office room, in % |
 | T5, Temperature in bathroom, in Celsius |
 | RH 5, Humidity in bathroom, in % |
 | T6, Temperature outside the building (north side), in Celsius |
 | RH 6, Humidity outside the building (north side), in % |
 | T7, Temperature in ironing room, in Celsius |
 | RH 7, Humidity in ironing room, in % |
 | T8, Temperature in teenager room 2, in Celsius |
 | RH 8, Humidity in teenager room 2, in % |
 | T9, Temperature in parents room, in Celsius |
 | RH 9, Humidity in parents room, in % |
 | To, Temperature outside (from Chievres weather station), in Celsius |
 | Pressure (from Chievres weather station), in mm Hg |
 | RH out, Humidity outside (from Chievres weather station), in % |
 | Wind speed (from Chievres weather station), in m/s |
 | Visibility (from Chievres weather station), in km |
 | Tdewpoint (from Chievres weather station), in Celsius |
 | rv1, Random variable 1, nondimensional |
 | rv2, Random variable 2, nondimensional |
Time series prediction results over the KOSPI 200 dataset, SML 2010 dataset and Appliances energy dataset (best performance displayed in

Model | KOSPI 200 dataset | | | SML 2010 dataset | | | Appliances energy dataset | | |
---|---|---|---|---|---|---|---|---|---|
 | MAPE | RMSE | MAE | MAPE | RMSE | MAE | MAPE | RMSE | MAE |
ARIMA | 29.696 | 23.791 | 19.440 | 3.680 | 3.098 | 49.855 | 97.498 | 53.859 | |
DA-RNN | 10.982 | 2.647 | 2.664 | 16.872 | 11.272 | 1.148 | 32.883 | 32.488 | 6.049 |
MI-LSTM | 10.182 | 0.420 | 0.556 | 5.380 | 0.240 | 0.321 | 7.120 | 0.562 | 1.243 |
2DA-MILSTM | 9.100 |