Deep learning methods have been developed and applied in many fields, where they have often shown outstanding performance. Many studies have used deep learning methods to predict daily stock returns, a classic example of time series data. We apply deep learning methods to Korea’s stock market data, using Korea’s stock market index (KOSPI) and several individual stocks to forecast daily returns and directions. We compare several deep learning models with other machine learning methods, including random forest and XGBoost. In regression, the long short-term memory (LSTM) and gated recurrent unit (GRU) models perform better than the other prediction models. In classification, there is no clear winner. However, even the best deep learning models do not predict significantly better than the simple baseline models. We conclude that daily stock returns remain challenging to predict even with the latest deep learning methods.
Deep learning is one of the most popular machine learning methods. It can be applied to a wide range of data, including images, texts, and audio. Various deep learning methods have been successful in many fields, such as computer vision (Voulodimos
Recently, forecasting time series data with deep learning methods has become very popular. Guresen
In this study, we try to predict the daily return rate and direction of several stocks and KOSPI with various prediction models. We compare various data structures and models, including deep learning models and machine learning models. Forecasting daily returns is known to be a challenging task, although several previous studies show promising results. We would like to check whether it is possible to forecast Korean stock market data using deep learning or machine learning methods. We use 1D-CNN, LSTM, GRU, random forest (Breiman, 2001), and XGBoost (Chen and Guestrin, 2016) to make predictions. Section 2 describes the dataset used in this study in detail. Section 3 introduces the data structure, data splitting, and models. Section 4 shows the results of each model. Section 5 provides concluding remarks.
We use KOSPI data and individual stock data from January 4th 2016 to December 30th 2019.
Figure 1 shows the KOSPI data. KOSPI was on an upward trend from 2016 to the beginning of 2018 and has been on a downward trend since then. However, there is no apparent seasonality or cyclic behavior over this period. The dataset in this study includes the daily return rate and direction of KOSPI. We use a daily return instead of the index itself and the
where
where
We also consider some individual stock data in our study. To cover various industries, we select a representative company from each. From the KOSPI market, we chose Samsung Electronics and Hyundai Motors. Samsung Electronics is the most representative South Korean company and has the highest market capitalization. Hyundai Motors is a leading company in the motor industry and a global corporation. From the KOSDAQ market, we chose Nexon GT and Leeno Industrial. Nexon GT has the 10th largest market capitalization among game companies, and Leeno Industrial is a semiconductor-related company.
We use a sliding window to predict a daily return rate from previous return data. This method is well known in time series analysis. Following this method, we transform the data into the form shown in Figure 2, so that the next day’s return is predicted from the returns of the last five days. The number of features is equal to the training window size. For example, Figure 2 shows the data structure when the number of features is 5. The goal is to predict the next day’s return with the last
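The sliding-window transformation described above can be illustrated with a short sketch. This is a minimal example under our own assumptions rather than the code used in the study: the function name `make_windows` and the toy return values are hypothetical, and the window size of 5 follows the example in Figure 2.

```python
from typing import List, Tuple


def make_windows(returns: List[float], window: int = 5) -> Tuple[List[List[float]], List[float]]:
    """Turn a return series into (features, target) pairs.

    Each feature row holds `window` consecutive daily returns, and the
    target is the return on the day that immediately follows them.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    features, targets = [], []
    for end in range(window, len(returns)):
        features.append(returns[end - window:end])
        targets.append(returns[end])
    return features, targets


# Toy daily returns (hypothetical values, not taken from the KOSPI data).
toy_returns = [0.01, -0.02, 0.005, 0.0, 0.015, -0.01, 0.02]
X, y = make_windows(toy_returns, window=5)
print(len(X), X[0], y[0])
```

With seven returns and a window of five, the sketch produces two rows: the first uses days 1 to 5 to predict day 6, and the second uses days 2 to 6 to predict day 7.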
We use walk forward optimization (Kirkpatrick and Dahlquist, 2010) to divide the data into training and test sets while respecting the temporal order. This method updates the model for each dataset as it finds the optimal parameters. After transforming the data into the form described in the previous section, we split it into training data and test data. In Figure 3, the shaded part represents the training set and the white part represents the test set. The date on it indicates the year and month of the
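A walk-forward split of this kind can be sketched as follows. The window lengths are assumptions on our part: we use a rolling 12-month training window followed by a one-month test window, which over the 48 months from 2016 to 2019 happens to give 36 test sets. The function name `walk_forward_splits` is hypothetical, and the split used in the study may differ.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    periods: List[str], train_size: int, test_size: int = 1
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train_periods, test_periods) pairs in temporal order.

    The training window always precedes the test window, and both
    move forward by `test_size` periods at each step.
    """
    start = 0
    while start + train_size + test_size <= len(periods):
        train = periods[start:start + train_size]
        test = periods[start + train_size:start + train_size + test_size]
        yield train, test
        start += test_size


# Monthly labels for 2016-01 through 2019-12 (48 months).
months = [f"{year}-{month:02d}" for year in range(2016, 2020) for month in range(1, 13)]

# Assumed setup: 12 training months, then 1 test month, rolled forward monthly.
splits = list(walk_forward_splits(months, train_size=12, test_size=1))
print(len(splits), splits[0][1], splits[-1][1])
```

Because every training window ends before its test window begins, no future observation is used to fit a model that is evaluated on earlier data.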
As mentioned in the introduction, we consider several prediction models, including decision tree-based methods and deep learning methods. We use random forest (RF) and extreme gradient boosting (XGBoost) as the machine learning methods, and 1D-CNN, LSTM, GRU, and a combined 1D-CNN and LSTM model as the deep learning methods. We use two LSTM models with the option
The simple mean and weighted mean of return over the past d days are set as the baseline models. The simple mean of
The weighted mean of
where
For the up/down baseline models, we use the following predicted values. The predicted direction by a simple mean of
The predicted direction by the weighted mean of
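The baseline predictions can be sketched in a few lines. The simple mean follows the description above directly. The weighting scheme is an assumption: this sketch uses linearly increasing weights so that more recent days count more, which may not match the weights used in the study. Mapping a positive predicted value to "up" is likewise an assumption for the baseline direction, and all function names are hypothetical.

```python
from typing import List


def simple_mean_forecast(past_returns: List[float], d: int) -> float:
    """Predict the next return as the plain average of the last d returns."""
    if d < 1:
        raise ValueError("d must be at least 1")
    window = past_returns[-d:]
    return sum(window) / len(window)


def weighted_mean_forecast(past_returns: List[float], d: int) -> float:
    """Predict the next return as a weighted average of the last d returns.

    ASSUMPTION: weights 1, 2, ..., d (most recent day weighted highest).
    """
    if d < 1:
        raise ValueError("d must be at least 1")
    window = past_returns[-d:]
    weights = range(1, len(window) + 1)
    return sum(w * r for w, r in zip(weights, window)) / sum(weights)


def baseline_direction(predicted_return: float) -> int:
    """ASSUMPTION: 1 (up) if the predicted return is positive, else 0 (down)."""
    return 1 if predicted_return > 0 else 0


# Toy returns (hypothetical values).
past = [0.01, -0.02, 0.03]
simple = simple_mean_forecast(past, d=3)
weighted = weighted_mean_forecast(past, d=3)
print(simple, weighted, baseline_direction(simple), baseline_direction(weighted))
```

In this toy case both baselines predict a small positive return, so both predict an upward direction.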
To find the best model for daily return prediction, we compare the average of the 36 test RMSEs for each model with different numbers of features. We use the option
Table 1 shows that the LSTM 2 model performs best on KOSPI, using the returns of the past 30 days as the feature set. However, the performance differences among the models are minimal. Compared with the baseline models, the best model reduces the RMSE by about 6% to 20%. Figure 4 shows the KOSPI return rates and the predicted return rates for the baseline model (left) and the best model (right). The blue line shows the observed values, and the red line shows the fitted values. There is little visual difference between the two cases, and the predicted return rate is close to zero in most cases. The standard deviation of the predicted values is about 1.19 × 10^{−3}, roughly one-seventh of the standard deviation of the observed data. This behavior occurs in almost all models in our analysis. Therefore, we can see that it is very challenging to predict the return rate.
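The range of about 6% to 20% can be checked with simple arithmetic. The sketch below is one reading of that comparison, not the authors' code: it takes the best deep learning RMSE for KOSPI (0.0074, as reported in the conclusion) and compares it with the best simple-mean baseline (0.0079) and the weighted-mean baseline (0.0092) from Table 1. The helper names `rmse` and `relative_improvement` are hypothetical.

```python
import math
from typing import List


def rmse(actual: List[float], predicted: List[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def relative_improvement(baseline_rmse: float, model_rmse: float) -> float:
    """Fraction by which the model lowers the RMSE relative to the baseline."""
    return (baseline_rmse - model_rmse) / baseline_rmse


best_model = 0.0074      # best deep learning RMSE for KOSPI (from the text)
mean_baseline = 0.0079   # simple mean, 30 days (Table 1)
wmean_baseline = 0.0092  # weighted mean (Table 1)

vs_mean = relative_improvement(mean_baseline, best_model)
vs_wmean = relative_improvement(wmean_baseline, best_model)
print(f"{vs_mean:.1%} {vs_wmean:.1%}")
```

Under this reading, the improvement is about 6.3% over the simple mean and about 19.6% over the weighted mean, which is consistent with the stated range.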
Tables 2
We compare the classification models by the average of the 36 test set accuracies for each model with different numbers of features. As in the previous section, we randomly choose 20% of the observations in each training set as validation data to find the best models.
Table 6 shows that the 1D-CNN with LSTM model gives the best performance on the KOSPI data, using the returns of the past five days as the feature set. Compared with the baseline models, its accuracy is higher by about 5 to 7 percentage points, which is only a modest improvement. Tables 7
This test is performed by determining whether the confidence interval for
Since the interval does not contain 0.5, the null hypothesis is rejected at the 5% significance level (95% confidence level). Therefore, we conclude that the best-performing method is significantly different from random guessing.
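The logic of this test can be sketched with a normal-approximation confidence interval for a proportion. Both the interval type and the sample size are assumptions here: the number of test observations is not stated in this section, so `n_test = 720` is a purely hypothetical placeholder, and the accuracy 0.551 is the best KOSPI accuracy reported in the text.

```python
import math
from typing import Tuple


def proportion_ci(p_hat: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion.

    z = 1.96 corresponds to a 95% confidence level.
    """
    if n < 1 or not 0.0 <= p_hat <= 1.0:
        raise ValueError("need n >= 1 and 0 <= p_hat <= 1")
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


accuracy = 0.551  # best KOSPI accuracy reported in the text
n_test = 720      # HYPOTHETICAL number of test observations

low, high = proportion_ci(accuracy, n_test)
reject_null = not (low <= 0.5 <= high)
print(round(low, 4), round(high, 4), reject_null)
```

The outcome depends strongly on the sample size. With the same accuracy and a much smaller test set, for example 300 observations, the interval would contain 0.5 and the null hypothesis would not be rejected.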
As we can see from the previous section, some regression models performed worse than the baseline models in some cases. We also consider the direction predicted from the predicted values of the regression models. If the predicted value is greater than 0, the predicted direction is defined as up; otherwise, it is defined as down. The best accuracy for KOSPI is 0.547. Despite these different attempts to predict the direction, the performance is not good on the other datasets.
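Converting regression outputs into directions and scoring them can be sketched as below. The thresholding rule follows the text (greater than 0 means up). Applying the same rule to the observed returns, the toy values, and the function names are assumptions for illustration.

```python
from typing import List


def to_direction(values: List[float]) -> List[int]:
    """Map each value to 1 (up) if it is greater than 0, else 0 (down)."""
    return [1 if v > 0 else 0 for v in values]


def accuracy(actual: List[int], predicted: List[int]) -> float:
    """Share of positions where the predicted label equals the actual label."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


# Toy values (hypothetical, not taken from the study).
predicted_returns = [0.002, -0.001, 0.0005, -0.003]
observed_returns = [0.01, 0.004, -0.002, -0.006]

predicted_dir = to_direction(predicted_returns)
observed_dir = to_direction(observed_returns)
score = accuracy(observed_dir, predicted_dir)
print(predicted_dir, observed_dir, score)
```

Since regression models tend to predict values very close to zero, small errors in the predicted value can flip the sign, which is one reason the derived directions may be no better than those from a dedicated classification model.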
In this paper, we study the performance of deep learning and machine learning methods in predicting the daily return rate and direction. Comparing various prediction methods, we see that deep learning methods perform better than machine learning methods in forecasting market values. However, the difference between deep learning and machine learning methods is insignificant. For example, the RMSE of the best deep learning method for the KOSPI return rate is 0.0074, and that of the best machine learning method is 0.0076. The prediction accuracy of the best deep learning method for KOSPI is 0.551, and that of the best machine learning method is 0.540. In regression, the results show that LSTM and GRU achieve better forecasting performance than all other models. There is no clear winner in the classification cases.
However, the improvement over the baseline models is not significant. Although we do not include the results here, we also considered different training and test duration setups. We also used the signs of the predicted values of the regression models in our study. However, their performance is worse than that of the classification models. Deep learning methods perform well in many fields, especially for image and text data, but it is very hard to find a deep learning model that achieves good prediction accuracy on stock data. Gu
Table 1: RMSE of daily return rate for KOSPI, by number of features
Methods | 5 | 10 | 30 |
---|---|---|---|
Mean | 0.0084 | 0.0081 | 0.0079 |
W.mean | 0.0092 | 0.0092 | 0.0092 |
1D-CNN | 0.0111 | 0.0113 | 0.0153 |
LSTM 1 | 0.0074 | 0.0075 | 0.0075 |
LSTM 2 | 0.0074 | 0.0075 | 0.0074 |
1D-CNN + LSTM | 0.0089 | 0.0086 | 0.0086 |
GRU | 0.0075 | 0.0075 | 0.0075 |
RF | 0.0079 | 0.0078 | 0.0076 |
XGB | 0.0085 | 0.0085 | 0.0084 |
Table 2: RMSE of daily return rate for Samsung Electronics, by number of features
Methods | 5 | 10 | 30 |
---|---|---|---|
Mean | 0.0172 | 0.0164 | 0.0158 |
W.mean | 0.0186 | 0.0186 | 0.0186 |
1D-CNN | 0.0159 | 0.0171 | 0.0205 |
LSTM 1 | 0.0154 | 0.0154 | |
LSTM 2 | 0.0155 | 0.0155 | 0.0154 |
1D-CNN + LSTM | 0.0157 | 0.0160 | 0.0163 |
GRU | 0.0155 | 0.0155 | 0.0154 |
RF | 0.0163 | 0.0163 | 0.0163 |
XGB | 0.0181 | 0.0181 | 0.0178 |
Table 3: RMSE of daily return rate for Hyundai Motors, by number of features
Methods | 5 | 10 | 30 |
---|---|---|---|
Mean | 0.0193 | 0.0186 | 0.0180 |
W.mean | 0.0186 | 0.0216 | 0.0216 |
1D-CNN | 0.0176 | 0.0212 | 0.0226 |
LSTM 1 | 0.0174 | 0.0173 | 0.0174 |
LSTM 2 | 0.0173 | 0.0174 | |
1D-CNN + LSTM | 0.0177 | 0.0179 | 0.0177 |
GRU | 0.0174 | 0.0173 | 0.0175 |
RF | 0.0182 | 0.0181 | 0.0179 |
XGB | 0.0210 | 0.0210 | 0.0198 |
Table 4: RMSE of daily return rate for Nexon GT, by number of features
Methods | 5 | 10 | 30 |
---|---|---|---|
Mean | 0.0493 | 0.0462 | 0.0454 |
W.mean | 0.0528 | 0.0526 | 0.0526 |
1D-CNN | 0.0366 | 0.0363 | 0.0415 |
LSTM 1 | 0.0362 | 0.0357 | 0.0368 |
LSTM 2 | 0.0367 | 0.0365 | 0.0373 |
1D-CNN + LSTM | 0.0362 | 0.0362 | 0.0373 |
GRU | 0.0359 | 0.0364 | |
RF | 0.0382 | 0.0378 | 0.0372 |
XGB | 0.0451 | 0.0437 | 0.0483 |
Table 5: RMSE of daily return rate for Leeno Industrial, by number of features
Methods | 5 | 10 | 30 |
---|---|---|---|
Mean | 0.0221 | 0.0209 | 0.0205 |
W.mean | 0.0254 | 0.0253 | 0.0253 |
1D-CNN | 0.0207 | 0.0233 | 0.0272 |
LSTM 1 | 0.0191 | 0.0190 | 0.0191 |
LSTM 2 | 0.0191 | 0.0191 | 0.0190 |
1D-CNN + LSTM | 0.0199 | 0.0192 | 0.0195 |
GRU | 0.0190 | 0.0195 | |
RF | 0.0198 | 0.0198 | 0.0196 |
XGB | 0.0223 | 0.0220 | 0.0215 |
Table 6: Prediction accuracy of daily return direction for KOSPI, by number of features
Methods | 5 | 10 | 30 |
---|---|---|---|
Mean | 0.482 | 0.503 | 0.494 |
W.mean | 0.487 | 0.486 | 0.486 |
1D-CNN | 0.537 | 0.544 | 0.537 |
LSTM 1 | 0.528 | 0.513 | 0.525 |
LSTM 2 | 0.534 | 0.511 | 0.524 |
1D-CNN + LSTM | 0.551 | 0.541 | 0.539 |
GRU | 0.529 | 0.515 | 0.524 |
RF | 0.540 | 0.539 | 0.528 |
XGB | 0.506 | 0.516 | 0.532 |
Table 7: Prediction accuracy of daily return direction for Samsung Electronics, by number of features
Methods | 5 | 10 | 30 |
---|---|---|---|
Mean | 0.484 | 0.501 | 0.501 |
W.mean | 0.531 | 0.529 | 0.529 |
1D-CNN | 0.537 | 0.537 | |
LSTM 1 | 0.528 | 0.513 | 0.525 |
LSTM 2 | 0.518 | 0.510 | 0.506 |
1D-CNN + LSTM | 0.475 | 0.509 | 0.534 |
GRU | 0.515 | 0.502 | 0.501 |
RF | 0.464 | 0.523 | 0.484 |
XGB | 0.480 | 0.526 | 0.501 |
Table 8: Prediction accuracy of daily return direction for Hyundai Motors, by number of features
Methods | 5 | 10 | 30 |
---|---|---|---|
Mean | 0.482 | 0.472 | 0.486 |
W.mean | 0.473 | 0.472 | 0.529 |
1D-CNN | 0.519 | 0.524 | 0.537 |
LSTM 1 | 0.563 | 0.567 | 0.525 |
LSTM 2 | 0.569 | 0.564 | |
1D-CNN + LSTM | 0.543 | 0.509 | 0.519 |
GRU | 0.563 | 0.502 | 0.557 |
RF | 0.522 | 0.523 | 0.554 |
XGB | 0.529 | 0.526 | 0.519 |
Table 9: Prediction accuracy of daily return direction for Nexon GT, by number of features
Methods | 5 | 10 | 30 |
---|---|---|---|
Mean | 0.471 | 0.499 | 0.490 |
W.mean | 0.492 | 0.492 | 0.492 |
1D-CNN | 0.517 | 0.556 | 0.501 |
LSTM 1 | 0.549 | 0.549 | 0.548 |
LSTM 2 | 0.542 | 0.555 | 0.544 |
1D-CNN + LSTM | 0.527 | 0.539 | |
GRU | 0.555 | 0.562 | 0.554 |
RF | 0.514 | 0.518 | 0.518 |
XGB | 0.547 | 0.500 | 0.497 |
Table 10: Prediction accuracy of daily return direction for Leeno Industrial, by number of features
Methods | 5 | 10 | 30 |
---|---|---|---|
Mean | 0.486 | 0.506 | 0.476 |
W.mean | 0.487 | 0.487 | 0.487 |
1D-CNN | 0.514 | 0.501 | 0.525 |
LSTM 1 | 0.536 | 0.499 | 0.508 |
LSTM 2 | 0.515 | 0.492 | 0.510 |
1D-CNN + LSTM | 0.536 | 0.494 | 0.537 |
GRU | 0.503 | 0.491 | 0.518 |
RF | 0.531 | 0.486 | |
XGB | 0.524 | 0.511 | 0.506 |