A rainfall forecasting method using machine learning models and its application to the Fukuoka city case

S. Monira Sumi; M. Faisal Zaman; Hideo Hirose

International Journal of Applied Mathematics and Computer Science (2012)

  • Volume: 22, Issue: 4, pages 841-854
  • ISSN: 1641-876X

Abstract

This article attempts to derive optimal data-driven machine learning methods for forecasting the average daily and monthly rainfall of Fukuoka city in Japan. The comparative study concentrates on three aspects: modelling inputs, modelling methods and pre-processing techniques. Linear correlation analysis is compared with average mutual information to find an optimal input technique. For modelling the rainfall, a novel hybrid multi-model method is proposed and compared with its constituent models: the artificial neural network, multivariate adaptive regression splines, the k-nearest neighbour, and radial basis support vector regression. Each of these methods is applied to model the daily and monthly rainfall, coupled with a pre-processing technique including moving average and principal component analysis. In the first stage of the hybrid method, sub-models from each of the above methods are constructed with different parameter settings. In the second stage, the sub-models are ranked with a variable selection technique and the higher-ranked models are selected based on the leave-one-out cross-validation error. The hybrid model's forecast is the weighted combination of the finally selected models.
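The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: the sub-models are only k-nearest-neighbour and ridge regressors (stand-ins for the four model families named above), the ranking step is collapsed into a plain sort by leave-one-out RMSE instead of a variable selection technique, the combination weights are inverse-error weights, and the series is synthetic rather than Fukuoka rainfall data. All function names are hypothetical.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Build (X, y): each row of X holds the n_lags values preceding y."""
    n = len(series)
    X = np.column_stack([series[i:n - n_lags + i] for i in range(n_lags)])
    return X, series[n_lags:]

def knn_predict(X_train, y_train, X_query, k):
    """k-nearest-neighbour regression: mean target of the k closest rows."""
    d = np.linalg.norm(X_train[None, :, :] - X_query[:, None, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return y_train[idx].mean(axis=1)

def ridge_predict(X_train, y_train, X_query, lam):
    """Ridge regression with an intercept column (stand-in sub-model)."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y_train)
    return Q @ w

def loo_rmse(predict, X, y, param):
    """Leave-one-out cross-validation RMSE of one sub-model."""
    n = len(y)
    out = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        out[i] = predict(X[mask], y[mask], X[i:i + 1], param)[0]
    return np.sqrt(np.mean((out - y) ** 2))

def fit_hybrid(X, y, candidates, n_keep):
    """Stage 1: score every sub-model. Stage 2: keep the best n_keep.

    Assumption: ranking and selection both use LOO RMSE, and the
    weights are proportional to 1 / RMSE.
    """
    scored = [(loo_rmse(p, X, y, param), name, p, param)
              for name, p, param in candidates]
    scored.sort(key=lambda s: s[0])
    kept = scored[:n_keep]
    inv = np.array([1.0 / (s[0] + 1e-12) for s in kept])
    return kept, inv / inv.sum()

def predict_hybrid(kept, weights, X_train, y_train, X_query):
    """Weighted combination of the selected sub-models' forecasts."""
    preds = np.stack([p(X_train, y_train, X_query, param)
                      for _, _, p, param in kept])
    return weights @ preds

# Synthetic monthly-like series (illustrative only, not real rainfall).
rng = np.random.default_rng(0)
t = np.arange(120)
series = 100 + 50 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 10, t.size)

X, y = make_lagged(series, n_lags=3)
X_train, y_train, X_test = X[:-12], y[:-12], X[-12:]

# Sub-models built from each family with different parameter settings.
candidates = [(f"knn(k={k})", knn_predict, k) for k in (1, 3, 5, 7)]
candidates += [(f"ridge(lam={l})", ridge_predict, l) for l in (0.1, 1.0, 10.0)]

kept, weights = fit_hybrid(X_train, y_train, candidates, n_keep=3)
forecast = predict_hybrid(kept, weights, X_train, y_train, X_test)
```

Here `kept` lists the selected sub-models in order of increasing leave-one-out error, and `forecast` holds the combined prediction for the last 12 held-out steps. Plain leave-one-out ignores the serial dependence of a time series; a blocked scheme may be more appropriate for real data.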

How to cite


S. Monira Sumi, M. Faisal Zaman, and Hideo Hirose. "A rainfall forecasting method using machine learning models and its application to the Fukuoka city case." International Journal of Applied Mathematics and Computer Science 22.4 (2012): 841-854. <http://eudml.org/doc/244573>.

@article{S2012,
abstract = {In the present article, an attempt is made to derive optimal data-driven machine learning methods for forecasting an average daily and monthly rainfall of the Fukuoka city in Japan. This comparative study is conducted concentrating on three aspects: modelling inputs, modelling methods and pre-processing techniques. A comparison between linear correlation analysis and average mutual information is made to find an optimal input technique. For the modelling of the rainfall, a novel hybrid multi-model method is proposed and compared with its constituent models. The models include the artificial neural network, multivariate adaptive regression splines, the k-nearest neighbour, and radial basis support vector regression. Each of these methods is applied to model the daily and monthly rainfall, coupled with a pre-processing technique including moving average and principal component analysis. In the first stage of the hybrid method, sub-models from each of the above methods are constructed with different parameter settings. In the second stage, the sub-models are ranked with a variable selection technique and the higher ranked models are selected based on the leave-one-out cross-validation error. The forecasting of the hybrid model is performed by the weighted combination of the finally selected models.},
author = {S. Monira Sumi and M. Faisal Zaman and Hideo Hirose},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {rainfall forecasting; machine learning; multi-model method; pre-processing; model ranking},
language = {eng},
number = {4},
pages = {841-854},
title = {A rainfall forecasting method using machine learning models and its application to the Fukuoka city case},
url = {http://eudml.org/doc/244573},
volume = {22},
year = {2012},
}

TY - JOUR
AU - S. Monira Sumi
AU - M. Faisal Zaman
AU - Hideo Hirose
TI - A rainfall forecasting method using machine learning models and its application to the Fukuoka city case
JO - International Journal of Applied Mathematics and Computer Science
PY - 2012
VL - 22
IS - 4
SP - 841
EP - 854
AB - In the present article, an attempt is made to derive optimal data-driven machine learning methods for forecasting an average daily and monthly rainfall of the Fukuoka city in Japan. This comparative study is conducted concentrating on three aspects: modelling inputs, modelling methods and pre-processing techniques. A comparison between linear correlation analysis and average mutual information is made to find an optimal input technique. For the modelling of the rainfall, a novel hybrid multi-model method is proposed and compared with its constituent models. The models include the artificial neural network, multivariate adaptive regression splines, the k-nearest neighbour, and radial basis support vector regression. Each of these methods is applied to model the daily and monthly rainfall, coupled with a pre-processing technique including moving average and principal component analysis. In the first stage of the hybrid method, sub-models from each of the above methods are constructed with different parameter settings. In the second stage, the sub-models are ranked with a variable selection technique and the higher ranked models are selected based on the leave-one-out cross-validation error. The forecasting of the hybrid model is performed by the weighted combination of the finally selected models.
LA - eng
KW - rainfall forecasting; machine learning; multi-model method; pre-processing; model ranking
UR - http://eudml.org/doc/244573
ER -

References

  1. Abrahart, R.J. and See, L. (2002). Multi-model data fusion for river flow forecasting: An evaluation of six alternative methods based on two contrasting catchments, Hydrology and Earth System Sciences 6(4): 655-670. 
  2. Baruque, B., Porras, S. and Corchado, E. (2011). Hybrid classification ensemble using topology-preserving clustering, New Generation Computing 29(3): 329-344. 
  3. Chalimourda, A., Schölkopf, B. and Smola, A.J. (2004). Experimentally optimal ν in support vector regression for different noise models and parameter settings, Neural Networks: The Official Journal of the International Neural Network Society 17(1): 127-141. Zbl1072.68541
  4. Cherkassky, V. and Ma, Y. (2004). Practical selection of SVM parameters and noise estimation for SVM regression, Neural Networks: The Official Journal of the International Neural Network Society 17(1): 113-126. Zbl1075.68632
  5. Coulibaly, P., Haché, M., Fortin, V. and Bobée, B. (2005). Improving daily reservoir inflow forecasts with model combination, Journal of Hydrologic Engineering 10(2): 91. 
  6. Dawson, C.W. and Wilby, R.L. (2001). Hydrological modelling using artificial neural networks, Progress in Physical Geography 25(1): 80-108. 
  7. De Vos, N.J. and Rientjes, T.H.M. (2005). Constraints of artificial neural networks for rainfall-runoff modelling: Trade-offs in hydrological state representation and model evaluation, Hydrology and Earth System Sciences 9(1-2): 111-126. 
  8. Deng, Y.-F., Jin, X. and Zhong, Y.-X. (2005). Ensemble SVR for prediction of time series, Proceedings of the International Conference on Machine Learning and Cybernetics, Guangzhou, China, Vol. 2, pp. 734-748. 
  9. Diebold, F.X. and Mariano, R.S. (1995). Comparing predictive accuracy, Journal of Business & Economic Statistics 13(3): 253-263. 
  10. Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression, Annals of Statistics 32(2): 407-499. Zbl1091.62054
  11. Everingham, Y.L., Smyth, C.W. and Inman-Bamber, N.G. (2009). Ensemble data mining approaches to forecast regional sugarcane crop production, Agricultural and Forest Meteorology 149(3-4): 689-696. 
  12. Fraley, C. and Hesterberg, T. (2009). Least angle regression and LASSO for large datasets, Statistical Analysis and Data Mining 1(4): 251-259. 
  13. Fraser, A.M. and Swinney, H.L. (1986). Independent coordinates for strange attractors from mutual information, Physical Review A 33(2): 1134-1140. Zbl1184.37027
  14. Friedman, J.H. (1991). Multivariate adaptive regression splines, Annals of Statistics 19(1): 1-67. Zbl0765.62064
  15. Gheyas, I.A. and Smith, L.S. (2011). A novel neural network ensemble architecture for time series forecasting, Neurocomputing 74(18): 3855-3864. 
  16. Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference and Prediction, 2nd Edn., Springer, New York, NY. Zbl1273.62005
  17. Hong, W. (2008). Rainfall forecasting by technological machine learning models, Applied Mathematics and Computation 200(1): 41-57. Zbl1164.86025
  18. Hyndman, R.J., Razbash, S. and Schmidt, D. (2012). forecast: Forecasting functions for time series and linear models, R package version 3.19, http://CRAN.R-project.org/package=forecast.
  19. Kim, T., Heo, J.-H. and Jeong, C.-S. (2006). Multireservoir system optimization in the Han River basin using multi-objective genetic algorithms, Hydrological Processes 20(9): 2057-2075. 
  20. Kitanidis, P.K. and Bras, R.L. (1980). Real-time forecasting with a conceptual hydrologic model, 2: Application and results, Water Resources Research 16(6): 1034-1044. 
  21. Lee, C.F., Lee, J.C. and Lee, A.C. (2000). Statistics for Business and Financial Economics, 2nd Edn., World Scientific, Singapore. Zbl1281.62225
  22. Legates, D.R. and McCabe, G.J. (1999). Evaluating the use of “goodness-of-fit” measures in hydrologic and hydroclimatic model validation, Water Resources Research 35(1): 233-241. 
  23. Li, P.W. and Lai, E.S.T. (2004). Short-range quantitative precipitation forecasting in Hong Kong, Journal of Hydrology 288(1-2): 189-209.
  24. Myers, R.H. (1990). Classical and Modern Regression with Applications, Duxbury, Boston, MA. 
  25. Nash, J. and Sutcliffe, J. (1970). River flow forecasting through conceptual models, I: A discussion of principles, Journal of Hydrology 10(3): 282-290. 
  26. Newbold, P., Carlson, W. and Thorne, B. (2007). Statistics for Business and Economics, 6th Edn., Prentice Hall, Upper Saddle River, NJ. 
  27. Pucheta, J., Patino, D. and Kuchen, B. (2009). A statistically dependent approach for the monthly rainfall forecast from one point observations, in D. Li and Z. Chunjiang (Eds.), Computer and Computing Technologies in Agriculture II, Volume 2, IFIP Advances in Information and Communication Technology, Vol. 294, Springer, Boston, MA, pp. 787-798. 
  28. Racine, J. (2000). Consistent cross-validatory model-selection for dependent data: hv-block cross-validation, Journal of Econometrics 99(1): 39-61. Zbl1011.62118
  29. Siwek, K., Osowski, S. and Szupiluk, R. (2009). Ensemble neural network approach for accurate load forecasting in a power system, International Journal of Applied Mathematics and Computer Science 19(2): 303-315, DOI: 10.2478/v10006-009-0026-2. Zbl1167.93338
  30. Schölkopf, B. and Smola, A.J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, Adaptive Computation and Machine Learning, Vol. 98, MIT Press, Cambridge, MA. 
  31. Schölkopf, B. and Smola, A.J. (2004). A tutorial on support vector regression, Statistics and Computing 14(3): 199-222.
  32. Shrestha, D.L. and Solomatine, D.P. (2006). Machine learning approaches for estimation of prediction interval for the model output, Neural Networks 19(2): 225-235. Zbl1160.68516
  33. Solomatine, D.P. and Ostfeld, A. (2008). Data-driven modelling: Some past experiences and new approaches, Journal of Hydroinformatics 10(1): 3. 
  34. Sudheer, K.P., Gosain, A.K. and Ramasastri, K.S. (2002). A data-driven algorithm for constructing artificial neural network rainfall-runoff models, Hydrological Processes 16(6): 1325-1330. 
  35. Syed, A.R. (2011). A review of cross validation and adaptive model selection, Statistics, Mathematics Theses, Georgia State University, Atlanta, GA, Paper 99.
  36. Timmermann, A. (2006). Forecast combinations, in G. Elliott, C. Granger and A. Timmermann (Eds.), Handbook of Economic Forecasting, Elsevier, Amsterdam, Chapter 4, pp. 135-196. 
  37. Wichard, J. (2011). Forecasting the NN5 time series with hybrid models, International Journal of Forecasting 27(3): 700-707. 
  38. Wichard, J. and Ogorzalek, M. (2007). Time series prediction with ensemble models applied to the CATS benchmark, Neurocomputing 70(13-15): 2371-2378. 
  39. Wu, C., Chau, K. and Li, Y. (2008). River stage prediction based on a distributed support vector regression, Journal of Hydrology 358(1-2): 96-111. 
  40. Xiong, L., Shamseldin, A.Y. and O'Connor, K. (2001). A non-linear combination of the forecasts of rainfall-runoff models by the first-order Takagi-Sugeno fuzzy system, Journal of Hydrology 245(1-4): 196-217.
  41. Yang, Y., Lin, H., Guo, Z. and Jiang, J. (2007). A data mining approach for heavy rainfall forecasting based on satellite image sequence analysis, Computers & Geosciences 33(1): 20-30.
  42. Zaman, M. and Hirose, H. (2011). Classification performance of bagging and boosting type ensemble methods with small training sets, New Generation Computing 29(3): 277-292. 
