Improving prediction models applied in systems monitoring natural hazards and machinery

Marek Sikora; Beata Sikora

International Journal of Applied Mathematics and Computer Science (2012)

  • Volume: 22, Issue: 2, page 477-491
  • ISSN: 1641-876X

Abstract

A method of combining three analytic techniques, namely regression rule induction, the k-nearest neighbors method and time series forecasting by means of the ARIMA methodology, is presented. The main objective of applying these techniques jointly was to decrease the forecasting error in problems concerning natural hazard and machinery monitoring in coal mines. The M5 algorithm was applied as the basic method of developing prediction models. Despite the intensive development of regression rule induction algorithms and fuzzy-neural systems, the M5 algorithm still offers generalization ability competitive with other systems and an unmatched model-building time. In the paper, two solutions designed to decrease the mean square error of the obtained rules are presented. The first introduces into the set of conditional variables a so-called meta-variable (by analogy to constructive induction) whose values are determined by an autoregressive or ARIMA model. The second shows that restricting, by means of the k-nearest neighbors method, the data set on which the M5 algorithm operates can also decrease the error. Moreover, three examples of applying the presented solutions to data collected by natural hazard and machinery monitoring systems in coal mines are described. In the Appendix, analyses of several benchmark data sets are reported as a supplement to the presented results.
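
The two modifications summarized above lend themselves to a short illustration. The Python sketch below is not the authors' implementation: it assumes scikit-learn and statsmodels, uses a DecisionTreeRegressor as a stand-in for the M5 model-tree learner, and all data and names (X1, X2, meta_arima, the query example, k = 25) are synthetic and purely illustrative. It mirrors only the structure of the two solutions: an ARIMA forecast appended as a meta-variable, and a k-nearest-neighbors restriction of the training set before tree induction.

import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeRegressor
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
n = 400

# Synthetic monitoring data: two sensor attributes and a target time series
# (standing in for, e.g., a methane concentration reading).
data = pd.DataFrame({"X1": rng.normal(size=n), "X2": rng.normal(size=n)})
data["y"] = 0.6 * data["X1"] + np.sin(np.arange(n) / 15.0) + 0.1 * rng.normal(size=n)

# Solution 1 (meta-variable): fit an ARIMA model to the target and append its
# in-sample one-step-ahead predictions as an additional conditional attribute.
arima = ARIMA(data["y"].to_numpy(), order=(2, 0, 1)).fit()
data["meta_arima"] = arima.predict()

features = ["X1", "X2", "meta_arima"]

# Solution 2 (local learning): restrict the training set to the k nearest
# neighbors of the query example and induce the tree only on that subset
# (in practice the query would be a new, unlabeled example).
k = 25
knn = NearestNeighbors(n_neighbors=k).fit(data[features])
query = data[features].iloc[[-1]]                 # predict for the most recent example
_, idx = knn.kneighbors(query)
local = data.iloc[idx[0]]

local_tree = DecisionTreeRegressor(max_depth=3)   # stand-in for an M5 model tree
local_tree.fit(local[features], local["y"])
print("local prediction:", local_tree.predict(query)[0])

In the paper the meta-variable comes from an autoregressive or ARIMA model of the monitored quantity and the neighborhood-restricted data set is then processed by the M5 algorithm itself; the sketch reproduces only the overall data flow.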

How to cite


Marek Sikora, and Beata Sikora. "Improving prediction models applied in systems monitoring natural hazards and machinery." International Journal of Applied Mathematics and Computer Science 22.2 (2012): 477-491. <http://eudml.org/doc/208123>.

@article{MarekSikora2012,
abstract = {A method of combining three analytic techniques, namely regression rule induction, the k-nearest neighbors method and time series forecasting by means of the ARIMA methodology, is presented. The main objective of applying these techniques jointly was to decrease the forecasting error in problems concerning natural hazard and machinery monitoring in coal mines. The M5 algorithm was applied as the basic method of developing prediction models. Despite the intensive development of regression rule induction algorithms and fuzzy-neural systems, the M5 algorithm still offers generalization ability competitive with other systems and an unmatched model-building time. In the paper, two solutions designed to decrease the mean square error of the obtained rules are presented. The first introduces into the set of conditional variables a so-called meta-variable (by analogy to constructive induction) whose values are determined by an autoregressive or ARIMA model. The second shows that restricting, by means of the k-nearest neighbors method, the data set on which the M5 algorithm operates can also decrease the error. Moreover, three examples of applying the presented solutions to data collected by natural hazard and machinery monitoring systems in coal mines are described. In the Appendix, analyses of several benchmark data sets are reported as a supplement to the presented results.},
author = {Marek Sikora and Beata Sikora},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {natural hazards monitoring; regression rules; time series forecasting; k-nearest neighbors},
language = {eng},
number = {2},
pages = {477-491},
title = {Improving prediction models applied in systems monitoring natural hazards and machinery},
url = {http://eudml.org/doc/208123},
volume = {22},
year = {2012},
}

TY - JOUR
AU - Marek Sikora
AU - Beata Sikora
TI - Improving prediction models applied in systems monitoring natural hazards and machinery
JO - International Journal of Applied Mathematics and Computer Science
PY - 2012
VL - 22
IS - 2
SP - 477
EP - 491
AB - A method of combining three analytic techniques, namely regression rule induction, the k-nearest neighbors method and time series forecasting by means of the ARIMA methodology, is presented. The main objective of applying these techniques jointly was to decrease the forecasting error in problems concerning natural hazard and machinery monitoring in coal mines. The M5 algorithm was applied as the basic method of developing prediction models. Despite the intensive development of regression rule induction algorithms and fuzzy-neural systems, the M5 algorithm still offers generalization ability competitive with other systems and an unmatched model-building time. In the paper, two solutions designed to decrease the mean square error of the obtained rules are presented. The first introduces into the set of conditional variables a so-called meta-variable (by analogy to constructive induction) whose values are determined by an autoregressive or ARIMA model. The second shows that restricting, by means of the k-nearest neighbors method, the data set on which the M5 algorithm operates can also decrease the error. Moreover, three examples of applying the presented solutions to data collected by natural hazard and machinery monitoring systems in coal mines are described. In the Appendix, analyses of several benchmark data sets are reported as a supplement to the presented results.
LA - eng
KW - natural hazards monitoring; regression rules; time series forecasting; k-nearest neighbors
UR - http://eudml.org/doc/208123
ER -

References

  1. Bloedorn, E. and Michalski, R. (2002). Data-driven constructive induction, IEEE Intelligent Systems 13(2): 30-37. 
  2. Boser, B., Guyon, I. and Vapnik, V. (1992). A training algorithm for optimal margin classifiers, Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, Pittsburgh, PA, USA, pp. 144-152. 
  3. Box, G. and Jenkins, G. (1994). Time Series Analysis: Forecasting and Control, Prentice-Hall, Upper Saddle River, NJ. Zbl0858.62072
  4. Breiman, L., Friedman, J.H., Olshen, R.A. and Stone, C.J. (1994). Classification and Regression Trees, Wadsworth, Belmont, CA. Zbl0541.62042
  5. Brockwell, P. and Davis, R. (2002). Introduction to Time Series and Forecasting, Springer-Verlag, New York, NY. Zbl0994.62085
  6. Broyden, C. (1969). A new double-rank minimization algorithm, Notices of the American Mathematical Society 16: 670. 
  7. Cao, L. and Tay, F. (2003). Support vector machine with adaptive parameters in financial time series forecasting, IEEE Transactions on Neural Networks 14(6): 1506-1518. 
  8. Chen, X., Yang, J. and Liang, J. (2011). A flexible support vector machine for regression, Neural Computing & Applications, DOI 10.1007/s00521-011-0623-5. 
  9. Chunshien, L. and Kuo-Hsiang, C. (2007). Recurrent neurofuzzy hybrid-learning approach to accurate systems modeling, Fuzzy Sets and Systems 158(2): 194-212. Zbl1110.93030
  10. Czogała, E. and Łęski, J. (2000). Fuzzy and Neuro-Fuzzy Intelligent Systems. Studies in Fuzziness and Soft Computing, Springer-Verlag, New York, NY. Zbl0953.68122
  11. Dembczyński, K., Kotłowski, W. and Słowiński, R. (2010). ENDER: A statistical framework for boosting decision rules, Data Mining and Knowledge Discovery 21(1): 52-90. 
  12. Dixon, W. (1992). A Statistical Analysis of Monitored Data for Methane Prediction, Ph.D. thesis, University of Nottingham, Nottingham. 
  13. Duch, W., Adamczak, R. and Grabczewski, K. (2000). A new methodology of extraction, optimization and application of crisp and fuzzy logical rules, IEEE Transactions on Neural Networks 11(10): 1-31. 
  14. Friedman, J., Kohavi, R. and Yun, Y. (1996). Lazy decision trees, Proceedings of AAAI/IAAI, Portland, OR, USA, pp. 717-724. 
  15. Gale, W., Heasley, K., Iannacchione, A., Swanson, P., Hatherly, P. and King, A. (2001). Rock damage characterization from microseismic monitoring, Proceedings of the 38th US Symposium of Rock Mechanics, Lisse, The Netherlands, pp. 1313-1320. 
  16. Goldberg, D. (1989). Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley Publishing Company, Boston, MA. Zbl0721.68056
  17. Góra, G. and Wojna, A. (2002). RIONA: A new classification system combining rule induction and instance-based learning, Fundamenta Informaticae 51(4): 369-390. Zbl1011.68114
  18. Grychowski, T. (2008). Hazard assessment based on fuzzy logic, Archives of Mining Sciences 53(4): 595-602. 
  19. Hao, P. (2010). New support vector algorithms with parametric insensitive/margin model, Neural Networks 23(1): 60-73. 
  20. Jang, J.-S. (1994). Structure determination in fuzzy modelling: A fuzzy CART approach, Proceedings of the IEEE International Conference on Fuzzy Systems, Orlando, FL, USA, pp. 480-485. 
  21. Janssen, F. and Fürnkranz, J. (2010a). On the quest for optimal rule learning heuristics, Machine Learning 78(3): 343-379. 
  22. Janssen, F. and Fürnkranz, J. (2010b). Separate-and-conquer regression, Proceedings of LWA 2010: Lernen, Wissen, Adaptivität, Kassel, Germany, pp. 81-89. 
  23. Jonak, J. (2002). Hazard assessment based on fuzzy logic, Journal of Mining Sciences 38(3): 270-277. 
  24. Kabiesz, J. (2005). Effect of the form of data on the quality of mine tremors hazard forecasting using neural networks, Geotechnical and Geological Engineering 24(5): 1131-1147. 
  25. Katayama, N. and Satoh, S. (1997). The SR-tree: An index structure for high dimensional nearest neighbor queries, Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, pp. 369-380. 
  26. Macleod, J., Luk, A. and Titterington, D. (1987). A reexamination of the distance-weighted k-nearest-neighbor classification rule, IEEE Transactions on Systems, Man and Cybernetics 17(4): 689-696. 
  27. Malerba, D., Esposito, F., Ceci, M. and Appice, A. (2005). Top-down induction of model trees with regression and splitting nodes, IEEE Transactions on Pattern Analysis and Machine Intelligence 26(5): 612-625. 
  28. Michalak, M. (2011). Adaptive kernel approach to the time series prediction, Pattern Analysis and Applications 14(3): 283-293. 
  29. Nelles, O., Fink, A., Babuška, R. and Setnes, M. (2000). Comparison of two construction algorithms for Takagi-Sugeno fuzzy models, International Journal of Applied Mathematics and Computer Science 10(4): 835-855. Zbl0972.68168
  30. Oh, S. and Pedrycz, W. (2000). Identification of fuzzy systems by means of an auto-tuning algorithm and its application to nonlinear systems, Fuzzy Sets and Systems 115(2): 205-230. Zbl0965.93045
  31. Quinlan, J. (1992a). Learning with continuous classes, Proceedings of the International Conference on Artificial Intelligence, Singapore, pp. 343-348. 
  32. Quinlan, J.R. (1992b). C4.5 Programs for Machine Learning, Morgan Kaufman Publishers, San Mateo, CA. 
  33. Quinlan, J. (1993). Combining instance-based learning and model-based learning, Proceedings of the 10th International Conference on Machine Learning, San Mateo, CA, USA, pp. 236-243. 
  34. Rutkowski, L. (2004). Generalized regression neural networks in time-varying environment, IEEE Transactions on Neural Networks 15(3): 576-596. 
  35. Schölkopf, B., Smola, A., Williamson, R. and Bartlett, P. (2000). New support vector algorithms, Neural Computation 12(5): 1207-1245. 
  36. Schuster, H. (1998). Deterministic Chaos, VCH Verlagsgesellschaft, New York, NY. 
  37. Sikora, M. and Krzykawski, D. (2005). Application of data exploration methods in analysis of carbon dioxide emission in hard-coal mine dewatering pump stations, Mechanizacja i Automatyzacja Górnictwa 413(6): 57-67, (in Polish). 
  38. Sikora, M., Krzystanek, Z., Bojko, B. and Śpiechowicz, K. (2011). Application of a hybrid method of machine learning for description and on-line estimation of methane hazard in mine workings, Journal of Mining Sciences 47(4): 493-505. 
  39. Sikora, M. and Sikora, B. (2006). Application of machine learning for prediction a methane concentration in a coal mine, Archives of Mining Sciences 51(4): 475-492. 
  40. Sikora, M. and Wróbel, Ł. (2010). Application of rule induction algorithms for analysis of data collected by seismic hazard monitoring systems in coal mines, Archives of Mining Sciences 55(1): 91-114. 
  41. Siwek, K., Osowski, S. and Szupiluk, R. (2009). Ensemble neural network approach for accurate load forecasting in a power system, International Journal of Applied Mathematics and Computer Science 19(2): 303-315, DOI: 10.2478/v10006-009-0026-2. Zbl1167.93338
  42. Tay, F. and Cao, L. (2002). Modified support vector machines in financial time series forecasting, Neurocomputing 48(1): 847-861. Zbl1006.68777
  43. Shawe-Taylor, J. and Cristianini, N. (2004). Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge. 
  44. Tong, H. (1990). Non-linear Time Series: A Dynamical Systems Approach, Oxford University Press, Oxford. Zbl0716.62085
  45. Torgo, L. (1997). Kernel regression trees, Proceedings of Poster Papers, European Conference on Machine Learning, Prague, Czech Republic, pp. 118-127. 
  46. Vapnik, V. (1995). The Nature of Statistical Learning Theory, Springer, New York, NY. Zbl0833.62008
  47. Wang, Y. and Witten, I. (1997). Inducing model trees for continuous classes, Proceedings of Poster Papers, European Conference on Machine Learning, Prague, Czech Republic, pp. 128-137. 
  48. Weigend, A., Huberman, B. and Rumelhart, D. (1990). Predicting the future: A connectionist approach, International Journal of Neural Systems 1(3): 193-209. 
  49. Wess, S., Althoff, K. and Derwand, G. (1994). Using k-d trees to improve the retrieval step in case-based reasoning, in S. Wess, K.-D. Althoff and M. Richter (Eds.), Topics in Case-Based Reasoning, Springer-Verlag, Berlin, pp. 167-181. 
  50. Wettschereck, D., Aha, D. and Mohri, T. (1997). A review and empirical evaluation of feature weighting methods for a class of lazy learning algorithms, Artificial Intelligence Review 11(1-5): 273-314. 
  51. Wilson, D. and Martinez, T.R. (2000). An integrated instance-based learning algorithm, Computational Intelligence 16(1): 1-28. 
  52. Witten, I. and Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, San Francisco, CA. Zbl1076.68555
  53. Wnek, J. and Michalski, R.S. (1994). Hypothesis-driven constructive induction in AQ17-HCI: A method and experiments, Machine Learning 14(2): 139-168. Zbl0804.68125
  54. Yager, R. and Filev, D. (1994). Essentials of Fuzzy Modeling and Control, John Wiley and Sons, New York, NY. 
