Seasonal time-series imputation of gap missing algorithm (STIGMA)

Eduardo Rangel-Heras; Pavel Zuniga; Alma Y. Alanis; Esteban A. Hernandez-Vargas; Oscar D. Sanchez

Kybernetika (2023)

  • Volume: 59, Issue: 6, pages 861-879
  • ISSN: 0023-5954

Abstract

This work presents a new approach for the imputation of missing data in weather time-series with a seasonal pattern: the seasonal time-series imputation of gap missing algorithm (STIGMA). The algorithm takes advantage of the seasonal pattern to impute unknown data by averaging available data. We test the algorithm using data measured every 10 minutes over a period of 365 days during the year 2010; the variables include global irradiance, diffuse irradiance, ultraviolet irradiance, and temperature, arranged in a matrix with 52,560 rows for data points over time and 4 columns for weather variables. A distinguishing feature of this work is that the algorithm is well-suited for the imputation of values when the missing data occur contiguously and in seasonal patterns. The algorithm employs a date-time index to collect available data for the imputation of missing data, repeating the process until all missing values are calculated. The tests are performed by removing 5%, 10%, 15%, 20%, 25%, and 30% of the available data, and the results are compared to autoregressive models. The proposed algorithm has been successfully tested with a maximum of 2,736 contiguous missing values, which account for 19 consecutive days of a single month; these values are a portion of all the missing values when the time-series lacks 30% of all data. The performance metrics are the root-mean-square error (RMSE) and the coefficient of determination ($R^{2}$). The results indicate that the proposed algorithm outperforms autoregressive models while preserving the seasonal behavior of the time-series. STIGMA is also tested with non-weather time-series of beer sales and the number of air passengers per month, which also have a cyclical pattern, and the results show accurate imputation of the data.
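The idea of averaging available data at matching seasonal positions, and repeating until the gap is closed, can be illustrated with a minimal sketch. This is not the published STIGMA procedure: the function names, the `half_window` parameter, the use of NaN to mark missing samples, and the synthetic sine-wave data are all assumptions made for illustration. The only figures taken from the abstract are the 10-minute sampling interval (144 samples per day, so 365 days give 52,560 rows) and the 19-day gap of 2,736 values.

```python
import math


def seasonal_mean_impute(values, period, half_window=1):
    """Fill NaN gaps with the mean of known values at the same slot of nearby cycles.

    Hypothetical helper, not the authors' code. `period` is the number of
    samples per seasonal cycle; `half_window` is how many cycles to look at on
    each side of the cycle that contains the missing sample.
    """
    if len(values) % period != 0:
        raise ValueError("series length must be a whole number of periods")
    filled = list(values)
    n_cycles = len(filled) // period
    while True:
        missing = [i for i, v in enumerate(filled) if math.isnan(v)]
        if not missing:
            break
        updates = {}
        for i in missing:
            cycle, slot = divmod(i, period)
            lo = max(0, cycle - half_window)
            hi = min(n_cycles, cycle + half_window + 1)
            known = [filled[c * period + slot] for c in range(lo, hi)
                     if not math.isnan(filled[c * period + slot])]
            if known:
                updates[i] = sum(known) / len(known)
        if not updates:
            break  # no missing sample has a known neighbour; stop instead of looping forever
        # Apply updates only after the pass, so the result does not depend on scan order.
        for i, v in updates.items():
            filled[i] = v
    return filled


def rmse(truth, estimate):
    """Root-mean-square error between two equally long sequences."""
    return math.sqrt(sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth))


def r_squared(truth, estimate):
    """Coefficient of determination of `estimate` against `truth`."""
    mean = sum(truth) / len(truth)
    ss_res = sum((t - e) ** 2 for t, e in zip(truth, estimate))
    ss_tot = sum((t - mean) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


samples_per_day = 24 * 60 // 10           # one sample every 10 minutes
assert 365 * samples_per_day == 52560     # row count quoted in the abstract
assert 19 * samples_per_day == 2736       # longest contiguous gap quoted in the abstract

# Synthetic, perfectly periodic "daily" signal over 30 days (illustrative data only).
day = [math.sin(2 * math.pi * k / samples_per_day) for k in range(samples_per_day)]
truth = day * 30
gappy = list(truth)
for i in range(5 * samples_per_day, 24 * samples_per_day):  # remove 19 consecutive days
    gappy[i] = math.nan

filled = seasonal_mean_impute(gappy, samples_per_day)
print("RMSE:", rmse(truth, filled), "R^2:", r_squared(truth, filled))
```

With `half_window=1`, each pass fills the outermost missing days from their known neighbours, so a long gap closes from both ends inward. Real weather data are not perfectly periodic, so the error on measured series will depend on how the averaging window is chosen.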

How to cite

Rangel-Heras, Eduardo, et al. "Seasonal time-series imputation of gap missing algorithm (STIGMA)." Kybernetika 59.6 (2023): 861-879. <http://eudml.org/doc/299203>.

@article{Rangel2023,
abstract = {This work presents a new approach for the imputation of missing data in weather time-series with a seasonal pattern: the seasonal time-series imputation of gap missing algorithm (STIGMA). The algorithm takes advantage of the seasonal pattern to impute unknown data by averaging available data. We test the algorithm using data measured every $10$ minutes over a period of $365$ days during the year 2010; the variables include global irradiance, diffuse irradiance, ultraviolet irradiance, and temperature, arranged in a matrix with $52,560$ rows for data points over time and $4$ columns for weather variables. A distinguishing feature of this work is that the algorithm is well-suited for the imputation of values when the missing data occur contiguously and in seasonal patterns. The algorithm employs a date-time index to collect available data for the imputation of missing data, repeating the process until all missing values are calculated. The tests are performed by removing $5\%$, $10\%$, $15\%$, $20\%$, $25\%$, and $30\%$ of the available data, and the results are compared to autoregressive models. The proposed algorithm has been successfully tested with a maximum of $2,736$ contiguous missing values, which account for $19$ consecutive days of a single month; these values are a portion of all the missing values when the time-series lacks $30\%$ of all data. The performance metrics are the root-mean-square error (RMSE) and the coefficient of determination ($R^{2}$). The results indicate that the proposed algorithm outperforms autoregressive models while preserving the seasonal behavior of the time-series. STIGMA is also tested with non-weather time-series of beer sales and the number of air passengers per month, which also have a cyclical pattern, and the results show accurate imputation of the data.},
author = {Rangel-Heras, Eduardo and Zuniga, Pavel and Alanis, Alma Y. and Hernandez-Vargas, Esteban A. and Sanchez, Oscar D.},
journal = {Kybernetika},
keywords = {contiguous missing values; seasonal patterns; time-series},
language = {eng},
number = {6},
pages = {861-879},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Seasonal time-series imputation of gap missing algorithm (STIGMA)},
url = {http://eudml.org/doc/299203},
volume = {59},
year = {2023},
}

TY - JOUR
AU - Rangel-Heras, Eduardo
AU - Zuniga, Pavel
AU - Alanis, Alma Y.
AU - Hernandez-Vargas, Esteban A.
AU - Sanchez, Oscar D.
TI - Seasonal time-series imputation of gap missing algorithm (STIGMA)
JO - Kybernetika
PY - 2023
PB - Institute of Information Theory and Automation AS CR
VL - 59
IS - 6
SP - 861
EP - 879
AB - This work presents a new approach for the imputation of missing data in weather time-series with a seasonal pattern: the seasonal time-series imputation of gap missing algorithm (STIGMA). The algorithm takes advantage of the seasonal pattern to impute unknown data by averaging available data. We test the algorithm using data measured every $10$ minutes over a period of $365$ days during the year 2010; the variables include global irradiance, diffuse irradiance, ultraviolet irradiance, and temperature, arranged in a matrix with $52,560$ rows for data points over time and $4$ columns for weather variables. A distinguishing feature of this work is that the algorithm is well-suited for the imputation of values when the missing data occur contiguously and in seasonal patterns. The algorithm employs a date-time index to collect available data for the imputation of missing data, repeating the process until all missing values are calculated. The tests are performed by removing $5\%$, $10\%$, $15\%$, $20\%$, $25\%$, and $30\%$ of the available data, and the results are compared to autoregressive models. The proposed algorithm has been successfully tested with a maximum of $2,736$ contiguous missing values, which account for $19$ consecutive days of a single month; these values are a portion of all the missing values when the time-series lacks $30\%$ of all data. The performance metrics are the root-mean-square error (RMSE) and the coefficient of determination ($R^{2}$). The results indicate that the proposed algorithm outperforms autoregressive models while preserving the seasonal behavior of the time-series. STIGMA is also tested with non-weather time-series of beer sales and the number of air passengers per month, which also have a cyclical pattern, and the results show accurate imputation of the data.
LA - eng
KW - contiguous missing values; seasonal patterns; time-series
UR - http://eudml.org/doc/299203
ER -
