Survival analysis on data streams: Analyzing temporal events in dynamically changing environments

Ammar Shaker; Eyke Hüllermeier

International Journal of Applied Mathematics and Computer Science (2014)

  • Volume: 24, Issue: 1, pages 199-212
  • ISSN: 1641-876X

Abstract

In this paper, we introduce a method for survival analysis on data streams. Survival analysis (also known as event history analysis) is an established statistical method for the study of temporal “events” or, more specifically, questions regarding the temporal distribution of the occurrence of events and their dependence on covariates of the data sources. To make this method applicable in the setting of data streams, we propose an adaptive variant of a model that is closely related to the well-known Cox proportional hazard model. Adopting a sliding window approach, our method continuously updates its parameters based on the event data in the current time window. As a proof of concept, we present two case studies in which our method is used for different types of spatio-temporal data analysis, namely, the analysis of earthquake data and Twitter data. In an attempt to explain the frequency of events by the spatial location of the data source, both studies use the location as covariates of the sources.
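
As a rough illustration of the sliding-window idea described in the abstract, the sketch below refits a hazard model on the observations inside the current time window each time a new observation arrives. It is not the authors' model: it assumes a simplified exponential proportional hazards model with a constant baseline, hazard(x) = exp(b0 + b · x), fitted by plain gradient ascent on the window log-likelihood. The class name `SlidingWindowHazardModel`, its parameters, and the observation layout (arrival time, duration, event indicator, covariates) are hypothetical choices made for this example.

```python
import math
from collections import deque


class SlidingWindowHazardModel:
    """Exponential proportional hazards model refit on a sliding time window.

    Assumption: the hazard is constant in time, hazard(x) = exp(b0 + b . x).
    This is a simplified stand-in, not the model proposed in the paper.
    """

    def __init__(self, n_covariates, window_length,
                 learning_rate=0.5, steps_per_update=20):
        self.theta = [0.0] * (n_covariates + 1)  # [b0, b1, ..., bk]
        self.window_length = window_length
        self.learning_rate = learning_rate
        self.steps_per_update = steps_per_update
        # Each item: (arrival_time, duration, event_observed, covariates)
        self.window = deque()

    def hazard(self, x):
        """Estimated event rate for covariate vector x."""
        eta = self.theta[0] + sum(b * xi for b, xi in zip(self.theta[1:], x))
        return math.exp(eta)

    def refit(self, steps):
        """Gradient ascent on the log-likelihood of the current window.

        Log-likelihood: sum_i [event_i * eta_i - duration_i * exp(eta_i)],
        where event_i is 1 for an observed event and 0 for a censored one.
        """
        n = len(self.window)
        if n == 0:
            return
        for _ in range(steps):
            grad = [0.0] * len(self.theta)
            for _, duration, event, x in self.window:
                residual = event - duration * self.hazard(x)
                grad[0] += residual
                for j, xi in enumerate(x, start=1):
                    grad[j] += residual * xi
            self.theta = [t + self.learning_rate * g / n
                          for t, g in zip(self.theta, grad)]

    def update(self, arrival_time, duration, event, x):
        """Add one observation, evict expired ones, and refit."""
        self.window.append((arrival_time, duration, event, tuple(x)))
        cutoff = arrival_time - self.window_length
        while self.window and self.window[0][0] <= cutoff:
            self.window.popleft()
        self.refit(self.steps_per_update)


# Usage: one binary covariate (e.g. an indicator for a spatial region).
model = SlidingWindowHazardModel(n_covariates=1, window_length=10.0)
model.update(arrival_time=0.0, duration=1.0, event=1, x=[0.0])
model.update(arrival_time=1.0, duration=0.5, event=1, x=[1.0])
```

Because old observations are dropped once they leave the window, the fitted coefficients follow changes in the event rate over time, at the cost of higher variance when the window holds few observations.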

How to cite

Shaker, Ammar, and Eyke Hüllermeier. "Survival analysis on data streams: Analyzing temporal events in dynamically changing environments." International Journal of Applied Mathematics and Computer Science 24.1 (2014): 199-212. <http://eudml.org/doc/271889>.

@article{AmmarShaker2014,
author = {Ammar Shaker and Eyke Hüllermeier},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {data streams; survival analysis; event history analysis; earthquake data; Twitter data},
language = {eng},
number = {1},
pages = {199--212},
title = {Survival analysis on data streams: Analyzing temporal events in dynamically changing environments},
url = {http://eudml.org/doc/271889},
volume = {24},
year = {2014},
}

TY - JOUR
AU - Ammar Shaker
AU - Eyke Hüllermeier
TI - Survival analysis on data streams: Analyzing temporal events in dynamically changing environments
JO - International Journal of Applied Mathematics and Computer Science
PY - 2014
VL - 24
IS - 1
SP - 199
EP - 212
LA - eng
KW - data streams; survival analysis; event history analysis; earthquake data; Twitter data
UR - http://eudml.org/doc/271889
ER -

