Optimization of the maximum likelihood estimator for determining the intrinsic dimensionality of high-dimensional data

Rasa Karbauskaitė; Gintautas Dzemyda

International Journal of Applied Mathematics and Computer Science (2015)

  • Volume: 25, Issue: 4, page 895-913
  • ISSN: 1641-876X

Abstract

One of the problems in the analysis of the set of images of a moving object is to evaluate the degrees of freedom of motion and the angle of rotation. Here the intrinsic dimensionality of multidimensional data, characterizing the set of images, can be used. Usually, the image may be represented by a high-dimensional point whose dimensionality depends on the number of pixels in the image. The knowledge of the intrinsic dimensionality of a data set is very useful information in exploratory data analysis, because it is possible to reduce the dimensionality of the data without losing much information. In this paper, the maximum likelihood estimator (MLE) of the intrinsic dimensionality is explored experimentally. In contrast to the previous works, the radius of a hypersphere, which covers neighbours of the analysed points, is fixed instead of the number of the nearest neighbours in the MLE. A way of choosing the radius in this method is proposed. We explore which metric (Euclidean or geodesic) must be evaluated in the MLE algorithm in order to get the true estimate of the intrinsic dimensionality. The MLE method is examined using a number of artificial and real (image) data sets.
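To make the radius-based variant described in the abstract concrete, below is a minimal Python sketch of the Levina-Bickel MLE of intrinsic dimensionality with a fixed hypersphere radius R and Euclidean distances. The function name, the radius value in the example and the plain averaging of local estimates are illustrative assumptions; the radius-selection rule and the geodesic-distance variant studied in the paper are not reproduced here.

import numpy as np
from scipy.spatial.distance import cdist

def mle_intrinsic_dim_radius(X, R):
    # Radius-based Levina-Bickel MLE: for each point x_i, take the neighbours
    # whose Euclidean distance T_j(x_i) satisfies 0 < T_j(x_i) <= R and compute
    #   m_R(x_i) = [ (1 / N(R, x_i)) * sum_j log(R / T_j(x_i)) ]^(-1).
    # The local estimates are then averaged over all points that have neighbours.
    D = cdist(X, X)                        # pairwise Euclidean distances
    estimates = []
    for i in range(len(X)):
        neigh = D[i][(D[i] > 0) & (D[i] <= R)]
        if neigh.size == 0:
            continue                       # skip points with no neighbours inside the hypersphere
        inv_m = np.mean(np.log(R / neigh))
        if inv_m > 0:
            estimates.append(1.0 / inv_m)
    return float(np.mean(estimates)) if estimates else float("nan")

# Example: 1000 points on a 2-D plane embedded in 10-D space; the estimate should be close to 2.
rng = np.random.default_rng(0)
X = np.hstack([rng.uniform(size=(1000, 2)), np.zeros((1000, 8))])
print(mle_intrinsic_dim_radius(X, R=0.1))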

How to cite


Rasa Karbauskaitė, and Gintautas Dzemyda. "Optimization of the maximum likelihood estimator for determining the intrinsic dimensionality of high-dimensional data." International Journal of Applied Mathematics and Computer Science 25.4 (2015): 895-913. <http://eudml.org/doc/275939>.

@article{RasaKarbauskaitė2015,
abstract = {One of the problems in the analysis of the set of images of a moving object is to evaluate the degrees of freedom of motion and the angle of rotation. Here the intrinsic dimensionality of multidimensional data, characterizing the set of images, can be used. Usually, the image may be represented by a high-dimensional point whose dimensionality depends on the number of pixels in the image. The knowledge of the intrinsic dimensionality of a data set is very useful information in exploratory data analysis, because it is possible to reduce the dimensionality of the data without losing much information. In this paper, the maximum likelihood estimator (MLE) of the intrinsic dimensionality is explored experimentally. In contrast to the previous works, the radius of a hypersphere, which covers neighbours of the analysed points, is fixed instead of the number of the nearest neighbours in the MLE. A way of choosing the radius in this method is proposed. We explore which metric (Euclidean or geodesic) must be evaluated in the MLE algorithm in order to get the true estimate of the intrinsic dimensionality. The MLE method is examined using a number of artificial and real (image) data sets.},
author = {Rasa Karbauskaitė, Gintautas Dzemyda},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {multidimensional data; intrinsic dimensionality; maximum likelihood estimator; manifold learning methods; image understanding; dimensionality reduction; manifold learning; multidimensional data visualization; locally linear embedding; topology preservation},
language = {eng},
number = {4},
pages = {895-913},
title = {Optimization of the maximum likelihood estimator for determining the intrinsic dimensionality of high-dimensional data},
url = {http://eudml.org/doc/275939},
volume = {25},
year = {2015},
}

TY - JOUR
AU - Rasa Karbauskaitė
AU - Gintautas Dzemyda
TI - Optimization of the maximum likelihood estimator for determining the intrinsic dimensionality of high-dimensional data
JO - International Journal of Applied Mathematics and Computer Science
PY - 2015
VL - 25
IS - 4
SP - 895
EP - 913
AB - One of the problems in the analysis of the set of images of a moving object is to evaluate the degrees of freedom of motion and the angle of rotation. Here the intrinsic dimensionality of multidimensional data, characterizing the set of images, can be used. Usually, the image may be represented by a high-dimensional point whose dimensionality depends on the number of pixels in the image. The knowledge of the intrinsic dimensionality of a data set is very useful information in exploratory data analysis, because it is possible to reduce the dimensionality of the data without losing much information. In this paper, the maximum likelihood estimator (MLE) of the intrinsic dimensionality is explored experimentally. In contrast to the previous works, the radius of a hypersphere, which covers neighbours of the analysed points, is fixed instead of the number of the nearest neighbours in the MLE. A way of choosing the radius in this method is proposed. We explore which metric (Euclidean or geodesic) must be evaluated in the MLE algorithm in order to get the true estimate of the intrinsic dimensionality. The MLE method is examined using a number of artificial and real (image) data sets.
LA - eng
KW - multidimensional data; intrinsic dimensionality; maximum likelihood estimator; manifold learning methods; image understanding; dimensionality reduction; manifold learning; multidimensional data visualization; locally linear embedding; topology preservation
UR - http://eudml.org/doc/275939
ER -

References

  1. Álvarez-Meza, A.M., Valencia-Aguirre, J., Daza-Santacoloma, G. and Castellanos-Domínguez, G. (2011). Global and local choice of the number of nearest neighbors in locally linear embedding, Pattern Recognition Letters 32(16): 2171-2177. 
  2. Belkin, M. and Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation 15(6): 1373-1396. Zbl1085.68119
  3. Brand, M. (2003). Charting a manifold, in S. Becker, S. Thrun and K. Obermayer (Eds.), Advances in Neural Information Processing Systems 15, MIT Press, Cambridge, MA, pp. 961-968. 
  4. Camastra, F. (2003). Data dimensionality estimation methods: A survey, Pattern Recognition 36(12): 2945-2954. Zbl1059.68100
  5. Carter, K.M., Raich, R. and Hero, A.O. (2010). On local intrinsic dimension estimation and its applications, IEEE Transactions on Signal Processing 58(2): 650-663. 
  6. Chang, Y., Hu, C. and Turk, M. (2004). Probabilistic expression analysis on manifolds, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR(2), Washington, DC, USA, pp. 520-527. 
  7. Costa, J.A. and Hero, A.O. (2004). Geodesic entropic graphs for dimension and entropy estimation in manifold learning, IEEE Transactions on Signal Processing 52(8): 2210-2221. 
  8. Costa, J.A. and Hero, A.O. (2005). Estimating local intrinsic dimension with k-nearest neighbor graphs, IEEE Transactions on Statistical Signal Processing 30(23): 1432-1436. 
  9. Donoho, D.L. and Grimes, C. (2005). Hessian eigenmaps: New locally linear embedding techniques for high-dimensional data, Proceedings of the National Academy of Sciences 102(21): 7426-7431. 
  10. Dzemyda, G., Kurasova, O. and Žilinskas, J. (2013). Multidimensional Data Visualization: Methods and Applications, Optimization and Its Applications, Vol. 75, Springer-Verlag, New York, NY. Zbl06062044
  11. Einbeck, J. and Kalantan, Z. (2013). Intrinsic dimensionality estimation for high-dimensional data sets: New approaches for the computation of correlation dimension, Journal of Emerging Technologies in Web Intelligence 5(2): 91-97. 
  12. Elgammal, A. and Lee, C.-S. (2004a). Inferring 3D body pose from silhouettes using activity manifold learning, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR(2), Washington, DC, USA, pp. 681-688. 
  13. Elgammal, A. and Lee, C.-S. (2004b). Separating style and content on a nonlinear manifold, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR(1), Washington, DC, USA, pp. 478-485. 
  14. Fan, M., Zhang, X., Chen, S., Bao, H. and Maybank, S.J. (2013). Dimension estimation of image manifolds by minimal cover approximation, Neurocomputing 105: 19-29. 
  15. Fukunaga, K. (1982). Intrinsic dimensionality extraction, in P. Krishnaiah and L. Kanal (Eds.), Classification, Pattern Recognition and Reduction of Dimensionality, Handbook of Statistics, Vol. 2, North-Holland, Amsterdam, pp. 347-362. 
  16. Fukunaga, K. and Olsen, D. (1971). An algorithm for finding intrinsic dimensionality of data, IEEE Transactions on Computers 20(2): 176-183. Zbl0216.50201
  17. Gong, S., Cristani, M., Yan, S. and Loy, C.C. (Eds.) (2014). Person Re-Identification, Advances in Computer Vision and Pattern Recognition, Vol. XVIII, Springer, London. Zbl1282.68066
  18. Grassberger, P. and Procaccia, I. (1983). Measuring the strangeness of strange attractors, Physica D: Nonlinear Phenomena 9(1-2): 189-208. Zbl0593.58024
  19. Hadid, A., Kouropteva, O. and Pietikäinen, M. (2002). Unsupervised learning using locally linear embedding: experiments with face pose analysis, 16th International Conference on Pattern Recognition, ICPR'02(1), Quebec City, Quebec, Canada, pp. 111-114. 
  20. He, J., Ding, L., Jiang, L., Li, Z. and Hu, Q. (2014). Intrinsic dimensionality estimation based on manifold assumption, Journal of Visual Communication and Image Representation 25(5): 740-747. 
  21. Hein, M. and Audibert, J. (2005). Intrinsic dimensionality estimation of submanifolds in R^d, Machine Learning: Proceedings of the 22nd International Conference (ICML 2005), Bonn, Germany, pp. 289-296. 
  22. Jenkins, O.C. and Mataric, M.J. (2004). A spatio-temporal extension to isomap nonlinear dimension reduction, 21st International Conference on Machine Learning, ICML(69), Banff, Alberta, Canada, pp. 441-448. 
  23. Karbauskaitė, R. and Dzemyda, G. (2009). Topology preservation measures in the visualization of manifold-type multidimensional data, Informatica 20(2): 235-254. Zbl1180.68217
  24. Karbauskaitė, R. and Dzemyda, G. (2014). Geodesic distances in the intrinsic dimensionality estimation using packing numbers, Nonlinear Analysis: Modelling and Control 19(4): 578-591. 
  25. Karbauskaitė, R., Dzemyda, G. and Marcinkevičius, V. (2008). Selecting a regularization parameter in the locally linear embedding algorithm, 20th International EURO Mini Conference on Continuous Optimization and Knowledge-Based Technologies (EurOPT-2008), Neringa, Lithuania, pp. 59-64. 
  26. Karbauskaitė, R., Dzemyda, G. and Marcinkevičius, V. (2010). Dependence of locally linear embedding on the regularization parameter, An Official Journal of the Spanish Society of Statistics and Operations Research 18(2): 354-376. Zbl1273.62139
  27. Karbauskaitė, R., Dzemyda, G. and Mazėtis, E. (2011). Geodesic distances in the maximum likelihood estimator of intrinsic dimensionality, Nonlinear Analysis: Modelling and Control 16(4): 387-402. Zbl1271.93148
  28. Karbauskaitė, R., Kurasova, O. and Dzemyda, G. (2007). Selection of the number of neighbours of each data point for the locally linear embedding algorithm, Information Technology and Control 36(4): 359-364. 
  29. Kégl, B. (2003). Intrinsic dimension estimation using packing numbers, Advances in Neural Information Processing Systems, NIPS(15), Cambridge, MA, USA, pp. 697-704. 
  30. Kouropteva, O., Okun, O. and Pietikäinen, M. (2002). Selection of the optimal parameter value for the locally linear embedding algorithm, 1st International Conference on Fuzzy Systems and Knowledge Discovery, FSKD(1), Singapore, pp. 359-363. 
  31. Kulczycki, P. and Łukasik, S. (2014). An algorithm for reducing the dimension and size of a sample for data exploration procedures, International Journal of Applied Mathematics and Computer Science 24(1): 133-149, DOI: 10.2478/amcs-2014-0011. Zbl1292.93044
  32. Lee, J.A. and Verleysen, M. (2007). Nonlinear Dimensionality Reduction, Springer, New York, NY. Zbl1128.68024
  33. Levina, E. and Bickel, P.J. (2005). Maximum likelihood estimation of intrinsic dimension, in L.K. Saul, Y. Weiss and L. Bottou (Eds.), Advances in Neural Information Processing Systems 17, MIT Press, Cambridge, MA, pp. 777-784. 
  34. Levina, E., Wagaman, A.S., Callender, A.F., Mandair, G.S. and Morris, M.D. (2007). Estimating the number of pure chemical components in a mixture by maximum likelihood, Journal of Chemometrics 21(1-2): 24-34. 
  35. Li, S. Z., Xiao, R., Li, Z. and Zhang, H. (2001). Nonlinear mapping from multi-view face patterns to a Gaussian distribution in a low dimensional space, IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems (RATFG-RTS), Vancouver, BC, Canada, pp. 47-54. 
  36. Mo, D. and Huang, S.H. (2012). Fractal-based intrinsic dimension estimation and its application in dimensionality reduction, IEEE Transactions on Knowledge and Data Engineering 24(1): 59-71. 
  37. Nene, S.A., Nayar, S.K. and Murase, H. (1996). Columbia object image library (COIL-20), Technical Report CUCS-005-96, Columbia University, New York, NY. 
  38. Niskanen, M. and Silven, O. (2003). Comparison of dimensionality reduction methods for wood surface inspection, 6th International Conference on Quality Control by Artificial Vision, QCAV(5132), Gatlinburg, TN, USA, pp. 178-188. 
  39. Fan, M., Qiao, H. and Zhang, B. (2009). Intrinsic dimension estimation of manifolds by incising balls, Pattern Recognition 42(5): 780-787. Zbl1162.68405
  40. Roweis, S.T. and Saul, L.K. (2000). Nonlinear dimensionality reduction by locally linear embedding, Science 290(5500): 2323-2326. 
  41. Saul, L.K. and Roweis, S.T. (2003). Think globally, fit locally: Unsupervised learning of low dimensional manifolds, Journal of Machine Learning Research 4: 119-155. Zbl1093.68089
  42. Shin, Y.J. and Park, C.H. (2011). Analysis of correlation based dimension reduction methods, International Journal of Applied Mathematics and Computer Science 21(3): 549-558, DOI: 10.2478/v10006-011-0043-9. Zbl1230.68173
  43. Tenenbaum, J.B., de Silva, V. and Langford, J.C. (2000). A global geometric framework for nonlinear dimensionality reduction, Science 290(5500): 2319-2323. 
  44. van der Maaten, L.J.P. (2007). An introduction to dimensionality reduction using MATLAB, Technical Report MICC 07-07, Maastricht University, Maastricht. 
  45. Varini, C., Nattkemper, T. W., Degenhard, A. and Wismuller, A. (2004). Breast MRI data analysis by LLE, Proceedings of the 2004 IEEE International Joint Conference on Neural Networks, Montreal, Canada, Vol. 3, pp. 2449-2454. 
  46. Verveer, P. and Duin, R. (1995). An evaluation of intrinsic dimensionality estimators, IEEE Transactions on Pattern Analysis and Machine Intelligence 17(1): 81-86. 
  47. Weinberger, K.Q. and Saul, L.K. (2006). Unsupervised learning of image manifolds by semidefinite programming, International Journal of Computer Vision 70(1): 77-90. 
  48. Yang, M.-H. (2002). Face recognition using extended isomap, IEEE International Conference on Image Processing, ICIP(2), Rochester, NY, USA, pp. 117-120. 
  49. Yata, K. and Aoshima, M. (2010). Intrinsic dimensionality estimation of high-dimension, low sample size data with d-asymptotics, Communications in Statistics-Theory and Methods 39(8-9): 1511-1521. Zbl1318.62204
  50. Zhang, J., Li, S.Z. and Wang, J. (2004). Nearest manifold approach for face recognition, 6th IEEE International Conference on Automatic Face and Gesture Recognition, Seoul, South Korea, pp. 223-228. 
  51. Zhang, Z. and Zha, H. (2004). Principal manifolds and nonlinear dimensionality reduction via local tangent space alignment, SIAM Journal on Scientific Computing 26(1): 313-338. Zbl1077.65042
