# Optimization of the maximum likelihood estimator for determining the intrinsic dimensionality of high-dimensional data

Rasa Karbauskaitė; Gintautas Dzemyda

International Journal of Applied Mathematics and Computer Science (2015)

- Volume: 25, Issue: 4, Pages: 895-913
- ISSN: 1641-876X

## Abstract

One of the problems in the analysis of the set of images of a moving object is to evaluate the degrees of freedom of motion and the angle of rotation. Here the intrinsic dimensionality of the multidimensional data characterizing the set of images can be used. Usually, an image is represented by a high-dimensional point whose dimensionality depends on the number of pixels in the image. Knowledge of the intrinsic dimensionality of a data set is very useful in exploratory data analysis, because it makes it possible to reduce the dimensionality of the data without losing much information. In this paper, the maximum likelihood estimator (MLE) of the intrinsic dimensionality is explored experimentally. In contrast to previous works, the radius of a hypersphere that covers the neighbours of the analysed points is fixed, instead of the number of nearest neighbours in the MLE. A way of choosing the radius in this method is proposed. We explore which metric (Euclidean or geodesic) must be used in the MLE algorithm in order to get the true estimate of the intrinsic dimensionality. The MLE method is examined using a number of artificial and real (image) data sets.

## How to cite

Rasa Karbauskaitė, and Gintautas Dzemyda. "Optimization of the maximum likelihood estimator for determining the intrinsic dimensionality of high-dimensional data." International Journal of Applied Mathematics and Computer Science 25.4 (2015): 895-913. <http://eudml.org/doc/275939>.

@article{RasaKarbauskaitė2015,

abstract = {One of the problems in the analysis of the set of images of a moving object is to evaluate the degrees of freedom of motion and the angle of rotation. Here the intrinsic dimensionality of the multidimensional data characterizing the set of images can be used. Usually, an image is represented by a high-dimensional point whose dimensionality depends on the number of pixels in the image. Knowledge of the intrinsic dimensionality of a data set is very useful in exploratory data analysis, because it makes it possible to reduce the dimensionality of the data without losing much information. In this paper, the maximum likelihood estimator (MLE) of the intrinsic dimensionality is explored experimentally. In contrast to previous works, the radius of a hypersphere that covers the neighbours of the analysed points is fixed, instead of the number of nearest neighbours in the MLE. A way of choosing the radius in this method is proposed. We explore which metric (Euclidean or geodesic) must be used in the MLE algorithm in order to get the true estimate of the intrinsic dimensionality. The MLE method is examined using a number of artificial and real (image) data sets.},

author = {Rasa Karbauskaitė, Gintautas Dzemyda},

journal = {International Journal of Applied Mathematics and Computer Science},

keywords = {multidimensional data; intrinsic dimensionality; maximum likelihood estimator; manifold learning methods; image understanding; dimensionality reduction; manifold learning; multidimensional data visualization; locally linear embedding; topology preservation},

language = {eng},

number = {4},

pages = {895-913},

title = {Optimization of the maximum likelihood estimator for determining the intrinsic dimensionality of high-dimensional data},

url = {http://eudml.org/doc/275939},

volume = {25},

year = {2015},

}

TY - JOUR

AU - Rasa Karbauskaitė

AU - Gintautas Dzemyda

TI - Optimization of the maximum likelihood estimator for determining the intrinsic dimensionality of high-dimensional data

JO - International Journal of Applied Mathematics and Computer Science

PY - 2015

VL - 25

IS - 4

SP - 895

EP - 913

AB - One of the problems in the analysis of the set of images of a moving object is to evaluate the degrees of freedom of motion and the angle of rotation. Here the intrinsic dimensionality of the multidimensional data characterizing the set of images can be used. Usually, an image is represented by a high-dimensional point whose dimensionality depends on the number of pixels in the image. Knowledge of the intrinsic dimensionality of a data set is very useful in exploratory data analysis, because it makes it possible to reduce the dimensionality of the data without losing much information. In this paper, the maximum likelihood estimator (MLE) of the intrinsic dimensionality is explored experimentally. In contrast to previous works, the radius of a hypersphere that covers the neighbours of the analysed points is fixed, instead of the number of nearest neighbours in the MLE. A way of choosing the radius in this method is proposed. We explore which metric (Euclidean or geodesic) must be used in the MLE algorithm in order to get the true estimate of the intrinsic dimensionality. The MLE method is examined using a number of artificial and real (image) data sets.

LA - eng

KW - multidimensional data; intrinsic dimensionality; maximum likelihood estimator; manifold learning methods; image understanding; dimensionality reduction; manifold learning; multidimensional data visualization; locally linear embedding; topology preservation

UR - http://eudml.org/doc/275939

ER -
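The abstract's central idea, fixing the hypersphere radius in the Levina-Bickel MLE instead of the number of nearest neighbours k, can be sketched as follows. This is a minimal illustration with Euclidean distances; the function name `mle_intrinsic_dim` and all implementation details are illustrative assumptions, not the authors' code.

```python
import numpy as np

def mle_intrinsic_dim(X, radius):
    """Radius-based MLE of intrinsic dimensionality (sketch).

    Each point's neighbourhood is the set of points inside a hypersphere
    of the given radius.  The local Levina-Bickel estimate
        m(x) = [ mean_j log(radius / d_j) ]^(-1),
    with d_j the distances to the neighbours of x, is averaged over all
    points that have at least one neighbour.
    """
    X = np.asarray(X, dtype=float)
    local = []
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d = d[(d > 0) & (d < radius)]  # neighbours strictly inside the sphere
        if d.size:
            local.append(1.0 / np.mean(np.log(radius / d)))
    return float(np.mean(local))
```

For points sampled from a two-dimensional plane embedded in a higher-dimensional space, the estimate is close to 2, provided the radius is small enough that each neighbourhood is approximately flat yet large enough to contain several neighbours; choosing that radius well is precisely the question the paper studies.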

## References

- Álvarez-Meza, A.M., Valencia-Aguirre, J., Daza-Santacoloma, G. and Castellanos-Domínguez, G. (2011). Global and local choice of the number of nearest neighbors in locally linear embedding, Pattern Recognition Letters 32(16): 2171-2177.
- Belkin, M. and Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation 15(6): 1373-1396. Zbl1085.68119
- Brand, M. (2003). Charting a manifold, in S. Becker, S. Thrun and K. Obermayer (Eds.), Advances in Neural Information Processing Systems 15, MIT Press, Cambridge, MA, pp. 961-968.
- Camastra, F. (2003). Data dimensionality estimation methods: A survey, Pattern Recognition 36(12): 2945-2954. Zbl1059.68100
- Carter, K.M., Raich, R. and Hero, A.O. (2010). On local intrinsic dimension estimation and its applications, IEEE Transactions on Signal Processing 58(2): 650-663.
- Chang, Y., Hu, C. and Turk, M. (2004). Probabilistic expression analysis on manifolds, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR(2), Washington, DC, USA, pp. 520-527.
- Costa, J.A. and Hero, A.O. (2004). Geodesic entropic graphs for dimension and entropy estimation in manifold learning, IEEE Transactions on Signal Processing 52(8): 2210-2221.
- Costa, J.A. and Hero, A.O. (2005). Estimating local intrinsic dimension with k-nearest neighbor graphs, IEEE Transactions on Statistical Signal Processing 30(23): 1432-1436.
- Donoho, D.L. and Grimes, C. (2005). Hessian eigenmaps: New locally linear embedding techniques for high-dimensional data, Proceedings of the National Academy of Sciences 102(21): 7426-7431.
- Dzemyda, G., Kurasova, O. and Žilinskas, J. (2013). Multidimensional Data Visualization: Methods and Applications, Optimization and Its Applications, Vol. 75, Springer-Verlag, New York, NY. Zbl06062044
- Einbeck, J. and Kalantan, Z. (2013). Intrinsic dimensionality estimation for high-dimensional data sets: New approaches for the computation of correlation dimension, Journal of Emerging Technologies in Web Intelligence 5(2): 91-97.
- Elgammal, A. and Lee, C.-S. (2004a). Inferring 3D body pose from silhouettes using activity manifold learning, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR(2), Washington, DC, USA, pp. 681-688.
- Elgammal, A. and Lee, C.-S. (2004b). Separating style and content on a nonlinear manifold, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR(1), Washington, DC, USA, pp. 478-485.
- Fan, M., Zhang, X., Chen, S., Bao, H. and Maybank, S.J. (2013). Dimension estimation of image manifolds by minimal cover approximation, Neurocomputing 105: 19-29.
- Fukunaga, K. (1982). Intrinsic dimensionality extraction, in P. Krishnaiah and L. Kanal (Eds.), Classification, Pattern Recognition and Reduction of Dimensionality, Handbook of Statistics, Vol. 2, North-Holland, Amsterdam, pp. 347-362.
- Fukunaga, K. and Olsen, D. (1971). An algorithm for finding intrinsic dimensionality of data, IEEE Transactions on Computers 20(2): 176-183. Zbl0216.50201
- Gong, S., Cristani, M., Yan, S. and Loy, C.C. (Eds.) (2014). Person Re-Identification, Advances in Computer Vision and Pattern Recognition, Vol. XVIII, Springer, London. Zbl1282.68066
- Grassberger, P. and Procaccia, I. (1983). Measuring the strangeness of strange attractors, Physica D: Nonlinear Phenomena 9(1-2): 189-208. Zbl0593.58024
- Hadid, A., Kouropteva, O. and Pietikäinen, M. (2002). Unsupervised learning using locally linear embedding: experiments with face pose analysis, 16th International Conference on Pattern Recognition, ICPR'02(1), Quebec City, Quebec, Canada, pp. 111-114.
- He, J., Ding, L., Jiang, L., Li, Z. and Hu, Q. (2014). Intrinsic dimensionality estimation based on manifold assumption, Journal of Visual Communication and Image Representation 25(5): 740-747.
- Hein, M. and Audibert, J. (2005). Intrinsic dimensionality estimation of submanifolds in ℝ^d, Machine Learning: Proceedings of the 22nd International Conference (ICML 2005), Bonn, Germany, pp. 289-296.
- Jenkins, O.C. and Mataric, M.J. (2004). A spatio-temporal extension to isomap nonlinear dimension reduction, 21st International Conference on Machine Learning, ICML(69), Banff, Alberta, Canada, pp. 441-448.
- Karbauskaitė, R. and Dzemyda, G. (2009). Topology preservation measures in the visualization of manifold-type multidimensional data, Informatica 20(2): 235-254. Zbl1180.68217
- Karbauskaitė, R. and Dzemyda, G. (2014). Geodesic distances in the intrinsic dimensionality estimation using packing numbers, Nonlinear Analysis: Modelling and Control 19(4): 578-591.
- Karbauskaitė, R., Dzemyda, G. and Marcinkevičius, V. (2008). Selecting a regularization parameter in the locally linear embedding algorithm, 20th International EURO Mini Conference on Continuous Optimization and Knowledge-Based Technologies (EurOPT2008), Neringa, Lithuania, pp. 59-64.
- Karbauskaitė, R., Dzemyda, G. and Marcinkevičius, V. (2010). Dependence of locally linear embedding on the regularization parameter, TEST: An Official Journal of the Spanish Society of Statistics and Operations Research 18(2): 354-376. Zbl1273.62139
- Karbauskaitė, R., Dzemyda, G. and Mazėtis, E. (2011). Geodesic distances in the maximum likelihood estimator of intrinsic dimensionality, Nonlinear Analysis: Modelling and Control 16(4): 387-402. Zbl1271.93148
- Karbauskaitė, R., Kurasova, O. and Dzemyda, G. (2007). Selection of the number of neighbours of each data point for the locally linear embedding algorithm, Information Technology and Control 36(4): 359-364.
- Kégl, B. (2003). Intrinsic dimension estimation using packing numbers, Advances in Neural Information Processing Systems, NIPS(15), Cambridge, MA, USA, pp. 697-704.
- Kouropteva, O., Okun, O. and Pietikäinen, M. (2002). Selection of the optimal parameter value for the locally linear embedding algorithm, 1st International Conference on Fuzzy Systems and Knowledge Discovery, FSKD(1), Singapore, pp. 359-363.
- Kulczycki, P. and Łukasik, S. (2014). An algorithm for reducing the dimension and size of a sample for data exploration procedures, International Journal of Applied Mathematics and Computer Science 24(1): 133-149, DOI: 10.2478/amcs-2014-0011. Zbl1292.93044
- Lee, J.A. and Verleysen, M. (2007). Nonlinear Dimensionality Reduction, Springer, New York, NY. Zbl1128.68024
- Levina, E. and Bickel, P.J. (2005). Maximum likelihood estimation of intrinsic dimension, in L.K. Saul, Y. Weiss and L. Bottou (Eds.), Advances in Neural Information Processing Systems 17, MIT Press, Cambridge, MA, pp. 777-784.
- Levina, E., Wagaman, A.S., Callender, A.F., Mandair, G.S. and Morris, M.D. (2007). Estimating the number of pure chemical components in a mixture by maximum likelihood, Journal of Chemometrics 21(1-2): 24-34.
- Li, S. Z., Xiao, R., Li, Z. and Zhang, H. (2001). Nonlinear mapping from multi-view face patterns to a Gaussian distribution in a low dimensional space, IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems (RATFG-RTS), Vancouver, BC, Canada, pp. 47-54.
- Mo, D. and Huang, S.H. (2012). Fractal-based intrinsic dimension estimation and its application in dimensionality reduction, IEEE Transactions on Knowledge and Data Engineering 24(1): 59-71.
- Nene, S.A., Nayar, S.K. and Murase, H. (1996). Columbia object image library (COIL-20), Technical Report CUCS-005-96, Columbia University, New York, NY.
- Niskanen, M. and Silven, O. (2003). Comparison of dimensionality reduction methods for wood surface inspection, 6th International Conference on Quality Control by Artificial Vision, QCAV(5132), Gatlinburg, TN, USA, pp. 178-188.
- Fan, M., Qiao, H. and Zhang, B. (2009). Intrinsic dimension estimation of manifolds by incising balls, Pattern Recognition 42(5): 780-787. Zbl1162.68405
- Roweis, S.T. and Saul, L.K. (2000). Nonlinear dimensionality reduction by locally linear embedding, Science 290(5500): 2323-2326.
- Saul, L.K. and Roweis, S.T. (2003). Think globally, fit locally: Unsupervised learning of low dimensional manifolds, Journal of Machine Learning Research 4: 119-155. Zbl1093.68089
- Shin, Y.J. and Park, C.H. (2011). Analysis of correlation based dimension reduction methods, International Journal of Applied Mathematics and Computer Science 21(3): 549-558, DOI: 10.2478/v10006-011-0043-9. Zbl1230.68173
- Tenenbaum, J.B., de Silva, V. and Langford, J.C. (2000). A global geometric framework for nonlinear dimensionality reduction, Science 290(5500): 2319-2323.
- van der Maaten, L.J.P. (2007). An introduction to dimensionality reduction using MATLAB, Technical Report MICC 07-07, Maastricht University, Maastricht.
- Varini, C., Nattkemper, T. W., Degenhard, A. and Wismuller, A. (2004). Breast MRI data analysis by LLE, Proceedings of the 2004 IEEE International Joint Conference on Neural Networks, Montreal, Canada, Vol. 3, pp. 2449-2454.
- Verveer, P. and Duin, R. (1995). An evaluation of intrinsic dimensionality estimators, IEEE Transactions on Pattern Analysis and Machine Intelligence 17(1): 81-86.
- Weinberger, K.Q. and Saul, L.K. (2006). Unsupervised learning of image manifolds by semidefinite programming, International Journal of Computer Vision 70(1): 77-90.
- Yang, M.-H. (2002). Face recognition using extended isomap, IEEE International Conference on Image Processing, ICIP(2), Rochester, NY, USA, pp. 117-120.
- Yata, K. and Aoshima, M. (2010). Intrinsic dimensionality estimation of high-dimension, low sample size data with d-asymptotics, Communications in Statistics-Theory and Methods 39(8-9): 1511-1521. Zbl1318.62204
- Zhang, J., Li, S.Z. and Wang, J. (2004). Nearest manifold approach for face recognition, 6th IEEE International Conference on Automatic Face and Gesture Recognition, Seoul, South Korea, pp. 223-228.
- Zhang, Z. and Zha, H. (2004). Principal manifolds and nonlinear dimensionality reduction via local tangent space alignment, SIAM Journal on Scientific Computing 26(1): 313-338. Zbl1077.65042