Latent Semantic Indexing using eigenvalue analysis for efficient information retrieval

Cherukuri Kumar; Suripeddi Srinivas

International Journal of Applied Mathematics and Computer Science (2006)

  • Volume: 16, Issue: 4, page 551-558
  • ISSN: 1641-876X

Abstract

top
Text retrieval using Latent Semantic Indexing (LSI) with truncated Singular Value Decomposition (SVD) has been intensively studied in recent years. However, the expensive complexity involved in computing truncated SVD constitutes a major drawback of the LSI method. In this paper, we demonstrate how matrix rank approximation can influence the effectiveness of information retrieval systems. Besides, we present an implementation of the LSI method based on an eigenvalue analysis for rank approximation without computing truncated SVD, along with its computational details. Significant improvements in computational time while maintaining retrieval accuracy are observed over the tested document collections.

How to cite

top

Kumar, Cherukuri, and Srinivas, Suripeddi. "Latent Semantic Indexing using eigenvalue analysis for efficient information retrieval." International Journal of Applied Mathematics and Computer Science 16.4 (2006): 551-558. <http://eudml.org/doc/207813>.

@article{Kumar2006,
abstract = {Text retrieval using Latent Semantic Indexing (LSI) with truncated Singular Value Decomposition (SVD) has been intensively studied in recent years. However, the expensive complexity involved in computing truncated SVD constitutes a major drawback of the LSI method. In this paper, we demonstrate how matrix rank approximation can influence the effectiveness of information retrieval systems. Besides, we present an implementation of the LSI method based on an eigenvalue analysis for rank approximation without computing truncated SVD, along with its computational details. Significant improvements in computational time while maintaining retrieval accuracy are observed over the tested document collections.},
author = {Kumar, Cherukuri, Srinivas, Suripeddi},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {eigenvalues; latent semantic indexing; vector space method; rank reduction; information retrieval; singular value decomposition},
language = {eng},
number = {4},
pages = {551-558},
title = {Latent Semantic Indexing using eigenvalue analysis for efficient information retrieval},
url = {http://eudml.org/doc/207813},
volume = {16},
year = {2006},
}

TY - JOUR
AU - Kumar, Cherukuri
AU - Srinivas, Suripeddi
TI - Latent Semantic Indexing using eigenvalue analysis for efficient information retrieval
JO - International Journal of Applied Mathematics and Computer Science
PY - 2006
VL - 16
IS - 4
SP - 551
EP - 558
AB - Text retrieval using Latent Semantic Indexing (LSI) with truncated Singular Value Decomposition (SVD) has been intensively studied in recent years. However, the expensive complexity involved in computing truncated SVD constitutes a major drawback of the LSI method. In this paper, we demonstrate how matrix rank approximation can influence the effectiveness of information retrieval systems. Besides, we present an implementation of the LSI method based on an eigenvalue analysis for rank approximation without computing truncated SVD, along with its computational details. Significant improvements in computational time while maintaining retrieval accuracy are observed over the tested document collections.
LA - eng
KW - eigenvalues; latent semantic indexing; vector space method; rank reduction; information retrieval; singular value decomposition
UR - http://eudml.org/doc/207813
ER -

References

top
  1. April K. and Pottenger W.M. (2006): A framework for understanding latent semantic indexing performance. - J. Inf. Process. Manag., Vol. 42, No. 1, pp. 56-73. 
  2. Aswani Kumar Ch., Gupta A., Batool M. and Trehan S. (2005): An information retrieval model based on latent semantic indexing with intelligent preprocessing. - J. Inf. Knowl. Manag., Vol. 4, No. 4, pp. 1-7. 
  3. Aswani Kumar Ch. and Srinivas S. (2006): On the effect of rank approximation on information retrieval. - Proc. Int. Conf. s Systemics Cybernetics and Informatics, Hyderabad, India, pp. 876-880. 
  4. Balinski J. and Danilowicz C. (2005): Ranking method based on inter document distances. - Inf. Process. Manag., Vol. 41, No. 4,pp. 759-775. Zbl1080.68583
  5. Bass D. and Behrens C. (2003): Distributed LSI: Scalable concept based information retrieval with high semantic resolution. - Proc. 2003 Text Mining Workshop, San Francisco, CA, USA, pp. 72-82. 
  6. Bast H. and Weber I. (2005): Insights from viewing ranked retrieval as rank aggregation. - Proc. Workshop s Challenges in Web Information Retrieval and Integration, WIRI05, Tokyo, Japan, pp. 232-239. 
  7. Berry M.W. (1992): Large scale singular value computations. - Int. J. Super Comput. Appli., Vol. 6, No. 1, pp. 13-49. 
  8. Berry M.W. and Dumasis S.T. (1995): Using linear algebra for intelligent information retrieval. - SIAM Rev., Vol. 37, No. 4, pp. 573-995. 
  9. Berry M.W., Drmac Z. and Jessup E.R. (1999): Matrices, vector spaces, and information retrieval. - SIAM Rev., Vol. 41, No. 2, pp. 335-362. Zbl0924.68069
  10. Berry M.W. and Shakhina A.P. (2005): Computing sparse reduced-rank approximation to sparse matrices. - ACM Trans. Math. Software, 2005, Vol. 31, No. 2, pp. 252-269. Zbl1070.65539
  11. Cheng X.Z. and Lafferty J.(2006): A risk minimization framework for information retrieval. - Inf. Process. Manag., Vol. 42, No. 1, pp. 31-55. Zbl1101.68544
  12. Deerwester S. (1990): Indexing by latent semantic analysis. - J. Ameri. Soci. Inf. Sci., Vol. 41, No. 6, pp. 391-407. 
  13. Fan J., Ravi K., Littman M.L. and Santosh V. (1999): Efficient singular value decomposition via document samplings. - Tech. Rep. CS-1999-5, Dept. Computer Science, Duke University, North Carolina. 
  14. Gao J. and Zhang J. (2003): Sparsification strategies in latent semantic indexing. - Proc. 2003 Mining Workshop, San Francisco, CA, USA, pp. 93-103. 
  15. Gao J. and Zhang J. (2005): Clustered SVD strategies in latent semantic indexing. - Inf. Process. Manag., Vol. 41, No. 5, pp. 1051-1063. Zbl1101.68538
  16. Golub G.H. and Van Loan C.F. (1996): Matrix Computations. -Baltimore: The John Hopkins University Press. 
  17. Hua Y. (2000): Searching beyond SVD for rank reduction. - Proc. IEEE Workshop s Sensor Array and Multi Channel Signal Processing, Cambridge, MA, USA, pp. 395-397. 
  18. Husbands P., Simon H. and Ding C. (2001): On the use of singular value decomposition for text retrieval. - SIAM Comput. Inf. Retrieval, pp. 145-156. Zbl0995.68045
  19. Husbands P. and Ding C. (2005): Term norm distribution and its effects on latent semantic indexing. - Inf. Process. Manag., Vol. 41, No. 4, pp. 777-787. 
  20. Kontostathis A. and Pottenger W.M. (2002a): Detecting patterns in the latent semantic indexing term-term matrix. - Proc. Workshop s Foundations of Data Mining and Discovery, IEEE Int. Conf. Data Mining, (ICDM 02), Maeybaski, Japan. 
  21. Kontostathis A. and Pottenger W.M. (2002b): A mathematical view of latent semantic indexing: Tracing term co-occurrences. - Tech. Report, LU-CSE-02-006, Lehigh University. 
  22. Landauer T.K., Foltz P.W. and Laham D. (1998): Introduction to latent semantic analysis. - Discourse Processes, Vol. 25, pp. 259-284. 
  23. Park H. and Elden L. (2003): Matrix rank reduction for data analysis and feature extraction. - Tech. Rep., Dept. Computer Science and Engineering, University of Minnesota. 
  24. Praks P., Dvorsky J., Snasel V. and Cernohorsky J.D. (2003): On SVD free latent semantic indexing for image retrieval for application in a hard real time environment. - Proc. IEEE Int. Conf.Industrial Technology, Maribor, Slovenia, pp. 466-471. 
  25. Schisler M.L. (2004): Latent semantic indexing: using eigenvalues and the singular value decomposition in lexicographical search techniques. - Tech. Rep., University of Missouri. 
  26. Tang J.T. (2003): Application of principle direction divisive partitioning and SVD in information retrieval. - Masters Proj. Rep., Dept. Computer Science, University of Kentucky, Lexington, KY. 
  27. Yates R.B. and Neto B.R. (1999): Modern Information Retrieval. -New Delhi: Pearson Education. 
  28. Ye Y.Q. (2000): Comparing matrix methods in text-based information retrieval. - Tech. Rep., School of Mathematical Sciences, Peking University. 
  29. Zang Z., Zha H. and Simon H. (2002): Low rank approximations with sparse factors: Basic algorithms and error analysis. - SIAM J. Matrix Anal. Appl., Vol. 23, No. 3, pp. 706-727. Zbl1003.65041

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

    
                

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.