Latent Semantic Indexing for patent documents

Andreea Moldovan; Radu Boţ; Gert Wanka

International Journal of Applied Mathematics and Computer Science (2005)

  • Volume: 15, Issue: 4, pages 551-560
  • ISSN: 1641-876X

Abstract

Since the huge database of patent documents is continuously growing, classifying, updating and retrieving patent documents has become an acute necessity. We therefore investigate the efficiency of applying Latent Semantic Indexing (LSI), an automatic indexing method from information retrieval, to some classes of patent documents from the United States Patent Classification System. We present experiments that determine the optimal number of dimensions for the latent semantic space, and we compare the performance of LSI with that of the Vector Space Model (VSM) on real-life text documents, namely patent documents. However, we do not strongly recommend LSI as an improved alternative to the VSM, since its results are not significantly better.
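The abstract compares LSI with the VSM without spelling out either. The following is a minimal sketch, not the authors' implementation, assuming raw term counts (no weighting, stemming or stop-word removal), cosine similarity and NumPy's `numpy.linalg.svd`; the vocabulary `TERMS`, the matrix `A` and the helper names are hypothetical. The VSM scores each document by the cosine between the query vector and that document's column of the term-by-document matrix A. LSI first truncates the singular value decomposition A = U S V^T to k dimensions and then compares the query and the documents after projecting both onto the first k left singular vectors.

```python
import numpy as np

# Hypothetical toy term-by-document matrix A (rows = terms, columns = documents)
# holding raw term counts; illustrative only, not data from the paper.
TERMS = ["engine", "piston", "cylinder", "motor", "battery", "cell"]
A = np.array([
    [1, 1, 0, 0],  # engine
    [1, 0, 0, 0],  # piston
    [0, 1, 0, 0],  # cylinder
    [1, 1, 1, 0],  # motor
    [0, 0, 1, 1],  # battery
    [0, 0, 0, 1],  # cell
], dtype=float)


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is zero."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return float(u @ v / (nu * nv))


def vsm_scores(A, q):
    """VSM: cosine between the query vector and each document column of A."""
    return [cosine(q, A[:, j]) for j in range(A.shape[1])]


def lsi_scores(A, q, k):
    """LSI: compare query and documents in the rank-k space of A's SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U @ diag(s) @ Vt
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    # Document j in k dimensions: column j of S_k V_k^T (equal to Uk.T @ A[:, j]).
    docs_k = (np.diag(sk) @ Vtk).T
    # Fold the query into the same space by projecting onto the k left singular vectors.
    q_k = Uk.T @ q
    return [cosine(q_k, d) for d in docs_k]


if __name__ == "__main__":
    q = np.zeros(len(TERMS))
    q[TERMS.index("engine")] = 1.0  # one-term query: "engine"
    print("VSM:", [round(x, 3) for x in vsm_scores(A, q)])
    print("LSI (k=2):", [round(x, 3) for x in lsi_scores(A, q, k=2)])
```

With the one-term query "engine", the VSM gives the third document (which contains "motor" but not "engine") a score of zero, whereas its rank-2 LSI score is positive because "motor" co-occurs with "engine" in the first two documents. Projecting the query and the documents with the same U_k keeps them comparable. The truncation level k is the quantity whose best value the paper determines experimentally; one plausible approach is to sweep k and measure retrieval quality on queries with known relevant documents.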

How to cite


Moldovan, Andreea, Boţ, Radu, and Wanka, Gert. "Latent Semantic Indexing for patent documents." International Journal of Applied Mathematics and Computer Science 15.4 (2005): 551-560. <http://eudml.org/doc/207766>.

@article{Moldovan2005,
abstract = {Since the huge database of patent documents is continuously increasing, the issue of classifying, updating and retrieving patent documents turned into an acute necessity. Therefore, we investigate the efficiency of applying Latent Semantic Indexing, an automatic indexing method of information retrieval, to some classes of patent documents from the United States Patent Classification System. We present some experiments that provide the optimal number of dimensions for the Latent Semantic Space and we compare the performance of Latent Semantic Indexing (LSI) to the Vector Space Model (VSM) technique applied to real life text documents, namely, patent documents. However, we do not strongly recommend the LSI as an improved alternative method to the VSM, since the results are not significantly better.},
author = {Moldovan, Andreea and Boţ, Radu and Wanka, Gert},
journal = {International Journal of Applied Mathematics and Computer Science},
keywords = {patent classification; Latent Semantic Indexing (LSI); Vector Space Model (VSM); Singular Value Decomposition (SVD); vector space model},
language = {eng},
number = {4},
pages = {551--560},
title = {Latent Semantic Indexing for patent documents},
url = {http://eudml.org/doc/207766},
volume = {15},
year = {2005},
}

TY  - JOUR
AU  - Moldovan, Andreea
AU  - Boţ, Radu
AU  - Wanka, Gert
TI  - Latent Semantic Indexing for patent documents
JO  - International Journal of Applied Mathematics and Computer Science
PY  - 2005
VL  - 15
IS  - 4
SP  - 551
EP  - 560
AB  - Since the huge database of patent documents is continuously increasing, the issue of classifying, updating and retrieving patent documents turned into an acute necessity. Therefore, we investigate the efficiency of applying Latent Semantic Indexing, an automatic indexing method of information retrieval, to some classes of patent documents from the United States Patent Classification System. We present some experiments that provide the optimal number of dimensions for the Latent Semantic Space and we compare the performance of Latent Semantic Indexing (LSI) to the Vector Space Model (VSM) technique applied to real life text documents, namely, patent documents. However, we do not strongly recommend the LSI as an improved alternative method to the VSM, since the results are not significantly better.
LA  - eng
KW  - patent classification; Latent Semantic Indexing (LSI); Vector Space Model (VSM); Singular Value Decomposition (SVD); vector space model
UR  - http://eudml.org/doc/207766
ER  - 

