Multidimensional term indexing for efficient processing of complex queries

Michal Krátký; Tomáš Skopal; Václav Snášel

Kybernetika (2004)

  • Volume: 40, Issue: 3, page [381]-396
  • ISSN: 0023-5954

Abstract

top
The area of Information Retrieval deals with problems of storage and retrieval within a huge collection of text documents. In IR models, the semantics of a document is usually characterized using a set of terms. A common need to various IR models is an efficient term retrieval provided via a term index. Existing approaches of term indexing, e. g. the inverted list, support efficiently only simple queries asking for a term occurrence. In practice, we would like to exploit some more sophisticated querying mechanisms, in particular queries based on regular expressions. In this article we propose a multidimensional approach of term indexing providing efficient term retrieval and supporting regular expression queries. Since the term lengths are usually different, we also introduce an improvement based on a new data structure, called BUB-forest, providing even more efficient term retrieval.

How to cite

top

Krátký, Michal, Skopal, Tomáš, and Snášel, Václav. "Multidimensional term indexing for efficient processing of complex queries." Kybernetika 40.3 (2004): [381]-396. <http://eudml.org/doc/33707>.

@article{Krátký2004,
abstract = {The area of Information Retrieval deals with problems of storage and retrieval within a huge collection of text documents. In IR models, the semantics of a document is usually characterized using a set of terms. A common need to various IR models is an efficient term retrieval provided via a term index. Existing approaches of term indexing, e. g. the inverted list, support efficiently only simple queries asking for a term occurrence. In practice, we would like to exploit some more sophisticated querying mechanisms, in particular queries based on regular expressions. In this article we propose a multidimensional approach of term indexing providing efficient term retrieval and supporting regular expression queries. Since the term lengths are usually different, we also introduce an improvement based on a new data structure, called BUB-forest, providing even more efficient term retrieval.},
author = {Krátký, Michal, Skopal, Tomáš, Snášel, Václav},
journal = {Kybernetika},
keywords = {term indexing; complex queries; multidimensional data structures; BUB-forest; term indexing; complex queries; multidimensional data structure; BUB-forest},
language = {eng},
number = {3},
pages = {[381]-396},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Multidimensional term indexing for efficient processing of complex queries},
url = {http://eudml.org/doc/33707},
volume = {40},
year = {2004},
}

TY - JOUR
AU - Krátký, Michal
AU - Skopal, Tomáš
AU - Snášel, Václav
TI - Multidimensional term indexing for efficient processing of complex queries
JO - Kybernetika
PY - 2004
PB - Institute of Information Theory and Automation AS CR
VL - 40
IS - 3
SP - [381]
EP - 396
AB - The area of Information Retrieval deals with problems of storage and retrieval within a huge collection of text documents. In IR models, the semantics of a document is usually characterized using a set of terms. A common need to various IR models is an efficient term retrieval provided via a term index. Existing approaches of term indexing, e. g. the inverted list, support efficiently only simple queries asking for a term occurrence. In practice, we would like to exploit some more sophisticated querying mechanisms, in particular queries based on regular expressions. In this article we propose a multidimensional approach of term indexing providing efficient term retrieval and supporting regular expression queries. Since the term lengths are usually different, we also introduce an improvement based on a new data structure, called BUB-forest, providing even more efficient term retrieval.
LA - eng
KW - term indexing; complex queries; multidimensional data structures; BUB-forest; term indexing; complex queries; multidimensional data structure; BUB-forest
UR - http://eudml.org/doc/33707
ER -

References

top
  1. Baeza-Yates R., Ribeiro-Neto B., Modern Information Retrieval, Addison Wesley, New York 1999 
  2. Bayer R., The iniversal B-tree for multidimensional indexing: General concepts, In: Proc. World-Wide Computing and its Applications’97, WWCA’97, Tsukuba 1997 
  3. Beckmann N., Kriegel H., Schneider, R., Seeger B., The R*-tree: An effcient and robust access method for points and rectangles, In: Proc. 1990 ACM SIGMOD Internat. Conference on Management of Data, Atlantic City, NJ 1990, pp. 322–331 (1990) 
  4. Böhm C., Berchtold, S., Keim D. A., Searching in high-dimensional spaces – Index structures for improving the performance of multimedia databases, ACM, 2002 
  5. Crochemore M., Rytter W., Text Algorithms, Oxford University Press, Oxford 1994 Zbl1078.68151MR1307378
  6. Deppisch U., S-tree: a dynamic balanced signature index for office retrieval, In: Proc. 9th ACM SIGIR Conference, Pisa 1986, pp. 77–87 (1986) 
  7. Dvorský J., Krátký M., Skopal, T., Snášel V., Term indexing in information retrieval systems, In: Proc. CIC’03, CSREA Press, Las Vegas 2003 
  8. Faloutsos C., 10.1109/32.6184, IEEE Trans. Software Engrg. 14 (1988), 10, xxx–xxx (1988) Zbl0659.68125MR0962343DOI10.1109/32.6184
  9. Fenk R., The BUB-Tree, In: Proc. 28rd VLDB Internat. Conference on VLDB. Hongkong 2002 
  10. Fenk R., Markl, V., Bayer R., Improving multidimensional range queries of non rectangular volumes specified by a query box set, In: Proc. Internat. Symposium on Database, Web and Cooperative Systems (DWACOS), Baden-Baden 1999 
  11. Ferragina P., Grossi R., A fully-dynamic data structure for external substring search, In: Proc. ACM Symposium on Theory of Computing, 1995 Zbl0978.68513
  12. Guttman A., R-Trees: A dynamic index structure for spatial searching, In: Proc. ACM SIGMOD 1984, ACM Press, Boston 1984, pp. 47–57 (1984) 
  13. Manolopoulos Y., Theodoridis, Y., Tsotras V., Advanced Database Indexing, Kluwer, Dordrecht 2001 Zbl0943.68049
  14. Markl V., Mistral: Processing Relational Queries using a Multidimensional Access Technique, Ph.D. Thesis. Technical University München 1999,http://mistral.in.tum.de/results/publications/Mar99.pdf (1999) 
  15. NIST, Text REtrieval Conference (TREC), 2003, http://trec.nist.gov/ 
  16. Ramsak F., Markl V., Fenk R., Zirkel M., Elhardt, K., Bayer R., Integrating the UB-tree into a database system kernel, In: Proc. 26th VLDB Internat. Conference, Morgan Kaufmann, Cairo 2000 
  17. Roman S., Advanced Linear Algebra, Springer Verlag, Berlin 1995 Zbl1132.15002
  18. Salton G., McGill M. J., Introduction to Modern Information Retrieval, First edition. McGraw Hill, New York 1983 Zbl0523.68084
  19. Skopal T., Krátký M., Snášel, V., Pokorný J., On Range Queries in Universal B-trees, Technical Report No. ARG-TR-01-2003, Department of Computer Science, VŠB-Technical University of Ostrava 2003,http://www.cs.vsb.cz/arg/techreports/range.pdf 
  20. Stephen G. A., String Searching Algorithms, Lecture Notes Series on Computing, World Scientific, 1998 Zbl0831.68028MR1313914
  21. Wirth N., Algorithms and Data Structures, Prentice Hall, Englewood Cliffs, N. J. 1984 Zbl0910.68053MR0808586
  22. Witten I. H., Moffat, A., Bell T. C., Managing Gigabytes, Compressing and Indexing Documents and Images, Van Nostrand Reinhold, New York 1994 Zbl0821.68051
  23. Consortium, W3, Extensible Markup Language (XML) 1, 0. 1998, http://www.w3.org/ TR/REC-xml (1998) 

NotesEmbed ?

top

You must be logged in to post comments.

To embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.

Only the controls for the widget will be shown in your chosen language. Notes will be shown in their authored language.

Tells the widget how many notes to show per page. You can cycle through additional notes using the next and previous controls.

    
                

Note: Best practice suggests putting the JavaScript code just before the closing </body> tag.