Multidimensional term indexing for efficient processing of complex queries
Michal Krátký; Tomáš Skopal; Václav Snášel
Kybernetika (2004)
- Volume: 40, Issue: 3, page [381]-396
- ISSN: 0023-5954
Access Full Article
topAbstract
topHow to cite
topKrátký, Michal, Skopal, Tomáš, and Snášel, Václav. "Multidimensional term indexing for efficient processing of complex queries." Kybernetika 40.3 (2004): [381]-396. <http://eudml.org/doc/33707>.
@article{Krátký2004,
abstract = {The area of Information Retrieval deals with problems of storage and retrieval within a huge collection of text documents. In IR models, the semantics of a document is usually characterized using a set of terms. A common need to various IR models is an efficient term retrieval provided via a term index. Existing approaches of term indexing, e. g. the inverted list, support efficiently only simple queries asking for a term occurrence. In practice, we would like to exploit some more sophisticated querying mechanisms, in particular queries based on regular expressions. In this article we propose a multidimensional approach of term indexing providing efficient term retrieval and supporting regular expression queries. Since the term lengths are usually different, we also introduce an improvement based on a new data structure, called BUB-forest, providing even more efficient term retrieval.},
author = {Krátký, Michal, Skopal, Tomáš, Snášel, Václav},
journal = {Kybernetika},
keywords = {term indexing; complex queries; multidimensional data structures; BUB-forest; term indexing; complex queries; multidimensional data structure; BUB-forest},
language = {eng},
number = {3},
pages = {[381]-396},
publisher = {Institute of Information Theory and Automation AS CR},
title = {Multidimensional term indexing for efficient processing of complex queries},
url = {http://eudml.org/doc/33707},
volume = {40},
year = {2004},
}
TY - JOUR
AU - Krátký, Michal
AU - Skopal, Tomáš
AU - Snášel, Václav
TI - Multidimensional term indexing for efficient processing of complex queries
JO - Kybernetika
PY - 2004
PB - Institute of Information Theory and Automation AS CR
VL - 40
IS - 3
SP - [381]
EP - 396
AB - The area of Information Retrieval deals with problems of storage and retrieval within a huge collection of text documents. In IR models, the semantics of a document is usually characterized using a set of terms. A common need to various IR models is an efficient term retrieval provided via a term index. Existing approaches of term indexing, e. g. the inverted list, support efficiently only simple queries asking for a term occurrence. In practice, we would like to exploit some more sophisticated querying mechanisms, in particular queries based on regular expressions. In this article we propose a multidimensional approach of term indexing providing efficient term retrieval and supporting regular expression queries. Since the term lengths are usually different, we also introduce an improvement based on a new data structure, called BUB-forest, providing even more efficient term retrieval.
LA - eng
KW - term indexing; complex queries; multidimensional data structures; BUB-forest; term indexing; complex queries; multidimensional data structure; BUB-forest
UR - http://eudml.org/doc/33707
ER -
References
top- Baeza-Yates R., Ribeiro-Neto B., Modern Information Retrieval, Addison Wesley, New York 1999
- Bayer R., The iniversal B-tree for multidimensional indexing: General concepts, In: Proc. World-Wide Computing and its Applications’97, WWCA’97, Tsukuba 1997
- Beckmann N., Kriegel H., Schneider, R., Seeger B., The R*-tree: An effcient and robust access method for points and rectangles, In: Proc. 1990 ACM SIGMOD Internat. Conference on Management of Data, Atlantic City, NJ 1990, pp. 322–331 (1990)
- Böhm C., Berchtold, S., Keim D. A., Searching in high-dimensional spaces – Index structures for improving the performance of multimedia databases, ACM, 2002
- Crochemore M., Rytter W., Text Algorithms, Oxford University Press, Oxford 1994 Zbl1078.68151MR1307378
- Deppisch U., S-tree: a dynamic balanced signature index for office retrieval, In: Proc. 9th ACM SIGIR Conference, Pisa 1986, pp. 77–87 (1986)
- Dvorský J., Krátký M., Skopal, T., Snášel V., Term indexing in information retrieval systems, In: Proc. CIC’03, CSREA Press, Las Vegas 2003
- Faloutsos C., 10.1109/32.6184, IEEE Trans. Software Engrg. 14 (1988), 10, xxx–xxx (1988) Zbl0659.68125MR0962343DOI10.1109/32.6184
- Fenk R., The BUB-Tree, In: Proc. 28rd VLDB Internat. Conference on VLDB. Hongkong 2002
- Fenk R., Markl, V., Bayer R., Improving multidimensional range queries of non rectangular volumes specified by a query box set, In: Proc. Internat. Symposium on Database, Web and Cooperative Systems (DWACOS), Baden-Baden 1999
- Ferragina P., Grossi R., A fully-dynamic data structure for external substring search, In: Proc. ACM Symposium on Theory of Computing, 1995 Zbl0978.68513
- Guttman A., R-Trees: A dynamic index structure for spatial searching, In: Proc. ACM SIGMOD 1984, ACM Press, Boston 1984, pp. 47–57 (1984)
- Manolopoulos Y., Theodoridis, Y., Tsotras V., Advanced Database Indexing, Kluwer, Dordrecht 2001 Zbl0943.68049
- Markl V., Mistral: Processing Relational Queries using a Multidimensional Access Technique, Ph.D. Thesis. Technical University München 1999,http://mistral.in.tum.de/results/publications/Mar99.pdf (1999)
- NIST, Text REtrieval Conference (TREC), 2003, http://trec.nist.gov/
- Ramsak F., Markl V., Fenk R., Zirkel M., Elhardt, K., Bayer R., Integrating the UB-tree into a database system kernel, In: Proc. 26th VLDB Internat. Conference, Morgan Kaufmann, Cairo 2000
- Roman S., Advanced Linear Algebra, Springer Verlag, Berlin 1995 Zbl1132.15002
- Salton G., McGill M. J., Introduction to Modern Information Retrieval, First edition. McGraw Hill, New York 1983 Zbl0523.68084
- Skopal T., Krátký M., Snášel, V., Pokorný J., On Range Queries in Universal B-trees, Technical Report No. ARG-TR-01-2003, Department of Computer Science, VŠB-Technical University of Ostrava 2003,http://www.cs.vsb.cz/arg/techreports/range.pdf
- Stephen G. A., String Searching Algorithms, Lecture Notes Series on Computing, World Scientific, 1998 Zbl0831.68028MR1313914
- Wirth N., Algorithms and Data Structures, Prentice Hall, Englewood Cliffs, N. J. 1984 Zbl0910.68053MR0808586
- Witten I. H., Moffat, A., Bell T. C., Managing Gigabytes, Compressing and Indexing Documents and Images, Van Nostrand Reinhold, New York 1994 Zbl0821.68051
- Consortium, W3, Extensible Markup Language (XML) 1, 0. 1998, http://www.w3.org/ TR/REC-xml (1998)
NotesEmbed ?
topTo embed these notes on your page include the following JavaScript code on your page where you want the notes to appear.