Extending Full Text Search Engine for Mathematical Content

Mišutka, Jozef; Galamboš, Leo

  • Towards Digital Mathematics Library, Birmingham, United Kingdom, July 27th, 2008. Publisher: Masaryk University (Brno), pages 55-67

Abstract

The World Wide Web has become the main resource of mathematical knowledge. Currently available full-text search engines can be used on these documents, but they are deficient in almost all cases. By applying axioms and equivalent transformations, and by using different notations, each formula can be expressed in numerous ways. Most of these documents do not contain semantic information; therefore, precise mathematical interpretation is impossible. On the other hand, where semantic information is available, it can help produce more precise results. In this work we address these issues and present a new technique for searching for mathematical formulae in real-world mathematical documents that still offers an extensible level of mathematical awareness. The technique exploits the advantages of full-text search engines and stores each formula not just once but in several generalised representations. Because it is designed as an extension, any full-text search engine can adopt it. Based on the proposed theory, we developed EgoMath, a new mathematical search engine. Experiments with EgoMath over two document sets containing semantic information showed that this technique can be used to build a fully fledged mathematical search engine.
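The central idea, indexing each formula under several generalised representations so that a query can match at different levels of precision, can be illustrated with a toy Python sketch. It assumes formulae arrive as simple linear strings. The placeholders `CONST` and `VAR`, the two generalisation levels, and the names `tokenize`, `generalise`, and `FormulaIndex` are hypothetical illustrations, not EgoMath's actual rules or API; a real system would store these representations as terms in an existing full-text index rather than in a Python dictionary.

```python
import re
from collections import defaultdict

# Hypothetical token classes; the paper's actual generalisation rules are not given here.
NUMBER = re.compile(r"\d+(?:\.\d+)?$")
IDENT = re.compile(r"[a-zA-Z]$")  # assumption: only single-letter variables are generalised


def tokenize(formula: str) -> list[str]:
    """Split a linear formula such as 'x^2 + 3*y' into tokens (deliberately simplistic)."""
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z]+|\S", formula)


def generalise(tokens: list[str]) -> list[str]:
    """Return several representations of one formula, from most specific to most general.

    Level 0: the original tokens.
    Level 1: numeric constants replaced by the placeholder 'CONST'.
    Level 2: additionally, single-letter variables replaced by 'VAR'.
    """
    level0 = tokens
    level1 = ["CONST" if NUMBER.match(t) else t for t in level0]
    level2 = ["VAR" if IDENT.match(t) else t for t in level1]
    # dict.fromkeys drops duplicate representations while preserving their order.
    return list(dict.fromkeys(" ".join(r) for r in (level0, level1, level2)))


class FormulaIndex:
    """Toy inverted index mapping each representation string to a set of document ids."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, formula: str) -> None:
        # Store the formula once per generalised representation.
        for rep in generalise(tokenize(formula)):
            self.postings[rep].add(doc_id)

    def search(self, query: str) -> list[tuple[str, set[str]]]:
        """Generalise the query the same way and report matches, most specific first."""
        results = []
        for rep in generalise(tokenize(query)):
            if rep in self.postings:
                results.append((rep, self.postings[rep]))
        return results


if __name__ == "__main__":
    index = FormulaIndex()
    index.add("d1", "x^2 + 3*y")
    index.add("d2", "a^2 + 5*b")
    for rep, docs in index.search("x^2 + 3*y"):
        print(rep, sorted(docs))
```

Here the query `x^2 + 3*y` matches `d1` exactly, while its fully generalised form `VAR ^ CONST + CONST * VAR` also matches `d2`. A query such as `a^2 + 7*c` finds no exact or constant-generalised match, but its fully generalised form matches both documents. Ranking the more specific matches higher is one plausible design choice for such a scheme.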

How to cite


Mišutka, Jozef, and Galamboš, Leo. "Extending Full Text Search Engine for Mathematical Content." Towards Digital Mathematics Library. Birmingham, United Kingdom, July 27th, 2008. Brno: Masaryk University, 2008. 55-67. <http://eudml.org/doc/220081>.

@inProceedings{Misutka2008,
author = {Mišutka, Jozef and Galamboš, Leo},
booktitle = {Towards Digital Mathematics Library. Birmingham, United Kingdom, July 27th, 2008},
keywords = {mathematical discourse; EgoMath},
location = {Brno},
pages = {55--67},
publisher = {Masaryk University},
title = {Extending Full Text Search Engine for Mathematical Content},
url = {http://eudml.org/doc/220081},
year = {2008},
}

TY  - CLSWK
AU  - Mišutka, Jozef
AU  - Galamboš, Leo
TI  - Extending Full Text Search Engine for Mathematical Content
T2  - Towards Digital Mathematics Library. Birmingham, United Kingdom, July 27th, 2008
PY  - 2008
CY  - Brno
PB  - Masaryk University
SP  - 55
EP  - 67
KW  - mathematical discourse; EgoMath
UR  - http://eudml.org/doc/220081
ER  -

