Workflow of Metadata Extraction from Retro-Born Digital Documents
Tkaczyk, Dominika; Bolikowski, Łukasz
Towards a Digital Mathematics Library. Bertinoro, Italy, July 20-21st, 2011. Publisher: Masaryk University Press (Brno, Czech Republic), pages 39-44
How to cite
Tkaczyk, Dominika, and Bolikowski, Łukasz. "Workflow of Metadata Extraction from Retro-Born Digital Documents." Towards a Digital Mathematics Library. Bertinoro, Italy, July 20-21st, 2011. Brno, Czech Republic: Masaryk University Press, 2011. 39-44. <http://eudml.org/doc/221804>.
@inProceedings{Tkaczyk2011,
abstract = {In this work-in-progress report we propose a workflow for metadata extraction from articles in a digital form. We decompose the problem into clearly defined sub-tasks and outline possible implementations of the sub-tasks. We report the progress of implementation and tests, and state future work.},
author = {Tkaczyk, Dominika and Bolikowski, Łukasz},
booktitle = {Towards a Digital Mathematics Library. Bertinoro, Italy, July 20-21st, 2011},
keywords = {metadata extraction; page segmentation; zone classification; Hidden Markov Model},
location = {Brno, Czech Republic},
pages = {39-44},
publisher = {Masaryk University Press},
title = {Workflow of Metadata Extraction from Retro-Born Digital Documents},
url = {http://eudml.org/doc/221804},
year = {2011},
}
TY - CLSWK
AU - Tkaczyk, Dominika
AU - Bolikowski, Łukasz
TI - Workflow of Metadata Extraction from Retro-Born Digital Documents
T2 - Towards a Digital Mathematics Library. Bertinoro, Italy, July 20-21st, 2011
PY - 2011
CY - Brno, Czech Republic
PB - Masaryk University Press
SP - 39
EP - 44
AB - In this work-in-progress report we propose a workflow for metadata extraction from articles in a digital form. We decompose the problem into clearly defined sub-tasks and outline possible implementations of the sub-tasks. We report the progress of implementation and tests, and state future work.
KW - metadata extraction; page segmentation; zone classification; Hidden Markov Model
UR - http://eudml.org/doc/221804
ER -
References
- iText, http://itextpdf.com/.
- MARG, http://marg.nlm.nih.gov/. Zbl1143.68407
- PDFBox, http://pdfbox.apache.org/.
- Automating the production of bibliographic records for MEDLINE, Tech. rep. (2001).
- Cui, B., Chen, X., An improved hidden Markov model for literature metadata extraction, Advanced Intelligent Computing Theories and Applications. pp. 205–212 (2010).
- Hetzner, E., A simple method for citation metadata extraction using Hidden Markov Models, In: JCDL ’08: Proceedings of the 8th ACM/IEEE-CS joint conference on Digital libraries. pp. 280–284. ACM, New York, NY, USA (2008).
- Marinai, S., Metadata Extraction from PDF Papers for Digital Library Ingest, 10th International Conference on Document Analysis and Recognition. pp. 251–255 (2009).
- Nagy, G., Seth, S., Viswanathan, M., A prototype document image analysis system for technical journals, Computer 25(7), 10–22 (1992).
- O’Gorman, L., The document spectrum for page layout analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 15(11), 1162–1173 (1993).
- Sojka, P., An Experience with Building Digital Open Access Repository DML-CZ, In: Proceedings of CASLIN 2009. pp. 74–78 (2009).
- Sutton, C., McCallum, A., An Introduction to Conditional Random Fields for Relational Learning (2006).