PDF Enhancements Tools for a Digital Library

Hatlapatka, Radim; Sojka, Petr

  • Towards a Digital Mathematics Library. Paris, France, July 7-8th, 2010, Publisher: Masaryk University Press (Brno, Czech Republic), pages 45-55

Abstract

This paper describes several innovative PDF document enhancements and tools that can be used when building a digital library. The main result is pdfJbIm, a PDF re-compression tool built on the jbig2enc encoder. It reduces the size of the original bitonal PDFs by one third on average. Some modifications to the jbig2enc encoder that increase the compression ratio even further are also described. Together with another program, pdfsizeopt.py by Péter Szabó, we have decreased PDF storage size to such an extent that the transmission needs of a digital library were significantly reduced. We report the storage savings achieved on the Czech Digital Mathematics Library (DML-CZ): the PDF corpus was downsized to 43% of its original size. We also describe the pdfsign tool for batch digital signature stamping of PDF documents.

How to cite

Hatlapatka, Radim, and Sojka, Petr. "PDF Enhancements Tools for a Digital Library." Towards a Digital Mathematics Library. Paris, France, July 7-8th, 2010. Brno, Czech Republic: Masaryk University Press, 2010. 45-55. <http://eudml.org/doc/219951>.

@inProceedings{Hatlapatka2010,
abstract = {This paper describes several innovative PDF document enhancements and tools that can be used when building a digital library. The main result is pdfJbIm, a PDF re-compression tool built on the jbig2enc encoder. It reduces the size of the original bitonal PDFs by one third on average. Some modifications to the jbig2enc encoder that increase the compression ratio even further are also described. Together with another program, pdfsizeopt.py by Péter Szabó, we have decreased PDF storage size to such an extent that the transmission needs of a digital library were significantly reduced. We report the storage savings achieved on the Czech Digital Mathematics Library (DML-CZ): the PDF corpus was downsized to 43% of its original size. We also describe the pdfsign tool for batch digital signature stamping of PDF documents.},
author = {Hatlapatka, Radim and Sojka, Petr},
booktitle = {Towards a Digital Mathematics Library. Paris, France, July 7-8th, 2010},
keywords = {jbig2enc; JBIG2; PDF size optimization; compression; DML; digital signature; JB2; DjVu; pdfsign; DML-CZ; EuDML; pdfsizeopt.py; Google; JB2 algorithm},
location = {Brno, Czech Republic},
pages = {45--55},
publisher = {Masaryk University Press},
title = {PDF Enhancements Tools for a Digital Library},
url = {http://eudml.org/doc/219951},
year = {2010},
}

TY - CPAPER
AU - Hatlapatka, Radim
AU - Sojka, Petr
TI - PDF Enhancements Tools for a Digital Library
T2 - Towards a Digital Mathematics Library. Paris, France, July 7-8th, 2010
PY - 2010
CY - Brno, Czech Republic
PB - Masaryk University Press
SP - 45
EP - 55
AB - This paper describes several innovative PDF document enhancements and tools that can be used when building a digital library. The main result is pdfJbIm, a PDF re-compression tool built on the jbig2enc encoder. It reduces the size of the original bitonal PDFs by one third on average. Some modifications to the jbig2enc encoder that increase the compression ratio even further are also described. Together with another program, pdfsizeopt.py by Péter Szabó, we have decreased PDF storage size to such an extent that the transmission needs of a digital library were significantly reduced. We report the storage savings achieved on the Czech Digital Mathematics Library (DML-CZ): the PDF corpus was downsized to 43% of its original size. We also describe the pdfsign tool for batch digital signature stamping of PDF documents.
KW - jbig2enc; JBIG2; PDF size optimization; compression; DML; digital signature; JB2; DjVu; pdfsign; DML-CZ; EuDML; pdfsizeopt.py; Google; JB2 algorithm
UR - http://eudml.org/doc/219951
ER -

