PDF, PS and DjVu
This article covers software to view, edit and convert PDF, PostScript (PS), DjVu (déjà vu) and XPS files.
Engines
- Poppler — PDF rendering library based on Xpdf. For CJK (Chinese, Japanese, Korean) support with Poppler, install poppler-data.
- MuPDF — Lightweight PDF, XPS and EPUB viewer, consisting of a software library, command-line tools and viewers.
- libspectre — Small library for rendering PostScript documents.
- Ghostscript — Interpreter for PostScript and PDF. Provides the gs(1) command-line interface (see also /usr/share/doc/ghostscript/*/Use.htm, also available online), along with many wrapper scripts like ps2pdf and pdf2ps.
- DjVuLibre — Suite to create, manipulate and view DjVu documents.
- libgxps — GObject-based library for handling and rendering XPS documents.
Viewers
Framebuffer
- fbgs — Poor man's PostScript/PDF viewer for the Linux framebuffer console.
- fbpdf — Small framebuffer PDF and DjVu viewer written in C and based on MuPDF, with Vim keybindings.
- jfbview — Framebuffer PDF and image viewer. Features include Vim-like controls, zoom-to-fit, a TOC (outline) view and fast multi-threaded rendering.
Graphical
- Adobe Reader — Proprietary PDF file viewer offered by Adobe. Discontinued for Linux.
- DjView — Viewer for DjVu documents.
- Foxit Reader — Small, fast (compared to Acrobat) proprietary PDF viewer. Discontinued for Linux.
- gv — Graphical user interface for the Ghostscript interpreter that allows viewing and navigating PostScript and PDF documents.
- Zathura — Highly customizable and functional document viewer (plugin based). Supports PDF, DjVu, PostScript and comic book files.
Comparison
An asterisk next to a library denotes an optional dependency that must be installed for the specified feature.
| Name | PDF | PostScript | DjVu | XPS | PDF forms | PDF annotation | License |
|---|---|---|---|---|---|---|---|
| Adobe Reader | Custom | | | | | | |
| apvlv | Poppler | | DjVuLibre | | | | |
| Atril | Poppler | libspectre | DjVuLibre | libgxps | | | |
| DjView | | | DjVuLibre | | | | |
| Emacs | Ghostscript* | Ghostscript* | DjVuLibre* | | | | GPLv3 |
| ePDFView | Poppler | | | | | | |
| Evince | Poppler | libspectre | DjVuLibre | libgxps | | | |
| Foxit Reader | Custom | | | | | | |
| gv | Ghostscript | Ghostscript | | | | | GPLv3 |
| llpp | libmupdf | | | libmupdf | | | GPLv3 |
| MuPDF | Custom | | | Custom | | | |
| Okular | Poppler | libspectre | DjVuLibre | Custom | | | |
| pdfpc | Poppler | | | | | | |
| qpdfview | Poppler | libspectre* | DjVuLibre* | | | | |
| Xpdf | Custom | | | | | | GPLv3 |
| Xreader | Poppler | libspectre* | DjVuLibre* | libgxps* | | | |
| Zathura | Poppler* / libmupdf* | libspectre* | DjVuLibre* | libmupdf* | No | | |
PDF forms
The PDF forms column in the above table refers to AcroForms support. If you do not need your input to be directly extractable from the PDF, you can also use the applications in #Annotation or #Graphical PDF editing to put text on top of a PDF. PDF forms can be created with LibreOffice Writer (View > Toolbars > Form Controls) and the advanced PDF editors.
The proprietary and deprecated XFA format for forms is only partially supported by Poppler and is fully supported only by Adobe Reader and Master PDF Editor.
Alternatively, web browsers such as Firefox or Chromium feature a built-in PDF viewer capable of filling out forms.
Graphical PDF editing
- Scribus can import and export PDF; text is imported as polygons.
- LibreOffice Draw can import and export PDF; text is imported as text; embedded fonts are substituted.
- Inkscape can import a single page from a PDF and export to PDF; text is imported as cloned glyphs or text; with the latter embedded fonts are substituted.
- Graphics editors like GIMP and can also import and export PDFs at the cost of rasterization.
Basic editors
- PDFsam — Open-source application written in Java that supports merging, splitting and rotating. https://pdfsam.org/ || pdfsam (AUR)
Cropping tools
- briss — Java GUI to crop the pages of PDF documents to one or more selected regions.
Advanced editors
PDF tools
See also Ghostscript.
- DiffPDF — Compare the text or the visual appearance of each page in two PDF files.
- Stapler — Light alternative to PDFtk using the PyPDF2 library.
Concatenate PDFs
With Ghostscript:
$ gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=out.pdf 1.pdf 2.pdf 3.pdf
With PDFtk:
$ pdftk 1.pdf 2.pdf 3.pdf cat output out.pdf
With Poppler:
$ pdfunite 1.pdf 2.pdf 3.pdf out.pdf
With QPDF:
$ qpdf --empty --pages 1.pdf 2.pdf 3.pdf -- out.pdf
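The commands above can be wrapped in a small bash function that uses whichever of these tools happens to be installed. This is only a sketch: the name pdf_concat and the fallback order (QPDF, then Poppler's pdfunite, then Ghostscript) are arbitrary choices, not something any of these packages provides.

```shell
# Hypothetical helper (not part of any package): concatenate PDFs with the
# first available tool, trying qpdf, then pdfunite (Poppler), then gs.
# Usage: pdf_concat OUTPUT INPUT1 INPUT2 [INPUT...]
pdf_concat() {
    if [ "$#" -lt 3 ]; then
        echo "usage: pdf_concat OUTPUT INPUT1 INPUT2 [INPUT...]" >&2
        return 1
    fi
    local out=$1
    shift
    if command -v qpdf >/dev/null 2>&1; then
        qpdf --empty --pages "$@" -- "$out"
    elif command -v pdfunite >/dev/null 2>&1; then
        pdfunite "$@" "$out"
    elif command -v gs >/dev/null 2>&1; then
        gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="$out" "$@"
    else
        echo "pdf_concat: none of qpdf, pdfunite or gs was found" >&2
        return 1
    fi
}
```

For example:

$ pdf_concat out.pdf 1.pdf 2.pdf 3.pdf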
Convert a PDF to text
With Poppler, preserving the original physical layout:
$ pdftotext -layout in.pdf out.txt
See also .
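To convert a whole directory at once, the same pdftotext call can be run in a loop. The following is a sketch assuming bash and Poppler; the function name pdf_to_text_all is hypothetical.

```shell
# Hypothetical helper: run "pdftotext -layout" on every *.pdf in DIR
# (default: current directory), writing name.txt next to each name.pdf.
pdf_to_text_all() {
    local dir=${1:-.} f
    for f in "$dir"/*.pdf; do
        # If nothing matches, the glob stays literal: skip it.
        [ -e "$f" ] || continue
        pdftotext -layout "$f" "${f%.pdf}.txt" || return
    done
    return 0
}
```

For example, to convert all PDFs in ~/papers:

$ pdf_to_text_all ~/papers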
Decrypt a PDF
This section lists commands to decrypt a PDF to an unencrypted file. Note that most PDF viewers also support encrypted PDFs.
With PDFtk:
$ pdftk in.pdf input_pw password output out.pdf
With Poppler to PostScript:
$ pdftops -upw password in.pdf out.ps
With QPDF:
$ qpdf --decrypt --password=password in.pdf out.pdf
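Typing the password directly on the command line stores it in the shell history. A possible workaround, sketched below for bash with a hypothetical function name, is to prompt for it first. Note that the password is still passed as an argument to qpdf, so it remains briefly visible to other local users in the process list.

```shell
# Hypothetical helper: prompt for the password without echoing it, so it is
# not saved in the shell history, then decrypt with qpdf. The password still
# appears in qpdf's arguments (visible in ps(1)) while qpdf runs.
# Usage: pdf_decrypt INPUT OUTPUT
pdf_decrypt() {
    if [ "$#" -ne 2 ]; then
        echo "usage: pdf_decrypt INPUT OUTPUT" >&2
        return 1
    fi
    local pw
    read -r -s -p "Password for $1: " pw
    echo >&2
    qpdf --decrypt --password="$pw" "$1" "$2"
}
```

For example:

$ pdf_decrypt in.pdf out.pdf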
Encrypt a PDF
The user password is used for encryption, i.e. it is needed to open the document. The owner password restricts operations once the document has been decrypted. For more information, see Wikipedia:PDF#Security and signatures.
With PDFtk:
$ pdftk in.pdf output out.pdf user_pw password
With PoDoFo:
$ podofoencrypt -u user_password -o owner_password in.pdf out.pdf
With QPDF:
$ qpdf --encrypt user_password owner_password key_length -- in.pdf out.pdf
where key_length can be 40, 128 or 256.
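Because qpdf only accepts those three key lengths, a wrapper can reject anything else before calling it. The sketch below assumes bash; the function name pdf_encrypt and the 256-bit default (which QPDF implements with AES) are choices made for this example.

```shell
# Hypothetical helper around "qpdf --encrypt": validates the key length
# and defaults to 256 bits when none is given.
# Usage: pdf_encrypt INPUT OUTPUT USER_PW OWNER_PW [40|128|256]
pdf_encrypt() {
    if [ "$#" -lt 4 ]; then
        echo "usage: pdf_encrypt INPUT OUTPUT USER_PW OWNER_PW [40|128|256]" >&2
        return 1
    fi
    local key_length=${5:-256}
    case $key_length in
        40|128|256) ;;
        *) echo "pdf_encrypt: key length must be 40, 128 or 256" >&2; return 1 ;;
    esac
    qpdf --encrypt "$3" "$4" "$key_length" -- "$1" "$2"
}
```

For example:

$ pdf_encrypt in.pdf out.pdf user_password owner_password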
Extract images from a PDF
With Poppler, saving DCT-encoded images as JPEG files:
$ pdfimages -j infile.pdf outfileroot
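pdfimages also has an -all option that writes each image in its native format where possible. The following bash sketch (hypothetical function name) puts all extracted images into a separate directory so they do not clutter the working directory.

```shell
# Hypothetical helper: extract every image of INPUT into OUTDIR
# (default: INPUT without .pdf, plus "-images"), using "pdfimages -all".
# Usage: pdf_extract_images INPUT [OUTDIR]
pdf_extract_images() {
    local in=$1 outdir=${2:-${1%.pdf}-images}
    if [ ! -f "$in" ]; then
        echo "pdf_extract_images: '$in' not found" >&2
        return 1
    fi
    mkdir -p -- "$outdir" || return
    pdfimages -all "$in" "$outdir/img"
}
```

To only list the images without extracting them, use:

$ pdfimages -list infile.pdf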
Extract page range from PDF, split multipage PDF document
With Ghostscript as a single file:
$ gs -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -dSAFER -dFirstPage=first -dLastPage=last -sOutputFile=outfile.pdf infile.pdf
With PDFtk as a single file:
$ pdftk infile.pdf cat first-last output outfile.pdf
With Poppler as separate files:
$ pdfseparate -f first -l last infile.pdf outfileroot-%d.pdf
With QPDF as a single file:
$ qpdf --empty --pages infile.pdf first-last -- outfile.pdf
With mutool as a single file:
$ mutool clean -g infile.pdf outfile.pdf first-last
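To split a long document into chunks of a fixed number of pages, the QPDF command above can be repeated for each page range. In this bash sketch, both function names are hypothetical; the page count is read with qpdf --show-npages, and each output file is named after its range (e.g. book-1-10.pdf).

```shell
# Hypothetical helper: print "first-last" page ranges covering pages
# 1..TOTAL in chunks of at most SIZE pages, one range per line.
page_chunks() {
    local total=$1 size=$2 first=1 last
    case $size in
        ''|*[!0-9]*|0*) return 1 ;;  # SIZE must be a positive integer
    esac
    while [ "$first" -le "$total" ]; do
        last=$((first + size - 1))
        if [ "$last" -gt "$total" ]; then
            last=$total
        fi
        printf '%s-%s\n' "$first" "$last"
        first=$((last + 1))
    done
}

# Hypothetical helper: split INPUT into files of SIZE pages each with qpdf.
# Usage: pdf_split_chunks INPUT SIZE
pdf_split_chunks() {
    local in=$1 size=$2 total range
    total=$(qpdf --show-npages "$in") || return
    page_chunks "$total" "$size" | while read -r range; do
        qpdf --empty --pages "$in" "$range" -- "${in%.pdf}-$range.pdf" || exit
    done
}
```

For example, page_chunks 25 10 prints 1-10, 11-20 and 21-25, so the following would produce book-1-10.pdf, book-11-20.pdf and book-21-25.pdf for a 25-page book.pdf:

$ pdf_split_chunks book.pdf 10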
Imposing a PDF
PDF imposition (e.g. combining multiple pages onto one sheet) can be done with pdfjam. For example, pdfnup can be used to reduce paper waste, and pdfbook can arrange PDFs into a format suitable for book binding.
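As a sketch of the paper-saving case, pdfjam itself can place two pages side by side on each landscape sheet. The function name below is hypothetical and bash plus an installed pdfjam (part of TeX Live) are assumed.

```shell
# Hypothetical helper: put two pages side by side on each landscape sheet
# with pdfjam, roughly halving the number of printed sheets.
# Usage: pdf_2up INPUT OUTPUT
pdf_2up() {
    if [ "$#" -ne 2 ] || [ ! -f "$1" ]; then
        echo "usage: pdf_2up INPUT OUTPUT (INPUT must exist)" >&2
        return 1
    fi
    pdfjam --nup 2x1 --landscape --outfile "$2" "$1"
}
```

For example:

$ pdf_2up in.pdf out-2up.pdf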
Optimize, reduce size of a PDF
With Ghostscript one of:
$ ps2pdf -dPDFSETTINGS=/screen in.pdf out.pdf
$ gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer -sOutputFile=out.pdf in.pdf
For the other -dPDFSETTINGS presets (/ebook, /prepress, /default) and further options, see the Ghostscript documentation.
There is also , a script wrapping gs.
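Re-encoding does not always make a file smaller, for example when it contains mostly text. The bash sketch below (hypothetical function name; the /ebook default is an arbitrary choice) runs the Ghostscript command above and falls back to an unchanged copy when the result is not smaller.

```shell
# Hypothetical helper: compress INPUT with a Ghostscript -dPDFSETTINGS preset
# and keep the original content if the result is not smaller.
# Usage: pdf_shrink INPUT OUTPUT [screen|ebook|printer|prepress|default]
pdf_shrink() {
    local in=$1 out=$2 preset=${3:-ebook} before after
    case $preset in
        screen|ebook|printer|prepress|default) ;;
        *) echo "pdf_shrink: unknown preset '$preset'" >&2; return 1 ;;
    esac
    if [ ! -f "$in" ] || [ -z "$out" ]; then
        echo "usage: pdf_shrink INPUT OUTPUT [PRESET]" >&2
        return 1
    fi
    gs -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
        -dPDFSETTINGS="/$preset" -sOutputFile="$out" "$in" || return
    # $(( )) strips any padding wc may add around the byte count.
    before=$(( $(wc -c < "$in") ))
    after=$(( $(wc -c < "$out") ))
    if [ "$after" -ge "$before" ]; then
        echo "pdf_shrink: no gain ($after >= $before bytes), copying input" >&2
        cp -- "$in" "$out"
    else
        echo "pdf_shrink: $before -> $after bytes" >&2
    fi
}
```

For example:

$ pdf_shrink in.pdf out.pdf screen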
Rasterize a PDF
With GraphicsMagick to convert a specific page (the page index is zero-based):
$ gm convert -density dpi infile.pdf[page] outfile.jpg
With Poppler to convert all pages:
$ pdftoppm -jpeg -r dpi infile.pdf outfileroot
With Poppler to convert a specific page:
$ pdftoppm -jpeg -r dpi -f page -singlefile infile.pdf outfileroot
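The single-page pdftoppm command can be wrapped so that the page number and resolution are checked first. This is a bash sketch with hypothetical function names; it writes PNG instead of JPEG and uses 1-based page numbers, as pdftoppm does.

```shell
# Hypothetical helper: succeed only for a positive integer without
# leading zeros.
is_pos_int() {
    case $1 in
        ''|*[!0-9]*|0*) return 1 ;;
    esac
}

# Hypothetical helper: render page PAGE of INPUT at DPI to OUTROOT.png.
# Usage: pdf_page_to_png INPUT PAGE DPI OUTROOT
pdf_page_to_png() {
    if [ "$#" -ne 4 ] || ! is_pos_int "$2" || ! is_pos_int "$3"; then
        echo "usage: pdf_page_to_png INPUT PAGE DPI OUTROOT" >&2
        return 1
    fi
    pdftoppm -png -r "$3" -f "$2" -l "$2" -singlefile "$1" "$4"
}
```

For example, to render page 3 at 150 DPI to page3.png:

$ pdf_page_to_png infile.pdf 3 150 page3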
Splitting PDF pages
With mupdf-tools to split every page vertically into two pages:
$ mutool poster -y 2 in.pdf out.pdf
This can be used to undo simple imposition.
Add an image such as signature.png to a page of a PDF
An image can be added at any location in a PDF with ImageMagick (convert), xv and pdftk. A wrapper script and other hints are available online.
Removing annotations from a PDF
With rewritepdf.pl:
$ rewritepdf.pl -C in.pdf out.pdf
See https://superuser.com/a/1051543 for more information.
DjVu tools
- DjVuLibre provides many command-line tools, such as the ones used below.
Convert DjVu to images
Break a DjVu document into separate pages:
$ djvmcvt -i input.djvu /path/to/out/dir output-index.djvu
Convert DjVu pages into images:
$ ddjvu --format=tiff page.djvu page.tiff
Convert DjVu pages into PDF:
$ ddjvu --format=pdf inputfile.djvu outputfile.pdf
You can also use --page to export specific pages:
$ ddjvu --format=tiff --page=1-10 input.djvu output.tiff
This converts pages 1 to 10 into a single multi-page TIFF file.
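To get one TIFF file per page instead, ddjvu can be called once per page. The bash sketch below (hypothetical function name) assumes that djvused -e n prints the number of pages of the document.

```shell
# Hypothetical helper: write each page of a DjVu document to its own TIFF
# file (OUTROOT-1.tiff, OUTROOT-2.tiff, ...).
# Usage: djvu_split_tiff INPUT [OUTROOT]
djvu_split_tiff() {
    local in=$1 outroot=${2:-page} n i
    if [ ! -f "$in" ]; then
        echo "djvu_split_tiff: '$in' not found" >&2
        return 1
    fi
    n=$(djvused -e n "$in") || return
    i=1
    while [ "$i" -le "$n" ]; do
        ddjvu --format=tiff --page="$i" "$in" "$outroot-$i.tiff" || return
        i=$((i + 1))
    done
}
```

For example:

$ djvu_split_tiff input.djvu page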
Processing images
You can use to:
- fix orientation
- split pages
- deskew
- crop
- adjust margins
Make DjVu from images
There is a useful script, img2djvu:
$ img2djvu -c1 -d600 -v1 ./out
This creates a 600 DPI DjVu document from all files in the ./out directory.
Alternatively, you can try , which seems to create smaller files especially on images with well defined background.
PostScript tools
ps2pdf
ps2pdf is a wrapper around Ghostscript to convert PostScript to PDF:
$ ps2pdf -sPAPERSIZE=a4 -dOptimize=true -dEmbedAllFonts=true YourPSFile.ps
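Explanation:
- -sPAPERSIZE=a4 sets the paper size to A4.
- -dOptimize=true requests an optimized PDF.
- -dEmbedAllFonts=true embeds all fonts, so the PDF displays correctly on systems that lack them.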
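To convert a whole directory of PostScript files with the same options, a loop like the following bash sketch can be used. The function name is hypothetical; the output file name is passed explicitly so that each name.pdf is written next to its name.ps.

```shell
# Hypothetical helper: convert every *.ps file in DIR (default: current
# directory) to PDF with the ps2pdf options shown above.
ps2pdf_all() {
    local dir=${1:-.} f
    for f in "$dir"/*.ps; do
        # If nothing matches, the glob stays literal: skip it.
        [ -e "$f" ] || continue
        ps2pdf -sPAPERSIZE=a4 -dOptimize=true -dEmbedAllFonts=true \
            "$f" "${f%.ps}.pdf" || return
    done
    return 0
}
```

For example:

$ ps2pdf_all ~/scans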
Libraries
Python
- pdfrw — A pure Python library that reads and writes PDFs.