//DEVGURU

Michał Bielawski @ November 18th, 2009

How to convert doc/xls/ppt/… to pdf in Unix console

Today we needed to build a tool that converts Microsoft Office documents (and preferably other formats) to PDF from a Rails application. We came up with the following solution: install OpenOffice.org, CUPS and CUPS-PDF (a virtual PDF printer for CUPS) and run it like this:

soffice -headless -pt Cups-PDF file.doc

This command runs OpenOffice.org in so-called ‘headless’ mode, meaning that it does not open any graphical interface. It reads file.doc and prints it using the CUPS-PDF virtual printer. By default, the resulting PDF ends up in the ~/PDF directory.
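For use from a script, the command above could be wrapped in a small shell function. This is only a sketch: the function name doc_to_pdf and the SOFFICE and PDF_PRINTER variables are made up for illustration and are not OpenOffice.org or CUPS settings. The flags are the ones shown above.

```shell
#!/bin/sh
# Hypothetical wrapper around the command from this post.
# SOFFICE and PDF_PRINTER are assumed overrides, handy for testing;
# by default the function runs: soffice -headless -pt Cups-PDF <file>

doc_to_pdf() {
    input=$1
    # Fail early instead of handing a missing file to OpenOffice.org.
    if [ ! -f "$input" ]; then
        echo "doc_to_pdf: no such file: $input" >&2
        return 1
    fi
    # Headless mode, print to the named (virtual) printer.
    "${SOFFICE:-soffice}" -headless -pt "${PDF_PRINTER:-Cups-PDF}" "$input"
}
```

Calling doc_to_pdf file.doc should then behave like the one-liner, with the PDF written wherever CUPS-PDF is configured to put it.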

3 comments to “How to convert doc/xls/ppt/… to pdf in Unix console”

  1. Michal wrote:

    how about performance? seems heavy.

  2. neongrau wrote:

    I tried this solution, but at least on Mac OS X it ended up showing a dialog for choosing a printer.
    Although the CUPS-PDF printer was preselected, that wasn’t a viable solution for me.

    After researching a bit more I came across a small Java toolset called “JODConverter” from http://www.artofsolving.com/opensource/jodconverter, which allows batch conversion by starting OOo headless, listening on a TCP/IP port:

    soffice -headless -accept="socket,port=8100;urp;"

    and then running the cli tool:
    java -jar jodconverter-cli-2.2.2.jar word.doc word.pdf

    or other Office Files (even pptx or xlsx)
    java -jar jodconverter-cli-2.2.2.jar powerpoint.pptx powerpoint.pdf

    which should be much “lighter” since OOo doesn’t need to be started up each time.

    Memory usage for the OOo “service” was 30MB after start and 80-95MB after it had processed one or more files.
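The batch idea from this comment might look roughly like the sketch below. It assumes the OOo service from the comment is already listening and that jodconverter-cli-2.2.2.jar sits in the current directory. The function names and the JOD_JAR variable are invented for illustration. Output names are derived by swapping the last extension, so every input is assumed to have one.

```shell
#!/bin/sh
# Convert one file through the already-running OOo service.
# JOD_JAR is an assumed variable pointing at the jar.
jod_convert() {
    java -jar "${JOD_JAR:-jodconverter-cli-2.2.2.jar}" "$1" "$2"
}

# Swap the last extension for .pdf, e.g. powerpoint.pptx becomes powerpoint.pdf.
# Assumes the file name actually has an extension.
pdf_name() {
    printf '%s.pdf\n' "${1%.*}"
}

# Convert every argument; keep going on errors and report failure at the end.
convert_all() {
    status=0
    for input in "$@"; do
        if ! jod_convert "$input" "$(pdf_name "$input")"; then
            echo "conversion failed: $input" >&2
            status=1
        fi
    done
    return "$status"
}
```

With this in place, convert_all word.doc powerpoint.pptx would issue the same two java commands as above, one after the other.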

  3. madsheep wrote:

    neongrau, in the end we did something very similar, but without any external tool. It seems the new OOo (3.1) is clever enough to find its own running instances. First we started it:

    soffice -headless

    and left it running. Now each time we run our command to reprocess the file (in this case from a Paperclip processor), it uses the open OOo instance instead of creating a new one. If you have a problem with the printer dialog, consider passing the -p option instead of -pt; this will choose the default printer.
    Michal, actually it’s not heavy at all: once the first OOo instance is left open, it handles a single document in about 500ms (let’s say a standard docx with 2-3 pages). For me that’s pretty quick, considering other options such as Scribd or any other external service :)
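A rough sketch of the "start it once and leave it there" approach follows. The function names, the SOFFICE variable and the pgrep pattern are all assumptions. In particular, the pattern is a guess at how the process shows up in the process list and may need adjusting per platform.

```shell
#!/bin/sh
# Succeeds if a headless OOo process already appears to be running.
# The pattern is a guess at the process command line, not a documented name.
office_running() {
    pgrep -f 'soffice.*-headless' >/dev/null 2>&1
}

# Start the long-lived instance only when none is found, then leave it
# in the background so later print commands can reuse it.
ensure_office() {
    if office_running; then
        return 0
    fi
    "${SOFFICE:-soffice}" -headless >/dev/null 2>&1 &
}
```

Calling ensure_office before each conversion keeps the startup cost to the first call. The later soffice -headless -p file.doc (or -pt) invocations should then be picked up by the running instance, as described in the comment.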
