Batch converting .ppt and .doc files to PDF using a UNIX one-liner and JODConverter

Problem
I wanted to batch convert presentation and document files (.ppt, .pptx, .odt) to PDF format. I discovered a gem of open source software called JODConverter, which converts files between various formats. Alas, I could not find an easy way to do batch conversion with it. The following is a simple solution using the UNIX command line.
Setup
My solution was tested on Ubuntu 9.04 / Bash / Java 6.0 / OpenOffice 3.0 and JODConverter version 2.2.2. You need Java and OpenOffice for this solution to work, so install them if you do NOT have them already. Then download JODConverter-x.x.x.zip and unzip it to some directory. No installation is required.
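A quick sanity check for the prerequisites could look like the sketch below. The helper name missing_tools is made up for illustration, and the command names java and soffice are assumptions about how the two programs are installed on your machine.

```shell
# missing_tools is a hypothetical helper: it prints every command name
# from its arguments that cannot be found on the current PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed command names: "java" for Java and "soffice" for OpenOffice.
missing=$(missing_tools java soffice)
if [ -n "$missing" ]; then
    echo "Not found on PATH: $missing"
fi
```

If nothing is printed, both commands were found. This only checks that the commands exist, not that their versions match the ones listed above.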
Steps
  1. First close any running OpenOffice instances and restart OpenOffice in headless server mode. You should also refer to the JODConverter README for any newer instructions.
    $ soffice -headless -accept="socket,port=8100;urp;"

    This could take a few minutes to initialize.

  2. Set up variables in your current working shell.

    $ JODDIR=<unzipped path of the JODConverter directory>
    $ SEARCHDIR=<directory to search for PPTs/ ODTs>

    I used the following values

    $ JODDIR=/tmp/jodconverter-2.2.2/
    $ SEARCHDIR=/home/ppts/

    Note that I am purposely NOT exporting the values. These are used as simple bash (shell) variables.

  3. Now execute the following magical one-liner by copying and pasting it into your shell.

    This code searches the specified SEARCHDIR for all files with the extensions .ppt, .pptx, .doc, .docx or .odt and runs the converter on every file. The converted output is placed in the same directory as the input file. NOTE that the solution handles filenames with uppercase or lowercase extensions and filenames containing spaces. The code asks for confirmation before processing each file. Once you are confident in the code, you can replace the -ok switch to find with -exec.

    $ find "$SEARCHDIR" -iregex '.*\.\(pptx?\|docx?\|odt\)' -ok java -jar "$JODDIR/lib/jodconverter-cli-2.2.2.jar" -f pdf '{}' \;

    There is a lot of scope for error checking, but I am leaving that out as it just complicates a beautiful one-liner.

  4. If you want to specify a separate OUTPUTDIR for all the generated PDFs, run the following messier version.

    Before running this, make sure to set OUTPUTDIR in addition to the other two variables, JODDIR and SEARCHDIR. Be wary of the quoting jungle in the script below. Try this out only if you understand it!

    $ find "$SEARCHDIR" -iregex '.*\.\(pptx?\|docx?\|odt\)' -exec sh -c "basename '{}' | cut -d\".\" -f1 | awk -v q=\"'\" '{print q \"$OUTPUTDIR\" \"/\" \$0 \".pdf\" q }' | xargs --interactive java -jar  $JODDIR/lib/jodconverter-cli-2.2.2.jar '{}' " \;

    You can skip the confirmation for every file by removing the --interactive switch from xargs.

  5. Kill the soffice server to go back to using OpenOffice normally.
    $ killall -9 soffice.bin
  6. Note that it is easy to tweak the above solution to support conversion between any formats that JODConverter supports.
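
For readers who find the quoting in step 4 hard to follow, here is a rough sketch of the same idea written as two small shell functions. The names pdf_path and convert_all are made up for this example, and I am assuming the "input output" argument order that the step 4 command already relies on. It is a dry run: it only prints the commands it would execute.

```shell
# pdf_path is a hypothetical helper, not part of JODConverter.
# $1 = input file, $2 = output directory; prints the target PDF path.
pdf_path() {
    name=$(basename "$1")
    # ${name%.*} strips only the last extension; ${2%/} drops a trailing slash.
    printf '%s/%s.pdf\n' "${2%/}" "${name%.*}"
}

# convert_all is also hypothetical.
# $1 = search dir, $2 = output dir, $3 = path to the jodconverter-cli jar.
convert_all() {
    find "$1" -type f \( -iname '*.ppt' -o -iname '*.pptx' \
        -o -iname '*.doc' -o -iname '*.docx' -o -iname '*.odt' \) |
    while IFS= read -r f; do
        # Dry run: remove "echo" to actually run the converter.
        echo java -jar "$3" "$f" "$(pdf_path "$f" "$2")"
    done
}
```

You would call it as convert_all "$SEARCHDIR" "$OUTPUTDIR" "$JODDIR/lib/jodconverter-cli-2.2.2.jar". Unlike the cut-based version, a file called My.Talk.ppt keeps its full base name here. Filenames containing newlines are not handled by this sketch.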