Wise Bison

project-based learning for fun

6 Open-Source Tools for Creating Professional Documents

Creating Professional Quality Documents with Free Software

Over the last ten years I’ve done a lot of technical writing for my college courses. Every week I write quizzes, lab assignments, and documentation, and now and then I write blog posts like this one. Taken all together it amounts to thousands of pages.

Since I’m lazy, I’m always looking for tools that will save time on document generation (and give me more time to watch Ninja Warrior). I also want to create good-looking documents that can be converted to HTML, PDF, and the occasional ePub when I need it. So far I haven’t found one magic piece of software that can do everything I need, but there is a very sweet cocktail of cool free software that will do the job. My toolkit keeps evolving and I’m always on the lookout for something better. In the meantime, here’s my current list of document-generation tools.

1 LyX

LyX is my number one writing tool. It’s a LaTeX-based document processor that helps you write documents in a structured way; it bills itself as the editor that does what you mean to do. As you’re writing, you mark your text according to what it means, and LyX will format it appropriately. This is very different from a word processor, which requires you to format your text yourself. If you write scientific documents, LyX can format any mathematical formula that can be printed by LaTeX.

LyX is a little different from Word or Pages, so expect to go through some rough patches as you go along.

I usually write my documents in LyX, then export them as HTML and PDFs. I’m writing this post in LyX, and I’ll be exporting it as a LyX document. Some other export formats are:

  • Postscript
  • DVI
  • LaTeX
  • RTF
  • Plain Text
  • EPS
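
Exporting doesn’t have to go through the menus: LyX can also export from the command line with its --export option. Here’s a minimal sketch, assuming the lyx binary is on your PATH. The function name export_lyx is my own invention, and the format names in the usage line (pdf2, xhtml) are the short names LyX commonly uses, so check the ones your installation lists before relying on them.

```shell
# LYX may be overridden (for example with "echo") to preview the commands
# instead of running them. Defaults to the lyx binary on your PATH.
LYX=${LYX:-lyx}

# Hypothetical helper: export one LyX file to each of the listed formats.
# Usage: export_lyx input.lyx pdf2 xhtml
export_lyx() {
    input=$1
    shift
    for fmt in "$@"
    do
        # Stop at the first format that fails to export.
        "$LYX" --export "$fmt" "$input" || return 1
    done
}
```

Each exported file lands next to the .lyx source, so this is handy for regenerating a whole folder of handouts in one go.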
Learn More about LyX

LyX runs on Windows, Linux, and OS X, and it’s free. Fortunately, you don’t need to know anything about LaTeX. LaTeX is one of the most complex pieces of software in existence, but LyX shields you from the pain. All you have to do is write.

You can get LyX at http://lyx.org.

2 elyxer.py

elyxer.py is a Python script that converts LyX files to HTML. Both elyxer.py and Python get installed when you install LyX, even on Windows. LyX uses Elyxer internally to convert LyX documents to HTML. You can also use Elyxer as a standalone program to stitch together your custom HTML pages, using your fancy HTML templates.
Here’s a simple HTML page (my Python course syllabus) generated using the default settings, including the benign Elyxer CSS: Elyxer Sample. It’s plain and readable. And here’s an example that uses a custom HTML5 template.
I, however, can’t leave well enough alone and I want to use my own CSS. By providing many, many options, Elyxer allows you to customize the final look of the HTML. Elyxer generates XHTML or HTML 4, but if you want HTML 5, you will need some of those options. Here’s the shell function I use to generate HTML 5 documents:
1 lyx2html5(){
2     html5_head=~/Dropbox/includes/html5_head.html
3     html5_footer=~/Dropbox/includes/html5_footer.html
4     if [ $# -lt 2 ]
5     then
6         echo 'Usage: lyx2html5 input.lyx "title"'
7     else
8         echo "Converting $1 to ${1%.lyx}.html"
9         echo '---'
10         /usr/bin/elyxer.py --iso885915 --nofooter --notoclabels \
                              --raw --title "$2" "$1" "${1%.lyx}.txt"
11         cat "$html5_head" "${1%.lyx}.txt" "$html5_footer" > "${1%.lyx}.html"
12         sed "s/THE_TITLE/$2/" < "${1%.lyx}.html" > tmpfile
13         mv tmpfile "${1%.lyx}.html"
14         rm "${1%.lyx}.txt"
15     fi
16 }
  • Lines 2 and 3: The files that contain the HTML 5 templates that will enclose the Elyxer-generated HTML.
  • Line 10: elyxer.py is called with several options, the most important being “--raw”, which tells elyxer.py to omit the XHTML/HTML 4 doctype.
  • Line 12: The HTML 5 template includes the placeholder text “THE_TITLE”, which sed replaces with $2, the second argument to the function lyx2html5().
You would add this function to your shell startup file (.bash_profile or .bashrc), and use it like this:
lyx2html5 somefile.lyx "This is the Title"
lyx2html5 will create a file with a “.html” extension, like this: somefile.html.
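
One caveat about the sed step in the function above: the title is pasted straight into the sed command, so a title containing a slash, an ampersand, or a backslash will either break the command or come out mangled. Here’s a minimal sketch of a safer substitution. The helper names escape_sed_replacement and fill_title are my own and are not part of Elyxer, and the sketch assumes the title is a single line.

```shell
# Hypothetical helper: escape the characters that are special in the
# replacement half of a sed s/// command (backslash, slash, ampersand).
escape_sed_replacement() {
    printf '%s' "$1" | sed 's|[\\/&]|\\&|g'
}

# Hypothetical helper: read HTML on stdin and replace the THE_TITLE
# placeholder with the given title, whatever characters it contains.
fill_title() {
    sed "s/THE_TITLE/$(escape_sed_replacement "$1")/"
}
```

You would use it in place of the plain sed call, for example: fill_title "Input/Output & More" < page.html > tmpfile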
Learn More about Elyxer
You can get Elyxer at http://elyxer.nongnu.org/.

3 Markdown

Markdown is a way to write a limited set of HTML tags in a simplified shorthand, which is converted to valid HTML by the Markdown engine. For tags not covered by Markdown, you can use plain HTML tags.
The big selling point for Markdown is simplicity, and the price you pay is that some common tags are not implemented. The other important aspect of Markdown in my document flow is that Octopress, my blogging platform, makes use of it.
Here’s an example of what a Markdown document would look like, and the HTML output generated:
Markdown:

A First Level Header
====================

A Second Level Header
---------------------

Now is the time for all good men to come to
the aid of their country. This is just a
regular paragraph.

The quick brown fox jumped over the lazy
dog’s back.

### Header 3

> This is a blockquote.
> 
> This is the second paragraph in the blockquote.
>
> ## This is an H2 in a blockquote

Output:

<h1>A First Level Header</h1>

<h2>A Second Level Header</h2>

<p>Now is the time for all good men to come to
the aid of their country. This is just a
regular paragraph.</p>

<p>The quick brown fox jumped over the lazy
dog’s back.</p>

<h3>Header 3</h3>

<blockquote>
    <p>This is a blockquote.</p>
    <p>This is the second paragraph in the blockquote.</p>
    <h2>This is an H2 in a blockquote</h2>
</blockquote>

Learn More About Markdown
You can learn about Markdown at http://daringfireball.net/projects/markdown/.

4 exitwp.py

When you want to replace Wordpress with something simpler, more reliable, and more secure, Octopress fills the bill. The first task is to migrate your brilliant content from Wordpress to the Markdown format used by Octopress. exitwp.py is the solution. It goes like this:
  1. Clone the exitwp Git archive:
    git clone https://github.com/thomasf/exitwp.git
    
  2. Export your Wordpress blog using the Wordpress exporter.
  3. Put your Wordpress XML file into the wordpress-xml directory inside the exitwp folder.
  4. Run xmllint to find errors; correct them.
  5. Inside the exitwp folder, run the converter: python exitwp.py
  6. Find your converted Markdown pages in the build directory. You can move them to your Octopress blog source directory.
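
The steps above can be strung together in one shell function. This is only a sketch under a few assumptions: git, xmllint, and python are installed, you pass the absolute path of your export file, and the names run and migrate_wordpress are hypothetical helpers of my own. Set DRY_RUN=1 to print the commands without running them.

```shell
# Print the command, then run it unless DRY_RUN=1 is set.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# Usage: migrate_wordpress /absolute/path/to/wordpress-export.xml
# The body runs in a subshell (note the parentheses) so that the cd
# does not change the directory of your interactive shell.
migrate_wordpress() (
    export_xml=$1
    run git clone https://github.com/thomasf/exitwp.git &&
    run cp "$export_xml" exitwp/wordpress-xml/ &&
    run xmllint --noout "exitwp/wordpress-xml/$(basename "$export_xml")" &&
    run cd exitwp &&
    run python exitwp.py &&
    run find build -type f
)
```

If xmllint reports errors, the function stops there; fix the XML file and run the remaining steps by hand. Moving the converted pages into your Octopress source directory is left as a manual step.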
Learn more about exitwp.py
Visit the exitwp.py Github page.

5 Pandoc

Pandoc is a nearly universal document converter that comes in handy when you need to create an ePub document from an HTML or Markdown source. I can’t improve on the excellent documentation, which starts like this:
About Pandoc

If you need to convert files from one markup format into another, pandoc is your swiss-army knife. Pandoc can convert documents in markdown, reStructuredText, textile, HTML, DocBook, or LaTeX to
  • HTML formats: XHTML, HTML5, and HTML slide shows using Slidy, Slideous, S5, or DZSlides.
  • Word processor formats: Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML
  • Ebooks: EPUB
  • Documentation formats: DocBook, GNU TexInfo, Groff man pages
  • TeX formats: LaTeX, ConTeXt, LaTeX Beamer slides
  • PDF via LaTeX
  • Lightweight markup formats: Markdown, reStructuredText, AsciiDoc, MediaWiki markup, Emacs Org-Mode, Textile
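
For the ePub case I mentioned, the conversion is a one-liner. Here’s a minimal sketch, assuming pandoc is on your PATH; the function name md2epub is hypothetical, and the title argument is passed as metadata because an ePub wants a title.

```shell
# PANDOC may be overridden (for example with "echo") to preview the command.
PANDOC=${PANDOC:-pandoc}

# Hypothetical helper: convert a Markdown file to an ePub with the same
# base name. Usage: md2epub post.md "Post Title"
md2epub() {
    "$PANDOC" --from markdown --to epub \
        --metadata title="$2" \
        --output "${1%.*}.epub" "$1"
}
```

For an HTML source, the same call should work with --from html in place of --from markdown.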

6 Octopress

Octopress is my choice of blogging platform: it’s a static site generator that uses Sass and Markdown, as well as Ruby and RVM. Its main advantage is that there’s no database involved, and no server-side code that can be hacked. It has all the robustness and speed of static HTML pages. You write your posts in Markdown, and Octopress will convert them to static HTML files.

Octopress is for hackers only. It’s not for beginners. If you feel comfortable with PHP, and you’re fine with Wordpress having thousands of plugins, and you’ve gotten accustomed to dealing with the constant stream of security upgrades, you should stick with Wordpress. Don’t even look at Octopress.

But, if you like to hack some code, and you like Ruby, and you want to learn about modern tools like Sass, and if you enjoy tinkering with the guts of your software, you owe it to yourself to take a look at Octopress. It’s a breath of fresh air.

Visit octopress.org to learn more.