
Pandoc: any-to-any document conversion

Two posts ago we looked at Graphviz, a simple and flexible tool for graph generation.

This time I want to take a look at Pandoc. Its website puts it this way:

If you need to convert files from one markup format into another, pandoc is your swiss-army knife.

It's a great tool to keep in mind, so let's check some examples of what you can do with it.

Simple document conversion

Ok, the first one is pretty obvious, but pandoc supports conversions between *lots of formats* [check its website for a list].

You can convert your Markdown documents into MediaWiki ones (or back), to move between GitLab's wiki and MediaWiki:

pandoc input.md --to=mediawiki -o output.wiki

Note: Most output formats are detected automatically from the output file extension, but mediawiki is not one of those (I'm not sure it has a specific extension). For common output formats no --to parameter is needed. Check the available output formats with pandoc --list-output-formats.

For example: The Gitit wiki uses pandoc to support both multiple input formats and multiple export formats.

An easy way to read complex formats

If you're used to managing documentation in Markdown or plaintext, you have probably found cool things you can do when your documents are in "plaintext-like" formats:

  • grep and diff them
  • Write one-off scripts to analyze them
  • Commit them to git

You probably also have some "binary" files. You could convert PDFs to text with pdftotext, but what about EPUB, ODT or (root-forbid) DOCX and PowerPoint?

You can use pandoc to convert them, although lossily, into text:

$ pandoc nextcloud/Documents/About.odt --to=markdown -o -
![](Pictures/1000000000000590000002649039D8B6B4AA5471.jpg){width="2.7075in"
height="1.1638in"}

About Nextcloud

****

**Welcome to Nextcloud, your self-hosted file sync and share solution.**

Nextcloud is the open source file sync and share software for everyone
from individuals to large enterprises and service providers. Nextcloud
...

For example: Enable git diff over MS Word files
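
One common way to set that up is git's textconv mechanism. The following is a minimal sketch, not necessarily what the linked article does: it assumes pandoc is on the PATH, works in a scratch repository, and uses a diff driver name ("pandoc") chosen here purely for illustration.

```shell
# Scratch repository for the illustration; in practice run the last two
# commands inside your own repository instead.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Tell git to use a diff driver named "pandoc" (an arbitrary name) for .docx files.
echo '*.docx diff=pandoc' >> .gitattributes

# Define the driver: git appends the file path as the last argument
# and diffs whatever text the command prints to stdout.
git config diff.pandoc.textconv 'pandoc --from=docx --to=markdown'
```

With this in place, git diff on a tracked .docx file should show a Markdown-level diff rather than "Binary files differ".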

Conclusion

I won't go through all possible uses, but here are some more ideas that sound plausible:

  • Get all your (maybe binary) documents and move them into your preferred wiki software.
  • Read docs and send them to an Elasticsearch server to build a document search engine. This will probably come up in a later post ;)
  • Have your preferred blog engine generate PDF and ODT files from your posts.
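
The first idea could start from a loop like the one below. It's only a rough sketch: it assumes the documents are .odt files under a hypothetical docs/ directory and that the target wiki accepts MediaWiki markup. Importing the resulting files into the wiki is left out.

```shell
# docs/ and wiki/ are hypothetical directory names used for illustration.
mkdir -p wiki

for doc in docs/*.odt; do
    # Skip the literal pattern when no file matches.
    [ -e "$doc" ] || continue
    name=$(basename "$doc" .odt)
    # Input format is inferred from the .odt extension; output is set explicitly.
    pandoc "$doc" --to=mediawiki -o "wiki/$name.wiki"
done
```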