Marking down DITA

On the rise of Markdown as an alternative input method for DITA projects

There’s been a lot of buzz recently about Markdown and DITA, sparked by the announcement of oXygen’s “DITA Glass” approach to URL-based conversions and by Lightweight DITA’s use of Markdown and JSON.

These approaches look promising indeed and may well be the future of DITA, but another recent development — Jarno Elovirta’s DITA-OT Markdown plugin for the DITA Open Toolkit — allows you to use Markdown in your DITA projects right now without any new tools or processing paradigms.

Before we explore that, let’s step back for a moment and take a closer look at why Markdown has become so popular and why it’s such a good fit for structured authoring.

Markdown Makes Web Writing Easier

Markdown means “a reduction in price” — an apt connotation for a lightweight markup language “that allows you to write using an easy-to-read plain text format and convert it to structurally valid” markup.

Originally created in 2004 by John Gruber, “Markdown is two things:

  1. a plain text formatting syntax; and
  2. a software tool … that converts the plain text formatting to HTML.”

As Gruber writes on the project home page:

The overriding design goal for Markdown’s formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions.

So while Markdown was originally designed to make it easier to write for the web without worrying about angle brackets and tags, it’s proving useful for more than just websites…

Mobile Authoring & Lightweight Content Ecosystems

Rapidly adopted by blogging and publishing services, content management systems, commenting platforms and countless sites and tools that make it easy to contribute content, Markdown has become a practically ubiquitous means of writing for the web.

Perhaps the most significant factor in Markdown’s success has been the rise of mobile devices, which sparked renewed interest in lightweight content formats and ease of use as authors began looking for ways to take their writing (and their tools) with them on the go.

The burgeoning market for mobile writing apps & universal sync has freed authors from unwieldy word processors and bloated desktop publishing suites, and produced a variety of easy-to-use writing tools that deliver on Markdown’s promise by allowing the writer to focus on content.

Authors can capture notes with a smartphone on the go, flesh out the draft back at their desk, and proofread the final result on a tablet later without copying-and-pasting or converting to other file formats along the way.

The evolution of lightweight content creation processes has spawned an entire ecosystem of tools and services that support easy authoring and editing on the go, yet permit push-button production of web pages, print-ready PDFs, slide presentations and eBooks from a single easy-to-author format.

Since writing in Markdown encourages authors to focus on structure rather than presentation, it’s a good match for structured authoring scenarios in which minimal markup is sufficient.

Markdown Meets DITA

The steep learning curve associated with XML dialects like DITA has long presented a formidable barrier to broader adoption of structured authoring. If we want to convince more people to contribute to XML-based publications, we need to make it easier.

Various tool vendors have proposed “simple” XML or “easy” DITA authoring environments that try to solve the problem with interfaces that more or less shamelessly resemble Microsoft Word — as if that would make things any easier.

However, as authors begin to appreciate the benefits of lightweight content creation, many are beginning to wonder whether casual contributors need to author XML at all.

Why ask engineers or other subject matter experts to struggle with XML or learn a new tool before they can provide input to our publications?

Shouldn’t we just let people write and let the tools figure out what to do?

Fortunately, many in the XML community are beginning to embrace this notion and we’ll be hearing more about some of these ideas soon. Several initiatives are under way to simplify the process of creating DITA content, such as Lightweight DITA, or URL-based on-the-fly conversion from various file formats to DITA with oXygen. Many of these ideas rely on Markdown.

The HTML Detour: h2d

When I worked on the initial launch of the Places API developer documentation at Nokia in early 2012, the developers were using Markdown for most of their internal writing and were reluctant to author in DITA.

Back then, in order to support their workflow, I used Markdown’s own conversion routines to generate HTML from the developers’ input, and then ran the results through IBM’s old h2d transformations to convert from HTML to DITA.

Though the process was a bit circuitous, it let the developers focus on what they do best. They thanked me by providing great input, written in their favorite environment. And while a bit of post-processing and manual cleanup was occasionally required, it still beat copying-and-pasting or starting from scratch.

But now there’s a better way…

The DITA-OT Markdown Plugin

Jarno Elovirta’s DITA-OT Markdown plugin extends the DITA Open Toolkit so you can use Markdown files directly in topic references.

You can add a Markdown file to your publication like any other topic — you just need to set the @format attribute value to markdown so the plugin will recognize the source file as Markdown and convert it to DITA in the background:

<map>
  <topicref href="markdown-dita-topic.md" format="markdown"/>
</map>
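
If a publication mixes many DITA and Markdown sources, a map like this can also be generated rather than written by hand. The following Python sketch shows one way to do that, assuming a flat list of topic paths. The `build_map` helper and the `intro.dita` file name are hypothetical and not part of the plugin, and like the snippet above, the output omits the DOCTYPE declaration that a complete map file would normally carry.

```python
import xml.etree.ElementTree as ET


def build_map(topic_paths):
    """Return a DITA map as a string, one <topicref> per source file.

    Hypothetical helper: files ending in .md get format="markdown" so the
    plugin treats them as Markdown; other files are referenced as-is.
    """
    root = ET.Element("map")
    for path in topic_paths:
        ref = ET.SubElement(root, "topicref", href=path)
        if path.lower().endswith(".md"):
            ref.set("format", "markdown")
    ET.indent(root, space="  ")  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


print(build_map(["intro.dita", "markdown-dita-topic.md"]))
```

Only the Markdown file receives the format attribute, so existing DITA topics keep working exactly as before.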

You’ll need a recent build of the DITA-OT, as versions prior to 2.1 do not have the required extension points. You can install the plugin directly from GitHub using the dita command:

dita -install https://github.com/jelovirt/dita-ot-markdown/releases/download/1.0.0/com.elovirta.dita.markdown_1.0.0.zip

The DITA-OT Markdown plugin not only enables the DITA-OT to read Markdown, it also provides a new markdown transformation type that can be used to publish existing DITA content in Markdown format.

This makes it easy to export complex DITA publications in a highly readable plain text format to facilitate review, or feed DITA content into publishing workflows based on Markdown, such as Jekyll or LeanPub.
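
Publishing to Markdown uses the same dita command as any other output format. As a rough sketch, a build script in Python might assemble and run the command as shown below. It assumes the dita executable is on the PATH, and the `userguide.ditamap` file name, the `out` directory and both helper functions are made up for illustration. The -i, -f and -o options are the toolkit's usual input, format and output arguments.

```python
import subprocess


def dita_command(input_map, transtype="markdown", output_dir="out"):
    """Assemble the argument list for a DITA-OT build.

    The defaults are illustrative; "markdown" is the transformation type
    that the plugin adds.
    """
    return ["dita", "-i", input_map, "-f", transtype, "-o", output_dir]


def publish(input_map, **options):
    """Run the build; raises CalledProcessError if the toolkit fails."""
    subprocess.run(dita_command(input_map, **options), check=True)


# Show the command line without running it.
print(" ".join(dita_command("userguide.ditamap")))
```

Calling `publish("userguide.ditamap")` would then run the build, provided the toolkit and the plugin are installed.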

The “Markdown DITA” Format

The original Markdown syntax is very simple, but there are certain things it doesn’t cover. As Markdown has grown in popularity and branched off into use cases far beyond its original purpose, a variety of extensions have emerged to support additional applications.

Among the most popular are Fletcher Penney’s MultiMarkdown, or MMD, a superset of the Markdown syntax that adds support for tables, footnotes, and citations; and GitHub Flavored Markdown, or GFM, which includes support for strikethrough, fenced code blocks and syntax highlighting.

Like MMD or GFM, the DITA-OT Markdown plugin introduces a few new conventions to facilitate conversion of Markdown content to DITA. This Markdown flavor is called “Markdown DITA”, or MDD, a representation of DITA content in Markdown.

Markdown DITA uses CommonMark as the underlying markup language.

CommonMark provides “a strongly specified, highly compatible implementation of Markdown” that serves as a sort of least-common-denominator for basic Markdown syntax that should work well everywhere.

Rather than reinvent the wheel, Markdown DITA builds on CommonMark, using standard Markdown constructs or those from other established Markdown flavors to represent DITA content where possible. For example:

The shortcut reference link syntax is used for DITA key references, so you can just write [key] to create a cross-reference like <xref keyref="key"/>.
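
To make the mapping concrete, here is a deliberately simplified Python sketch that rewrites shortcut reference links as key references. It is not the plugin's parser: the `keyrefs_to_xref` function and the regular expression are assumptions for illustration, and a real CommonMark parser handles many cases (escapes, code spans, link definitions) that this ignores.

```python
import re
from xml.sax.saxutils import quoteattr

# A shortcut reference link is a bracketed label that is not an image (!),
# not the second half of a full reference link (][), and not followed by
# "(" (inline link), "[" (full reference) or ":" (link definition).
SHORTCUT_LINK = re.compile(r"(?<![!\]])\[([^\[\]]+)\](?![\[(:])")


def keyrefs_to_xref(text):
    """Replace each [key] in a line of Markdown with a DITA <xref keyref>."""
    return SHORTCUT_LINK.sub(
        lambda match: "<xref keyref=%s/>" % quoteattr(match.group(1)), text
    )


print(keyrefs_to_xref("See [install] before you read [the guide](guide.md)."))
# prints: See <xref keyref="install"/> before you read [the guide](guide.md).
```

Inline links such as [the guide](guide.md) are left alone, since only the bare bracket form stands for a key.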

Definition lists use the PHP Markdown Extra format, so you can write

Term
: Definition.

for a DITA definition list:

<dl>
  <dlentry>
    <dt>Term</dt>
    <dd>Definition.</dd>
  </dlentry>
</dl>
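
The same correspondence can be sketched in a few lines of Python. This is an illustration of the mapping under simplifying assumptions, not how the plugin works internally: the `definition_list_to_dita` function is hypothetical, it treats every line that does not start with a colon as a new term, and it ignores inline markup and multi-line definitions.

```python
from xml.sax.saxutils import escape


def definition_list_to_dita(markdown):
    """Convert PHP Markdown Extra definition list lines to a DITA <dl>.

    Simplified: a line starting with ":" is a definition of the most
    recent term; any other non-blank line starts a new term.
    """
    entries = []  # list of (term, [definitions])
    for line in markdown.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(":"):
            if not entries:
                raise ValueError("definition without a preceding term")
            entries[-1][1].append(line[1:].strip())
        else:
            entries.append((line, []))

    out = ["<dl>"]
    for term, definitions in entries:
        out.append("  <dlentry>")
        out.append("    <dt>%s</dt>" % escape(term))
        for definition in definitions:
            out.append("    <dd>%s</dd>" % escape(definition))
        out.append("  </dlentry>")
    out.append("</dl>")
    return "\n".join(out)


print(definition_list_to_dita("Term\n: Definition."))
```

Each term opens a new dlentry, and every colon line that follows it becomes a dd inside that entry.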

Tables use the MultiMarkdown table extension format.

Pandoc’s header attributes can be used to define id or outputclass attributes, so # Topic title {#carrot .juice} becomes:

<topic id="carrot" outputclass="juice">
  <title>Topic title</title>
</topic>
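
For readers who want to see the header attribute mapping spelled out, the following Python sketch parses a level-1 heading and builds the matching topic element. It is a minimal illustration rather than the plugin's implementation: the `header_to_topic` function and its regular expression are assumptions, and only the #id and .class tokens are handled.

```python
import re
import xml.etree.ElementTree as ET

# "# Title {#id .class ...}": heading text followed by an attribute block.
HEADER = re.compile(r"^#\s+(?P<title>.*?)\s*(?:\{(?P<attrs>[^}]*)\})?\s*$")


def header_to_topic(line):
    """Turn a level-1 heading with Pandoc header attributes into a <topic>."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError("not a level-1 heading: %r" % line)
    topic = ET.Element("topic")
    classes = []
    for token in (match.group("attrs") or "").split():
        if token.startswith("#"):
            topic.set("id", token[1:])       # {#carrot} becomes id="carrot"
        elif token.startswith("."):
            classes.append(token[1:])        # {.juice} becomes outputclass
    if classes:
        topic.set("outputclass", " ".join(classes))
    ET.SubElement(topic, "title").text = match.group("title")
    return ET.tostring(topic, encoding="unicode")


print(header_to_topic("# Topic title {#carrot .juice}"))
```

A heading without an attribute block simply produces a topic with a title and no id or outputclass.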

Where necessary, Markdown DITA establishes a few conventions of its own to support additional DITA features, so you can specify the information type of the generated DITA topic with a header attribute like {.task}, or generate <section> and <example> elements with the {.section} and {.example} attributes.

The plugin’s syntax reference provides an overview of the supported constructs and illustrates how DITA’s XML structures are represented in Markdown DITA.

Future versions of the plugin may add support for YAML front matter, so you can embed metadata like index entries, copyright or author information in Markdown files for inclusion in the DITA <prolog>.

Lightweight Authoring Use Cases

Although the DITA-OT Markdown plugin can read and write Markdown, it isn’t really intended for round-tripping, as many of the more complex DITA constructs cannot be reproduced in a simple plain text format like Markdown.

It works best in scenarios where engineers or other subject matter experts need to contribute content to a DITA publication without authoring in DITA.

What happens next depends on the nature of the content and its intended lifecycle:

  1. Once complex content is converted to DITA, it stays in DITA.

    If the input is a one-off contribution, members of the DITA authoring team can use the Markdown file as raw material that is easily converted to DITA and enriched with conditional processing attributes, conkeyrefs or other more complex semantics that have no equivalent in limited formats like Markdown.

  2. Simpler content stays Markdown.

    In cases where simple content is authored collaboratively over multiple versions, topics can be kept in Markdown and extended with Markdown DITA conventions to help the DITA-OT Markdown plugin generate the proper DITA markup. Markdown DITA topics can be edited by a wide range of authors and combined as necessary with more complex content maintained in DITA XML.

Both scenarios allow cross-functional teams to integrate input from less technical authors, yet still take full advantage of DITA’s complexity where necessary.

With the momentum gathering behind the idea of lightweight authoring alternatives, the DITA community can expect more innovations in this area in the near future that will help to lower the barrier to entry and make it easier for more people to contribute content to DITA publications.

But there’s no need to wait for new releases — you can start using Markdown in your DITA projects by installing the DITA-OT Markdown plugin today.