DITA: Don’t Migrate; Metamorphose

Joe Pairman, Lead Consultant, Mekon Ltd.

In the world of content technologies, we often borrow concepts from the world of biology. Content has a life cycle. We may attempt to classify it using a taxonomy. And when content is permanently moved from one system to another, we talk of migration.  

These borrowed concepts are powerful models that help us conceptualize definitions, processes, and states. Mostly, they serve us well. They aren’t perfect models, however. For example, content may be said to have a birth and an end of life, but it isn’t capable of autonomous reproduction (despite the impression some websites give). When we classify our subject matter hierarchically, we may find ourselves making decisions of convenience, in a slightly less rigorous way than biological taxonomy. As for migration, the problem is not with the analogy itself. It works well to describe the transfer of content from one system to another. Rather, the problem is the temptation to overapply it.

Birds migrating
Image by Roger Loewig Gesellschaft (Roger Loewig Gesellschaft) [CC-BY-3.0 (http://creativecommons.org/licenses/by/3.0)], via Wikimedia Commons

I remember the circling flocks of birds during a darkening British autumn, preparing for a long journey to a better climate. Similarly, a content team migrating to DITA can look forward to escaping the overcast gloom of copy-paste and repetitive cross-checking in favor of the clear blue sky of open standards, reuse, and push-button publishing. And just as migration often involves an arduous journey, so doc teams need to pass through a lengthy process of tool assessments, pilot projects, and training before reaching their objective.

But there’s a downside to this analogy. When a British swallow arrives in sub-Saharan Africa, it remains essentially its former self, albeit a little lighter and perhaps missing a feather or two. But content and its authors can’t remain the same after migration from traditional document formats to the hugely different environment of a structured semantic architecture such as DITA. The benefits of such a move include significant savings, increased quality and consistency, and sophisticated delivery solutions or integrations with other systems. But an organization that wants to achieve these benefits needs to radically change how it writes and collaborates, and even how its team roles are defined. Rather than migration, perhaps we should be talking of metamorphosis.

A completely different element

I imagine a tadpole’s transition to land to be a stressful experience. My first time snorkeling felt rather odd; how much more daunting must the reverse move be, particularly when you need to develop your own breathing apparatus? A move to structured content is similar.

Green tree frog
Image by Christian Fischer (Christian Fischer) [CC-BY-3.0 (http://creativecommons.org/licenses/by/3.0)], via Wikimedia Commons

Authors must transition from the comfortable, buoyant surroundings of WYSIWYG to a sparse, unfamiliar environment. No matter how straightforward the interface, they still need to trust the system to output something suitable. They also have to define what their words represent in such a way that the system can understand this. Though this may be as easy as entering text in the right section of a form, or picking an inline tag from a list, it still requires more thought than the old ways of authoring. (This isn’t any easier for those used to lightweight markup, either, as Markdown and the like are hardly semantic.)

Unfortunately, not everyone realizes this. There are teams who get their content into DITA files and maybe even a CMS, but continue to behave as if they’re authoring in Word. The DITA-Users list receives questions from time to time along the lines of “How do I make one word green in DITA?” or “How can my translators add <b> tags to certain terms without causing translation memory problems?” There are even accounts of groups abandoning the idea of distinct topic types and cramming everything into generic topics. Practices such as these drastically reduce the benefits that can be achieved from a DITA implementation. Local formatting tweaks reduce efficiency, can introduce errors, and mean that potentially valuable semantic information is not being captured. Why do some localized terms need to be bold and some words green? Can this information be used elsewhere, in different contexts?
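To make the distinction concrete, here is a minimal sketch of the same sentence marked up two ways. The sentence and the “Save” label are invented for illustration; `<b>` and `<uicontrol>` are standard DITA elements, but which semantic element suits a given term depends on your own content model.

```xml
<!-- Presentational: records only that the text should look bold -->
<p>Click <b>Save</b> to keep your changes.</p>

<!-- Semantic: records that the text names a user interface control -->
<p>Click <uicontrol>Save</uicontrol> to keep your changes.</p>
```

With the second form, the look of a control name can be decided once in the stylesheet for each output, and the information that “Save” is a control remains available to other systems.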

An overly loose content model means that writers have to remember which of the many generic elements to use for their specific case, and it limits the possibilities for using content chunks in adaptive, multiplatform solutions. (It’s possible to do structure without a constraining architecture or template, but it is harder; a template is a powerful motivator and focusing tool.) The shape of your content needs to change, just as a tadpole needs to grow legs to function successfully out of water.

A frog is not a walking tadpole

It’s not only the content itself but also your team’s processes that may need to change. For example, creating content in a modular way means that a content set may not have a single author at a given time. It may well be produced simultaneously by a number of authors, each with specialist knowledge of a different subject area. The effort that used to be spent in polishing a “book” can now be invested in researching user needs, understanding the subject material better, and producing well-crafted text instead of pretty page breaks and hand-kerned typography.

Your team may well need to develop new roles entirely. It’s unusual to find corporate IT teams that thoroughly understand content technologies and can manage your systems in the way you need. They may think they do, but it’s unlikely they have the stomach for tweaking custom metadata in a CMS, working with XSLT stylesheets, and testing translation processes, for example. At the very least, one person on your team should develop skills around the right ways to use technology to work with content.

Abandon your comfortable pond

DITA makes things dramatically easier, but it does not remove the need to define your own processes and architecture. It was never intended to do so. You can’t just fly your stuff from one place to another. The nature of the environment has changed. The change is as dramatic as a frog’s passage from water to land over its life cycle. Put migration aside for now: your content needs to metamorphose, and you with it.