Why DITA Semantics Matter

In a recent email on the DITA Users Yahoo Group, I saw the following bit of advice:

“…do not go to great lengths to use semantically correct tags if there is no actual benefit.”

I respectfully disagree with this comment.

The effort to convert content is usually quite significant. While some conversion can be automated, the best results will require human editors to properly apply tags, and segment word processor or desktop publisher paragraphs into DITA elements. This effort is usually undertaken once, if at all, because of the time and effort required. There is seldom time or funding in subsequent content development cycles to focus on re-doing the conversion cleanup.

Further, for groups just adopting DITA, there often is not enough experience to make the judgment that “there is no actual benefit” to using semantic tags. In the worst case, this devolves into “why bother with a bunch of tags when <cmd> will do,” which is exactly what you want to avoid when coaching writers new to XML in the proper use of the markup. Shoddy or inconsistent use of tags will complicate or break your publishing.

Frequently I have seen numbered steps from FrameMaker or Word as a single paragraph, which an automated conversion can only map to <cmd>. However, when that text is analyzed, there is often a mix of imperative command and result, and sometimes multiple commands in one paragraph.

It only takes a moment when cleaning up that conversion to split the FrameMaker step paragraph into <cmd> and <stepresult>, or multiple steps if needed. If there’s an image included demonstrating the action, split the FrameMaker or Word numbered paragraph into <cmd> and <stepxmp>.
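
As a rough illustration of that cleanup, the sketch below contrasts a step as an automated conversion might leave it with the same step split by hand. The sample step text and the needs_review heuristic (counting sentences inside <cmd>) are assumptions invented for this example and are not part of any conversion tool; only the element names <step>, <cmd> and <stepresult> come from DITA.

```python
import re
import xml.etree.ElementTree as ET

# A step as an automated conversion might leave it: everything lands in <cmd>.
# (Hypothetical sample text, invented for this illustration.)
CONVERTED = """
<step>
  <cmd>Click Save. The file is written to disk and the dialog closes.</cmd>
</step>
"""

# The same step after a moment of manual cleanup.
CLEANED = """
<step>
  <cmd>Click Save.</cmd>
  <stepresult>The file is written to disk and the dialog closes.</stepresult>
</step>
"""


def needs_review(step_xml):
    """Flag a step whose <cmd> holds more than one sentence.

    Crude heuristic (an assumption of this sketch): a second sentence in
    <cmd> often means a result or another command is hiding in there.
    """
    step = ET.fromstring(step_xml.strip())
    cmd = step.find("cmd")
    if cmd is None:
        return False
    cmd_text = "".join(cmd.itertext()).strip()
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", cmd_text) if s]
    return len(sentences) > 1


print(needs_review(CONVERTED))  # True
print(needs_review(CLEANED))    # False
```

A check like this only points an editor at likely candidates; deciding whether the extra sentence is a result, an example, or a second step still takes a human reader.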

While you may not know of an “actual benefit” at the moment, the value of using proper semantics for this content is an investment that will pay off in the near future. Use cases for content are evolving daily, with the huge impact of social media just beginning to be felt in traditional technical communications. The more semantic your content is now, the easier it will be to write transformations in the future to take advantage of opportunities to host content on mobile devices and social media applications, or to single-source to multiple channels (such as training, eLearning and user documentation). Take an extra moment or two, use the DITA architecture as intended, and you will open lots of future opportunity for your content.
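
To make that payoff a little more concrete, here is a minimal sketch of one such transformation: pulling only the command out of each step to build a quick-reference list. It assumes the steps were split into <cmd>, <stepresult> and <stepxmp> during cleanup. The sample task, the quick_reference function name, and the use of Python's standard xml.etree.ElementTree module in place of a real DITA publishing toolchain are all choices made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical task content, invented for this illustration.
TASK = """
<task id="export-report">
  <title>Exporting a report</title>
  <taskbody>
    <steps>
      <step>
        <cmd>Open the report.</cmd>
        <stepresult>The report appears in the main window.</stepresult>
      </step>
      <step>
        <cmd>Click <uicontrol>Export</uicontrol>.</cmd>
        <stepxmp>For example, export the weekly summary as PDF.</stepxmp>
      </step>
    </steps>
  </taskbody>
</task>
"""


def quick_reference(task_xml):
    """Return one numbered line per <cmd>, ignoring results and examples."""
    task = ET.fromstring(task_xml.strip())
    lines = []
    for number, cmd in enumerate(task.iter("cmd"), start=1):
        # itertext() also picks up text inside inline elements such as
        # <uicontrol>; split/join normalizes the whitespace.
        text = " ".join("".join(cmd.itertext()).split())
        lines.append(f"{number}. {text}")
    return lines


for line in quick_reference(TASK):
    print(line)
# 1. Open the report.
# 2. Click Export.
```

Had each step been left as a single <cmd> paragraph, the result and example text would have landed in the quick reference as well, and there would be no reliable way to filter them out.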

This may sound like hype, but I have been working with DITA content and new DITA authors for several years now, and I have experienced this sort of epiphany several times.

2 Responses to “Why DITA Semantics Matter”

  1. Mike Rice says:

    The effort to convert to semantic markup can be considerable. But rather than ask “why should we do this at all?”, a more helpful question might be “how did our content get to this condition?”.

    Imagine the internal monologue of an author writing an unstructured procedure.

    “Ok, so first I need to get the reader to this screen. I selected ‘Advanced Options…’ from the ‘System’ menu to get to this ‘Advanced Options’ window. Then I clicked the ‘Logging’ button.

    “Select System > Advanced Options, then click Profiling.”

    “Ugh. That reminds me. That new writer used colons between menu items in the last topic I reviewed. I’ve told her that we use the greater than sign, but she keeps slipping into habits from her last company. I wonder how many times she’s done that. We use colons for list introductions, so that won’t be easy to find.

    “Now where was I? Oh, I need to apply formatting to ‘Profiling’ since it’s a button. I’ll probably need to provide an example here. I’ll put that in a new paragraph since last time I included an example the validator thought it was all part of the instructions and got totally lost.”

    Inside the author’s head there was analysis. Identifying a cascade of menu items led to putting a symbol between them. Identifying a button led to formatting the button’s label. Identifying a step example led to putting that information in a separate paragraph.

    The analysis was encoded into the document, but in an abstract way. What makes the paragraph containing the step example different from a paragraph setting the context for a task? What makes the bolded ‘Profiling’ button different from other bolded text? When usability testing leads to renaming the ‘Profiling’ button to ‘Filtering’, how will we be sure that we’ve changed all the instances of the ‘Profiling’ button? How will we be sure that we haven’t changed other uses of the word “Profiling” to “Filtering”?

    This document maintenance task becomes harder because part of the analysis originally done by the author was thrown away through abstraction. At least two bad things happen as a result. 1) Machines can no longer reliably apply logic based on that analysis; they can only apply the abstracted formatting. 2) Consequently, maintaining the content requires that the analysis be repeated as authors (and possibly readers who use search) manually sort through search results and mentally filter out those that don’t match the expected result.

  2. Leigh White says:

    I couldn’t agree with you more on this point, Glenn. It’s one of the biggest hurdles to get past with authors new to DITA. It’s tempting, given their template (especially if they’re authoring in a GUI editor like Frame), to tell them, “You have to wrap it in this element to get it to look the way you want.” That approach works, but it doesn’t leave them with the necessary degree of distinction between content and presentation. I try to describe to them a future scenario in which, for example, they might want to produce a quick reference guide with just the “action” part of the step and not any of the additional information. When I explain that having the action in and everything else in or or whatever will give them that flexibility, a light begins to glimmer. However…with respect to your example of structuring a step, just this morning I was puzzling over the new 1.2 element . It seems to me that this element was designed to allow authors to do exactly what you are trying to discourage: pour everything into one element and make no semantic distinction between the parts of the step. Thoughts on that?
