Intelligent Content is the new trendy topic in the technical documentation galaxy. I had the chance to attend numerous presentations and panels on this subject during the fall 2015 conferences and trade shows, such as IDW in San Jose, LavaCon in New Orleans and Tekom in Stuttgart, and I want to share some impressions and some disappointment.
Do you dare ask what intelligent content is? Somehow it is supposed to speak for itself: it’s in the name. The keywords that gravitate around this concept are: semantic tagging, classified content, taxonomies, cross-silos, linked information, structured content, reusability, multi-channel delivery.
In fact the answer is pretty vague, and I feel that there is huge confusion between content, structure and semantics. This is what I would like to clarify.
Content is what you say
It’s the story.
Structure is how you say it
Paragraphs, lists, images, tables, and so on. In the case of structured authoring, it is also how you split your content into multiple fragments so that you can reuse them in different places without duplicating them: for example, using the same product description in a tech doc, in a blog article and in a catalog.
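To make the reuse idea concrete, here is a minimal sketch in Python. It is not tied to any particular format or tool: the fragment ids, the placeholder texts and the `render` helper are all assumptions made up for illustration.

```python
# Hypothetical fragment store: each fragment is written once and has an id.
# The texts are placeholders, not real product copy.
fragments = {
    "afs-description": "AFS indexes your files so you can search them.",
    "afs-install-intro": "This guide walks you through installing AFS.",
}

# Each deliverable lists fragment ids instead of copying the text.
documents = {
    "tech_doc": ["afs-install-intro", "afs-description"],
    "blog_article": ["afs-description"],
    "catalog": ["afs-description"],
}


def render(doc_name: str) -> str:
    """Assemble a deliverable by resolving its fragment references."""
    return "\n".join(fragments[fid] for fid in documents[doc_name])


# Editing the fragment once changes every deliverable that references it.
fragments["afs-description"] = "AFS indexes your files and makes them searchable."
```

The point of the sketch is only that the three deliverables share one product description by reference, so there is nothing to duplicate or keep in sync.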
Semantics is why you say it
It’s the purpose of the content: your intent, summarized and formalized through keywords, tags or links. It’s what makes it possible to keep the “right content at the right time” promise.
Content is about language and words.
Structure is about format: DocBook, DITA, FrameMaker, etc.
Semantics is about metadata and ontologies.
There is no real link between Content and Structure
I have heard: “DITA is more suited to intelligent content than other formats”. I think it’s wrong and misleading to say that. There is no real link between those two things. I know that the DITA community will hate me for saying so, but a little controversy is good sometimes (at least it creates an opportunity for new panels and flame wars).
Let’s start with the more obvious assertion: there is no real link between Content and Structure. Writing in DocBook does not make you tell better stories than writing in Microsoft Word. Was Shakespeare a poor writer because he was not writing in DITA?
Your skills as a storyteller have nothing to do with the tool (the structure) that you use for writing.
In the same way, Structure is not bound to Semantics. Adding a tag to a piece of content is no different whether that content is in DITA, in DocBook or in HTML. It’s still a tag “attached to” a piece of content.
A “tag” is any additional information added to content to describe its nature, subject, purpose, use, etc. It can be a free keyword, an entry from a controlled list or a taxonomy, or even an RDF assertion complying with an ontology. I won’t dig further into semantic techniques and technologies, but will just give a few real-life examples to illustrate what those tags could be:
- ‘Subject=installation, service, search’
- ‘Audience=expert’ (implied: “this content is meant to be read by experts”)
- ‘Product=Software/AFS’ (implied: “this content applies to the software product AFS”)
- ‘See also topic X’ (implied: “this content is closely related to content X”)
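Expressed as data, the examples above could look like the following Python sketch. The `Tag` class, the `topic-42` id and the key names are hypothetical; a real system might use a taxonomy service or RDF triples instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """One piece of metadata attached to a topic (hypothetical model)."""
    topic_id: str  # the content the tag is attached to
    key: str       # e.g. "Subject", "Audience", "Product", "SeeAlso"
    value: str     # free keyword, controlled-list entry, taxonomy path or topic id


# The four examples from the list above, attached to an invented topic id.
tags = [
    Tag("topic-42", "Subject", "installation"),
    Tag("topic-42", "Subject", "service"),
    Tag("topic-42", "Subject", "search"),
    Tag("topic-42", "Audience", "expert"),
    Tag("topic-42", "Product", "Software/AFS"),
    Tag("topic-42", "SeeAlso", "topic-X"),
]


def values(topic_id: str, key: str) -> list[str]:
    """Return every value recorded for one key on one topic."""
    return [t.value for t in tags if t.topic_id == topic_id and t.key == key]
```

Note that each tag carries the id of the topic it describes, so nothing here requires the tag to be written inside the topic itself.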
The point I want to stress is the second part of that phrase: “attached to”. The tags are not inside the content. They are external to it and live separately. Putting them inside the content may even be an error.
Consider these two examples:
- ‘Audience=expert’ may depend on the context: the same content may be used by a ‘novice’ as part of a task description.
- ‘See also topic X’ is a link that becomes invalid and must disappear if topic X is deleted. You shouldn’t have to open every piece of content that points to it just to remove the links. It is therefore quite clear that the link should be external to the content itself.
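Both cases can be sketched with a semantic layer kept apart from the content. This is only an illustration under my own assumptions: the dictionaries, the context names and the `delete_topic` helper are invented and do not describe any existing CCMS.

```python
# Content holds only the story: no tags and no links inside it.
content = {
    "topic-A": "How to configure the search service.",
    "topic-X": "Search service reference.",
}

# "See also" links live in a separate layer keyed by topic id.
see_also = {"topic-A": {"topic-X"}, "topic-X": set()}

# Audience is attached per context of use, not baked into the topic.
audience = {
    ("topic-A", "admin-guide"): "expert",
    ("topic-A", "getting-started-task"): "novice",
}


def delete_topic(topic_id: str) -> None:
    """Remove a topic and every link pointing to it, without opening other topics."""
    content.pop(topic_id, None)
    see_also.pop(topic_id, None)
    for targets in see_also.values():
        targets.discard(topic_id)
    for key in [k for k in audience if k[0] == topic_id]:
        del audience[key]


text_before = content["topic-A"]
delete_topic("topic-X")
```

After the deletion, the text of topic A is untouched; only the external layer changed. The same topic also keeps two different audiences depending on where it is used.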
Last but not least, these added descriptions (tags) are relevant only when they are attached to a piece of information that is coherent, focused and self-sufficient.
- Tagging a word or a phrase does not make sense. It is not a piece of information in itself: it is not self-sufficient, and it is too ambiguous.
- Tagging a 300-page manual as a whole may not be relevant either. It is too generic and certainly deals with many different subjects.
This is why tagging mostly makes sense on fragments of content (topics), and this is also why intelligent content is closely related to structured content authoring. But this should not be a reason to blur the lines and mix the two subjects. Doing so would be wrong and would confuse people, which won’t help the necessary move towards intelligent content.
Let’s always keep the distinction between Content, Structure and Semantics clear.
My request to DITA experts: as explained above, most of the semantics (i.e. the tags) should be managed and stored alongside the content rather than inside it, and this is a capability that should be offered by the CCMS (Component Content Management System). The features and possibilities each system offers for this management would differentiate the CCMSs on the market. But at the end of the day, DITA lacks a way to export (serialize) this semantic layer. DITA makes it possible to exchange content without losing the structure, through the topic and map DTDs. But there is no recommendation for publishing the semantics. This is certainly a missing piece if DITA wants to be a comprehensive standard for intelligent content.
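To illustrate what exporting the semantic layer could mean, here is a deliberately naive Python sketch that serializes tags to a JSON file travelling next to the content. The layout, the key names and both helper functions are assumptions of mine; this is not a DITA feature or a proposed standard.

```python
import json

# Hypothetical semantic layer: tags grouped by topic id, kept outside the topics.
semantic_layer = {
    "topic-42": {
        "Subject": ["installation", "service", "search"],
        "Audience": ["expert"],
        "Product": ["Software/AFS"],
        "SeeAlso": ["topic-X"],
    }
}


def export_semantics(layer: dict) -> str:
    """Serialize the semantic layer so it can be exchanged with the content."""
    return json.dumps(layer, indent=2, sort_keys=True)


def import_semantics(text: str) -> dict:
    """Rebuild the semantic layer on the receiving side."""
    return json.loads(text)


exported = export_semantics(semantic_layer)
```

A real exchange format would more likely rely on an existing vocabulary such as RDF, but even this round trip shows the idea: the semantics can move between systems without being embedded in the topics.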