Multilingual Content Delivery: Language Management • Fluid Topics

Dec 3, 2016 | Reading Time: 5 minutes

In this blog post, we sift through the complexity of multilingual content delivery, especially when not all documents are translated and available in every language.

The situation

Your technical documentation is published in multiple languages: 2, 10, even 20 or more for some of you. The easy case would be to have all documents — meaning here any piece of content such as topics, files, images, entire books, etc. — existing in every language and always in sync. But we know that the reality is far from being that simple.

Usually, your tech writing team creates content in one reference language. Let’s say English to keep it simple, because it’s the lingua franca of technical documentation and true at 90%. This could mean that even if an expert provides some content in another language, it is likely translated in English. Let’s call it the Reference Corpus.

multilingual content delivery — In this example, our reference corpus is made of 20,000 topics assembled in 100 manuals.

The challenge

You are a global company with some end-users who are not speaking English. Let’s say they speak Japanese and French. You have two possibilities:

You translate everything -> “English Corpus” = “French Corpus” = “Japanese Corpus”.
Then it’s easy. You are sure that whatever language they speak, your users have access to all information. Your users read in their native language, and you can automatically set the appropriate search language based on the browser language or the user preference.
Only a subset of the entire corpus is translated (the content dedicated to end-users), and part of the documentation is not translated but available only in English (the content for experts and partners for example). Even worse, each country adds some documentation that exists only in its language (regulatory and compliance notices, local features, etc.).
The situation looks like this:

Now the question is very simple: How should your portal behave? When a user searches for information, what content blocks (see image above) should be looked upon?

Let’s keep it simple and say that our user, Jean, is French. Jean chooses the French UI. As in case 2, part but not all of your content is translated into French. Jean needs to know how to fix a component and runs a search for “X124 installation”. What should happen?

A- The search is run on all documents (En, Fr and Ja)

multilingual content translated query

Pros:

No risk of missing any information since, by default, each and every byte of content is included in the search.

Cons:

Results are possibly duplicated, or even irrelevant. Keywords exist in different languages (here in Fr en En) and an ambivalent query could lead to misleading results.
Returning results in a language a user does not understand is inappropriate. How would you feel about a page stuffed with Japanese text if you don’t read Japanese? Surfacing unnecessary and overwhelming information creates a poor and inefficient user experience. Users could refine their searches by filtering in or out desired languages. But, why not make it automatic (see next case)? (see next case).

B- The search is run in one selected language (Fr for Jean)

Not_Translated_Query

Pros:

Consistent search results and easy user experience, as only French results are displayed.

Cons:

Possibly missing information: Some content has not been translated into French and exists only in English, which means that some critical information relevant to the situation may be missing.
Even worse, the search could return “0 result”, making the user believe that there is nothing to help him. And, the probability of having no search results is increased significantly.

We see that both ways of searching are causing problems. And, the intermediate approach of searching in the French and English corpus (user and reference languages) leads to the same issues described in A. Then what should we do?

It’s obvious that the search should run on a set of documentation made of the French elements plus the English elements applicable but NOT translated. Let’s call this the Full Virtual Corpus:

Full, because it now encompasses all content relevant to a user
Virtual, because it is made of different blocks of content from different languages

Full_Corpus

But it’s not that simple. Running a search with French keywords on English content does not make much sense, as the keywords and the content won’t match. However, we can assume that the user reads English as the choice of not translating everything was made, because this specific content is meant for people used to English (e.g. sysadmin in IT).

Then there are two ways of doing it:

Low-tech: Execute the query on the French Corpus. Warn the user that some content may be missing and that they should also do a query with English keywords (and run the query on the English corpus, or on the English content part of the Full Virtual Corpus).

Query_Suggest

High-tech: Automatically translate the user query and run two queries at the same time … with the initial keywords on the French subpart of the Full Virtual Corpus and the translated keywords on the English subpart.

Both_Queries

The high-tech approach is what should be done, but it can be quite complex. How do you translate the query? Multiple solutions:

Rely on external generic translation web services. Certainly the approach requiring the least amount of work, but it may lead to poor results, as those web services are not aware of the specificities and vocabularies of your domain and content.
As users search with keywords and not phrases, you can provide a dictionary of terms — mixing generic dictionaries and specific dictionaries for your business term — and do word-for-word translation. This requires significant work, as you must create and maintain those dictionaries.
Here is the trick: Those dictionaries already exist! They are the translation memories. Since part of your content has been translated, this has generated aligned dictionaries of terms and expressions. Just use it.

Important: Even if we know the language of that user (from their browser or preferences), it’s important to keep in mind that there is a difference between the UI language and the content language. The UI may be available in 10 languages, but your documentation only in 3. A German user may set their UI to German but still has to select a different “preferred language for content” if none of your documentation is available in German (as in our example). Either you declare a default fallback language for German (“if UI in German than search in English”), or the user selects it manually.

As you have read in this blog post, multilingual content delivery is not that simple. It doesn’t just mean that you must index all content. It requires understanding and technology to best match your situation. Being on the edge of dynamic content delivery, Fluid Topics already embeds the tools that you need. We continue to research and investigate with our partners specialized in translation, such as WhP, to always push the limits and make your content the cornerstone of your customer support strategy.

About The Author

Fabrice LACROIX

Fabrice is Fluid Topics visionary thinker. By tirelessly meeting clients, prospects and partners, he is sensing the needs of the market and fueling his creativity to invent the functions that makes Fluid Topics the market leading solution for technical content dynamic delivery.

Cookie	Duration	Description
__cf_bm	30 minutes	This cookie, set by Cloudflare, is used to support Cloudflare Bot Management.
bcookie	2 years	LinkedIn sets this cookie from LinkedIn share buttons and ad tags to recognize browser ID.
Font Awesome	1 years 30 days	The cookie is used by cdn services like CloudFare to identify individual clients behind a shared IP address and apply security settings on a per-client basis. It does not correspond to any user ID in the web application and does not store any personally identifiable information.
Instant page	1 years 30 days	The cookie is used by cdn services like CloudFare to identify individual clients behind a shared IP address and apply security settings on a per-client basis. It does not correspond to any user ID in the web application and does not store any personally identifiable information.
lang	session	This cookie is used to store the language preferences of a user to serve up content in that stored language the next time user visit the website.
lidc	1 day	LinkedIn sets the lidc cookie to facilitate data center selection.
pll_language	1 year	The pll _language cookie is used by Polylang to remember the language selected by the user when returning to the website, and also to get the language information when not available in another way.
Polylang	1 years 19 days 15 minutes	This cookie is set by Polylang plugin for WordPress powered websites. The cookie stores the language code of the last browsed page.

Cookie	Duration	Description
_ga	2 years	The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_ga_132D9JVQ7V	2 years	This cookie is installed by Google Analytics.
_gat_UA-58488653-1	1 minute	A variation of the _gat cookie set by Google Analytics and Google Tag Manager to allow website owners to track visitor behaviour and measure site performance. The pattern element in the name contains the unique identity number of the account or website it relates to.
_gcl_au	3 months	Provided by Google Tag Manager to experiment advertisement efficiency of websites using their services.
_gid	1 day	Installed by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.
_hjAbsoluteSessionInProgress	30 minutes	Hotjar sets this cookie to detect the first pageview session of a user. This is a True/False flag set by the cookie.
_hjFirstSeen	30 minutes	Hotjar sets this cookie to identify a new user’s first session. It stores a true/false value, indicating whether it was the first time Hotjar saw this user.
_hjIncludedInPageviewSample	2 minutes	Hotjar sets this cookie to know whether a user is included in the data sampling defined by the site's pageview limit.
_hjIncludedInSessionSample	2 minutes	Hotjar sets this cookie to know whether a user is included in the data sampling defined by the site's daily session limit.
_hjTLDTest	session	To determine the most generic cookie path that has to be used instead of the page hostname, Hotjar sets the _hjTLDTest cookie to store different URL substring alternatives until it fails.
CONSENT	16 years 3 months 8 days 15 hours	YouTube sets this cookie via embedded youtube-videos and registers anonymous statistical data.
Pardot		The cookie is set when the visitor is logged in as a Pardot user.

Cookie	Duration	Description
bscookie	2 years	This cookie is a browser ID cookie set by Linked share Buttons and ad tags.
IDE	1 year 24 days	Google DoubleClick IDE cookies are used to store information about how the user uses the website to present them with relevant ads and according to the user profile.
NID	6 months	NID cookie, set by Google, is used for advertising purposes; to limit the number of times the user sees an ad, to mute unwanted ads, and to measure the effectiveness of ads.
test_cookie	15 minutes	The test_cookie is set by doubleclick.net and is used to determine if the user's browser supports cookies.
VISITOR_INFO1_LIVE	5 months 27 days	A cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface.
YSC	session	YSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages.
yt-remote-connected-devices	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt-remote-device-id	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt.innertube::nextId	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.
yt.innertube::requests	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.

Cookie	Duration	Description
_hjSession_2701483	30 minutes	No description
_hjSessionUser_2701483	1 year	No description
_lfa_test_cookie_stored	past	No description
AnalyticsSyncHistory	1 month	No description
BIGipServerab08web-nginx-app_https	session	No description
cookielawinfo-checkbox-functional	1 year	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
FT_LOCALES	1 year	No description
FT_SESSION	1 month	No description
li_gc	2 years	No description
lpv880192	30 minutes	No description
route	12 hours	No description available.
UserMatchHistory	1 month	Linkedin - Used to track visitors on multiple websites, in order to present relevant advertisement based on the visitor's preferences.
visitor_id880192	10 years	No description
visitor_id880192-hash	10 years	No description

You talkin’ to me? Language Management in Fluid Topics

The situation

The challenge

About The Author

Fabrice LACROIX

Browse by category

Latest posts

The situation

The challenge

About The Author

Fabrice LACROIX

Browse by category

Latest posts

Looking for our logo?