Smart detection of strings within your web content
Web pages are complex documents containing graphical and text components. The text can contain structured HTML markup such as tables, lists, and buttons. It can also be interspersed with other content whose structure is obvious to a human but hard for a computer to process.
For the Sprawk translation engine to function most efficiently, it needs to detect and reuse text components at all levels. For example, you may have a list such as:
<ul>
  <li>Angola (0.6%)</li>
  <li>Argentina (1.2%)</li>
  <li>Austria (1.7%)</li>
  <li>Australia (2.4%)</li>
</ul>
The translation engine first processes the HTML markup, recognising the items in your list:
- Angola (0.6%)
- Argentina (1.2%)
- Austria (1.7%)
- Australia (2.4%)
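As a rough illustration of this first step, the sketch below pulls the text of each list item out of the markup using Python's standard html.parser module. It is not Sprawk's actual implementation; the names ListItemCollector and extract_list_items are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Hypothetical helper: collects the text content of each <li> element."""

    def __init__(self):
        super().__init__()
        self.items = []      # text of every completed list item
        self._buffer = None  # text chunks of the <li> being read, None outside one

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears inside a list item.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._buffer is not None:
            self.items.append("".join(self._buffer).strip())
            self._buffer = None


def extract_list_items(html):
    """Return the text of each <li> in the given HTML fragment."""
    collector = ListItemCollector()
    collector.feed(html)
    collector.close()
    return collector.items


markup = (
    "<ul> <li>Angola (0.6%)</li> <li>Argentina (1.2%)</li> "
    "<li>Austria (1.7%)</li> <li>Australia (2.4%)</li> </ul>"
)
for item in extract_list_items(markup):
    print("-", item)
```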
But you don't want to have to approve a translation for every combination of country and percentage. When Sprawk finds no specific translation for a string, it breaks the string up further using a combination of splitters. In this case, each item is broken into two components:
- Argentina
- 1.2%
Each component is processed through the engine. Sprawk detects and translates the country name automatically through its pre-approved translation database (although you can override the translations if you wish). The percentage value is also identified automatically and handled by a special module. If you are translating from English to most European languages, it will be translated automatically to "1,2%".
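The splitting and number handling described above can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the regular expression, the function names split_item and localize_percentage, and the simple "swap the decimal point for a comma" rule are all hypothetical and stand in for whatever Sprawk does internally.

```python
import re

# Assumed item shape: "<label> (<number>%)", e.g. "Argentina (1.2%)".
ITEM_PATTERN = re.compile(r"^(?P<label>.+?)\s*\((?P<percent>\d+(?:\.\d+)?%)\)$")


def split_item(item):
    """Split an item into [label, percentage]; return [item] if it does not match."""
    match = ITEM_PATTERN.match(item)
    if match is None:
        return [item]
    return [match.group("label"), match.group("percent")]


def localize_percentage(value, decimal_separator=","):
    """Rewrite an English-style percentage with the target decimal separator.

    Simplification: real locale rules also cover digit grouping and spacing.
    """
    return value.replace(".", decimal_separator)


label, percent = split_item("Argentina (1.2%)")
print(label)                         # Argentina
print(localize_percentage(percent))  # 1,2%
```

A string that does not fit the assumed shape is returned unsplit, so it would still need its own translation.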
Sprawk supports a wide variety of splitters, including the following:
Nested span tags
<span class="average-rating">Average: <span>3.7</span></span>
<span class="total-votes">(<span>3</span> votes)</span>
- Average
- 3.7
- {0} votes
Note that Sprawk automatically identifies a "pattern" for the votes. The pattern "{0} votes" is recorded in the missing translation queue, so you need to approve only one translation per language. The number of votes is then filled in automatically whenever your page is translated.
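One way to picture pattern detection is to replace each number in a string with a numbered placeholder and keep the numbers aside, so that a single approved translation of the pattern can be reused for any value. The sketch below assumes this approach; to_pattern is a hypothetical name, and the German pattern "{0} Stimmen" is only an illustrative approved translation.

```python
import re

# Assumption: a "number" is a run of digits with an optional decimal part.
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def to_pattern(text):
    """Replace each number with {0}, {1}, ... and return (pattern, values)."""
    values = []

    def replace(match):
        values.append(match.group(0))
        return "{" + str(len(values) - 1) + "}"

    return NUMBER.sub(replace, text), values


pattern, values = to_pattern("3 votes")
print(pattern)  # {0} votes

# Illustrative approved translation of the pattern, reused for any number.
approved = {"{0} votes": "{0} Stimmen"}
print(approved[pattern].format(*values))  # 3 Stimmen
```

Text that already contains literal braces would need escaping before str.format is used; the sketch ignores that case.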
Punctuation such as slashes and semicolons
Tags: entertainment; movies; celebrities
or
Tags: entertainment / movies / celebrities
- Tags
- entertainment
- movies
- celebrities
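A punctuation splitter of this kind can be approximated with a single regular expression. The sketch below assumes that colons, semicolons, and slashes always act as separators, which would be too aggressive for real content such as URLs or "and/or"; split_on_punctuation is a hypothetical name.

```python
import re

# Assumption: ":", ";" and "/" always separate components.
SEPARATORS = re.compile(r"\s*[:;/]\s*")


def split_on_punctuation(text):
    """Split text on the assumed separators and drop empty pieces."""
    return [part for part in SEPARATORS.split(text.strip()) if part]


print(split_on_punctuation("Tags: entertainment; movies; celebrities"))
print(split_on_punctuation("Tags: entertainment / movies / celebrities"))
```

Both calls produce the same four components, so either way of writing the tag line would reuse the same translations.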
Note that translations are automatically capitalized to match the source. For example, when translating to Swedish, "Tags" is translated to "Taggar" but "entertainment" to "nöje". However, when translating to German, "entertainment" is translated to "Unterhaltung", since German nouns are always capitalized and an all-lowercase form would be incorrect.
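The capitalization rule can be sketched as follows, assuming the translation database stores each word in its natural form (lowercase "taggar" and "nöje" for Swedish, capitalized "Unterhaltung" for German). Under that assumption, the translation only ever needs to be capitalized, never lowercased. The function name match_capitalization is hypothetical.

```python
def match_capitalization(source, translation):
    """Capitalize the translation if the source starts with an uppercase letter.

    Assumption: stored translations are in their natural form, so a German
    noun such as "Unterhaltung" is already capitalized and is left as-is.
    """
    if source[:1].isupper() and translation:
        return translation[:1].upper() + translation[1:]
    return translation


print(match_capitalization("Tags", "taggar"))                 # Taggar
print(match_capitalization("entertainment", "nöje"))          # nöje
print(match_capitalization("entertainment", "Unterhaltung"))  # Unterhaltung
```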
With formatting tags
Sprawk can even handle combinations of tags and punctuation, such as:
<i>Kenya</i>; <em style="color:black">Mvita Constituency</em>; Qubaa Academy; Commission Report; <b>CKRC</b>; Coast Province.
would extract:
- Kenya
- Mvita Constituency
- Qubaa Academy
- Commission Report
- CKRC
- Coast Province
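Combining the two ideas, the sketch below first discards the formatting tags with Python's standard html.parser module and then splits the remaining text on semicolons. It is an illustration only: TextOnly and extract_strings are hypothetical names, and stripping the trailing full stop is a simplification that would also trim abbreviations ending in a period.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Hypothetical helper: keeps text content and ignores all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def extract_strings(fragment):
    """Strip tags, split on semicolons, and trim spaces and full stops."""
    parser = TextOnly()
    parser.feed(fragment)
    parser.close()
    text = "".join(parser.parts)
    pieces = (piece.strip(" .") for piece in text.split(";"))
    return [piece for piece in pieces if piece]


fragment = (
    '<i>Kenya</i>; <em style="color:black">Mvita Constituency</em>; '
    "Qubaa Academy; Commission Report; <b>CKRC</b>; Coast Province."
)
for string in extract_strings(fragment):
    print("-", string)
```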
You should not need to alter your existing web site to translate with the Sprawk engine, but you can always test the text extraction using the String collector tool.
