API Patterns
Sprawk's translation engine is a hybrid (rule-based and statistical) system. It uses a set of rules to find structure in incoming sentences, together with a statistical model of each language that incorporates data such as how common words and phrases are, how often pairs of words sit next to each other in a sentence, or how often two words appear anywhere in the same sentence. This data helps the engine make the best translation choices.
Sprawk customers can use the API to query text segments for certain types of words and phrases, which may be useful for finding key phrases in documents or identifying undesirable writing styles. You can also create group-specific patterns to customise how Sprawk translates your requests. A group-specific pattern takes precedence over the standard Sprawk translation rules.
Basic patterns
Patterns follow a quasi-Java MessageFormat syntax, for example:
which will match the following for example:
- a big boat
- a red rose
- a modern houses
The final option matches even though it is grammatically incorrect. However, the Sprawk database holds a lot of information about common word forms, so a more complex pattern allows us to discount the final option:
Multiple features
Pattern components can require multiple features to match, for example:
For the pattern to match, the word must have all three features.
Note that feature pairs should be separated only by the ";" character, with no extra spaces.
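The all-features-must-match rule can be sketched in Python as follows. This is only an illustration under stated assumptions, not Sprawk's implementation: the helper names parse_features and has_all_features, and the idea of representing a word as a dictionary of features, are hypothetical and chosen for this example.

```python
def parse_features(spec):
    """Split a feature string such as 'Type:Adjective;Plurality:Plural' into a dict."""
    features = {}
    for pair in spec.split(";"):  # pairs are separated by ';' with no spaces
        name, value = pair.split(":", 1)
        features[name] = value
    return features


def has_all_features(word_features, required):
    """Return True only if every required feature is present with the same value."""
    return all(word_features.get(name) == value for name, value in required.items())


# Hypothetical word, written in the feature notation used on this page.
word = parse_features("Type:Adjective;Plurality:Plural;Specificity:Indefinite")

print(has_all_features(word, parse_features("Type:Adjective;Plurality:Plural")))
print(has_all_features(word, parse_features("Type:Adjective;Plurality:Singular")))
```

The first check succeeds because both required features are present; the second fails because a single mismatching feature is enough to reject the word.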
Numbers in patterns
Component numbering starts from 0, and each pattern component must start with a number, though the numbers need not appear in order. Patterns can be used to give the system hints about word order during translation, so it is important that the ID numbers of corresponding components match up. For example:
Since, in French, the adjective usually comes after the noun, these two patterns could be grouped into a single meaning. If the pattern matches the incoming phrase, the components will be translated into French and placed into the order: Article, Noun, Adjective as per the pattern.
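The role of the ID numbers in reordering can be sketched as below. This is a simplified illustration, assuming the engine keeps a mapping from component ID to the matched word; the function name reorder and that mapping are hypothetical, and the translation step itself is left out.

```python
def reorder(matched_words, target_ids):
    """Place matched words in the order given by the target pattern's component IDs.

    matched_words maps a component ID to the word matched by that component.
    """
    return [matched_words[component_id] for component_id in target_ids]


# Source pattern order: 0 = article, 1 = adjective, 2 = noun.
matched = {0: "a", 1: "red", 2: "rose"}

# The target pattern lists the same IDs in the order article, noun, adjective.
print(reorder(matched, [0, 2, 1]))
```

Because the words are looked up by ID rather than by position, the two patterns can list their components in different orders and still refer to the same words.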
Negative features
Patterns allow you to specify features that must not be present. If ANY of the features listed in a negative pattern are found, the whole word is deemed to have failed the match. For example, suppose we had the following word in Swedish:
- Type:Adjective;Plurality:Plural;Specificity:Indefinite
The following pattern would fail:
But this pattern would still match:
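The "any forbidden feature fails the word" rule can be sketched in Python. The sketch deliberately avoids the pattern syntax for negation and works on plain data instead; the function name passes_negative and the two forbidden lists are assumptions made for illustration, not the patterns from this page.

```python
def passes_negative(word_features, forbidden):
    """Return False as soon as any forbidden (name, value) pair is found on the word."""
    return not any(word_features.get(name) == value for name, value in forbidden)


# The Swedish word described above.
word = {"Type": "Adjective", "Plurality": "Plural", "Specificity": "Indefinite"}

# Hypothetical negative list: one of the two forbidden pairs is present, so the word fails.
print(passes_negative(word, [("Plurality", "Plural"), ("Specificity", "Definite")]))

# Hypothetical negative list: neither forbidden pair is present, so the word still matches.
print(passes_negative(word, [("Plurality", "Singular"), ("Specificity", "Definite")]))
```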
Pruning with negative patterns
Negative patterns can also be used without specifying a value. In this way they can be used to quickly discount certain word forms. Consider the following word candidates:
- eat
- Type:Verb;Tense:Present
- can eat
- Type:Verb;Modal:Can
- may eat
- Type:Verb;Modal:May
- should eat
- Type:Verb;Modal:Should
The following rule will fail all but the first candidate, thus quickly removing undesirable forms in this situation:
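Pruning by feature name alone can be sketched as follows. The sketch assumes the rule forbids the Modal feature regardless of its value, which is consistent with the candidates listed above; the function name prune and the dictionary layout are hypothetical.

```python
def prune(candidates, forbidden_names):
    """Keep only the candidates that carry none of the forbidden feature names."""
    return [
        text
        for text, features in candidates.items()
        if not any(name in features for name in forbidden_names)
    ]


# The word candidates listed above, keyed by their text.
candidates = {
    "eat": {"Type": "Verb", "Tense": "Present"},
    "can eat": {"Type": "Verb", "Modal": "Can"},
    "may eat": {"Type": "Verb", "Modal": "May"},
    "should eat": {"Type": "Verb", "Modal": "Should"},
}

# Forbidding "Modal" without a value removes every modal form in one step.
print(prune(candidates, ["Modal"]))
```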
Being specific
The pattern-processing engine in Sprawk is quite flexible and allows for multiple simple patterns to be combined to form powerful and quite complex results. When patterns are in "NoSubtree" mode, and an incoming fragment is being processed, different patterns can match the same words and narrow down the word options further each time. For example, the following sentence:
| Pattern | Effect |
|---|---|
| {0#Type:PronounPersonal;Case:Nominative} {1#Type:Verb} | |
| {0#Type:Verb} {1#Type:Noun;Case:Accusative} | |
Checking for specific text
Patterns can also require that the word has certain text, or a combination of text and features. Note that a pattern can NEVER have any spaces outside of the single quotes:
...but not...
Matching root forms
It is also possible to match all forms of a particular word, for example:
... will match:
- be quiet
- was quiet
- had been quiet
... and ...
... will match:
- vara hopplös
- var hopplösa
- har varit hopplöst
Note that the first ",''" simply tells the pattern engine that we don't care what the text of the matched word is; only the root form of the word (the second string in single quotes) needs to be checked for a correct match.
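To make the text and root slots concrete, here is a small Python sketch that parses a single pattern component and checks it against a word. It is an assumption-laden illustration, not Sprawk's parser: it handles only the component forms that appear on this page, the names parse_component and component_matches are hypothetical, and the component {0,'','be'#Type:Verb} and the word records are invented for the example.

```python
import re

# {ID} optionally followed by ,'text' then ,'root' then #Feature:Value;Feature:Value
COMPONENT = re.compile(
    r"^\{(?P<id>\d+)"
    r"(?:,'(?P<text>[^']*)')?"
    r"(?:,'(?P<root>[^']*)')?"
    r"(?:#(?P<features>[^}]*))?"
    r"\}$"
)


def parse_component(component):
    """Parse one component into its ID, required text, required root and features."""
    m = COMPONENT.match(component)
    if m is None:
        raise ValueError(f"unsupported component: {component}")
    features = {}
    if m.group("features"):
        for pair in m.group("features").split(";"):
            name, value = pair.split(":", 1)
            features[name] = value
    return {
        "id": int(m.group("id")),
        "text": m.group("text") or None,  # '' means "any text"
        "root": m.group("root"),
        "features": features,
    }


def component_matches(parsed, word):
    """Check the text, root and features of a word against a parsed component."""
    if parsed["text"] is not None and word["text"] != parsed["text"]:
        return False
    if parsed["root"] is not None and word["root"] != parsed["root"]:
        return False
    return all(word["features"].get(n) == v for n, v in parsed["features"].items())


parsed = parse_component("{0,'','be'#Type:Verb}")
was = {"text": "was", "root": "be", "features": {"Type": "Verb"}}
ran = {"text": "ran", "root": "run", "features": {"Type": "Verb"}}
print(component_matches(parsed, was))
print(component_matches(parsed, ran))
```

The empty first string is treated as "no constraint on the text", so any inflected form whose root is the requested one passes.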
Flexible-length patterns
Sprawk also supports patterns with variable length, but only when the variable-length component is 'wrapped' by two fixed components, be they textual components or other pattern components.
These are all valid patterns:
- wurde {0...} {1#Type:PastParticiple}
- {0,'wurde'} {0...} {1#Type:PastParticiple}
- {0,'','starta'#Type:Verb} {1...} om
These are not:
- {0#Type:Clause} {1#Type:Conjunction} {2...}
- Jag är {0...}
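The wrapping rule can be checked mechanically. The following Python sketch is a minimal illustration, assuming that components are separated by spaces and that a variable-length component is written as {N...}, as in the examples above; the function name variable_parts_are_wrapped is hypothetical.

```python
import re

TOKEN = re.compile(r"\{[^{}]*\}|\S+")      # a {...} component or a plain word
VARIABLE = re.compile(r"^\{\d+\.\.\.\}$")  # a variable-length component such as {0...}


def variable_parts_are_wrapped(pattern):
    """Return True if every variable-length component has a fixed component on both sides."""
    tokens = TOKEN.findall(pattern)
    for i, token in enumerate(tokens):
        if not VARIABLE.match(token):
            continue
        if i == 0 or i == len(tokens) - 1:
            return False  # nothing before it, or nothing after it
        if VARIABLE.match(tokens[i - 1]) or VARIABLE.match(tokens[i + 1]):
            return False  # a neighbour is itself variable-length
    return True


for pattern in [
    "wurde {0...} {1#Type:PastParticiple}",
    "{0,'','starta'#Type:Verb} {1...} om",
    "{0#Type:Clause} {1#Type:Conjunction} {2...}",
    "Jag är {0...}",
]:
    print(pattern, variable_parts_are_wrapped(pattern))
```

The first two patterns pass because the variable-length component sits between two fixed components; the last two fail because it is the final component.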
