In Corsican language, French word ‘femme’ can be translated, depending on the context
either into donna (woman)
or into moglia (wife)
The above sample still contains a lot of vocabulary and grammatical disambiguation errors (easy/medium difficulty), but it handles successfully the semantic disambiguation (hard) of ‘femme’, two instances of which are properly translated into moglia (wife). As the Corsican proverb says, in a cianga l’oru luci sempri (in the mud, gold is still shining).
Anaphora resolution usually refers to pronouns. But we face here a special case of anaphora resolution that relates to an adjective. The following sentence: ‘un vase de Chine authentique’ (an authentic vase of China) is translated erroneously as un vasu di China autentica, due to erroneous anaphora resolution. In this sample, the adjective ‘authentique’ refers to ‘vase’ (English: vase) and not to ‘Chine’ (China).
The same goes for ‘une chanson du Portugal mythique’, where ‘mythique’ refers to ‘chanson’ and not to ‘Portugal’.
What are the conditions for a given endangered language to be a candidate for rule-based machine translation? For a given endangered language to be a candidate for rule-based machine translation, some requirements are in order. There is notably need for:
a dictionary: some specialized lexicons are useful too
a list of locutions and their translation: to be more accurate what is needed are noun locutions, adjective locutions, adverbial locutions, verbal locutions and their translations in other language.
a detailed grammar (in any language): ideally, the grammar should be very detailed, mentioning notably irregular verbs, noun plurals, etc. Subjonctive, conditional tenses must also be accurately described.
in addition, elision rules, euphony rules, should also be described.
most importantly: a description of the main variants of the language and their differences. This is needed to handle what we can call the ‘variant problem’ (we shall say a bit more about this later): as an effect of diversity, endangered languages are often polynomic and come with variants. But translation must be coherent and a mix of several variants is not acceptable as a translation.
Let us mention that endangered languages are commonly associated with another language, being in a diglossia relationship one with another. To take an example, Corsican language is associated with French. So we consider the French-Corsican pair, and what is relevant is a French-Corsican. If we consider the sardinian gallurese language (‘gaddhuresu’), the relevant pair is Italian-Gallurese. Other relevant pairs are: