Let us briefly recall the problem: translating ‘I love you’ might sound trivial, but it is not. In fact, ‘ti amu’ is not the best translation. The best translation is ‘ti tengu caru’ when addressed to a male person, or ‘ti tengu cara’ when addressed to a female person. Hence the proposed preliminary translation ‘ti tengu caru/cara’. Such a rough translation requires further disambiguation, but on what precise grounds?
Let us look at the issue from an analytical perspective. It appears that we need to assign a referent to the pronoun ‘te’ (you, ti). This referent can be identified from the context, depending on whether the person ‘te’ refers to is male or female. At this stage, it seems best to consider that the personal object pronoun has an inherent gender: masculine or feminine. This gender does not affect the pronoun itself, which remains ‘te’ (you, ti) regardless of gender, but it does affect the words that depend on it, i.e. the adjective caru/cara in Corsican, in the locution ti tengu caru/cara. The upshot is: in this case, ‘te’ (you, ti) is a personal object pronoun, masculine or feminine, whose inherent ambiguity can be resolved according to the context.
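The idea of a pronoun carrying an inherent gender that drives agreement can be sketched as follows. This is only a minimal illustration: the function name, the 'm'/'f' codes and the fallback behaviour are assumptions made for the example, not a description of an actual translation engine.

```python
from typing import Optional

# Corsican adjective forms selected by the gender carried by the pronoun.
ADJECTIVE_FORMS = {"m": "caru", "f": "cara"}


def translate_i_love_you(addressee_gender: Optional[str] = None) -> str:
    """Translate 'I love you' into Corsican.

    addressee_gender is 'm', 'f', or None when the context gives no clue
    (hypothetical encoding chosen for this sketch).
    """
    if addressee_gender in ADJECTIVE_FORMS:
        # The pronoun 'ti' itself does not change; only the adjective agrees.
        return f"ti tengu {ADJECTIVE_FORMS[addressee_gender]}"
    # Without context, keep the rough translation that lists both forms.
    return "ti tengu caru/cara"
```

Calling the function with 'm' or 'f' yields the disambiguated locution, while calling it without context falls back to the preliminary translation ti tengu caru/cara.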
Let us consider a specific kind of superlative. This form, specific to the Corsican language, is notably mentioned by the grammarian and author Santu Casta in his Punteghju, where he recommends the following translation of “C’était le village le plus riche du canton” (It was the richest village of the canton): Era u più paese riccu di stu cantone (pages 26 & 54-55). The structure is original in the sense that the comparative (più) precedes the noun (paese, village), which in turn precedes the adjective (riccu, rich).
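As a rough illustration of this word order, the sketch below rewrites the French pattern ‘le NOUN le plus ADJECTIVE’ into the Corsican order ‘u più NOUN ADJECTIVE’. The toy lexicon, the regular expression and the restriction to masculine singular nouns are assumptions made for the example only.

```python
import re

# Toy French-to-Corsican lexicon (masculine singular only), for illustration.
LEXICON = {"village": "paese", "riche": "riccu"}

# French superlative: article + noun + 'le plus' + adjective.
SUPERLATIVE = re.compile(r"\ble (\w+) le plus (\w+)")


def translate_superlative(french: str) -> str:
    """Rewrite 'le N le plus ADJ' as 'u più N ADJ' when both words are known."""

    def reorder(match: re.Match) -> str:
        noun = LEXICON.get(match.group(1))
        adjective = LEXICON.get(match.group(2))
        if noun is None or adjective is None:
            # Unknown vocabulary: leave the French span untouched.
            return match.group(0)
        # The comparative 'più' comes first, then the noun, then the adjective.
        return f"u più {noun} {adjective}"

    return SUPERLATIVE.sub(reorder, french)
```

Applied to ‘le village le plus riche’, this produces u più paese riccu, matching the order recommended in the example above.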
French to Corsican: performance on the French Wikipedia sample test currently amounts to 94% on average. Below is a rough typology of the remaining errors (presumably an average score of 95% on the open test should be attainable by correcting the errors tagged ‘easy’):
unknown vocabulary: 40% (easy)
basic disambiguation: 25% (easy or medium difficulty)
false positives: 5% (medium difficulty or hard). This type of error is mostly related to proper nouns, i.e. English terms that should remain untranslated. For example: ‘North American Aviation’ translates erroneously into ‘North American Aviazione’. In this case, ‘Aviation’ should remain untranslated.
inadequate locution: 10% (medium difficulty or hard)
anaphora resolution related to complex sentence structure: 5% (hard)
semantic disambiguation: 5% (hard). For example, disambiguating French ‘échecs’ = fiaschi/scacchi (failures/chess)
erroneous agreement related to gender mismatch from French to Corsican, i.e. (i) words that are masculine in French and feminine in Corsican; and (ii) words that are feminine in French and masculine in Corsican: 1% (medium difficulty).
erroneous agreement related to number mismatch from French to Corsican, i.e. (i) words that are singular in French and plural in Corsican; and (ii) words that are plural in French and singular in Corsican (for example, French ‘la canicule’ translates into ‘i sulleoni’ in Corsican): 1% (medium difficulty).
specific grammatical case: 2% (hard)
anaphora resolution associated with gender or number mismatch: 1% (hard)
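To make the 95% estimate concrete, here is a small back-of-the-envelope calculation. It assumes (and this is only an assumption) that fixing a category removes all of its errors without introducing new ones, and it uses only the ‘unknown vocabulary’ share, the one category tagged plainly ‘easy’. The function name is hypothetical.

```python
def projected_score(current_score: float, fixed_error_share: float) -> float:
    """Score reached if a given share of the remaining errors were corrected.

    current_score: current accuracy, e.g. 0.94.
    fixed_error_share: fraction of the remaining errors assumed fixed (0..1).
    """
    remaining_errors = 1.0 - current_score
    return current_score + remaining_errors * fixed_error_share


# 'unknown vocabulary' accounts for 40% of the remaining errors (tagged easy).
easy_only = projected_score(0.94, 0.40)
print(f"{easy_only:.1%}")
```

Under these assumptions, correcting unknown vocabulary alone would move the score from 94% to about 96.4%, so the 95% target leaves some margin for fixes that are only partly successful.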
Anaphora resolution usually concerns pronouns. But we face here a special case of anaphora resolution that relates to an adjective. The following phrase: ‘un vase de Chine authentique’ (an authentic vase from China) is translated erroneously as un vasu di China autentica, due to erroneous anaphora resolution. In this example, the adjective ‘authentique’ refers to ‘vase’ (English: vase) and not to ‘Chine’ (China).
The same goes for ‘une chanson du Portugal mythique’, where ‘mythique’ refers to ‘chanson’ and not to ‘Portugal’.
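The difference between the two attachments can be sketched as below. The data structures and the attach_to_head flag are hypothetical, and the masculine form autenticu is inferred here from the regular -u/-a pattern rather than taken from the text above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Noun:
    corsican: str
    gender: str  # 'm' or 'f'


def translate_noun_phrase(article: str, head: Noun, complement: Noun,
                          adjective_forms: Dict[str, str],
                          attach_to_head: bool = True) -> str:
    """Translate 'N1 de N2 ADJ', agreeing the adjective with N1 or with N2."""
    target = head if attach_to_head else complement
    adjective = adjective_forms[target.gender]
    return f"{article} {head.corsican} di {complement.corsican} {adjective}"


vasu = Noun("vasu", "m")
china = Noun("China", "f")
# Masculine form assumed from the regular pattern; feminine form as in the text.
authentic = {"m": "autenticu", "f": "autentica"}

# Agreement with the head noun 'vasu' (intended reading).
print(translate_noun_phrase("un", vasu, china, authentic))
# Agreement with the nearest noun 'China' (the erroneous output discussed above).
print(translate_noun_phrase("un", vasu, china, authentic, attach_to_head=False))
```

Attaching the adjective to the head noun is only a heuristic: in other phrases the adjective may genuinely modify the second noun, which is why this remains a disambiguation problem rather than a fixed rule.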
Here is an interesting case of first name disambiguation for machine translation. Consider the first name ‘Camille’. It can apply to both genders. In Corsican (taravese or sartinese variants) it translates either into Cameddu (masculine) or Camedda (feminine). In some cases, the corresponding disambiguation relies on mere grammatical grounds. For example, ‘Camille était beau’ translates into Cameddu era beddu (Camille was beautiful), on grammatical grounds alone. The same goes for ‘Camille était belle’, which translates straightforwardly into Camedda era bedda (Camille was beautiful), according to the gender of the adjective.
Now the related disambiguation can result in a hard case, relying only on semantic context. Hence, ‘Camille était pacifique’ can translate either into Cameddu era pacificu or into Camedda era pacifica, depending on the context (which can be text or even an image…). In effect, it cannot be translated merely on grammatical grounds, since ‘pacifique’ (pacific) is gender-ambiguous: it can translate either into pacificu or pacifica.
Now the same goes for the French first name ‘Dominique’ (Dominic), which translates either into ‘Dumenicu’ (masculine) or ‘Dumenica’ (feminine). Hence, ‘Dominique était pacifique’ (Dominic was pacific) can translate either into ‘Dumenicu era pacificu’ or into ‘Dumenica era pacifica’, depending on the context.
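The two situations (grammatical disambiguation versus reliance on context) can be sketched as follows. The lexicons are limited to the forms cited above, and the function name and the context_gender parameter are assumptions made for the example.

```python
from typing import List, Optional

# Gender-ambiguous French first names and their Corsican forms.
NAMES = {
    "Camille": {"m": "Cameddu", "f": "Camedda"},
    "Dominique": {"m": "Dumenicu", "f": "Dumenica"},
}

# French adjectives mapped to the Corsican forms they allow, by gender.
ADJECTIVES = {
    "beau": {"m": "beddu"},
    "belle": {"f": "bedda"},
    "pacifique": {"m": "pacificu", "f": "pacifica"},  # gender-ambiguous
}


def translate_name_was_adjective(name: str, adjective: str,
                                 context_gender: Optional[str] = None) -> List[str]:
    """Return every translation of '<name> était <adjective>' still possible."""
    forms = ADJECTIVES[adjective]
    # Genders compatible with the French adjective (grammatical grounds).
    genders = [g for g in ("m", "f") if g in forms]
    # An external hint (text, image...) can narrow the remaining options.
    if context_gender in genders:
        genders = [context_gender]
    return [f"{NAMES[name][g]} era {forms[g]}" for g in genders]
```

With ‘beau’ or ‘belle’ the function returns a single candidate, whereas with ‘pacifique’ it returns both candidates unless a context gender is supplied, which mirrors the hard case described above.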