Tag Archives: disambiguation

Proper nouns: handling some false positives

Now handling a kind of false positive related to the translation of proper nouns. As this type of error is fairly widespread, fixing it could result in a 0.2% increase in overall accuracy.

Of interest in the present case:

  • recall that ‘détroit’ is the French word for strittonu (strait, as in the Strait of Gibraltar)
  • ‘Tours’ (the French city) is also left untranslated, although it is ambiguous with torri (towers) and ghjiri (turns)
  • ‘12th Street riot’ and ‘Michigan’ are left untranslated
  • self-evaluation erroneously reports 2 vocabulary errors: ‘riot’ and the ‘th’ in ‘12th’

Proper nouns: false positives again

Now we face false positives again: the French proper noun ‘Détroit’ is erroneously translated into Strittonu when it should have been left untranslated, being a proper noun. The ambiguity of ‘Détroit’ lies in the fact that it can be translated either into:

  • Détroit, the city
  • Strittonu, the Corsican word strittonu/strittone corresponding to the French noun ‘détroit’ (strait, as in the Strait of Messina).

This raises the general issue of the proper disambiguation of proper nouns.
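One simple way to approach this ambiguity is a capitalization check, sketched below in Python. This is only an illustration and not necessarily how the engine works: the two lexicons and the function names are hypothetical, and the rule assumes that a known proper noun capitalized in mid-sentence should be left untranslated.

```python
# Hypothetical mini-lexicons built from the examples in these posts.
COMMON_NOUNS = {"détroit": "strittonu", "tours": "torri"}
PROPER_NOUNS = {"Détroit", "Tours", "Michigan"}


def translate_token(token: str, sentence_initial: bool) -> str:
    """Return the Corsican form of a French token, keeping proper nouns as-is."""
    # Assumed heuristic: a known proper noun that is capitalized in
    # mid-sentence is left untranslated.
    if token in PROPER_NOUNS and not sentence_initial:
        return token
    # Otherwise use the common-noun reading; unknown tokens pass through.
    return COMMON_NOUNS.get(token.lower(), token)


def translate_tokens(tokens: list[str]) -> list[str]:
    return [translate_token(tok, i == 0) for i, tok in enumerate(tokens)]


print(translate_tokens(["Il", "vit", "à", "Détroit"]))
print(translate_tokens(["le", "détroit", "de", "Messine"]))
```

A sentence-initial ‘Détroit’ remains ambiguous under this rule, since capitalization carries no information there; the sketch falls back to the common-noun reading in that case, and a real system would need context to decide.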

A Special Case of Anaphora Resolution

After improper anaphora resolution

Anaphora resolution usually concerns pronouns, but here we face a special case that involves an adjective. The sentence ‘un vase de Chine authentique’ (an authentic vase from China) is translated erroneously as un vasu di China autentica, due to erroneous anaphora resolution. In this sample, the adjective ‘authentique’ refers to ‘vase’ (English: vase) and not to ‘Chine’ (China).

The same goes for ‘une chanson du Portugal mythique’, where ‘mythique’ refers to ‘chanson’ and not to ‘Portugal’.
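The attachment decision can be sketched as a small agreement rule. The Python below is a hedged illustration, not the engine's implementation: the lexicon, the `Noun` class and the function name are hypothetical, the masculine form autenticu is assumed from the feminine autentica quoted above, and the rule itself (attach the adjective to the head noun when the nearer noun is a proper noun) is an assumption that covers only this pattern.

```python
from dataclasses import dataclass


@dataclass
class Noun:
    corsican: str
    gender: str   # "m" or "f"
    proper: bool  # True for proper nouns such as country names


# Hypothetical lexicon; "autenticu" is the assumed masculine form of "autentica".
NOUNS = {
    "vase": Noun("vasu", "m", proper=False),
    "Chine": Noun("China", "f", proper=True),
}
ADJECTIVES = {"authentique": {"m": "autenticu", "f": "autentica"}}


def translate_noun_de_noun_adj(head: str, modifier: str, adjective: str) -> str:
    """Translate 'HEAD de MODIFIER ADJECTIVE', choosing the adjective's gender."""
    head_noun, mod_noun = NOUNS[head], NOUNS[modifier]
    # Assumed rule: the adjective agrees with the head noun when the nearer
    # noun is a proper noun; otherwise it agrees with the nearer noun.
    target = head_noun if mod_noun.proper else mod_noun
    adj = ADJECTIVES[adjective][target.gender]
    return f"{head_noun.corsican} di {mod_noun.corsican} {adj}"


print("un", translate_noun_de_noun_adj("vase", "Chine", "authentique"))
```

With this rule the adjective takes the masculine gender of vasu rather than the feminine gender of China.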

After appropriate anaphora resolution

Four consecutive ambiguous words


Translating the sentence ‘ce fait est unique’ (this fact is unique) is not as easy as it might seem at first glance. It is in fact made up of four consecutive ambiguous words:

  • ‘ce’: ‘ssu (demonstrative pronoun, this) or ciò (it, relative pronoun)
  • ‘fait’: fattu (masculine singular noun, fact), fattu (past participle, done) or faci (does, third person singular of the verb to do in the present tense)
  • ‘est’: estu (masculine singular noun, east) or hè (is, third person singular of the verb to be in the present tense)
  • ‘unique’: unicu (masculine singular adjective, unique in English) or unica (feminine singular adjective, unique in English)
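To see how quickly the candidates multiply, the readings listed above can be enumerated and filtered. The Python sketch below is illustrative only: the tag names, the single accepted tag pattern and the noun-adjective agreement check are assumptions made for the example, and the form hè for ‘is’ is likewise assumed.

```python
from itertools import product

# Candidate readings per French word: (Corsican form, tag, gender or None).
# Tags and the form "hè" for "is" are assumptions made for this sketch.
CANDIDATES = {
    "ce": [("‘ssu", "DEM", None), ("ciò", "PRON", None)],
    "fait": [("fattu", "NOUN", "m"), ("fattu", "PART", "m"), ("faci", "VERB", None)],
    "est": [("estu", "NOUN", "m"), ("hè", "VERB", None)],
    "unique": [("unicu", "ADJ", "m"), ("unica", "ADJ", "f")],
}

# Assumed sentence pattern: demonstrative + noun + verb + adjective.
PATTERN = ("DEM", "NOUN", "VERB", "ADJ")


def readings(sentence: list[str]) -> list[str]:
    """Keep the combinations that match PATTERN and agree in gender."""
    results = []
    for combo in product(*(CANDIDATES[word] for word in sentence)):
        tags = tuple(tag for _, tag, _ in combo)
        if tags != PATTERN:
            continue
        noun_gender, adjective_gender = combo[1][2], combo[3][2]
        if noun_gender == adjective_gender:
            results.append(" ".join(form for form, _, _ in combo))
    return results


sentence = ["ce", "fait", "est", "unique"]
total = len(list(product(*(CANDIDATES[word] for word in sentence))))
print(total, readings(sentence))
```

The four words yield 2 × 3 × 2 × 2 = 24 raw combinations, of which only one survives the pattern and agreement filter in this sketch.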

Solving fivefold ambiguity: translation for French ‘poste’

The French word ‘poste’ is (at least) fivefold ambiguous, for it can designate:

  • ‘poste’ (masculine singular noun): postu, masculine singular noun (set, i.e. television set)
  • ‘poste’ (masculine singular noun): posta, feminine singular noun (position); erroneously translated as postu in the present case, where it should read a so posta
  • ‘poste’ (feminine singular noun): posta, feminine singular noun (post office)
  • ‘poste’: impostu (from the verb impustà (‘poster’, to station oneself), first person singular)
  • ‘poste’: imposta (from the verb impustà (‘poster’, to station oneself), third person singular)

(However, it is more complex than that, since there is another sense of the verb ‘poster’: to post/to mail.)

Chemistry: translating acid names


Translating this series of acid names is not as easy as it might seem at first glance. Each acid name is in fact composed of three consecutive ambiguous words:

  • ‘l’ is ambiguous between the masculine (u, the) and feminine (a, the) definite article
  • ‘acide’ is ambiguous between acidu (acid, masculine singular noun), acitu (acid, masculine singular adjective) or acita (acid, feminine singular noun)
  • ‘daturique’, etc. are all ambiguous since they can be either masculine singular (daturicu, daturic) or feminine singular (daturica, daturic) adjectives.

Another case of first name ambiguity: ‘Noël’

Translation of the French word ‘Noël’ yields another case of ambiguity. For ‘Noël’ can translate:

  • either into Natali (Christmas, Christmas Day): the annual festival commemorating Jesus Christ’s birth
  • or into, identically, Natali (‘Noel’): the first name

Now it seems there is nothing to disambiguate, since in either case ‘Noël’ translates into Natali (Natali in the sartinese and taravese variants; Natale in the cismuntincu variant). But ambiguity lurks when one considers sentences that include ‘Noël’. Consider the following sentence: ‘Je l’ai donné à Noël.’ It can be translated:

  • either into: L’aghju datu in Natali. (I gave it at Christmas.)
  • or into: L’aghju datu à Natali (I gave it to Noel.)

since the French preposition ‘à’ translates differently in the two cases. A phenomenon of the same nature occurs when translating from French to English.

Interestingly, when the two ambiguous consecutive words are repeated, the ambiguity vanishes: ‘Je l’ai donné à Noël à Noël.’ translates unambiguously into L’aghju datu à Natali in Natali (I gave it to Noel at Christmas.). The order can be ignored, since L’aghju datu in Natali à Natali (I gave it at Christmas to Noel.) amounts to the same. In this last case, the translation is meaning-preserving.
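The reasoning about ‘à Noël’ can be checked with a small enumeration. This Python sketch rests on one stated assumption, namely that each role (recipient or time) is used at most once per sentence; the role labels and function names are hypothetical.

```python
from itertools import permutations

# The two roles that 'à Noël' can play, with the Corsican rendering from the text.
ROLES = {"recipient": "à Natali", "time": "in Natali"}


def readings(occurrences: int) -> list[tuple[str, ...]]:
    """Role assignments for n occurrences of 'à Noël', each role used at most once."""
    return list(permutations(ROLES, occurrences))


def meanings(occurrences: int) -> set[frozenset[str]]:
    """Distinct meanings, ignoring the order in which the roles appear."""
    return {frozenset(reading) for reading in readings(occurrences)}


def render(reading: tuple[str, ...]) -> str:
    return "L’aghju datu " + " ".join(ROLES[role] for role in reading)


for n in (1, 2):
    print(n, [render(r) for r in readings(n)], len(meanings(n)))
```

A single occurrence gives two distinct meanings, while two occurrences give two word orders that share one meaning, which is why the repeated sentence is unambiguous.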

Interesting case of first name disambiguation

Here is an interesting case of first name disambiguation for machine translation. Consider the first name ‘Camille’, which can apply to both genders. In Corsican (taravese or sartinese variants) it translates either into Cameddu (masculine) or Camedda (feminine). In some cases, the disambiguation relies on grammatical grounds alone. For example, ‘Camille était beau’ translates into Cameddu era beddu (Camille was beautiful), according to the gender of the adjective. The same goes for ‘Camille était belle’, which translates straightforwardly into Camedda era bedda (Camille was beautiful).

Now the related disambiguation can be a hard case, relying only on semantic context. Hence, ‘Camille était pacifique’ can translate either into Cameddu era pacificu or into Camedda era pacifica, depending on the context (which can be text or even an image…). It cannot be translated merely on grammatical grounds, since ‘pacifique’ (pacific) is gender-ambiguous: it can translate either into pacificu or pacifica.

Now the same goes for the French first name ‘Dominique’ (Dominic), which translates either into ‘Dumenicu’ (masculine) or ‘Dumenica’ (feminine). Hence, ‘Dominique était pacifique’ (Dominic was pacific) can translate either into ‘Dumenicu era pacificu’ or into ‘Dumenica era pacifica’, depending on the context.

Word-sense disambiguation: first test of new engine

Now testing the new engine with the semantically ambiguous French ‘échecs’ = fiaschi/scacchi (failures/chess).

What is interesting here is that semantic disambiguation transfers successfully into English (although the French/English engine is still in its infancy and still makes a lot of grammatical errors).

Now further tests are needed with some other semantically ambiguous words:

  • ‘défense’: defense/tusk; Corsican: difesa/sanna
  • ‘fils’: sons/wires; Corsican: figlioli/fili
  • ‘comprendre’: understand/comprise; Corsican: capisce/cumprende
  • ‘vol’: flight/theft; Corsican: bulu/arrubecciu
  • ‘voler’: fly/steal; Corsican: bulà/arrubà
  • ‘échecs’: chess/failures; Corsican: scacchi/fiaschi
  • ‘palais’: palace/palaces/palate/palates; Corsican: palazzu/palazzi/palate/palates

In the background, the unresolved threefold ambiguity of French ‘partie’ = parti/partita/partita (part/game/gone) is lurking…