Monthly Archives: December 2017

Semantic disambiguation of French ‘femme’: in the mud, gold is still shining

In Corsican, the French word ‘femme’ can be translated, depending on the context:

  • either into donna (woman)
  • or into moglia (wife)

The above sample still contains many vocabulary and grammatical disambiguation errors (easy/medium difficulty), but it successfully handles the semantic disambiguation (hard) of ‘femme’, two instances of which are properly translated into moglia (wife). As the Corsican proverb says, in a cianga l’oru luci sempri (in the mud, gold is still shining).

French samples are from the French corpora of the University of Leipzig.
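One very rough way to choose between the two translations is to look at the word just before ‘femme’: after a possessive determiner (as in ‘sa femme’), the ‘wife’ reading is usually the intended one. The sketch below encodes only that heuristic. The function name, the list of possessives and the rule itself are illustrative assumptions, not the method used to produce the sample discussed here.

```python
# Toy context rule for French 'femme' -> Corsican donna / moglia.
# Assumption: a possessive determiner right before 'femme' signals 'wife'.
POSSESSIVES = {"ma", "ta", "sa", "notre", "votre", "leur"}


def translate_femme(tokens, index):
    """Return 'moglia' or 'donna' for the token 'femme' at position index."""
    if tokens[index].lower() != "femme":
        raise ValueError("token at index is not 'femme'")
    previous = tokens[index - 1].lower() if index > 0 else ""
    return "moglia" if previous in POSSESSIVES else "donna"


print(translate_femme("il est venu avec sa femme".split(), 5))  # moglia
print(translate_femme("une femme est venue".split(), 1))        # donna
```

A real system would need many more cues than this one, since ‘femme’ can mean ‘wife’ without any possessive nearby.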

A Special Case of Anaphora Resolution

After improper anaphora resolution

Anaphora resolution usually concerns pronouns. Here, however, we face a special case of anaphora resolution that concerns an adjective. The phrase ‘un vase de Chine authentique’ (an authentic vase from China) is erroneously translated as un vasu di China autentica, because the anaphora is resolved incorrectly. In this sample, the adjective ‘authentique’ refers to ‘vase’ and not to ‘Chine’ (China).

The same goes for ‘une chanson du Portugal mythique’, where ‘mythique’ refers to ‘chanson’ and not to ‘Portugal’.
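The effect of the attachment choice on agreement can be sketched as follows. This is a minimal illustration, assuming that the adjective takes the gender of whichever noun it is attached to. The masculine form autenticu, the gender tags and all function names are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    form: str
    gender: str  # "m" or "f"


# 'autentica' appears in the erroneous output; 'autenticu' is assumed here
# to be the corresponding masculine form.
AUTHENTIQUE = {"m": "autenticu", "f": "autentica"}


def translate_np(head, complement, adjective_forms, attach_to_head=True):
    """Render 'un HEAD di COMPLEMENT ADJ', agreeing ADJ with the chosen noun."""
    target = head if attach_to_head else complement
    adjective = adjective_forms[target.gender]
    return f"un {head.form} di {complement.form} {adjective}"


vasu = Noun("vasu", "m")
china = Noun("China", "f")  # feminine gender assumed from 'autentica'

# Attaching the adjective to the nearest noun reproduces the error:
print(translate_np(vasu, china, AUTHENTIQUE, attach_to_head=False))
# un vasu di China autentica

# Attaching it to the head noun 'vasu' gives masculine agreement:
print(translate_np(vasu, china, AUTHENTIQUE, attach_to_head=True))
# un vasu di China autenticu
```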

After appropriate anaphora resolution

Four consecutive ambiguous words


Translating the sentence ‘ce fait est unique’ is not as easy as it might seem at first glance. Indeed, it is made up of four consecutive ambiguous words:

  • ‘ce’: ‘ssu (demonstrative pronoun, this) or ciò (it, relative pronoun)
  • ‘fait’: fattu (masculine singular noun, fact), fattu (past participle, done) or faci (does, third person singular of the verb to do in the present tense)
  • ‘est’: estu (masculine singular noun, east) or hè (is, third person singular of the verb to be in the present tense)
  • ‘unique’: unicu (masculine singular adjective, unique in English) or unica (feminine singular adjective, unique in English)
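To get a feel for the size of the problem, the candidates listed above can be combined exhaustively: 2 × 3 × 2 × 2 = 24 possible readings for a four-word sentence. The sketch below enumerates them and keeps the single reading that matches one hand-written pattern (demonstrative, masculine noun, verb, masculine adjective). The tag names, the pattern and the Corsican form hè for ‘is’ are assumptions made for the illustration, not the actual rules of the translator.

```python
from itertools import product

# Candidate (Corsican form, tag) pairs for each French word.
# Tags are hypothetical labels chosen for this sketch.
CANDIDATES = [
    ("ce", [("'ssu", "DEM"), ("ciò", "PRON")]),
    ("fait", [("fattu", "NOUN_M"), ("fattu", "PPART"), ("faci", "VERB")]),
    ("est", [("estu", "NOUN_M"), ("hè", "VERB")]),  # 'hè' is assumed
    ("unique", [("unicu", "ADJ_M"), ("unica", "ADJ_F")]),
]

# One hand-written pattern: demonstrative + masculine noun + verb + masculine adjective.
PATTERN = ("DEM", "NOUN_M", "VERB", "ADJ_M")


def all_readings(candidates):
    """Return every combination of one candidate per word."""
    return list(product(*(options for _, options in candidates)))


def matches_pattern(reading, pattern=PATTERN):
    """True if the tags of the reading equal the pattern exactly."""
    return tuple(tag for _, tag in reading) == pattern


readings = all_readings(CANDIDATES)
kept = [" ".join(form for form, _ in r) for r in readings if matches_pattern(r)]

print(len(readings))  # 24
print(kept)           # ["'ssu fattu hè unicu"]
```

The point of the sketch is the count: even a short sentence leaves a rule-based system with many competing readings to eliminate.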

What are the conditions for a given endangered language to be a candidate for rule-based machine translation?

For a given endangered language to be a candidate for rule-based machine translation, some requirements must be met. In particular, the following are needed:

  • a dictionary: some specialized lexicons are useful too
  • a list of locutions and their translations: more precisely, what is needed are noun locutions, adjective locutions, adverbial locutions and verbal locutions, together with their translations in the other language.
  • a detailed grammar (in any language): ideally, the grammar should be very detailed, notably covering irregular verbs, noun plurals, etc. Subjunctive and conditional tenses must also be accurately described.
  • in addition, elision rules, euphony rules, should also be described.
  • most importantly: a description of the main variants of the language and their differences. This is needed to handle what we can call the ‘variant problem’ (we shall say a bit more about this later): as an effect of diversity, endangered languages are often polynomic and come with variants. But translation must be coherent and a mix of several variants is not acceptable as a translation.

Let us mention that an endangered language is commonly associated with another language, the two being in a diglossic relationship. For example, Corsican is associated with French, so the relevant pair is French-Corsican. For the Sardinian Gallurese language (‘gaddhuresu’), the relevant pair is Italian-Gallurese. Other relevant pairs are:

  • Italian-Sassarese
  • Italian-Sicilian
  • Italian-Venetian

Solving fivefold ambiguity: translation for French ‘poste’

The French word ‘poste’ is (at least) fivefold ambiguous. It can designate:

  • ‘poste’ (masculine singular noun): postu, masculine singular noun (set, i.e. television set)
  • ‘poste’ (masculine singular noun): posta, feminine singular noun (position); erroneously translated as postu in the present case, where it should read a so posta
  • ‘poste’ (feminine singular noun): posta, feminine singular noun (post office)
  • ‘poste’: impostu (from the verb impustà, ‘poster’, to station oneself, first person singular)
  • ‘poste’: imposta (from the verb impustà, ‘poster’, to station oneself, third person singular)

(However, it is more complex than that, since there is another sense of the verb ‘poster’: to post/to mail.)
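The five readings listed above can be laid out as a small lookup table keyed by grammatical analysis and sense. The sketch below is only an illustration of how far part of speech and gender narrow the choice; the key layout and the function name are assumptions, and the additional ‘to post/to mail’ sense is deliberately left out.

```python
# The five readings of French 'poste' from the list above, keyed by
# (part of speech, gender or person, English gloss) -> Corsican form.
POSTE_READINGS = {
    ("noun", "m", "set"): "postu",
    ("noun", "m", "position"): "posta",
    ("noun", "f", "post office"): "posta",
    ("verb", "1sg", "to station oneself"): "impostu",
    ("verb", "3sg", "to station oneself"): "imposta",
}


def poste_candidates(pos=None, feature=None):
    """Return the readings compatible with the known part of speech / feature."""
    return {
        key: form
        for key, form in POSTE_READINGS.items()
        if (pos is None or key[0] == pos)
        and (feature is None or key[1] == feature)
    }


print(len(poste_candidates()))             # 5
print(poste_candidates("noun", "f"))       # one reading left: posta (post office)
print(len(poste_candidates("noun", "m")))  # 2
```

For the masculine noun, grammatical analysis alone leaves two candidates (postu and posta), so choosing between ‘set’ and ‘position’ still requires semantic disambiguation.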