Tag Archives: semantic disambiguation

Performing our first open test of the year

Let us comment on the remaining errors encountered in the above open test:

  • French ‘carrière’ remains undisambiguated between carriera (career) and cava (quarry): two occurrences
  • ‘de’: French ‘de’ is perhaps the most difficult word to translate into another language, due to its general polymorphism
  • ‘national-socialiste’: missing vocabulary
  • l’ within ‘l’empeche’: pronoun error
  • note that ‘Etats-Unis’ remains untranslated because it is misspelled in the source text, with an initial E instead of É

The result is 1 – (5/169) = 97.04%. Note that the ambiguous French word ‘partie’ (‘durant la première partie’, during the first part) is correctly disambiguated into parti (part) rather than partita (game, match).
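The score above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `open_test_score` is hypothetical, and it assumes that 169 is the number of words in the test text and 5 the number of errors counted in the list above.

```python
def open_test_score(errors: int, words: int) -> float:
    """Return the share of correctly translated words, as a fraction in [0, 1]."""
    if words <= 0:
        raise ValueError("words must be positive")
    if not 0 <= errors <= words:
        raise ValueError("errors must be between 0 and words")
    return 1 - errors / words


# Assumed inputs: 5 errors out of 169 words, as in the test above.
score = open_test_score(errors=5, words=169)
print(f"{score:.2%}")  # 97.04%
```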

It seems that an average result of 95% is currently being consolidated, and that an average of 96% should be achievable within a year.

Disambiguating ‘nombre de’

Let us consider the disambiguation of ‘nombre de’, which can be, depending on the case:

  • a singular masculine noun followed by a preposition: in this case, ‘nombre de’ translates to numaru di (number of)
  • an indefinite pronoun: in this case, French ‘nombre de’ translates into Corsican as bon parechji (many, a great many)

Si tratta quì di a disambiguazioni di ‘nombre de’ chì pò essa siont’è i casi:

  • un nomu maschili singulari suvitatu da una pripusizioni: in ‘ssu casu, ‘nombre de’ si traduci pà numaru di
  • un prunomu indefinitu: in ‘ssu casu, ‘nombre de’ pò essa traduttu in corsu da bon parechji
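The two cases can be told apart, in many sentences, by looking at the word that precedes ‘nombre’. The following is a rough sketch and not the engine's actual rule: it assumes that a preceding determiner signals the noun reading, and both the function `translate_nombre_de` and the short `DETERMINERS` list are hypothetical.

```python
import re

# Hypothetical and deliberately incomplete list of French determiners
# that may introduce the noun 'nombre'.
DETERMINERS = {"le", "un", "ce", "du", "au", "son", "leur", "quel"}


def translate_nombre_de(sentence: str) -> str:
    """Pick a Corsican rendering for the first 'nombre de' in the sentence."""
    tokens = re.findall(r"\w+", sentence.lower())
    for i in range(len(tokens) - 1):
        # "d" covers the elided form, as in "nombre d'entre eux".
        if tokens[i] == "nombre" and tokens[i + 1] in {"de", "d"}:
            previous = tokens[i - 1] if i > 0 else ""
            # Assumption: determiner before 'nombre' means the noun reading.
            return "numaru di" if previous in DETERMINERS else "bon parechji"
    raise ValueError("no 'nombre de' found in sentence")


print(translate_nombre_de("Le nombre de victimes augmente."))
print(translate_nombre_de("Nombre de personnes sont venues."))
```

An intervening adjective (‘un grand nombre de’) is not covered by this sketch and would need a wider context window.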

Semantic disambiguation of French ‘femme’: in the mud, gold is still shining

In Corsican, the French word ‘femme’ can be translated, depending on the context:

  • either into donna (woman)
  • or into moglia (wife)

The above sample still contains many vocabulary and grammatical disambiguation errors (easy/medium difficulty), but it successfully handles the semantic disambiguation (hard) of ‘femme’, two instances of which are properly translated into moglia (wife). As the Corsican proverb says, in a cianga l’oru luci sempri (in the mud, gold is still shining).
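One contextual cue that often separates the two readings of ‘femme’ is a preceding possessive (‘sa femme’, his wife). The sketch below illustrates that single cue only; it is an assumption for illustration, not a description of how the engine decides, and the names `POSSESSIVES` and `translate_femme` are hypothetical.

```python
import re

# Hypothetical cue list: French possessive determiners used before 'femme'.
POSSESSIVES = {"ma", "ta", "sa", "notre", "votre", "leur"}


def translate_femme(sentence: str) -> str:
    """Return 'moglia' after a possessive, 'donna' otherwise."""
    tokens = re.findall(r"\w+", sentence.lower())
    for i, token in enumerate(tokens):
        if token == "femme":
            previous = tokens[i - 1] if i > 0 else ""
            return "moglia" if previous in POSSESSIVES else "donna"
    raise ValueError("no 'femme' found in sentence")


print(translate_femme("Il est venu avec sa femme."))
print(translate_femme("Une femme attendait dehors."))
```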

French samples are from the French corpora of the University of Leipzig.

Word-sense disambiguation: first test of new engine

Now testing the new engine with the semantically ambiguous French ‘échecs’ = fiaschi/scacchi (failures/chess).

Interestingly, the semantic disambiguation transfers successfully into English (although the French/English engine is still in its infancy and still makes many grammatical errors):

Now further tests are needed with some other semantically ambiguous words:

  • ‘défense’: defense/tusk; Corsican: difesa/sanna
  • ‘fils’: sons/wires; Corsican: figlioli/fili
  • ‘comprendre’: understand/comprise; Corsican: capisce/cumprende
  • ‘vol’: flight/theft; Corsican: bulu/arrubecciu
  • ‘voler’: fly/steal; Corsican: bulà/arrubà
  • ‘échecs’: chess/failures; Corsican: scacchi/fiaschi
  • ‘palais’: palace/palaces/palate/palates; Corsican: palazzu/palazzi/palate/palates

In the background, the unresolved threefold ambiguity of French ‘partie’ = parti/partita/partita (part/game/gone) is lurking…
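A simple way to picture such tests is a small sense inventory with cue words for each sense, where the sense sharing the most cue words with the sentence wins. The sketch below uses that cue-word overlap heuristic as an assumption; it is not the engine's method. The Corsican senses come from the list above, while the cue words and the names `SENSES` and `disambiguate` are hypothetical.

```python
import re

# Corsican senses from the list above; cue words are hypothetical examples.
SENSES = {
    "échecs": [
        ("scacchi", {"jouer", "joueur", "tournoi", "champion"}),
        ("fiaschi", {"série", "succès", "malgré", "subir"}),
    ],
    "vol": [
        ("bulu", {"avion", "oiseau", "pilote", "aéroport"}),
        ("arrubecciu", {"police", "bijoux", "plainte", "voleur"}),
    ],
}


def disambiguate(word: str, sentence: str) -> str:
    """Return the sense whose cue words overlap most with the sentence."""
    context = set(re.findall(r"\w+", sentence.lower()))
    # max() keeps the first candidate on ties, so the first sense is the default.
    best = max(SENSES[word], key=lambda sense: len(sense[1] & context))
    return best[0]


print(disambiguate("échecs", "Il est champion du monde d'échecs."))
print(disambiguate("vol", "Elle a porté plainte pour vol."))
```

Such a lookup says nothing about number or gender agreement, which the ‘palais’ and ‘partie’ cases would also require.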

Feigenbaum test and semantic disambiguation

It is now clear that there cannot be a successful Feigenbaum test (i.e. not only occasional Feigenbaum hits, but regular, average performance) without an adequate treatment of semantic disambiguation. Arguably, it is one of the hard problems of machine translation. Here are some typical instances:

  • ‘défense’: defense/tusk; Corsican: difesa/sanna
  • ‘fils’: sons/wires; Corsican: figlioli/fili
  • ‘comprendre’: understand/comprise; Corsican: capisce/cumprende
  • ‘vol’: flight/theft; Corsican: bulu/arrubecciu
  • ‘voler’: fly/steal; Corsican: bulà/arrubà
  • ‘échecs’: chess/failures; Corsican: scacchi/fiaschi
  • and the fourfold ambiguous ‘palais’: palace/palaces/palate/palates; Corsican: palazzu/palazzi/palate/palates

In short: no successful semantic disambiguation = no genuinely successful Feigenbaum test. The semantic disambiguation engine needs to be rewritten.

The disambiguation of French ‘fils’ again: scoring 98.42%

Scoring 1 – 2/127 = 98.42%. Of interest:

  • ‘de 839 à sa mort’ (from 839 to his death) should read: da u 839 à a so morte. French ‘de’ translates into either di or da in Corsican (this is a simplification, since in certain cases, as a partitive article, it is not translated at all).
  • we again face the multi-ambiguous French ‘fils’, which can translate into: i) figliolu, masculine singular (son); ii) figlioli, masculine plural (sons); iii) fili, masculine plural (wires). In the present case, ‘Fils du roi…’ should translate as Figliolu di u rè… (Son of King…).
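The first error suggests one narrow pattern that could be checked: ‘de’ followed by a number and then ‘à’, as in a date range. The sketch below covers only that pattern and falls back to di everywhere else. It is an assumption for illustration, it does not handle the partitive case, and the names `RANGE_PATTERN` and `translate_de` are hypothetical.

```python
import re

# Hypothetical pattern: 'de' + number + 'à', as in 'de 839 à sa mort'.
RANGE_PATTERN = re.compile(r"\bde\s+\d+\s+à\b", re.IGNORECASE)


def translate_de(phrase: str) -> str:
    """Return 'da' for a numeric 'de ... à' range, 'di' otherwise."""
    return "da" if RANGE_PATTERN.search(phrase) else "di"


print(translate_de("de 839 à sa mort"))
print(translate_de("le nombre de victimes"))
```

Only the preposition is returned here; the article that follows it in Corsican (da u 839) would have to be supplied separately.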

Note: five consecutive 100% sentences.

With regard to the Feigenbaum test: failed again. Arguably, the first error is of an acceptable kind in this context. But the ‘fils’ error is a gross one, which a human would not make…