Monthly Archives: March 2018

Disambiguating ‘nombre de’

Let us consider the disambiguation of French ‘nombre de’, which, depending on the case, can be:

  • a singular masculine noun followed by a preposition: in this case, ‘nombre de’ translates into Corsican as numaru di (number of)
  • an indefinite pronoun: in this case, ‘nombre de’ translates into Corsican as bon parechji (many, a great many)

Si tratta quì di a disambiguazioni di ‘nombre de’ chì pò essa siont’è i casi:

  • un nomu maschili singulari suvitatu da una pripusizioni: in ‘ssu casu, ‘nombre de’ si traduci pà numaru di
  • un prunomu indefinitu: in ‘ssu casu, ‘nombre de’ pò essa traduttu in corsu da bon parechji
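The two readings above could be told apart with a simple contextual rule. The sketch below is only an illustration, not the translator's actual method: it assumes that ‘nombre de’ is a noun when the word just before it is a determiner (‘le nombre de’, ‘ce nombre de’) and an indefinite otherwise (‘nombre de personnes’). The function name `disambiguate_nombre_de` and the `DETERMINERS` list are hypothetical, and the rule misses cases such as ‘un grand nombre de’, where an adjective intervenes.

```python
import re
from typing import Optional

# Hypothetical, non-exhaustive list of French determiners that can
# introduce the noun reading ("le nombre de", "ce nombre de", ...).
DETERMINERS = {"le", "un", "ce", "du", "au", "son", "leur", "quel",
               "notre", "votre", "mon", "ton"}

# Optionally capture the word before "nombre de" / "nombre d'".
PATTERN = re.compile(r"(?:\b(\w+)\s+)?\bnombre\s+(?:de\b|d['’])",
                     re.IGNORECASE)


def disambiguate_nombre_de(sentence: str) -> Optional[str]:
    """Return the assumed Corsican rendering of 'nombre de', or None."""
    match = PATTERN.search(sentence)
    if match is None:
        return None  # 'nombre de' does not occur in the sentence
    previous = (match.group(1) or "").lower()
    # Assumption: a preceding determiner signals the noun reading.
    if previous in DETERMINERS:
        return "numaru di"
    return "bon parechji"


print(disambiguate_nombre_de("Le nombre de visiteurs augmente."))
print(disambiguate_nombre_de("Nombre de visiteurs sont venus."))
```

A real system would presumably rely on part-of-speech context rather than a fixed word list; the list is used here only to keep the sketch self-contained.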

Large-scale testing and self-evaluation

Large-scale testing with self-evaluation is now being performed on full French Wikipedia articles:

  • Italie: Self-evaluation: 1 – 708/14451 = 95.10% (708 errors on 14451 words)
  • Aristote: Self-evaluation: 1 – 1264/22885 = 94.48%
  • Everest: Self-evaluation: 1 – 606/10530 = 94.25%
  • Mer méditerranée: Self-evaluation: 1 – 235/5088 = 95.38%
  • démocratie: Self-evaluation: 1 – 515/11430 = 95.49%

Eccu i risultati di i pruvaturi à grandi scala incù autovalutazioni, fatti annantu à l’articuli cumpletti di Wikipedia in francesu:

  • Italie: autovalutazioni: 1 – 708/14451 = 95,10% (708 arrori annantu à 14451 paroli)
  • Aristote: autovalutazioni: 1 – 1264/22885 = 94,48%
  • Everest: autovalutazioni: 1 – 606/10530 = 94,25%
  • Mer méditerranée: autovalutazioni: 1 – 235/5088 = 95,38%
  • démocratie: autovalutazioni: 1 – 515/11430 = 95,49%
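The self-evaluation score reported for each article is 1 minus the ratio of errors to words. As a minimal sketch, the figures can be recomputed as follows; the function name `self_evaluation` and the `results` dictionary are illustrative names, and the error and word counts are those listed above.

```python
def self_evaluation(errors: int, words: int) -> float:
    """Score = 1 - errors / words, as a fraction between 0 and 1."""
    if words <= 0:
        raise ValueError("words must be a positive count")
    return 1 - errors / words


# (errors, words) per French Wikipedia article, taken from the list above.
results = {
    "Italie": (708, 14451),
    "Aristote": (1264, 22885),
    "Everest": (606, 10530),
    "Mer méditerranée": (235, 5088),
    "démocratie": (515, 11430),
}

for title, (errors, words) in results.items():
    print(f"{title}: {self_evaluation(errors, words):.2%}")

# Pooled score over all five articles (each word weighted equally).
total_errors = sum(errors for errors, _ in results.values())
total_words = sum(words for _, words in results.values())
print(f"pooled: {self_evaluation(total_errors, total_words):.2%}")
```

The pooled figure weights long articles more heavily than a plain average of the five percentages would.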

Rough typology of remaining errors (updated March 2018)

French to Corsican: performance on the French Wikipedia test sample currently averages 94%. Below is a rough typology of the remaining errors (an average score of 95% on the open test should presumably be attainable by correcting the errors tagged ‘easy’):

  • unknown vocabulary: 40% (easy)
  • basic disambiguation: 25% (easy or medium difficulty)
  • false positives: 5% (medium difficulty or hard). This type of error is mostly related to proper nouns, i.e. English terms that should remain untranslated. For example, ‘North American Aviation’ is erroneously translated as ‘North American Aviazione’; here ‘Aviation’ should remain untranslated.
  • inadequate locution: 10% (medium difficulty or hard)
  • anaphora resolution related to complex sentence structure: 5% (hard)
  • semantic disambiguation: 5% (hard). For example, disambiguating French ‘échecs’ = fiaschi/scacchi (failures/chess)
  • erroneous agreement related to gender mismatch from French to Corsican, i.e. (i) words that are masculine in French and feminine in Corsican; and (ii) words that are feminine in French and masculine in Corsican: 1% (medium difficulty).
  • erroneous agreement related to number mismatch from French to Corsican, i.e. (i) words that are singular in French and plural in Corsican; and (ii) words that are plural in French and singular in Corsican (for example, French ‘la canicule’ translates into ‘i sulleoni’ in Corsican): 1% (medium difficulty).
  • specific grammatical case: 2% (hard)
  • anaphora resolution associated with gender or number mismatch: 1% (hard)
  • unknown, unclassified: 6% (hard)
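To see why 95% looks reachable, the typology can be turned into a rough projection: if a category accounts for a given share of the remaining errors, fixing some fraction of it raises accuracy by the error rate times that share times that fraction. The sketch below assumes the 94% average and the 40% ‘unknown vocabulary’ share quoted above; the function name `projected_accuracy` and the `fraction_fixed` parameter are hypothetical. The shares are rough (as listed they sum to 101%), so the result is indicative only.

```python
def projected_accuracy(current_accuracy: float,
                       error_share: float,
                       fraction_fixed: float = 1.0) -> float:
    """Accuracy after fixing part of one error category.

    current_accuracy: current score as a fraction (0.94 for 94%).
    error_share: share of remaining errors in the category (0.40 for 40%).
    fraction_fixed: assumed fraction of that category actually corrected.
    """
    error_rate = 1 - current_accuracy
    return current_accuracy + error_rate * error_share * fraction_fixed


# Fixing every 'unknown vocabulary' error (40% of a 6% error rate)
# would recover 2.4 points: 94% -> 96.4%.
print(f"{projected_accuracy(0.94, 0.40):.1%}")

# Fixing only half of them would already reach about 95.2%.
print(f"{projected_accuracy(0.94, 0.40, fraction_fixed=0.5):.1%}")
```

Under these assumptions, correcting a little under half of the unknown-vocabulary errors is enough to pass 95%, without touching the harder categories.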