Large-scale testing and self-evaluation

Now performing large-scale testing with self-evaluation on full articles from the French Wikipedia:

  • Italie: Self-evaluation: 1 – 708/14451 = 95.10% (708 errors in 14,451 words)
  • Aristote: Self-evaluation: 1 – 1264/22885 = 94.48%
  • Everest: Self-evaluation: 1 – 606/10530 = 94.25%
  • Mer méditerranée: Self-evaluation: 1 – 235/5088 = 95.38%
  • démocratie: Self-evaluation: 1 – 515/11430 = 95.49%
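Each figure above is simply one minus the ratio of flagged errors to words. As a rough sketch (the function name and the pooled, word-weighted total are illustrative additions, not part of the original tool), the list can be reproduced like this:

```python
# Hypothetical helper: self-evaluation score = 1 - errors / words,
# expressed as a percentage, as in the list above.
def self_evaluation_score(errors: int, words: int) -> float:
    if words <= 0:
        raise ValueError("words must be positive")
    return 100.0 * (1 - errors / words)

# (errors flagged by self-evaluation, words) per French Wikipedia article
RESULTS = {
    "Italie": (708, 14451),
    "Aristote": (1264, 22885),
    "Everest": (606, 10530),
    "Mer méditerranée": (235, 5088),
    "démocratie": (515, 11430),
}

def pooled_score(results: dict) -> float:
    # Assumption: a word-weighted score over all articles combined
    # (total errors / total words); not reported in the original post.
    total_errors = sum(e for e, _ in results.values())
    total_words = sum(w for _, w in results.values())
    return self_evaluation_score(total_errors, total_words)

if __name__ == "__main__":
    for article, (errors, words) in RESULTS.items():
        print(f"{article}: 1 - {errors}/{words} = {self_evaluation_score(errors, words):.2f}%")
    print(f"All articles (pooled): {pooled_score(RESULTS):.2f}%")
```

Pooling by words keeps longer articles such as Aristote from being under-weighted relative to a plain average of the five percentages.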

Testing large-scale self-evaluation

Now testing large-scale self-evaluation. In the present sample, self-evaluation is applied to a 7,693-word (45,437-character) text from the French Wikipedia article on Constance II (Constantius II): 414 errors were found.

The present test illustrates the benefits of self-evaluation well: it runs fast and gives a rough estimate of MT accuracy (± 2%).
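Applying the same formula to the Constance II sample gives 1 – 414/7693 ≈ 94.62%. The sketch below also derives the rough range implied by the stated margin; it assumes "± 2%" means plus or minus 2 percentage points, which the post does not spell out, and the function names are hypothetical.

```python
def self_evaluation_score(errors: int, words: int) -> float:
    # Self-evaluation score as a percentage: 1 - errors / words
    return 100.0 * (1 - errors / words)

def accuracy_band(score: float, margin: float = 2.0) -> tuple:
    # Assumption: the "± 2%" margin is read as percentage points
    return (score - margin, score + margin)

def constance_ii_report() -> str:
    score = self_evaluation_score(414, 7693)  # 414 errors on 7,693 words
    low, high = accuracy_band(score)
    return f"Constance II: {score:.2f}% (rough range {low:.2f}% to {high:.2f}%)"

if __name__ == "__main__":
    print(constance_ii_report())
```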

Testing self-evaluation accuracy

Testing self-evaluation accuracy: in the present case, self-evaluation yields a score of 100%. However, there is one error: ‘par des explosifs’ should read da splusivi or even da i splusivi (by explosives), a problem with the partitive article. Arguably, there is a second grammatical error to which self-evaluation is blind: ‘sont ensuite détruits’ should read sò distrutti dopu (are then destroyed); the problem is that the word dopu should be placed more appropriately before the verb. In short, human evaluation yields 98.14% accuracy in the present case. (Incidentally, average performance on the MT open test currently seems to be nearing 94%.)