Tag Archives: MT self-improvement

Autonomous MT system

Let us speculate about what an autonomous MT system could be. In the present state of MT, we either provide the software with rules and a dictionary (rule-based translation) or feed it with a corpus for a given pair of languages (statistical MT). But let us imagine that we could do otherwise and build an autonomous MT system. We provide the MT system with a corpus for a given source language. First, it analyses this language thoroughly. It begins by identifying single words. It then creates grammatical types and assigns them to the vocabulary. It also identifies locutions (adverbial, adjectival, verbal locutions, etc.) and assigns them a grammatical type. The MT system also identifies prefixes and suffixes, and computes elision rules, euphony rules, etc. for that source language.
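As a very rough sketch of this first step, here is a minimal, hypothetical illustration in Python of how such a system might begin to extract a vocabulary and candidate suffixes from a raw corpus. The function names and the suffix heuristic are invented for illustration; a real system would need far more sophisticated morphological induction:

```python
from collections import Counter

def extract_vocabulary(corpus: str) -> Counter:
    """Identify single words and their frequencies in a raw corpus."""
    words = [w.strip(".,;:!?'\"()").lower() for w in corpus.split()]
    return Counter(w for w in words if w)

def candidate_suffixes(vocab, min_stem=3, min_count=2):
    """Guess productive suffixes: a suffix is plausible when stripping it
    from several words leaves stems that are themselves in the vocabulary."""
    suffixes = Counter()
    for word in vocab:
        for cut in range(min_stem, len(word)):
            stem, suffix = word[:cut], word[cut:]
            if stem in vocab and suffix:
                suffixes[suffix] += 1
    return {s for s, n in suffixes.items() if n >= min_count}

corpus = "The cats sat. The cat sits. Dogs bark. A dog barks loudly."
vocab = extract_vocabulary(corpus)
print(candidate_suffixes(vocab))  # the plural '-s' emerges from the toy corpus
```

Even this toy heuristic recovers the English plural suffix from a dozen words; the real difficulty lies in scaling such induction to grammatical types, locutions, elision and euphony rules.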
Now the autonomous MT system should, second, do the same for the target language.
Third, the MT system creates a set of rules for translating the source language into the target one. For that purpose, the MT system could, for example, assign a structured reference to all these words and locutions. For instance, ‘oak’ in English refers to ‘quercus ilex’, and ‘cat’ refers to ‘felis sylvestris’. For abstract entities, we presume this would not be a trivial task… Alternatively, but not exclusively, it could use suffixes and derive morphing rules from the source language to the target one.
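A toy sketch of how translation through such structured references might work, assuming tiny hand-made pivot dictionaries (the French entries and dictionary names are invented for illustration):

```python
# Hypothetical pivot dictionaries: each concrete word maps to a
# language-neutral structured reference (here, Latin scientific names),
# and each target language maps references back to words.
EN_TO_REF = {"oak": "quercus ilex", "cat": "felis sylvestris"}
REF_TO_FR = {"quercus ilex": "chêne vert", "felis sylvestris": "chat"}

def translate_via_reference(word, src_to_ref, ref_to_tgt):
    """Translate by pivoting through the structured reference, if known."""
    ref = src_to_ref.get(word)
    return ref_to_tgt.get(ref) if ref is not None else None

print(translate_via_reference("cat", EN_TO_REF, REF_TO_FR))  # chat
```

The design choice here is that the reference layer is language-neutral, so adding an nth language only requires two new dictionaries rather than n − 1 new language pairs; as noted above, building such references for abstract entities is the hard part.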

Is this feasible, or pure speculation? It could be testable. Prima facie, this sounds like a different approach to AI than the classical one. It operates at a meta-level, since the MT system creates the rules and, in some respect, builds the software.

Evaluation of machine translation: why not self-evaluation?

Evaluation of machine translation is usually done via external tools (to cite some instances: ARPA, BLEU, METEOR, LEPOR, …). But let us investigate the idea of self-evaluation, for it seems that the software itself is capable of forming an accurate idea of its own possible errors.

In the above example, human evaluation yields a score of 1 – 5/88 = 94.31%. Contrast this with self-evaluation, which sums the errors it suspects: unknown words and disambiguation errors, entailing a self-evaluation of 92.05%, due to 7 hypothesized errors. In this case, self-evaluation computes the maximum error rate. But even here, there are some false positives: ‘apellation’ is left untranslated, being unrecognized. In effect, the correct spelling is ‘appellation’. To sum up: the software identifies an unknown word (and leaves it untranslated) and counts it as a possible error.
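The self-evaluation arithmetic above can be sketched as follows. The split of the 7 hypothesized errors into 1 unknown word (‘apellation’) and 6 disambiguation errors is an assumption for illustration; the figures taken from the example are the 88-word text and the resulting 92.05% score:

```python
def self_evaluation_score(word_count, unknown_words, ambiguous_words):
    """Worst-case self-evaluation: count every unknown word and every
    unresolved disambiguation as a possible error."""
    possible_errors = unknown_words + ambiguous_words
    return 1 - possible_errors / word_count

# 88-word text; 7 hypothesized errors in total (the 1/6 split between
# unknown words and disambiguation errors is assumed for illustration).
print(f"{self_evaluation_score(88, 1, 6):.2%}")  # 92.05%
```

Because every suspected error is counted, the score is a lower bound on quality: false positives like the misspelled ‘apellation’ can only push the self-evaluation below the human score, never above it.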

Let us sketch what could be the pros and cons of MT self-evaluation. To begin with, the pros:

  • it could provide a detailed taxonomy of possible errors: unknown words, unresolved grammatical disambiguation, unresolved semantic disambiguation, …
  • it could identify precisely the suspected errors
  • evaluation would be very fast and inexpensive
  • self-evaluation would work with any text or corpus
  • self-evaluation could pave the way to further self-improvement and self-correction of errors
  • its reliability could be good

And the cons:

  • MT may be unaware of some types of errors, e.g. errors related to expressions and locutions
  • MT self-evaluation could be especially blind to grammatical errors
  • it would sometimes count foreign words that should remain untranslated as unknown words
  • MT would be unaware of erroneous disambiguations