More on polymorphic disambiguation…

Let’s take another look at polymorphic disambiguation. We shall consider the French word sequence ‘nombre de’. Its translation into Corsican (the same goes for English and other languages) cannot always be the same, because ‘nombre de’ can be translated in two different ways. In the sequence ‘mais nombre de poissons sont longs’ (but many fish are long), ‘nombre de’ is an indefinite determiner: it translates as bon parechji (many). On the other hand, in the sequence ‘mais le nombre de poissons est supérieur à dix’ (but the number of fish is greater than ten), ‘nombre’ is a common noun followed by the preposition ‘de’: it translates as numaru di (number of). Statistical MT usually does better than human-like (rule-based) MT at polymorphic disambiguation (I tested both sentences with DeepL and Google Translate, and both resolved the ambiguity correctly), but it turns out that human-like (rule-based) MT is also capable of handling it.
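To make the idea concrete, here is a minimal sketch in Python of how a rule-based system might separate the two readings of ‘nombre de’. It is only an illustration, not the rule actually used by any translator: the heuristic (a determiner immediately before ‘nombre’ signals the noun reading), the DETERMINERS list and the function names are all assumptions made for this example.

```python
# Hypothetical heuristic, for illustration only: if 'nombre de' is
# immediately preceded by a determiner, read 'nombre' as a noun;
# otherwise read 'nombre de' as an indefinite determiner.
DETERMINERS = {"le", "un", "ce", "du", "au", "son", "leur", "notre", "votre"}


def tokenize(sentence):
    """Lowercase the sentence, split on whitespace, strip basic punctuation."""
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return [w for w in words if w]


def translate_nombre_de(sentence):
    """Return the Corsican rendering of 'nombre de', or None if it is absent."""
    tokens = tokenize(sentence)
    for i in range(len(tokens) - 1):
        if tokens[i] == "nombre" and tokens[i + 1] == "de":
            preceded_by_determiner = i > 0 and tokens[i - 1] in DETERMINERS
            # Noun reading -> 'numaru di'; determiner reading -> 'bon parechji'
            return "numaru di" if preceded_by_determiner else "bon parechji"
    return None


print(translate_nombre_de("mais nombre de poissons sont longs"))
print(translate_nombre_de("mais le nombre de poissons est supérieur à dix"))
```

With the two example sentences, the first call takes the determiner reading and the second the noun reading. A real system would need a fuller analysis of the sentence, for instance to handle an adjective placed between the determiner and ‘nombre’.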

A hard case for disambiguation: polymorphic disambiguation

Let us investigate an issue that relates to disambiguation. It is a hard case that needs to be addressed; for reasons that will become clearer later, I shall call it polymorphic disambiguation in what follows. Let us take an example: the translation of the two consecutive words ‘de fait’. The first French sentence, ‘De fait, il part.’, translates into Difatti, parti. (Actually, he’s leaving.): in this case, ‘de fait’ is an adverbial locution. The second French sentence, ‘Il n’y a rien de fait.’, translates correctly into Ùn ci hè nienti di fattu. (There is nothing done.), where ‘fait’ is now identified as a past participle. The instance at hand concerns French to Corsican, but it should be clear that it arises in the same way in French to English translation. To sum up: the two consecutive words ‘de fait’ can be identified either as an adverbial locution, or as a preposition (‘de’) followed by a past participle (‘fait’, done).

Now we are in a position to formulate the problem more generally. It concerns two or more consecutive words that can be given different grammatical interpretations depending on the sentence and that may, therefore, be translated differently. Generally speaking, disambiguation may concern one word (in most cases) but also a group of words. Polymorphic disambiguation concerns such groups of words, i.e. sequences of two, three, four or more words.
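The general formulation can be sketched as a lookup over word sequences. The Python snippet below is a hedged illustration, not an actual implementation: the POLYMORPHIC table, its two entries (taken from the examples in this post) and the function name find_polymorphic are assumptions. It only detects the sequences of 2 to 4 words that admit more than one grammatical analysis; choosing among the analyses would require further context rules.

```python
# Hypothetical lexicon: word sequences mapped to their possible analyses.
# The two entries come from the examples discussed in this post.
POLYMORPHIC = {
    ("de", "fait"): ["adverbial locution", "preposition + past participle"],
    ("nombre", "de"): ["indefinite determiner", "common noun + preposition"],
}


def find_polymorphic(tokens, max_len=4):
    """Return (position, sequence, analyses) for every sequence of
    2..max_len consecutive tokens that has more than one analysis."""
    hits = []
    for n in range(2, max_len + 1):
        for i in range(len(tokens) - n + 1):
            key = tuple(tokens[i:i + n])
            analyses = POLYMORPHIC.get(key)
            if analyses is not None and len(analyses) > 1:
                hits.append((i, key, analyses))
    return hits


# Both sentences contain 'de fait', at different positions.
print(find_polymorphic(["de", "fait", "il", "part"]))
print(find_polymorphic(["il", "n'y", "a", "rien", "de", "fait"]))
```

Each hit marks a place where a rule-based system has to decide between readings before it can translate.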

A quick test with online translators shows that statistical MT does better than rule-based MT at polymorphic disambiguation. That is a truly interesting difference, and a gap that rule-based MT should fill.