Tag Archives: grammatical type disambiguation

Grammatical word-disambiguation again and again

The main difficulty here seems to lie in the adaptation of the grammatical disambiguation module. Indeed, for the French language, such a module performs disambiguation with respect to about 100 categories. The number of pairs (or 3-tuples, 4-tuples, etc.) of disambiguation, for French, is about 250. The question is: when we change languages, how many categories of n-tuples of disambiguation does this result in? In particular, when one switches from French to Italian, does this result in a big change in the categories to be disambiguated?

Let’s take an example, with a particular category of words to disambiguate. One such category is for example AQfs/Vsing3present (feminine singular adjective or verb in the 3rd person singular present tense). A word in Italian that belongs to this type is ‘stanca’. So we have both uses:

  • ‘è stanca’ (she is tired): AQfs
  • stanca il cavallo’ (it tires the horse): Vsing3present
    In French, we don’t have this kind of disambiguation category directly because the category concerned is broader than that: it includes at least the 1st person singular of the present. Thus we have the word ‘sèche’, which belongs to this type of disambiguation category:
  • ‘la feuille est sèche’ (the leaf is dry): AQfs
  • ‘je sèche mes cheveux’ (I dry my hair): Vsing1present
  • ‘il sèche sa chemise’ (he dries his shirt): Vsing3present

Of course, the code that allows the disambiguation of AQfs/Vsing1present/Vsing3present should also allow the derivation of the disambiguation of AQfs/Vsing3present. But this gives an idea of the kind of problems that arise and the adaptation needed.

If the types of disambiguation are very different from one language to another, it will be necessary to have a disambiguation module which is capable of adapting to many new types of disambiguation and which is therefore very flexible. This appears to be a considerable difficulty for the creation of an eco-system. It seems that Apertium, faced with this difficulty, has chosen a statistical module as a solution for its eco-system. However, the question of whether such a flexible module, adaptable without difficulty from one language to another, is feasible in the context of rule-based MT, remains an open question.

Grammatical word-disambiguation again

The challenge is especially that of generalizing the grammatical word-disambiguation to several languages. Creating a module of grammatical word-disambiguation for each language appears to be a long and arduous task. This seems to be the main difficulty. But if a module specific to a given language can be generalized to several other languages, this could be an important advance in the field of rule-based machine translation (which simulates human reasoning seems to me a more appropriate term).

We can describe the problem more precisely. We have about 100 grammatical categories for a given language. We also have about 300 ambiguous grammatical types – to fix ideas – which are: e.g., adverb or preposition, singular masculine noun or singular masculine adjective, etc. The problem is to describe an algorithm to remove the ambiguity and determine the corresponding grammatical type according to the context.

Now rewriting the complete module of disambiguation by grammatical type, so that it can be used and adapted to other languages (Italian in the first place). It remains to be seen if this can be done.

On the implementation of grammatical disambiguation

Grammatical disambiguation – i.e. whether ‘maintenant’ is and adverb (now) or the gerundive (maintaining) of the verb ‘maintenir’ – seems to be the crucial issue for the adoption of the rule-based model or statistical model for machine translation. This problem is widespread and seems to concern all languages. For the French language, this problem of grammatical disambiguation concerns about 1 word out of 7. Effective grammatical disambiguation is difficult to implement. The advantage of adopting the statistical method for grammatical disambiguation is that the same method can be generalized and used for several languages. In the case of the rule-based model, the module of grammatical disambiguation must be rewritten for each language, which generates considerable complexity and requires a very significant development time. Therefore, a rule-based method for grammatical disambiguation that can be easily applied to several languages would be of great interest. This seems to be the main difficulty that rule-based machine translation is designed to overcome.

But if we want an artificial intelligence that not only provides an (mostly accurate) answer without being able to really explain its reasoning, but is truly able to emulate human reasoning and to justify and describe step by step the reasoning that leads to its answer, then it is worth the effort.

Word sense disambiguation: a hard case

Let us consider a hard case for word sense disambiguation, in the context of French to Corsican MT. But the same goes for French to English MT. It relates to French words such as: ‘accomplit’, ‘affaiblit’, ‘affranchit’, ‘alourdit’, ‘amortit’. The corresponding verbs ‘accomplir’ (to fulfill, to accomplish), ‘affaiblir’ (to weaken), ‘affranchir’ (to free), ‘alourdir’ (to burden), ‘amortir’ (to damp) have the same word for simple present and simple past at the third person singular: respectively ‘accomplit’, ‘affaiblit’, ‘affranchit’, ‘alourdit’, ‘amortit’. The upshot is that a single sentence such as: ‘Il affaiblit sa position.’ can be translated either into he weakens his position or into he weakened his position. If the context is unambiguous with regard to the sence of the discourse, the correct tense can be adequately chosen. But in the lack of informative context, it would be opportune to let the ambiguity prevail.

It should be pointed out that any such verbs are not rare. A more complete list includes: accomplit, affaiblit, affranchit, alourdit, amortit, anéantit, anoblit, aplatit, arrondit, assombrit, bannit, bâtit, blanchit, blondit, démolit, éblouit, emplit, enfouit, enhardit, enlaidit, ennoblit, envahit, épaissit, étourdit, exclut, franchit, glapit, investit, jaunit, jouit, munit, noircit, obéit, obscurcit, occit, périt, réagit, régit, réjouit, remplit, répartit, resplendit, rétrécit, rit, rougit, rouvrit, saisit, sévit, surgit.

.

More on grammatical type disambiguation

Let us focus on grammatical type disambiguation, which is a subproblem of word disambiguation. General grammatical types are: verbs, nouns, adjectives, adverbs, prepositions, gerundive, etc. But for grammatical type disambiguation purposes, more accuracy is in order: instances of grammatical types are then: masculine singular noun, feminine singular noun, masculine plural noun, feminine plural noun, masculine singular adjective, feminine singular adjective, masculine plural adjective, feminine plural adjective, adverbs, prepositions, gerundive, etc. Now grammatical type disambiguation can occur between two different grammatical types (in the above-mentioned form). For example, an ambiguity can occur between preposition and gerundive. In French, this is notably the case for ‘devant’ and ‘maintenant’. For ‘devant’ can either be an adverb (in front) or a gerundive (from the verb ‘devoir’, to have to). Similarly, ‘maintenant’ can either be an adverb (now) or a gerundive (from the verb ‘maintenir’, to maintain). It should be clear now that ‘devant’ and ‘maintenant’ are both ambiguous with regard to their grammatical type. In English, depending on the relevant grammatical type, ‘devant’ is ambiguous between having to or in front). In the same way, ‘maintenant’ is ambiguous between now and maintening.
In order to disambiguate French words ‘devant’ or ‘maintenant’, rule-based MT needs a disambiguation module that is able to distinguish whether ‘devant’ or ‘maintenant’ are adverbs or gerundives.

(not to mention the fact that ‘devant’ can also be a preposition, for the sake of clarity).