Grammatical word-disambiguation again

The main challenge is to generalize grammatical word-disambiguation to several languages. Writing a separate grammatical word-disambiguation module for each language appears to be a long and arduous task, and this seems to be the main difficulty. But if a module specific to a given language can be generalized to several other languages, this could be an important advance in the field of rule-based machine translation (machine translation that ‘simulates human reasoning’ seems to me a more appropriate term).

We can describe the problem more precisely. We have about 100 grammatical categories for a given language. We also have about 300 ambiguous grammatical types (to fix ideas), for example: adverb or preposition, singular masculine noun or singular masculine adjective, etc. The problem is to describe an algorithm that removes the ambiguity and determines the correct grammatical type according to the context.
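To make the problem statement concrete, here is a minimal Python sketch of one way it could be modelled. It is an illustration under stated assumptions, not the translator's actual code: the category labels, the toy lexicon, the list of determiners and the single context rule are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

# Hypothetical category labels; the real inventory has about 100 of them.
ADVERB = "adverb"
PREPOSITION = "preposition"
NOUN_MS = "noun, masculine singular"
ADJ_MS = "adjective, masculine singular"

# An ambiguous grammatical type is modelled as the set of categories a word may take.
AmbiguityType = FrozenSet[str]
Resolver = Callable[[List[str], int], str]

# Toy lexicon (assumption): each known word maps to its candidate categories.
LEXICON: Dict[str, AmbiguityType] = {
    "après": frozenset({ADVERB, PREPOSITION}),
    "petit": frozenset({NOUN_MS, ADJ_MS}),
    "souvent": frozenset({ADVERB}),
}

DETERMINERS = {"le", "la", "les", "un", "une"}


def adverb_or_preposition(tokens: List[str], i: int) -> str:
    """Toy rule (assumption): a following determiner suggests a preposition."""
    following = tokens[i + 1] if i + 1 < len(tokens) else None
    return PREPOSITION if following in DETERMINERS else ADVERB


# One resolver per ambiguous type; a full table would hold about 300 entries.
RESOLVERS: Dict[AmbiguityType, Resolver] = {
    frozenset({ADVERB, PREPOSITION}): adverb_or_preposition,
}


def disambiguate(tokens: List[str], i: int) -> Optional[str]:
    """Return the category of tokens[i], or None if it cannot be decided."""
    candidates = LEXICON.get(tokens[i])
    if candidates is None:
        return None  # unknown word
    if len(candidates) == 1:
        return next(iter(candidates))  # not ambiguous
    resolver = RESOLVERS.get(candidates)
    return resolver(tokens, i) if resolver else None  # no rule written yet
```

Keying the resolvers by ambiguous type rather than by word keeps the number of rule sets close to the number of ambiguous types, whatever the size of the lexicon.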

The complete module of disambiguation by grammatical type is now being rewritten so that it can be used and adapted to other languages (Italian in the first place). It remains to be seen whether this can be done.

First steps in the Gallurese language

The translator is taking its first steps in translating from French into Gallurese. The first tests show a score of 75-80%, with many errors in grammar, spelling and vocabulary. It will be necessary to reach a score of 90% before the result can be published.

The ideal would have been Italian-Gallurese translation, but this is not yet possible: it will be necessary to translate (i) Italian into French, then (ii) French into Gallurese.

Hinting at the Control problem

The question of which system best solves the problems posed by word disambiguation in translation seems to be linked to the AGI control problem (how to prevent an AGI from eventually turning out to be harmful to its creators). It seems that when we can choose between several methods for developing an AI, it is wiser to choose the one that allows better control of the AGI. As far as machine translation is concerned, we should thus prefer the method that emulates human reasoning and produces a response that can be broken down, step by step, into the reasoning that leads to it. This makes it possible not only to determine the cause of an error accurately, but also to remedy it. The problem is not limited to machine translation; its scope is somewhat wider, since grammatical disambiguation concerns not only machine translation but also natural language understanding and disambiguation according to context, even in the absence of any translation.

On the implementation of grammatical disambiguation

Grammatical disambiguation, i.e. deciding whether ‘maintenant’ is an adverb (now) or the present participle (maintaining) of the verb ‘maintenir’, seems to be the crucial issue in choosing between the rule-based model and the statistical model for machine translation. This problem is widespread and seems to concern all languages. For the French language, it concerns about 1 word out of 7. Effective grammatical disambiguation is difficult to implement. The advantage of adopting the statistical method for grammatical disambiguation is that the same method can be generalized and used for several languages. In the case of the rule-based model, the module of grammatical disambiguation must be rewritten for each language, which generates considerable complexity and requires very significant development time. Therefore, a rule-based method for grammatical disambiguation that can be easily applied to several languages would be of great interest. This seems to be the main difficulty that rule-based machine translation has to overcome.
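As an illustration of what a rule-based decision for ‘maintenant’ might look like, here is a minimal Python sketch that also records each step of its reasoning. The single rule (a preceding ‘en’ signals the verbal reading), the `Decision` class and the category labels are assumptions made for the example; a real module would need many more rules.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Decision:
    """A category choice together with the steps that led to it."""
    word: str
    category: str
    trace: List[str] = field(default_factory=list)


def disambiguate_maintenant(tokens: List[str], i: int) -> Decision:
    """Decide between the adverb and the verbal reading of tokens[i]."""
    word = tokens[i]
    trace = [f"'{word}' is ambiguous: adverb, or a form of the verb 'maintenir'"]
    previous = tokens[i - 1] if i > 0 else None
    # Toy rule (assumption): 'en maintenant ...' is the verbal reading.
    if previous == "en":
        trace.append("preceded by 'en', so it is read as a verb form")
        return Decision(word, "verb_form", trace)
    # Default when no rule for the verbal reading applies.
    trace.append("no rule for the verbal reading fired, so it is read as an adverb")
    return Decision(word, "adverb", trace)
```

For example, `disambiguate_maintenant("en maintenant la pression".split(), 1)` selects the verbal reading, and its `trace` lists the two steps behind that choice, which is what makes a wrong answer easy to diagnose and correct.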

But if we want an artificial intelligence that does not merely provide a (mostly accurate) answer without being able to really explain its reasoning, but is truly able to emulate human reasoning and to justify and describe step by step the reasoning that leads to its answer, then it is worth the effort.

The 90% rule

The French-Gallurese translator is currently under development and undergoing testing. An Android application is planned first; it will be called ‘traducidori gaddhuresu’. It will only be published if its performance (evaluated by an open test) is above 90%. This is a rule that we apply to ourselves, and it is specific to endangered languages: we consider that, for them, a poor or low-quality translation can do more harm than good.
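The rule can be expressed as a simple publication gate. The Python sketch below assumes that the score is the fraction of test items judged correct; the text does not specify how the open test is scored, so the function names and the scoring method are hypothetical.

```python
PUBLICATION_THRESHOLD = 0.90  # the 90% rule


def accuracy(correct: int, total: int) -> float:
    """Fraction of test items judged correct (assumed scoring method)."""
    if total <= 0:
        raise ValueError("the test set must contain at least one item")
    return correct / total


def may_publish(correct: int, total: int,
                threshold: float = PUBLICATION_THRESHOLD) -> bool:
    """True only if the score is strictly above the threshold."""
    return accuracy(correct, total) > threshold
```

With this definition, a score of 78 out of 100 does not pass, and neither does exactly 90 out of 100, since the rule asks for a performance above 90%.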

A “traducidori gaddhuresu” in preparation

After the Corsican language, the second endangered language for which we would like to develop a translator is the Gallurese language (“traducidori gaddhuresu”). As far as the ‘traducidori gaddhuresu’ is concerned, we are considering an Android application and a Windows version.

The priority pair for Gallurese is Italian-Gallurese. However, it will not be possible to build an Italian-Gallurese translator right away: a French-Gallurese translator is being prepared first. Initially, it will therefore be necessary to translate a text from Italian into French (in particular with DeepL, which is of very good quality), and then to use the French-Gallurese translator.

Gallurese language

Our next project will be to implement the translation from Italian into Gallurese (gaddhuresu), or from French into Gallurese. The Gallurese language is close to the Corsican language, in particular to the ‘Rucchisgiana’ (Alta Rocca) or ‘Sartinese’ variant of the Corsican language. However, there are significant differences in writing and morphology between Gallurese and Corsican. As with the Corsican language, one difficulty will be the handling of variants. The ideal would be to handle the main variants. As a first step, we will try to implement one of the main variants of the Gallurese language (we will preferably choose a well-documented variant, such as the one used in the writings of Maria Teresa Inzaina).

Updating our grammatical typology

We now have the following categories in our grammatical taxonomy:

  • determiners
  • nouns
  • pronouns
  • verbs
  • prepositions and postpositions
  • determiner modifiers
  • noun modifiers, i.e. adjectives
  • adjective modifiers
  • verb modifiers, i.e. adverbs (but in a restricted sense with regard to classical grammar)
  • adverb modifiers (adverb still being taken in a restricted sense)

Note that the classical category of adverbs corresponds here to the following categories:

  • adjective modifiers
  • verb modifiers
  • adverb modifiers
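The taxonomy above, and the way it splits the classical adverb, can be written down as a small data structure. The Python sketch below is only an illustration; the `Category` enumeration and its member names are hypothetical and do not come from the translator itself.

```python
from enum import Enum


class Category(Enum):
    """The ten top-level categories of the taxonomy listed above."""
    DETERMINER = "determiner"
    NOUN = "noun"
    PRONOUN = "pronoun"
    VERB = "verb"
    ADPOSITION = "preposition or postposition"
    DETERMINER_MODIFIER = "determiner modifier"
    NOUN_MODIFIER = "noun modifier (adjective)"
    ADJECTIVE_MODIFIER = "adjective modifier"
    VERB_MODIFIER = "verb modifier (adverb in the restricted sense)"
    ADVERB_MODIFIER = "adverb modifier"


# The classical category of adverbs covers three categories of the taxonomy.
CLASSICAL_ADVERB = frozenset({
    Category.ADJECTIVE_MODIFIER,
    Category.VERB_MODIFIER,
    Category.ADVERB_MODIFIER,
})


def is_classical_adverb(category: Category) -> bool:
    """True if classical grammar would call a word of this category an adverb."""
    return category in CLASSICAL_ADVERB
```

A mapping of this kind makes it possible to keep the finer categories internally while still reporting the classical label when needed.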

On the category of adverb modifiers

Let’s continue to rethink the troublesome (or so it is argued here) category of adverbs (in the classical sense). Let’s now turn our attention to the category of ‘adverb modifiers’. Adverbs are understood here in a restricted sense: they are either verb modifiers or proposition modifiers. In this context, we are likely to encounter adverb modifiers. In general, the adverb modifier precedes the adverb. Thus, very (‘très’) is an adverb modifier in the sequence he was eating very rarely (‘il mangeait très rarement’, manghjava mori raramenti).

Likewise, more (‘plus’, più) is in some cases an adverb modifier. This is the case in the sequence he was drinking more frequently (‘il buvait plus fréquemment’, biia più suventi).
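The positional rule (the adverb modifier precedes the adverb) can be sketched in Python as follows. The two word lists are toy assumptions limited to the French examples given here, and the tag names are hypothetical.

```python
from typing import List, Optional

# Toy word lists (assumptions), limited to the examples in the text.
ADVERBS = {"rarement", "fréquemment"}
POSSIBLE_ADVERB_MODIFIERS = {"très", "plus"}


def tag_adverb_modifiers(tokens: List[str]) -> List[Optional[str]]:
    """Tag adverbs, and tag as adverb modifier any candidate word directly before one."""
    tags: List[Optional[str]] = []
    for i, token in enumerate(tokens):
        following = tokens[i + 1] if i + 1 < len(tokens) else None
        if token in POSSIBLE_ADVERB_MODIFIERS and following in ADVERBS:
            tags.append("adverb_modifier")
        elif token in ADVERBS:
            tags.append("adverb")
        else:
            tags.append(None)  # not decided by this rule
    return tags
```

In ‘il est très grand’ the same word ‘très’ is left untagged by this rule, because it precedes an adjective rather than an adverb; that case belongs to the adjective modifiers.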

The case of adjective modifiers and the notion of grammatical proof

Let’s consider again the case of adjective modifiers (in classical grammar, these words are considered degree adverbs). These include the following: peu, très, extrêmement, surtout, étonnamment, à peine, vraiment, assez, bien, trop, tellement, … = pocu, assai, estremamente, sopratuttu, in modu stunante, appena, propriu/propria/proprii/proprie, abbastanza, bellu/bella/belli/belle, troppu/troppa/troppi/troppe, tantu/tanta/tanti/tante, … = not very, very, extremely, especially, surprisingly, hardly, really, enough, all/very, too, so, … We have argued that these words are ‘adjective modifiers’ when they precede an adjective. But can such an assertion be proven, or is there at least some form of evidence available? Grammar, like other disciplines, requires that assertions be justified and, if possible, proven. The notion of proof in grammar, however, is uncommon. Let’s see if we can provide such a proof or justification.

Consider the case of ‘tellement’ (so much), which we consider to be an adjective modifier when it precedes an adjective. Now, let us consider the following translations, where ‘tellement’ is used:

  • in French: il est tellement beau, ils sont tellement petits, elle est tellement belle, elles sont tellement intelligentes
  • in English: it is so beautiful, they are so small, she is so beautiful, they are so smart
  • in Corsican: hè tantu bellu, sò tanti chjuchi, hè tanta bella, sò tante intelligente (an alternative translation is: hè cusì bellu, sò cusì chjuchi, hè cusì bella, sò cusì intelligente)
  • in Italian: è così bello, sono così piccoli, è così bella, sono così intelligenti

It is patent here that ‘tellement’ preceding an adjective is translated in Corsican by:

  • tantu, when the adjective is singular masculine
  • tanti, when the adjective is plural masculine
  • tanta, when the adjective is singular feminine
  • tante, when the adjective is plural feminine

Thus the Corsican counterpart of ‘tellement’ (so much), tantu/tanti/tanta/tante, when used before an adjective, agrees in gender and number with the adjective to which it refers. This serves as a justification of its classification as an adjective modifier.
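The agreement pattern can be written as a small lookup table. The Python sketch below uses only the four Corsican forms and the four adjectives from the examples above; the feature labels and function names are hypothetical, and a real translator would take the adjective's features from its lexicon.

```python
from typing import Dict, Tuple

Features = Tuple[str, str]  # (gender, number)

# Corsican forms translating 'tellement' before an adjective.
TANTU_FORMS: Dict[Features, str] = {
    ("masculine", "singular"): "tantu",
    ("masculine", "plural"): "tanti",
    ("feminine", "singular"): "tanta",
    ("feminine", "plural"): "tante",
}

# Features of the four adjectives used in the examples (toy lexicon).
ADJECTIVE_FEATURES: Dict[str, Features] = {
    "bellu": ("masculine", "singular"),
    "chjuchi": ("masculine", "plural"),
    "bella": ("feminine", "singular"),
    "intelligente": ("feminine", "plural"),
}


def tellement_before(adjective: str) -> str:
    """Pick the form that agrees with the given Corsican adjective."""
    return TANTU_FORMS[ADJECTIVE_FEATURES[adjective]]


def modified_adjective(adjective: str) -> str:
    """Return the adjective preceded by its agreeing modifier."""
    return f"{tellement_before(adjective)} {adjective}"
```

The fact that the choice of form depends only on the features of the following adjective is what the classification as an adjective modifier expresses.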