It is useful to point out the differences that may exist between different grammatical typologies. The classical grammatical taxonomy is essentially aimed at teaching and comprehension. It therefore has a pedagogical purpose. On the other hand, the taxonomy that is useful for rule-based machine translation has a different purpose: it aims essentially at allowing disambiguation, both grammatically and semantically, because ambiguity is a fundamental and very common problem in this particular context. Such a typology essentially focuses on the location of word types, on the structures encountered in the sentence. This explains why typologies can be different, as they have different goals and purposes.
Let’s take a closer look at noun modulators, especially common noun modulators. We have seen that adjectives can be considered, in the present conceptual framework, as noun modulators. In this context, the question arises: are there other forms of noun modulators? It seems that there are.
Let us consider elements of sentences such as ‘bois de châtaignier’ (chestnut wood; legnu castagninu) or ‘oiseau de proie’ (bird of prey; aceddu di preda). In ‘bois de châtaignier’, ‘de châtaignier’ seems to play the role of noun modulator, in the same way as an adjective. In traditional grammar, ‘de châtaignier’ is considered a noun complement. In the present framework, it would be a noun modulator, since it clarifies and restricts the meaning of the noun ‘bois’ (wood; legnu). The role of ‘de proie’ in ‘oiseau de proie’ is identical, as it acts as a modulator of the noun ‘oiseau’ (bird; aceddu).
Interestingly, it turns out that the comparison between languages tends to validate this type of analysis. Indeed, ‘bois de châtaignier’ is better translated into Corsican by legnu castagninu than literally by legnu di castagnu (chestnut wood); and in this case, castagninu (of chestnut) is an adjective, i.e. a noun modulator. Thus castagninu and di castagnu are equivalent here, which confirms that both share the same nature, that of a noun modulator.
We have mentioned the special category of determinant modulators. It seems that this category is interesting and deserves to be explored further. A determinant modulator is placed before a determinant and changes its meaning. As we have already seen, from the viewpoint of two-sided grammar, a determinant preceded by a determinant modulator (MODD) remains a determinant.
We can give some examples that apply to different categories of determinants:
- MODD applying to possessive determinants (mes, tes, ses, nos, vos, leurs; my, your, his/her, our, your, their; i me, i to, i so, i nostri, i vostri, i so), demonstrative determinants (ces; these; ‘ssi/’sse) and definite article determinants (les; the; i/e): certaines de, certains de, l’un de, l’une de, la majeure partie de, la plupart de, tous, toutes, une bonne partie de, une grande partie de; some of, some of, one of, one of, most of, most of, all of, all of, a good part of, a large part of; une poche di, uni pochi di, unu di, una di, parte è più di, a maiò parte di, tutti, tutte, une bella parte di, parte assai di. Here are some examples: “certains de mes chevaux étaient bruns” (some of my horses were brown; uni pochi di i me cavalli eranu bruni); la majeure partie des (= de les) habitants étaient riches (most of the inhabitants were rich; a maiò parte di l’abitanti eranu ricchi).
In addition, we have three other categories of MODDs that have already been mentioned:
- MODD applying to cardinal determinants (deux, trois, quatre, cinq, … ; two, three, four, five…; dui, trè, quattru, cinqui,…): au moins, presque, quasiment, environ, plus de, moins de, approximativement, etc. (at least, almost, nearly, about, more than, less than, approximately, etc. ; alminu, guasgi, guasgi, circa, più di, menu di, apprussimativamenti, etc.)
- MODD applying to indefinite article determinants: plus de, au moins; more than, at least; più di, alminu
- MODD applying to indefinite determinants (aucun, aucune, quelques; none, none, a few ; nisciunu, nisciuna, calchì): au moins, presque; at least, almost; alminu, guasi
Finally, it seems that this category of MODD has some consistency and could be of practical interest.
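The rule that a determinant preceded by a MODD remains a determinant can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name group_determinants, the tag names DET and OTHER, and the two tiny lexicons are hypothetical, the input is assumed to be lowercased and already split into words, and contracted forms such as ‘des’ (= de les) are not handled.

```python
# Hypothetical mini-lexicons; the entries come from the examples above.
MODD = {
    ("certains", "de"),
    ("certaines", "de"),
    ("au", "moins"),
    ("presque",),
    ("plus", "de"),
}
DETERMINANTS = {"mes", "tes", "ses", "ces", "les", "deux", "trois", "quelques"}


def group_determinants(tokens):
    """Group 'MODD + determinant' into a single unit tagged DET.

    Returns a list of (text, tag) pairs. Tags are 'DET' or 'OTHER'.
    """
    result = []
    max_len = max(len(m) for m in MODD)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate MODD first.
        for n in range(max_len, 0, -1):
            candidate = tuple(tokens[i:i + n])
            nxt = i + n
            if (candidate in MODD and nxt < len(tokens)
                    and tokens[nxt] in DETERMINANTS):
                # MODD + determinant is still a determinant.
                result.append((" ".join(tokens[i:nxt + 1]), "DET"))
                i = nxt + 1
                matched = True
                break
        if matched:
            continue
        tag = "DET" if tokens[i] in DETERMINANTS else "OTHER"
        result.append((tokens[i], tag))
        i += 1
    return result


print(group_determinants("certains de mes chevaux".split()))
print(group_determinants("au moins deux chevaux".split()))
```

Trying the longest MODD first is a design choice made here so that multi-word modulators are not split; a real system would need a fuller lexicon and a treatment of contractions.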
Let us try to delve more deeply into the case of adverbs. We shall continue now to define them by their position in relation to other grammatical categories. The result is that adverbs are divided into several different categories. Now let’s look at the adverbs that may be placed before an adjective modulator. To begin with, let us cite but a few adjective modulators:
- peu, très, extrêmement, surtout, étonnamment, à peine, vraiment, assez, bien, trop, tellement, etc.
- pocu, assai, estremamente, sopratuttu, in modu stunante, appena, propriu/propria/proprii/proprie, abbastanza, bellu/bella/belli/belle, troppu/troppa/troppi/troppe, tantu/tanta/tanti/tante, etc.
- not very, very, extremely, especially, surprisingly, hardly, really, enough, all/very, too, so, etc.
Now some modulators of adjective modulators are:
- pas, peut-être, surtout, vraiment, etc.
- micca, forse, soprattuttu, veramente, è cetera.
- not, maybe, mostly, really, etc.
Here are some relevant examples: “il était surtout trop blanc” (he was mostly too white; era sopratuttu troppu biancu); “il était vraiment très beau” (he was really very beautiful; era propriu bellissimu); “il était bien trop grand” (he was far too tall; era bellu troppu maiore).
Let’s call this category modulators of adjective modulators. Such a modulator is placed before the adjective modulator because it modifies the meaning of that adjective modulator.
Hence, if we reason in terms of two-sided grammar, an adjective modulator preceded by a modulator remains an adjective modulator: MOD-MODAQ = MODAQ.
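As an illustration, the reduction MOD-MODAQ = MODAQ can be applied mechanically to a tagged sentence. The sketch below is a minimal one and makes several assumptions: the function name reduce_modulators and the tags PRON, VERB and ADJ are hypothetical, and the input is assumed to be already tagged word by word.

```python
def reduce_modulators(tagged):
    """Apply the rule MOD + MODAQ -> MODAQ, scanning from right to left.

    `tagged` is a list of (word, tag) pairs. A MOD immediately followed by
    a MODAQ is merged with it, and the merged unit keeps the tag MODAQ.
    """
    out = []
    for word, tag in reversed(tagged):
        if tag == "MOD" and out and out[-1][1] == "MODAQ":
            # out[-1] is the unit immediately to the right of `word`.
            right_word, _ = out.pop()
            out.append((word + " " + right_word, "MODAQ"))
        else:
            out.append((word, tag))
    out.reverse()
    return out


sentence = [
    ("il", "PRON"),
    ("était", "VERB"),
    ("surtout", "MOD"),
    ("trop", "MODAQ"),
    ("blanc", "ADJ"),
]
print(reduce_modulators(sentence))
```

Scanning from right to left means that a chain of several modulators in front of the same adjective modulator collapses in a single pass.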
To sum up, we have so far distinguished several categories within the classical class of adverbs:
- modulators of adjectives
- modulators preceding verbs: verb pre-modulators
- modulators following verbs: verb post-modulators
- modulators preceding cardinal determinants
- modulators preceding adjective modulators
Let us briefly recall the problem: translating ‘I love you’ might sound trivial, but it’s not. In fact, ‘ti amu’ is not the best translation. The best translation is ‘ti tengu caru’ when addressed to a male person, or ‘ti tengu cara’ when addressed to a female person. Hence the proposed preliminary translation ‘ti tengu caru/cara’. Such a rough translation requires further disambiguation, but on what precise grounds?
Let us look at the issue from an analytical perspective. It appears that we need to assign a reference to the pronoun ‘te’ (you, ti). The latter could be identified according to the context, depending on whether the person ‘te’ refers to is male or female. At this stage, it appears that it is better to consider that the personal object pronoun has an inherent gender: masculine or feminine. This gender does not affect the pronoun itself which remains ‘te’ (you, ti) independently of the gender, but it does have an effect on the words that depend on it, i.e. the adjective caru/cara in Corsican, in the locution ti tengu caru/cara. The upshot is: in this case, ‘te’ (you, ti) is a personal object pronoun, masculine or feminine, whose inherent ambiguity can be solved according to the context.
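The idea of an object pronoun that carries an inherent gender can be sketched as follows. This is a minimal illustration, not the actual implementation: the class ObjectPronoun, the function render_love and the gender codes "m" and "f" are hypothetical names, and how the gender is recovered from the context is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectPronoun:
    """A personal object pronoun with an inherent, possibly unknown, gender."""
    form: str                     # surface form, identical for both genders
    gender: Optional[str] = None  # "m", "f", or None while unresolved


def render_love(pronoun: ObjectPronoun) -> str:
    """Build the Corsican locution; only the adjective depends on the gender."""
    adjective = {"m": "caru", "f": "cara"}.get(pronoun.gender, "caru/cara")
    return f"{pronoun.form} tengu {adjective}"


print(render_love(ObjectPronoun("ti")))       # gender not yet resolved
print(render_love(ObjectPronoun("ti", "m")))  # addressed to a male person
print(render_love(ObjectPronoun("ti", "f")))  # addressed to a female person
```

The pronoun itself is never altered; the gender is stored as a feature and consumed by the word that agrees with it, which mirrors the analysis above.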
Let us give some further examples of two-sided grammatical analysis:
- “à dessein” (purposely), “à volonté” (at will), “à tort” (mistakenly): from an analytical standpoint, these are prepositions followed by a singular noun. From a synthetical viewpoint, they are adverbs (adverbial locutions).
- “à jamais” (forever): from an analytical standpoint, it is a preposition followed by an adverb. From a synthetical viewpoint, it is an adverb (adverbial locution).
- “à genoux” (on my/his/her/… knees), “à torrents” (in torrents): from an analytical standpoint, these are prepositions followed by a plural noun. From a synthetical viewpoint, they are adverbs (adverbial locutions).
Let us call two-sided grammatical analysis the type of grammatical analysis that will be described below. Two-sided grammatical analysis contrasts with one-sided analysis, which sees a sequence of words either as a locution type (adverbial locution, verbal locution, noun locution, etc.) or as the sequence of the types of its constituent words. From the standpoint of two-sided grammatical analysis, a given sequence of words can be attributed a single type (synthetically) and several grammatical types corresponding one-to-one to its constituent words (analytically). The upshot is that a given sequence of words can be described from two different viewpoints, synthetic and analytic. What is now the status of ‘de fait’, from the viewpoint of two-sided grammatical analysis? From a synthetic standpoint, it is an adverb. And from an analytic viewpoint, it is made up of one preposition (‘de’) followed by a common noun (‘fait’). The two viewpoints are complementary and each casts light on one facet of the same reality. (lacking the time to write a scholarly article, but I hope the main idea should be clear…)
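One way to picture two-sided analysis is as a record that stores both viewpoints side by side. The sketch below is only an assumption about how such a record could look: the class TwoSidedEntry, the table LOCUTIONS, the function analyse and the tag names are hypothetical, and the entries are limited to locutions mentioned in the text.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TwoSidedEntry:
    """A word sequence described from both viewpoints."""
    words: Tuple[str, ...]
    analytic: Tuple[str, ...]  # one type per constituent word
    synthetic: str             # single type of the whole sequence

    def __post_init__(self):
        # The analytic side must match the words one-to-one.
        if len(self.words) != len(self.analytic):
            raise ValueError("one analytic type is required per word")


# Hypothetical table built from the examples in the text.
LOCUTIONS = {
    "de fait": TwoSidedEntry(("de", "fait"), ("PREPOSITION", "COMMON_NOUN"), "ADVERB"),
    "à jamais": TwoSidedEntry(("à", "jamais"), ("PREPOSITION", "ADVERB"), "ADVERB"),
    "à genoux": TwoSidedEntry(("à", "genoux"), ("PREPOSITION", "PLURAL_NOUN"), "ADVERB"),
}


def analyse(locution: str) -> TwoSidedEntry:
    """Look up the two-sided description of a known locution."""
    return LOCUTIONS[locution]


entry = analyse("de fait")
print(entry.synthetic)  # the synthetic side
print(entry.analytic)   # the analytic side
```

Keeping both sides in the same record lets later rules use whichever viewpoint they need without re-analysing the sequence.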
What are the conditions for a given endangered language to be a candidate for rule-based machine translation? Some requirements must be met. Notably, there is a need for:
- a dictionary: some specialized lexicons are useful too
- a list of locutions and their translation: to be more accurate, what is needed are noun locutions, adjective locutions, adverbial locutions, verbal locutions and their translations in another language.
- a detailed grammar (in any language): ideally, the grammar should be very detailed, mentioning notably irregular verbs, noun plurals, etc. Subjunctive and conditional forms must also be accurately described.
- in addition, elision rules and euphony rules should also be described.
- most importantly: a description of the main variants of the language and their differences. This is needed to handle what we can call the ‘variant problem’ (we shall say a bit more about this later): as an effect of diversity, endangered languages are often polynomic and come with variants. But translation must be coherent and a mix of several variants is not acceptable as a translation.
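The coherence requirement of the ‘variant problem’ can be illustrated with a small check. This is a sketch under loud assumptions: the variant labels "A" and "B", the placeholder word forms and the function shared_variants are all hypothetical and do not describe any real language data. The only idea taken from the text is that a translation should not mix variants.

```python
# Hypothetical variant labels and placeholder word forms (not real data).
VARIANTS = {"A", "B"}
LEXICON = {
    "alpha_a": {"A"},     # form used only in variant A
    "alpha_b": {"B"},     # form used only in variant B
    "beta": {"A", "B"},   # form shared by both variants
}


def shared_variants(words, lexicon, all_variants):
    """Return the variants in which every known word of the output is valid.

    Words missing from the lexicon put no constraint on the result.
    An empty result means the output mixes incompatible variants.
    """
    shared = set(all_variants)
    for word in words:
        if word in lexicon:
            shared &= lexicon[word]
    return shared


print(shared_variants(["alpha_a", "beta"], LEXICON, VARIANTS))
print(shared_variants(["alpha_a", "alpha_b"], LEXICON, VARIANTS))
```

A translator could run such a check on its output and reject, or regenerate, any sentence for which the returned set is empty.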
Let us mention that endangered languages are commonly associated with another language, the two being in a diglossic relationship with one another. To take an example, the Corsican language is associated with French. So we consider the French-Corsican pair, and what is relevant is a French-Corsican translator. If we consider the Sardinian Gallurese language (‘gaddhuresu’), the relevant pair is Italian-Gallurese. Other relevant pairs are: