If we want to build a rule-based translation ecosystem (with many language pairs), a disambiguation module for grammatical types is necessary. For French, such a module performs disambiguation with respect to about 100 categories. The number of ambiguous pairs (or 3-tuples, 4-tuples, etc.) of categories to disambiguate, for French, is about 250. We need a module that disambiguates each of these n-tuples. These n-tuples differ from one language to another and therefore call for a language-specific module. However, in the context of an ecosystem, it is too complex and time-consuming to implement a disambiguation module for each language pair and each ambiguous type. What is needed is a disambiguation module that is not language-specific and that can be easily adapted to a given language pair. A versatile grammatical disambiguator seems to be the crux of the matter here.
Today we are launching a project on GitHub to write a better index page for the okchakko translator project.
The current index page is located at the following address.
This index page gives online free access to the translation from French to Corsican, a language threatened with extinction.
The current index page has several defects:
- it is basic, rather crude in its design on a white background
- the source text and the destination text should be aligned horizontally (as in Google Translate, DeepL, etc.) and not vertically
The index page index.php will be published under the MIT license.
Your contributions are welcome. You can help this project by proposing a better index page than the current one of the okchakko project (a priori in PHP).
If we consider indefinite pronouns, and in particular: ‘ceci, cela, les autres, quelques-uns, tous, toutes’ (this, that, the others, some, all, all), it turns out that they are likely to appear in sequences such as: ‘tout ceci, tout cela, tous les autres, toutes les autres, au moins quelques-uns, quasiment tous, presque tous, quasiment toutes, presque toutes’ (all this, all that, all the others, all the others, at least some, almost all, almost all, almost all, almost all; tuttu quissu, tuttu quissa, tutti l’altri, tutti l’altri, alminu calchiadunu, guasgi tutti, guasgi tutti, guasgi tutti, guasgi tutti) and followed by a verb. In the present context, ‘tout’ (all) in ‘tout ceci’ (all this; tuttu quissu) and ‘tout cela’ (all that; tuttu quissa), ‘tous’ (all) in ‘tous les autres’ (all others; tutti l’altri), ‘toutes’ (all) in ‘toutes les autres’ (all others; tutti l’altri), ‘au moins’ (at least) in ‘au moins quelques-uns’ (at least some), ‘quasiment’ (almost; guasgi) in ‘quasiment tous’ (almost all; guasgi tutti) and ‘quasiment toutes’ (almost all; guasgi tutti), ‘presque’ (almost; guasgi) in ‘presque tous’ (almost all; guasgi tutti) and ‘presque toutes’ (almost all; guasgi tutti) play the role of a modifier of an indefinite pronoun, as they change its meaning and scope.
Let’s consider the so-called ‘tonic personal pronouns’ used in the imperative, in an affirmative (non-negative) form: parle-moi, _ , parle-lui, parlez-nous, parlez-vous, parle-leur (talk to me, _ , talk to him/her, talk to us, talk to you, talk to them; parla mi, _, parla li, parleti ci, parleti vi, parla li). Here again, we have the equivalence of meaning: parle-moi = parle à moi, _ , parle-lui = parle à lui/elle, parlez-nous = parlez à nous, parlez-vous = parlez à vous, parle-leur = parle à eux/elles (no difference in English; parla mi = parla à mè, _, parla li = parla ad eddu/edda, parleti ci = parleti à no, parleti vi = parleti à vo, parla li = parla ad eddi). Therefore, once again: ‘me = à moi, te = à toi, lui = à lui/elle, nous = à nous, vous = à vous, leur = à eux/elles’ (mi = à mè, ti = à tè, li = ad eddu/edda, ci = à no, vi = à vo, li = ad eddi). Thus, in the so-called ‘tonic personal pronoun’ used in the imperative, the preposition placed before the personal pronoun is included. In the present model, ‘tonic personal pronouns’ cannot be considered as a category of personal pronouns, but are viewed here as a contraction, i.e. a preposition+personal pronoun group.
To begin with, there is (i) the class of autonomous personal pronouns: ‘moi, toi, lui/elle, nous, vous, eux/elles’ (me, you, he/she, we, you, they).
There is also (ii) the class of personal pronouns as direct object complements: ‘me te le/la nous vous les’ (me you him/her us you them; mi ti u/a ci vi i/e), as in ‘il me comprend, elle te comprend, je le comprends, nous nous comprenons, ils vous comprennent, je les comprends’: (he understands me, she understands you, I understand him, we understand us, they understand you, I understand them; mi capisci, ti capisci, u capiscu, ci capimu, vi capiscini, i capiscu).
There are also (iii) the so-called ‘tonic personal pronouns’: ‘moi toi lui/elle nous vous eux/elles’ (me you him/her we you them), used after a preposition: ‘de moi, à toi, devant lui, après elle, par nous, chez vous, à eux, à elles’ (of me, to you, in front of him, after her, by us, at your place, to them, to them; di mè, à tè, davanti ad eddu, dopu ad edda, da no, ind’è vo, ad eddi, ad eddi).
Finally, there is (iv) the class of personal pronouns as indirect object complements: ‘me te lui nous vous leur’ (not applicable to English; mi ti li ci vi li). For example: ‘il me parle, elle te parle, je lui parle, il nous parle, elle vous parle, je leur parle’ (he talks to me, she talks to you, I talk to him/her, he talks to us, she talks to you, I talk to them; mi parla, ti parla, li parlu, ci parla, vi parla, li parlu). If we now analyse the personal pronouns as indirect object complements, it turns out that each of them is equivalent to the preposition followed by the tonic personal pronoun: ‘il me parle = il parle à moi, elle te parle = elle parle à toi, je lui parle = je parle à lui/elle, il nous parle = il parle à nous, elle vous parle = elle parle à vous, je leur parle = je parle à eux/elles’ (mi parla = parla à mè, ti parla = parla à tè, li parlu = parlu ad eddu/edda, ci parla = parla à no, vi parla = parla à vo, li parlu = parlu ad eddi). Therefore: ‘me = à moi, te = à toi, lui = à lui/elle, nous = à nous, vous = à vous, leur = à eux/elles’ (mi = à mè, ti = à tè, li = ad eddu/edda, ci = à no, vi = à vo, li = ad eddi). Thus, in the so-called ‘tonic personal pronoun’, the preposition placed before the personal pronoun is included. It is therefore a preposition+personal pronoun group. In the present context, ‘tonic personal pronouns’ cannot be considered as a category of personal pronouns: in the present model, it is a contraction, i.e. a group consisting of a preposition followed by a personal pronoun: PS+PRPERS.
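The equivalences listed above can be sketched as a simple lookup table. This is only an illustrative sketch in Python: the names `INDIRECT_TO_GROUP` and `expand_indirect_pronoun` are hypothetical and are not taken from the okchakko code, and the table covers French only.

```python
# Hypothetical lookup table built from the equivalences stated in the text:
# each indirect object pronoun maps to a preposition (PS) followed by one
# or more possible personal pronouns (PRPERS).
INDIRECT_TO_GROUP = {
    "me": ("à", ["moi"]),
    "te": ("à", ["toi"]),
    "lui": ("à", ["lui", "elle"]),
    "nous": ("à", ["nous"]),
    "vous": ("à", ["vous"]),
    "leur": ("à", ["eux", "elles"]),
}


def expand_indirect_pronoun(pronoun):
    """Return the possible 'preposition + personal pronoun' readings."""
    preposition, pronouns = INDIRECT_TO_GROUP[pronoun.lower()]
    return [f"{preposition} {p}" for p in pronouns]


if __name__ == "__main__":
    for word in ("me", "lui", "leur"):
        print(word, "=", " / ".join(expand_indirect_pronoun(word)))
```

Keeping the two readings of ‘lui’ and ‘leur’ as a list leaves the choice of gender to a later step, since the pronoun alone does not determine it.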
Let’s focus on the word class of determiner modifiers: they also include indefinite determiner modifiers. Indefinite determiners are thus: ‘tous les, aucun, aucune, quelques’ (all, none, none, some; tutti i, nisciunu, nisciuna, calchì).
Here are some examples of these indefinite determiner modifiers: ‘presque tous les, quasi tous les, quasiment tous les, quasiment aucun, quasiment aucune, presque aucun, presque aucune, au moins quelques, au plus quelques, tout au plus quelques’ (almost all, almost all, almost all, almost none, almost none, at least some, at most some, at most some). Thus, ‘presque tous les’ (almost all; guasgi tutti i) in the sentence ‘presque tous les soldats’ (almost all the soldiers; guasgi tutti i suldati) is an indefinite determiner modifier.
Let’s focus on the class of autonomous personal pronouns: moi, toi, lui/elle, nous, vous, eux/elles (me, you, he/she, we, you, they). We sometimes encounter forms such as: ‘moi particulièrement, toi aussi, toi notamment, moi spécialement, moi surtout, vous également, toi de même, toi pareil, lui en particulier, elle notamment, moi aussi’ (me especially, you also, you especially, me especially, me especially, you also, you likewise, you the same, he especially, she especially, me too), as in the sequence ‘moi aussi, j’aime cela’ (me too, I like that). Classically, ‘particulièrement, aussi, notamment, spécialement, surtout, également, de même, pareil, en particulier’ (particularly, also, especially, especially, especially, also, likewise, same, in particular) are considered adverbs. However, in the present context, they modify the meaning of autonomous personal pronouns. When they are used in this way, it is therefore logical to consider them as modifiers of autonomous personal pronouns.
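As a rough illustration of this retagging, here is a minimal Python sketch. The tag `MOD_PRPERS` and the function name are hypothetical (only `PRPERS` appears elsewhere in these notes), and the sketch ignores the fact that ‘lui’, ‘nous’ or ‘vous’ may belong to other pronoun classes, which a real disambiguator would have to resolve first.

```python
# Word lists taken from the examples in the text.
AUTONOMOUS_PRONOUNS = {"moi", "toi", "lui", "elle", "nous", "vous", "eux", "elles"}
PRONOUN_MODIFIERS = {
    "particulièrement", "aussi", "notamment", "spécialement", "surtout",
    "également", "de même", "pareil", "en particulier",
}


def tag_pronoun_modifiers(tokens):
    """Tag 'pronoun + modifier' sequences; other tokens get the tag None."""
    tagged = []
    i = 0
    while i < len(tokens):
        matched = False
        if tokens[i].lower() in AUTONOMOUS_PRONOUNS:
            # Try two-word modifiers ('en particulier') before one-word ones.
            for width in (2, 1):
                following = tokens[i + 1:i + 1 + width]
                candidate = " ".join(following)
                if len(following) == width and candidate.lower() in PRONOUN_MODIFIERS:
                    tagged.append((tokens[i], "PRPERS"))
                    tagged.append((candidate, "MOD_PRPERS"))
                    i += 1 + width
                    matched = True
                    break
        if not matched:
            # Left undecided: another module would have to tag this token.
            tagged.append((tokens[i], None))
            i += 1
    return tagged


if __name__ == "__main__":
    print(tag_pronoun_modifiers(["moi", "aussi", ",", "j'aime", "cela"]))
```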
In the present construction, the question arises of whether or not a determiner is a modifier. More specifically, is a definite or indefinite article (i.e. a determiner) a modifier of a noun? In the present model, an adjective is indeed a noun modifier. Is this not also the case for a definite article (the definite article ‘le’ (the), for example)? The answer is no. In fact, a modifier only modifies the meaning of the word to which it is applied. The consequence is that if the modifier is removed from the sentence in question, the sentence still conveys meaning and remains correctly formed. For example, in the sentence ‘le cheval blanc courait’ (the white horse was running), if we remove the noun modifier ‘blanc’ (white), the sentence ‘le cheval courait’ (the horse was running) remains correct. On the other hand, if we remove the determiner ‘le’ (the), we get the sentence ‘cheval blanc courait’ (white horse was running), which is incomplete and whose structure is not valid.
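The removal test described above can be illustrated with a toy Python sketch. Everything here is an assumption made for the example: the tag names (`DET`, `NOUN`, `ADJ`, `VERB`), the tiny set of valid patterns and the function name are hypothetical, and a real grammar would be far richer.

```python
# Toy set of well-formed tag sequences, just enough for the example sentence.
VALID_PATTERNS = {
    ("DET", "NOUN", "VERB"),
    ("DET", "NOUN", "ADJ", "VERB"),
}


def passes_removal_test(tagged_sentence, index):
    """True if the sentence is well formed and stays so without word `index`."""
    tags = tuple(tag for _, tag in tagged_sentence)
    reduced = tags[:index] + tags[index + 1:]
    return tags in VALID_PATTERNS and reduced in VALID_PATTERNS


if __name__ == "__main__":
    sentence = [("le", "DET"), ("cheval", "NOUN"), ("blanc", "ADJ"), ("courait", "VERB")]
    # Removing 'blanc' leaves 'le cheval courait', which is still well formed.
    print(passes_removal_test(sentence, 2))
    # Removing 'le' leaves 'cheval blanc courait', which is not.
    print(passes_removal_test(sentence, 0))
```

Under these toy patterns, the adjective passes the test and the determiner does not, which mirrors the distinction drawn in the text.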
The distinction between rule-based and statistically-based translation may well be artificial and obscure what is really the interesting distinction in machine translation modules. The latter may well lie in the fact that some methods capture (at least partially) the semantics of a text, and are for example able to enumerate lemmas in the text, change the person of verbs or the gender of nouns, etc. In contrast, other translation methods do not capture the semantics of the text and only perform the translation. At least this type of classification seems to be relevant to artificial intelligence.
What are interjections (Hello! Good evening! Merry Christmas! Happy Birthday!…) in the present framework? They are words preceded by a punctuation mark (period, comma, exclamation mark, question mark, etc.) and followed by a punctuation mark.
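A minimal Python sketch of this definition follows. It rests on several assumptions that are not in the text: the beginning and end of the text are treated as boundaries equivalent to a punctuation mark, candidate segments are checked against a small hypothetical lexicon (`INTERJECTIONS`) so that ordinary clauses between two commas are not picked up, and the set of punctuation marks is only a sample.

```python
import re

# Hypothetical lexicon, taken from the examples in the text.
INTERJECTIONS = {"hello", "good evening", "merry christmas", "happy birthday"}

# Sample of punctuation marks that can delimit an interjection.
PUNCTUATION = r"[.,;:!?…]+"


def find_interjections(text):
    """Return the segments lying between punctuation marks that are known interjections."""
    segments = re.split(PUNCTUATION, text)
    return [s.strip() for s in segments if s.strip().lower() in INTERJECTIONS]


if __name__ == "__main__":
    print(find_interjections("Hello! I wish you a merry Christmas. Happy Birthday, Paul!"))
```

In this example ‘merry Christmas’ is not returned, because it is part of a longer segment and is not delimited by punctuation on both sides.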