Let’s focus on the class of autonomous personal pronouns: moi, toi, lui/elle, nous, vous, eux/elles (me, you, him/her, us, you, them). One sometimes encounters forms such as: ‘moi particulièrement, toi aussi, toi notamment, moi spécialement, moi surtout, vous également, toi de même, toi pareil, lui en particulier, elle notamment, moi aussi’ (me especially, you also, you especially, me especially, me especially, you also, you likewise, you the same, him in particular, her especially, me too), as in the sequence ‘moi aussi, j’aime cela’ (me too, I like that). Classically, ‘particulièrement, aussi, notamment, spécialement, surtout, également, de même, pareil, en particulier’ (particularly, also, especially, especially, especially, also, likewise, same, in particular) are considered adverbs. However, in the present context, they modify the meaning of autonomous personal pronouns. When used in this way, it is therefore logical to consider them as modifiers of autonomous personal pronouns.
Is a determiner a modifier?
In the present construction, the question arises of whether or not a determiner is a modifier. More specifically, is a definite or indefinite article (i.e. a determiner) a modifier of a noun? In the present model, an adjective is indeed a noun modifier. Is this not also the case for a definite article (the definite article ‘le’ (the), for example)? The answer is no. In fact, a modifier only modifies the meaning of the word to which it is applied. The consequence is that if the modifier is removed from the sentence in question, the sentence still conveys meaning and remains well formed. For example, in the sentence ‘le cheval blanc courait’ (the white horse was running), if we remove the noun modifier ‘blanc’ (white), the sentence ‘le cheval courait’ (the horse was running) remains correct. On the other hand, if we remove the determiner ‘le’ (the), we get the sentence ‘cheval blanc courait’ (white horse was running), which is incomplete and whose structure is not valid.
About the typology of machine translation systems
The distinction between rule-based and statistically-based translation may well be artificial and obscure what is really the interesting distinction in machine translation modules. The latter may well lie in the fact that some methods capture (at least partially) the semantics of a text, and are for example able to enumerate lemmas in the text, change the person of verbs or the gender of nouns, etc. In contrast, other translation methods do not capture the semantics of the text and only perform the translation. At least this type of classification seems to be relevant to artificial intelligence.
What are interjections (Hello! Good evening! Merry Christmas! Happy Birthday!…) in the present framework? They are words preceded by a punctuation mark (period, comma, exclamation mark, question mark, etc.) and followed by a punctuation mark.
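The structural definition above can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions: the set of punctuation marks, the treatment of the start of the text as a boundary, and the `max_words` limit (used to set aside ordinary clauses, which are also delimited by punctuation) are not part of the framework itself, and the function name is hypothetical.

```python
import re

# Assumption: an illustrative, non-exhaustive set of punctuation marks.
DELIMITERS = r"([.,!?;:…]+)"

def interjection_candidates(text, max_words=2):
    """Return short word groups that are followed by a punctuation mark and
    preceded by one (or by the start of the text, which is an assumption)."""
    # With a capturing group, re.split alternates text and delimiters:
    # [text, delimiter, text, delimiter, ..., text]
    pieces = re.split(DELIMITERS, text)
    candidates = []
    for i in range(0, len(pieces), 2):
        segment = pieces[i].strip()
        followed_by_punctuation = i + 1 < len(pieces)
        # max_words is a hypothetical heuristic, not part of the definition.
        if segment and followed_by_punctuation and len(segment.split()) <= max_words:
            candidates.append(segment)
    return candidates

print(interjection_candidates("Hello! I am very glad that you came. Merry Christmas!"))
```

The result is a list of candidates only: a short clause such as ‘I agree.’ would also pass this test, so a lexicon or a further check would still be needed.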
An analysis of French word ‘très’
According to our analysis, the word ‘très’ is likely to occur in the following grammatical types:
- Adjective modifier: here, ‘très’ modifies the meaning of an adjective: très beau (very beautiful, biddisimu), très content (very happy, cuntentissimu)
- Adverb modifier: ‘très’ here modifies the meaning of an adverb: ‘très rarement’ = very rarely, raramenti; ‘très souvent’ = very often, mori à spessu
- Adverb (i.e. in our terminology, a verb modifier): ‘très’ here modifies the meaning of a verb: ‘j’ai très faim’ = I am very hungry, t’aghju mori fami; ‘il avait très soif’ = he was very thirsty, t’aia mori seti; the verb here is the verbal locution ‘avoir faim’ = to be hungry, avè a fami; ‘avoir soif’ = to be thirsty, avè a seti
Leaving ambiguity unresolved
Disambiguation is an essential process in machine translation. Sometimes, however, it seems more rational and logical to leave an ambiguity in the translation. This is the case when (i) there is an ambiguous word in the sentence to be translated; and (ii) the context does not provide an objective reason to choose one of the two senses. It seems that in this case, the best translation is the one that leaves the ambiguity intact.
Let’s take an example. Consider the following French sentence: ‘Son palais était en feu.’. The French word ‘palais’ is ambiguous, because it corresponds in English and in Corsican to two different words (palace, palazzu and palate, palatu).
Thus, we have 3 possibilities of translation:
- His palate was on fire
- His palace was on fire
- His palace/palate was on fire
The third translation, in my opinion, is better, because it points out that the context is insufficient to choose one of the two alternatives.
Consider now, on the one hand, the following sentence: ‘Il avait mangé du piment fort. Son palais était en feu.’ Now the context provides an objective motivation to choose one of the two senses. This yields the following translation: He had eaten some hot pepper. His palate was on fire.
On the other hand, consider the following sentence: ‘Les ennemis du prince avaient lancé des engins incendiaires. Son palais était en feu.’ Here we also have an objective reason, this time to choose the other alternative. The translation is then: The prince’s enemies had thrown incendiary devices. His palace was on fire.
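The three cases above can be sketched as follows. This is a minimal illustration, not a description of an actual translator: the cue words, the `SENSES` table and the function name are assumptions chosen only to reproduce the ‘palais’ example.

```python
import re

# Hypothetical cue words for each sense of the French word 'palais'.
SENSES = {
    "palais": {
        "palace": {"prince", "roi", "ennemis", "incendiaires"},
        "palate": {"mangé", "piment", "bouche"},
    }
}

def translate_ambiguous(word, context):
    """Pick a sense only if the context supports exactly one of them;
    otherwise keep every alternative, separated by '/'."""
    tokens = set(re.findall(r"\w+", context.lower()))
    supported = [sense for sense, cues in SENSES[word].items() if cues & tokens]
    if len(supported) == 1:
        return supported[0]
    # No objective reason to choose (or conflicting cues): leave the ambiguity.
    return "/".join(SENSES[word])

print(translate_ambiguous("palais", "Son palais était en feu."))
print(translate_ambiguous("palais", "Il avait mangé du piment fort. Son palais était en feu."))
```

The point of the design is that the fallback is not an arbitrary default sense: when the context is silent, the output itself signals that the choice was left open.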
Dictionary = Corpus?
As far as machine translation is concerned, it seems that the best thing is to combine the best of the two approaches: rule-based and statistics-based. If it were possible to make the two approaches converge, the benefit could be great. Let us try to define what could allow such a convergence, based on the two-sided grammatical approach, and illustrate it with an example.
To begin with, u soli sittimbrinu = ‘le soleil de septembre’ (the sun of September). In Corsican language, sittimbrinu is a masculine singular adjective that means ‘de septembre’ (of September). In French, ‘de septembre’ is–from an analytic perspective–a preposition followed by a common masculine singular noun. But according to the two-sided analysis ‘de septembre’ (of September) is also–from a synthetic perspective–a masculine singular adjective. This double nature, according to this two-sided analysis of ‘de septembre’, allows in fact the alignment of ‘de septembre’ (of September) with sittimbrinu.
More generally, if we define words or groups of words according to the two-sided grammatical analysis in the dictionary, we also have an alignment tool, which can be used for a translation system based on statistics, in the same way as a corpus. Thus, if it is sufficiently provided, the dictionary is also a corpus, and even more, an aligned corpus.
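To make the idea of a dictionary that doubles as an aligned corpus more concrete, here is a minimal sketch. The `LEXICON` structure, the greedy longest-match strategy and the function name are assumptions of mine; the entries are limited to the example ‘le soleil de septembre’ = u soli sittimbrinu, with each group of words stored under its synthetic type.

```python
# Hypothetical lexicon: French word groups mapped to (Corsican form, synthetic type).
LEXICON = {
    ("le",): ("u", "article"),
    ("soleil",): ("soli", "noun"),
    ("de", "septembre"): ("sittimbrinu", "adjective"),  # analytically: preposition + noun
}

def align(french):
    """Greedy longest-match segmentation that returns aligned pairs."""
    tokens = french.lower().split()
    longest = max(len(key) for key in LEXICON)
    pairs, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in LEXICON:
                target, synthetic_type = LEXICON[key]
                pairs.append((" ".join(key), target, synthetic_type))
                i += n
                break
        else:
            # Unknown word: keep it unaligned rather than guess.
            pairs.append((tokens[i], None, None))
            i += 1
    return pairs

for pair in align("le soleil de septembre"):
    print(pair)
```

Because ‘de septembre’ is stored as one group typed as an adjective, it aligns directly with sittimbrinu, which is the kind of alignment a statistics-based system would otherwise have to learn from a corpus.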
Grammatical taxonomy again: the case of prepositions
Let’s look at the translation of the French word ‘dont’. Depending on the case, ‘dont’ can be a
- relative pronoun: ‘la difficulté dont je t’ai parlé’ (the difficulty I told you about), ‘voilà le professeur dont j’apprécie beaucoup les cours’ (this is the teacher whose classes I really enjoy.)
- or, more rarely, a preposition: ‘il y avait cinq couleurs, dont le rouge et le bleu’. (there were five colours, including red and blue.)
It is the latter case that we will be looking at. In this case, ‘dont’ is translated into English as ‘including’. In Corsican, the translation is: c’eranu cinque culori, frà i quali u rossu è u turchinu. But if we translate ‘il y avait cinq plantes, dont le ciste et la bruyère’ (‘there were five plants, including cistus and heather’), we get: c’eranu cinque piante, frà e quale u muchju è a scopa. Thus the translation of ‘dont’ (including) as a preposition is either frà i quali (masculine plural, culore being masculine in Corsican) or frà e quale (feminine plural), depending on which noun ‘dont’ refers to.
Thus ‘dont’ is translated into the masculine plural or the feminine plural, depending on the noun – either masculine or feminine – to which it refers. This casts doubt on the ‘prepositional’ nature of ‘dont’, and leads to further analysis to determine whether there might not be a more suitable grammatical type.
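The agreement described above can be sketched as a small lookup. This is an illustration under stated assumptions: the two tables and the function name are hypothetical, and the gender recorded is that of the Corsican noun, since that is what the Corsican form agrees with.

```python
# Corsican rendering of prepositional 'dont' (including), by gender of the plural antecedent.
DONT_INCLUDING = {
    "masculine": "frà i quali",
    "feminine": "frà e quale",
}

# Hypothetical table: French antecedent -> gender of its Corsican equivalent.
CORSICAN_GENDER = {
    "couleurs": "masculine",  # culori
    "plantes": "feminine",    # piante
}

def translate_dont_including(antecedent):
    """Return the Corsican form of 'dont' that agrees with the antecedent."""
    return DONT_INCLUDING[CORSICAN_GENDER[antecedent]]

print(translate_dont_including("couleurs"))
print(translate_dont_including("plantes"))
```

That the function needs the antecedent at all is the point: a plain preposition would not require this information.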
It is worth noting that ‘dont’ (including) can be replaced by ‘parmi lesquels’ (among which, frà i quali) or ‘parmi lesquelles’ (among which, frà e quale), depending on whether the noun to which ‘dont’ refers is in the masculine plural or the feminine plural. This suggests that ‘dont’ could be conceived of as a preposition followed by a pronoun. In the spirit of this analysis, the BDL site notes: ‘Dont’ is probably the relative pronoun whose use is the most delicate. To use it correctly, one must know that ‘dont’ always ‘hides’ the preposition ‘de’; ‘dont’ is equivalent to ‘de qui’, ‘de quoi’, ‘duquel’, etc. This link between ‘dont’ and ‘de’ goes back to the Latin origin of ‘dont’, which is from ‘unde’ “from where”.
More generally, this suggests that further analysis of some prepositions may be needed.
Creating new grammatical types
Italian has ‘prepositions followed by articles’ (preposizioni articolate). This is a specific grammatical type, which refers to a word (e.g. della) that replaces a preposition (di) followed by an article (la):
Articles: il, lo, l’, la, i, gli, le
- di: del, dello, dell’, della, dei, degli, delle
- a: al, allo, all’, alla, ai, agli, alle
- da: dal, dallo, dall’, dalla, dai, dagli, dalle
- in: nel, nello, nell’, nella, nei, negli, nelle
- su: sul, sullo, sull’, sulla, sui, sugli, sulle
This specific grammatical type also corresponds to:
- in French: du = de le, des = de les
- in Corsican and especially in the Sartenese variant: ‘llu = di lu, ‘lla = di la, etc.
This raises the general problem of the number of grammatical types we should retain. Should we create new grammatical types beyond the classical ones, in order to optimise translators and NLP in general? What is the best grammatical type to retain for ‘prepositions followed by an article’: a new primitive one or a compound one (always keeping Occam’s razor in mind)? A preposition followed by an article behaves like a preposition for words on its left, and like an article for words on its right.
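One way to picture the ‘compound type’ option is to store each contracted form together with its two components, so that the preposition is available to the words on its left and the article to the words on its right. The sketch below uses the Italian forms listed earlier; the table layout and the names `CONTRACTED`, `DECOMPOSE` and `decompose` are my own illustrative choices.

```python
ARTICLES = ["il", "lo", "l’", "la", "i", "gli", "le"]

# Each row lists the contracted forms in the same order as ARTICLES.
CONTRACTED = {
    "di": ["del", "dello", "dell’", "della", "dei", "degli", "delle"],
    "a": ["al", "allo", "all’", "alla", "ai", "agli", "alle"],
    "da": ["dal", "dallo", "dall’", "dalla", "dai", "dagli", "dalle"],
    "in": ["nel", "nello", "nell’", "nella", "nei", "negli", "nelle"],
    "su": ["sul", "sullo", "sull’", "sulla", "sui", "sugli", "sulle"],
}

# Compound type: contracted form -> (preposition, article).
DECOMPOSE = {
    form: (preposition, article)
    for preposition, forms in CONTRACTED.items()
    for form, article in zip(forms, ARTICLES)
}

def decompose(word):
    """Return (preposition, article) for a contracted form, or None otherwise."""
    return DECOMPOSE.get(word.lower())

print(decompose("della"))
print(decompose("sugli"))
```

Under this representation no new primitive type is needed: the contracted form is simply a pair of two existing types, which is the more economical option in the sense of Occam’s razor.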
Evaluation of the performance after changes
I have just performed a series of open tests, using the (pseudo-random) article of the day from the French Wikipedia. The results are the following, concerning the Taravese version of the Corsican language:
that is to say an average of about 95%, taking into account that the ‘cismuntinca’ version generally obtains a slightly lower result, because of the masculine and feminine plurals which are different (whereas they are identical in Taravese).