Tag Archives: translation

Index page project for the online okchakko translator

Today we are launching a project on GitHub to write a better index page for the okchakko translator project.

The current index page is located at the following address.

This index page gives free online access to translation from French into Corsican, a language threatened with extinction.

The current index page has several defects:

  • it is basic and rather crude in its design, set on a plain white background
  • the source text and the destination text should be aligned horizontally (as in Google Translate, DeepL, etc.) and not vertically

The index page index.php will be published under the MIT license.

Your contributions are welcome. You can help this project by proposing a better index page than the current one of the okchakko project (a priori in PHP).

Translation from Italian to Gallurese

Our new project will be to try to implement translation from Italian into Gallurese. This is an essential pair for the Gallurese language, and therefore a priority. There are two major difficulties:

  • on the one hand, to (automatically) transform the dictionary (in the extended sense) based on the French-Corsican pair into a dictionary for the Italian-Gallurese pair
  • on the other hand, to adapt the other modules automatically (without having to rewrite them entirely), in particular the one based on grammatical disambiguation

The stakes here seem high. It is a question of transforming a system that can translate one pair of languages (i.e. French into Corsican) into an ecosystem that can translate several pairs of languages, each with an endangered language as its target.

Grammatical categories by position again: the case of adverbs and modulators placed before a modulator

Let us try to delve more deeply into the case of adverbs. We shall continue now to define them by their position in relation to other grammatical categories. The result is that adverbs are divided into several different categories. Now let’s look at the adverbs that may be placed before an adjective modulator. To begin with, let us cite but a few adjective modulators:

  • peu, très, extrêmement, surtout, étonnamment, à peine, vraiment, assez, bien, trop, tellement, etc.
  • pocu, assai, estremamente, sopratuttu, in modu stunante, appena, propriu/propria/proprii/proprie, abbastanza, bellu/bella/belli/belle, troppu/troppa/troppi/troppe, tantu/tanta/tanti/tante, etc.
  • not very, very, extremely, especially, surprisingly, hardly, really, enough, all/very, too, so, etc.

Now some modulators of adjective modulators are:

  • pas, peut-être, surtout, vraiment, etc.
  • micca, forse, sopratuttu, veramente, è cetera.
  • not, maybe, mostly, really, etc.

Here are some relevant examples: “il était surtout trop blanc” (he was mostly too white; era sopratuttu troppu biancu); “il était vraiment très beau” (he was really very beautiful; era propriu bellissimu); “il était bien trop grand” (he was far too tall; era bellu troppu maiore).

Let’s call this category modulators of adjective modulators. The fact that such a modulator is placed before the adjective modulator reflects the fact that it modifies the meaning of the adjective modulator.

Hence, if we reason in terms of two-sided grammar, an adjective modulator preceded by a modulator remains an adjective modulator: MOD-MODAQ = MODAQ.
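The reduction rule above can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the translator's actual code: the tag names MOD, MODAQ and ADJ, the RULES table and the function reduce_tags are all hypothetical.

```python
# Hypothetical tags: MOD (modulator), MODAQ (adjective modulator), ADJ (adjective).
# The single rule encodes MOD-MODAQ = MODAQ from the text.
RULES = {("MOD", "MODAQ"): "MODAQ"}


def reduce_tags(tags):
    """Repeatedly collapse adjacent tag pairs that match a rule."""
    result = list(tags)
    changed = True
    while changed:
        changed = False
        for i in range(len(result) - 1):
            pair = (result[i], result[i + 1])
            if pair in RULES:
                # Replace the two tags with the single resulting tag.
                result[i:i + 2] = [RULES[pair]]
                changed = True
                break
    return result


# "vraiment très beau" tagged as MOD MODAQ ADJ
print(reduce_tags(["MOD", "MODAQ", "ADJ"]))  # ['MODAQ', 'ADJ']
```

Because the result of the rule is itself an adjective modulator, the rule can apply again, so a chain of modulators before an adjective modulator still reduces to a single MODAQ.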

To sum up, so far we have distinguished several categories within the classical class of adverbs:

  • modulators of adjectives
  • modulators preceding verbs: verb pre-modulators
  • modulators following verbs: verb post-modulators
  • modulators preceding cardinal determiners
  • modulators preceding adjective modulators

More on polymorphic disambiguation…

Let’s take another look at polymorphic disambiguation. We shall consider the French word sequence ‘nombre de’. Its translation into Corsican (the same goes for English and other languages) cannot always be the same, because ‘nombre de’ can be translated in two different ways. In the sequence ‘mais nombre de poissons sont longs’ (but many fish are long), ‘nombre de’ is an indefinite determiner: it translates as bon parechji (many). On the other hand, in the sequence ‘mais le nombre de poissons est supérieur à dix’ (but the number of fish is greater than ten), ‘nombre’ is a common noun followed by the preposition ‘de’: it is translated by numaru di (number of). Statistical MT usually does better than human-like (rule-based) MT at polymorphic disambiguation (I tested both sentences with DeepL and Google Translate, and both successfully resolve this polymorphic disambiguation), but it turns out that human-like (rule-based) MT is also capable of handling it.
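To make the idea concrete, here is a minimal sketch of how a rule could separate the two readings of ‘nombre de’. The rule used (a preceding determiner signals the noun reading) is a simplifying assumption chosen for illustration, and the DETERMINERS set and the function translate_nombre_de are hypothetical; the actual translator may well proceed differently.

```python
# Assumed, simplified rule: if 'nombre' is preceded by a determiner, it is a
# common noun followed by 'de'; otherwise 'nombre de' is an indefinite determiner.
DETERMINERS = {"le", "un", "ce", "du", "au"}  # hypothetical, incomplete list


def translate_nombre_de(tokens, i):
    """Return a Corsican rendering of 'nombre de' where tokens[i] == 'nombre'."""
    previous = tokens[i - 1].lower() if i > 0 else ""
    if previous in DETERMINERS:
        return "numaru di"      # noun + preposition reading
    return "bon parechji"       # indefinite determiner reading


s1 = "mais nombre de poissons sont longs".split()
s2 = "mais le nombre de poissons est supérieur à dix".split()
print(translate_nombre_de(s1, s1.index("nombre")))  # bon parechji
print(translate_nombre_de(s2, s2.index("nombre")))  # numaru di
```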

Performing our first open test of the year

Let us comment on the remaining errors encountered in the above open test:

  • French ‘carrière’ remains undisambiguated: either carriera (career) or cava (quarry): two occurrences
  • ‘de’: French ‘de’ is perhaps the most difficult word to translate into another language, due to its general polymorphism
  • ‘national-socialiste’: missing vocabulary
  • l’ within ‘l’empeche’: pronoun error
  • it should be pointed out that ‘Etats-Unis’ remains untranslated because it is misspelled, with an initial E instead of É

The result is 1 - (5/169) = 97.04%. Note that the ambiguous French word ‘partie’ (‘durant la première partie’, during the first part) is correctly disambiguated as parti (part) rather than partita (game, match).
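The score above is a simple word-level accuracy: one minus the ratio of erroneous words to total words. A short sketch of the arithmetic, with a hypothetical function name, assuming 5 errors out of 169 words as stated:

```python
def accuracy(errors, total_words):
    """Word-level accuracy: share of words translated without error."""
    return 1 - errors / total_words


print(f"{accuracy(5, 169):.2%}")  # 97.04%
```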

It seems that an average result of 95% is currently being consolidated, and that an average result of 96% is a target that should be achievable within a year.

More on the remaining 1% problem

The analysis of the Wikipedia article of the day in French is interesting, in that it sheds light on the skills a machine translation system will need in order to achieve a 100% accurate translation. The error that appears here is characteristic and probably belongs to the missing 1% that separates us from 100% accuracy (the problem of the remaining 1%). The French phrase corresponding to ‘Her father studied at the University of Oregon and then at Yale Law School’ contains a definite article with elision: l’. The translation given (u/a, i.e. undetermined between the masculine definite article u and the feminine definite article a) is incorrect in that it fails to determine the gender, masculine or feminine, of Yale Law School, the English name of a school. In order to provide the correct translation, it is necessary to know how to translate Yale Law School into Corsican, and thus to determine that school is translated by scola, which is feminine. Therefore the correct translation should have been: po à a Yale Law School prima di ….
This finally shows that a translator capable of 100% performance must have two abilities: (i) to identify the language of text parts written in a language other than the source language, and (ii) to translate those text parts from any such language into the target language.
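As a toy illustration of step (ii), the sketch below picks the Corsican definite article for a foreign proper name from its head noun. It assumes the name has already been identified as English, and it relies on a tiny hypothetical lexicon (HEAD_NOUNS) containing only the school/scola pair mentioned above; the function name is also hypothetical.

```python
# Hypothetical toy lexicon: English head noun -> (Corsican noun, gender).
HEAD_NOUNS = {"school": ("scola", "f")}
ARTICLES = {"m": "u", "f": "a"}


def article_for_name(name):
    """Choose the Corsican definite article for an English proper name.

    Scans the name from its last word backwards for a known head noun.
    Returns 'u/a' when the gender cannot be determined.
    """
    for word in reversed(name.lower().split()):
        if word in HEAD_NOUNS:
            _, gender = HEAD_NOUNS[word]
            return ARTICLES[gender]
    return "u/a"


print(article_for_name("Yale Law School"))  # a
print(article_for_name("Oregon"))           # u/a
```

A real system would need a lexicon for every possible embedded language, which is what makes this part of the remaining 1% hard.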

At present, we can only conjecture that the ability to solve the remaining 1% requires artificial general intelligence (AGI). Providing concrete and detailed examples may help to confirm or disprove that hypothesis.

New version on Android Playstore

The application has changed its name on the Android Playstore and is now called “Traduttore corsu”: the name is not very original, let’s face it, but at least it is easy to understand. “Traduttore corsu” is dedicated specifically to translation from French into Corsican. So for the moment we are leaving aside the beautiful word “okchakko”, from the language of the Choctaw Indians.

To find the application Traduttore corsu on Google Play, you have to search for “traduttore_corsu”, because of a known “bug” in Google Play: a search for “corsu” or “traduttore” does not find the application.