The first test of transforming the dictionary (in the extended sense) built on the French-Corsican pair into a dictionary for the Italian-Gallurese pair shows that the approach is feasible. The result, of acceptable but perfectible quality, is obtained in 21 minutes (with 16 GB of RAM and an Intel Core i7-8550U CPU). We start with a multilingual dictionary based on French entries, and the final result is an Italian-Gallurese dictionary.
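As a rough illustration of this kind of transformation, here is a minimal Python sketch that re-keys French-based multilingual entries into an Italian-Gallurese dictionary. The entry layout, the field names (`fr`, `it`, `gallurese`) and the `rekey` function are assumptions made for the example, not the project's actual data format; the real dictionary "in the extended sense" carries far more information.

```python
from typing import Dict, List

# Hypothetical French-keyed multilingual entries (illustrative only).
ENTRIES: List[Dict[str, str]] = [
    {"fr": "acier", "it": "acciaio", "gallurese": "acciagghju"},
    {"fr": "fenouil", "it": "finocchio", "gallurese": "finocchju"},
    {"fr": "beau", "it": "bello", "gallurese": "beddhu"},
]


def rekey(entries: List[Dict[str, str]], source: str, target: str) -> Dict[str, List[str]]:
    """Build a source -> [targets] dictionary from multilingual entries."""
    result: Dict[str, List[str]] = {}
    for entry in entries:
        src, tgt = entry.get(source), entry.get(target)
        # Skip entries that lack either side of the requested pair.
        if src and tgt:
            result.setdefault(src, []).append(tgt)
    return result


italian_gallurese = rekey(ENTRIES, "it", "gallurese")
print(italian_gallurese["finocchio"])  # ['finocchju']
```

A list of targets is kept per source word because several French entries may share the same Italian headword.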
The translator is taking its first steps in translating from French into Gallurese. The first tests show a score of 75-80%, with many errors in grammar, spelling and vocabulary. A score of 90% will have to be reached before the result can be published.
The ideal would have been direct Italian-Gallurese translation, but this is not yet possible: it will be necessary to translate (i) Italian into French, then (ii) French into Gallurese.
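The two-step route can be pictured as a simple composition of translators, with French as the pivot. The sketch below is only illustrative: `make_lookup`, `pivot` and the one-word tables are hypothetical stand-ins for the real translation engines.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def make_lookup(table: Dict[str, str]) -> Translator:
    """Toy translator backed by a word table (stand-in for a real engine)."""
    def translate(word: str) -> str:
        return table[word]
    return translate


def pivot(first: Translator, second: Translator) -> Translator:
    """Chain two translators through an intermediate (pivot) language."""
    def translate(text: str) -> str:
        return second(first(text))
    return translate


# Step (i): Italian into French; step (ii): French into Gallurese.
italian_to_french = make_lookup({"bello": "beau"})
french_to_gallurese = make_lookup({"beau": "beddhu"})
italian_to_gallurese = pivot(italian_to_french, french_to_gallurese)

print(italian_to_gallurese("bello"))  # beddhu
```

One consequence of this design is that errors made in the first step carry over into the second.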
The French-Gallurese translator is currently under development and undergoing testing. An Android application is planned first; it will be called ‘traducidori gaddhuresu’. The translator will only be published if its performance (evaluated by an open test) is above 90%. This is a rule that we apply to ourselves, and it is specific to endangered languages: we consider that, for them, a poor-quality translation can do more harm than good.
For those willing to read some texts in the Gallurese language (‘gaddhuresu’), a sister language of Corsican:
- Tizzoni, a short story by Maria Teresa Inzaina and Antonio Meloni
- Li fiori di l’alburi di la presca, a short story by Maria Teresa Inzaina
- La fura, a short story by Maria Teresa Inzaina
- Tummeantoni (l’ultimu carrulanti), a short story by Maria Teresa Inzaina
- Don Baignu, lu “Catullo” gaddhuresu di lu XVIII sècculu, a text by Quirina Ruiu
There is also some poetry:
- A notti funda by Gianfranco Garrucciu
What are the conditions for a given endangered language to be a candidate for rule-based machine translation? Some requirements must be met. In particular, there is a need for:
- a dictionary: some specialized lexicons are useful too.
- a list of locutions and their translations: more precisely, what is needed are noun, adjective, adverbial and verbal locutions, together with their translations into the other language.
- a detailed grammar (in any language): ideally, the grammar should be very detailed, covering in particular irregular verbs, noun plurals, etc. Subjunctive and conditional forms must also be accurately described.
- in addition, elision rules and euphony rules should also be described.
- most importantly: a description of the main variants of the language and their differences. This is needed to handle what we can call the ‘variant problem’ (we shall say a bit more about this later): as an effect of diversity, endangered languages are often polynomic and come with variants. But a translation must be coherent, and a mix of several variants is not acceptable as a translation.
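To make the ‘variant problem’ concrete, here is a minimal Python sketch of a lexicon in which every translation is tagged with its variant, and in which a lookup refuses to fall back silently on another variant. The layout, the variant labels and the function name `translate_words` are assumptions for illustration, and the few Corsican forms are given only as examples.

```python
from typing import Dict, List

# Hypothetical lexicon: French headword -> {variant: Corsican form}.
LEXICON: Dict[str, Dict[str, str]] = {
    "beau": {"cismuntincu": "bellu", "sartinese": "beddu"},
    "cheval": {"cismuntincu": "cavallu", "sartinese": "cavaddu"},
}


def translate_words(words: List[str], variant: str) -> List[str]:
    """Translate word by word, using one single variant throughout."""
    output = []
    for word in words:
        forms = LEXICON[word]
        if variant not in forms:
            # Refuse to mix variants: fail instead of borrowing another form.
            raise KeyError(f"no {variant} form recorded for {word!r}")
        output.append(forms[variant])
    return output


print(translate_words(["beau", "cheval"], "sartinese"))  # ['beddu', 'cavaddu']
```

Failing loudly on a missing form is a deliberate choice here: it keeps the output coherent and shows exactly where the description of a variant is still incomplete.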
Note that endangered languages are commonly associated with another language, the two being in a diglossic relationship. For example, the Corsican language is associated with French, so we consider the French-Corsican pair, and what is relevant is a French-Corsican translator. For Sardinian Gallurese (‘gaddhuresu’), the relevant pair is Italian-Gallurese. Other relevant pairs are:
Here are some writing differences between Corsican and Sardinian Gallurese that result from historical writing habits. These writing differences prevail even when the words are the same:
- ghj is replaced by gghj: acciaghju (Corsican), acciagghju (Gallurese), steel
- chj is replaced by cchj: finochju (Corsican), finocchju (Gallurese), fennel
- the tonic accent is marked systematically in Gallurese, whereas it is not compulsory in Corsican: apostulu (Corsican), apòstulu (Gallurese), apostle
- cq in Corsican is replaced with cc in Gallurese: acquistu (Corsican), accuistu (Gallurese), purchase
- dd in Corsican taravese or sartinese is replaced with ddh in Gallurese: beddu bedda beddi (Corsican), beddhu beddha beddhi (Gallurese), beautiful
- final è in Corsican is replaced with é in Gallurese: sapè (Corsican), sapé (Gallurese), know
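Most of the spelling differences listed above are regular enough to be written as rewrite rules. The Python sketch below shows one possible encoding using the standard `re` module; the function name `to_gallurese_spelling` and the rule table are hypothetical. Two assumptions are made: ghj and chj are doubled only after a vowel (so that a word-initial ghj or chj is left alone), and the tonic accent is not handled at all, since placing it requires knowing where the stress falls in each word.

```python
import re

# (pattern, replacement) pairs, applied in order. Lookarounds prevent a rule
# from firing again on a form that is already in Gallurese spelling.
RULES = [
    (re.compile(r"(?<=[aeiouàèìòù])ghj"), "gghj"),  # acciaghju -> acciagghju
    (re.compile(r"(?<=[aeiouàèìòù])chj"), "cchj"),  # finochju -> finocchju
    (re.compile(r"cq"), "cc"),                      # acquistu -> accuistu
    (re.compile(r"dd(?!h)"), "ddh"),                # beddu -> beddhu
    (re.compile(r"è\b"), "é"),                      # sapè -> sapé (word-final only)
]


def to_gallurese_spelling(word: str) -> str:
    """Apply the orthographic rewrite rules to a lowercase Corsican word."""
    for pattern, replacement in RULES:
        word = pattern.sub(replacement, word)
    return word


for corsican in ["acciaghju", "finochju", "acquistu", "beddu", "sapè"]:
    print(corsican, "->", to_gallurese_spelling(corsican))
# acciaghju -> acciagghju
# finochju -> finocchju
# acquistu -> accuistu
# beddu -> beddhu
# sapè -> sapé
```

Such rules only cover spelling; words that actually differ between the two languages still have to come from the dictionary.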