Tag Archives: Corsican language

Further reflexions on the status of “I love you” in Corsican language

Let us briefly recall the problem: translating ‘I love you’ might sound trivial, but it is not. In fact, ‘ti amu’ is not the best translation. The best translation is ‘ti tengu caru’ when addressed to a male person, or ‘ti tengu cara’ when addressed to a female person. Hence the proposed preliminary translation ‘ti tengu caru/cara’. Such a rough translation requires further disambiguation, but on what precise grounds?

Let us look at the issue from an analytical perspective. We need to assign a referent to the pronoun ‘te’ (you, ti), which can be identified from the context, depending on whether the person ‘te’ refers to is male or female. At this stage, it seems best to consider that the personal object pronoun has an inherent gender: masculine or feminine. This gender does not affect the pronoun itself, which remains ‘te’ (you, ti) regardless of gender, but it does affect the words that depend on it, i.e. the adjective caru/cara in the Corsican locution ti tengu caru/cara. The upshot is that, in this case, ‘te’ (you, ti) is a personal object pronoun, masculine or feminine, whose inherent ambiguity can be resolved from the context.
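The agreement described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name and the 'm'/'f'/None encoding of the addressee's gender are assumptions made for the example, not part of any existing translator.

```python
from typing import Optional


def translate_i_love_you(addressee_gender: Optional[str]) -> str:
    """Translate 'I love you' into Corsican, agreeing with the addressee.

    addressee_gender is 'm', 'f', or None when the context gives no clue
    (hypothetical encoding chosen for this sketch).
    """
    if addressee_gender == "m":
        return "ti tengu caru"
    if addressee_gender == "f":
        return "ti tengu cara"
    # Gender unresolved: fall back to the preliminary, ambiguous rendering.
    return "ti tengu caru/cara"
```

The pronoun ti never changes; only the adjective that depends on it does, which is why the gender has to be carried as a feature of the pronoun.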

A specific kind of superlative

Let us consider a specific kind of superlative. This form, specific to the Corsican language, is notably mentioned by the grammarian and author Santu Casta in his Punteghju, where he recommends the following translation of “C’était le village le plus riche du canton” (It was the richest village of the canton): Era u più paese riccu di stu cantone (pages 26 & 54-55). The structure is original in the sense that the comparative (più) precedes the noun (paese, village), which in turn precedes the adjective (riccu, rich).
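As a minimal sketch of this word order, the following Python function assembles the superlative from its parts. The function name and its parameters are hypothetical, and the sketch assumes the article, noun and adjective are already translated and agree with one another.

```python
def corsican_superlative(article: str, noun: str, adjective: str) -> str:
    """Build the superlative in the order: article + più + noun + adjective."""
    return f"{article} più {noun} {adjective}"


# The example sentence from the text, rebuilt around the superlative.
sentence = f"Era {corsican_superlative('u', 'paese', 'riccu')} di stu cantone"
```

Compared with the French order (article, noun, article, plus, adjective), the comparative marker moves in front of the noun and the second article disappears.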

Anagrams in the Corsican language

Here are some anagrams in the Corsican language:

  • Corscia è Corsica
  • Marta è Matra
  • accanitu è uccitana
  • acciliratu, ricciulata è riciculata
  • accirtà è traccià
  • accirtatu, catarticu è tracciatu
  • adriatica è cadariati
  • anacrunisimu è cunsumariani
  • aprarà è pararà
  • arba è bara
  • attaccu è tuccata
  • aumintà è umanità
  • armunizà è rumanizà
  • basatu è sabatu
  • battariu è urbitata
  • calculatu è cullucata
  • cadastrali è riscaldata
  • camaratu è racamatu
  • ciandarmu è ricumanda
  • candidatu è incuddata
  • chjinà è nichjà
  • dicimà è midicà
  • fascià è fiascà
  • qualificativa è qualificavati
  • marchjariani è richjamarani
  • participativa è participavati
  • lavarà è valarà
  • cunsidarà è sicundarà
  • cuntrà è truncà
  • carrià è criarà
  • carità è citarà
  • ilarità è rialità
  • limità è milità
  • neru è renu
  • rinvià è vinirà
  • rinviarà, rivinarà è vinirarà
  • muralità, mutilarà è ultimarà
  • pisà è spià
  • pricisà è ripiscà
  • pristà è stirpà
  • ramintà è tarminà
  • ricciulà è riciculà
  • sacramentu è stancaremu
  • staccariu è sucratica
  • svià è visà
  • pristarà, stirparà è straripà
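Such pairs can be checked mechanically. The sketch below, in Python, compares sorted letters; it assumes that case is ignored and that an accented letter such as à counts as a letter distinct from a, which is consistent with the entries above. The function names are illustrative.

```python
import unicodedata


def letter_signature(word: str) -> tuple:
    """Return the sorted letters of a word, ignoring case.

    The word is normalised to NFC so that 'à' is a single character;
    accented letters are kept distinct from unaccented ones (assumption).
    """
    normalized = unicodedata.normalize("NFC", word).casefold()
    return tuple(sorted(normalized))


def are_anagrams(*words: str) -> bool:
    """True when all the given words share the same letter signature."""
    return len({letter_signature(w) for w in words}) == 1
```

For instance, are_anagrams("aumintà", "umanità") holds, and the same call accepts three words at once, as in are_anagrams("muralità", "mutilarà", "ultimarà").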

Rough typology of remaining errors (updated March 2018)

French to Corsican: performance on the French Wikipedia sample test currently averages 94%. Below is a rough typology of the remaining errors (an average score of 95% on the open test should presumably be attainable by correcting the errors tagged ‘easy’):

  • unknown vocabulary: 40% (easy)
  • basic disambiguation: 25% (easy or medium difficulty)
  • false positives: 5% (medium difficulty or hard). This type of error is mostly related to proper nouns, i.e. English terms that should remain untranslated. For example, ‘North American Aviation’ is erroneously translated as ‘North American Aviazione’. In this case, ‘Aviation’ should remain untranslated.
  • inadequate locution: 10% (medium difficulty or hard)
  • anaphora resolution related to complex sentence’s structure: 5% (hard)
  • semantic disambiguation: 5% (hard). For example, disambiguating French ‘échecs’ = fiaschi/scacchi (failures/chess)
  • erroneous agreement related to gender mismatch from French to Corsican, i.e. (i) words that are masculine in French and feminine in Corsican; and (ii) words that are feminine in French and masculine in Corsican: 1% (medium difficulty)
  • erroneous agreement related to number mismatch from French to Corsican, i.e. (i) words that are singular in French and plural in Corsican; and (ii) words that are plural in French and singular in Corsican (for example, French ‘la canicule’ translates into ‘i sulleoni’ in Corsican): 1% (medium difficulty)
  • specific grammatical case: 2% (hard)
  • anaphora resolution associated with gender or number mismatch: 1% (hard)
  • unknown, unclassified: 6% (hard)
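The 95% target can be checked with a short calculation. The Python sketch below assumes that the percentages above are shares of the remaining errors (6 points out of 100), that only ‘unknown vocabulary’ counts as purely ‘easy’, and that fixing those errors introduces no new ones; the function name is illustrative.

```python
def projected_score(accuracy: float, fixed_share: float) -> float:
    """Score reached if a given share (in %) of the remaining errors is fixed."""
    error_rate = 100.0 - accuracy          # points currently lost
    gain = error_rate * fixed_share / 100.0
    return accuracy + gain


# 94% today; 'unknown vocabulary' is 40% of the remaining errors.
score = projected_score(94.0, 40.0)
print(f"projected score: {score:.1f}%")
```

Under these assumptions the easy vocabulary errors alone account for about 2.4 points, so the 95% figure looks like a cautious estimate.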

A Special Case of Anaphora Resolution

[Figure: translation after improper anaphora resolution]

Anaphora resolution usually concerns pronouns. Here, however, we face a special case of anaphora resolution that relates to an adjective. The sentence ‘un vase de Chine authentique’ (an authentic vase from China) is erroneously translated as un vasu di China autentica, because the adjective is attached to the wrong noun. In this example, the adjective ‘authentique’ refers to ‘vase’ and not to ‘Chine’ (China), so it should agree with the masculine vasu.

The same goes for ‘une chanson du Portugal mythique’, where ‘mythique’ refers to ‘chanson’ and not to ‘Portugal’.
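The difference between the two attachments can be sketched in Python as follows. The tiny lexicon and the function name are assumptions made for the example, and the masculine form autenticu is inferred from the feminine autentica quoted above.

```python
# Toy lexicon (hypothetical): gender of each Corsican noun,
# and the masculine/feminine forms of the adjective.
NOUN_GENDER = {"vasu": "m", "China": "f"}
ADJECTIVE_FORMS = {"authentique": {"m": "autenticu", "f": "autentica"}}


def translate_noun_di_noun_adj(head: str, complement: str,
                               french_adjective: str,
                               attach_to_head: bool = True) -> str:
    """Render 'un <head> di <complement> <adjective>' with gender agreement.

    The adjective agrees with the head noun when attach_to_head is True,
    and with the nearest noun (the complement) otherwise.
    """
    controller = head if attach_to_head else complement
    adjective = ADJECTIVE_FORMS[french_adjective][NOUN_GENDER[controller]]
    return f"un {head} di {complement} {adjective}"
```

Attaching to the nearest noun reproduces the erroneous output, while attaching to the head noun gives the expected agreement. Deciding which attachment is right still requires semantic knowledge, which this sketch does not model.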

[Figure: translation after appropriate anaphora resolution]

Solving fivefold ambiguity: translation for French ‘poste’

The French word ‘poste’ is (at least) fivefold ambiguous, since it can designate:

  • ‘poste’ (masculine singular noun): postu, masculine singular noun (set, i.e. television set)
  • ‘poste’ (masculine singular noun): posta, feminine singular noun (position); in the present case it is erroneously translated as postu, and it should read a so posta
  • ‘poste’ (feminine singular noun): posta, feminine singular noun (post office)
  • ‘poste’: impostu (from the verb impustà, ‘poster’, to station oneself, in the first person singular)
  • ‘poste’: imposta (from the verb impustà, ‘poster’, to station oneself, in the third person singular)

(However, it is more complex than that, since there is another sense of the verb ‘poster’: to post, to mail.)
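The five readings can be laid out as a small lookup table. The Python sketch below is only illustrative: the feature labels ('noun', 'm', '1sg', and so on) and the function name are assumptions, and the additional ‘to mail’ sense of the verb is left out.

```python
# (part of speech, French gender or person, sense) -> Corsican form
POSTE_READINGS = {
    ("noun", "m", "set"): "postu",
    ("noun", "m", "position"): "posta",
    ("noun", "f", "post office"): "posta",
    ("verb", "1sg", "station"): "impostu",
    ("verb", "3sg", "station"): "imposta",
}


def poste_candidates(pos=None, feature=None, sense=None):
    """Return the Corsican forms compatible with whatever is already known."""
    forms = set()
    for (r_pos, r_feature, r_sense), form in POSTE_READINGS.items():
        if pos is not None and r_pos != pos:
            continue
        if feature is not None and r_feature != feature:
            continue
        if sense is not None and r_sense != sense:
            continue
        forms.add(form)
    return sorted(forms)
```

Knowing only that ‘poste’ is a masculine noun still leaves two candidates (posta and postu), which is exactly the case where a semantic clue such as ‘position’ is needed.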

Chemistry: translating acid names


Translating this series of acid names is not as easy as it might seem at first glance. Each acid name is composed of three consecutive ambiguous words:

  • ‘l’ is ambiguous between the masculine (u, the) and the feminine (a, the) definite article
  • ‘acide’ is ambiguous between acidu (acid, masculine singular noun), acitu (acid, masculine singular adjective) and acita (acid, feminine singular adjective)
  • ‘daturique’, etc. are all ambiguous, since they can be either masculine singular (daturicu, daturic) or feminine singular (daturica, daturic) adjectives.
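One way to picture the resolution is to enumerate every combination and keep only those that form an article + noun + adjective sequence with a single gender. The Python sketch below does this under stated assumptions: the tuples and the function name are illustrative, acita is treated as a feminine adjective, and elision of the article before a vowel is not handled.

```python
from itertools import product

# (Corsican form, part of speech, gender) for each French word.
ARTICLE = [("u", "det", "m"), ("a", "det", "f")]
ACIDE = [("acidu", "noun", "m"), ("acitu", "adj", "m"), ("acita", "adj", "f")]
DATURIQUE = [("daturicu", "adj", "m"), ("daturica", "adj", "f")]


def consistent_readings():
    """Keep the readings of shape article + noun + adjective with one gender."""
    readings = []
    for article, middle, adjective in product(ARTICLE, ACIDE, DATURIQUE):
        middle_is_noun = middle[1] == "noun"
        same_gender = article[2] == middle[2] == adjective[2]
        if middle_is_noun and same_gender:
            readings.append((article[0], middle[0], adjective[0]))
    return readings
```

Out of the twelve combinations, only one survives: the masculine article, the noun acidu and the masculine adjective daturicu.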

Interesting case of first name disambiguation

Here is an interesting case of first name disambiguation for machine translation. Consider the first name ‘Camille’, which can apply to both genders. In Corsican (taravese or sartinese variants) it translates into either Cameddu (masculine) or Camedda (feminine). In some cases, the disambiguation relies on purely grammatical grounds. For example, ‘Camille était beau’ translates into Cameddu era beddu (Camille was handsome) on grammatical grounds alone. The same goes for ‘Camille était belle’, which translates straightforwardly into Camedda era bedda (Camille was beautiful), according to the gender of the adjective.

Now the disambiguation can also turn out to be a hard case that relies only on semantic context. Hence, ‘Camille était pacifique’ can translate into either Cameddu era pacificu or Camedda era pacifica, depending on the context (which can be text or even an image…). It cannot be translated on grammatical grounds alone, since ‘pacifique’ (peaceful) is gender-ambiguous: it can translate into either pacificu or pacifica.

The same goes for the French first name ‘Dominique’ (Dominic), which translates into either ‘Dumenicu’ (masculine) or ‘Dumenica’ (feminine). Hence, ‘Dominique était pacifique’ (Dominic was peaceful) can translate into either ‘Dumenicu era pacificu’ or ‘Dumenica era pacifica’, depending on the context.
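The easy and hard cases above can be contrasted in a short Python sketch. The dictionaries and the function name are assumptions made for illustration; the optional context_gender argument stands for whatever the wider context (text or image) may supply.

```python
NAMES = {
    "Camille": {"m": "Cameddu", "f": "Camedda"},
    "Dominique": {"m": "Dumenicu", "f": "Dumenica"},
}
# French adjective -> Corsican form for each gender the French form allows.
ADJECTIVES = {
    "beau": {"m": "beddu"},
    "belle": {"f": "bedda"},
    "pacifique": {"m": "pacificu", "f": "pacifica"},
}


def translate_name_was_adj(name, adjective, context_gender=None):
    """Return every translation of '<name> était <adjective>' still possible."""
    forms = ADJECTIVES[adjective]
    genders = [g for g in ("m", "f") if g in forms]
    if context_gender in genders:
        genders = [context_gender]
    return [f"{NAMES[name][g]} era {forms[g]}" for g in genders]
```

With ‘beau’ or ‘belle’ the list has a single element, so grammar alone decides. With ‘pacifique’ two candidates remain until a context gender is provided.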

Writing differences between Corsican and Gallurese

Here are some writing differences between Corsican and Sardinian Gallurese that result from historical writing habits. These differences hold even when the words are the same:

  • ghj is replaced by gghj: acciaghju (corsu), acciagghju (gallurese), steel
  • chj is replaced by cchj: finochju (corsu), finocchju (gallurese), fennel
  • the tonic accent is marked systematically in Gallurese, whereas it is not compulsory in Corsican: apostulu (corsu), apòstulu (gallurese), apostle
  • cc is preferred in Gallurese where Corsican has cq: acquistu (corsu), accuistu (gallurese), purchase
  • dd in Corsican taravese or sartinese is replaced with ddh in Gallurese: beddu bedda beddi (corsu), beddhu beddha beddhi (gallurese), beautiful
  • final è in Corsican is replaced with é in Gallurese: sapè (corsu), sapé (gallurese), to know
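Most of these correspondences are regular enough to be sketched as rewrite rules. The Python example below is a rough illustration, not a complete converter: the rule list and function name are assumptions, it only applies to words that are otherwise identical in both varieties, and it leaves out the tonic accent, which would require a lexicon.

```python
import re

# (pattern, replacement) pairs, applied in order to a Corsican spelling.
SPELLING_RULES = [
    (r"(?<!g)ghj", "gghj"),  # acciaghju -> acciagghju
    (r"(?<!c)chj", "cchj"),  # finochju  -> finocchju
    (r"cq", "cc"),           # acquistu  -> accuistu
    (r"dd(?!h)", "ddh"),     # beddu     -> beddhu
    (r"è$", "é"),            # sapè      -> sapé
]


def to_gallurese_spelling(word: str) -> str:
    """Apply the spelling correspondences listed above to a single word."""
    for pattern, replacement in SPELLING_RULES:
        word = re.sub(pattern, replacement, word)
    return word
```

The lookarounds keep a word that is already written in the Gallurese way unchanged, so applying the function twice gives the same result as applying it once.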