Monthly Archives: February 2017

Double adjective accordance: scoring 98.43%

Now scoring 1 – 2/128 = 98.43%. There are only two errors, both related to a special case of adjective accordance: ‘aux xxie et XXe siècles’ (in the 21st and 20th centuries) should translate into: à i XXIu è XXu seculi. There are 3 ambiguous words here:

  • ‘aux’ i.e. ‘à les’ (in the): à i (masculine plural)/à e (feminine plural)
  • ‘xxie’ i.e. ‘vingt-et-unième’ (21st): XXIu (masculine singular)/XXIa (feminine singular)
  • ‘xxe’ i.e. ‘vingtième’ (20th): XXu (masculine singular)/XXa (feminine singular)

Proper accordance should be performed as follows:

  • ‘aux’: à i (masculine plural): depends on ‘siècles’ (centuries), masculine plural
  • ‘xxie’ i.e. vingt-et-unième (21st): XXIu (masculine singular): masculine because of ‘siècles’, singular because it qualifies a single century
  • ‘xxe’ i.e. vingtième (20th): XXu (masculine singular), for the same reason

Of the same type are:

  • ‘les langues italienne et française’: e lingue taliana è francesa (the Italian and French languages). English is ambiguous in this case, since ‘les langues italiennes et françaises’ translates the same way although the meaning is different: it refers explicitly to the several varieties of the Italian and French languages. In French, the ambiguity only arises in speech, since the written sentence is unambiguous. In Corsican, both the written and the spoken sentence are unambiguous.
  • ‘les codes pénal et civil’: i codici penale è civile (the penal and civil codes)
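
The pattern shared by these examples can be sketched as a small rule: the article follows the plural noun, while each coordinated adjective takes the noun's gender but stays singular, because it qualifies only one of the items. This is a minimal illustration, not the translator's actual code; the tables and the function name `translate_coordinated` are hypothetical, and the lexicon contains only the forms quoted above.

```python
# Toy lexicon built from the examples in this post (hypothetical structure).
ARTICLES_PLURAL = {"m": "i", "f": "e"}

# French plural noun -> (Corsican plural noun, gender)
NOUNS = {
    "langues": ("lingue", "f"),
    "codes": ("codici", "m"),
}

# French adjective lemma -> Corsican SINGULAR forms by gender.
# Only the forms attested in the examples are listed.
ADJECTIVES_SINGULAR = {
    "italien": {"f": "taliana"},
    "français": {"f": "francesa"},
    "pénal": {"m": "penale"},
    "civil": {"m": "civile"},
}


def translate_coordinated(noun_fr, adjectives_fr):
    """Plural article and noun, followed by singular adjectives that
    agree in gender with the noun and are joined by 'è' (and)."""
    noun_co, gender = NOUNS[noun_fr]
    article = ARTICLES_PLURAL[gender]
    agreed = [ADJECTIVES_SINGULAR[adj][gender] for adj in adjectives_fr]
    return f"{article} {noun_co} {' è '.join(agreed)}"


print(translate_coordinated("langues", ["italien", "français"]))
print(translate_coordinated("codes", ["pénal", "civil"]))
```

The hard part, which this sketch leaves out, is detecting that the adjectives are singular in the source text in the first place.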

Now should it be considered an instance of a successful Feigenbaum test? Arguably, yes (although this is debatable). These two errors cannot be considered gross errors from a Feigenbaum test perspective. They are the kind of errors a human could make.

But caution: at present, this is only one exceptional successful instance. Call it a Feigenbaum hit. What we are interested in is a regularly successful Feigenbaum test. For the moment, the software is not capable of that. New target: 99% and/or more frequent Feigenbaum hits.

Accordance of past participle

Now scoring 1 – 2/129 = 98.44%.
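
The scores quoted in these posts (98.43%, 98.44%, 98.42%) appear to be truncated to two decimals rather than rounded. A minimal sketch of that computation follows; the posts do not say what the denominator counts, so the parameter name `units` is an assumption, as is the function name `score`.

```python
def score(units: int, errors: int) -> str:
    """Return 1 - errors/units as a percentage truncated to two decimals.

    Integer arithmetic avoids floating-point surprises when truncating.
    """
    hundredths = (units - errors) * 10_000 // units
    return f"{hundredths // 100}.{hundredths % 100:02d}%"


print(score(128, 2))  # 98.43%
print(score(129, 2))  # 98.44%
print(score(127, 2))  # 98.42%
```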

  • The issue of the past participle’s accordance again: ‘une session du parlement tenue à Nuremberg’ (a session of the Parliament held in Nuremberg) should translate into una sessione di u parlamentu tenuta in Nuremberg. The past participle tenuta should agree with sessione (feminine, session) and not with parlamentu (masculine, Parliament). This may call for dependency parsing, but that could be insufficient. Perhaps (harder) semantic disambiguation is required in this case.
  • One false positive: ‘des’, being a German word, should remain untranslated.

Past participle or present simple: the disambiguation of French ‘construit’

In the present case, it should read: custruitu à u seculu XII (built in the 12th century). The error relates to the disambiguation of French ‘construit’. It can translate into:

  • custruitu (built): past participle, masculine, singular
  • custruisce (builds): present simple, third person

MT should (i) find the proper referent of ‘construit’, i.e. ‘clocher’ (church tower), but above all (ii) determine whether ‘construit’ is a past participle or a present simple. Some kind of dependency parser is in order…
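
Short of a full dependency parser, a crude first pass could look at the neighbouring words. The sketch below is only a rough heuristic under stated assumptions: the cue word lists and the function name `translate_construit` are hypothetical, and it returns only the masculine singular participle, so agreement with the referent is not handled.

```python
# Hypothetical cue lists; a real system would need far broader coverage.
AUXILIARIES = {"a", "est", "fut", "été", "avait", "était"}
PREPOSITIONS = {"à", "au", "aux", "en", "par", "sur", "dans", "vers"}
DETERMINERS = {"un", "une", "le", "la", "les", "des", "son", "sa", "ses"}


def translate_construit(tokens, i):
    """Guess the reading of tokens[i] == 'construit' from its neighbours.

    Returns 'custruitu' (past participle), 'custruisce' (present simple),
    or None when neither cue applies.
    """
    prev_word = tokens[i - 1].lower() if i > 0 else ""
    next_word = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
    # After an auxiliary, or before a preposition, assume a participle.
    if prev_word in AUXILIARIES or next_word in PREPOSITIONS:
        return "custruitu"
    # Before a determiner, assume a finite verb taking a direct object.
    if next_word in DETERMINERS:
        return "custruisce"
    return None


print(translate_construit("clocher construit au XIIe siècle".split(), 1))
print(translate_construit("il construit une maison".split(), 1))
```

Cases where neither cue fires are exactly those where parsing becomes necessary.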

The disambiguation of French ‘fils’ again: scoring 98.42%

Scoring 1 – 2/127 = 98.42%. Of interest:

  • ‘de 839 à sa mort’ (from 839 to his death) should read: da u 839 à a so morte. French ‘de’ translates either into di or into da in Corsican (to simplify matters, since in certain cases, being a partitive article, it translates into nothing).
  • now we face again the multi-ambiguous French ‘fils’, which can translate into: i) figliolu, masculine, singular (son) ii) figlioli, masculine, plural (sons) iii) fili, masculine, plural (wires). In the present case, ‘Fils du roi…’ should translate into Figliolu di u rè… (Son of King…).
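
The three-way ambiguity of ‘fils’ can be narrowed with contextual cues. The following is a rough sketch, not the software's method: the determiner and cue lists are invented for illustration, the function name `translate_fils` is hypothetical, and a bare ‘fils’ with no plural determiner (as in ‘Fils du roi…’) is simply assumed to be singular.

```python
# Hypothetical cue lists, for illustration only.
PLURAL_DETERMINERS = {"les", "des", "ses", "ces", "mes", "tes", "leurs"}
WIRE_CUES = {"électriques", "fer", "cuivre", "barbelés"}


def translate_fils(tokens, i):
    """Pick a Corsican translation for tokens[i] == 'fils'.

    Singular 'fils' can only mean 'son' (the singular of 'wire' is 'fil'),
    so the wire reading is considered only after a plural determiner.
    """
    prev_word = tokens[i - 1].lower() if i > 0 else ""
    if prev_word not in PLURAL_DETERMINERS:
        return "figliolu"  # singular, or bare apposition assumed singular
    following = {t.lower() for t in tokens[i + 1:i + 4]}
    return "fili" if following & WIRE_CUES else "figlioli"


print(translate_fils(["Fils", "du", "roi"], 0))
print(translate_fils("les fils électriques".split(), 1))
```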

Of note: five consecutive 100% sentences.

With regard to the Feigenbaum test: failed again. Arguably, the first error is of an acceptable kind, in this context. But the ‘fils’ error is a gross one, which a human would not make…

Can translation help teaching an endangered language?

Can translation help in self-teaching an endangered language? It seems so, if the translation is accurate. Let us check with the verb parlà (to speak). In this case, the translation is 100% accurate, so it can help (but we need to check other verb categories and other tenses). Other verbs of the same group are verbs that end with -à: manghjà (to eat), saltà (to jump), cantà (to sing), etc.

To begin with, conjugations in the present, imperfect and future tenses:

  • je parle (I speak), tu parles (you speak), il/elle parle (he/she speaks),
    nous parlons (we speak), vous parlez (you speak), ils/elles parlent (they speak)
  • je parlais (I was speaking), tu parlais (you were speaking), il/elle parlait (he/she was speaking),
    nous parlions (we were speaking), vous parliez (you were speaking), ils/elles parlaient (they were speaking)
  • je parlerai (I will speak), tu parleras (you will speak), il/elle parlera (he/she will speak), nous parlerons (we will speak), vous parlerez (you will speak), ils/elles parleront (they will speak).

Of interest:

  • French ‘parle’ is ambiguous since it can translate into parlu (I speak) or parla (he/she speaks).
  • French ‘parlais’ is ambiguous since it can translate into parlavu (I was speaking) or parlavi (you were speaking).
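
Both ambiguities disappear once the subject pronoun is taken into account. A minimal sketch of such a lookup, limited to the four Corsican forms quoted above (the table `PARLA_FORMS` and the function `translate_verb` are hypothetical names):

```python
# (French subject pronoun, French verb form) -> Corsican verb form.
# Only the forms mentioned in this post are listed.
PARLA_FORMS = {
    ("je", "parle"): "parlu",
    ("il", "parle"): "parla",
    ("elle", "parle"): "parla",
    ("je", "parlais"): "parlavu",
    ("tu", "parlais"): "parlavi",
}


def translate_verb(pronoun, verb):
    """Resolve an ambiguous French form using its subject pronoun.

    Returns None when the pair is not in the toy table.
    """
    return PARLA_FORMS.get((pronoun.lower(), verb.lower()))


print(translate_verb("je", "parle"))    # parlu
print(translate_verb("elle", "parle"))  # parla
print(translate_verb("tu", "parlais"))  # parlavi
```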

Language self-reference again

Language self-reference is not as uncommon as one might think at first glance. In the above excerpt, we find another instance of the ‘language self-reference’ (or ‘target language shift’) issue. French ‘le surnom d’« Old Reliable » (en français, « le Vieux Fiable »)’ should translate into Corsican as ‘u sopranome d’« Old Reliable » (in corsu, « u Vechju Affidevule »)’; in English: the nickname “Old Reliable”.

Hence, a machine translator should include a feature that properly handles this ‘language self-reference’ issue. To do: implement this target language shift feature.
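
One possible starting point for such a feature, sketched under the assumption that the self-reference always appears as a parenthetical gloss of the form ‘(en français, « … »)’. The function name `shift_self_reference` is hypothetical, and the sketch only swaps the language name; translating the gloss itself is left to the rest of the pipeline.

```python
import re

# Matches "en français" only when it opens a parenthetical gloss.
GLOSS_PATTERN = re.compile(r"\(en français,")


def shift_self_reference(text):
    """Replace the source-language name with the target-language name
    inside glosses such as '(en français, « ... »)'."""
    return GLOSS_PATTERN.sub("(in corsu,", text)


source = "le surnom d’« Old Reliable » (en français, « le Vieux Fiable »)"
print(shift_self_reference(source))
```

Restricting the match to the parenthetical pattern keeps ordinary mentions such as ‘il parle en français’ untouched, since those still refer to French in the target text.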