There are sometimes false positives. Some words should remain untranslated, notably proper names. Interestingly, the false positive here is due to the fact that the English word ‘transport’ is spelled the same in French: transport (fr) = transport (en) = trasportu (co).
Testing #machine translation: now facing a new elision problem:
Riventosa (fr) = A Riventosa (co)
proper noun (fr) = definite article + proper noun (co)
it should read: in u paese di A Riventosa (without elision)
Elision rules are not trivial:
le village d’Arbellara (fr) = the village of Arbellara, should be translated as:
u paese d’Arbellara (with elision)
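The two cases above can be sketched as a small rule. This is only a minimal sketch: the `TOPONYMS` table and the `of_place` function are hypothetical names, and the rule "no elision when the Corsican name carries an article" is generalized from the single Riventosa example, so it may well need refining.

```python
APOSTROPHE = "\u2019"          # typographic apostrophe, as used in the notes
VOWELS = set("aeiouàèìòù")

# Hypothetical table: French toponym -> (Corsican article or None, name).
# Only the two examples from the notes are listed.
TOPONYMS = {
    "Riventosa": ("A", "Riventosa"),
    "Arbellara": (None, "Arbellara"),
}

def of_place(french_name):
    """Render French 'de/d' + place name' in Corsican."""
    article, name = TOPONYMS.get(french_name, (None, french_name))
    if article is not None:
        # Assumption: a toponym with its own article blocks elision.
        return f"di {article} {name}"
    if name[0].lower() in VOWELS:
        return f"d{APOSTROPHE}{name}"   # elision before a vowel
    return f"di {name}"

print("u paese", of_place("Riventosa"))
print("u paese", of_place("Arbellara"))
```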
Rule-based translation: adjective agreement, interesting stuff:
sur les réseaux japonais et américain (fr) = annantu à e rete sgiappunesa è americana (co) = on the Japanese and American networks (en)
the noun (networks) is plural but the adjectives (Japanese and American) are singular
The excerpt refers to the (single) Japanese network and the (single) American network
I guess (to be confirmed) that the following phrase is ambiguous in English:
‘the Japanese and American networks’: are there one or several Japanese network(s)?
are there one or several American network(s)?
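One possible way to encode this agreement rule works at the level of grammatical features, without generating any Corsican forms. The function name and the representation of each French adjective as the set of numbers its surface form allows are assumptions made for illustration (‘japonais’ is compatible with singular and plural, ‘américain’ only with singular).

```python
def coordinated_adjective_number(noun_number, adjective_numbers):
    """Choose the number used for coordinated adjectives in the target.

    noun_number: "sg" or "pl".
    adjective_numbers: one set per French adjective, holding the numbers
    its surface form is compatible with.
    """
    explicitly_singular = any(nums == {"sg"} for nums in adjective_numbers)
    if noun_number == "pl" and explicitly_singular:
        # Distributive reading: one network per adjective.
        return "sg"
    # Otherwise the adjectives simply agree with the noun.
    return noun_number

# 'les réseaux japonais et américain'
print(coordinated_adjective_number("pl", [{"sg", "pl"}, {"sg"}]))
```

With ‘américains’ in the plural instead, the same call would return "pl", which corresponds to the reading with several networks.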
Now handling gender reversal:
– mer (FR, feminine) = sea (EN) = mare (CO, masculine)
– saveur (FR, feminine) = flavor (EN) = sapore (CO, masculine)
– liqueur (FR, feminine) = liquor (EN) = licore (CO, masculine)
‘c’est une bonne liqueur’ (FR) = ‘it is a good liquor’ (EN) = hè un bonu licore (CO) requires gender reversal of: indefinite article ‘un’ (instead of ‘una’) + adjective bonu (instead of ‘bona’)
‘la mer est belle’ (FR) = ‘the sea is beautiful’ (EN) = u mare hè bellu (CO) requires gender reversal of: definite article ‘u’ (instead of ‘a’) + adjective bellu (instead of ‘bella’)
u mare hè bellu also requires an initial capital and should be written: U mare hè bellu.
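Gender reversal can be sketched by letting the target noun, not the source noun, drive agreement. The tables below contain only the forms quoted above. The table layout and function names are hypothetical, and the two sentence patterns are hard-coded for the sake of the example.

```python
# Corsican noun and its gender, keyed by the French noun.
NOUNS = {
    "mer": ("mare", "m"),
    "saveur": ("sapore", "m"),
    "liqueur": ("licore", "m"),
}
DEFINITE = {"m": "u", "f": "a"}
INDEFINITE = {"m": "un", "f": "una"}
# Corsican adjective forms by gender, keyed by the French feminine form.
ADJECTIVES = {
    "belle": {"m": "bellu", "f": "bella"},
    "bonne": {"m": "bonu", "f": "bona"},
}

def capitalize_first(sentence):
    """Uppercase only the first character of the sentence."""
    return sentence[:1].upper() + sentence[1:]

def definite_sentence(french_noun, french_adj):
    """Pattern 'la X est ADJ': agreement follows the Corsican gender."""
    noun, gender = NOUNS[french_noun]
    adj = ADJECTIVES[french_adj][gender]
    return capitalize_first(f"{DEFINITE[gender]} {noun} hè {adj}")

def indefinite_sentence(french_noun, french_adj):
    """Pattern 'c'est une ADJ X'."""
    noun, gender = NOUNS[french_noun]
    adj = ADJECTIVES[french_adj][gender]
    return capitalize_first(f"hè {INDEFINITE[gender]} {adj} {noun}")

print(definite_sentence("mer", "belle"))       # U mare hè bellu
print(indefinite_sentence("liqueur", "bonne")) # Hè un bonu licore
```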
Introducing new feature for #MachineTranslation:
some verbal locutions:
prendre d’assaut = assaltà
mettre à sac = sacchighjà
prendre au collet = incappià
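A locution table like this one could be applied with a greedy longest-match pass over the tokens. This is a sketch under simplifying assumptions: whitespace tokenization, infinitive forms only (conjugated verbs are not handled), and hypothetical names `LOCUTIONS` and `replace_locutions`.

```python
# Multi-word French locutions and their one-word Corsican equivalents.
LOCUTIONS = {
    "prendre d\u2019assaut": "assaltà",
    "mettre à sac": "sacchighjà",
    "prendre au collet": "incappià",
}
TABLE = {tuple(key.split()): value for key, value in LOCUTIONS.items()}
MAX_LEN = max(len(key) for key in TABLE)

def replace_locutions(tokens):
    """Replace known locutions, trying the longest match first."""
    out = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in TABLE:
                out.append(TABLE[key])
                i += n
                break
        else:
            # No locution starts here: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out

print(replace_locutions("mettre à sac la ville".split()))
```

Tokens that are not part of a locution are left untouched, so this pass would run before word-by-word translation.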
Now considering the issue of semantic disambiguation.
Some instances for French to Corsican are:
– ‘défense’ = sanna/difesa = tusk/defense
– ‘vol’ = bulu/furtu = flight/theft
– ‘comprend’ = capisce/cumprende = understands/comprises
– ‘palais’ = palatu/palazzu = palate/palace
– ‘expérience’ = sperienza/sperimentu = experience/experiment
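A very simple way to approach such pairs is to look for cue words in the surrounding context. The sketch below is illustrative only: the cue words and the choice of default sense are assumptions, not rules taken from an actual translator, and only three of the pairs above are covered.

```python
# For each ambiguous French word: a default sense and context cues that
# select the other sense. Cue lists are hypothetical.
SENSES = {
    "défense": {"default": "difesa", "cues": {"éléphant": "sanna", "ivoire": "sanna"}},
    "vol": {"default": "bulu", "cues": {"bijoux": "furtu", "banque": "furtu"}},
    "palais": {"default": "palazzu", "cues": {"bouche": "palatu", "langue": "palatu"}},
}

def disambiguate(word, context_tokens):
    """Pick a Corsican sense for `word` from cue words in the context."""
    entry = SENSES[word]
    for token in context_tokens:
        if token in entry["cues"]:
            return entry["cues"][token]
    return entry["default"]

print(disambiguate("défense", ["une", "défense", "d", "éléphant"]))  # sanna
print(disambiguate("défense", ["la", "défense", "du", "territoire"]))  # difesa
```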
Now scoring 1 – 6/134 = 95.52%.
Missing vocabulary: ‘passacaile’. It should read: ‘versu u 1678’, ‘di a so epica’, ‘di u so tempu’.
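The score above is simply one minus the error rate. A minimal check of the arithmetic, assuming 6 is the number of errors and 134 the number of scored units (presumably words) in the test text:

```python
def accuracy(errors, total_units):
    """Share of units translated correctly."""
    return 1 - errors / total_units

score = accuracy(6, 134)
print(f"{score:.2%}")  # 95.52%
```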
Let us mention the issue of threefold ambiguity: French ‘nouvelle’ can translate into:
‘nutizia’ (‘piece of news’)
or ‘nuvella’ (‘short story’)
or ‘nova’ (‘new’)
The disambiguation between ‘nutizia’ (‘piece of news’)
and ‘nuvella’ (‘short story’) is semantic (hard),
while the disambiguation between ‘nutizia’/‘nuvella’ (noun)
and ‘nova’ (adjective) is grammatical (medium difficulty).
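The two levels of difficulty suggest a two-stage decision: first the grammatical one (part of speech), then the semantic one (context). In the sketch below the part-of-speech tag is assumed to be supplied by an earlier step, and the literary cue words are hypothetical.

```python
# Hypothetical context cues pointing to the 'short story' sense.
LITERARY_CUES = {"écrivain", "recueil", "roman", "auteur"}

def translate_nouvelle(pos, context_tokens=()):
    """Translate French 'nouvelle' given its part of speech and context."""
    if pos == "ADJ":
        return "nova"                 # grammatical disambiguation
    if pos == "NOUN":
        # Semantic disambiguation: literary context suggests 'short story'.
        if any(token in LITERARY_CUES for token in context_tokens):
            return "nuvella"
        return "nutizia"
    raise ValueError(f"unexpected part of speech: {pos}")

print(translate_nouvelle("ADJ"))                              # nova
print(translate_nouvelle("NOUN", ["un", "recueil", "de"]))    # nuvella
print(translate_nouvelle("NOUN", ["une", "bonne"]))           # nutizia
```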
Some further reflections on the definition of ‘above human level’ translation:
– the answer may not be based solely on the quantitative side, being of the type: ‘above 96%’, ‘above 97%’, ‘above 98%’, etc.
– it seems the answer should also incorporate insights from the qualitative side, i.e. not containing gross translation errors.
Semantic disambiguation errors would most often be termed ‘gross errors’: for example, translating ‘défense d’éléphant’ into ‘elephant’s defense’ instead of ‘elephant’s tusk’.
– to fix ideas, it could be proposed: ‘above human level’ =
above 98% AND without gross errors
Defining an instance of the Feigenbaum test (from Wikipedia: generally defined as a variant of the Turing test where a computer software attempts to imitate a human expert in a given field): translating French into Corsican.
We expect 98% accuracy and lack of gross errors in order to pass this Feigenbaum test.
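The proposed criterion combines a quantitative and a qualitative condition, and could be written as a small predicate. The function name is hypothetical, and the strict comparison reflects the wording ‘above 98%’. Whether exactly 98% should pass is left open here.

```python
def above_human_level(accuracy, gross_errors, threshold=0.98):
    """Proposed criterion: accuracy above the threshold AND no gross errors."""
    return accuracy > threshold and gross_errors == 0

print(above_human_level(1 - 6 / 134, gross_errors=0))  # False: 95.52% is below 98%
print(above_human_level(0.985, gross_errors=1))        # False: one gross error
print(above_human_level(0.985, gross_errors=0))        # True
```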