Tag Archives: rule-based MT

Why it’s worth it to engage in rule-based translation

Rule-based translation is difficult to implement. The main difficulty lies in handling groups of words, so as to be on a par with statistics-based translation. The main problems in this regard are (i) polymorphic disambiguation; and (ii) building a sound typology of grammatical types. But once these problems begin to be mastered, there are many advantages. What seems essential here is that the same piece of software can carry out both machine translation and text analysis. Among the modules that are easy to implement are the following:

  • lemmatizer
  • part-of-speech tagger
  • singularizer
  • pluralizer
  • grammar checker
  • type extractor: a module that allows you to extract words from a text according to their grammatical category
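To illustrate how one piece of software can serve several purposes, here is a minimal Python sketch in which a single lexicon drives a lemmatizer, a part-of-speech tagger, a singularizer/pluralizer and a type extractor. The `Entry` record, the toy `LEXICON` and all function names are hypothetical and chosen for illustration only. A real system would need a far larger lexicon and would have to handle ambiguous forms (one surface form with several entries), which this sketch deliberately ignores.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Entry:
    form: str    # surface form as it appears in a text
    lemma: str   # dictionary (citation) form
    pos: str     # grammatical type, e.g. "NOUN", "ADJ", "VERB"
    number: str  # "sg", "pl", or "" when number does not apply


# Hypothetical toy lexicon: every surface form has exactly one entry here.
LEXICON: List[Entry] = [
    Entry("maison", "maison", "NOUN", "sg"),
    Entry("maisons", "maison", "NOUN", "pl"),
    Entry("cheval", "cheval", "NOUN", "sg"),
    Entry("chevaux", "cheval", "NOUN", "pl"),
    Entry("grande", "grand", "ADJ", "sg"),
    Entry("grandes", "grand", "ADJ", "pl"),
    Entry("parle", "parler", "VERB", ""),
]
BY_FORM: Dict[str, Entry] = {e.form: e for e in LEXICON}


def lemmatize(word: str) -> str:
    """Lemmatizer: map a form to its lemma (unknown words are returned unchanged)."""
    entry = BY_FORM.get(word)
    return entry.lemma if entry else word


def tag(words: List[str]) -> List[Tuple[str, str]]:
    """Part-of-speech tagger: look up each word's grammatical type."""
    return [(w, BY_FORM[w].pos if w in BY_FORM else "UNKNOWN") for w in words]


def inflect(word: str, number: str) -> str:
    """Return the form with the same lemma and type but the requested number."""
    entry = BY_FORM.get(word)
    if entry is None:
        return word
    for e in LEXICON:
        if e.lemma == entry.lemma and e.pos == entry.pos and e.number == number:
            return e.form
    return word  # no such form in the lexicon


def pluralize(word: str) -> str:
    return inflect(word, "pl")


def singularize(word: str) -> str:
    return inflect(word, "sg")


def extract_type(words: List[str], pos: str) -> List[str]:
    """Type extractor: keep only the words of the given grammatical type."""
    return [w for w in words if w in BY_FORM and BY_FORM[w].pos == pos]
```

For instance, `extract_type("les maisons sont grandes".split(), "NOUN")` returns `['maisons']`. In this sketch every module is just a different query over the same data, so enriching the lexicon improves all of them at once.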

Indeed, implementing rule-based translation gives the machine some inherent understanding of the text, much as a human being understands it. In a nutshell, it is better artificial intelligence.

Finally, other, more advanced modules seem possible (to be confirmed).

Analyzing relative pronouns

What is the status of the ‘relative pronouns’ of classical grammar within the present conceptual framework? Traditionally, a distinction is made between simple relative pronouns (qui, que, dont, où; who, that, whose, where) and compound relative pronouns (à qui, pour lesquelles, à côté duquel, etc.; to whom, for whom, beside whom, etc.). If we look first at simple relative pronouns, the category does not seem satisfactory, in particular because of the presence of ‘qui’ (who) and ‘que’ (that), whose grammatical roles appear, in the present context, to be quite different. Consider the two short sentences ‘la maison que j’habite est grande’ and ‘l’homme qui parle est grand’ (the house I live in is big; the man who speaks is tall). As these two examples illustrate, the structures following ‘que’ and ‘qui’ differ: ‘que’ is followed by a personal pronoun and a conjugated verb (‘j’habite’: I live), whereas ‘qui’ is followed directly by a conjugated verb (‘parle’: speaks). From our present perspective, these are inherently different structures. It turns out that ‘dont’ and ‘où’ admit the same type of structure as ‘que’. Thus, from our point of view, the homogeneous category is formed by ‘que’, ‘dont’ and ‘où’, but not by ‘qui’.

If we extend this analysis to other words, searching for those that could fit into this category, we also find ‘duquel’ (= de lequel; from which), ‘de laquelle’, ‘desquels’ (= de lesquels; from which), ‘desquelles’ (= de lesquelles; from which), ‘auquel’ (= à lequel), ‘à laquelle’, ‘auxquels’ (= à lesquels) and ‘auxquelles’ (= à lesquelles). We also have all the forms of the same type built with a preposition other than ‘de’ or ‘à’: ‘sur lequel’, ‘sur laquelle’, …, ‘par lequel’, ‘par laquelle’, ‘avec lequel’, etc. The classical compound relative pronouns, such as ‘à qui’, ‘pour lesquelles’, ‘à côté duquel’, etc. (to whom, for whom, beside whom, etc.), also fit naturally into this category.
But from the point of view of two-sided grammar, ‘à l’aide duquel’, ‘au moyen de laquelle’, ‘à la suite de quoi’, ‘à l’aide de qui’, etc. (with the help of which, by means of which, as a result of which, with the help of whom, etc.) also belong to this category. (to be continued)
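The category described above lends itself to mechanical recognition. The following Python sketch scans a sentence for expressions of the ‘que’ type (‘que’, ‘dont’, ‘où’, the contracted forms, a preposition followed by a ‘lequel’ form or ‘qui’, and complex prepositions such as ‘à l’aide’ followed by ‘duquel’, ‘de qui’ or ‘de quoi’) and keeps bare ‘qui’ as a category of its own. The word lists, the category labels `QUE_TYPE` and `QUI`, and all function names are hypothetical and deliberately incomplete; the sketch also makes no attempt to separate, for example, conjunctive ‘que’ or interrogative ‘où’ from their relative uses.

```python
from typing import List, Optional, Tuple

# Hypothetical, deliberately incomplete word lists.
SIMPLE = {"que", "dont", "où"}
CONTRACTED = {"duquel", "desquels", "desquelles", "auquel", "auxquels", "auxquelles"}
LEQUEL_OR_QUI = {"lequel", "laquelle", "lesquels", "lesquelles", "qui"}
PREPOSITIONS = {"à", "de", "sur", "par", "avec", "pour", "dans", "sous"}
# Complex prepositions combining with a 'de' form: 'à l'aide duquel', 'au moyen de laquelle'...
COMPLEX_PREPS = [("à", "l'aide"), ("au", "moyen"), ("à", "la", "suite"), ("à", "côté")]


def tokenize(sentence: str) -> List[str]:
    """Lowercase, normalise curly apostrophes and split on whitespace."""
    return sentence.lower().replace("\u2019", "'").split()


def match_de_form(tokens: List[str], i: int) -> int:
    """Length of a 'duquel'-style form or 'de laquelle/qui/quoi' at position i, else 0."""
    if i < len(tokens) and tokens[i] in {"duquel", "desquels", "desquelles"}:
        return 1
    if i + 1 < len(tokens) and tokens[i] == "de" and tokens[i + 1] in {"laquelle", "qui", "quoi"}:
        return 2
    return 0


def match_relative(tokens: List[str], i: int) -> Tuple[Optional[str], int]:
    """Return (category, length) of a relative expression starting at i, or (None, 0)."""
    # Complex preposition + 'de' form, e.g. 'à la suite de quoi'.
    for prep in COMPLEX_PREPS:
        n = len(prep)
        if tuple(tokens[i:i + n]) == prep:
            k = match_de_form(tokens, i + n)
            if k:
                return "QUE_TYPE", n + k
    word = tokens[i]
    if word in SIMPLE or word in CONTRACTED:
        return "QUE_TYPE", 1
    # Preposition + lequel-form or 'qui': 'sur lequel', 'de laquelle', 'à qui'.
    if word in PREPOSITIONS and i + 1 < len(tokens) and tokens[i + 1] in LEQUEL_OR_QUI:
        return "QUE_TYPE", 2
    # Bare 'qui' (followed directly by a conjugated verb) is a category of its own.
    if word == "qui":
        return "QUI", 1
    return None, 0


def find_relatives(sentence: str) -> List[Tuple[str, str]]:
    """Scan left to right and return (category, expression) pairs."""
    tokens = tokenize(sentence)
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        category, n = match_relative(tokens, i)
        if category is not None:
            found.append((category, " ".join(tokens[i:i + n])))
            i += n
        else:
            i += 1
    return found
```

For example, `find_relatives("la maison que j’habite est grande")` yields `[("QUE_TYPE", "que")]`, while `find_relatives("l’homme qui parle est grand")` yields `[("QUI", "qui")]`. Longer complex prepositions are tried before single words so that ‘à la suite de quoi’ is captured as one unit rather than as a stray ‘à’.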

On the statistical/rule-based divide regarding MT

The classical divide in MT separates statistical from rule-based MT. But this divide is not as clear-cut as one might think at first glance, because rule-based MT can operate statistically. Take as an example the disambiguation of French ‘est’, which can be translated either as is or as east, depending on the context. Defining the rules for disambiguating ‘est’ can be somewhat complicated. A rule-based MT system could then define a few rules covering 90% of the cases and, for the remaining 10%, apply a closure rule that translates ‘est’ as is unconditionally. Such a rule would be based on the statistical fact that ‘est’ most often translates as is rather than as east, and it may succeed in most cases. As we see it, such a rule is statistical in essence. Hence the conclusion: the statistical/rule-based divide in MT is not as clear-cut as one might think prima facie, since a disambiguation system for rule-based MT could be built with closure rules of this type, which would operate statistically.
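The closure-rule idea can be sketched in a few lines of Python. A short list of context rules is tried in order; if none applies, a closure rule returns ‘is’ unconditionally. The three rules, the tokenizer, the word lists and the function names below are hypothetical illustrations rather than a tested rule set, and no coverage figures should be read into them.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical word lists used by the illustrative rules below.
PREPOSITIONS = {"à", "de", "vers", "dans", "depuis"}
SUBJECTS = {"c'", "il", "elle", "on", "ce", "qui"}


def tokenize(text: str) -> List[str]:
    """Lowercase, normalise curly apostrophes, split elisions: "l'est" -> ["l'", "est"]."""
    text = text.lower().replace("\u2019", "'")
    return re.findall(r"\w+'|\w+", text)


def after_preposition_and_article(tokens: List[str], i: int) -> Optional[str]:
    # "à l'est", "vers l'est", "de l'est": the noun 'east'.
    if i >= 2 and tokens[i - 1] == "l'" and tokens[i - 2] in PREPOSITIONS:
        return "east"
    return None


def article_with_complement(tokens: List[str], i: int) -> Optional[str]:
    # "l'est de la France": article + 'est' + 'de'/'du'/'des'.
    if i >= 1 and tokens[i - 1] == "l'" and i + 1 < len(tokens) and tokens[i + 1] in {"de", "du", "des"}:
        return "east"
    return None


def after_subject(tokens: List[str], i: int) -> Optional[str]:
    # "c'est", "il est", "qui est": the verb 'is'.
    if i >= 1 and tokens[i - 1] in SUBJECTS:
        return "is"
    return None


RULES = [after_preposition_and_article, article_with_complement, after_subject]


def translate_est(tokens: List[str], i: int) -> Tuple[str, str]:
    """Translate tokens[i] == 'est'; return (translation, name of the deciding rule)."""
    for rule in RULES:
        result = rule(tokens, i)
        if result is not None:
            return result, rule.__name__
    # Closure rule: no explicit rule fired, so fall back unconditionally on the
    # most frequent translation. This is the step that operates statistically.
    return "is", "closure"


def translate_all(text: str) -> List[Tuple[str, str]]:
    """Translate every occurrence of 'est' in the text."""
    tokens = tokenize(text)
    return [translate_est(tokens, i) for i, tok in enumerate(tokens) if tok == "est"]
```

In this sketch, ‘la maison est grande’ and ‘Il l’est’ (he is [so]) match none of the explicit rules and fall through to the closure rule, which gives is in both cases. Returning the name of the deciding rule makes it easy to count how often the closure rule, the statistical part of the system, is actually used on a given corpus.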