Machine Translation

Shaalan, K., A. Hendam, and A. Rafea, "Rapid development and deployment of bi-directional expert systems using machine translation technology", Expert Systems with Applications, vol. 39, issue 1, no. 1, pp. 1375–1380, 2012.

This work reports our attempt at developing an English–Arabic bi-directional machine translation tool for the agriculture domain. It aims to automate the translation of agricultural expert systems. In particular, we describe the translation of the domain knowledge base, including prompts, responses, explanation text, and advice. At the Central Laboratory for Agricultural Expert Systems (CLAES), where many successful agricultural expert systems have been developed, this tool has proved essential for developing bi-directional (English–Arabic) expert systems because both English and Arabic versions are needed for development, deployment, and usage. The tool also helps knowledge engineers overcome the language barrier by acquiring knowledge from either English- or Arabic-speaking domain experts. This paper discusses our experience with the developed machine translation tool and reports the results of applying it to real agricultural expert systems.

Shaalan, K., and A. H. Hossny, "Automatic rule induction in Arabic to English machine translation framework", Challenges for Arabic Machine Translation, Amsterdam, The Netherlands, John Benjamins Publishing Company, 2012.

This paper addresses the use of a supervised machine learning technique to automatically induce Arabic-to-English transfer rules from chunks of parallel aligned linguistic resources. The induced structural transfer rules encode the linguistic translation knowledge for converting an Arabic syntactic structure into a target English syntactic structure. These rules are intended to be an integral part of an Arabic-English transfer-based machine translation system. In addition, a novel morphological rule induction method is employed for learning Arabic morphological rules that are applied in our Arabic morphological analyzer. To demonstrate the capability of the automated rule induction technique, we conducted rule-based translation experiments that use rules induced from a relatively small data set. The translation quality of the hybrid translation experiments achieved good results in terms of word error rate (WER).
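For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between a system translation and a reference, divided by the reference length. The following is a minimal sketch of that standard definition, not the evaluation script used in the paper; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: tokens are obtained by splitting on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One word of the five-word reference is missing, so WER = 1/5.
print(word_error_rate("the farmer waters the crop", "the farmer waters crop"))  # 0.2
```

Lower values are better, and the value can exceed 1.0 when the hypothesis is much longer than the reference.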

Shaalan, K., A. Abdel-Monem, and A. Rafea, "Arabic Morphological Generation from Interlingua: A Rule-based Approach", Intelligent Information Processing III, vol. 228: Springer US, pp. 441–451, 2007.

Arabic is a Semitic language that is rich in its morphology, with numerous and complex morphological rules. Arabic morphological analysis has long been a focus of Arabic natural language processing research, with the goal of achieving automated understanding of Arabic. With recent technological advances, Arabic natural language generation has received attention because it opens the way to wider applications such as machine translation. For machine translation systems that support a large number of languages, interlingua-based machine translation approaches are particularly attractive. In this paper, we report our attempt at developing a rule-based Arabic morphological generator for task-oriented interlingua-based spoken dialogues. Examples of results from the Arabic morphological generator are given to illustrate how the system works. We also discuss the issues related to the morphological generation of Arabic words from an interlingua representation and present how we have handled them.

Shaalan, K., "Machine Translation of Arabic Interrogative Sentence into English", The 8th International Conference on Artificial Intelligence Applications, Cairo, Egypt, American University in Cairo, pp. 473–483, 2000.

This work reports our attempt at developing a bilingual Machine Translation (MT) tool for the agriculture domain. The work described here is part of ongoing research to automate the translation of user interfaces of knowledge-based systems. In particular, we describe the translation of Arabic interrogative sentences into English. At the Central Laboratory for Agricultural Expert Systems (CLAES), this tool has proved essential for developing bilingual (Arabic-to-English) expert systems because both the Arabic and the English versions are needed for development and usage. The tool follows the transfer-based MT approach. A major design goal is that the tool can be used stand-alone and can also be integrated with a general MT system for Arabic sentences. The paper also describes our experience with the developed MT system and reports the results of applying it to interrogatives from real agricultural expert systems.

Abdel-Monem, A., K. Shaalan, A. Rafea, and H. Baraka, "A Proposed Approach for Generating Arabic from Interlingua in a Multilingual Machine Translation System", Language Engineering Conference, Cairo, Egypt, Ain Shams University, pp. 197–206, Oct, 2003.

Interlingua (meaning) representation has been successfully used in multilingual machine translation. This paper reports our attempt to generate Arabic sentences from an interlingua. The proposed system will be compatible with the NESPOLE consortium. NESPOLE uses an interlingua called Interchange Format (IF), which was designed for travel planning. Our approach describes how to generate grammatically correct Arabic sentences from the interlingua. It involves two main components: a mapper that converts the interlingua into a syntactic structure (feature structure) and a generator that produces the target Arabic sentence representing the intended meaning. A translation example is provided to explain the inner workings of the system.

Nabhan, A., A. Rafea, and K. Shaalan, "Enhancing Phrase Extraction from Word Alignments Using Morphology", The 5th Conference on Language Engineering, Egyptian Society of Language Engineering (ESLE), Cairo, Egypt, Ain Shams University, pp. 57–65, Sep, 2005.

We propose a technique for effective extraction of bilingual phrases from word alignments using morphological processing. Morphological processing increases the frequency of words in the corpus, which in turn reduces the Alignment Error Rate (AER). Intuitively, better word alignments enhance the quality of the extracted bilingual phrases. Using alignments of a stemmed corpus for phrase extraction, instead of alignments of a raw one, shows significant improvements in translation quality, especially with small corpora.
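To make the notion of extracting bilingual phrases from word alignments concrete, here is a minimal sketch of the widely used consistency criterion: a source span and a target span form a phrase pair when no alignment link connects a word inside one span to a word outside the other. This is a simplified illustration, not the authors' implementation. The function name `extract_phrases`, the `max_len` limit, and the placeholder tokens are assumptions, and the sketch does not extend phrases over unaligned target words as full extractors do.

```python
def extract_phrases(src, tgt, alignment, max_len=3):
    """Return phrase pairs consistent with a word alignment.

    src, tgt  : lists of tokens
    alignment : set of (source_index, target_index) links
    max_len   : hypothetical cap on phrase length, in tokens
    """
    pairs = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency check: no target word inside the target span
            # may be linked to a source word outside the source span.
            violates = any(
                t_start <= t <= t_end and not (s_start <= s <= s_end)
                for (s, t) in alignment
            )
            if violates:
                continue
            pairs.add((" ".join(src[s_start:s_end + 1]),
                       " ".join(tgt[t_start:t_end + 1])))
    return pairs


# Placeholder tokens; the links say a-x, b-z, c-y.
phrases = extract_phrases(["a", "b", "c"], ["x", "y", "z"],
                          {(0, 0), (1, 2), (2, 1)})
for pair in sorted(phrases):
    print(pair)
```

Because the links are position pairs, links computed on a stemmed copy of the corpus can be passed in together with the surface tokens, provided stemming leaves the tokenization unchanged. That proviso is an assumption of this sketch.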

Shaalan, K., A. Abdel-Monem, A. Rafea, and H. Baraka, "Mapping Interlingua Representations to Feature Structures of Arabic Sentences", The Challenge of Arabic for NLP/MT International Conference, London, UK, British Computer Society (BCS), pp. 149–159, Oct, 2006.

The interlingua approach to Machine Translation (MT) aims to achieve the translation task in two independent steps. First, the meanings of source language sentences are represented in an intermediate (interlingua) representation. Then, sentences of the target language are generated from those meaning representations. In generating the target sentence, determining the sentence structure becomes more difficult, especially when the interlingua does not contain any syntactic information. Hence, sentence structures cannot be transferred exactly from the interlingua representations. In this paper, we present a mapping approach for task-oriented interlingua-based spoken dialogue that transforms an interlingua representation, the so-called Interchange Format (IF), into a feature structure (FS) that reflects the syntactic structure of the target Arabic sentence. This approach addresses the problem of determining Arabic syntactic structure in the interlingua approach. The mapper is developed primarily within the framework of the NESPOLE! (NEgotiating through SPOken Language in E-commerce) multilingual speech-to-speech MT project. The IF-to-Arabic FS mapper is implemented in SICStus Prolog. Examples of Arabic syntactic mapping, using the output from the English analyzer provided by Carnegie Mellon University (CMU), illustrate how the system works.

Shaalan, K., A. A. Monem, A. Rafea, and H. Baraka, "Generating Arabic text from Interlingua", The 2nd Workshop on Computational Approaches to Arabic Script-based Languages (CAASL), Stanford, California, USA, Linguistic Society of America Summer Institute, Stanford University, pp. 137–144, Jul, 2007.

In this paper, we describe a grammar-based generation approach for task-oriented interlingua-based spoken dialogue that transforms a shallow semantic interlingua representation called Interchange Format (IF) into Arabic text that corresponds to the intentions underlying the speakers' utterances. The generation approach is developed primarily within the framework of the NESPOLE! (NEgotiating through SPOken Language in E-commerce) multilingual speech-to-speech MT project. The IF-to-Arabic generator is implemented in SICStus Prolog. We conducted an evaluation experiment using the output from the English analyzer provided by Carnegie Mellon University (CMU). The results of this experiment were promising and confirmed the ability of the generation approach to generate Arabic text from interlingua input taken from the travel and tourism domain.

Farouk, A., A. Rafea, and K. Shaalan, "Analysis of Spoken Arabic into Interlingua Representation using Automatic Classification Approach", 3rd International Computer Engineering Conference: Smart Applications for the Information Society, Cairo, Egypt, Cairo University, Dec, 2007.

A semantic analyzer is a system that takes a sentence as input and outputs a list of prominent concepts that characterize its content and, for each concept, the set of attributes that describe the concept along with their relevance. This paper presents a system that employs a machine learning approach to automate the semantic analysis of spoken Arabic into an interlingua representation. An experiment was conducted to measure the performance of our approach. The results were promising and confirmed the ability of this approach to capture the semantics of Arabic utterances taken from the travel and tourism domain.

Farouk, A., A. Rafea, and K. Shaalan, "Recognizing Semantic Concepts of Spoken Arabic Utterances using Genetic Technology", The Seventh Conference on Language Engineering, Egyptian Society of Language Engineering (ESLE), Cairo, Egypt, Ain Shams University, Dec, 2007.

Genetic algorithms (GAs) are a family of computational models inspired by evolution. GAs are mainly designed to solve optimization problems, which can be thought of as searching through a large number of candidates for the best one that can be found. In this paper, we present a genetic model to solve the problem of recognizing deep semantic concepts from spoken Arabic utterances. The aim of this algorithm is to automatically generate the grammar that recognizes each concept in the domain of discourse. This grammar is used to extract the observed concepts from the utterance. An experiment was conducted to measure the performance of our approach. The results were promising and confirmed the ability of this approach to identify the concepts of Arabic utterances taken from the travel and tourism domain.

Khaled Shaalan

Professor of Computer Science
