Interlingua

Shaalan, K., A. Abdel-Monem, and A. Rafea, "Arabic Morphological Generation from Interlingua: A Rule-based Approach", Intelligent Information Processing III, vol. 228: Springer US, pp. 441-451, 2007.

Arabic is a Semitic language with a rich morphology governed by numerous and complex rules. Arabic morphological analysis has long been a focus of Arabic natural language processing research, with the goal of automated understanding of Arabic. With recent technological advances, Arabic natural language generation has also received attention, since it opens the way to wider applications such as machine translation. For machine translation systems that support a large number of languages, interlingua-based approaches are particularly attractive. In this paper, we report our attempt at developing a rule-based Arabic morphological generator for task-oriented interlingua-based spoken dialogues. Examples of the generator's output are given to illustrate how the system works. We also discuss the issues that arise in generating Arabic words from an interlingua representation and present how we have handled them.
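As a rough illustration of what rule-based morphological generation involves, the sketch below applies the textbook root-and-pattern idea of Semitic morphology: a consonantal root is interleaved with a vocalic pattern. It is not the generator described in the paper. The function name, the digit-based pattern notation, and the Latin transliteration are all assumptions made for this example.

```python
def interdigitate(root, pattern):
    """Fill a pattern with root consonants.

    Hypothetical notation: the digits 1, 2, 3 in `pattern` stand for the
    first, second, and third consonant of `root`; every other character
    is copied as-is.
    """
    return "".join(
        root[int(ch) - 1] if ch.isdigit() else ch
        for ch in pattern
    )


# Transliterated root k-t-b (the root associated with writing).
ROOT_KTB = ("k", "t", "b")

# Illustrative patterns only; a real generator also needs rules for
# agreement, affixes, and the many irregular root classes.
PATTERNS = {
    "perfective verb": "1a2a3a",      # kataba
    "active participle": "1aa2i3",    # kaatib
    "passive participle": "ma12uu3",  # maktuub
}

for label, pattern in PATTERNS.items():
    print(label, "->", interdigitate(ROOT_KTB, pattern))
```

A table-driven design like this keeps the linguistic rules separate from the code that applies them, which is the usual motivation for rule-based generators.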

Abdel-Monem, A., K. Shaalan, A. Rafea, and H. Baraka, "A Proposed Approach for Generating Arabic from Interlingua in a Multilingual Machine Translation System", Language Engineering conference, Cairo, Egypt, Ain Shams University, pp. 197–206, Oct, 2003.

Interlingua (meaning) representation has been used successfully in multilingual machine translation. This paper reports our attempt to generate Arabic sentences from an interlingua. The proposed system is intended to be compatible with the NESPOLE consortium, which uses an interlingua called Interchange Format (IF), designed for travel planning. Our approach describes how to generate a grammatically correct Arabic sentence from the interlingua. It involves two main components: a mapper that converts the interlingua into a syntactic structure (feature structure), and a generator that produces the target Arabic sentence representing the intended meaning. A translation example is provided to explain the inner workings of the system.

Shaalan, K., A. Abdel-Monem, A. Rafea, and H. Baraka, "Mapping Interlingua Representations to Feature Structures of Arabic Sentences", The Challenge of Arabic for NLP/MT International Conference, the British Computer Society, London, UK, British Computer Society (BCS), pp. 149–159, Oct, 2006.

The interlingua approach to Machine Translation (MT) aims to achieve the translation task in two independent steps. First, the meanings of source language sentences are represented in an intermediate (interlingua) representation. Then, sentences of the target language are generated from those meaning representations. In the generation of the target sentence, determining sentence structures becomes more difficult, especially when the interlingua does not contain any syntactic information. Hence, the sentence structures cannot be transferred exactly from the interlingua representations. In this paper, we present a mapping approach for task-oriented interlingua-based spoken dialogue that transforms an interlingua representation, the so-called Interchange Format (IF), into a feature structure (FS) that reflects the syntactic structure of the target Arabic sentence. This approach addresses the problem of determining Arabic syntactic structure in the interlingua approach. A mapper is developed primarily within the framework of the NESPOLE! (NEgotiating through SPOken Language in E-commerce) multilingual speech-to-speech MT project. The IF-to-Arabic FS mapper is implemented in SICStus Prolog. Examples of Arabic syntactic mapping, using the output from the English analyzer provided by Carnegie Mellon University (CMU), illustrate how the system works.
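The mapping step can be pictured with a small sketch. The abstract states that the actual mapper is written in SICStus Prolog; the Python below is only an illustration under simplifying assumptions. The input string is a flattened, IF-like form (speaker, speech act, concepts, flat arguments), not the real NESPOLE! specification, and every function name, table, and feature name is hypothetical.

```python
# Hypothetical table: which sentence type a speech act suggests.
SENTENCE_TYPE = {
    "request-information": "interrogative",
    "give-information": "declarative",
}


def parse_if(if_string):
    """Parse a simplified, flat IF-like string into its parts."""
    head, _, arg_part = if_string.partition("(")
    speaker, _, dialogue_act = head.strip().partition(":")
    speech_act, *concepts = dialogue_act.strip().split("+")
    arguments = {}
    for pair in arg_part.rstrip(") ").split(","):
        if "=" in pair:
            key, value = pair.split("=", 1)
            arguments[key.strip()] = value.strip()
    return {
        "speaker": speaker.strip(),
        "speech_act": speech_act,
        "concepts": concepts,
        "arguments": arguments,
    }


def map_to_feature_structure(parsed):
    """Map the parsed meaning to a toy syntactic feature structure."""
    return {
        "sentence_type": SENTENCE_TYPE.get(parsed["speech_act"], "declarative"),
        "predicate": parsed["concepts"][0] if parsed["concepts"] else None,
        "slots": dict(parsed["arguments"]),
    }


parsed = parse_if("c:request-information+price (object=room, quantity=2)")
print(map_to_feature_structure(parsed))
```

The point of the sketch is the separation of concerns: the interlingua carries only meaning, so syntactic choices such as sentence type have to be introduced by the mapper before a generator can produce a surface sentence.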

Shaalan, K., A. A. Monem, A. Rafea, and H. Baraka, "Generating Arabic text from Interlingua", the 2nd Workshop on Computational Approaches to Arabic Script-based Languages (CAASL), Stanford, California, USA, Linguistic Society of America Summer Institute, Stanford University, pp. 137–144, Jul, 2007.

In this paper, we describe a grammar-based generation approach for task-oriented interlingua-based spoken dialogue that transforms a shallow semantic interlingua representation called Interchange Format (IF) into Arabic text that corresponds to the intentions underlying the speakers' utterances. The generation approach is developed primarily within the framework of the NESPOLE! (NEgotiating through SPOken Language in E-commerce) multilingual speech-to-speech MT project. The IF-to-Arabic generator is implemented in SICStus Prolog. We conducted an evaluation experiment using the output from the English analyzer provided by Carnegie Mellon University (CMU). The results of this experiment were promising and confirmed the ability of the generation approach to generate Arabic text from interlingua representations taken from the travel and tourism domain.

Farouk, A., A. Rafea, and K. Shaalan, "Analysis of Spoken Arabic into Interlingua Representation using Automatic Classification Approach", 3rd International Computer Engineering Conference: Smart Applications for the Information Society, Cairo, Egypt, Cairo University, Dec, 2007.

A semantic analyzer takes a sentence as input and outputs a list of the prominent concepts that characterize its contents, together with, for each concept, the set of attributes that describe it and their relevance. This paper presents a system that employs a machine learning approach to automate the semantic analysis of spoken Arabic into an interlingua representation. An experiment has been conducted to measure the performance of our approach. The results were promising and confirmed the ability of this approach to capture the semantics of Arabic utterances taken from the travel and tourism domain.

Farouk, A., A. Rafea, and K. Shaalan, "Recognizing Semantic Concepts of Spoken Arabic Utterances using Genetic Technology", The Seventh Conference on Language Engineering, Egyptian Society of Language Engineering (ELSE), Cairo, Egypt, Ain Shams University, Dec, 2007.

Genetic algorithms (GAs) are a family of computational models inspired by evolution. GAs are mainly designed to solve optimization problems, which can be thought of as searching through a large number of candidates for the best one that can be found. In this paper, we present a genetic model to solve the problem of recognizing deep semantic concepts from spoken Arabic utterances. The aim of this algorithm is to automatically generate the grammar that recognizes each concept in the domain of discourse. This grammar is used to extract the observed concepts from the utterance. An experiment has been conducted to measure the performance of our approach. The results were promising and confirmed the ability of this approach to identify the concepts of Arabic utterances taken from the travel and tourism domain.
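To make the search idea concrete, here is a minimal genetic algorithm sketch. It does not reproduce the paper's method: instead of evolving a grammar, it evolves a much simpler stand-in, a keyword set encoded as a bit mask over a small vocabulary, that separates utterances expressing a concept from those that do not. The function names, parameters, English placeholder tokens, and toy data are all assumptions made for this example.

```python
import random


def fitness(mask, vocabulary, examples):
    """Fraction of examples classified correctly by the keyword set."""
    keywords = {w for w, bit in zip(vocabulary, mask) if bit}
    correct = 0
    for words, has_concept in examples:
        predicted = bool(keywords & set(words))
        correct += predicted == has_concept
    return correct / len(examples)


def evolve(vocabulary, examples, generations=30, pop_size=20, seed=0):
    """Evolve a keyword bit mask; the vocabulary needs at least 2 words."""
    rng = random.Random(seed)
    n = len(vocabulary)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, vocabulary, examples), reverse=True)
        survivors = population[: pop_size // 2]  # keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:  # occasional bit-flip mutation
                child[rng.randrange(n)] ^= 1
            children.append(child)
        population = survivors + children
    best = max(population, key=lambda m: fitness(m, vocabulary, examples))
    keywords = [w for w, bit in zip(vocabulary, best) if bit]
    return keywords, fitness(best, vocabulary, examples)


# Toy data: (tokens, whether the utterance expresses a "price" concept).
VOCAB = ["price", "cost", "room", "hotel", "much", "train"]
EXAMPLES = [
    (["how", "much", "is", "the", "room"], True),
    (["what", "is", "the", "price"], True),
    (["the", "cost", "of", "the", "train"], True),
    (["i", "want", "a", "hotel"], False),
    (["book", "a", "room"], False),
    (["when", "is", "the", "train"], False),
]

keywords, score = evolve(VOCAB, EXAMPLES)
print(keywords, score)
```

Selection, crossover, and mutation work the same way whatever the candidates encode, so the same loop could in principle search over grammar rules if the encoding and the fitness function were replaced.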

Shaalan, K., A. Abdel-Monem, and A. Rafea, "Syntactic Generation of Arabic in Interlingua-based Machine Translation Framework", Third workshop on Computational Approaches to Arabic Script-based Languages (CAASL3), Machine Translation Summit XII: ACL, 2009.

Arabic is a highly inflectional language, with a rich morphology, relatively free word order, and two types of sentences: nominal and verbal. Arabic natural language processing in general is still underdeveloped, and Arabic natural language generation (NLG) is even less developed. In particular, Arabic natural language generation from Interlingua has only been investigated using template-based approaches. Moreover, tools used for other languages are not easily adaptable to Arabic because of the complexity of Arabic at both the morphological and syntactic levels. In this paper, we report our attempt at developing a rule-based Arabic generator for task-oriented interlingua-based spoken dialogues. Examples of syntactic generation results from the Arabic generator are given to illustrate how the system works. Our proposed syntactic generator has been evaluated using real test data and achieved satisfactory results.

Khaled Shaalan

Professor of Computer Science
