Arabic

Shaalan, K., "An Intelligent Computer Assisted Language Learning System for Arabic Learners", Computer Assisted Language Learning, vol. 18, no. 1-2: Routledge, part of the Taylor & Francis Group, pp. 81-109, 2005.

This paper describes the development of an intelligent computer-assisted language learning (ICALL) system for learning Arabic. The system could be used by primary school students or by learners of Arabic as a second or foreign language. It explores the use of Natural Language Processing (NLP) techniques for learning Arabic. Learners are encouraged to produce sentences freely in various situations and contexts and are guided to recognize by themselves the erroneous or inappropriate functions of their misused expressions. The system uses NLP tools (including a morphological analyzer and a syntax analyzer) and an error analyzer to issue feedback to the learner. Furthermore, we propose a mechanism of correction by the learner, which allows the learner to correct the typed sentence independently and to realize what the error is.

Shaalan, K., H. Talhami, and I. Kamel, "A Morphological Generator for the Indexing of Arabic Audio", the Proceedings of The IASTED International Conference on Artificial Intelligence and Soft Computing (ASC), Benidorm, Spain, ACTA Press, pp. 307–312, September, 2005.

This paper presents a novel Arabic morphological generator (AMG) for Modern Standard Arabic (MSA) which is designed and implemented using Prolog. The AMG is used to generate inflected forms of words used for the indexing of Arabic audio. These words are also the relevant terms in the Arab authority system (library information retrieval system) used in this study. The AMG generates inflected Arabic words from the root according to pre-specified morphological features that can be extended as needed. The Arabic word is represented as a feature structure which is handled through unification during the morphological generation process. The inflected forms can then be inserted automatically into a speech recognition grammar which is used to identify these words in an audio sequence or utterance.
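The unification step mentioned in this abstract can be illustrated with a minimal sketch. The original generator is written in Prolog, where unification is built in; the Python below only imitates the idea on nested dictionaries. The function name `unify` and the feature names (`root`, `cat`, `tense`, `agr`) are assumptions chosen for illustration and are not taken from the paper.

```python
from typing import Optional


def unify(a: dict, b: dict) -> Optional[dict]:
    """Unify two feature structures given as (possibly nested) dicts.

    Returns the merged structure, or None if any feature has
    conflicting values in the two inputs.
    """
    result = dict(a)
    for key, b_val in b.items():
        if key not in result:
            # Feature only present in b: simply add it.
            result[key] = b_val
            continue
        a_val = result[key]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            # Both values are sub-structures: unify them recursively.
            sub = unify(a_val, b_val)
            if sub is None:
                return None
            result[key] = sub
        elif a_val != b_val:
            # Atomic values clash: unification fails.
            return None
    return result


# Hypothetical lexical entry (transliterated root) and requested features.
entry = {"root": "ktb", "cat": "verb"}
requested = {"tense": "past", "agr": {"person": 3, "number": "sg"}}

merged = unify(entry, requested)
clash = unify({"tense": "past"}, {"tense": "present"})  # None: values conflict
```

A generator along these lines would then choose the inflected surface form from the merged structure; that step is omitted here because the paper's feature inventory is not given in the abstract.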

Shaalan, K., A. Abdel-Monem, A. Rafea, and H. Baraka, "Mapping Interlingua Representations to Feature Structures of Arabic Sentences", The Challenge of Arabic for NLP/MT International Conference, the British Computer Society, London, UK, British Computer Society (BCS), pp. 149–159, October, 2006.

The interlingua approach to Machine Translation (MT) aims to achieve the translation task in two independent steps. First, the meanings of source language sentences are represented in an intermediate (interlingua) representation. Then, sentences of the target language are generated from those meaning representations. In the generation of the target sentence, determining sentence structures becomes more difficult, especially when the interlingua does not contain any syntactic information. Hence, the sentence structures cannot be transferred exactly from the interlingua representations. In this paper, we present a mapping approach for task-oriented interlingua-based spoken dialogue that transforms an interlingua representation, the so-called Interchange Format (IF), into a feature structure (FS) that reflects the syntactic structure of the target Arabic sentence. This approach addresses the problem of determining Arabic syntactic structure in the interlingua approach. A mapper is developed primarily within the framework of the NESPOLE! (NEgotiating through SPOken Language in E-commerce) multilingual speech-to-speech MT project. The IF-to-Arabic FS mapper is implemented in SICStus Prolog. Examples of Arabic syntactic mapping, using the output from the English analyzer provided by Carnegie Mellon University (CMU), illustrate how the system works.

Talhami, H., and K. Shaalan, "An Arabic/English switch for audio indexing and dialogue management", IASTED International Conference on Internet and Multimedia Systems and Applications (EuroIMSA 2006), Innsbruck, Austria, ACTA Press, pp. 189–192, 2006.

This paper presents a technique for automatic switching between Arabic and English, developed for audio indexing and dialogue management applications. It classifies utterances and sub-utterances as either U.S. English or Modern Standard Arabic (MSA) in a closed system. The approach extends the work of Zissman and Singer[1] to the problem of Arabic/English language identification. Two sets of acoustic phoneme models (English and Arabic HMMs) and two language models (phone bigrams) per acoustic model set are used. Four Large Vocabulary Continuous Speech Recognition (LVCSR) passes are performed (one for each HMM + language model set), using a phone loop grammar. The four path scores are fed into a Bayesian classifier (a multi-layer perceptron) which classifies each utterance as either English or Arabic. The technique demonstrated high accuracy on test data unseen by the system during the modelling process. The language switch has been used successfully as a front-end processor in an audio indexing and retrieval system as well as a dialogue management system.
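The final decision step, in which four path scores are reduced to one language label, can be sketched roughly as follows. This is not the paper's classifier: the paper feeds the scores to a trained multi-layer perceptron, whereas the sketch substitutes a simple rule that sums the log-scores of the passes sharing a language model and picks the larger total. The names `PassScore` and `classify_utterance`, the assumption that each phone bigram is tied to one candidate language, and the score values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PassScore:
    acoustic_model: str   # which HMM set was used: "english" or "arabic"
    language_model: str   # which phone bigram was used: "english" or "arabic"
    log_score: float      # path log-score from the recognition pass (higher is better)


def classify_utterance(scores: Sequence[PassScore]) -> str:
    """Pick a language from four pass scores with a summed-score rule.

    Stand-in for the trained classifier described in the abstract:
    scores are grouped by language model and the group with the
    highest total log-score wins.
    """
    if len(scores) != 4:
        raise ValueError("expected exactly four recognition-pass scores")
    totals = {}
    for s in scores:
        totals[s.language_model] = totals.get(s.language_model, 0.0) + s.log_score
    return max(totals, key=totals.get)


# Made-up scores for one utterance, for illustration only.
example_scores = [
    PassScore("english", "english", -120.0),
    PassScore("english", "arabic", -110.0),
    PassScore("arabic", "english", -130.0),
    PassScore("arabic", "arabic", -100.0),
]
label = classify_utterance(example_scores)
```

A trained classifier can learn how much weight each of the four passes deserves, which a fixed sum cannot; the rule above only shows the shape of the input and output.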

Shaalan, K., and H. Talhami, "Error analysis and handling in Arabic ICALL systems", IASTED International Conference on Artificial Intelligence and Applications (AIA 2006), Innsbruck, Austria, ACTA Press, pp. 109–114, February, 2006.

Arabic is a Semitic language that is rich in its morphology and syntax. The very numerous and complex grammar rules of the language could be confusing even for Arabic native speakers. Many Arabic intelligent computer-assisted language-learning (ICALL) systems have neither deep error analysis nor sophisticated error handling. In this paper, we report an attempt at developing an error analyzer and error handler for Arabic as an important part of the Arabic ICALL system. In this system, the learners are encouraged to construct sentences freely in various contexts and are guided to recognize by themselves the errors or inappropriate usage of their language constructs. We used natural language processing (NLP) tools such as a morphological analyzer and a syntax analyzer for error analysis and to give feedback to the learner. Furthermore, we propose a mechanism of correction by the learner, which allows the learner to correct the typed sentence independently. This will result in the learner being able to figure out what the error is. Examples of error analysis and error handling will be given and will illustrate how the system works.

Shaalan, K., A. A. Monem, A. Rafea, and H. Baraka, "Generating Arabic text from Interlingua", the 2nd Workshop on Computational Approaches to Arabic Script-based Languages (CAASL), Stanford, California, USA, Linguistic Society of America Summer Institute, Stanford University, pp. 137–144, July, 2007.

In this paper, we describe a grammar-based generation approach for task-oriented interlingua-based spoken dialogue that transforms a shallow semantic interlingua representation called Interchange Format (IF) into Arabic text that corresponds to the intentions underlying the speakers' utterances. The generation approach is developed primarily within the framework of the NESPOLE! (NEgotiating through SPOken Language in E-commerce) multilingual speech-to-speech MT project. The IF-to-Arabic generator is implemented in SICStus Prolog. We conducted an evaluation experiment using the output from the English analyzer provided by Carnegie Mellon University (CMU). The results of this experiment were promising and confirmed the ability of the generation approach to generate Arabic text from interlingua taken from the travel and tourism domain.

Farouk, A., A. Rafea, and K. Shaalan, "Analysis of Spoken Arabic into Interlingua Representation using Automatic Classification Approach", 3rd International Computer Engineering Conference: Smart Applications for the Information Society, Cairo, Egypt, Cairo University, December, 2007.

Semantic analysis takes a sentence as input and outputs a list of prominent concepts that characterize its content; for each concept, it gives the set of attributes that describe the concept along with their relevance. This paper presents a system that employs a machine learning approach to automate the semantic analysis of spoken Arabic into an interlingua representation. An experiment has been conducted to measure the performance of our approach. The results were promising and confirmed the ability of this approach to capture the semantics of Arabic utterances taken from the travel and tourism domain.

Shaalan, K., and E. Othman, "Issues in the Morphological Analysis of the Arabic Passive Verb", The Seventh Conference on Language Engineering, Egyptian Society of Language Engineering (ESLE), Cairo, Egypt, Ain Shams University, December, 2007.

Arabic is a strongly structured and highly derivational language. Arabic morphology and syntax allow a large number of affixes to be added to each word, which leads to a combinatorial increase in the number of possible words. In Arabic, the passive voice is used as a writing style when: 1) the subject is unknown, 2) the subject is too unimportant to be mentioned, or 3) the author wants to highlight the object. This paper addresses the issues in recognizing Arabic passive verbs that affect the automated understanding of Arabic sentences. An experiment using the Buckwalter Arabic morphological analyzer, one of the mature Arabic morphological analyzers, was conducted in order to highlight the limitations in the analysis of Arabic passive verbs. Results indicated that the problems related to the morphological analysis of passive verbs need to be handled in order to improve the recognition accuracy of Arabic words.

Shaalan, K., H. Talhami, and I. Kamel, "Automatic Morphological Generation for the Indexing of Arabic Speech Recordings", The International Journal of Computer Processing of Oriental Languages (IJCPOL), vol. 20, no. 1, pp. 1–14, 2007.

Shaalan, K., H. Bakr, and I. Ziedan, "Transferring Egyptian Colloquial Dialect into Modern Standard Arabic", International Conference on Recent Advances in Natural Language Processing (RANLP 2007), Borovets, Bulgaria, John Benjamins, pp. 525–529, September, 2007.

Arabic is rooted in Classical or Qur'anic Arabic, but over the centuries the language has developed into what is now accepted as Modern Standard Arabic (MSA). Arabic colloquial dialects are generally only spoken languages, but recently the amount of written colloquial text has increased dramatically as a medium for expressing ideas, especially across the WWW, usually in the form of blogs and partially colloquial articles. Most of this written colloquial text has been in the Egyptian colloquial dialect, which is considered the most widely understood and used dialect throughout the Arab world. We are able to reuse MSA processing tools with colloquial Arabic by transferring colloquial Arabic words into their corresponding MSA words. The advantages of this lexical transfer are facilitating communication with colloquial Arabic speakers and restoring colloquial text to the standard language in use nowadays. This paper addresses the transfer techniques between colloquial Arabic and MSA, which have not yet been closely studied. In particular, we present a rule-based lexical transfer approach for converting Egyptian colloquial words into their corresponding MSA words. This process involves morphological analysis and lexical acquisition of colloquial words.
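As a rough illustration of word-level lexical transfer, the sketch below combines a tiny hand-written lexicon with one naive prefix rule. It is not the paper's rule set: the lexicon entries, the function name `transfer_word`, and the rule that strips the Egyptian present-tense prefix "ب" before an imperfect verb prefix are assumptions made for this example. The prefix rule is deliberately crude and would misfire on ordinary words that happen to begin with the same letters, which is one reason a real system relies on morphological analysis, as the abstract notes.

```python
# Hypothetical mini-lexicon: Egyptian colloquial word -> MSA equivalent.
COLLOQUIAL_TO_MSA = {
    "ده": "هذا",    # "this"
    "فين": "أين",   # "where"
    "إزاي": "كيف",  # "how"
}

# Imperfect-verb prefixes that may follow the colloquial "ب" marker.
IMPERFECT_PREFIXES = ("ي", "ت", "ن")


def transfer_word(word: str) -> str:
    """Map one Egyptian colloquial word to an MSA form where a rule applies.

    Order of rules: exact lexicon lookup first, then the naive
    prefix-stripping rule; unknown words are returned unchanged.
    """
    if word in COLLOQUIAL_TO_MSA:
        return COLLOQUIAL_TO_MSA[word]
    # Naive rule: drop a leading "ب" before an imperfect prefix.
    # The length guard avoids the shortest false matches but not all of them.
    if len(word) >= 5 and word[0] == "ب" and word[1] in IMPERFECT_PREFIXES:
        return word[1:]
    return word


def transfer_sentence(sentence: str) -> str:
    """Apply word-level transfer to a whitespace-tokenized sentence."""
    return " ".join(transfer_word(w) for w in sentence.split())
```

For instance, `transfer_word("بيكتب")` drops the colloquial prefix and yields "يكتب", while a word that matches no rule, such as "كتاب", passes through unchanged.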

Khaled Shaalan

Professor of Computer Science
