Arabic

Shaalan, K., and H. Talhami, "Error analysis and handling in Arabic ICALL systems", IASTED International Conference on Artificial Intelligence and Applications (AIA 2006), Innsbruck, Austria, ACTA Press, pp. 109–114, February, 2006.

Arabic is a Semitic language that is rich in its morphology and syntax. The very numerous and complex grammar rules of the language could be confusing even for Arabic native speakers. Many Arabic intelligent computer-assisted language-learning (ICALL) systems have neither deep error analysis nor sophisticated error handling. In this paper, we report an attempt at developing an error analyzer and error handler for Arabic as an important part of the Arabic ICALL system. In this system, the learners are encouraged to construct sentences freely in various contexts and are guided to recognize by themselves the errors or inappropriate usage of their language constructs. We used natural language processing (NLP) tools such as a morphological analyzer and a syntax analyzer for error analysis and to give feedback to the learner. Furthermore, we propose a mechanism of correction by the learner, which allows the learner to correct the typed sentence independently. This will result in the learner being able to figure out what the error is. Examples of error analysis and error handling will be given and will illustrate how the system works.

Ezzat, M., K. Shaalan, and A. Fahmy, "Component Composition Analysis for Arabic Natural Language Processing", The 6th Conference on Language Engineering, Egyptian Society of Language Engineering (ELSE), Cairo, Egypt, Dec., 2006.

Building NLP applications from scratch is a difficult task that takes a lot of time and requires acquiring a lot of NLP knowledge. For a rich language like Arabic, the difficulty increases significantly. In this paper, we investigate how to build a tool that helps NLP application developers build applications rapidly and robustly. It involves two steps: first, using COM object technology to build common NLP tools; second, building NLP applications that use these tools and can access them either locally or remotely. We demonstrate the capabilities of COM objects by developing NLP tools such as a morphological analyzer and using it to build two Arabic NLP applications.

Talhami, H., and K. Shaalan, "An Arabic/English switch for audio indexing and dialogue management", IASTED International Conference on Internet and Multimedia Systems and Applications (EuroIMSA 2006), Innsbruck, Austria, ACTA Press, pp. 189–192, 2006.

This paper presents a technique for automatic switching between Arabic and English, developed for audio indexing and dialogue management applications. It classifies utterances and sub-utterances as either U.S. English or Modern Standard Arabic (MSA) in a closed system. The approach extends the work of Zissman and Singer[1] to the problem of Arabic/English language identification. Two sets of acoustic phoneme models (English and Arabic HMMs) and two language models (phone bigrams) per acoustic model set are used. Four Large Vocabulary Continuous Speech Recognition (LVCSR) passes are performed (one for each HMM + language model set), using a phone loop grammar. The four path scores are fed into a Bayesian classifier (a multi-layer perceptron), which classifies each utterance as either English or Arabic. The technique demonstrated high accuracy on test data unseen by the system during the modelling process. The language switch has been used successfully as a front-end processor in an audio indexing and retrieval system as well as in a dialogue management system.
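
The final decision step described above, where four path scores are combined into one English/Arabic label, can be sketched as follows. This is only an illustration: a single logistic unit with hand-picked weights stands in for the trained multi-layer perceptron, and the function name, the score ordering, and the weights are all assumptions rather than details from the paper.

```python
import math
from typing import Sequence, Tuple


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify_language(
    path_scores: Sequence[float],
    weights: Sequence[float] = (1.0, 1.0, -1.0, -1.0),  # hypothetical, not trained
    bias: float = 0.0,
) -> Tuple[str, float]:
    """Map four recognition path scores to a language label.

    Assumed ordering of path_scores: the two passes that use the English
    HMMs first, then the two passes that use the Arabic HMMs. Scores are
    treated as log-likelihoods, so higher (less negative) is better.
    """
    if len(path_scores) != 4 or len(weights) != 4:
        raise ValueError("expected exactly four path scores and four weights")
    z = bias + sum(w * s for w, s in zip(weights, path_scores))
    p_english = _sigmoid(z)
    label = "English" if p_english >= 0.5 else "Arabic"
    return label, p_english


# Toy scores: the English passes fit the first utterance better.
print(classify_language((-100.0, -105.0, -120.0, -125.0)))
print(classify_language((-130.0, -128.0, -101.0, -99.0)))
```

In a real system the weights would be learned from labelled utterances, and a multi-layer perceptron could capture interactions between the four scores that a single linear unit cannot.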

Shaalan, K., A. Abdel-Monem, A. Rafea, and H. Baraka, "Mapping Interlingua Representations to Feature Structures of Arabic Sentences", The Challenge of Arabic for NLP/MT International Conference, the British Computer Society, London, UK, British Computer Society (BCS), pp. 149–159, October, 2006.

The interlingua approach to Machine Translation (MT) aims to achieve the translation task in two independent steps. First, the meanings of source language sentences are represented in an intermediate (interlingua) representation. Then, sentences of the target language are generated from those meaning representations. In the generation of the target sentence, determining sentence structures becomes more difficult, especially when the interlingua does not contain any syntactic information. Hence, the sentence structures cannot be transferred exactly from the interlingua representations. In this paper, we present a mapping approach for task-oriented interlingua-based spoken dialogue that transforms an interlingua representation, the so-called Interchange Format (IF), into a feature structure (FS) that reflects the syntactic structure of the target Arabic sentence. This approach addresses the problem of determining Arabic syntactic structure in the interlingua approach. A mapper is developed primarily within the framework of the NESPOLE! (NEgotiating through SPOken Language in E-commerce) multilingual speech-to-speech MT project. The IF-to-Arabic FS mapper is implemented in SICStus Prolog. Examples of Arabic syntactic mapping, using the output from the English analyzer provided by Carnegie Mellon University (CMU), illustrate how the system works.

Shaalan, K., and H. Talhami, "Arabic Error Feedback in an Online Arabic Learning System", Advances in Natural Language Processing, Research in Computing Science (RCS) Journal, vol. 18, pp. 203–212, 2006.

Arabic is a Semitic language that is rich in its morphology and syntax. The very numerous and complex grammar rules of the language could be confusing even for Arabic native speakers. Many Arabic intelligent computer-assisted language-learning (ICALL) systems have neither deep error analysis nor sophisticated error handling. In this paper, we report an attempt at developing an error analyzer and error handler for Arabic as an important part of the Arabic ICALL system. In this system, the learners are encouraged to construct sentences freely in various contexts and are guided to recognize by themselves the errors or inappropriate usage of their language constructs. We used natural language processing (NLP) tools such as a morphological analyzer and a syntax analyzer for error analysis and to give feedback to the learner. Furthermore, we propose a mechanism of correction by the learner, which allows the learner to correct the typed sentence independently. This will result in the learner being able to figure out what the error is. Examples of error analysis and error handling will be given and will illustrate how the system works.

Nabhan, A., A. Rafea, and K. Shaalan, "Enhancing Phrase Extraction from Word Alignments Using Morphology", The 5th Conference on Language Engineering, Egyptian Society of Language Engineering (ELSE), Cairo, Egypt, Ain Shams University, pp. 57–65, September, 2005.

We propose a technique for effective extraction of bilingual phrases from word alignments using morphological processing. Morphological processing increases the frequency of words in the corpus and consequently reduces the Alignment Error Rate (AER). Intuitively, better word alignments enhance the quality of the extracted bilingual phrases. Using alignments of a stemmed corpus for phrase extraction, instead of alignments of a raw one, shows significant improvements in translation quality, especially with small corpora.
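
As a rough illustration of what extracting bilingual phrases from word alignments involves, the sketch below uses the standard consistency criterion from phrase-based statistical MT: a source span and a target span form a phrase pair only if no alignment link leaves the pair. It is not taken from the paper; the function name, the `max_len` limit, and the toy transliterated sentence are assumptions, and the usual extension over unaligned boundary words is omitted for brevity.

```python
from typing import List, Set, Tuple

# One alignment link: (source word index, target word index).
Alignment = Set[Tuple[int, int]]


def extract_phrase_pairs(
    src: List[str],
    tgt: List[str],
    alignment: Alignment,
    max_len: int = 4,
) -> List[Tuple[str, str]]:
    """Return phrase pairs that are consistent with the word alignment."""
    pairs = []
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: every link touching the target span must
            # originate inside the source span.
            consistent = all(
                s_start <= s <= s_end
                for (s, t) in alignment
                if t_start <= t <= t_end
            )
            if consistent:
                pairs.append(
                    (
                        " ".join(src[s_start : s_end + 1]),
                        " ".join(tgt[t_start : t_end + 1]),
                    )
                )
    return pairs


# Toy example with transliterated Arabic; the alignment is hand-written.
print(extract_phrase_pairs(["big", "house"], ["bayt", "kabir"], {(0, 1), (1, 0)}))
# [('big', 'kabir'), ('big house', 'bayt kabir'), ('house', 'bayt')]
```

Under the approach in the abstract, the alignment would be computed on a stemmed corpus, so the links passed to a routine like this one would come from less sparse word statistics.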

Shaalan, K., H. Talhami, and I. Kamel, "A Morphological Generator for the Indexing of Arabic Audio", Proceedings of the IASTED International Conference on Artificial Intelligence and Soft Computing (ASC), Benidorm, Spain, ACTA Press, pp. 307–312, September, 2005.

This paper presents a novel Arabic morphological generator (AMG) for Modern Standard Arabic (MSA) which is designed and implemented using Prolog. The AMG is used to generate inflected forms of words used for the indexing of Arabic audio. These words are also the relevant terms in the Arab authority system (library information retrieval system) used in this study. The AMG generates inflected Arabic words from the root according to pre-specified morphological features that can be extended as needed. The Arabic word is represented as a feature structure which is handled through unification during the morphological generation process. The inflected forms can then be inserted automatically into a speech recognition grammar which is used to identify these words in an audio sequence or utterance.
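
The idea of handling a word as a feature structure through unification can be illustrated with a small sketch. The paper's generator is written in Prolog, where unification is built in; the Python function below is only an approximation over nested dictionaries, and the feature names and values (`root`, `agr`, `ktb`, and so on) are hypothetical.

```python
from typing import Any, Dict, Optional

FeatureStructure = Dict[str, Any]


def unify(a: FeatureStructure, b: FeatureStructure) -> Optional[FeatureStructure]:
    """Merge two feature structures, or return None if they conflict.

    Nested dictionaries are unified recursively; atomic values must be
    equal to unify. Neither input is modified.
    """
    result = dict(a)
    for key, value in b.items():
        if key not in result:
            result[key] = value
        elif isinstance(result[key], dict) and isinstance(value, dict):
            merged = unify(result[key], value)
            if merged is None:
                return None
            result[key] = merged
        elif result[key] != value:
            return None
    return result


# A lexical entry combined with requested inflection features.
entry = {"root": "ktb", "pos": "verb"}
requested = {"agr": {"number": "sg", "gender": "masc"}, "tense": "past"}
print(unify(entry, requested))

# Conflicting gender values do not unify.
print(unify({"agr": {"gender": "masc"}}, {"agr": {"gender": "fem"}}))  # None
```

A generator built on this idea would select the affix pattern whose feature structure unifies with the requested features and then apply it to the root.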

Shaalan, K., "An Intelligent Computer Assisted Language Learning System for Arabic Learners", Computer Assisted Language Learning, vol. 18, no. 1-2: Routledge, part of the Taylor & Francis Group, pp. 81–109, 2005.

This paper describes the development of an intelligent computer-assisted language learning (ICALL) system for learning Arabic. This system could be used for learning Arabic by students at primary schools or by learners of Arabic as a second or foreign language. It explores the use of Natural Language Processing (NLP) techniques for learning Arabic. The learners are encouraged to produce sentences freely in various situations and contexts and are guided to recognize by themselves the erroneous or inappropriate functions of their misused expressions. In this system, we use NLP tools (including a morphological analyzer and a syntax analyzer) and an error analyzer to issue feedback to the learner. Furthermore, we propose a mechanism of correction by the learner, which allows the learner to correct the typed sentence independently and to work out what the error is.

Shaalan, K., "Arabic GramCheck: a grammar checker for Arabic", Software Practice and Experience, vol. 35, no. 7, New York, NY, USA, John Wiley & Sons, Inc., pp. 643–665, 2005.

Arabic is a Semitic language that is rich in its morphology and syntax. The very numerous and complex grammar rules of the language may be confusing for the average user of a word processor. In this paper, we report our attempt at developing a grammar checker program for Modern Standard Arabic, called Arabic GramCheck. Arabic GramCheck can help the average user by checking his/her writing for certain common grammatical errors; it describes the problem for him/her and offers suggestions for improvement. The use of the Arabic grammatical checker can increase productivity and improve the quality of the text for anyone who writes Arabic. Arabic GramCheck has been successfully implemented using SICStus Prolog on an IBM PC. The current implementation covers a well-formed subset of Arabic and focuses on people trying to write in a formal style. Successful tests have been performed using a set of Arabic sentences. Comparing the results with the output of a commercially available Arabic grammar checker suggests that the approach is promising.

Shaalan, K., A. Rafea, A. Abdel-Moneim, and H. Baraka, "Machine Translation of English Noun Phrases into Arabic", The International Journal of Computer Processing of Oriental Languages, vol. 17, no. 2, pp. 121–134, 2004.

The present work reports our attempt at automating the translation of English noun phrases (NPs) into Arabic. Translating NPs is a very important step toward sentence translation, since NPs form the majority of the textual content of scientific and technical documents. The system is implemented in Prolog, and the parser is written in the DCG formalism. The paper also describes our experience with the developed MT system and reports results of its application to real titles of theses from the computer science domain.

Khaled Shaalan

Professor of Computer Science
