Natural Language Processing

Shaalan, K., and H. Talhami, "Arabic Error Feedback in an Online Arabic Learning System", Advances in Natural Language Processing, Research in Computing Science (RCS) Journal, vol. 18, pp. 203-212, 2006.

Arabic is a Semitic language that is rich in its morphology and syntax. The numerous and complex grammar rules of the language can be confusing even for native Arabic speakers. Many Arabic intelligent computer-assisted language learning (ICALL) systems have neither deep error analysis nor sophisticated error handling. In this paper, we report an attempt at developing an error analyzer and error handler for Arabic as an important part of the Arabic ICALL system. In this system, the learners are encouraged to construct sentences freely in various contexts and are guided to recognize by themselves the errors or inappropriate usage of their language constructs. We used natural language processing (NLP) tools such as a morphological analyzer and a syntax analyzer for error analysis and to give feedback to the learner.
Furthermore, we propose a mechanism of correction by the learner, which allows the learner to correct the typed sentence independently. This will result in the learner being able to figure out what the error is. Examples of error analysis and error handling will be given and will illustrate how the system works.

Shaalan, K., H. Talhami, and I. Kamel, "A Morphological Generator for the Indexing of Arabic Audio", the Proceedings of The IASTED International Conference on Artificial Intelligence and Soft Computing (ASC), Benidorm, Spain, ACTA Press, pp. 307–312, September, 2005.

This paper presents a novel Arabic morphological generator (AMG) for Modern Standard Arabic (MSA) which is designed and implemented using Prolog. The AMG is used to generate inflected forms of words used for the indexing of Arabic audio. These words are also the relevant terms in the Arab authority system (library information retrieval system) used in this study. The AMG generates inflected Arabic words from the root according to pre-specified morphological features that can be extended as needed. The Arabic word is represented as a feature structure which is handled through unification during the morphological generation process. The inflected forms can then be inserted automatically into a speech recognition grammar which is used to identify these words in an audio sequence or utterance.
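Unification of feature structures, as mentioned in the abstract above, can be illustrated with a minimal Python sketch. This is not the paper's Prolog implementation: the nested-dictionary representation, the function name unify, and the feature names (root, cat, agr, person, number) are all assumptions chosen for illustration. The value "ktb" stands for the common Arabic root k-t-b.

```python
from typing import Optional


def unify(a: dict, b: dict) -> Optional[dict]:
    """Unify two feature structures represented as nested dicts.

    Returns a new merged structure, or None if the two structures
    assign conflicting values to the same feature.
    """
    result = dict(a)  # shallow copy so the inputs are not modified
    for feature, b_value in b.items():
        if feature not in result:
            # Feature only present in b: simply add it.
            result[feature] = b_value
            continue
        a_value = result[feature]
        if isinstance(a_value, dict) and isinstance(b_value, dict):
            # Both values are sub-structures: unify them recursively.
            merged = unify(a_value, b_value)
            if merged is None:
                return None
            result[feature] = merged
        elif a_value != b_value:
            # Atomic values that differ cannot be unified.
            return None
    return result


# Hypothetical lexical entry and requested morphological features.
lexical_entry = {"root": "ktb", "cat": "verb"}
requested = {"cat": "verb", "agr": {"person": 3, "number": "sg"}}

merged = unify(lexical_entry, requested)
clash = unify(merged, {"agr": {"number": "pl"}})

print(merged)
print(clash)
```

In this sketch, compatible features are merged into one structure, while the second call fails because the singular and plural values of number conflict.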

Shaalan, K., "An Intelligent Computer Assisted Language Learning System for Arabic Learners", Computer Assisted Language Learning, vol. 18, no. 1-2: Routledge, part of the Taylor & Francis Group, pp. 81-109, 2005.

This paper describes the development of an intelligent computer-assisted language learning (ICALL) system for learning Arabic. This system could be used for learning Arabic by students at primary schools or by learners of Arabic as a second or foreign language. It explores the use of Natural Language Processing (NLP) techniques for learning Arabic. The learners are encouraged to produce sentences freely in various situations and contexts and are guided to recognize by themselves the erroneous or inappropriate functions of their misused expressions. In this system, we use NLP tools (including a morphological analyzer and a syntax analyzer) and an error analyzer to issue feedback to the learner. Furthermore, we propose a mechanism of correction by the learner, which allows the learner to correct the typed sentence independently and to realize what the error is.

Shaalan, K., "Arabic GramCheck: a grammar checker for Arabic", Software Practice and Experience, vol. 35, no. 7, New York, NY, USA, John Wiley & Sons, Inc., pp. 643–665, 2005.

Arabic is a Semitic language that is rich in its morphology and syntax. The numerous and complex grammar rules of the language may be confusing for the average user of a word processor. In this paper, we report our attempt at developing a grammar checker program for Modern Standard Arabic, called Arabic GramCheck. Arabic GramCheck can help the average user by checking his/her writing for certain common grammatical errors; it describes the problem for him/her and offers suggestions for improvement. The use of the Arabic grammatical checker can increase productivity and improve the quality of the text for anyone who writes Arabic. Arabic GramCheck has been successfully implemented using SICStus Prolog on an IBM PC. The current implementation covers a well-formed subset of Arabic and focuses on people trying to write in a formal style. Successful tests have been performed using a set of Arabic sentences. Comparing the results with the output of a commercially available Arabic grammar checker, we conclude that the approach is promising.

El-Beltagy, S., M. Said, and K. Shaalan, "A Framework for Information Extraction, Storage and Retrieval", International Computer Engineering Conference: New Technologies for the Information Society (ICENCO'2004), Cairo, Egypt, Faculty of Engineering, Dec, 2004.

This paper presents a set of tools that were developed in order to facilitate and speed up the process of building information extraction and retrieval systems for documents that exhibit a set of predefined characteristics. Specifically, the work presents a simple framework for extracting information found in publications or documents that are issued in large volumes and which cover similar concepts or issues within a given domain. The paper presents a simple model for defining background knowledge and for using it to automatically augment segments of input documents with metadata, in order to help users easily locate information within these documents through a structured front end. The model presented makes use of both document structure and dynamically acquired background knowledge to achieve its goals.

Shaalan, K., A. Rafea, A. Abdel-Moneim, and H. Baraka, "Machine Translation of English Noun Phrases into Arabic", The International Journal of Computer Processing of Oriental Languages, vol. 17, no. 2, pp. 121–134, 2004.

The present work reports our attempt at automating the translation of English noun phrases (NPs) into Arabic. Translating NPs is a very important step toward sentence translation, since NPs form the majority of the textual content of scientific and technical documents. The system is implemented in Prolog and the parser is written in the DCG formalism. The paper also describes our experience with the developed MT system and reports the results of applying it to real titles of theses from the computer science domain.

Othman, E., K. Shaalan, and A. Rafea, "Towards resolving ambiguity in understanding Arabic sentence", International Conference on Arabic Language Resources and Tools, Cairo, Egypt, NEMLAR, pp. 118–122, Sep, 2004.

Ambiguity is a major reason why computers do not yet understand natural language. We have made great strides towards developing morphological and syntactic analysis tools for Arabic in recent years. The absence of diacritics, which represent most vowels, in written text creates ambiguity that hinders the development of Arabic natural language processing applications. Thus, ambiguity increases the range of possible interpretations of natural language. In this paper, we give a road map of solutions to common ambiguity problems inherent in parsing Arabic sentences.

Abdel-Monem, A., K. Shaalan, A. Rafea, and H. Baraka, "A Proposed Approach for Generating Arabic from Interlingua in a Multilingual Machine Translation System", Language Engineering conference, Cairo, Egypt, Ain Shams University, pp. 197–206, Oct, 2003.

Interlingua (meaning) representation has been successfully used in multilingual machine translation. This paper reports our attempt to generate Arabic sentences from interlingua. The proposed system will be compatible with the NESPOLE consortium. In NESPOLE, an interlingua called interchange format (IF), designed for travel planning, is used. Our approach describes how to generate grammatically correct Arabic sentences from interlingua. It involves two main components: a mapper for converting interlingua into a syntactic structure (feature structure) and a generator for producing the target Arabic sentence that represents the intended meaning. A translation example is provided to explain the inner workings of the system.

Othman, E., K. Shaalan, and A. Rafea, "A chart parser for analyzing modern standard Arabic sentence", MT Summit IX Workshop on Machine Translation for Semitic Languages: Issues and Approaches, New Orleans, Louisiana, USA, ACL, pp. 37–44, September, 2003.

The parsing of Arabic sentences is a necessary prerequisite for many natural language processing applications such as machine translation and information retrieval. In this paper we report our attempt to develop an efficient chart parser for analyzing Modern Standard Arabic (MSA) sentences. From a practical point of view, the parser is able to satisfy syntactic constraints, reducing parsing ambiguity. Lexical semantic features are also used to disambiguate the sentence structure. We also explain an Arabic morphological analyzer based on the ATN technique. Both the Arabic parser and the Arabic morphological analyzer are implemented in Prolog. The linguistic rules were acquired from a set of MSA sentences in the agriculture domain.
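As a rough illustration of chart parsing in general, the sketch below implements a CYK recognizer in Python. It is not the parser described in the abstract above, which is written in Prolog and whose algorithm is not specified here. The toy English lexicon, the grammar rules, and all names (cyk_recognize, LEXICON, RULES) are assumptions made for illustration only.

```python
from itertools import product


def cyk_recognize(tokens, lexicon, rules, start="S"):
    """Return True if `tokens` can be derived from `start`.

    lexicon: maps a word to the set of categories it may have.
    rules:   maps a pair (left, right) of categories to the set of
             parent categories (binary rules, Chomsky normal form).
    """
    n = len(tokens)
    # chart[i][j] holds every category that spans tokens[i:j].
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, token in enumerate(tokens):
        chart[i][i + 1] = set(lexicon.get(token, ()))
    # Build longer spans from pairs of adjacent shorter spans.
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left, right in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= rules.get((left, right), set())
    return start in chart[0][n]


# Hypothetical toy lexicon; "plants" is ambiguous between verb and noun.
LEXICON = {
    "the": {"Det"},
    "farmer": {"N"},
    "plants": {"V", "N"},
    "wheat": {"N", "NP"},
}
# Hypothetical binary grammar rules.
RULES = {
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("NP", "VP"): {"S"},
}

accepted = cyk_recognize("the farmer plants wheat".split(), LEXICON, RULES)
rejected = cyk_recognize("farmer the plants wheat".split(), LEXICON, RULES)
print(accepted, rejected)
```

The chart stores every category found for every span, so the noun reading of the ambiguous word "plants" is recorded but never contributes to a complete sentence. This is one simple way in which syntactic constraints reduce ambiguity.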

Shaalan, K., "Development of Computer Assisted Language Learning System for Arabic Using Natural Language Processing Techniques", Egyptian Informatics Journal, vol. 4, no. 2: Faculty of Computers and Information, pp. 131–155, Dec, 2003.

This paper describes the development of a computer-assisted language learning (CALL) system for learning Arabic using natural language processing (NLP) techniques. This system can be used for learning Arabic by students at primary schools. It provides grammar practice for learners of Arabic. The learners are encouraged to produce sentences freely in various situations and contexts and are guided to recognize by themselves the erroneous or inappropriate functions of their misused expressions. In this system, we use NLP tools (including a morphological analyzer and a syntax analyzer) and an error analyzer to give adequate feedback to the learner. Furthermore, we propose a mechanism of correction by the learner, which allows the learner to correct the typed sentence by herself/himself and to realize what error she/he has made.

Tourism