Natural Language Processing

Shaalan, K., A. Farouk, and A. Rafea, "Towards An Arabic Parser for Modern Scientific Text", the International Conference on Artificial Intelligence for Decision, Control and Automation in Engineering and Industrial Applications (ACIDCA'2000), Monastir, Tunisia, pp. 228–235, March 2000.

The present work reports our attempt to develop an Arabic parser for modern scientific text. The parser is written in Definite Clause Grammar (DCG) and is intended to be part of a machine translation system. The development of the parser was a two-step process. In the first step, we acquired the rules that constitute a grammar for Arabic, one that gives a precise account of what it means for a sentence to be grammatical. The grammar covers text from the domain of agricultural extension documents. The second step was to implement the parser, which assigns grammatical structure to the input sentence. The paper also describes our experience with the developed parser and the results of applying it to a real agricultural extension document.

Shaalan, K., "Machine Translation of Arabic Interrogative Sentence into English", the 8th International Conference on Artificial Intelligence Applications, Cairo, Egypt, American University in Cairo, pp. 473–483, 2000.

The present work reports our attempt to develop a bilingual Machine Translation (MT) tool in the agriculture domain. The work described here is part of ongoing research to automate the translation of user interfaces of knowledge-based systems. In particular, we describe the translation of Arabic interrogative sentences into English. At the Central Laboratory for Agricultural Expert Systems (CLAES), this tool has been found to be essential in developing bilingual (Arabic-to-English) expert systems, because both the Arabic and the English versions are needed for development and usage purposes. The tool follows the transfer-based MT approach. A major design goal is that the tool can be used stand-alone and can also be integrated with a general MT system for Arabic sentences. The paper also describes our experience with the developed MT system and reports the results of applying it to interrogatives from real agricultural expert systems.

Shaalan, K., A. Allam, and A. Gomah, "Towards automatic spell checking for Arabic", Proceedings of the Fourth Conference on Language Engineering, Egyptian Society of Language Engineering (ESLE), Egypt: Faculty of Engineering, pp. 240–247, October 2003.

Arabic's rich morphology (word construction) and complex orthography (writing system) present unique challenges for automatic spell checking. An Arabic spell checker attempts to find a dictionary word that might be the correct spelling of a misspelled or misrecognized word. In this paper, we report our attempt to develop an Arabic spelling checker program that addresses this problem. Our approach is heuristic and involves developing an Arabic morphological analyzer, techniques for spell checking and spelling correction, and efficient methods for lexicon operations. The developed Arabic spell checker is able to recognize common spelling errors in Standard Arabic and Egyptian dialects.
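
The idea of finding a dictionary word close to a misspelled one can be illustrated with a minimal sketch. This is not the heuristic, morphology-aware method of the paper; it only shows plain edit-distance lookup against a word list. The names `edit_distance`, `suggest`, and `LEXICON` are hypothetical, and the four lexicon entries are Buckwalter-style transliterations chosen purely for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one dynamic-programming row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(word, lexicon, max_distance=1):
    """Return the word itself if it is known, else nearby lexicon entries."""
    if word in lexicon:
        return [word]
    scored = sorted((edit_distance(word, entry), entry) for entry in lexicon)
    return [entry for dist, entry in scored if dist <= max_distance]


# Hypothetical toy lexicon (transliterated): book, writer, office, school.
LEXICON = {"ktAb", "kAtb", "mktb", "mdrsp"}

if __name__ == "__main__":
    print(suggest("ktb", LEXICON))
```

A real Arabic checker would need more than this: because one stem can carry many prefixes and suffixes, a flat word list is a poor fit, which is presumably why the approach described above pairs correction with a morphological analyzer.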

Shaalan, K., "Development of Computer Assisted Language Learning System for Arabic Using Natural Language Processing Techniques", Egyptian Informatics Journal, vol. 4, no. 2: Faculty of Computers and Information, pp. 131–155, December 2003.

This paper describes the development of a computer-assisted language learning (CALL) system for learning Arabic using natural language processing (NLP) techniques. The system can be used by primary school students and provides grammar practice for learners of Arabic. Learners are encouraged to produce sentences freely in various situations and contexts, and are guided to recognize for themselves the erroneous or inappropriate functions of their misused expressions. The system uses NLP tools (including a morphological analyzer and a syntax analyzer) and an error analyzer to give adequate feedback to the learner. Furthermore, we propose a learner-correction mechanism that allows the learner to correct the typed sentence herself/himself and to realize what error she/he has made.

Othman, E., K. Shaalan, and A. Rafea, "A chart parser for analyzing modern standard Arabic sentence", MT Summit IX Workshop on Machine Translation for Semitic Languages: Issues and Approaches, New Orleans, Louisiana, USA, ACL, pp. 37–44, September 2003.

The parsing of Arabic sentences is a necessary prerequisite for many natural language processing applications such as machine translation and information retrieval. In this paper we report our attempt to develop an efficient chart parser for analyzing Modern Standard Arabic (MSA) sentences. From a practical point of view, the parser is able to satisfy syntactic constraints, reducing parsing ambiguity. Lexical semantic features are also used to disambiguate the sentence structure. We also describe an Arabic morphological analyzer based on the augmented transition network (ATN) technique. Both the Arabic parser and the Arabic morphological analyzer are implemented in Prolog. The linguistic rules were acquired from a set of MSA sentences in the agriculture domain.
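
As a rough illustration of what a chart parser does, the sketch below fills a chart bottom-up in the CKY style, so that each span of the input is analyzed only once. It is not the Prolog parser described in the abstract: the grammar, the lexicon, and the names `BINARY_RULES`, `LEXICAL_RULES`, `build_chart`, and `recognizes` are all hypothetical, and an English-like toy grammar is used only for readability.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (left, right) -> parents.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("V", "NP"): {"VP"},
    ("Det", "N"): {"NP"},
}
# Hypothetical lexicon: word -> possible categories.
LEXICAL_RULES = {
    "the": {"Det"},
    "farmer": {"N"},
    "crop": {"N"},
    "waters": {"V"},
}


def build_chart(tokens):
    """Map each span (start, end) to the set of categories that cover it."""
    n = len(tokens)
    chart = defaultdict(set)
    # Spans of width 1 come straight from the lexicon.
    for i, token in enumerate(tokens):
        chart[(i, i + 1)] |= LEXICAL_RULES.get(token, set())
    # Wider spans are built by combining two adjacent, already-filled spans.
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for left in chart[(start, mid)]:
                    for right in chart[(mid, end)]:
                        chart[(start, end)] |= BINARY_RULES.get((left, right), set())
    return chart


def recognizes(tokens, goal="S"):
    """True if the whole token sequence can be analyzed as `goal`."""
    return goal in build_chart(tokens)[(0, len(tokens))]


if __name__ == "__main__":
    print(recognizes("the farmer waters the crop".split()))
```

The chart is what keeps ambiguity manageable: every category found for a span is stored once and reused by all larger spans, and constraints such as agreement or lexical semantic features could be checked at the point where two spans are combined.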

Abdel-Monem, A., K. Shaalan, A. Rafea, and H. Baraka, "A Proposed Approach for Generating Arabic from Interlingua in a Multilingual Machine Translation System", Language Engineering conference, Cairo, Egypt, Ain Shams University, pp. 197–206, October 2003.

Interlingua (meaning) representation has been successfully used in multilingual machine translation. This paper reports our attempt to generate Arabic sentences from interlingua. The proposed system will be compatible with the NESPOLE consortium. NESPOLE uses an interlingua called the interchange format (IF), which was designed for travel planning. Our approach describes how to generate grammatically correct Arabic sentences from the interlingua. It involves two main components: a mapper that converts the interlingua into a syntactic structure (feature structure), and a generator that produces the target Arabic sentence representing the intended meaning. A translation example is provided to explain the inner workings of the system.

Othman, E., K. Shaalan, and A. Rafea, "Towards resolving ambiguity in understanding Arabic sentence", International Conference on Arabic Language Resources and Tools, Cairo, Egypt, NEMLAR, pp. 118–122, September 2004.

Ambiguity is a major reason why computers do not yet understand natural language. In recent years we have made great strides towards developing morphological and syntactic analyzers for Arabic. However, the absence of diacritics, which represent most vowels, in written text creates ambiguity that hinders the development of Arabic natural language processing applications, because it increases the range of possible interpretations. In this paper, we give a road map of solutions to common ambiguity problems inherent in the parsing of Arabic sentences.

Shaalan, K., A. Rafea, A. Abdel-Moneim, and H. Baraka, "Machine Translation of English Noun Phrases into Arabic", The International Journal of Computer Processing of Oriental Languages, vol. 17, no. 2, pp. 121–134, 2004.

The present work reports our attempt to automate the translation of English noun phrases (NPs) into Arabic. Translating NPs is a very important step toward sentence translation, since NPs form the majority of the textual content of scientific and technical documents. The system is implemented in Prolog, and the parser is written in the DCG formalism. The paper also describes our experience with the developed MT system and reports the results of applying it to real thesis titles from the computer science domain.

El-Beltagy, S., M. Said, and K. Shaalan, "A Framework for Information Extraction, Storage and Retrieval", International Computer Engineering Conference: New Technologies for the Information Society (ICENCO'2004), Cairo, Egypt, Faculty of Engineering, December 2004.

This paper presents a set of tools developed to facilitate and speed up the process of building information extraction and retrieval systems for documents that exhibit a set of predefined characteristics. Specifically, the work presents a simple framework for extracting information from publications or documents that are issued in large volumes and that cover similar concepts or issues within a given domain. The paper presents a simple model for defining background knowledge and for using it to automatically augment segments of input documents with metadata, so that users can easily locate information within these documents through a structured front end. The model makes use of both document structure and dynamically acquired background knowledge to achieve its goals.

Shaalan, K., "Arabic GramCheck: a grammar checker for Arabic", Software Practice and Experience, vol. 35, no. 7, New York, NY, USA, John Wiley & Sons, Inc., pp. 643–665, 2005.

Arabic is a Semitic language that is rich in its morphology and syntax. The numerous and complex grammar rules of the language may be confusing for the average user of a word processor. In this paper, we report our attempt at developing a grammar checker program for Modern Standard Arabic, called Arabic GramCheck. Arabic GramCheck can help the average user by checking his/her writing for certain common grammatical errors; it describes the problem for him/her and offers suggestions for improvement. The use of the Arabic grammar checker can increase productivity and improve the quality of the text for anyone who writes Arabic. Arabic GramCheck has been successfully implemented using SICStus Prolog on an IBM PC. The current implementation covers a well-formed subset of Arabic and focuses on people trying to write in a formal style. Successful tests have been performed using a set of Arabic sentences. Comparing the results with the output of a commercially available Arabic grammar checker, we conclude that the approach is promising.