Natural Language Processing

Shaalan, K. F., M. Magdy, and A. Fahmy, "Morphological Analysis of Ill-Formed Arabic Verbs in Intelligent Language Tutoring Framework", The 23rd International Florida Artificial Intelligence Research Society Conference (FLAIRS-23), Florida, USA, FLAIRS, pp. 277–282, May 2010.

Arabic is a language of rich and complex morphology. The nature and peculiarities of Arabic make its morphological and phonological rules confusing for second language learners (SLLs). The conjugation of Arabic verbs is central to the formulation of an Arabic sentence because of its richness of form and meaning. In this paper, we address issues related to the morphological analysis of ill-formed Arabic verbs in order to identify the source of errors and provide informative feedback to SLLs of Arabic. Edit distance and constraint relaxation techniques are used to demonstrate the capability of the proposed approach to generate all possible analyses of erroneous Arabic verbs written by SLLs. Filtering mechanisms are then applied to exclude irrelevant constructions and determine the target stem. A morphological analyzer has been developed and evaluated using real test data, and it achieved satisfactory results in terms of recall.
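
A minimal sketch of the general idea of relaxing exact lexicon lookup with edit distance and then filtering down to a target stem. It is not the analyzer described in the paper: the function names (levenshtein, candidate_stems, target_stems), the toy verb lexicon in a simplified Latin transliteration, and the "keep only the closest candidates" filter are all illustrative assumptions.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when letters match)
            ))
        prev = cur
    return prev[-1]


def candidate_stems(word: str, lexicon: List[str],
                    max_dist: int = 1) -> List[Tuple[int, str]]:
    """Relaxed lookup: accept any stem within max_dist edits, not only exact matches."""
    scored = ((levenshtein(word, stem), stem) for stem in lexicon)
    return sorted(pair for pair in scored if pair[0] <= max_dist)


def target_stems(word: str, lexicon: List[str], max_dist: int = 1) -> List[str]:
    """Hypothetical filter: keep only the closest candidates as the intended stem(s)."""
    candidates = candidate_stems(word, lexicon, max_dist)
    if not candidates:
        return []
    best = candidates[0][0]
    return [stem for dist, stem in candidates if dist == best]


# Hypothetical toy lexicon of verb stems in a simplified Latin transliteration.
LEXICON = ["kataba", "darasa", "jalasa", "kasaba"]
```

For a learner's misspelling such as "katba", relaxing the match to two edits admits both "kataba" (one edit) and "kasaba" (two edits); the filter then keeps "kataba". A real analyzer would score full morphological analyses (prefixes, suffixes, patterns) rather than bare strings.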

Shaalan, K., "Rule-based Approach in Arabic Natural Language Processing", the International Journal on Information and Communication Technologies (IJICT), vol. 3, no. 3: Serial Publications, pp. 11–19, 2010.

The rule-based approach has been used successfully in developing many natural language processing systems. Systems that use rule-based transformations are built on a core of solid linguistic knowledge, and the linguistic knowledge acquired for one natural language processing system may be reused to build the knowledge required for a similar task in another system. The advantage of the rule-based approach over the corpus-based approach is clear for: 1) less-resourced languages, for which large corpora, possibly parallel or bilingual, with representative structures and entities are neither available nor easily affordable; and 2) morphologically rich languages, which suffer from data sparseness even when corpora are available. These factors have motivated many researchers to fully or partially follow the rule-based approach in developing their Arabic natural language processing tools and systems. In this paper, we describe our successful efforts in applying the rule-based approach to different Arabic natural language processing tasks.

Shaalan, K., M. Magdy, and D. Samy, "Towards Resolving Morphological Ambiguity in Arabic Intelligent Language Tutoring Framework", The Seventh International Conference on Language Resources and Evaluation (LREC'10) Workshop on Supporting eLearning with Language Resources and Semantic Data, Valletta, Malta, LREC, 2010.

Ambiguity, which occurs when multiple interpretations of the same language phenomenon are produced, is a major issue in any NLP application. Given the complexity of the Arabic morphological system, it is difficult to determine the writer's intended meaning. Moreover, Intelligent Language Tutoring Systems, which need to analyze erroneous learner answers, generally employ techniques such as constraint relaxation that produce more interpretations than systems designed for processing well-formed input. This paper addresses issues related to the morphological disambiguation of corrected interpretations of erroneous Arabic verbs written by beginner to intermediate second language learners. The morphological disambiguation component has been developed and evaluated using real test data, and it achieved satisfactory results in terms of recall.

Shaalan, K., A. Hendam, and A. Rafea, "An English-Arabic Bi-directional Machine Translation Tool in the Agriculture Domain", Intelligent Information Processing V, vol. 340, Berlin, Heidelberg, Springer Boston, pp. 281–290, 2010.

This work reports our effort to develop an English-Arabic bi-directional Machine Translation (MT) tool in the agriculture domain, with the aim of automating the translation of expert systems. In particular, we describe the translation of the knowledge base, including prompts, responses, explanation text, and advice. At the Central Laboratory for Agricultural Expert Systems, this tool has proved essential in developing bi-directional (English-Arabic) expert systems, because both English and Arabic versions are needed for development, deployment, and usage purposes. The tool follows the rule-based transfer MT approach. A major design goal is that the tool can be used stand-alone and can also be integrated with a general (English-Arabic) MT system for Arabic scientific text. The paper also discusses our experience with the developed MT system and reports the results of applying it to real agricultural expert systems.
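
To illustrate what "rule-based transfer" means in general, the sketch below runs the classic analysis, transfer, and generation stages on a tiny English noun phrase. It is not the tool's rule base: the bilingual lexicon (Arabic in Buckwalter transliteration), the single reordering rule (English ADJ NOUN becomes Arabic NOUN ADJ, with the adjective copying the noun's definiteness), and all function names are hypothetical, and gender agreement and case endings are deliberately ignored.

```python
from typing import Dict, List, Tuple

# Hypothetical toy bilingual lexicon: English word -> (Arabic in Buckwalter
# transliteration, part of speech). Only masculine nouns, to sidestep agreement.
LEXICON: Dict[str, Tuple[str, str]] = {
    "new": ("jdyd", "ADJ"),
    "big": ("kbyr", "ADJ"),
    "book": ("ktAb", "NOUN"),
    "house": ("byt", "NOUN"),
}


def analyze(phrase: str) -> Tuple[bool, List[Tuple[str, str]]]:
    """Source analysis: detect definiteness, then look up each remaining word."""
    words = phrase.lower().split()
    definite = bool(words) and words[0] == "the"
    if words and words[0] in ("the", "a", "an"):
        words = words[1:]
    return definite, [LEXICON[w] for w in words]


def transfer(tagged: List[Tuple[str, str]]) -> List[str]:
    """Structural transfer rule: English ADJ NOUN -> Arabic NOUN ADJ."""
    if len(tagged) != 2 or tagged[0][1] != "ADJ" or tagged[1][1] != "NOUN":
        raise ValueError("this sketch only covers [Det] ADJ NOUN phrases")
    adjective, noun = tagged[0][0], tagged[1][0]
    return [noun, adjective]


def generate(definite: bool, stems: List[str]) -> str:
    """Target generation: the adjective agrees in definiteness, so both take Al-."""
    prefix = "Al" if definite else ""
    return " ".join(prefix + stem for stem in stems)


def translate_np(phrase: str) -> str:
    definite, tagged = analyze(phrase)
    return generate(definite, transfer(tagged))
```

With these assumptions, translate_np("the new book") yields "AlktAb Aljdyd" and translate_np("a big house") yields "byt kbyr". Keeping each stage separate is what lets a transfer system reuse the same analysis and generation modules when rules for the reverse (Arabic-English) direction are added.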

Shaalan, K., R. Aref, and A. Fahmy, "An Approach for Analyzing and Correcting Spelling Errors for Non-native Arabic learners", The 7th International Conference on Informatics and Systems (INFOS2010), Cairo, Egypt, Faculty of Computers and Information, 2010.

Spell checkers are widely used in many software products to identify errors in users' writing. However, they are not designed to address spelling errors made by non-native learners of a language. In fact, spelling errors made by non-native learners are more than just misspellings. Non-native learners' errors require special handling in terms of detection and correction, especially in morphologically rich languages such as Arabic, which have few related resources. In this paper, we address common error patterns made by non-native Arabic learners and propose a two-layer spell-checking approach consisting of spelling error detection and correction. The proposed error detection mechanism is applied on top of Buckwalter's Arabic morphological analyzer in order to demonstrate the capability of our approach in detecting possible spelling errors. The correction mechanism adopts a rule-based edit distance algorithm, whose rules are designed in accordance with common spelling error patterns made by Arabic learners. Error correction then uses a multiple-filtering mechanism to propose final corrections. The approach also utilizes semantic information given in the exercise questions in order to achieve highly accurate detection and correction of spelling errors made by non-native Arabic learners. Finally, the proposed approach was evaluated using real test data, and promising results were achieved.
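
One common way to make edit distance "rule-based" is to lower the substitution cost for letter pairs that learners are known to confuse. The sketch below shows that idea under stated assumptions: the confusion pairs and the 0.5 cost are hypothetical (written in Buckwalter transliteration), a plain word list stands in for detection with Buckwalter's morphological analyzer, and a single cost threshold plus a minimum-cost filter stands in for the paper's multiple filtering mechanism.

```python
from typing import Dict, FrozenSet, List

# Hypothetical confusion rules (Buckwalter transliteration): letter pairs that
# learners are assumed to confuse get a cheaper substitution cost than 1.0.
CONFUSION_COST: Dict[FrozenSet[str], float] = {
    frozenset("tp"): 0.5,  # t (taa) vs p (taa marbuta)
    frozenset("hp"): 0.5,  # h (haa) vs p (taa marbuta)
    frozenset("sS"): 0.5,  # s (seen) vs S (Saad)
    frozenset("dD"): 0.5,  # d (daal) vs D (Daad)
    frozenset("A>"): 0.5,  # A (bare alif) vs > (alif with hamza above)
}


def sub_cost(a: str, b: str) -> float:
    """Substitution cost: free for identical letters, cheap for a known confusion."""
    if a == b:
        return 0.0
    return CONFUSION_COST.get(frozenset((a, b)), 1.0)


def weighted_distance(a: str, b: str) -> float:
    """Edit distance with unit insert/delete costs and rule-weighted substitutions."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [float(i)]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1.0,                    # delete ca
                cur[j - 1] + 1.0,                 # insert cb
                prev[j - 1] + sub_cost(ca, cb),   # substitute (possibly a confusion)
            ))
        prev = cur
    return prev[-1]


def correct(word: str, dictionary: List[str], max_cost: float = 1.0) -> List[str]:
    """Detection: known words pass through. Correction: return the cheapest
    dictionary entries within max_cost (a single, simplified filter)."""
    if word in dictionary:
        return [word]
    scored = sorted((weighted_distance(word, w), w) for w in dictionary)
    scored = [(cost, w) for cost, w in scored if cost <= max_cost]
    if not scored:
        return []
    best = scored[0][0]
    return [w for cost, w in scored if cost == best]
```

For example, a learner writing "mdrsh" (haa instead of taa marbuta) is 0.5 away from "mdrsp" but 1.0 away from "mdrs", so the confusion rule makes the intended word win the ranking. The paper's use of the exercise question's semantics would add a further filter that this sketch omits.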

Khaled Shaalan

Professor of Computer Science