Natural Language Processing

Farouk, A., A. Rafea, and K. Shaalan, "Recognizing Semantic Concepts of Spoken Arabic Utterances using Genetic Technology", The Seventh Conference on Language Engineering, Egyptian Society of Language Engineering (ELSE), Cairo, Egypt, Ain Shams University, Dec., 2007.

Genetic algorithms (GAs) are a family of computational models inspired by evolution. GAs are mainly designed to solve optimization problems, which can be thought of as searching through a large number of candidates for the best one that can be found. In this paper, we present a genetic model to solve the problem of recognizing deep semantic concepts from spoken Arabic utterances. The aim of this algorithm is to automatically generate the grammar that recognizes each concept in the domain of discourse. This grammar is used to extract the observed concepts from the utterance. An experiment was conducted to measure the performance of our approach. The results were promising and confirmed the ability of this approach to identify the concepts of Arabic utterances taken from the travel and tourism domain.
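The search idea behind a genetic algorithm can be illustrated with a much simpler problem than grammar generation. The sketch below is not the paper's model: it assumes a toy setting in which a "rule" is just a subset of keywords, encoded as a bit string, and fitness is the F1 score of that rule on a handful of labeled utterances. The keyword list, the utterances, and the names `f1` and `evolve` are all hypothetical.

```python
import random

# Toy data (hypothetical): each utterance is a set of tokens plus a label saying
# whether it expresses the target concept (here, asking about a price).
KEYWORDS = ["price", "cost", "room", "hotel", "how_much", "trip"]
UTTERANCES = [
    ({"how_much", "room"}, True),
    ({"price", "trip"}, True),
    ({"hotel", "room"}, False),
    ({"cost", "hotel"}, True),
    ({"trip", "hotel"}, False),
]


def f1(genome):
    """Fitness: F1 score of the rule 'fire if any selected keyword occurs'."""
    selected = {k for k, bit in zip(KEYWORDS, genome) if bit}
    tp = fp = fn = 0
    for tokens, label in UTTERANCES:
        fired = bool(selected & tokens)
        if fired and label:
            tp += 1
        elif fired:
            fp += 1
        elif label:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evolve(fitness, genome_len, pop_size=20, generations=30,
           mutation_rate=0.1, seed=0):
    """Bit-string GA: tournament selection, one-point crossover,
    bit-flip mutation, and elitism (the best genome always survives)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def pick(population):
        # Binary tournament: sample two genomes, keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_pop = [best[:]]  # elitism
        while len(next_pop) < pop_size:
            p1, p2 = pick(pop), pick(pop)
            cut = rng.randrange(1, genome_len)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_pop.append(child)
        pop = next_pop
        best = max(pop, key=fitness)
    return best


best = evolve(f1, len(KEYWORDS))
print([k for k, bit in zip(KEYWORDS, best) if bit], f1(best))
```

Because of elitism, the best fitness never decreases from one generation to the next. A grammar-generating GA would replace the bit string with an encoding of grammar rules, but the select, recombine, and mutate loop would keep the same shape.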

Shaalan, K., and H. Raza, "Person name entity recognition for Arabic", ACL 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources, Prague, Czech Republic, Association for Computational Linguistics, pp. 17–24, 28 June, 2007.

Named entity recognition (NER) is an important task that identifies proper names in text and classifies them into different types of named entity, such as people, locations, and organizations. In this paper, we present our attempt at the recognition and extraction of the most important proper name entity, that is, the person name, for the Arabic language. We developed the system, Person Name Entity Recognition for Arabic (PERA), using a rule-based approach. The system consists of a lexicon, in the form of gazetteer name lists, and a grammar, in the form of regular expressions, which are responsible for recognizing person name entities. The PERA system is evaluated using a corpus that is tagged in a semi-automated way. The system performance results achieved were satisfactory and conform to the targets set for precision, recall, and F-measure.
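The combination of gazetteer lists and a regular-expression grammar can be sketched in a few lines. This is only an illustration of the general technique, not the PERA rules themselves: the gazetteer entries are toy examples written in Latin script for readability, and the single pattern (optional honorific, first name, optional last name) is an assumption.

```python
import re

# Toy gazetteers (hypothetical entries, Latin script for readability).
HONORIFICS = ["Dr.", "Prof."]
FIRST_NAMES = ["Khaled", "Ahmed", "Mona"]
LAST_NAMES = ["Shaalan", "Rafea"]


def alternation(words):
    """Build a regex alternation from a word list, longest entries first."""
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


# Grammar rule: [honorific] first-name [last-name]
PERSON = re.compile(
    rf"(?:(?:{alternation(HONORIFICS)})\s+)?"
    rf"\b(?:{alternation(FIRST_NAMES)})\b"
    rf"(?:\s+(?:{alternation(LAST_NAMES)})\b)?"
)


def find_person_names(text):
    """Return every span of text matched by the person-name rule."""
    return [m.group(0) for m in PERSON.finditer(text)]


print(find_person_names("Dr. Khaled Shaalan met Mona in Cairo"))
# ['Dr. Khaled Shaalan', 'Mona']
```

A rule of this kind gives high precision on names that are in the lists and misses everything else, which is why coverage of the gazetteers matters so much in a rule-based system.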

Shaalan, K., H. Bakr, and I. Ziedan, "Transferring Egyptian Colloquial Dialect into Modern Standard Arabic", International Conference on Recent Advances in Natural Language Processing (RANLP 2007), Borovets, Bulgaria, John Benjamins, pp. 525–529, Sep., 2007.

Arabic is rooted in Classical or Qur'anic Arabic, but over the centuries the language has developed into what is now accepted as Modern Standard Arabic (MSA). Arabic colloquial dialects are generally only spoken languages, but recently the amount of written colloquial text has increased dramatically as a medium for expressing ideas, especially across the WWW, usually in the form of blogs and partially colloquial articles. Most of this written colloquial text has been in the Egyptian colloquial dialect, which is considered the most widely understood and used dialect throughout the Arab world. We are able to reuse MSA processing tools with colloquial Arabic by transferring colloquial Arabic words into their corresponding MSA words. The advantages of this lexical transfer are facilitating communication with colloquial Arabic speakers and restoring colloquial text to the standard language in use today. This paper addresses the transfer techniques between colloquial Arabic and MSA, which have not yet been closely studied. In particular, we present a rule-based lexical transfer approach for converting Egyptian colloquial words into their corresponding MSA words. This process involves morphological analysis and lexical acquisition of colloquial words.
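A word-level transfer step of this kind can be sketched as a lexicon lookup followed by a few affix rules. The example below is a rough illustration under stated assumptions, not the paper's rule set: the three lexicon entries, the single known verb, and the two prefix rules (the Egyptian future and progressive markers) are toy stand-ins, and the set `MSA_IMPERFECT_VERBS` stands in for a real morphological analyzer.

```python
# Toy colloquial-to-MSA lexicon (illustrative entries only).
# Glosses, in order: "this" (masculine), "where", "how".
LEXICON = {
    "ده": "هذا",
    "فين": "أين",
    "إزاي": "كيف",
}

# Stand-in for morphological analysis: imperfect verb forms that an MSA
# analyzer is assumed to accept. The single entry means "he writes".
MSA_IMPERFECT_VERBS = {"يكتب"}

FUTURE_COLLOQUIAL = "ه"       # Egyptian future marker
FUTURE_MSA = "س"              # MSA future marker
PROGRESSIVE_COLLOQUIAL = "ب"  # Egyptian progressive marker


def transfer_word(word):
    """Map one colloquial word to MSA; unknown words pass through unchanged."""
    if word in LEXICON:
        return LEXICON[word]
    stem = word[1:]
    if stem in MSA_IMPERFECT_VERBS:
        if word.startswith(FUTURE_COLLOQUIAL):
            return FUTURE_MSA + stem
        if word.startswith(PROGRESSIVE_COLLOQUIAL):
            return stem
    return word


def transfer_sentence(sentence):
    """Apply word-level transfer to a whitespace-tokenized sentence."""
    return " ".join(transfer_word(w) for w in sentence.split())
```

Gating the prefix rules on a known verb stem is what keeps an unrelated word that merely begins with the same letter from being rewritten, which is the role morphological analysis plays in the approach described above.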

Shaalan, K., H. Talhami, and I. Kamel, "Automatic Morphological Generation for the Indexing of Arabic Speech Recordings", The International Journal of Computer Processing of Oriental Languages (IJCPOL), vol. 20, no. 1, pp. 1–14, 2007.

This paper presents a novel Arabic morphological generator (AMG) for Modern Standard Arabic (MSA) which is designed and implemented using Prolog. The AMG is used to generate inflected forms of words used for the indexing of Arabic audio. These words are also the relevant terms in the Arab authority system (library information retrieval system) used in this study. The AMG generates inflected Arabic words from the root according to pre-specified morphological features that can be extended as needed. The Arabic word is represented as a feature structure which is handled through unification during the morphological generation process. The inflected forms can then be inserted automatically into a speech recognition grammar which is used to identify these words in an audio sequence or utterance.
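Unification of feature structures, which Prolog provides natively, can be approximated in other languages with nested dictionaries. The sketch below is a simplified illustration in Python rather than the AMG's Prolog code: it handles only atomic values and nested structures, with no variables or shared substructures, and the feature names (`root`, `agr`, `num`, `gen`, `tense`) are hypothetical.

```python
def unify(a, b):
    """Unify two feature structures given as nested dicts.

    Returns a new merged structure, or None if the two structures
    assign conflicting values to the same feature.
    """
    result = dict(a)
    for key, value in b.items():
        if key not in result:
            result[key] = value
        elif isinstance(result[key], dict) and isinstance(value, dict):
            merged = unify(result[key], value)
            if merged is None:
                return None
            result[key] = merged
        elif result[key] != value:
            return None  # feature clash
    return result


word = {"root": "ktb", "agr": {"num": "sg"}}
requested = {"agr": {"gen": "masc"}, "tense": "past"}
print(unify(word, requested))
# {'root': 'ktb', 'agr': {'num': 'sg', 'gen': 'masc'}, 'tense': 'past'}
print(unify({"agr": {"num": "sg"}}, {"agr": {"num": "pl"}}))
# None
```

In a generator, a successful unification means the requested features are compatible with the lexical entry, so the corresponding inflected form can be produced; a clash means the combination is ruled out.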

Shaalan, K., and E. Othman, "Issues in the Morphological Analysis of the Arabic Passive Verb", The Seventh Conference on Language Engineering, Egyptian Society of Language Engineering (ELSE), Cairo, Egypt, Ain Shams University, Dec., 2007.

Arabic is a strongly structured and highly derivational language. Arabic morphology and syntax make it possible to add a large number of affixes to each word, which leads to a combinatorial increase in the number of possible words. In Arabic, the passive voice is used as a writing style when: 1) the subject is unknown, 2) the subject is not important enough to be mentioned, or 3) the author wants to highlight the object. In this paper, we address the issues related to the recognition of Arabic passive verbs that affect the automated understanding of Arabic sentences. An experiment using the Buckwalter Arabic morphological analyzer, one of the mature Arabic morphological analyzers, was conducted in order to highlight the limitations in the analysis of Arabic passive verbs. The results indicate a need to handle the problems related to the morphological analysis of passive verbs in order to improve the recognition accuracy of Arabic words.

Farouk, A., A. Rafea, and K. Shaalan, "Analysis of Spoken Arabic into Interlingua Representation using Automatic Classification Approach", 3rd International Computer Engineering Conference: Smart Applications for the Information Society, Cairo, Egypt, Cairo University, Dec., 2007.

Semantic analysis takes a sentence as input and outputs a list of prominent concepts that characterize the content of the sentence and, for each concept, the set of attributes that describe the concept along with their relevance. This paper presents a system that employs a machine learning approach to automate the semantic analysis of spoken Arabic into an interlingua representation. An experiment was conducted to measure the performance of our approach. The results were promising and confirmed the ability of this approach to capture the semantics of Arabic utterances taken from the travel and tourism domain.

Shaalan, K., A. A. Monem, A. Rafea, and H. Baraka, "Generating Arabic text from Interlingua", the 2nd Workshop on Computational Approaches to Arabic Script-based Languages (CAASL), Stanford, California, USA, Linguistic Society of America Summer Institute, Stanford University, pp. 137–144, Jul., 2007.

In this paper, we describe a grammar-based generation approach for task-oriented interlingua-based spoken dialogue that transforms a shallow semantic interlingua representation, called Interchange Format (IF), into Arabic text that corresponds to the intentions underlying the speakers' utterances. The generation approach is developed primarily within the framework of the NESPOLE! (NEgotiating through SPOken Language in E-commerce) multilingual speech-to-speech MT project. The IF-to-Arabic generator is implemented in SICStus Prolog. We conducted an evaluation experiment using the output from the English analyzer provided by Carnegie Mellon University (CMU). The results of this experiment were promising and confirmed the ability of the generation approach to generate Arabic text from the interlingua taken from the travel and tourism domain.

Shaalan, K., A. Abdel-Monem, and A. Rafea, "Arabic Morphological Generation from Interlingua: A Rule-based Approach", Intelligent Information Processing III, vol. 228: Springer US, pp. 441–451, 2007.

Arabic is a Semitic language that is rich in its morphology. Arabic has numerous and complex morphological rules. Arabic morphological analysis has long been a focus of Arabic natural language processing research, with the aim of achieving the automated understanding of Arabic. With recent technological advances, Arabic natural language generation has received attention because it opens the way to wider applications such as machine translation. For machine translation systems that support a large number of languages, interlingua-based machine translation approaches are particularly attractive. In this paper, we report our attempt at developing a rule-based Arabic morphological generator for task-oriented interlingua-based spoken dialogues. Examples of morphological generation results from the Arabic morphological generator are given to illustrate how the system works. We also discuss the issues related to the morphological generation of Arabic words from an interlingua representation, and present how we have handled them.

Magdy, M., K. Shaalan, and A. Fahmy, "Lexical Error Diagnosis for Second Language Learners of Arabic", The Seventh Conference on Language Engineering, Egyptian Society of Language Engineering (ELSE), Cairo, Egypt, Dec., 2007.

This paper addresses the development of an automated lexical error diagnosis system, which helps second language learners of Arabic to learn well-formed weak verbs. The learners are encouraged to produce input freely in various situations and contexts, and are guided to recognize by themselves the erroneous or inappropriate functions of their misused expressions. In this system, we successfully used constraint relaxation and edit-distance techniques to provide error-specific diagnosis and feedback to second language learners of Arabic. We demonstrated the capabilities of these techniques to diagnose errors related to the Arabic weak verb, which is formed using complex morphological rules. Furthermore, the developed system allows for individualization of the learning process by providing feedback that conforms to the learner’s expertise. Inexperienced learners might require detailed instruction, while experienced learners benefit from higher-level reminders and explanations.
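The edit-distance part of such a diagnosis can be sketched with the standard Levenshtein distance: the learner's form is compared against candidate well-formed forms, and the closest one is taken as the intended target. The sketch below shows only this step and makes no claim about the system's actual implementation; the function `diagnose` and the transliterated verb forms in the usage example are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def diagnose(learner_form, correct_forms):
    """Return the closest well-formed candidate and its distance."""
    target = min(correct_forms, key=lambda f: levenshtein(learner_form, f))
    return target, levenshtein(learner_form, target)


# Hypothetical transliterations: a learner conjugates a weak verb as if it
# were regular; the nearest well-formed candidate is one deletion away.
print(diagnose("qawala", ["qaala", "naama"]))
# ('qaala', 1)
```

The distance alone says how far the input is from a correct form; the sequence of edits behind it is what would let a tutoring system say which letter was wrongly kept, dropped, or substituted.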

Ezzat, M., K. Shaalan, and A. Fahmy, "Component Composition Analysis for Arabic Natural Language Processing", the 6th Conference on Language Engineering, Egyptian Society of Language Engineering (ELSE), Cairo, Egypt, Dec., 2006.

Building NLP applications from scratch is a difficult task that takes a lot of time and requires acquiring a lot of NLP knowledge. For a rich language like Arabic, the difficulty increases significantly. In this paper, we investigate how to build a tool that helps NLP application developers to build robust applications rapidly. The approach involves two steps: first, using COM object technology to build common NLP tools; second, building NLP applications that use these tools, accessing them either locally or remotely. We demonstrate the capabilities of COM objects in developing NLP tools such as a morphological analyzer, which we used to build two Arabic NLP applications.

Khaled Shaalan

Professor of Computer Science
