Publications

2012

Abdallah, S., K. Shaalan, and M. Shoaib, "Integrating Rule-Based System with Classification for Arabic Named Entity Recognition", Computational Linguistics and Intelligent Text Processing, vol. 7181, Berlin, Heidelberg, Springer, pp. 311-322, 2012.

Named Entity Recognition (NER) is a subtask of information extraction that seeks to recognize and classify named entities in unstructured text into predefined categories such as the names of persons, organizations, locations, etc. Most researchers have used machine learning, while few have used handcrafted rules to solve the NER problem. We focus here on NER for the Arabic language (NERA), an important language with its own distinct challenges. This paper proposes a simple method for integrating machine learning with rule-based systems and implements this proposal using the state-of-the-art rule-based system for NERA. Experimental evaluation shows that our integrated approach increases the F-measure by 8 to 14% when compared to the original (pure) rule-based system and the (pure) machine learning approach, and the improvement is statistically significant for different datasets. More importantly, our system outperforms the state-of-the-art machine learning system in NERA over a benchmark dataset.
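The F-measure cited in this abstract is the weighted harmonic mean of precision and recall. The sketch below shows how an entity-level F-measure is commonly computed; it assumes exact-match scoring on (start, end, label) spans, which is an illustrative choice and not necessarily the evaluation protocol used in the paper. The function name `f_measure` and the example spans are hypothetical.

```python
def f_measure(gold, predicted, beta=1.0):
    """Return (precision, recall, F-beta) for sets of entity spans.

    gold, predicted: sets of (start, end, label) tuples. An entity counts as
    correct only if both its span and its label match exactly (an assumed
    scoring scheme used here for illustration).
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical gold and system annotations: one label error out of three.
gold = {(0, 2, "PERSON"), (5, 6, "LOCATION"), (9, 11, "ORGANIZATION")}
pred = {(0, 2, "PERSON"), (5, 6, "ORGANIZATION"), (9, 11, "ORGANIZATION")}
p, r, f = f_measure(gold, pred)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=0.67 R=0.67 F1=0.67
```

With beta = 1, precision and recall are weighted equally; a larger beta weights recall more heavily.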

Shaalan, K., S. Al-Sheikh, and F. Oroumchian, "Query Expansion Based-on Similarity of Terms for Improving Arabic Information Retrieval", Intelligent Information Processing VI, vol. 385, Berlin, Heidelberg, Springer, pp. 167-176, 2012.

This research proposes a method for query expansion in Arabic information retrieval using Expectation Maximization (EM). We employ the EM algorithm to select relevant terms for expanding the query and to weed out unrelated terms. We tested our algorithm on the INFILE test collection of CLEF 2009, and the experiments show that query expansion that considers the similarity of terms both improves precision and retrieves more relevant documents. The main finding of this research is that this method can increase recall while keeping precision at the same level.

Shaalan, K., A. Hendam, and A. Rafea, "Rapid development and deployment of bi-directional expert systems using machine translation technology", Expert Systems with Applications, vol. 39, no. 1, pp. 1375–1380, 2012.

The present work reports our attempt to develop an English–Arabic bi-directional machine translation tool in the agriculture domain. It aims to achieve automated translation of agricultural expert systems. In particular, we describe the translation of the domain knowledge base, including prompts, responses, explanation text, and advice. At the Central Laboratory for Agricultural Expert Systems (CLAES), where many successful agricultural expert systems have been developed, this tool is found to be essential in developing bi-directional (English–Arabic) expert systems because both English and Arabic versions are needed for development, deployment, and usage purposes. The tool also helps knowledge engineers overcome the language barrier by acquiring knowledge from either English- or Arabic-speaking domain experts. This paper discusses our experience with the developed machine translation tool and reports on the results of its application to real agricultural expert systems.

2011

AlNuaimi, M., K. Shaalan, M. Alnuaimi, and K. Alnuaimi, "Barriers to Electronic Government Citizens' Adoption: A Case of Municipal Sector in the Emirate of Abu Dhabi", The International Conference on Developments in eSystems Engineering (DeSE’11), Dubai, United Arab Emirates, 8 December, 2011.

Advances in information and communication technologies have changed the way governments interact with their citizens. The development of the Internet and its vast capabilities played a vital role in this change. According to the UN 2010 report, the UN e-Government development index ranked the United Arab Emirates 49th in the world, 17 places lower than in 2008, when the UAE ranked 32nd. This suggests that the UAE faces some difficulties in citizens' adoption of e-Government. In this paper, we examine the key barriers to citizens' use of e-Government services by testing the effect of 11 independent variables on citizens' use of e-Government in the municipal sector of the Emirate of Abu Dhabi.

Shaalan, K., M. Al-Mansoori, H. Tawfik, and A.-H. Mohamed, "Evaluation of an E-Learning Diabetes Awareness Prototype", The International Conference on Developments in eSystems Engineering (DeSE’11), Dubai, United Arab Emirates, 7 December, 2011.

E-Learning has been increasingly used as a medium for promoting health awareness, with successful outcomes. This paper reports on the design and evaluation of a prototype that uses the potential of E-Learning to help children with diabetes. The pedagogical principle of the proposed prototype is to raise diabetes awareness among young children. The evaluation results indicated that computer-based learning can generate positive learning and motivational attitudes in children. Children who were educated through the prototype were able to complete awareness tasks faster than those who were educated using traditional methods. The prototype was also found to make learning more fun and to allow children to learn at their own pace.

Al-Mansoori, M., K. Shaalan, and H. Tawfik, "Using E-learning for helping children with diabetes", The 7th International Conference on Innovations in Information Technology (Innovations’11), Abu Dhabi, United Arab Emirates, pp. 145-149, 26 April, 2011.

Diabetes is a common and costly disease that is associated with significant morbidity and mortality. Recent studies have shown remarkable increases in diabetes during the last decade. This has attracted many researchers and doctors to investigate e-learning technologies as a way of assisting people with diabetes. However, very little work exists that focuses on educating children to adopt a healthy lifestyle. As a result, this research aims to create awareness of diabetes among children and thereby ultimately contribute to reducing the growing rate of diabetes. This paper presents an investigation into E-Learning systems and how they can help people with diabetes, especially children, who are largely unaware of and poorly informed about the menace of the disease. This research addresses children's needs and expectations, and proposes the design of an E-Learning prototype that can raise their awareness and knowledge in order to help reduce the effects of this disease on children.

Shaalan, K., and M. Magdy, "Adaptive Feedback Message Generation for Second Language Learners of Arabic", Recent Advances in Natural Language Processing (RANLP 2011), Hissar, Bulgaria, 12 September, 2011.

This paper addresses issues related to generating feedback messages for errors in Arabic verbs made by second language learners (SLLs). The proposed approach allows for individualization. When an SLL of Arabic writes a wrong verb, the system analyzes the input and distinguishes between different lexical error types. It then issues intelligent feedback that conforms to the learner's proficiency level for each class of error. The proposed system has been evaluated using real test data and achieved satisfactory results.

2010

Shaalan, K. F., M. Magdy, and A. Fahmy, "Morphological Analysis of Ill-Formed Arabic Verbs in Intelligent Language Tutoring Framework", The 23rd International Florida Artificial Intelligence Research Society Conference (FLAIRS-23), Florida, USA, FLAIRS, pp. 277–282, May, 2010.

Arabic is a language of rich and complex morphology. The nature and peculiarity of Arabic make its morphological and phonological rules confusing for second language learners (SLLs). The conjugation of Arabic verbs is central to the formulation of an Arabic sentence because of its richness of form and meaning. In this paper, we address issues related to the morphological analysis of ill-formed Arabic verbs in order to identify the source of errors and provide informative feedback to SLLs of Arabic. Edit distance and constraint relaxation techniques are used to demonstrate the capability of the proposed approach to generate all possible analyses of erroneous Arabic verbs written by SLLs. Filtering mechanisms are applied to exclude irrelevant constructions and determine the target stem. A morphological analyzer has been developed and evaluated using real test data. It achieved satisfactory results in terms of recall.

Anya, O., H. Tawfik, S. Amin, A. Nagar, and K. Shaalan, "Context-Aware Knowledge Modelling for Decision Support in E-Health", Proceedings of the International Joint Conference on Neural Networks (IJCNN 2010), Barcelona, Spain, IEEE, 21 July, 2010.

In the context of e-health, professionals and healthcare service providers in various organisational and geographical locations are to work together, using information and communication systems, to provide better patient-centred and technology-supported healthcare services at any time and from anywhere. However, different organisations and geographies have varying contexts of work, which depend on their local work culture, available expertise, available technologies, people's perspectives and attitudes, and organisational and regional agendas. As a result, there is a need to ensure that a suggestion (information and knowledge) provided by a professional to support decision making in a different, and often distant, organisation and geography takes into account the context of the local work setting in which the suggestion is to be used. To meet this challenge, we propose a framework for context-aware knowledge modelling in e-health, which we refer to as Context Morph. Context Morph combines the CommonKADS knowledge modelling methodology with the concept of activity landscape and context-aware modelling techniques in order to morph, i.e. enrich and optimise, a knowledge resource to support decision making across various contexts of work. The goal is to integrate explicit information and tacit expert experiences across various work domains into a knowledge resource adequate for supporting the operational context of the work setting in which it is to be used.

Shaalan, K., R. Aref, and A. Fahmy, "An Approach for Analyzing and Correcting Spelling Errors for Non-native Arabic learners", The 7th International Conference on Informatics and Systems (INFOS2010), Cairo, Egypt, Faculty of Computers and Information, 2010.

Spell checkers are widely used in many software products for identifying errors in users' writings. However, they are not designed to address spelling errors made by non-native learners of a language. In fact, spelling errors made by non-native learners are more than just misspellings. Non-native learners' errors require special handling in terms of detection and correction, especially in morphologically rich languages such as Arabic, which have few related resources. In this paper, we address common error patterns made by non-native Arabic learners and propose a two-layer spell-checking approach, comprising spelling error detection and correction. The proposed error detection mechanism is applied on top of Buckwalter's Arabic morphological analyzer in order to demonstrate the capability of our approach in detecting possible spelling errors. The correction mechanism adopts a rule-based edit distance algorithm. Rules are designed in accordance with common spelling error patterns made by Arabic learners. Error correction uses a multiple filtering mechanism to propose final corrections. The approach uses semantic information given in exercise questions in order to achieve highly accurate detection and correction of spelling errors made by non-native Arabic learners. Finally, the proposed approach was evaluated using real test data, and promising results were achieved.
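The idea of a rule-based edit distance can be illustrated with a weighted Levenshtein distance in which substitutions between commonly confused letters cost less than other edits, so that learner-typical errors rank their intended correction first. This is a minimal sketch under stated assumptions: the confusable letter pairs, the 0.5 cost, the tiny lexicon, and the helper names `weighted_edit_distance` and `suggest` are all hypothetical and are not the rules or filtering mechanism described in the paper.

```python
# Hypothetical low-cost substitutions for commonly confused Arabic letters
# (alef variants, ta marbuta/ha, ya/alef maqsura). Illustrative only; these
# are not the rule set used in the paper.
CONFUSABLE = {
    frozenset("اأ"), frozenset("اإ"), frozenset("أإ"),
    frozenset("ةه"), frozenset("يى"),
}


def sub_cost(a, b):
    """Cost of substituting letter a with letter b."""
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in CONFUSABLE else 1.0


def weighted_edit_distance(source, target):
    """Levenshtein distance where confusable-letter substitutions are cheaper."""
    m, n = len(source), len(target)
    # dp[i][j] = minimal cost of turning source[:i] into target[:j]
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = float(i)
    for j in range(1, n + 1):
        dp[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + 1.0,  # delete source[i-1]
                dp[i][j - 1] + 1.0,  # insert target[j-1]
                dp[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
    return dp[m][n]


def suggest(word, lexicon, max_cost=1.0):
    """Return lexicon entries within max_cost of word, cheapest first."""
    scored = sorted((weighted_edit_distance(word, w), w) for w in lexicon)
    return [w for cost, w in scored if cost <= max_cost]


print(suggest("مدرسه", ["مدرسة", "مدرس", "كتب"]))
```

With this cost scheme, a learner's مدرسه (final ه instead of ة) is at distance 0.5 from مدرسة but 1.0 from مدرس, so the confusable-letter correction is ranked first.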

Shaalan, K., A. Hendam, and A. Rafea, "An English-Arabic Bi-directional Machine Translation Tool in the Agriculture Domain", Intelligent Information Processing V, vol. 340, Berlin, Heidelberg, Springer Boston, pp. 281–290, 2010.

The present work reports our attempt to develop an English-Arabic bi-directional Machine Translation (MT) tool in the agriculture domain. It aims to achieve automated translation of expert systems. In particular, we describe the translation of the knowledge base, including prompts, responses, explanation text, and advice. At the Central Laboratory for Agricultural Expert Systems, this tool is found to be essential in developing bi-directional (English-Arabic) expert systems because both English and Arabic versions are needed for development, deployment, and usage purposes. The tool follows the rule-based transfer MT approach. A major design goal of this tool is that it can be used as a stand-alone tool and can also be integrated with a general (English-Arabic) MT system for Arabic scientific text. The paper also discusses our experience with the developed MT system and reports on the results of its application to real agricultural expert systems.

Damankesh, A., F. Oroumchian, and K. F. Shaalan, "Multilingual Information Filtering by Human Plausible Reasoning", Multilingual Information Access Evaluation I, Text Retrieval Experiments, 10th Workshop of the Cross-Language Evaluation Forum (CLEF 2009), vol. 6241, Berlin, Heidelberg, Springer-Verlag, pp. 366–373, 2010.

The theory of Human Plausible Reasoning (HPR) is an attempt by Collins and Michalski to explain how people answer questions when they are uncertain. The theory consists of a set of patterns and a set of inferences that can be applied to those patterns. This paper investigates the application of HPR theory to the domain of cross-language filtering. Our approach combines Natural Language Processing with HPR. The documents and topics are partially represented by automatically extracted concepts, logical terms, and logical statements in a language-neutral knowledge base. Reasoning provides the evidence of relevance. We have conducted hundreds of experiments, particularly on the depth of reasoning, evidence combination, and topic selection methods. The results show that HPR contributes to the overall performance by introducing new terms for topics. In addition, the number of inference paths from a document to a topic is an indication of its relevance.

Shaalan, K., "Nizar Y. Habash, Introduction to Arabic natural language processing (Synthesis lectures on human language technologies)", Machine Translation, vol. 24, no. 3-4, Springer Netherlands, pp. 285-289, 2010.

Shaalan, K., "Rule-based Approach in Arabic Natural Language Processing", the International Journal on Information and Communication Technologies (IJICT), vol. 3, no. 3, Serial Publications, pp. 11–19, 2010.

The rule-based approach has been used successfully in developing many natural language processing systems. Systems that use rule-based transformations are based on a core of solid linguistic knowledge. The linguistic knowledge acquired for one natural language processing system may be reused to build the knowledge required for a similar task in another system. The advantage of the rule-based approach over the corpus-based approach is clear for: 1) less-resourced languages, for which large corpora, possibly parallel or bilingual, with representative structures and entities are neither available nor easily affordable, and 2) morphologically rich languages, which suffer from data sparseness even when corpora are available. These factors have motivated many researchers to fully or partially follow the rule-based approach in developing their Arabic natural language processing tools and systems. In this paper, we describe our successful efforts in applying the rule-based approach to different Arabic natural language processing tasks.

Shaalan, K., M. Magdy, and D. Samy, "Towards Resolving Morphological Ambiguity in Arabic Intelligent Language Tutoring Framework", The seventh international conference on Language Resources and Evaluation (LREC'10) Workshop on Supporting eLearning with Language Resources and Semantic Data, Valletta, Malta, LREC, 2010.

Ambiguity is a major issue in any NLP application; it occurs when multiple interpretations of the same language phenomenon are produced. Given the complexity of the Arabic morphological system, it is difficult to determine the writer's intended meaning. Moreover, Intelligent Language Tutoring Systems, which need to analyze erroneous learner answers, generally introduce techniques such as constraint relaxation that produce more interpretations than systems designed for processing well-formed input. This paper addresses issues related to the morphological disambiguation of corrected interpretations of erroneous Arabic verbs written by beginner to intermediate second language learners. The morphological disambiguation component has been developed and evaluated using real test data. It achieved satisfactory results in terms of recall.

2009

Hossny, A., K. Shaalan, and A. Fahmy, "Machine translation model using inductive logic programming", the 2009 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE’09), Dalian, China, pp. 1–8, September, 2009.

Rule-based machine translation systems face different challenges in building the translation model in the form of transfer rules. Some of these problems require enormous human effort to state rules and keep them consistent, since different human linguists write different rules for the same sentence. Moreover, a human linguist states rules to be understood by humans rather than machines. The proposed translation model (from Arabic to English) tackles this problem of building the translation model. It employs Inductive Logic Programming (ILP) to learn the language model from a set of example pairs acquired from parallel corpora and represents the language model in a rule-based format that maps Arabic sentence patterns to English sentence patterns. Tested on a small data set, the model generated translation rules at a logarithmic growth rate, with a word error rate of 11%.
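Word error rate (WER), the metric reported above, is the word-level edit distance (substitutions, deletions, and insertions) between a system output and a reference, divided by the number of reference words. A minimal sketch follows; the whitespace tokenization, the function name `word_error_rate`, and the example sentences are illustrative assumptions, and the paper may tokenize and score its output differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumed: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            cur[j] = min(prev[j] + 1,      # deletion of ref[i-1]
                         cur[j - 1] + 1,   # insertion of hyp[j-1]
                         substitution)     # match or substitution
        prev = cur
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("the plant needs water", "the plant need water"))  # 0.25
```

Under this definition, a WER of 11% means roughly 11 word edits per 100 reference words.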

Shaalan, K., H. Abo-Bakr, and I. Ziedan, "A hybrid approach for building Arabic diacritizer", the 12th European Chapter of the Association for Computational Linguistics (EACL 2009) Workshop on Computational Approaches to Semitic Languages, Athens, Greece, Association for Computational Linguistics, pp. 27–35, March, 2009.

Modern Standard Arabic is usually written without diacritics, which makes Arabic text processing difficult. Diacritization helps clarify the meaning of words and disambiguate vague spellings or pronunciations, as some Arabic words are spelled the same but differ in meaning. In this paper, we address the issue of adding diacritics to undiacritized Arabic text using a hybrid approach. The approach requires an Arabic lexicon and a large corpus of fully diacritized text for training in order to detect diacritics. Case ending is treated as a separate post-processing task using syntactic information. The hybrid approach relies on lexicon retrieval, bigram, and SVM-statistical prioritized techniques. We present the results of an evaluation of the proposed diacritization approach and discuss various modifications for improving its performance.

Damankesh, A., J. Singh, F. Jahedpari, K. Shaalan, and F. Oroumchian, "Using Human Plausible Reasoning as a Framework for Multilingual Information Filtering", CLEF 2009 Workshop, in conjunction with ECDL2009, 13th European Conference on Digital Libraries, Corfu, Greece, 30 September, 2009.

In this paper, the application of the theory of Human Plausible Reasoning (HPR) is investigated in the domain of filtering and cross-language information retrieval. The theory of Human Plausible Reasoning was first introduced by Collins and Michalski in the early 1990s; it has been applied to IR since 1995. This work extends those experiments and focuses on building a framework for cross-language information retrieval. The system built in these experiments uses plausible inferences to infer new, unknown knowledge from existing knowledge in order to retrieve not only documents that are indexed by the query terms but also those that are plausibly relevant.

Abo-Bakr, H., K. Shaalan, and I. Ziedan, "A Statistical Method for Detecting the Arabic Empty Category", The Second International Conference on Arabic Language Resources and Tools, Cairo, Egypt, The MEDAR Consortium, 22 April, 2009.

In this paper, we introduce a statistical approach for detecting the position of the Empty Category represented in the Arabic Treebank. This can help in detecting the position of the elliptical personal pronoun and, in some cases, in identifying dropped words within a sentence, given the free word order of Arabic. The proposed approach requires a large corpus. Training for detecting the Empty Category for each token is based on its Part of Speech (POS), its Base Phrase (BP) chunk position, and its position in the sentence. Empty Category detection is efficiently obtained using the Support Vector Machines (SVM) technique. We conducted an evaluation of the proposed Empty Category detection approach, discussed the obtained results, and proposed various modifications for improving its performance.

Farghaly, A., and K. Shaalan, "Arabic Natural Language Processing: Challenges and Solutions", ACM Transactions on Asian Language Information Processing (TALIP), vol. 8, no. 4, New York, NY, USA, ACM, pp. 1-22, 2009.

The Arabic language presents researchers and developers of natural language processing (NLP) applications for Arabic text and speech with serious challenges. The purpose of this article is to describe some of these challenges and to present some solutions that would guide current and future practitioners in the field of Arabic natural language processing (ANLP). We begin with general features of the Arabic language in Sections 1, 2, and 3 and then move to more specific properties of the language in the rest of the article. In Section 1 of this article, we highlight the significance of the Arabic language today and describe its general properties. Section 2 presents the feature of Arabic Diglossia, showing how the sociolinguistic aspects of the Arabic language differ from those of other languages. The stability of Arabic Diglossia and its implications for ANLP applications are discussed, and ways to deal with this problematic property are proposed. Section 3 deals with the properties of the Arabic script and the explosion of ambiguity that results from the absence of short vowel representations and overt case markers in contemporary Arabic texts. In Section 4, we present specific features of the Arabic language, such as the nonconcatenative property of Arabic morphology, Arabic as an agglutinative language, and Arabic as a pro-drop language, and the challenges these properties pose to ANLP. We also present solutions that have already been adopted by some pioneering researchers in the field. In Section 5, we point out the lack of formal and explicit grammars of Modern Standard Arabic, which impedes the progress of more advanced ANLP systems. In Section 6, we draw our conclusions.

Khaled Shaalan

Professor of Computer Science
