Information Retrieval

Chalabi, H. A., S. Ray, and K. Shaalan, "Question Classification for Arabic Question Answering Systems", The International Conference on Information and Communication Technology Research (ICTRC), UAE, 17 May, 2015.

Because information has grown so rapidly over the last few decades, getting precise information in real time is becoming increasingly difficult. Search engines such as Google and Yahoo help users find information, but they return whole documents, which take the user a long time to read through. Question Answering Systems have emerged as a good alternative to search engines because they produce the desired information precisely and in real time, saving the user considerable time. There has been a lot of research on Question Answering Systems for English and some European languages. However, Arabic Question Answering Systems have not kept pace, owing to inherent difficulties with the language itself and to the lack of tools available to assist researchers. Question classification is a very important module of Question Answering Systems. In this paper, we present a method for accurately classifying Arabic questions in order to retrieve precise answers. The proposed method gives promising results.

Al-Zoghby, A., and K. Shaalan, "Semantic Search for Arabic", International Florida Artificial Intelligence Research Society Conference (FLAIRS), USA, 19 May, 2015.

There is growing interest in Arabic web content worldwide due to its importance for culture, religion, and economics. In the literature, research that addresses searching Arabic web content using semantic web technology is still insufficient compared to Arabic’s actual importance as a language. In this research, we propose an Arabic semantic search approach that is applied to Arabic web content. This approach is based on the Vector Space Model (VSM). It uses the Universal WordNet ontology to build a rich concept-space index instead of the traditional term-space index. The proposed index is used to enhance the capability of the semantic-based VSM. Moreover, the approach introduces a new incidence measurement for calculating the semantic significance degree of a document's concepts, which is more suitable than the traditional term frequency measure. Furthermore, a novel method for calculating the semantic weight of a concept is introduced in order to determine the semantic similarity of two vectors. As a proof of concept, the system is applied to a full dump of the Arabic Wikipedia. The experimental results in terms of Precision, Recall, and F-measure have shown a change in performance from 77%, 56%, and 63% to 71%, 96%, and 81%, respectively.
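The difference between a term-space and a concept-space index can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `TERM_TO_CONCEPT` table is a hypothetical stand-in for an ontology lookup such as Universal WordNet, English placeholder tokens are used instead of Arabic text, and plain occurrence counts replace the incidence measurement and semantic weight that the abstract mentions but does not define.

```python
import math
from collections import Counter

# Hypothetical term-to-concept table; a real system would consult an ontology.
TERM_TO_CONCEPT = {
    "car": "vehicle",
    "automobile": "vehicle",
    "physician": "doctor",
    "doctor": "doctor",
    "clinic": "medical_facility",
}


def term_vector(tokens):
    """Traditional term-space vector: one dimension per surface term."""
    return Counter(tokens)


def concept_vector(tokens):
    """Concept-space vector: terms are mapped to concepts before counting.

    Terms missing from the table are kept as their own dimension.
    """
    return Counter(TERM_TO_CONCEPT.get(t, t) for t in tokens)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


query = ["automobile", "physician"]
document = ["car", "doctor", "clinic"]

term_score = cosine(term_vector(query), term_vector(document))
concept_score = cosine(concept_vector(query), concept_vector(document))
```

In this toy case the query and document share no surface terms, so the term-space score is zero, while the concept-space score is positive because both texts map to the same concepts.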

Al-Zoghby, A., and K. Shaalan, "Conceptual Search for Arabic Web Content", Lecture Notes in Computer Science, Germany, Springer, 2015.

The main reason for adopting Semantic Web technology in information retrieval is to improve retrieval performance. A semantic search-based system is characterized by locating web content that is semantically related to the query's concepts rather than relying on exact matching with the keywords in the query. There is growing interest in Arabic web content worldwide due to its cultural, political, strategic, and economic importance. Arabic is linguistically rich across all levels, which makes effective search of Arabic text a challenge. In the literature, research that addresses searching Arabic web content using semantic web technology is still insufficient compared to Arabic’s actual importance as a language. In this research, we propose an Arabic semantic search approach that is applied to Arabic web content. This approach is based on the Vector Space Model (VSM), which has proved successful and whose traditional version has been the focus of much research aimed at improving it. Our approach uses the Universal WordNet to build a rich concept-space index instead of the traditional term-space index. This index is used to enable Semantic VSM capabilities. Moreover, we introduce a new incidence measurement for calculating the semantic significance degree of a concept in a document, which fits our model better than the traditional term frequency. Furthermore, to determine the semantic similarity of two vectors, we introduce a new formula for calculating the semantic weight of a concept. Because documents are indexed by their topics and classified semantically, we were able to search Arabic documents effectively. The experimental results in terms of Precision, Recall, and F-measure have shown a change in performance from 77%, 56%, and 63% to 71%, 96%, and 81%, respectively.

Shaalan, K., S. Al-Sheikh, and F. Oroumchian, "Query Expansion Based-on Similarity of Terms for Improving Arabic Information Retrieval", Intelligent Information Processing VI, vol. 385, Berlin Heidelberg, Springer, pp. 167-176, 2012.

This research suggests a method for query expansion in Arabic Information Retrieval using Expectation Maximization (EM). We employ the EM algorithm to select relevant terms for expanding the query and to weed out unrelated terms. We tested our algorithm on the INFILE test collection of CLEF 2009, and the experiments show that query expansion that considers the similarity of terms both improves precision and retrieves more relevant documents. The main finding of this research is that this method can increase recall while keeping precision at the same level.

Damankesh, A., J. Singh, F. Jahedpari, K. Shaalan, and F. Oroumchian, "Using Human Plausible Reasoning as a Framework for Multilingual Information Filtering", CLEF 2009 Workshop, in conjunction with ECDL2009, 13th European Conference on Digital Libraries, Corfu, Greece, 30 September, 2009.

This paper investigates the application of the theory of Human Plausible Reasoning (HPR) in the domain of filtering and cross-language information retrieval. The theory of Human Plausible Reasoning was first introduced by Collins and Michalski in the early 1990s and has been applied to IR since 1995. This work extends those experiments and focuses on building a framework for cross-language information retrieval. The system built in these experiments uses plausible inferences to derive new, unknown knowledge from existing knowledge, so that it retrieves not only documents that are indexed by the query terms but also those that are plausibly relevant.

Damankesh, A., F. Oroumchian, and K. F. Shaalan, "Multilingual Information Filtering by Human Plausible Reasoning", Multilingual Information Access Evaluation I, Text Retrieval Experiments, 10th Workshop of the Cross-Language Evaluation Forum (CLEF 2009), vol. 6241, Berlin, Heidelberg, Springer-Verlag, pp. 366–373, 2010.

The theory of Human Plausible Reasoning (HPR) is an attempt by Collins and Michalski to explain how people answer questions when they are uncertain. The theory consists of a set of patterns and a set of inferences that can be applied to those patterns. This paper investigates the application of HPR theory to the domain of cross-language filtering. Our approach combines Natural Language Processing with HPR. The documents and topics are partially represented by automatically extracted concepts, logical terms, and logical statements in a language-neutral knowledge base. Reasoning provides the evidence of relevance. We have conducted hundreds of experiments, especially on the depth of reasoning, evidence combination, and topic selection methods. The results show that HPR contributes to the overall performance by introducing new terms for topics. In addition, the number of inference paths from a document to a topic is an indication of the document's relevance.

Khaled Shaalan

Professor of Computer Science
