Integrating Rule-Based System with Classification for Arabic Named Entity Recognition

Citation:
Abdallah, S., K. Shaalan, and M. Shoaib, "Integrating Rule-Based System with Classification for Arabic Named Entity Recognition", Computational Linguistics and Intelligent Text Processing, vol. 7181, Berlin, Heidelberg, Springer, pp. 311-322, 2012. copy at www.tinyurl.com/gthlp7g

Abstract:

Named Entity Recognition (NER) is a subtask of information extraction that seeks to recognize and classify named entities in unstructured text into predefined categories such as the names of persons, organizations, locations, etc. The majority of researchers have used machine learning, while few have used handcrafted rules to solve the NER problem. We focus here on NER for the Arabic language (NERA), an important language with its own distinct challenges. This paper proposes a simple method for integrating machine learning with rule-based systems and implements this proposal using the state-of-the-art rule-based system for NERA. Experimental evaluation shows that our integrated approach increases the F-measure by 8 to 14% when compared to the original (pure) rule-based system and the (pure) machine learning approach, and the improvement is statistically significant for different datasets. More importantly, our system outperforms the state-of-the-art machine-learning system in NERA over a benchmark dataset.

Attachment: hybrid_nera_2012.pdf (170.06 KB)