A Pipeline Arabic Named Entity Recognition Using a Hybrid Approach

Oudah, M., and K. Shaalan, "A Pipeline Arabic Named Entity Recognition Using a Hybrid Approach", The International Conference on Computational Linguistics (COLING), Mumbai, India, 14 December 2012. Copy at www.tinyurl.com/zcdlqk7

Date Presented: 14 December 2012


Most Arabic Named Entity Recognition (NER) systems have been developed using one of two approaches, rule-based or Machine Learning (ML) based, each with its own strengths and weaknesses. In this paper, the problem of Arabic NER is tackled by integrating the two approaches in a pipelined process to create a hybrid system, with the aim of enhancing overall NER performance. The proposed system is capable of recognizing 11 different types of named entities (NEs): Person, Location, Organization, Date, Time, Price, Measurement, Percent, Phone Number, ISBN and File Name. Extensive experiments are conducted using three different ML classifiers to evaluate the overall performance of the hybrid system. The empirical results indicate that the hybrid approach outperforms both the rule-based and the ML-based approaches. Moreover, our system outperforms the state of the art in Arabic NER in terms of accuracy when applied to the ANERcorp dataset, with F-measures of 94.4% for Person, 90.1% for Location, and 88.2% for Organization.
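As a rough illustration of what a pipelined hybrid could look like, the sketch below runs a rule-based tagger first and passes its decision to a learned classifier as a feature. This is only one plausible reading of the abstract, which does not spell out the interface between the two stages. The gazetteers, the feature set, the `MajorityClassifier` stand-in and the English placeholder tokens are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical toy gazetteers standing in for the rule-based component.
PERSON_GAZETTEER = {"Ahmed", "Mona"}
LOCATION_GAZETTEER = {"Cairo", "Dubai"}


def rule_based_tag(tokens):
    """Stage 1: tag each token using simple gazetteer lookups."""
    tags = []
    for tok in tokens:
        if tok in PERSON_GAZETTEER:
            tags.append("Person")
        elif tok in LOCATION_GAZETTEER:
            tags.append("Location")
        else:
            tags.append("O")
    return tags


def features(tokens, rule_tags, i):
    """Feature tuple for token i: the rule-based decision plus the previous token."""
    prev_tok = tokens[i - 1] if i > 0 else "<s>"
    return (rule_tags[i], prev_tok)


class MajorityClassifier:
    """Stage 2 stand-in (assumption): most frequent label per feature tuple."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, X, y):
        for feats, label in zip(X, y):
            self.counts[feats][label] += 1
        return self

    def predict_one(self, feats):
        if feats in self.counts:
            return self.counts[feats].most_common(1)[0][0]
        # Unseen feature tuple: fall back to the rule-based decision.
        return feats[0]


def hybrid_tag(clf, tokens):
    """Run stage 1, then let stage 2 make the final decision per token."""
    rule_tags = rule_based_tag(tokens)
    return [clf.predict_one(features(tokens, rule_tags, i))
            for i in range(len(tokens))]


# Tiny made-up training set; "Salem" is deliberately absent from the gazetteers.
train = [
    (["Ahmed", "visited", "Cairo"], ["Person", "O", "Location"]),
    (["Dr", "Salem", "visited", "Dubai"], ["O", "Person", "O", "Location"]),
]
X, y = [], []
for tokens, gold in train:
    rule_tags = rule_based_tag(tokens)
    for i, label in enumerate(gold):
        X.append(features(tokens, rule_tags, i))
        y.append(label)
clf = MajorityClassifier().fit(X, y)

sentence = ["Dr", "Omar", "visited", "Cairo"]
rules_only = rule_based_tag(sentence)
hybrid = hybrid_tag(clf, sentence)
```

In this toy run the rules alone miss "Omar" because it is not in the gazetteer, while the second stage labels it Person from the context it learned ("Dr" followed by a name). A real system would replace the stand-in with a proper classifier and much richer features.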

Related External Link

pipeline_ner.pdf (255.08 KB)