A Statistical Method for Detecting the Arabic Empty Category

Abo-Bakr, Hitham; Khaled Shaalan; Ibrahim Ziedan

Citation: Abo-Bakr, H., K. Shaalan, and I. Ziedan, "A Statistical Method for Detecting the Arabic Empty Category", The Second International Conference on Arabic Language Resources and Tools, Cairo, Egypt, The MEDAR Consortium, 22 April 2009. Copy at www.tinyurl.com/m6bn2jf

Date Presented:

22 April 2009

Abstract:

In this paper we introduce a statistical approach for detecting the position of the Empty Category as annotated in the Arabic Treebank. This can help in detecting the position of elliptical personal pronouns and, in some cases, in identifying dropped words within a sentence, a task complicated by the free word order of Arabic. The proposed approach requires a large training corpus. For each token, training for Empty Category detection is based on its Part of Speech (POS) tag, its Base Phrase (BP) chunk position, and the position of the token in the sentence. Detection is performed efficiently using Support Vector Machines (SVM). We conducted an evaluation of the proposed algorithm, discussed the results obtained, and proposed several modifications for improving its performance.
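
As a rough illustration of the feature set named in the abstract, the sketch below turns each token's POS tag, BP-chunk tag, and sentence position into a numeric vector of the kind an SVM trainer could consume. It is not the authors' implementation: the function names, the tag values, the transliterated example sentence, and the choice to encode position as a relative offset are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Each token is (word, POS tag, BP-chunk tag). The tags used below are
# illustrative placeholders, not the tagset used in the paper.
Token = Tuple[str, str, str]


def token_features(sentence: List[Token], i: int) -> Dict[str, float]:
    """Build a sparse feature dict for the token at index i."""
    _, pos, chunk = sentence[i]
    return {
        f"pos={pos}": 1.0,
        f"chunk={chunk}": 1.0,
        # Assumption: position is encoded as a relative offset in [0, 1].
        "position": i / max(len(sentence) - 1, 1),
    }


def vectorize(rows: List[Dict[str, float]]) -> Tuple[List[str], List[List[float]]]:
    """Map feature dicts to dense vectors over a sorted feature vocabulary."""
    vocab = sorted({name for row in rows for name in row})
    matrix = [[row.get(name, 0.0) for name in vocab] for row in rows]
    return vocab, matrix


# Hypothetical two-token sentence with a dropped subject pronoun.
sentence = [("kataba", "VBD", "B-VP"), ("al-risalata", "NN", "B-NP")]
rows = [token_features(sentence, i) for i in range(len(sentence))]
vocab, X = vectorize(rows)
```

In a full pipeline, each row of `X` would be paired with a label saying whether an Empty Category is attached at that token, and the labelled rows would be passed to an SVM trainer. How the labels are derived from the treebank is not described in the abstract, so that step is left out here.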

Khaled Shaalan

Professor of Computer Science