Improved Spelling Error Detection and Correction for Arabic

Attia, M., P. Pecina, Y. Samih, K. Shaalan, and J. van Genabith, "Improved Spelling Error Detection and Correction for Arabic", The International Conference on Computational Linguistics (COLING), Mumbai, India, 14 December, 2012.

Date Presented: 14 December 2012


A spelling error detection and correction application is based on three main components: a dictionary (or reference word list), an error model and a language model. While most attention in the literature has focused on the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We semi-automatically develop a dictionary of 9.3 million fully inflected Arabic words using a morphological transducer and a large corpus. We improve the error model by analysing error types and creating an edit-distance-based re-ranker. We also improve the language model by analysing the level of noise in different sources of data and selecting the optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2010, OpenOffice Ayaspell and Google Docs.
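The three-component architecture described above (dictionary, error model, language model) can be illustrated with a toy sketch. This is not the authors' system. The word list, the frequency counts, and the function names `edit_distance` and `correct` are hypothetical. Plain Levenshtein distance is a deliberately simplified stand-in for the paper's error model and re-ranker, and raw unigram frequency stands in for its language model.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions), one-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def correct(word, dictionary, unigram_counts, max_dist=2):
    """Toy detect-and-correct step (hypothetical, not the paper's method)."""
    # Detection: a word found in the dictionary is accepted as correct.
    if word in dictionary:
        return [word]
    # Error model (simplified): every dictionary word within max_dist edits.
    candidates = [w for w in dictionary if edit_distance(word, w) <= max_dist]
    # Ranking: fewer edits first, then higher corpus frequency (a crude
    # unigram stand-in for a language model), then the word itself for stability.
    return sorted(candidates,
                  key=lambda w: (edit_distance(word, w),
                                 -unigram_counts.get(w, 0), w))


# Hypothetical dictionary and corpus counts, for illustration only.
dictionary = {"كتاب", "كتب", "كاتب", "مكتب"}
counts = {"كتاب": 50, "كتب": 30, "كاتب": 10, "مكتب": 20}

print(correct("كتابب", dictionary, counts))  # ['كتاب', 'كتب', 'كاتب']
```

Here the misspelling "كتابب" is one deletion away from "كتاب" and two edits away from "كتب" and "كاتب", so the first suggestion comes from the error model and the tie among two-edit candidates is broken by frequency. "مكتب" (three edits) falls outside `max_dist`. The abstract's point is that improving any one of these three stages, not only the ranking model, changes the final suggestions.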

