audio indexing

Talhami, H., and K. Shaalan, "An Arabic/English switch for audio indexing and dialogue management", IASTED International Conference on Internet and Multimedia Systems and Applications (EuroIMSA 2006), Innsbruck, Austria, ACTA Press, pp. 189–192, 2006.

This paper presents a technique for automatic switching between Arabic and English, developed for audio indexing and dialogue management applications. It classifies utterances and sub-utterances as either U.S. English or Modern Standard Arabic (MSA) in a closed system. The approach extends the work of Zissman and Singer [1] to the problem of Arabic/English language identification. Two sets of acoustic phoneme models (English and Arabic HMMs) are used, with two language models (phone bigrams) per acoustic model set. Four Large Vocabulary Continuous Speech Recognition (LVCSR) passes are performed, one for each combination of HMM set and language model, using a phone loop grammar. The four path scores are fed into a Bayesian classifier (a multi-layer perceptron), which classifies each utterance as either English or Arabic. The technique demonstrated high accuracy on test data unseen by the system during the modelling process. The language switch has been used successfully as a front-end processor in an audio indexing and retrieval system as well as in a dialogue management system.
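
The final classification step described in the abstract can be illustrated with a minimal sketch: four path scores, one per combination of HMM set and phone bigram, are passed through a small multi-layer perceptron that outputs a language label. Everything specific here is an assumption made for illustration and is not taken from the paper: the ordering of the four scores, the mean-subtraction step, the network size, and the hand-set weights (a real system would train them on held-out recognition scores). The recognition passes themselves are not shown.

```python
import math

# Assumed order of the four path scores (log-likelihoods), for this sketch only:
#   0: English HMMs + English phone bigram
#   1: English HMMs + Arabic phone bigram
#   2: Arabic HMMs  + English phone bigram
#   3: Arabic HMMs  + Arabic phone bigram

# Hypothetical, hand-set weights (not trained, not from the paper).
# Hidden unit 0 contrasts the two acoustic model sets,
# hidden unit 1 contrasts the two language models.
W_HIDDEN = [[0.05, 0.05, -0.05, -0.05],
            [0.05, -0.05, 0.05, -0.05]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [2.0, 1.0]
B_OUT = 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def classify_utterance(path_scores):
    """Return (label, p_english) for one utterance's four path scores."""
    if len(path_scores) != 4:
        raise ValueError("expected exactly four path scores")
    # Subtract the mean so only the relative differences between passes matter
    # (an assumption of this sketch, to cope with utterance-length effects).
    mean = sum(path_scores) / len(path_scores)
    x = [s - mean for s in path_scores]
    # One hidden layer with tanh units.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W_HIDDEN, B_HIDDEN)
    ]
    # Single sigmoid output interpreted as P(English | scores).
    p_english = sigmoid(sum(w * h for w, h in zip(W_OUT, hidden)) + B_OUT)
    label = "English" if p_english >= 0.5 else "Arabic"
    return label, p_english


# Made-up scores for illustration: the English-model passes score higher.
label, p = classify_utterance([-4200.0, -4350.0, -4600.0, -4500.0])
print(label, round(p, 3))
```

Feeding the classifier all four scores, rather than simply picking the best-scoring pass, lets it weigh acoustic-model and language-model evidence separately, which is one plausible reason for using a trained classifier at this stage.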