
Abstract

In this paper, we present our retrieval system for the QUery by Example Search on Speech Task (QUESST), which combines a posteriorgram-based modeling approach with the weighted fast sequential dynamic time warping (WFS-DTW) algorithm. This year, our main effort was directed toward developing a language-dependent keyword matching system that uses all available information about the spoken languages across all queries and utterance files. Although the retrieval algorithm is the same as the one we used in the previous year, the main novelty lies in how the information about all languages spoken in the retrieval database is used. We submitted two low-resource systems based on language-dependent acoustic unit modeling (AUM). The first, called supervised, employs four well-trained phonetic decoders whose acoustic models were trained on time-aligned, annotated speech. The second, called unsupervised, uses blind phonetic segmentation for each specific language, with the information about the spoken language extracted from the MediaEval 2013 and MediaEval 2014 databases. For both approaches, we also investigated how adapting the acoustic model to the specific language through a retraining procedure influences overall retrieval performance.
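
To illustrate the general idea of matching a spoken query against an utterance in posteriorgram space, the following is a minimal sketch of plain subsequence DTW. It is not the WFS-DTW algorithm, whose weighting and speed-up details are not given in the abstract. The function names, the negative-log inner-product distance, and the length normalization are assumptions chosen for illustration only.

```python
import numpy as np


def posteriorgram_distance(query, utterance, eps=1e-10):
    """Local distance matrix between two posteriorgrams.

    Each row is a posterior probability vector over acoustic units.
    Assumed distance: -log of the inner product of the two vectors.
    """
    similarity = query @ utterance.T
    return -np.log(np.clip(similarity, eps, None))


def subsequence_dtw(query, utterance):
    """Plain subsequence DTW (illustrative, not WFS-DTW).

    The query may start and end at any utterance frame.
    Returns (length-normalized cost, index of the best end frame).
    """
    dist = posteriorgram_distance(query, utterance)
    n, m = dist.shape
    acc = np.full((n, m), np.inf)
    acc[0, :] = dist[0, :]  # free start: the match may begin at any frame
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, m):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i, j] + best_prev
    end = int(np.argmin(acc[-1]))  # free end: pick the cheapest end frame
    return acc[-1, end] / n, end


if __name__ == "__main__":
    # Toy data: 8 frames, each a one-hot posterior over 8 hypothetical units.
    utterance = np.eye(8)
    query = utterance[3:6]  # the query is an exact excerpt of the utterance
    cost, end = subsequence_dtw(query, utterance)
    print(cost, end)
```

A low normalized cost suggests that the query occurs in the utterance near the returned end frame; a detection system would typically compare this cost against a threshold tuned on development data.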
