Author: Katarzyna KLESSA

Computer-assisted analysis of scalar features in spoken language corpora

Abstract: In recent years, the development and use of language and speech corpora have posed new challenges, driven by growing expectations for larger, more spontaneous, and more natural data. As a result, existing specifications for the design and implementation of databases are no longer fully sufficient. For example, it is often difficult to reflect the individual character and specificity of spontaneous, or even just quasi-spontaneous, speech when annotating recordings (not to mention the complexity of acquiring reliable data of certain types, e.g. real emotions) [1,2,3]. In automatic speaker recognition or text-independent speaker identification (needed e.g. for advanced forensic applications), the results obtained with spontaneous speech data are reported to be worse than those based on "cleaner", more controlled corpora (e.g. [4]); at the same time, the precise definition of the cues and features governing speaker identification by humans remains unclear. One of the basic tasks therefore appears to be formulating annotation specifications that enable better tracking and extraction of such features. Because many suprasegmental and paralinguistic features are vaguely defined and overlap, for at least some of them it seems justified to apply a bottom-up approach based on scalar (or vector) feature spaces rather than on pre-defined categories or levels. In this paper, a proposal for such an annotation framework is discussed in the context of its potential future application in the development of an automatic speaker characterization and identification system. The framework includes new annotation software, recently developed to support the annotation of both scalar and categorial features in spontaneous and emotional speech.
Acknowledgement: This work is supported by the financial resources for science in the years 2010–2012 as development project no. OR00017012.

References:
[1] Campbell, N., 2002. Towards a grammar of spoken language: incorporating paralinguistic information. In J. H. L. Hansen, B. L. Pellom (Eds.), 7th ICSLP 2002 - INTERSPEECH 2002, Denver, Colorado, USA, September 16-20, 2002.
[2] Douglas-Cowie, E., Campbell, N., Cowie, R., Roach, P., 2003. Emotional speech: towards a new generation of databases. Speech Communication 40, 33-60.
[3] Devillers, L., Vasilescu, I., 2003. Prosodic cues for emotion characterization in real-life spoken dialogs. In ISCA Eurospeech, Geneva, September 2003.
[4] Lamel, L., Gauvain, J. L., 2000. Speaker verification over the telephone. Speech Communication 31(2-3), 141-154.