Graph Learning Based Speaker Independent Speech Emotion Recognition

doi:10.4316/AECE.2014.02003

2/2014 - 3

XU, X., HUANG, C., WU, C., WANG, Q., ZHAO, L.

Author keywords
speech emotion recognition, speaker penalty graph learning, graph embedding framework, dimensionality reduction

About this article
Date of Publication: 2014-05-31
Volume 14, Issue 2, Year 2014, On page(s): 17 - 22
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2014.02003
Web of Science Accession Number: 000340868100003
SCOPUS ID: 84901856862

Abstract

This paper proposes Speaker-Penalty Graph Learning (SPGL), an algorithm based on graph learning and the graph embedding framework, to address the problems that different speakers cause in speech emotion recognition. The graph embedding framework is used to construct the dimensionality reduction stage of speech emotion recognition. Special penalty and intrinsic graphs are designed within this framework to penalize the influence of different speakers on the recognition task. The original speech emotion features are extracted from several categories, each reflecting different characteristics of a speech sample. In experiments on speech emotion corpus data with different classifiers, the proposed method, in both its linear and kernelized mapping forms, achieves better performance than state-of-the-art dimensionality reduction methods.
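The abstract does not spell out how the SPGL graphs are built, so the following is only a minimal sketch of the general linear graph embedding idea it refers to, not the authors' method. It assumes a hypothetical graph design: the intrinsic graph links samples that share an emotion label but come from different speakers, and the penalty graph links samples from the same speaker with different emotions. The function names, the toy data, and the regularization constant `reg` are all illustrative assumptions.

```python
import numpy as np


def laplacian(W):
    """Graph Laplacian L = D - W for a symmetric adjacency matrix W."""
    return np.diag(W.sum(axis=1)) - W


def speaker_aware_graphs(emotions, speakers):
    """Hypothetical graph design (an assumption, not taken from the paper)."""
    same_emotion = emotions[:, None] == emotions[None, :]
    same_speaker = speakers[:, None] == speakers[None, :]
    # Intrinsic graph: pairs that should end up close after projection.
    W_intrinsic = (same_emotion & ~same_speaker).astype(float)
    # Penalty graph: pairs that should end up far apart after projection.
    W_penalty = (~same_emotion & same_speaker).astype(float)
    return W_intrinsic, W_penalty


def graph_embedding_projection(X, W_intrinsic, W_penalty, n_components, reg=1e-6):
    """Linear graph embedding for X of shape (n_samples, n_features).

    Finds directions w that maximize (w' B w) / (w' A w), where A and B are
    the intrinsic and penalty scatter matrices built from the two Laplacians.
    """
    A = X.T @ laplacian(W_intrinsic) @ X
    B = X.T @ laplacian(W_penalty) @ X
    A = A + reg * np.eye(A.shape[0])      # keep A positive definite

    # Reduce the generalized problem B w = lambda A w to a standard
    # symmetric eigenproblem by whitening with the Cholesky factor of A.
    C = np.linalg.cholesky(A)
    C_inv = np.linalg.inv(C)
    M = C_inv @ B @ C_inv.T
    M = (M + M.T) / 2                     # remove round-off asymmetry
    _, eigvecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_components]
    return C_inv.T @ top                  # shape (n_features, n_components)


# Toy data: feature 0 carries emotion, feature 1 carries speaker identity,
# feature 2 is pure noise.
rng = np.random.default_rng(0)
emotions = np.tile([0, 1], 12)            # 24 samples, two emotion classes
speakers = np.repeat([0, 1, 2], 8)        # three speakers, 8 samples each
X = 0.1 * rng.standard_normal((24, 3))
X[:, 0] += 2.0 * emotions
X[:, 1] += 5.0 * speakers

W_int, W_pen = speaker_aware_graphs(emotions, speakers)
P = graph_embedding_projection(X, W_int, W_pen, n_components=1)
Z = X @ P                                 # reduced features for a classifier
```

On this toy data the learned direction should weight the emotion-bearing feature far more heavily than the speaker-bearing one, which is the effect a speaker penalty is meant to produce. A kernelized variant would replace `X` with a kernel matrix; that is not shown here.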

Copyright ©2001-2024
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania

All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.
