2/2012 - 3
Speech Segregation based on Pitch Track Correction and Music-Speech Classification
KIM, H.-G., JANG, G.-J., PARK, J.-S., KIM, J.-H., OH, Y.-H.
Author keywords
source separation, speech processing, speech analysis, signal denoising, noise cancellation
References keywords
processing(7), neural(6), signal(5), separation(5), music(5), auditory(5), speech(4), negative(4), factorization(4)
About this article
Date of Publication: 2012-05-30
Volume 12, Issue 2, Year 2012, On page(s): 15 - 20
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2012.02003
Web of Science Accession Number: 000305608000003
SCOPUS ID: 84865301789
Abstract
A novel approach to pitch track correction and music-speech classification is proposed to improve the performance of a speech segregation system. The proposed pitch track correction method adjusts unreliable pitch estimates using adjacent reliable pitch streaks, in contrast to the previous approach, which uses only a single pitch streak: the longest of the reliable pitch streaks in a sentence. The proposed music-speech classification method finds continuous pitch streaks in the mixture and labels each streak as music-dominant or speech-dominant, based on the observation that music pitch seldom changes over a short time period whereas speech pitch fluctuates considerably. Speech segregation results for mixtures of speech and various competing sound sources demonstrate that the proposed methods are superior to the conventional method, especially for mixtures of speech and music signals.
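The streak-labelling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the features or thresholds used, so the function names (`find_streaks`, `label_streaks`), the 10% frame-to-frame jump limit, the minimum streak length, and the 0.5-semitone variability threshold are all assumptions chosen only for illustration. The sketch splits a frame-wise pitch track into continuous streaks and labels a streak as music-dominant when its pitch is nearly flat, and as speech-dominant otherwise.

```python
import numpy as np

def find_streaks(pitch, max_jump=0.1):
    """Split a frame-wise pitch track (Hz, 0 = unvoiced) into continuous streaks.

    A streak ends at an unvoiced frame or when the pitch jumps by more than
    `max_jump` (relative) between consecutive frames. The 10% default is an
    assumed value, not one taken from the paper.
    Returns a list of (start, end) frame indices, end exclusive.
    """
    streaks = []
    start = None
    for i, f0 in enumerate(pitch):
        voiced = f0 > 0
        if voiced and start is None:
            start = i
        elif start is not None:
            # The previous frame is voiced here, so pitch[i - 1] > 0.
            if not voiced or abs(f0 - pitch[i - 1]) > max_jump * pitch[i - 1]:
                streaks.append((start, i))
                start = i if voiced else None
    if start is not None:
        streaks.append((start, len(pitch)))
    return streaks

def label_streaks(pitch, streaks, min_len=5, std_threshold=0.5):
    """Label each streak by how much its pitch fluctuates.

    Variability is measured as the standard deviation of the pitch in
    semitones relative to the streak median. Both `min_len` and
    `std_threshold` are hypothetical parameters for this sketch.
    """
    labels = []
    for start, end in streaks:
        segment = np.asarray(pitch[start:end], dtype=float)
        if len(segment) < min_len:
            labels.append("unknown")  # too short to judge reliably
            continue
        semitones = 12.0 * np.log2(segment / np.median(segment))
        if semitones.std() < std_threshold:
            labels.append("music-dominant")   # nearly constant pitch
        else:
            labels.append("speech-dominant")  # fluctuating pitch
    return labels

# Synthetic example: a held 220 Hz note followed by a rising speech-like glide.
pitch = np.concatenate([
    np.zeros(2),
    np.full(10, 220.0),
    np.zeros(3),
    np.linspace(120.0, 180.0, 10),
    np.zeros(1),
])
streaks = find_streaks(pitch)
labels = label_streaks(pitch, streaks)
```

In a real system the pitch track would come from a pitch estimator applied to the mixture, and the threshold would need to be tuned on data; the synthetic track here only shows the flat-versus-fluctuating distinction.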
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania
All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava. No part of this publication may be reproduced, stored in a retrieval system, photocopied, recorded or archived, without the written permission from the Editor. When authors submit their papers for publication, they agree that the copyright for their article be transferred to the Faculty of Electrical Engineering and Computer Science, Stefan cel Mare University of Suceava, Romania, if and only if the articles are accepted for publication. The copyright covers the exclusive rights to reproduce and distribute the article, including reprints and translations.