Speech Segregation based on Pitch Track Correction and Music-Speech Classification

doi:10.4316/AECE.2012.02003

2/2012 - 3


KIM, H.-G. , JANG, G.-J. , PARK, J.-S. , KIM, J.-H. , OH, Y.-H.


Author keywords
source separation, speech processing, speech analysis, signal denoising, noise cancellation


About this article
Date of Publication: 2012-05-30
Volume 12, Issue 2, Year 2012, On page(s): 15 - 20
ISSN: 1582-7445, e-ISSN: 1844-7600
Digital Object Identifier: 10.4316/AECE.2012.02003
Web of Science Accession Number: 000305608000003
SCOPUS ID: 84865301789

Abstract


This paper proposes pitch track correction and music-speech classification methods to improve the performance of a speech segregation system. The pitch track correction method adjusts unreliable pitch estimates using adjacent reliable pitch streaks, whereas the previous approach relied on a single pitch streak, the longest of the reliable streaks in a sentence. The music-speech classification method finds continuous pitch streaks in the mixture and labels each streak as music-dominant or speech-dominant, based on the observation that music pitch seldom changes over a short period whereas speech pitch fluctuates considerably. Speech segregation results for mixtures of speech and various competing sound sources show that the proposed methods outperform the conventional method, especially for mixtures of speech and music signals.
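
The two ideas in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the reliability criterion, the streak definition, or any thresholds. The function names, the relative-jump rule for splitting streaks (`max_jump`), the standard-deviation measure with its `flat_threshold`, and the linear interpolation between neighbouring reliable frames are all assumptions chosen for illustration. Pitch is taken as one value in Hz per frame, with 0 marking an unvoiced frame.

```python
from statistics import pstdev


def find_streaks(pitch, max_jump=0.2):
    """Split a frame-wise pitch track into continuous streaks.

    Returns (start, end) index pairs, end exclusive. A streak is a run of
    voiced frames whose frame-to-frame relative change stays within
    max_jump (an assumed criterion, not taken from the paper).
    """
    streaks, start = [], None
    for i, f0 in enumerate(pitch):
        voiced = f0 > 0
        if voiced and start is None:
            start = i  # a new streak begins
        elif voiced and abs(f0 - pitch[i - 1]) > max_jump * pitch[i - 1]:
            streaks.append((start, i))  # pitch jumped: close and restart
            start = i
        elif not voiced and start is not None:
            streaks.append((start, i))  # voicing ended: close the streak
            start = None
    if start is not None:
        streaks.append((start, len(pitch)))
    return streaks


def classify_streak(pitch, streak, flat_threshold=0.01):
    """Label a streak by how much its pitch varies.

    Nearly constant pitch is labelled "music", fluctuating pitch "speech".
    The normalised standard deviation and the threshold are assumptions.
    """
    start, end = streak
    segment = pitch[start:end]
    mean = sum(segment) / len(segment)
    variation = pstdev(segment) / mean
    return "music" if variation < flat_threshold else "speech"


def correct_pitch(pitch, reliable):
    """Replace unreliable estimates using the nearest reliable frames.

    Interpolates linearly between the reliable neighbours on both sides,
    or copies the single available neighbour at the track edges. This is
    an assumed stand-in for correction from adjacent reliable streaks.
    """
    corrected = list(pitch)
    good = [i for i, ok in enumerate(reliable) if ok]
    for i, ok in enumerate(reliable):
        if ok or not good:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None:
            corrected[i] = pitch[right]
        elif right is None:
            corrected[i] = pitch[left]
        else:
            w = (i - left) / (right - left)
            corrected[i] = (1 - w) * pitch[left] + w * pitch[right]
    return corrected


track = [0, 220, 220, 220, 221, 0, 120, 135, 150, 140, 0]
streaks = find_streaks(track)
labels = [classify_streak(track, s) for s in streaks]
```

On this toy track the nearly flat streak around 220 Hz is labelled as music and the fluctuating streak between 120 and 150 Hz as speech. A real system would need thresholds tuned on data and a reliability measure derived from the pitch estimator itself.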



Copyright ©2001-2024
Faculty of Electrical Engineering and Computer Science
Stefan cel Mare University of Suceava, Romania

All rights reserved: Advances in Electrical and Computer Engineering is a registered trademark of the Stefan cel Mare University of Suceava.
