Voice activity detection using smoothed-fuzzy entropy (smFuzzyEn) and support vector machine

Bibliographic Details
Authors: Elton, R. Johny, Vasuki, P., Mohanalin, J., Gnanasekaran, J. S.
Resource type: article
Status: Published version
Publication date: 2019
Country: Mexico
Institution: UNIVERSIDAD NACIONAL AUTÓNOMA DE MÉXICO
Repository: Journal of Applied Research and Technology
Language: English
OAI Identifier: oai:ojs2.localhost:article/754
Online access: https://jart.icat.unam.mx/index.php/jart/article/view/754
Access Level: open access
Keywords: Voiced Activity Detection
Fuzzy Entropy
Support Vector Machine
Savitzky-Golay filter
Total variation filter
Description
Summary: This paper proposes a novel voice activity detection approach that uses a smoothed fuzzy entropy (smFuzzyEn) feature with a support vector machine (SVM). The proposed approach (smFESVM) uses a total variation (TV) filter and a Savitzky-Golay filter to smooth the FuzzyEn feature extracted from noisy speech signals. The convolution of the first-order difference of the TV filter with the noisy fuzzy entropy feature (conFETV') is also proposed. The smoothed feature vectors are then normalized using min-max normalization, and the normalized feature vectors are used to train an SVM model for speech/non-speech classification. The proposed smFESVM method shows better discrimination of noise and noisy speech when tested under various nonstationary background noises at different signal-to-noise ratio (SNR) levels. 10-fold cross-validation was used to validate the efficacy of the SVM classifier. The performance of smFESVM is compared against various algorithms, and the comparison suggests that smFESVM detects speech efficiently under low SNR conditions, with an accuracy of 93.88%.
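
The pipeline described in the summary (FuzzyEn feature, smoothing, min-max normalization, SVM with 10-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' implementation. The fuzzy entropy function follows a commonly used formulation with exponential membership, and the parameters (m, r, n), frame length, hop, Savitzky-Golay settings, SVM kernel, and the synthetic test signal are all assumptions chosen for illustration. The total variation filter and the conFETV' feature are not shown; only the Savitzky-Golay step is included.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC


def fuzzy_entropy(x, m=2, r=0.2, n=2):
    """Fuzzy entropy of a 1-D frame (common formulation; m, r, n are assumed)."""
    x = np.asarray(x, dtype=float)
    tol = r * np.std(x)
    if tol == 0:
        return 0.0  # constant frame: nothing to measure

    def phi(dim):
        count = len(x) - m  # same number of templates for dim m and m + 1
        templates = np.array([x[i:i + dim] for i in range(count)])
        templates -= templates.mean(axis=1, keepdims=True)  # remove local baseline
        # Chebyshev distance between every pair of templates
        dist = np.abs(templates[:, None, :] - templates[None, :, :]).max(axis=2)
        sim = np.exp(-(dist ** n) / tol)  # exponential fuzzy membership
        np.fill_diagonal(sim, 0.0)  # exclude self-matches
        return sim.sum() / (count * (count - 1))

    return float(np.log(phi(m)) - np.log(phi(m + 1)))


def min_max(v):
    """Scale a feature vector to the range [0, 1]."""
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)


# Synthetic stand-in for noisy speech (assumption): white noise throughout,
# with a tonal "speech-like" segment during the middle second.
fs, frame_len, hop = 8000, 200, 100
rng = np.random.default_rng(0)
t = np.arange(3 * fs) / fs
signal = rng.normal(0.0, 0.3, t.size)
active = (t >= 1.0) & (t < 2.0)
signal[active] += np.sin(2 * np.pi * 200 * t[active]) + 0.5 * np.sin(2 * np.pi * 400 * t[active])

# One FuzzyEn value per frame; a frame is labeled 1 if its center is in the active segment.
starts = np.arange(0, signal.size - frame_len + 1, hop)
features = np.array([fuzzy_entropy(signal[s:s + frame_len]) for s in starts])
labels = active[starts + frame_len // 2].astype(int)

# Smooth the feature track, then normalize it.
smoothed = savgol_filter(features, window_length=11, polyorder=3)
normalized = min_max(smoothed)

# Speech/non-speech classification with 10-fold cross-validation.
scores = cross_val_score(SVC(kernel="rbf"), normalized.reshape(-1, 1), labels, cv=10)
print("mean 10-fold accuracy:", scores.mean())
```

On real recordings, the frame labels would come from annotated speech/non-speech segments rather than from a synthetic mask, and the accuracy printed here says nothing about the 93.88% figure reported in the paper.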