1. Whisper intelligibility enhancement based on noise robust feature and SVM (cited by: 2)
Authors: Zhou Jian (周健), Zhao Li (赵力), Liang Ruiyu (梁瑞宇), Fang Xianyong (方贤勇). Journal of Southeast University (English Edition), indexed in EI and CAS, 2012, No. 3, pp. 261-265 (5 pages)
A machine learning based speech enhancement method is proposed to improve the intelligibility of whispered speech. A binary mask estimated by a two-class support vector machine (SVM) classifier is used to synthesize the enhanced whisper. A novel noise robust feature, Gammatone feature cosine coefficients (GFCCs), is extracted by an auditory periphery model and used for the binary mask estimation. The intelligibility performance of the proposed method is evaluated and compared with that of traditional speech enhancement methods. Objective and subjective evaluation results indicate that the proposed method effectively improves the intelligibility of whispered speech contaminated by noise. The power subtraction algorithm and the log-MMSE algorithm do not improve intelligibility in low signal-to-noise ratio (SNR) environments, whereas the proposed method does. The whispered speech enhanced by the proposed method is also more intelligible than the corresponding unprocessed noisy whispered speech.
Keywords: whispered speech; intelligibility enhancement; noise robust feature; machine learning
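The abstract above describes synthesizing the enhanced whisper from a binary mask over time-frequency units. The following is a minimal sketch of that masking idea only, not the paper's method: it does not train an SVM or compute GFCCs. Instead it builds an ideal binary mask from known speech and noise energies, which is the kind of label a two-class classifier could be trained to predict. The function names, the 0 dB threshold, and the toy energy values are all assumptions made for illustration.

```python
import math

def ideal_binary_mask(speech_energy, noise_energy, threshold_db=0.0):
    """Label each time-frequency unit 1 if its local SNR exceeds threshold_db, else 0.

    threshold_db=0.0 is an illustrative assumption, not a value from the paper.
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            # Local SNR in dB; a small floor avoids log(0) and division by zero.
            snr_db = 10.0 * math.log10(max(s, 1e-12) / max(n, 1e-12))
            row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask

def apply_mask(noisy_tf, mask):
    """Keep units labelled 1 and zero out units labelled 0."""
    return [[x * m for x, m in zip(x_row, m_row)]
            for x_row, m_row in zip(noisy_tf, mask)]

# Toy example: 2 frames x 3 frequency channels of energies (hypothetical values).
speech = [[4.0, 0.1, 2.0], [0.5, 3.0, 0.2]]
noise = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
noisy = [[s + n for s, n in zip(s_row, n_row)]
         for s_row, n_row in zip(speech, noise)]

mask = ideal_binary_mask(speech, noise)
enhanced = apply_mask(noisy, mask)
```

In a classifier-based system the mask would be predicted from features of the noisy signal alone, since the clean speech and noise energies are not available at test time.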
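2. Key Technologies for Speech Intelligibility Based on CycleGAN (基于CycleGAN的语音可懂度关键技术)
Authors: Xiao Jing (肖晶), Liu Jiaqi (刘佳奇), Li Dengshi (李登实), Zhao Lanxin (赵兰馨), Wang Qianrui (王前瑞). Computer Systems & Applications (《计算机系统应用》), 2022, No. 6, pp. 1-9 (9 pages)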
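Speech intelligibility enhancement is a perceptual enhancement technique for reproducing clear speech in noisy environments. Many studies enhance speech intelligibility through speaking style conversion (SSC), but this approach relies only on the Lombard effect and therefore performs poorly under strong noise interference. SSC also models the conversion of the fundamental frequency (F0) with a simple linear transformation and maps only a few dimensions of the Mel-cepstral coefficients (MCEPs). Because F0 and MCEPs are two important features of speech, they need to be modeled adequately. This paper therefore uses the continuous wavelet transform (CWT) to decompose F0 into 10 dimensions that describe speech at different time scales, which enables effective F0 conversion, and uses a 20-dimensional representation of the MCEPs for MCEP conversion. In addition, an iMetricGAN network is used to optimize speech intelligibility metrics in strong noise. Experimental results show that the proposed non-parallel speech style conversion method based on CycleGAN with CWT and iMetricGAN (NS-CiC) significantly improves speech intelligibility in strong noise environments in both objective and subjective evaluations.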
Keywords: deep learning; intelligibility enhancement; continuous wavelet transform; iMetricGAN; CycleGAN
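The abstract above mentions decomposing F0 into 10 dimensions with the continuous wavelet transform. The following is a minimal sketch of such a decomposition under stated assumptions, not the paper's implementation: the Mexican hat mother wavelet, the octave-spaced scales, the base scale of one frame, and the function names are all illustrative choices that the abstract does not specify. The input is assumed to be a continuous (interpolated over unvoiced frames) and normalized log-F0 contour; the toy contour below is synthetic.

```python
import math

def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet; chosen here as an assumption."""
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - t * t) * math.exp(-t * t / 2.0)

def cwt_decompose(signal, num_scales=10, base_scale=1.0):
    """Return num_scales rows of coefficients, each as long as the input.

    Row i uses scale base_scale * 2**i (assumption: scales one octave apart).
    """
    n = len(signal)
    rows = []
    for i in range(num_scales):
        scale = base_scale * 2.0 ** i
        row = []
        for b in range(n):
            # Direct O(n^2) correlation of the signal with the wavelet
            # shifted to frame b and dilated by the current scale.
            acc = 0.0
            for k in range(n):
                acc += signal[k] * mexican_hat((k - b) / scale)
            row.append(acc / math.sqrt(scale))
        rows.append(row)
    return rows

# Synthetic stand-in for a normalized log-F0 contour (50 frames).
contour = [math.sin(2.0 * math.pi * k / 25.0) for k in range(50)]
components = cwt_decompose(contour, num_scales=10)
```

Each row of `components` captures variation of the contour at one time scale, from frame-level detail at the smallest scale to slow, phrase-level movement at the largest. A direct double loop is used for clarity; a practical implementation would use FFT-based convolution or a wavelet library.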