Funding: Supported by the National Key R&D Program of China (No. 2020AAA0108904) and the Science and Technology Plan of Shenzhen (No. JCYJ20200109140410340).
Abstract: Audio-visual wake word spotting is a challenging multi-modal task that exploits visual information from lip motion patterns to supplement acoustic speech and improve overall detection performance. However, most audio-visual wake word spotting models are suitable only for simple single-speaker scenarios and have high computational complexity; further progress is hindered by complex multi-person scenarios and the computational limits of mobile devices. In this paper, a novel audio-visual model is proposed for on-device multi-person wake word spotting. First, an attention-based audio-visual voice activity detection module is presented, which generates an attention score matrix between audio and visual representations to derive an active-speaker representation. Second, knowledge distillation is introduced to transfer knowledge from a large model to the on-device model, keeping the model size under control. Moreover, a new audio-visual dataset, PKU-KWS, is collected for sentence-level multi-person wake word spotting. Experimental results on the PKU-KWS dataset show that this approach outperforms previous state-of-the-art methods.
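To make the two key ideas above concrete, here is a minimal Python (PyTorch) sketch of (a) scoring audio frames against each candidate speaker's lip-motion features with an attention matrix and pooling an audio-aligned active-speaker representation, and (b) a standard temperature-scaled distillation loss. The abstract gives no implementation details, so every shape, name, and formulation below (scaled dot-product attention, the Hinton-style KL loss) is an assumption, not the paper's method.

```python
import torch
import torch.nn.functional as F

def active_speaker_representation(audio, visual):
    """Hypothetical attention-based audio-visual fusion.

    audio:  (T_a, d) acoustic frame embeddings
    visual: (S, T_v, d) lip-region embeddings for S candidate speakers
    Returns an (S, T_a, d) audio-aligned visual representation per
    candidate speaker, plus the raw attention score matrices.
    """
    d = audio.shape[-1]
    # Attention score matrix between audio and visual frames, one
    # (T_a, T_v) matrix per speaker; scaled dot product is an assumption.
    scores = torch.einsum('ad,svd->sav', audio, visual) / d ** 0.5
    weights = scores.softmax(dim=-1)
    fused = torch.einsum('sav,svd->sad', weights, visual)
    return fused, scores

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Soft-label knowledge distillation (Hinton et al., 2015); the
    temperature and KL formulation are conventional, not from the paper."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction='batchmean') * T * T
```

In the paper's setting, the fused representation would presumably feed the wake-word classifier of the small on-device student, with the distillation loss tying its outputs to the large teacher during training.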
Funding: Supported by the National Natural Science Foundation of China (Nos. 61601028 and 61431007), the Key R&D Program of Guangdong Province, China (No. 2018B030339001), and the National Key R&D Program of China (No. 2017YFB1002505).
Abstract: The N400 is an objective electrophysiological index of semantic processing in the brain. This study examines the sensitivity of the N400 effect during speech comprehension under uni- and bi-modal conditions. Varying the Signal-to-Noise Ratio (SNR) of the speech signal under Audio-only (A), Visual-only (V, i.e., lip-reading), and Audio-Visual (AV) conditions, a semantic priming paradigm was used to evoke the N400 effect and to measure the speech recognition rate. For the A and high-SNR AV conditions, N400 amplitudes were larger in the central region; for the V and low-SNR AV conditions, N400 amplitudes were larger in the left-frontal region. The N400 amplitudes in the frontal and central regions under the A, AV, and V conditions were consistent with the behavioral speech recognition rates. These results indicate that auditory cognition outperforms visual cognition at high SNR, and visual cognition outperforms auditory cognition at low SNR.
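A side note on stimulus construction: the abstract varies the SNR of the speech signal but does not say how the mixtures were made. The snippet below shows the standard way a noise track is scaled to reach a target SNR; it is a generic illustration, not the authors' procedure, and assumes the noise is at least as long as the speech.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Return speech + noise with the noise scaled so that the
    speech-to-noise power ratio of the mixture equals snr_db."""
    noise = noise[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise
```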
Abstract: Introduction: Osteoporosis and osteopenia are progressive disorders characterized by decreased bone mass, especially in postmenopausal women. They can be associated with body pain, fractures, hearing loss, and balance disorders. The present study aims to evaluate audio-vestibular function in postmenopausal patients with osteopenia or osteoporosis. Methods: The study included 48 postmenopausal women (new subjects) diagnosed with osteoporosis (n=23) or osteopenia (n=25), aged 50–66 years, as well as 28 normal women as controls. Audiological testing included pure-tone audiometry (conventional and extended high-frequency audiometry), speech audiometry, impedance audiometry, and otoacoustic emissions, including both transient evoked otoacoustic emissions (TEOAEs) and distortion product otoacoustic emissions (DPOAEs). All subjects also underwent vestibular evoked myogenic potentials testing (both ocular and cervical VEMPs). Results: Hearing was worse at all frequencies in the osteoporosis group than in the osteopenia and control groups, with worse speech recognition and discrimination scores and OAEs. Vestibular function was affected in 95.65% of women with osteoporosis and 76% of those with osteopenia. Conclusion: Osteoporosis and osteopenia are risk factors for vestibular dysfunction and hearing deficits in postmenopausal women; hearing and vestibular function should therefore be monitored periodically by audiological and vestibular testing in these individuals.
Funding: Project (17KJB510029) supported by the Natural Science Foundation of the Jiangsu Higher Education Institutions, China; Project (GXL2017004) supported by the Scientific Research Foundation of Nanjing Forestry University, China; Project (202102210132) supported by the Important Project of Science and Technology of Henan Province, China; Project (B2019-51) supported by the Scientific Research Foundation of Henan Polytechnic University, China; Project (51521003) supported by the Foundation for Innovative Research Groups of the National Natural Science Foundation of China; Project (KQTD2016112515134654) supported by the Shenzhen Science and Technology Program, China.
Abstract: A filter algorithm based on cochlear mechanics and the neuron filter mechanism is proposed from the viewpoint of vibration. It addresses the problem that non-linear amplification is rarely considered in studies of auditory filters. A cochlear mechanical transduction model is built to illustrate how audio signals are processed in the cochlea, and the neuron filter mechanism is then modeled to indirectly obtain outputs with the cochlear properties of frequency tuning and non-linear amplification. The mathematical description of the proposed algorithm is derived from the two models. The parameter space, parameter selection rules, and error correction of the proposed algorithm are discussed. Unit impulse responses in the time and frequency domains are simulated and compared to probe the characteristics of the proposed algorithm. A 24-channel filter bank is then built based on the proposed algorithm and applied to audio signal enhancement. Experiments and comparisons verify that the proposed algorithm effectively divides audio signals into different frequency bands, significantly enhances the high-frequency parts, and improves speech enhancement performance in different noise environments, especially for babble and Volvo noise.
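The abstract does not give the filter equations, so the sketch below is not the proposed cochlear/neuron-filter algorithm; it instead builds a conventional 24-channel auditory filter bank with gammatone filters spaced on the ERB-rate scale (SciPy ≥ 1.6 provides scipy.signal.gammatone), purely to illustrate how a signal is split into auditory-style frequency channels. The channel range and the Glasberg–Moore spacing constants are standard choices of mine, not values from the paper.

```python
import numpy as np
from scipy.signal import gammatone, lfilter

def erb_space(low_hz, high_hz, n):
    """n center frequencies equally spaced on the ERB-rate scale
    (Glasberg & Moore constants: ear_q = 9.26449, min_bw = 24.7)."""
    ear_q, min_bw = 9.26449, 24.7
    lo = np.log(low_hz + ear_q * min_bw)
    hi = np.log(high_hz + ear_q * min_bw)
    return np.exp(np.linspace(lo, hi, n)) - ear_q * min_bw

def gammatone_filter_bank(x, fs, n_channels=24, low_hz=80.0):
    """Split signal x into n_channels gammatone band signals; returns
    an (n_channels, len(x)) array plus the center frequencies."""
    centers = erb_space(low_hz, 0.9 * fs / 2, n_channels)
    bands = [lfilter(*gammatone(fc, 'iir', fs=fs), x) for fc in centers]
    return np.stack(bands), centers
```

For example, gammatone_filter_bank(x, 16000) yields 24 band signals whose per-band gains could then be reshaped to emphasize high frequencies before resynthesis, in the spirit of the enhancement experiments described above.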