A new scheme is presented to detect a large number ofKeywordsin voice controlled switchboard tasks. The new scheme is based on two stages. In the first stage, N best syllable candidates with their corresponding acous...A new scheme is presented to detect a large number ofKeywordsin voice controlled switchboard tasks. The new scheme is based on two stages. In the first stage, N best syllable candidates with their corresponding acoustic scores are generated by an acoustic recognizer. In the second stage, a semantic model based parser is applied to determine the optimum keywords by searching through the lattice of N best candidates. The experimental results show that when the spoken input deviates from the predefined syntactic constraints, the parser can also demonstrate high performance. For comparison purposes, the most common way to incorporate the syntactic knowledge of the task directly into the acoustic recognizer in the form of a finite state network is also investigated. Furthermore, to address the sparse data problems, out of domain data in the form of newspaper text are used to obtain a more robust combined semantic model. The experiments show that the combined semantic model can improve the keywords detection rate from 90.07% to 92.91% when 80 ungrammatical sentences which do not conform to the task grammar are used as testing material.展开更多
基金the State High-Tech Developments Planof China !( No. 863 -3 0 6-0 2 -1) Chinese211Engineering Project!( No.9610 3 -2 )
文摘A new scheme is presented to detect a large number ofKeywordsin voice controlled switchboard tasks. The new scheme is based on two stages. In the first stage, N best syllable candidates with their corresponding acoustic scores are generated by an acoustic recognizer. In the second stage, a semantic model based parser is applied to determine the optimum keywords by searching through the lattice of N best candidates. The experimental results show that when the spoken input deviates from the predefined syntactic constraints, the parser can also demonstrate high performance. For comparison purposes, the most common way to incorporate the syntactic knowledge of the task directly into the acoustic recognizer in the form of a finite state network is also investigated. Furthermore, to address the sparse data problems, out of domain data in the form of newspaper text are used to obtain a more robust combined semantic model. The experiments show that the combined semantic model can improve the keywords detection rate from 90.07% to 92.91% when 80 ungrammatical sentences which do not conform to the task grammar are used as testing material.