Abstract
Based on an analysis of the probabilistic characteristics of the syntactic composition of Chinese text, especially part-of-speech (POS) collocation, this paper proposes an approach and a system framework for recognizing and extracting Chinese general terms using a dual-layer hidden Markov model (HMM), and implements the corresponding system. Using a training corpus, the system was tested on term extraction from texts in multiple domains. The experimental results indicate that the proposed HMM-based approach performs well and is of practical reference value. The terms recognized and extracted by the system can be treated as candidate terms for subsequent elimination of false terms and for optimization in combination with mutual information, log-likelihood, and domain dependency parameters.
Source
《现代图书情报技术》
CSSCI
PKU Core Journal (北大核心)
2008, Issue 12, pp. 54-58 (5 pages)
New Technology of Library and Information Service