Research on the Method of Stable New Words Identification Based on the Law of Survival (cited by: 1)
Abstract New word identification tends to produce large numbers of noise words and pseudo new words. To address this problem, this paper proposes a method for identifying stable new words based on a survival rule model. Drawing on the law of natural selection and the law of forgetting, the method analyzes how the frequency of candidate strings changes across their temporal distribution. It uses the comprehensive competitiveness that a string exhibits in its language environment to eliminate noise words with bursty characteristics and pseudo new words with unstable meanings, and thereby identifies the stable new words that appear in short online texts. The method can ensure both the novelty and the stability of the identified new words, provides basic support for extracting new concepts for a public opinion ontology, and helps improve the accuracy and recall of ontology concept extraction.
Source Journal of Xinjiang University (Natural Science Edition), CAS, 2018, No. 1, pp. 73-79 (7 pages)
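Funding Key Program of the National Natural Science Foundation of China (61331011); Natural Science Foundation of Xinjiang Autonomous Region (2014211A016); National Social Science Foundation of China (13BYY062)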
Keywords new word recognition; stable new words; time series analysis; survival rule of new words
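
The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea and not the paper's actual model. It assumes that each candidate string comes with a list of per-period frequencies, that the forgetting law can be approximated by an exponential decay, and that bursty noise words can be flagged by how concentrated their occurrences are in a single period. All function names, parameters, and thresholds (`survival_score`, `is_stable`, `stable_new_words`, `decay`, `min_active_ratio`, `max_burst_ratio`, `min_score`) are hypothetical.

```python
from math import exp


def survival_score(freqs, decay=0.5):
    """Sum per-period frequencies with exponential decay (assumed stand-in
    for the forgetting law): the most recent period has weight 1 and each
    earlier period is down-weighted by exp(-decay) per step back in time."""
    n = len(freqs)
    return sum(f * exp(-decay * (n - 1 - t)) for t, f in enumerate(freqs))


def is_stable(freqs, min_active_ratio=0.6, max_burst_ratio=0.7):
    """Reject bursty series: the string must occur in enough periods, and
    no single period may account for too large a share of all occurrences."""
    total = sum(freqs)
    if total == 0:
        return False
    active_ratio = sum(1 for f in freqs if f > 0) / len(freqs)
    burst_ratio = max(freqs) / total
    return active_ratio >= min_active_ratio and burst_ratio <= max_burst_ratio


def stable_new_words(series, known_words, min_score=5.0):
    """Return candidates that are new (not in known_words), not bursty,
    and still competitive after decay, in alphabetical order."""
    kept = []
    for word, freqs in series.items():
        if word in known_words:
            continue  # novelty check: skip strings already in the lexicon
        if is_stable(freqs) and survival_score(freqs) >= min_score:
            kept.append(word)
    return sorted(kept)
```

Calling `stable_new_words` with a dictionary that maps each candidate string to its frequency series, together with a set of known words, returns the candidates that pass all three checks. The thresholds are illustrative and would need tuning on real data.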