
Vector space model based on keywords and co-occurrence word pairs
Original title: 一种结合关键词与共现词对的向量空间模型
Cited by: 4
Abstract: A vector space model is proposed that uses both keywords and co-occurrence word pairs as the representation features of documents. First, candidate keywords are extracted from the documents by segmenting the text and removing stop words, and keyword features are selected by document frequency. Second, candidate co-occurrence word pairs are constructed by pairing the selected keyword features, and a support measure and a confidence measure are defined to select the co-occurrence word pair features. Finally, the keyword features and the co-occurrence word pair features are combined to construct the vector space model. Text classification experiments show that the proposed model has stronger text classification ability.
Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, Peking University Core Journal, 2014, Issue 5, pp. 971-976 (6 pages)
Funding: Twelfth Five-Year Plan science and technology support project (2011BAH10B04)
Keywords: vector space model; co-occurrence word pairs; semantic relatedness; text classification
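
The three steps named in the abstract (keyword selection by document frequency, co-occurrence pair selection by support and confidence, and a combined feature vector) can be illustrated with a minimal Python sketch. This record does not give the paper's formulas, so everything specific here is an assumption rather than the authors' method: support is taken as the fraction of documents containing both words, confidence as the pair's document count divided by the larger of the two words' document frequencies, and a pair's weight in a document as the smaller of the two term counts. The function names and threshold values are hypothetical, and the input is assumed to be already segmented with stop words removed.

```python
from collections import Counter
from itertools import combinations


def select_features(docs, min_df=2, min_support=0.3, min_confidence=0.6):
    """Pick keyword and co-occurrence pair features from tokenized documents.

    docs: list of token lists (assumed segmented, stop words removed).
    The thresholds and the support/confidence definitions are assumptions.
    """
    n_docs = len(docs)
    doc_sets = [set(tokens) for tokens in docs]

    # Step 1: keep keywords whose document frequency reaches min_df.
    df = Counter(term for terms in doc_sets for term in terms)
    keywords = sorted(t for t, c in df.items() if c >= min_df)
    keyword_set = set(keywords)

    # Step 2: for every keyword pair, count documents containing both words.
    pair_df = Counter()
    for terms in doc_sets:
        kept = sorted(terms & keyword_set)
        pair_df.update(combinations(kept, 2))

    pairs = []
    for (a, b), both in pair_df.items():
        support = both / n_docs                  # assumed definition
        confidence = both / max(df[a], df[b])    # assumed definition
        if support >= min_support and confidence >= min_confidence:
            pairs.append((a, b))
    return keywords, sorted(pairs)


def vectorize(tokens, keywords, pairs):
    """Step 3: concatenate keyword weights and pair weights into one vector."""
    counts = Counter(tokens)
    keyword_part = [counts[k] for k in keywords]
    # Assumed pair weight: the smaller of the two term counts (0 if either is absent).
    pair_part = [min(counts[a], counts[b]) for a, b in pairs]
    return keyword_part + pair_part
```

Calling `select_features` on a training corpus and then `vectorize` on each document yields fixed-length vectors that any standard classifier can consume. Raw counts are used for brevity; a TF-IDF style weighting could be substituted without changing the structure.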
