Abstract
We study how existing associative classification algorithms are applied to text classification and find that, on structured text data, their accuracy is unsatisfactory because they ignore the semantic information of the text. To address this, we propose an associative text classification method based on rule restructuring (RARC). Using a term co-occurrence model, the method takes the classification rules that have already been mined and combines term pairs with a high degree of co-occurrence to restructure them into structured classification rules that carry semantic information of the text. These rules are then used to classify new texts. Experimental results show that RARC achieves higher classification accuracy than other associative text classification methods (ARC).
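The rule-restructuring idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the co-occurrence measure, the rule representation, or the way a structured rule is matched against a text, so everything below is an assumption made for illustration: a Jaccard-style document-level co-occurrence degree, a greedy pairing of rule terms above a hypothetical `threshold`, and a hypothetical `window` within which a paired term unit must appear. The function names are likewise invented here and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_degree(docs):
    """Assumed measure: Jaccard-style co-occurrence of term pairs over documents."""
    df = Counter()       # number of documents containing each term
    pair_df = Counter()  # number of documents containing both terms of a pair
    for doc in docs:
        terms = set(doc)
        df.update(terms)
        pair_df.update(frozenset(p) for p in combinations(sorted(terms), 2))
    degree = {}
    for pair, n in pair_df.items():
        a, b = tuple(pair)
        degree[pair] = n / (df[a] + df[b] - n)
    return degree


def restructure_rule(antecedent, degree, threshold=0.5):
    """Greedily merge rule terms whose co-occurrence degree reaches the threshold.

    Returns a list of units: 2-tuples for merged term pairs, 1-tuples for
    terms left on their own. The greedy strategy is an assumption.
    """
    candidates = sorted(
        ((degree.get(frozenset(p), 0.0), p)
         for p in combinations(sorted(antecedent), 2)),
        reverse=True,
    )
    used, units = set(), []
    for score, (a, b) in candidates:
        if score >= threshold and a not in used and b not in used:
            units.append((a, b))
            used.update((a, b))
    units.extend((t,) for t in sorted(antecedent) if t not in used)
    return units


def unit_matches(unit, tokens, window=3):
    """A single term must occur; a pair must occur within `window` positions."""
    if len(unit) == 1:
        return unit[0] in tokens
    a, b = unit
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


def classify(tokens, rules, default=None):
    """Return the label of the first restructured rule whose units all match."""
    for units, label in rules:
        if all(unit_matches(u, tokens) for u in units):
            return label
    return default
```

In this sketch a mined rule such as `{"stock", "market", "falls"} -> "finance"` would be passed through `restructure_rule` once, and the resulting units stored with the label. Treating a merged pair as a proximity constraint is one possible reading of "structured rules with semantic information"; the paper may define the structure differently.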
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
PKU Core Journal
2009, No. 3, pp. 624-626, 630 (4 pages)
Funding
National Torch Program of China (2004EB33006)
Guiding Program for Natural Science Research in Universities of Jiangsu Province (05JKD520050)
Keywords
term co-occurrence
vector space model
rule restructuring
associative classification
text classification