
Association Text Classification by Mining Significant Itemsets

Association text classification of mining ItemSet significance
Abstract: In the classifier-construction stage, association-rule classification algorithms consider only whether a feature word is present and ignore the weights of text features. To address this problem, this paper builds on the association rule-based text classification method ARC-BC and proposes the ISARC (ItemSet Significance-based ARC) algorithm to improve the accuracy of associative text classification. The algorithm uses feature-term weights to define the significance of a k-itemset, generates association rules by mining significant itemsets, and takes into account the influence of lift on the text to be classified. Experimental results show that ISARC, by mining significant itemsets, improves the accuracy of associative text classification.

Text classification technology is an important basis of information retrieval and text mining; its main task is to assign documents to categories from a given category set. Text classification has a wide range of applications in natural language processing and understanding, information organization and management, information filtering, and other areas. At present, text classification methods fall into three main groups: statistical methods, connectionist methods, and rule-based methods. The basic idea of the traditional associative text classification algorithm ARC-BC (associative rule-based classifier by category) is to use the Apriori association rule mining algorithm to generate frequent items, that is, feature items or itemsets that appear frequently, then use these frequent items as rule antecedents and the category as the rule consequent to form a rule set, and finally assemble these rules into a classifier. When a test sample is classified, each rule whose antecedent it matches adds its confidence to the counter of the class the rule belongs to, and the test sample is assigned to the category whose counter holds the maximum accumulated confidence.
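The ARC-BC procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each document is treated as a plain set of words (binary presence, which is exactly the limitation ISARC targets), frequent itemsets are mined separately for each category with textbook Apriori, a rule's confidence is measured over the whole training set, and the function names `apriori`, `build_rules`, and `classify` as well as the threshold values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support (fraction of
    transactions containing it) is at least min_support."""
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in tsets for item in t}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in tsets if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets and prune any candidate
        # with an infrequent (k-1)-subset (the Apriori property).
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def build_rules(docs, labels, min_support, min_conf):
    """Mine frequent itemsets per category (assumption: ARC-BC style) and turn
    each into a rule itemset -> category, scored by confidence on all docs."""
    all_sets = [frozenset(d) for d in docs]
    rules = []  # (antecedent, category, confidence)
    for cat in set(labels):
        cat_docs = [d for d, y in zip(docs, labels) if y == cat]
        for itemset in apriori(cat_docs, min_support):
            covered = [y for t, y in zip(all_sets, labels) if itemset <= t]
            conf = covered.count(cat) / len(covered)
            if conf >= min_conf:
                rules.append((itemset, cat, conf))
    return rules


def classify(doc, rules):
    """Add the confidence of every matching rule to its class counter and
    return the class with the largest total (None if no rule matches)."""
    words = frozenset(doc)
    scores = defaultdict(float)
    for antecedent, cat, conf in rules:
        if antecedent <= words:
            scores[cat] += conf
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    train = [["ball", "goal", "team"], ["ball", "team", "coach"], ["goal", "team"],
             ["stock", "market", "bank"], ["bank", "loan", "market"], ["stock", "bank"]]
    labels = ["sport"] * 3 + ["finance"] * 3
    rules = build_rules(train, labels, min_support=0.5, min_conf=0.6)
    print(classify(["team", "goal", "weather"], rules))  # sport
```

Note that a document matching many rules of one class simply accumulates more confidence, and no feature weight enters the computation at any point; these are the two behaviours the following paragraph criticizes.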
However, the ARC-BC algorithm has two main drawbacks. (1) During classifier construction, it considers only the existence of feature words and ignores the weights of text features, so the frequent itemsets it mines and the association rules it generates may degrade the classification results. (2) In the class prediction stage, it places too much emphasis on rule confidence. The mining process can produce rules that have the same antecedent but different consequents; if only the rules' confidence is considered when predicting the class of a text, without considering the correlation between rule antecedent and consequent, classification accuracy also suffers. To solve these two problems, a new algorithm, the itemset significance-based association rule-based categorizer (ISARC), is propo
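The abstract does not give ISARC's formulas, so the following is only a hypothetical sketch of the two ideas it names, not the paper's definitions. Assumptions: each document is a dict mapping a feature word to its weight (for example a normalized term frequency); the "significance" of a k-itemset in a category is taken here as a weighted support, namely the mean weight of its items in each containing document, summed and divided by the category size; and prediction accumulates confidence times lift instead of confidence alone. Lift uses the standard definition lift(X → c) = conf(X → c) / P(c), which exceeds 1 only when X and c are positively correlated. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def significance(itemset, cat, docs, labels):
    """HYPOTHETICAL k-itemset significance (a weighted support): each doc of
    class `cat` containing every item adds the mean weight of those items;
    the total is divided by the number of docs in the class."""
    cat_docs = [d for d, y in zip(docs, labels) if y == cat]
    total = 0.0
    for d in cat_docs:
        if all(item in d for item in itemset):
            total += sum(d[item] for item in itemset) / len(itemset)
    return total / len(cat_docs)


def confidence_and_lift(itemset, cat, docs, labels):
    """Standard measures for rule itemset -> cat, on word presence:
    confidence = P(cat | itemset), lift = confidence / P(cat)."""
    covering = [y for d, y in zip(docs, labels) if all(i in d for i in itemset)]
    if not covering:
        return 0.0, 0.0
    conf = covering.count(cat) / len(covering)
    prior = labels.count(cat) / len(labels)
    return conf, conf / prior


def mine_rules(docs, labels, min_sig, max_k=2):
    """Keep itemsets up to size max_k whose significance for a class reaches
    min_sig. Brute-force enumeration: this significance is not anti-monotone."""
    vocab = sorted({w for d in docs for w in d})
    rules = []  # (antecedent, category, confidence, lift)
    for cat in sorted(set(labels)):
        for k in range(1, max_k + 1):
            for itemset in combinations(vocab, k):
                if significance(itemset, cat, docs, labels) >= min_sig:
                    conf, lift = confidence_and_lift(itemset, cat, docs, labels)
                    rules.append((frozenset(itemset), cat, conf, lift))
    return rules


def predict(doc, rules):
    """HYPOTHETICAL scoring: sum confidence * lift of matching rules per class."""
    scores = defaultdict(float)
    for antecedent, cat, conf, lift in rules:
        if all(item in doc for item in antecedent):
            scores[cat] += conf * lift
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    docs = [
        {"team": 0.6, "market": 0.1, "goal": 0.3},
        {"team": 0.5, "goal": 0.5},
        {"market": 0.7, "bank": 0.3},
        {"market": 0.6, "stock": 0.4},
        {"bank": 0.5, "stock": 0.5},
    ]
    labels = ["sport", "sport", "finance", "finance", "finance"]
    rules = mine_rules(docs, labels, min_sig=0.2)
    print(predict({"market": 0.5, "goal": 0.5}, rules))  # sport
```

Enumerating itemsets by brute force is deliberate: unlike plain support, this weighted significance can grow when a heavily weighted item is added, so Apriori-style pruning would not be guaranteed safe. In the toy data, "market" occurs in half of the sport documents but only with weight 0.1, so its sport significance (0.05) falls below the threshold and no "market → sport" rule is produced, whereas a binary-support threshold of 0.2 would have kept it.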
Source: Journal of Nanjing University (Natural Science), indexed in CAS, CSCD, and the Peking University Core Journals list; 2011, No. 5, pp. 544-550 (7 pages)
Funding: National Natural Science Foundation of China (61070062)
Keywords: text classification; association rule-based categorizer by category; weight; itemset significance