An Efficient Method for the Parallel Mining of Frequent Itemsets in Very Large Text Databases (cited by: 2)
Abstract: Frequent itemset mining is a common and useful data mining task, and it is also important in text mining. Most current mining algorithms, however, cannot be applied to very large text databases. To address the special requirements of frequent itemset mining in such databases, we propose a new parallel mining algorithm, parFIM. Built on a simple data structure, H-Struct, parFIM partitions the data vertically so that mining can proceed in parallel. The algorithm also removes short patterns and reduces duplicated patterns. Experimental results show that parFIM is well suited to frequent itemset mining in very large text databases.
Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, 2007, No. 9, pp. 110-113, 119 (5 pages)
Funding: National 863 Program of China (grants 2004AA112020, 2003AA115210, 2003AA111020)
Keywords: text mining; very large text database; frequent itemset; parallel
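
The abstract does not describe H-Struct or the internals of parFIM, so the following is only a minimal sketch of the general idea it names: store the data vertically, split the search space into independent partitions, mine them in parallel, and discard short patterns. It uses a generic Eclat-style search over term-to-document-id sets as a stand-in, not the authors' algorithm. All function names and the `min_length` parameter are assumptions made for illustration. A thread pool is used only to keep the example self-contained; in CPython it will not give real CPU speedup, and a process pool or a cluster would be needed for that.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_tidlists(docs):
    """Vertical layout: map each term to the set of document ids containing it."""
    tidlists = defaultdict(set)
    for doc_id, terms in enumerate(docs):
        for term in terms:
            tidlists[term].add(doc_id)
    return tidlists


def mine_prefix(prefix, prefix_tids, candidates, min_support, out):
    """Depth-first extension of `prefix` with items that sort after it."""
    for i, (item, tids) in enumerate(candidates):
        new_tids = prefix_tids & tids
        if len(new_tids) >= min_support:
            itemset = prefix + (item,)
            out[itemset] = len(new_tids)
            mine_prefix(itemset, new_tids, candidates[i + 1:], min_support, out)


def mine_partition(index, items, min_support):
    """Mine every frequent itemset whose smallest item is items[index].

    Partitions are disjoint by construction, so no itemset is produced twice.
    """
    item, tids = items[index]
    out = {(item,): len(tids)}
    mine_prefix((item,), tids, items[index + 1:], min_support, out)
    return out


def parallel_frequent_itemsets(docs, min_support, min_length=1, workers=4):
    """Return {itemset: support}, keeping itemsets with at least min_length items."""
    tidlists = build_tidlists(docs)
    # Keep only frequent single terms, in a fixed (alphabetical) order.
    items = sorted(
        ((t, s) for t, s in tidlists.items() if len(s) >= min_support),
        key=lambda pair: pair[0],
    )
    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda i: mine_partition(i, items, min_support), range(len(items))
        )
        for part in parts:
            result.update(part)
    # Drop short patterns (hypothetical threshold, not taken from the paper).
    return {k: v for k, v in result.items() if len(k) >= min_length}


if __name__ == "__main__":
    docs = [
        ["data", "mining", "text"],
        ["data", "mining"],
        ["data", "text"],
        ["mining", "text"],
    ]
    print(parallel_frequent_itemsets(docs, min_support=2, min_length=2))
```

Assigning each partition the itemsets that start with one particular term keeps the workers independent of one another. Whether parFIM partitions the data or removes duplicates in this way is not stated in the abstract.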