
A Data Mining System for Very Large Databases (cited by 28)
Abstract: Data mining, which combines techniques from databases, artificial intelligence and statistics, is currently an active research area. To integrate the main data mining techniques and make them work together, we developed a data mining system, Golden-Eye, building on our research into basic data mining algorithms and their implementation. The system incorporates several recent results of data mining research, including improved versions of existing algorithms and some newly proposed operations. It provides two data-preparation operations, generalization and data cleaning, and a range of basic mining operations: association rule mining, exception rule mining, sequential pattern mining, classifier construction and clustering. It also supports basic management of mining operations and graphical display of their results. The architecture is designed for completeness, coordination and efficiency: from the bottom up, it tightly integrates a storage control module, a data preprocessing module, a mining operation module and a mining base management module. At the bottom layer, all data, including intermediate results, are managed uniformly; at the top layer, users work through a visual interface. Experimental results show that the system can successfully carry out user-specified mining tasks on very large databases.
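The abstract describes a bottom-up layering in which a storage layer manages raw data and intermediate results uniformly, preprocessing and mining operations read from and write back to that layer, and a mining base keeps track of the operations that were run. This record page does not show Golden-Eye's interfaces, so the following is only a minimal Python sketch under stated assumptions: `StorageManager`, `MiningBase`, `Pipeline`, `generalize`, `mine_pair_rules`, the toy concept hierarchy and the sales data are all hypothetical. It chains a generalization step into a deliberately simplified association-rule miner (one item on each side of a rule, not a full Apriori implementation) to show how an intermediate result becomes a named table that later operations can reuse.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations

# All class and function names below are hypothetical illustrations,
# not Golden-Eye's actual interfaces.


class StorageManager:
    """Bottom layer (storage control): one store for raw tables and
    intermediate results alike, addressed by name."""

    def __init__(self):
        self._tables = {}

    def put(self, name, rows):
        self._tables[name] = list(rows)

    def get(self, name):
        return self._tables[name]


@dataclass
class MiningBase:
    """Mining base management: records which operation produced which table."""

    log: list = field(default_factory=list)

    def record(self, op_name, src, dst):
        self.log.append((op_name, src, dst))


def generalize(transactions, hierarchy):
    """Preprocessing: replace each item by its parent concept, if it has one."""
    return [sorted({hierarchy.get(item, item) for item in t}) for t in transactions]


def mine_pair_rules(transactions, min_support, min_confidence):
    """Simplified mining operation: rules X -> Y with one item on each side,
    filtered by support (fraction of transactions) and confidence."""
    n = len(transactions)
    item_count, pair_count = Counter(), Counter()
    for t in transactions:
        items = sorted(set(t))
        item_count.update(items)
        pair_count.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


class Pipeline:
    """Runs an operation on a stored table and stores its output under a new
    name, so intermediate results live in the same layer as raw data."""

    def __init__(self):
        self.storage = StorageManager()
        self.mining_base = MiningBase()

    def run(self, op_name, fn, src, dst, **params):
        result = fn(self.storage.get(src), **params)
        self.storage.put(dst, result)
        self.mining_base.record(op_name, src, dst)
        return result


# Toy concept hierarchy and sales data, made up for illustration.
HIERARCHY = {"milk": "dairy", "cheese": "dairy", "bread": "bakery",
             "bagel": "bakery", "apple": "fruit"}

p = Pipeline()
p.storage.put("sales", [["milk", "bread"], ["cheese", "bagel"],
                        ["milk", "apple"], ["cheese", "bread", "apple"]])
p.run("generalize", generalize, "sales", "sales_gen", hierarchy=HIERARCHY)
rules = p.run("assoc_rules", mine_pair_rules, "sales_gen", "rules",
              min_support=0.5, min_confidence=0.7)
print(rules)
# [('bakery', 'dairy', 0.75, 1.0), ('dairy', 'bakery', 0.75, 0.75), ('fruit', 'dairy', 0.5, 1.0)]
```

Because every step writes its output back through the same storage layer, a later operation (for example, clustering on the generalized table) could read `sales_gen` without recomputing it, and the mining base log records how each table was derived. A system aimed at very large databases would keep these tables in the database itself rather than in an in-memory Python dict.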
Source: Journal of Software (软件学报), 2002, No. 8, pp. 1540-1545 (6 pages). Indexed in EI, CSCD and the Peking University core journal list.
Funding: National Natural Science Foundation of China (60003016); National Key Basic Research and Development Program of China (973 Program) (G1998030414)
Keywords: very large databases; data mining system; data preprocessing; storage control; knowledge discovery; mining base

