Abstract
Web pages contain complex, unstructured, and dynamic data, much of it incomplete, noisy, fuzzy, or random, which interferes with normal extraction. To address this, an efficient method for mining massive Web data based on an improved Apriori algorithm is proposed. Before the natural join that generates the candidate set, a pruning step reduces the number of itemsets taking part in the join, which shrinks the generated candidate itemsets and reduces the number of loop iterations and the running time. The join-condition check is also streamlined to avoid redundant comparisons. Experiments show that the method quickly eliminates interference from redundant data and improves mining accuracy.
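The abstract does not spell out the pruning rule, so the following Python sketch is only an illustration of the general idea, not the paper's implementation. It assumes one commonly used pre-join rule: every item of a frequent k-itemset occurs in k-1 of its (k-1)-subsets, so any (k-1)-itemset containing an item that appears in fewer than k-1 frequent (k-1)-itemsets can be dropped before the join. The function names, the tuple representation of itemsets, and the use of an absolute support count are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def prune_before_join(prev_frequent, k):
    """Drop (k-1)-itemsets that cannot be part of any frequent k-itemset.

    Assumed rule: an item must occur in at least k-1 of the frequent
    (k-1)-itemsets to appear in a frequent k-itemset.
    """
    counts = Counter(item for itemset in prev_frequent for item in itemset)
    return [s for s in prev_frequent if all(counts[i] >= k - 1 for i in s)]


def generate_candidates(prev_frequent, k):
    """Join the pruned (k-1)-itemsets (sorted tuples) into k-item candidates."""
    pruned = sorted(prune_before_join(prev_frequent, k))
    prev_set = set(prev_frequent)
    candidates = []
    for i, a in enumerate(pruned):
        for b in pruned[i + 1:]:
            # In sorted order, itemsets sharing a prefix are contiguous,
            # so the first mismatch ends the comparisons for this `a`.
            if a[:-1] != b[:-1]:
                break
            cand = a + (b[-1],)
            # Classic Apriori check: every (k-1)-subset must be frequent.
            if all(sub in prev_set for sub in combinations(cand, k - 1)):
                candidates.append(cand)
    return candidates


def apriori(transactions, min_support):
    """Return {itemset tuple: support count}; min_support is an absolute count."""
    transactions = [frozenset(t) for t in transactions]
    item_counts = Counter(item for t in transactions for item in t)
    frequent = sorted((i,) for i, c in item_counts.items() if c >= min_support)
    result = {s: item_counts[s[0]] for s in frequent}
    k = 2
    while frequent:
        candidates = generate_candidates(frequent, k)
        support = {
            c: sum(1 for t in transactions if t.issuperset(c))
            for c in candidates
        }
        frequent = [c for c in candidates if support[c] >= min_support]
        result.update((c, support[c]) for c in frequent)
        k += 1
    return result


if __name__ == "__main__":
    baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, count in sorted(apriori(baskets, min_support=3).items()):
        print(itemset, count)
```

In this sketch the pruning is safe in the sense that it never removes an itemset that could contribute to a frequent k-itemset. It only reduces the number of pairs the join step has to examine.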
Source
Bulletin of Science and Technology (《科技通报》)
PKU Core Journal
2012, No. 12, pp. 161-163 (3 pages)
Funding
National Natural Science Foundation of China (633442)