Abstract
In Web usage mining (WUM), the goal is to mine frequently accessed ("hot") Web page sets from a website's server log database and to discover association rules among them. To do this, this paper proposes a novel algorithm for mining multilevel association rules based on gene expression programming (GEP). Generalization is applied in GEP as the measure for its fitness function, and GEP's strong self-search capability is used to evolve a better population. The traditional support-confidence method is then used on the resulting sub-database to mine frequent itemsets and association rules at multiple levels and across levels. The algorithm improves on the traditional multilevel association rule mining framework, and experimental results on large databases demonstrate its effectiveness and efficiency.
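The abstract does not give the algorithm's details. As a rough illustration of the final step only (support-confidence mining over generalized page items at one level of a hierarchy), here is a minimal Python sketch. It assumes a hypothetical concept hierarchy built by truncating URL paths, and it enumerates itemsets by brute force rather than with Apriori. The GEP evolution, its generalization-based fitness function, and cross-level rules are not modeled. The names `generalize`, `frequent_itemsets`, and `rules` are illustrative, not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def generalize(url: str, level: int) -> str:
    """Hypothetical concept hierarchy: keep only the first `level` path segments.

    For example, level 1 maps /news/sports/a.html to /news, and level 2 maps it
    to /news/sports.
    """
    parts = [p for p in url.strip("/").split("/") if p]
    return "/" + "/".join(parts[:level])


def frequent_itemsets(sessions, level, min_support, max_size=2):
    """Return {itemset: support} for generalized itemsets with support >= min_support.

    Every itemset of up to `max_size` items is counted by brute force, so this
    is only suitable for small illustrative inputs.
    """
    gen_sessions = [frozenset(generalize(u, level) for u in s) for s in sessions]
    n = len(gen_sessions)
    counts = Counter()
    for s in gen_sessions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(s), k):
                counts[frozenset(combo)] += 1
    return {items: c / n for items, c in counts.items() if c / n >= min_support}


def rules(freq, min_confidence):
    """Derive rules X -> Y with confidence = support(X | Y) / support(X).

    Every subset of a frequent itemset is itself frequent, so support(X) is
    always present in `freq`.
    """
    out = []
    for items, sup in freq.items():
        if len(items) < 2:
            continue
        for k in range(1, len(items)):
            for lhs in combinations(sorted(items), k):
                lhs = frozenset(lhs)
                conf = sup / freq[lhs]
                if conf >= min_confidence:
                    out.append((lhs, items - lhs, sup, conf))
    return out
```

Consider four sessions: `[["/news/sports/a.html", "/shop/books/b.html"], ["/news/world/c.html", "/shop/books/d.html"], ["/news/sports/e.html"], ["/shop/toys/f.html"]]`. At level 1 with a minimum support of 0.5, `{/news, /shop}` is frequent (support 0.5). It yields the rules `/news -> /shop` and `/shop -> /news`, each with confidence 2/3. At level 2, the same threshold leaves only single items, which is one reason multilevel miners often lower the minimum support at deeper levels.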
Source
Computer Engineering and Design (《计算机工程与设计》)
CSCD
Peking University Core Journal (北大核心)
2010, No. 1, pp. 137-140 (4 pages)
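Funding
National Natural Science Foundation of China (60763012)
Guangxi Universities Program for Outstanding Talents (RC2007022)
Guangxi Graduate Education Innovation Program (2009106030774M03)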