Abstract
A Web log analysis system can improve the structure of a Web site and the performance of its Web server, and it can also identify users' preferences and satisfaction, discover potential users, and strengthen the competitiveness of the site's services. This paper describes the stages of Web log mining and presents the design and implementation of a Web log analysis system. It analyzes the shortcomings of traditional frequent itemset mining and sequential pattern mining algorithms. Based on the characteristics of log data, user attributes are introduced into the generation of frequent itemsets, which effectively reduces the number of candidate itemsets, and the database is compressed round by round according to the characteristics of the candidate sets. Contiguous sequences are introduced into the candidate-merging step of the AprioriAll algorithm, yielding an improved algorithm. Experiments compare the efficiency of the improved algorithms with that of the traditional algorithms and demonstrate the effectiveness of the improvements.
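To make the idea of round-by-round database compression concrete, the following is a minimal sketch of a generic Apriori-style frequent itemset miner that drops transactions once they can no longer support any larger candidate. It is not the paper's improved algorithm: the abstract does not specify how user attributes enter candidate generation, so that step is omitted, and the function name `frequent_itemsets`, its parameters, and the sample sessions are assumptions made for illustration only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style mining with per-round database reduction (illustrative).

    transactions: iterable of iterables of hashable items,
                  e.g. the pages visited in each user session.
    min_support:  minimum number of transactions that must contain an itemset.
    Returns a dict mapping frozenset -> support count.
    """
    db = [frozenset(t) for t in transactions]

    # Round 1: count single items.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join + prune: build k-itemsets whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count candidates and compress the database in the same pass:
        # a transaction that contains no k-candidate cannot contain any
        # (k+1)-candidate, so it is dropped before the next round.
        counts = {c: 0 for c in candidates}
        reduced = []
        for t in db:
            if len(t) < k:
                continue  # too short to contain any k-itemset
            matched = [c for c in candidates if c <= t]
            for c in matched:
                counts[c] += 1
            if matched:
                reduced.append(t)
        db = reduced

        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


if __name__ == "__main__":
    # Hypothetical sessions; each inner list is the set of pages in one visit.
    sessions = [
        ["/home", "/news", "/shop"],
        ["/home", "/news"],
        ["/home", "/shop"],
        ["/news", "/shop"],
        ["/home", "/news", "/shop"],
    ]
    for itemset, support in frequent_itemsets(sessions, min_support=3).items():
        print(sorted(itemset), support)
```

The database only shrinks from one round to the next, so later rounds scan fewer transactions. A stricter variant could keep a transaction only if it contains a frequent k-itemset rather than any candidate, at the cost of a second pass per round.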
Source
Computer Technology and Development (《计算机技术与发展》)
2011, No. 9, pp. 211-215 (5 pages)
Funding
Natural Science Foundation of Hubei Province (2010CDB11102)
Keywords
log analysis
data preprocessing
frequent itemsets
sequential patterns