Abstract
Discovering users' frequent access patterns from Web log data can be divided into two steps. First, the preprocessed log data are converted into a set of maximal forward references. Second, an Apriori-style algorithm is used to mine frequent access patterns from that set. This paper addresses the second step and proposes RD_Apriori, an algorithm based on a reduced database (ReducedDatabase), which can accurately and efficiently mine frequent access patterns of different lengths.
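The two steps can be illustrated with a minimal Python sketch. It is not the paper's RD_Apriori: the abstract gives no details of the database reduction, so none is attempted here. The function names, the sample session, and the choice to count contiguous page sequences against an absolute support threshold are all assumptions made for illustration.

```python
from collections import Counter


def maximal_forward_references(session):
    """Step 1 (sketch): split one session's click path into maximal forward references.

    A backward move (revisiting a page already on the current path) ends the
    forward reference built so far; the path is then cut back to that page.
    """
    path, refs, moving_forward = [], [], False
    for page in session:
        if page in path:
            if moving_forward:
                refs.append(tuple(path))
            del path[path.index(page) + 1:]  # backtrack to the revisited page
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        refs.append(tuple(path))
    return refs


def frequent_sequences(refs, min_support):
    """Step 2 (sketch): plain level-wise, Apriori-style mining of contiguous sequences.

    Assumption: support is the number of references containing the sequence.
    No database reduction is performed, so this is not RD_Apriori.
    """
    frequent, length = {}, 1
    while True:
        counts = Counter()
        for ref in refs:
            seen = set()
            for i in range(len(ref) - length + 1):
                seq = ref[i:i + length]
                # Apriori pruning: both (length - 1) sub-sequences must be frequent.
                if length > 1 and (seq[:-1] not in frequent or seq[1:] not in frequent):
                    continue
                seen.add(seq)
            counts.update(seen)  # count each sequence once per reference
        level = {s: c for s, c in counts.items() if c >= min_support}
        if not level:
            return frequent
        frequent.update(level)
        length += 1


# Hypothetical session: each letter stands for one page.
refs = maximal_forward_references(list("ABCDCBEGHGWAOUOV"))
patterns = frequent_sequences(refs, min_support=2)
```

For this hypothetical session, `refs` holds the five references ABCD, ABEGH, ABEGW, AOU and AOV, and the longest pattern that reaches a support of 2 is A, B, E, G.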
Source
《微电子学与计算机》
CSCD
PKU Core Journal (北大核心)
2005, No. 5, pp. 4-7 (4 pages)
Microelectronics & Computer
Funding
Research Development Fund of Hefei University of Technology (030503F)
Keywords
Web mining, Access pattern, Frequent access pattern, Adjoining access pattern, Consecution, Set of adjoining access patterns