Abstract
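Current methods for identifying user sessions fall into two main classes: timeout-based session identification, and session identification combined with the site topology (hyperlinks). Both classes build on user identification and derive sessions by making guesses about user activity. This paper proposes an evaluation system that quantifies the accuracy of the data produced by these heuristic methods; the different measures reflect the needs of different data mining applications. Finally, data from a real site is used to show that the results of the evaluation system are accurate.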
This paper describes the timeout-based sessionizing mechanisms and topology-aware heuristics that are now used to identify user sessions. Sessionizing tools are based on heuristic rules and on assumptions about the site's usage, and are therefore prone to error. The paper proposes a formal framework composed of a set of measures for evaluating the accuracy of sessionizing tools. The different measures reflect the requirements of different web usage analysis applications. An experiment using the log data of a real web site illustrates the use of the measures.
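The timeout-based heuristic mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `sessionize`, the `(user_id, timestamp, url)` record layout, and the 1800-second threshold are all assumptions chosen for illustration. The idea is only that a new session starts whenever the gap between two consecutive requests of the same user exceeds the timeout.

```python
from collections import defaultdict


def sessionize(requests, timeout=1800):
    """Split each user's requests into sessions by an inactivity timeout.

    requests: iterable of (user_id, timestamp_seconds, url) tuples
              (hypothetical layout, not taken from the paper).
    timeout:  maximum gap in seconds between consecutive requests of one
              session (1800 is an assumed value for illustration).
    Returns a list of (user_id, [url, ...]) tuples, one per session.
    """
    # Group requests by user; user identification is assumed to be done.
    by_user = defaultdict(list)
    for user, ts, url in requests:
        by_user[user].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort(key=lambda hit: hit[0])  # order by timestamp
        current = []
        last_ts = None
        for ts, url in hits:
            # A gap longer than the timeout closes the current session.
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append((user, current))
                current = []
            current.append(url)
            last_ts = ts
        if current:
            sessions.append((user, current))
    return sessions
```

Because the threshold is a guess about user behavior, a long pause inside one real visit is split into two sessions and two quick visits are merged into one; errors of this kind are what accuracy measures for sessionizing tools are meant to quantify.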
Source
《电子科技大学学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2002, No. 3, pp. 281-285 (5 pages)
Journal of University of Electronic Science and Technology of China
Keywords
WEB
session identification
accuracy
computer networks
Web usage mining
identifying session
log
heuristic rule