Abstract
To reduce the scale of Web log data and to discover more valuable access patterns from the preprocessed data, this paper introduces the information quantity of knowledge and, on that basis, defines a quantified importance value of a single attribute relative to an attribute set. Borrowing the idea of the LRU page replacement algorithm from operating systems, it proposes a data preprocessing method for Web usage mining (WUM) based on attribute importance. Experiments show that the method removes Web log records that have no mining value and result from users' short-term behavior, filters out noisy data, and thereby effectively reduces the complexity of log mining.
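The abstract does not give the paper's formulas, so the following is only a minimal sketch of how an attribute's importance relative to an attribute set might be quantified. It assumes a definition common in rough set theory: the information quantity of an attribute set P over a universe U is I(P) = 1 - Σ|X_i|² / |U|², where the X_i are the equivalence classes induced by P, and the importance of an attribute a in A is I(A) - I(A - {a}). The function names and the log fields (`ip`, `url`, `status`) are hypothetical, and the LRU-inspired step is not modeled here.

```python
from collections import Counter


def information_quantity(records, attrs):
    """Assumed definition: I(P) = 1 - sum(|X_i|^2) / |U|^2,
    where X_i are the groups of records that agree on every attribute in attrs."""
    n = len(records)
    if n == 0:
        return 0.0
    # Group records into equivalence classes by their values on attrs.
    classes = Counter(tuple(r[a] for a in attrs) for r in records)
    return 1.0 - sum(size * size for size in classes.values()) / (n * n)


def attribute_importance(records, attrs, a):
    """Assumed definition: drop in information quantity when attribute a is removed."""
    rest = [x for x in attrs if x != a]
    return information_quantity(records, attrs) - information_quantity(records, rest)


# Hypothetical Web log records, for illustration only.
records = [
    {"ip": "A", "url": "/x", "status": 200},
    {"ip": "A", "url": "/y", "status": 200},
    {"ip": "B", "url": "/x", "status": 200},
    {"ip": "B", "url": "/y", "status": 200},
]
attrs = ["ip", "url", "status"]

for a in attrs:
    print(a, attribute_importance(records, attrs, a))
```

Under this assumed definition, `status` has importance 0 in the sample data because it never distinguishes two records, while `ip` and `url` each contribute equally.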
Source
Computer Systems & Applications (《计算机系统应用》)
2011, No. 5, pp. 219-222, 247 (5 pages)
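Funding
Anhui Science and Technology University Youth Fund (ZIC2011117)
Anhui Science and Technology University Teaching Research Project (X201014)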