Abstract
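At present, most outlier detection algorithms consider only the dataset itself and ignore the semantic knowledge the dataset carries. In this paper, we perform outlier detection by analyzing the semantic knowledge hidden in Web logs, and we propose a semantics-based outlier mining method. The method analyzes the implicit semantic information from the numerical relations satisfied by the items recorded in Web logs and, according to the importance of this semantic information, gives an indicator that comprehensively measures its relevance. Experimental results show that the method is feasible and effective.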
Existing proposals on outlier detection do not take the semantic knowledge of the dataset into consideration. They try to find outliers only from the dataset itself, which prevents them from finding more meaningful outliers. In this paper, we consider the problem of outlier detection that integrates the semantic relations hidden in Web logs. We give a new definition of a semantic outlier: a data point that behaves differently from the other data points in its own cluster while looking normal with respect to the data points in another cluster. We present a measure of the degree to which each object is an outlier, called the Likelihood of Semantic Outlier (LSO), and propose an efficient algorithm for mining semantic outliers based on LSO. The algorithm is evaluated on real data, and the experimental results show that it is efficient and effective.
Source
《内蒙古煤炭经济》
2011, Issue 7, pp. 19-21 (3 pages)
Inner Mongolia Coal Economy