Semantic-Based Outlier Detection (基于语义的孤立点检测)

Abstract: Most current outlier detection algorithms consider only the dataset itself and ignore the semantic knowledge it contains. In this paper we detect outliers by analyzing the semantic knowledge hidden in Web logs, and we propose a semantics-based outlier mining method. The method analyzes the implicit semantic information through the numerical relationships satisfied by the items recorded in Web logs and, according to the importance of this semantic information, gives a composite indicator that measures its relevance. Experimental results show that the method is feasible and effective.

Abstract (English original): Existing proposals for outlier detection do not take the semantic knowledge of the dataset into consideration. They try to find outliers only from the dataset itself, which prevents them from finding more meaningful outliers. In this paper, we consider the problem of outlier detection that integrates the semantic relations hidden in Web logs. We give a new definition of a semantic outlier: a data point that behaves differently from the other data points in the same cluster while looking normal with respect to the data points in another cluster. We present a measure of the degree to which each object is an outlier, called the Likelihood of Semantic Outlier (LSO), and propose an efficient algorithm for mining semantic outliers based on LSO. The algorithm is evaluated on real data, and the experimental results show that it is efficient and effective.
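The definition of a semantic outlier in the abstract can be illustrated with a small sketch. This record does not give the LSO formula, so the score below is not LSO: it is a hypothetical stand-in that divides a point's mean distance to its own group by its mean distance to the nearest other group. The function names, the use of Euclidean distance, and the assumption that group assignments are supplied in advance are all illustrative choices, not taken from the paper.

```python
import math
from collections import defaultdict


def mean_distance(point, others):
    """Average Euclidean distance from `point` to each point in `others`."""
    if not others:
        return 0.0
    return sum(math.dist(point, o) for o in others) / len(others)


def semantic_outlier_scores(points, labels):
    """Hypothetical score (NOT the paper's LSO): a high value means the point
    is far from its own group yet close to some other group.

    Assumes `labels` holds a pre-assigned group for every point and that at
    least two distinct groups exist.
    """
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    if len(groups) < 2:
        raise ValueError("at least two groups are required")

    scores = []
    for idx, (point, label) in enumerate(zip(points, labels)):
        # Mean distance to the other members of the point's own group.
        own = [points[j] for j in groups[label] if j != idx]
        d_own = mean_distance(point, own)
        # Mean distance to the closest of the remaining groups.
        d_other = min(
            mean_distance(point, [points[j] for j in members])
            for other, members in groups.items()
            if other != label
        )
        # Small constant avoids division by zero.
        scores.append(d_own / (d_other + 1e-12))
    return scores


# Toy data: the last point is assigned to group "a" but lies next to group "b".
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (10.5, 10.5)]
labels = ["a", "a", "a", "b", "b", "a"]
scores = semantic_outlier_scores(points, labels)
most_suspicious = scores.index(max(scores))
```

On this toy data the last point receives the highest score, because it sits far from the rest of group "a" and close to group "b". A score above 1 means the point is nearer to another group than to its own.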

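Author: Fan Shicai (樊世财)

Affiliation: Vehicle Management Center, Shendong Coal Group (神东煤炭集团车辆管理中心)

Source: Inner Mongolia Coal Economy (《内蒙古煤炭经济》), 2011, No. 7, pp. 19-21 (3 pages)

Keywords: semantic outlier; user query behavior; LSO; Web logs

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]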
