Abstract
Traditional bad point detection methods based on outlier degree cannot handle the oscillation and fluctuation that arise during small-data conflicts, nor the weak distinctiveness of the data features, so the bad points they identify deviate considerably from the true ones. This paper proposes a bad point data mining model based on small-data conflict detection. A small-region anomaly factor (LOD) describes the degree of local anomaly of each data object in the small-data conflict data set, and a neighborhood query optimization algorithm for small-data conflict data yields a preliminary bad point data set. To compute the small-region anomaly factor, within the object neighborhoods obtained after neighborhood search optimization and based on the weighted Minkowski distance between two objects, leave-one-out partition information entropy increments are used to obtain the weights of the small-data conflict objects. The damage degree of each small-data conflict object in the preliminary bad point data set is then computed, which gives the bad point data in the small-data conflict. Experimental results show that the proposed method outperforms the traditional model in both complexity and deviation when mining bad point data from small-data conflicts.
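The abstract names three building blocks without giving their formulas: a weighted Minkowski distance, weights derived from leave-one-out entropy increments, and a neighborhood-based local anomaly degree. The following is a minimal sketch of how such pieces could look, not the paper's actual model. All function names are hypothetical, the partition labels are assumed to come from some prior discretization or clustering step, and the k-nearest-neighbor mean distance is only a stand-in for the LOD factor, whose definition the abstract does not state.

```python
import math
from collections import Counter


def weighted_minkowski(x, y, w, p=2):
    """Weighted Minkowski distance: (sum_i w_i * |x_i - y_i|^p)^(1/p)."""
    return sum(wi * abs(a - b) ** p for a, b, wi in zip(x, y, w)) ** (1.0 / p)


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a list of discrete partition labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def leave_one_out_entropy_increments(labels):
    """Assumed weighting rule: H(all objects) - H(all objects except i)."""
    base = shannon_entropy(labels)
    return [
        base - shannon_entropy(labels[:i] + labels[i + 1:])
        for i in range(len(labels))
    ]


def knn_anomaly_scores(points, w, k=2, p=2):
    """Stand-in for a local anomaly degree: mean weighted distance
    from each point to its k nearest neighbors (brute-force search)."""
    scores = []
    for i, x in enumerate(points):
        dists = sorted(
            weighted_minkowski(x, y, w, p)
            for j, y in enumerate(points)
            if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


if __name__ == "__main__":
    # Illustrative toy data only; not taken from the paper.
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    labels = ["a", "a", "a", "a", "b"]  # assumed partition of the objects
    print(knn_anomaly_scores(points, w=(1.0, 1.0)))
    print(leave_one_out_entropy_increments(labels))
```

In this toy run the isolated point gets both the largest neighbor distance and the largest entropy increment, since removing it leaves a single-class partition. How the paper combines the two quantities into a damage degree is not specified in the abstract, so the sketch leaves them separate.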
Source
Bulletin of Science and Technology (《科技通报》)
Peking University Core Journal (北大核心)
2015, Issue 1, pp. 213-216 (4 pages)
Keywords
small data
conflict detection
bad point data
mining model