Abstract
Most outlier detection methods depend on user-specified parameters and suffer from the curse of dimensionality. To address both problems, this paper presents a method for mining outlier subspaces and outliers based on Gini index weighting. The method computes the Gini index of the leave-one-out partition on each dimension to generate each data object's outlier subspace and attribute weight vector, and then mines outliers within that subspace using a statistics-based approach. Because no input parameters are required, the results are not influenced by user choices and are therefore more objective, and the method adapts to high-dimensional outlier mining. Experiments on a stellar spectrum dataset validate the feasibility and effectiveness of the method.
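The abstract does not give the formulas, so the following is only a minimal sketch of how a per-dimension leave-one-out Gini computation might look. It uses the standard Gini index, 1 - sum(p_k^2), over discrete values. Everything else is an assumption made for illustration and is not taken from the paper: the data are assumed to be already discretized, the helper names (`gini`, `loo_gini_drop`, `outlier_subspace`) are hypothetical, a dimension is placed in the outlier subspace when removing the object lowers the Gini index, and the weights are those drops normalized to sum to 1.

```python
from collections import Counter


def gini(values):
    """Standard Gini index 1 - sum(p_k^2) of a list of discrete values."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def loo_gini_drop(data, i):
    """Per dimension: Gini of the full column minus Gini with object i left out.

    Assumption (not from the paper): a positive drop means object i holds a
    rare value on that dimension.
    """
    drops = []
    for col in zip(*data):
        col = list(col)
        without_i = col[:i] + col[i + 1:]
        drops.append(gini(col) - gini(without_i))
    return drops


def outlier_subspace(data, i):
    """Hypothetical rule: keep dimensions with a positive drop, and use the
    normalized drops as the attribute weight vector."""
    drops = loo_gini_drop(data, i)
    dims = [d for d, v in enumerate(drops) if v > 0]
    total = sum(drops[d] for d in dims)
    weights = [drops[d] / total for d in dims]
    return dims, weights


# Toy data: the last object has a rare value on dimension 0 only.
data = [("a", "x"), ("a", "x"), ("a", "x"), ("b", "x")]
dims, weights = outlier_subspace(data, 3)
```

In this toy example only dimension 0 is selected for the last object, while the first three objects get an empty subspace. The statistics-based mining step that the abstract mentions is not sketched here because the abstract gives no detail about it.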
Source
Computer Development & Applications (《电脑开发与应用》)
2012, No. 10, pp. 35-37 (3 pages)