Abstract
This paper studies how dimensionality affects the results of nearest-neighbor queries in high-dimensional data spaces and proposes a method for evaluating that effect. Using a statistical argument, it shows that under certain conditions similarity queries become unstable: as dimensionality increases, the distances from the query point to the data points converge toward one another, so the notion of a "nearest neighbor" loses its meaning. The paper also characterizes how the degree of this deterioration is distributed as dimensionality grows. It gives the distributions of two distance-based statistics, which allow a theoretical estimate for the nearest-neighbor query problem, and presents experimental results that agree with the theory.
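The distance-concentration effect described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method and does not reproduce its two statistics. It assumes uniformly distributed data in the unit hypercube and Euclidean distance, and it uses a hypothetical helper, `relative_contrast`, that reports how far apart the farthest and nearest data points are relative to the nearest one.

```python
import math
import random


def relative_contrast(dim, n_points=500, seed=0):
    """Return (d_max - d_min) / d_min for one random query point.

    Assumptions (not taken from the paper): data and query are drawn
    uniformly from the unit hypercube, and distance is Euclidean.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # A shrinking value means the nearest and farthest points are
    # becoming hard to tell apart by distance alone.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the printed ratio should fall sharply as `dim` grows, which is the sense in which a nearest-neighbor query becomes unstable. Other data distributions or distance functions may behave differently.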
Source
《计算机工程》
EI
CAS
CSCD
PKU Core Journal (北大核心)
2006, Issue 21, pp. 6-8 (3 pages)
Computer Engineering
Keywords
Instability
Statistics
Curse of dimensionality
Similarity
Nearest neighbor