Abstract
The sparsity and infinite dimensionality of functional data cause traditional cluster analysis to break down. To address this problem, this paper first clarifies the concept and scope of functional data and then proposes an adaptive, iteratively updated clustering method. First, the parameter information of the data is used to pass from the infinite-dimensional function space to a finite-dimensional multivariate space. On this basis, an adaptively weighted clustering statistic is constructed according to differences in the information content of the variables; it serves as the similarity measure for functional data and yields an initial partition. Then, subject to a given threshold, the initial class membership of every function is updated by adaptive iteration, and the converged result is taken as the final partition. Simulations and an empirical test show that the proposed method achieves a significantly higher correct classification rate than existing functional clustering methods of the same kind, indicating its relative superiority and its effectiveness in practical applications.
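The three stages named in the abstract (finite-dimensional representation, adaptively weighted similarity, thresholded iterative updating) can be illustrated with a minimal sketch. The abstract does not state the basis, the weighting formula, or the update rule, so everything concrete here is an assumption made for illustration and should not be read as the authors' algorithm: a small Fourier basis fitted by least squares, weights equal to each coefficient's share of total variance, a farthest-first initial partition, and a k-means-style update that stops once the centers move less than a hypothetical tolerance `tol`. All function names are likewise hypothetical.

```python
import numpy as np

def fourier_coefficients(t, curves, n_harmonics=2):
    """Least-squares projection of sampled curves onto a small Fourier basis (assumed basis)."""
    cols = [np.ones_like(t)]
    for h in range(1, n_harmonics + 1):
        cols += [np.sin(2 * np.pi * h * t), np.cos(2 * np.pi * h * t)]
    basis = np.column_stack(cols)                        # (n_points, n_basis)
    coef, *_ = np.linalg.lstsq(basis, curves.T, rcond=None)
    return coef.T                                        # (n_curves, n_basis)

def variance_weights(coef):
    """Assumed weighting rule: each coefficient's share of the total variance."""
    var = coef.var(axis=0)
    return var / var.sum()

def weighted_distances(coef, centers, w):
    """Weighted Euclidean distance from every curve to every center."""
    diff = coef[:, None, :] - centers[None, :, :]
    return np.sqrt((w * diff ** 2).sum(axis=2))          # (n_curves, k)

def farthest_first_centers(coef, w, k):
    """Deterministic initial centers: start at curve 0, then repeatedly add the farthest curve."""
    idx = [0]
    for _ in range(k - 1):
        d = weighted_distances(coef, coef[idx], w).min(axis=1)
        idx.append(int(d.argmax()))
    return coef[idx]

def iterative_update(coef, k, tol=1e-8, max_iter=100):
    """Reassign curves and recompute centers until the centers move less than tol."""
    w = variance_weights(coef)
    centers = farthest_first_centers(coef, w, k)
    labels = weighted_distances(coef, centers, w).argmin(axis=1)   # initial partition
    for _ in range(max_iter):
        new_centers = np.array([
            coef[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        labels = weighted_distances(coef, centers, w).argmin(axis=1)
        if shift < tol:
            break
    return labels, w

# Synthetic demonstration data (not from the paper): two groups of noisy curves.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
group_a = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((20, 50))
group_b = np.cos(2 * np.pi * t) + 0.1 * rng.standard_normal((20, 50))
curves = np.vstack([group_a, group_b])
labels, w = iterative_update(fourier_coefficients(t, curves), k=2)
```

In this toy setting the two groups differ only in their first-harmonic sine and cosine coefficients, so the variance-share weights concentrate on those two coordinates and the remaining coefficients contribute little to the distance. Any other measure of information content could be substituted in `variance_weights` without changing the rest of the sketch.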
Source
《统计研究》
CSSCI
Peking University Core Journals (北大核心)
2015, No. 4, pp. 91-96 (6 pages)
Statistical Research
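Funding
Interim research output of the National Social Science Fund of China Major Project "Research on Big Data and the Development of Statistical Theory" (13&ZD148)
National Social Science Fund of China project "Research on Methods and Applications of Financial High-Frequency Data Mining" (11BTJ001)
Supported by the National Natural Science Foundation of China Young Scientists Fund project "Classification and Prediction Methods Based on Nonparametric Random Forests and Their Applications" (710201139)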
Keywords
functional data analysis
adaptive weighting
iterative update
clustering analysis