Text Clustering Evaluation Method Based on Parameter Estimation of Distances Between Clusters (cited by 6)
Abstract: The proposed text clustering evaluation method applies statistical parameter estimation to the distances between clusters: from the observed inter-cluster distances, it estimates the numerical characteristics of their distribution. The estimates define a reasonable range for inter-cluster distances. Clusters whose mutual distances fall outside this range are adjusted, and a cluster validity function is used to confirm the adjusted result. The method improves the accuracy of clustering results and offers a practical way to select and compare clustering algorithms. Experimental results demonstrate its feasibility and effectiveness.
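The procedure summarized in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions that the record does not state: the distance between two clusters is taken to be the Euclidean distance between their centroids; inter-cluster distances are assumed to be normally distributed, so maximum likelihood estimation reduces to the sample mean and the biased (divide-by-n) standard deviation; the "reasonable range" is assumed to be μ̂ ± kσ̂ with a hypothetical multiplier `k`; the only adjustment attempted is merging the closest pair that lies below the lower bound; and the mean silhouette coefficient is swapped in for the paper's unspecified cluster validity function. All function names (`inter_cluster_distances`, `normal_mle`, `silhouette`, `evaluate_and_adjust`) are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def group(X, labels):
    """Map each cluster label to the list of its member vectors."""
    clusters = {}
    for x, lab in zip(X, labels):
        clusters.setdefault(lab, []).append(x)
    return clusters


def inter_cluster_distances(X, labels):
    """Centroid-to-centroid distance for every pair of clusters (assumed definition)."""
    cents = {lab: centroid(pts) for lab, pts in group(X, labels).items()}
    return {(p, q): euclidean(cents[p], cents[q])
            for p, q in combinations(sorted(cents), 2)}


def normal_mle(values):
    """MLE of a normal distribution's mean and standard deviation (assumed model)."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n  # the MLE divides by n, not n - 1
    return mu, math.sqrt(var)


def silhouette(X, labels):
    """Mean silhouette coefficient, a stand-in for the unspecified validity function."""
    clusters = group(X, labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treat as the worst score
    scores = []
    for x, lab in zip(X, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # x itself is in `own` and contributes a distance of 0, hence len(own) - 1
        a = sum(euclidean(x, y) for y in own) / (len(own) - 1)
        b = min(sum(euclidean(x, y) for y in pts) / len(pts)
                for other, pts in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def evaluate_and_adjust(X, labels, k=2.0):
    """Estimate the distance distribution, derive the range mu +/- k*sigma,
    try merging the closest pair below it, and keep the merge only if the
    validity score improves. Requires at least two clusters.
    Returns (labels, (low, high))."""
    dists = inter_cluster_distances(X, labels)
    mu, sigma = normal_mle(list(dists.values()))
    low, high = mu - k * sigma, mu + k * sigma
    # Pairs above `high` are not adjusted in this sketch; only "too close" pairs are.
    too_close = [pair for pair, d in dists.items() if d < low]
    if not too_close:
        return labels, (low, high)
    keep, drop = min(too_close, key=dists.get)
    candidate = [keep if lab == drop else lab for lab in labels]
    if silhouette(X, candidate) > silhouette(X, labels):
        return candidate, (low, high)
    return labels, (low, high)
```

As a usage example, take ten toy 5-dimensional vectors forming five two-point clusters: clusters 0 to 3 sit near 10·e1, 10·e2, 10·e3 and 10·e4, while cluster 4 sits near 10·e1 + e5, only 1.0 from cluster 0. With `k=2.0` the estimated lower bound is about 4.95, so the pair (0, 4) is flagged, the merge raises the silhouette, and every label 4 becomes 0. Merging is chosen here because a pair that is unusually close relative to the other inter-cluster distances is a natural sign of one topic split into two clusters.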
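Source: Computer Engineering (《计算机工程》), 2009, No. 9, pp. 37-39, 42 (4 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: National "242" Information Security Program (2007A15); Heilongjiang Province Postdoctoral Fund (323630185)
Keywords: clustering analysis; text clustering; clustering evaluation; maximum likelihood estimation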
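The keyword "maximum likelihood estimation" can be made concrete under an assumption the record does not state: that the m = K(K−1)/2 pairwise distances d_1, …, d_m among K clusters are independent draws from a normal distribution N(μ, σ²). Setting the partial derivatives of the log-likelihood to zero then gives closed-form estimates, and the resulting range uses a hypothetical multiplier k. This is a worked illustration of the normal case, not necessarily the distribution used in the paper.

```latex
\ell(\mu,\sigma^2) = -\frac{m}{2}\ln\!\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{m}\left(d_i-\mu\right)^2

\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{m}\left(d_i-\mu\right) = 0
\;\Rightarrow\; \hat{\mu} = \frac{1}{m}\sum_{i=1}^{m} d_i

\frac{\partial \ell}{\partial \sigma^2} = -\frac{m}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{m}\left(d_i-\mu\right)^2 = 0
\;\Rightarrow\; \hat{\sigma}^2 = \frac{1}{m}\sum_{i=1}^{m}\left(d_i-\hat{\mu}\right)^2

\text{reasonable range: } \left[\hat{\mu} - k\hat{\sigma},\; \hat{\mu} + k\hat{\sigma}\right]
```

Note that the MLE of the variance divides by m rather than m − 1, so it is slightly biased downward for small numbers of clusters.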
Related literature

References: 9

Secondary references: 39

Co-citing literature: 277

Co-cited literature: 39

Citing literature: 6

Secondary citing literature: 14