
Cluster Analysis of Multilingual Online News Media (cited by: 1)

Multilingual Document Clustering based on Internet News
Abstract: The Internet has become the preferred channel for people to publish, access, and share information, and the large volume of multilingual media content it carries reflects the hot topics people follow and their sentiment. Multilingual text clustering is therefore important for understanding public opinion and guiding public discourse. This paper proposes a composite multilingual text clustering algorithm that incorporates a time influence factor, in order to study how the time dimension affects cluster analysis in the Internet setting. In experiments on more than 4,000 news items in English, Spanish, German, and French collected from authoritative online media, the proposed algorithm achieved fairly good clustering results.
Source: Information Security and Communications Privacy (《信息安全与通信保密》), 2014, No. 5, pp. 103-107, 110 (6 pages)
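Funding: National Key Technology R&D Program of China (Grant No. 2012BAH38B04); Open Project of the State Key Laboratory for Manufacturing Systems Engineering, Xi'an Jiaotong University (Grant No. sklms2012005)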
Keywords: multi-language text; text clustering; time variable; composite clustering algorithm
