
Feature Dimension Reduction Short Text Clustering Combined with Semantic and Statistics

Cited by: 7
Abstract: Text clustering is hampered by the high dimensionality and sparsity of text representations. This paper proposes a short text clustering algorithm that combines semantic and statistical features. The first dimension reduction uses a semantic dictionary to analyze the semantic relatedness of words. The second reduction applies statistical feature selection. The short texts are then clustered on the features that remain after both reductions. Experimental results show that the algorithm achieves good clustering quality and efficiency on short texts.
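The abstract does not name the semantic dictionary, the statistical selection criterion, or the clustering algorithm, so the following is only a minimal sketch of the two-stage idea under explicit assumptions. A toy, hypothetical word-to-concept dictionary (`SEMANTIC_DICT`) stands in for the semantic dictionary. Document-frequency thresholding (`min_df`) stands in for the statistical feature selection. Spherical k-means over TF-IDF vectors in the vector space model stands in for the clustering step. English tokens replace segmented Chinese text, and every function name below is illustrative, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical toy semantic dictionary (word -> concept); an assumption that
# stands in for whatever semantic dictionary the paper actually uses.
SEMANTIC_DICT = {
    "car": "vehicle", "automobile": "vehicle", "auto": "vehicle",
    "film": "movie", "cinema": "movie",
}


def semantic_reduce(tokens, sem_dict):
    """First reduction: map each word to its concept so synonyms share one dimension."""
    return [sem_dict.get(t, t) for t in tokens]


def select_features(docs, min_df):
    """Second reduction (assumed criterion): keep concepts with document frequency >= min_df."""
    df = Counter()
    for d in docs:
        df.update(set(d))
    return sorted(t for t, c in df.items() if c >= min_df)


def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def tfidf_vectors(docs, vocab):
    """VSM: one smoothed TF-IDF weight per selected feature, scaled to unit length."""
    n = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append(normalize([tf[t] * idf[t] for t in vocab]))
    return vecs


def cosine(u, v):
    # Inputs are unit-length (or all-zero), so the dot product equals the cosine.
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(vecs, k, iters=20):
    """k-means with cosine similarity; deterministic farthest-first initialisation."""
    centroids = [vecs[0]]
    while len(centroids) < k:
        centroids.append(min(vecs, key=lambda v: max(cosine(v, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vecs]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == j]
            if members:
                centroids[j] = normalize([sum(col) for col in zip(*members)])
    return labels


def cluster_short_texts(docs, sem_dict, k, min_df=2):
    reduced = [semantic_reduce(d, sem_dict) for d in docs]
    vocab = select_features(reduced, min_df)
    return vocab, spherical_kmeans(tfidf_vectors(reduced, vocab), k)


docs = [s.split() for s in [
    "car price drop", "automobile price rise", "auto price news",
    "movie star award", "film star interview", "cinema star award",
]]
vocab, labels = cluster_short_texts(docs, SEMANTIC_DICT, k=2)
print(vocab)   # ['award', 'movie', 'price', 'star', 'vehicle']
print(labels)  # [0, 0, 0, 1, 1, 1]
```

On this toy input the 13 distinct words collapse to 9 concepts after the semantic step, and the frequency threshold then keeps 5 features. The example illustrates how the two reductions shrink the vector space before clustering.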
Source: Computer Engineering (《计算机工程》), 2012, No. 22, pp. 171-175 (5 pages); indexed in CAS and CSCD
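Funding: National High-tech R&D Program of China ("863" Program) (2011AA010704, 2012AA011004); Tsinghua University Initiative Scientific Research Program project "Key Technologies of Cross-Media Distributed Vertical Search and Public Opinion Analysis" (20111081023)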
Keywords: feature selection; clustering; short text; Vector Space Model (VSM); semantic; dimension reduction