
High Quality Microblog Extraction Based on Kernel Principal Component Analysis and Wavelet Transformation (基于核主成分分析与小波变换的高质量微博提取)
Cited by: 5
Abstract Online social media contain large amounts of noisy and redundant information, which makes filtering and screening it a challenge. To obtain high-quality messages, this paper proposes a high-quality microblog extraction framework based on Kernel Principal Component Analysis and Wavelet Transformation (KPCA-WT). It designs a multi-feature fusion algorithm for extracting high-quality information, which transforms the message features into the wavelet domain to better capture the detail differences between feature signals. The Expectation Maximization (EM) algorithm is used to estimate the weight of each feature, and the weighted features are then fused into a comprehensive value for each message. To reduce the influence of noisy features on quality extraction and to speed up computation, KPCA is applied to transform the features. Experimental results show that the proposed framework extracts higher-quality microblogs and greatly reduces running time.
Source Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2016, Issue 1, pp. 180-186 (7 pages)
Funding National Natural Science Foundation of China (61472291, 61303115); 2013 Shenzhen Knowledge Innovation Program, Basic Research Fund
Keywords information extraction; feature fusion; wavelet transformation; Expectation Maximization (EM) algorithm; Kernel Principal Component Analysis (KPCA)
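
The abstract says that moving features into the wavelet domain helps capture detail differences between feature signals. The sketch below illustrates that idea only. It assumes a single-level Haar wavelet (the record does not say which wavelet the paper uses), and the function names and the four feature values per message are hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar transform (assumed wavelet).

    Returns (approximation, detail) coefficient lists.
    The input length must be even.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Scaled pairwise sums: the coarse, low-frequency part.
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # Scaled pairwise differences: the fine, high-frequency part.
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt, rebuilding the original signal."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


# Hypothetical feature vectors for two microblogs (illustrative values only).
msg_a = [0.50, 0.50, 0.20, 0.80]
msg_b = [0.45, 0.55, 0.50, 0.50]

approx_a, detail_a = haar_dwt(msg_a)
approx_b, detail_b = haar_dwt(msg_b)

print("approx:", approx_a, approx_b)
print("detail:", detail_a, detail_b)
```

The two vectors have the same pairwise sums, so their approximation coefficients coincide up to floating-point rounding, while their detail coefficients differ. That difference is the fine-grained information the wavelet domain makes explicit. The EM weighting and KPCA steps are not shown, because the record gives no detail about how the paper configures them.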

