Research on Location Classification Based on Twitter Data
Abstract During urbanization, new locations appear constantly and location categories change frequently, producing a large number of unknown locations that hinder the understanding and management of urban functional structure. This paper combines spatial analysis and text mining techniques to integrate, in a novel way, the time records in Twitter data with Tweets (the text content users publish on Twitter) for location classification. A method is designed to extract fine-grained spatiotemporal and content information about crowd activity, and supervised learning is used to identify the types of unknown locations automatically from a small number of labeled samples. Five types of locations are identified: education, entertainment, shops, social services, and transportation. The overall accuracy is 67.6%, which indicates that the method is feasible and suggests a new approach to using social media data effectively in location classification research.
Authors Qiu Xiaoyu (邱小宇); Lin Jie (林杰) (School of Earth Sciences, Zhejiang University, Hangzhou, Zhejiang 310027, China)
Source Bulletin of Science and Technology (《科技通报》), 2020, No. 4, pp. 67-71 (5 pages)
Funding National Natural Science Foundation of China (Grant No. 41501423)
Keywords location classification; social media data; Twitter data; spatial analysis; text mining
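
The abstract describes combining the time records and the text of Tweets into per-location features, then labeling unknown locations with a supervised learner trained on a few labeled samples. The sketch below illustrates that general idea only. The abstract does not state the paper's actual features or classifier, so everything here is an assumption: the hour-of-day shares, the word frequencies, the nearest-centroid rule with cosine similarity, the function names, and the toy tweets are all hypothetical.

```python
from collections import Counter
import math

def location_features(tweets):
    """tweets: list of (hour_of_day, text) pairs posted at one location.
    Returns a sparse feature dict (assumed features, not the paper's)."""
    hours = Counter(hour for hour, _ in tweets)
    words = Counter(w for _, text in tweets for w in text.lower().split())
    feats = {}
    for hour, count in hours.items():
        feats[f"hour:{hour}"] = count / len(tweets)   # temporal part: share of tweets per hour
    total = sum(words.values())
    for word, count in words.items():
        feats[f"word:{word}"] = count / total         # content part: relative term frequency
    return feats

def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(labeled):
    """labeled: list of (label, tweets). Returns label -> mean feature dict."""
    sums, counts = {}, Counter()
    for label, tweets in labeled:
        counts[label] += 1
        acc = sums.setdefault(label, Counter())
        for k, v in location_features(tweets).items():
            acc[k] += v
    return {lab: {k: v / counts[lab] for k, v in acc.items()}
            for lab, acc in sums.items()}

def classify(tweets, centroids):
    """Assign the label whose centroid is most similar to the location."""
    feats = location_features(tweets)
    return max(centroids, key=lambda lab: cosine(feats, centroids[lab]))

# Toy labeled locations (invented for illustration only).
labeled = [
    ("education", [(9, "lecture starts soon"), (10, "exam in the library")]),
    ("entertainment", [(21, "great movie tonight"), (22, "concert was loud")]),
]
centroids = train_centroids(labeled)
predicted = classify([(9, "library before the lecture")], centroids)
print(predicted)
```

In this toy setup the unknown location shares both its posting hour and several words with the education sample, so it is assigned that label. A real pipeline would also need the spatial step that attaches tweets to locations, which is omitted here.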