Research on Location Classification Based on Twitter Data (基于Twitter数据的地点分类方法研究)

Abstract: During urbanization, new locations keep appearing and location types keep changing, which produces a large number of locations of unknown type and hinders the understanding and management of urban form. This paper combines several spatial analysis and text mining techniques and, as its novel contribution, fuses the time records in Twitter data with Tweets (the text content that users publish on Twitter) for location classification. A method is designed to extract fine-grained spatiotemporal-content information about crowd activity, and supervised learning is then used to identify the types of unknown locations automatically from a small number of labeled samples. Five types of locations are identified: education, entertainment, shops, social services, and transportation. The overall accuracy is 67.6%, which indicates that the method is feasible and offers a new approach to using social media data effectively in location classification research.

Authors: Qiu Xiaoyu (邱小宇); Lin Jie (林杰) (School of Earth Sciences, Zhejiang University, Hangzhou, Zhejiang 310027, China)

Affiliation: School of Earth Sciences, Zhejiang University

Source: Bulletin of Science and Technology (《科技通报》), 2020, No. 4, pp. 67-71 (5 pages)

Funding: National Natural Science Foundation of China (41501423)

Keywords: location classification; social media data; Twitter data; spatial analysis; text mining

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]
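
The approach summarized in the abstract (per-location features that combine posting times with tweet text, followed by supervised classification from a few labeled samples) can be illustrated with a minimal sketch. The abstract does not name the feature design or the classifier, so everything below is an assumption made for illustration: the hourly-share and word-share features, the nearest-centroid classifier with cosine similarity, all function names, and the toy tweets are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict

def location_features(tweets):
    """tweets: list of (hour_of_day, text) pairs posted at one location.
    Returns a sparse dict of hourly-activity shares and word shares (assumed features)."""
    hours = Counter(hour for hour, _ in tweets)
    words = Counter(w for _, text in tweets for w in text.lower().split())
    n_tweets = len(tweets) or 1
    n_words = sum(words.values()) or 1
    feats = {f"hour:{h}": c / n_tweets for h, c in hours.items()}
    feats.update({f"word:{w}": c / n_words for w, c in words.items()})
    return feats

def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(labeled):
    """labeled: list of (feature dict, label). Returns label -> mean feature dict."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for feats, label in labeled:
        counts[label] += 1
        for k, v in feats.items():
            sums[label][k] += v
    return {lab: {k: v / counts[lab] for k, v in s.items()} for lab, s in sums.items()}

def classify(feats, centroids):
    """Assign the label whose centroid is most similar to the feature dict."""
    return max(centroids, key=lambda lab: cosine(feats, centroids[lab]))

# Toy labeled locations (invented data, one sample per type for brevity).
school = [(8, "late for class again"), (9, "exam today at school"), (15, "class is over")]
bar = [(22, "drinks with friends"), (23, "great band tonight"), (21, "party time")]
station = [(7, "train delayed again"), (18, "waiting for the train home")]
centroids = train_centroids([
    (location_features(school), "education"),
    (location_features(bar), "entertainment"),
    (location_features(station), "transportation"),
])

# A location of unknown type, described only by its tweets.
unknown = [(9, "math class this morning"), (14, "studying for the exam")]
predicted = classify(location_features(unknown), centroids)
print(predicted)
```

In this sketch the time and text signals share one feature vector, so a location whose tweets cluster in school hours and mention classes ends up closest to the education centroid. A real pipeline would need many labeled locations per type, stop-word handling, and a held-out set to estimate overall accuracy.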

