Abstract
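The text posted by social media users varies with geography, and users with close social ties tend to live nearer to one another, so both text and social networks can be used to infer a user's home location. Existing home location prediction methods based on text and social networks do not fully exploit the location-indicative features of text, even though location-indicative information such as toponyms provides the most useful location signal. This paper therefore proposes a location prediction method for social media users based on geographic named entity recognition (GER) and graph convolutional networks (GCN). First, user text is filtered with a geographic named entity recognition method to highlight location-indicative features. Second, a social network is extracted from mention relationships and follower-followee relationships. Third, the social network and the user text content are combined, and a GCN-based method is used to predict users' home locations. Finally, GER-GCN is compared with GCN and with the latest research results, and the model's small-sample learning ability and its influencing factors are examined. Experiments on the GeoText dataset and two microblog datasets show that: ① GER text filtering significantly improves user location prediction accuracy; ② GER-GCN achieves the highest prediction accuracy in all experiments, and on the GeoText benchmark dataset it improves on the latest research results by 1%~2%; ③ in a realistic, minimally supervised scenario, the experiments confirm the small-sample learning ability of the GER-GCN model and show that social network quality plays a decisive role in that ability. The results verify that GER-GCN is state of the art and that the method meets the needs of real-world social media applications.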
The home locations of social media users are essential for a wide range of real-world applications. Social media text published by users from different regions differs considerably in mode of expression, semantics, and other content. In general, users with close social relationships live closer to each other. Therefore, both text and social networks can be used to infer users' home locations. Existing home location prediction methods based on social networks and text do not sufficiently mine the location-indicative features in user text, even though location-indicative information such as toponyms provides the most useful location signals. We therefore propose a location prediction method for social media users based on Geographic Entity Recognition (GER) and Graph Convolutional Network (GCN). First, the user text is filtered by a geographic entity recognition method to highlight location-indicative words. Then, the social networks are extracted based on mention relationships and following relationships. After that, the social network is combined with the user text content that contains location-indicative words, and a method based on the graph convolutional network is used to predict the user's home location. Finally, we compare the GER-GCN method with the GCN method and the latest research results, and explore the small-sample learning ability of the model and its influencing factors. Experimental results on the GeoText dataset and two microblog datasets show the following. First, the GER text filtering method significantly improves the accuracy of user location prediction. The improvement is larger for datasets in which users have more microblogs, which indicates that GER text filtering is better suited to such social media datasets. Second, across the experiments on different datasets, the prediction accuracy of the GER-GCN method is always the highest among all methods. In the experiment on the GeoText benchmark dataset, the prediction a
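The network-extraction step named in the abstract (building a social network from mention relationships) can be illustrated with a minimal sketch. This is not the authors' implementation: the `@name` mention pattern, the undirected edges, the function name `build_mention_graph`, and the sample posts are all assumptions made for illustration, and follow relationships are not covered here.

```python
import re
from collections import defaultdict

# Assumed mention syntax: "@" followed by word characters (hypothetical).
MENTION = re.compile(r"@(\w+)")


def build_mention_graph(posts):
    """Build an undirected mention graph from (author, text) pairs.

    Returns a dict mapping each user to the set of users they are
    connected to by at least one mention in either direction.
    """
    graph = defaultdict(set)
    for author, text in posts:
        graph[author]  # touch the key so authors without mentions are still nodes
        for mentioned in MENTION.findall(text):
            if mentioned != author:  # ignore self-mentions
                graph[author].add(mentioned)
                graph[mentioned].add(author)
    return dict(graph)


# Made-up sample posts, for illustration only.
posts = [
    ("alice", "Great coffee in Qingdao with @bob"),
    ("bob", "@alice see you at the beach"),
    ("carol", "Traffic again @bob"),
    ("dave", "No mentions here"),
]
graph = build_mention_graph(posts)
```

In a pipeline of the kind the abstract describes, an adjacency structure like this would be passed to the graph convolutional network together with the filtered text features of each user.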
Authors
王海起
孔浩然
李学伟
WANG Haiqi; KONG Haoran; LI Xuewei (College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China)
Source
《地球信息科学学报》
CSCD
Peking University Core Journals (北大核心)
2021, No. 10, pp. 1778-1786 (9 pages)
Journal of Geo-information Science
Funding
National Natural Science Foundation of China (41471322).
Keywords
social users (社交用户)
home location (常驻位置)
toponym (地名)
social networks (社交网络)
multi-view (多视图)
geographic entity recognition (地理命名实体识别)
graph convolutional network (图卷积神经网络)
small sample learning (小样本学习)