Abstract
[Objective] This paper builds a sentiment analysis model for spatial-textual data based on multi-dimensional WaveCluster, so that text sentiment and spatial position can be analyzed together. [Methods] We integrated several Yelp datasets into a spatial-textual database and used lexicon-based sentiment analysis to construct feature vectors. We then proposed and analyzed two models built on multi-dimensional WaveCluster: a Hybrid model and a Textual-Spatial model. [Results] Experiments verified that multi-dimensional WaveCluster with the db2 and bior2.2 wavelet basis functions identifies more accurate clusters than DBSCAN and K-means in spatial-textual data mining, and that it was the fastest of the three when clustering data at scales from hundreds of thousands to tens of millions. [Limitations] The sentiment analysis uses a unigram language model and therefore lacks analysis of sentence-level meaning. [Conclusions] The proposed Textual-Spatial model can effectively mine the distribution of sentiment tendency in multi-dimensional spatial-textual data. The Hybrid model gives spatial-textual recommender systems an effective way to compute spatial proximity and sentiment similarity simultaneously.
Authors
Li Ke (李柯), School of Information Management, Nanjing University, Nanjing 210046, China
Sasaki Yuya (佐々木勇和), Graduate School of Information Science and Technology, Osaka University, Osaka 565-0871, Japan
Source
Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
CSSCI
CSCD
PKU Core Journal (北大核心)
2019, No. 7, pp. 14-22 (9 pages)
Keywords
Spatial-Textual Data
Sentiment Distribution Analysis
Wavelet Transform
Clustering