Abstract
Finding the users whose trajectories are most similar to that of a target phone number among millions of users suffers from high time complexity and low recommendation accuracy. To address this, this paper proposes a fast computation framework for trajectory similarity. The framework combines time and geographic location into strings. This improves both the effectiveness and the performance of several classical trajectory analysis models, such as spatio-temporal pyramid matching, and the enhanced models are then used to determine trajectory similarity. Tailored to the characteristics of mobile signaling data, the framework denoises the data by removing redundant records, invalid records, and ping-pong handover records. It is a practical framework for applying mobile signaling data to trajectory overlap comparison: it adapts flexibly to a wide range of similarity measures while balancing accuracy and computational efficiency. The paper quantitatively compares five trajectory similarity algorithms in terms of both efficiency and accuracy: the optimized spatio-temporal pyramid matching (STPM) model, the longest common subsequence (LCS) algorithm, MinHash, SimHash, and dynamic time warping (DTW). On the experimental dataset used in this paper, the optimized STPM algorithm outperforms the other algorithms.
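The abstract does not give implementation details. As a minimal illustration of the string-based idea, the Python sketch below assumes one plausible reading. Each signaling record (timestamp, latitude, longitude) is cleaned by dropping invalid coordinates, A→B→A ping-pong switches where the return to A comes within 60 seconds, and consecutive repeats of the same cell. It is then mapped to a token that joins a 30-minute time slot with a 6-character Geohash cell. Two token sequences are compared with LCS, one of the five models evaluated in the paper. The function names (`geohash_encode`, `clean`, `to_tokens`, `lcs_similarity`) are hypothetical. So are the slot size, Geohash precision, ping-pong window, and normalization by the shorter sequence. None of these are the authors' parameters.

```python
from typing import List, Tuple

# Standard Geohash base32 alphabet (omits a, i, l, o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

Point = Tuple[int, float, float]  # (unix_seconds, lat, lon); assumed record layout


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a point as Geohash by alternately bisecting longitude and latitude."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    chars: List[str] = []
    bits, n_bits, use_lon = 0, 0, True  # Geohash starts with a longitude bit
    while len(chars) < precision:
        rng, val = (lon_rng, lon) if use_lon else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[bits])
            bits, n_bits = 0, 0
    return "".join(chars)


def clean(points: List[Point], precision: int = 6,
          pingpong_window: int = 60) -> List[Tuple[int, str]]:
    """Hypothetical cleaning pass: invalid, ping-pong, then redundant records."""
    # Invalid data: drop coordinates outside the valid lat/lon range.
    valid = sorted(p for p in points if -90 <= p[1] <= 90 and -180 <= p[2] <= 180)
    seq = [(t, geohash_encode(la, lo, precision)) for t, la, lo in valid]
    kept: List[Tuple[int, str]] = []
    for i, (t, cell) in enumerate(seq):
        # Ping-pong: A -> B -> A with the return to A within the window; drop B.
        if kept and i + 1 < len(seq):
            nt, ncell = seq[i + 1]
            prev_cell = kept[-1][1]
            if ncell == prev_cell and cell != prev_cell and nt - t <= pingpong_window:
                continue
        # Redundant data: same cell as the previously kept record.
        if kept and kept[-1][1] == cell:
            continue
        kept.append((t, cell))
    return kept


def to_tokens(points: List[Point], slot_seconds: int = 1800,
              precision: int = 6, pingpong_window: int = 60) -> List[str]:
    """Combine a time slot and a Geohash cell into one string token per record."""
    return [f"{t // slot_seconds}|{cell}"
            for t, cell in clean(points, precision, pingpong_window)]


def lcs_length(a: List[str], b: List[str]) -> int:
    """Classic O(len(a) * len(b)) dynamic program, keeping one row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def lcs_similarity(a: List[str], b: List[str]) -> float:
    """LCS length normalized by the shorter sequence (an assumed normalization)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / min(len(a), len(b))


# Usage: a target trajectory and a candidate seen in two of the same slot/cell pairs.
target = [(0, 39.90, 116.40), (1800, 39.95, 116.45), (3600, 40.00, 116.50)]
other = [(60, 39.90, 116.40), (3700, 40.00, 116.50)]
print(lcs_similarity(to_tokens(target), to_tokens(other)))  # 1.0
```

Normalizing by the shorter sequence scores a partial but fully overlapping trajectory as 1.0. Dividing by the longer one would give 2/3 here instead. Because the records become plain string tokens, the same sequences could also be fed to set-based measures such as MinHash or SimHash by treating each token as a set element or feature. That extension is not shown.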
Authors
LI Xintong
CUI Bingwei
LI Mingzhe
(Chang'an Communication Technology Co., Ltd., Innovation Services R&D Center, Beijing 102209, China; National Computer Network Emergency Technology Processing and Coordination Center, Laboratory, Beijing 102209, China)
Source
Network New Media Technology (《网络新媒体技术》), 2022, No. 5, pp. 15-23 (9 pages)
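Funding
Research and Demonstration of High-Precision, Large-Scale Mobile Phone Deduplication and Regional Population Prediction Technology Based on Machine Learning (Grant No. 2020YFF0304901).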
Keywords
mobile signaling data
trajectory cleaning
trajectory similarity
trajectory accompanying
Geohash