Abstract
Methods that use a Convolutional Neural Network (CNN) to regress a density map have become the mainstream approach to crowd counting and locating, but existing methods still have two problems. First, the density maps produced by traditional methods suffer from adhesion and overlap in crowded regions, which leads to errors in the network's final counting and locating results. Second, because the weights of a conventional convolution do not change with the input, it cannot extract image features adaptively and struggles with images that have complex backgrounds and uneven crowd density distributions. To solve these problems, a dense crowd counting and locating method based on a Pixel Distance Map (PDMap) and a Four-Dimensional Dynamic Convolutional Network (FDDCNet) was proposed. Firstly, a new PDMap was defined, which used the spatial distance relationships between pixel-level annotated points and applied an inversion operation to increase the smoothness of the pixels around each head center, avoiding adhesion and overlap in crowded regions. Secondly, an FDDC module was designed to adaptively change the convolution weights along four dimensions and extract the prior knowledge provided by different views, coping with the counting and locating difficulties caused by complex scenes and uneven distributions and improving the generalization ability and robustness of the network model. Finally, a threshold was used to filter out locally uncertain predicted values, further improving the accuracy of counting and locating. On the test set of the NWPU-Crowd dataset, in terms of crowd counting, the Mean Absolute Error (MAE) and Mean Squared Error (MSE) of the proposed method were 82.4 and 334.7, respectively, which were 8.7% and 26.9% lower than those of the Multi-scale Feature Pyramid Network (MFP-Net); in terms of crowd locating, the comprehensive evaluation indicator F1 value and the precision of the proposed method were 71.2% and 73.6%, respectively, which were 3.0% and 5.9% higher than those of Topological Count (TopoCount). Experimental results show that the proposed method can handle dense crowd images with complex backgrounds and achieves higher counting accuracy and locating precision.
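The abstract describes PDMap only in words. The sketch below illustrates the idea under an assumed formula: each pixel stores 1/(d + 1), where d is the Euclidean distance to the nearest annotated head, so values peak at head centers and decay smoothly around them. The function names, the 1/(d + 1) inversion, and the unnormalized Gaussian baseline used for comparison are assumptions for illustration, not the paper's definition. Because each pixel depends only on its nearest head rather than on a sum over heads, two close heads keep two separate peaks, whereas summed Gaussian kernels can fuse them into a single blob.

```python
import math


def pixel_distance_map(height, width, heads):
    """Hypothetical PDMap-style map: inverse distance to the nearest head center.

    heads: list of (row, col) head annotations.
    Each pixel stores 1 / (d + 1), where d is the Euclidean distance to the
    nearest head, so every head center gets the maximum value 1.0.
    """
    if not heads:
        return [[0.0] * width for _ in range(height)]  # no people: all zeros
    pdmap = []
    for r in range(height):
        row = []
        for c in range(width):
            # Nearest head only (no summation), so close heads do not merge.
            d = min(math.hypot(r - hr, c - hc) for hr, hc in heads)
            row.append(1.0 / (d + 1.0))  # assumed inversion: near -> large
        pdmap.append(row)
    return pdmap


def gaussian_density_row(width, head_cols, sigma):
    """Comparison baseline: 1-D sum of unnormalized Gaussians, one per head."""
    return [sum(math.exp(-((c - hc) ** 2) / (2 * sigma ** 2)) for hc in head_cols)
            for c in range(width)]


pdmap = pixel_distance_map(1, 5, [(0, 1), (0, 3)])
print(pdmap[0])  # [0.5, 1.0, 0.5, 1.0, 0.5]
gauss = gaussian_density_row(5, [1, 3], sigma=2.0)
print(max(range(5), key=gauss.__getitem__))  # 2: one peak between the heads
```

For two heads two pixels apart, the distance map keeps its maxima at columns 1 and 3, while the summed Gaussians (sigma = 2) peak at column 2, between the heads, which is the kind of adhesion the abstract describes.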
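The abstract does not give the FDDC equations. The NumPy sketch below follows the general omni-dimensional dynamic convolution idea (ODConv), used here as an assumed stand-in: a squeeze step computes four attentions from the input (over the kernel's spatial positions, the input channels, the output channels, and a bank of candidate kernels), and these rescale and mix the kernel bank before an ordinary convolution. The class name, layer sizes, random initialization, and the stride-1 "same" padding are hypothetical.

```python
import numpy as np


def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def _softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


class FourDimDynamicConv:
    """Hypothetical ODConv-style layer: stride 1, 'same' padding, odd kernel size."""

    def __init__(self, c_in, c_out, k=3, n_kernels=4, hidden=8, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        # Bank of candidate kernels, shape (n_kernels, c_out, c_in, k, k).
        self.bank = 0.1 * rng.standard_normal((n_kernels, c_out, c_in, k, k))
        # Squeeze layer plus four attention heads (sizes are assumptions).
        self.w_fc = 0.1 * rng.standard_normal((hidden, c_in))
        self.w_sp = 0.1 * rng.standard_normal((k * k, hidden))
        self.w_in = 0.1 * rng.standard_normal((c_in, hidden))
        self.w_out = 0.1 * rng.standard_normal((c_out, hidden))
        self.w_num = 0.1 * rng.standard_normal((n_kernels, hidden))

    def attentions(self, x):
        """x: (c_in, H, W) -> spatial, input-channel, output-channel, kernel attentions."""
        z = np.maximum(self.w_fc @ x.mean(axis=(1, 2)), 0.0)  # GAP + FC + ReLU
        a_sp = _sigmoid(self.w_sp @ z).reshape(self.k, self.k)
        a_in = _sigmoid(self.w_in @ z)
        a_out = _sigmoid(self.w_out @ z)
        a_num = _softmax(self.w_num @ z)  # mixing weights over the kernel bank
        return a_sp, a_in, a_out, a_num

    def aggregate_kernel(self, x):
        a_sp, a_in, a_out, a_num = self.attentions(x)
        # Each attention rescales the bank along its own dimension, then kernels are summed.
        w = (self.bank
             * a_num[:, None, None, None, None]
             * a_out[None, :, None, None, None]
             * a_in[None, None, :, None, None]
             * a_sp[None, None, None, :, :])
        return w.sum(axis=0)  # (c_out, c_in, k, k), specific to this input

    def __call__(self, x):
        w = self.aggregate_kernel(x)
        p = self.k // 2
        xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding
        _, h, wd = x.shape
        y = np.zeros((w.shape[0], h, wd))
        for i in range(h):
            for j in range(wd):
                patch = xp[:, i:i + self.k, j:j + self.k]  # (c_in, k, k)
                # Cross-correlation, as in CNN frameworks.
                y[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
        return y


conv = FourDimDynamicConv(c_in=3, c_out=2)
x = np.random.default_rng(1).standard_normal((3, 6, 5))
print(conv(x).shape)  # (2, 6, 5)
```

Unlike a static convolution, `aggregate_kernel` returns a different effective kernel for each input, which is the property the abstract relies on for complex backgrounds and uneven density.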
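The final step in the abstract, filtering locally uncertain predictions with a threshold and then reading off heads as local maxima, can be sketched as follows. The 3x3 neighbourhood, the function name, and the threshold value are assumptions; the abstract does not say how the paper chooses its threshold.

```python
def locate_heads(pred, threshold):
    """Hypothetical post-processing: threshold filter + 3x3 local-maximum detection.

    pred: 2-D list of predicted map values. Returns head points as (row, col).
    """
    h, w = len(pred), len(pred[0])
    points = []
    for r in range(h):
        for c in range(w):
            v = pred[r][c]
            if v < threshold:
                continue  # drop low, locally uncertain responses
            window = [pred[rr][cc]
                      for rr in range(max(r - 1, 0), min(r + 2, h))
                      for cc in range(max(c - 1, 0), min(c + 2, w))]
            if v >= max(window):  # keep only peaks of their 3x3 neighbourhood
                points.append((r, c))
    return points


pred = [[0.5, 1.0, 0.5, 1.0, 0.5, 0.2, 0.3, 0.2]]
heads = locate_heads(pred, threshold=0.6)
print(heads, len(heads))  # [(0, 1), (0, 3)] 2
```

The weak peak at column 6 (0.3) is a local maximum but falls below the threshold, so the predicted count is 2. A plateau of equal maxima would be reported more than once here; a fuller version would need a tie-breaking rule.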
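The reported metrics can be made concrete with a small sketch. In crowd counting papers the quantity labelled "MSE" is conventionally the root of the mean squared error, and that is what `counting_errors` computes. For locating, `localization_scores` uses greedy nearest-neighbour one-to-one matching within a fixed radius, which is a simplification chosen for illustration; benchmark protocols such as NWPU-Crowd's use their own matching rules and thresholds, so numbers from this sketch are not comparable to the values in the abstract.

```python
import math


def counting_errors(pred_counts, gt_counts):
    """MAE and the crowd-counting 'MSE', i.e. the root of the mean squared error."""
    diffs = [p - g for p, g in zip(pred_counts, gt_counts)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, mse


def localization_scores(pred_pts, gt_pts, radius):
    """Precision, recall and F1 with a simplified greedy one-to-one matching.

    A prediction is a true positive if an unmatched ground-truth point lies
    within `radius` pixels; each ground-truth point can be matched only once.
    """
    unmatched = list(gt_pts)
    tp = 0
    for p in pred_pts:
        candidates = [g for g in unmatched if math.dist(p, g) <= radius]
        if candidates:
            nearest = min(candidates, key=lambda g: math.dist(p, g))
            unmatched.remove(nearest)
            tp += 1
    precision = tp / len(pred_pts) if pred_pts else 0.0
    recall = tp / len(gt_pts) if gt_pts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


print(counting_errors([10, 20], [12, 16]))  # MAE 3.0, MSE sqrt(10) ~ 3.162
print(localization_scores([(0, 0), (5, 5), (20, 20)],
                          [(1, 0), (5, 6), (50, 50), (60, 60)], radius=2))
```

In the second call two of three predictions match, so precision is 2/3, recall is 2/4, and F1 is their harmonic mean, 4/7.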
Authors
GAO Yangyi
LEI Tao
DU Xiaogang
LI Suiyong
WANG Yingbo
MIN Chongdan
School of Electronic Information and Artificial Intelligence, Shaanxi University of Science and Technology, Xi'an, Shaanxi 710021, China; Shaanxi Joint Laboratory of Artificial Intelligence (Shaanxi University of Science and Technology), Xi'an, Shaanxi 710021, China; China Railway First Survey and Design Institute Group Company Limited, Xi'an, Shaanxi 710043, China
Source
Journal of Computer Applications
CSCD
Peking University Core Journal
2024, Issue 7, pp. 2233-2242 (10 pages)
Funding
National Natural Science Foundation of China (62271296)
Shaanxi Province Distinguished Young Scholars Fund (2021JC-47)
Keywords
Convolutional Neural Network (CNN)
crowd counting
crowd locating
distance variation
dynamic convolution
local maximum detection