Abstract: Gait recognition offers several advantages: it tolerates low image resolution, works at long distances, requires no cooperation from the subject, and is hard to hide or disguise, giving it broad application prospects in fields such as security surveillance and forensic investigation. In practice, however, its performance is often affected by covariates such as viewing angle, clothing, carried objects, and occlusion; view variation is the most common of these and can change a pedestrian's appearance dramatically. Improving the robustness of gait recognition to view changes has therefore long been a research focus. To give a comprehensive picture of existing cross-view gait recognition methods, this paper surveys and organizes the related work. First, the research background is briefly introduced in terms of basic concepts, data acquisition methods, and historical development, and the mainstream video-based cross-view gait databases are compiled and analyzed on this basis. Cross-view gait recognition methods are then reviewed in detail from four perspectives: methods based on 3D gait information, methods based on view transformation models, methods based on view-invariant features, and methods based on deep learning. Finally, representative methods are compared on three databases, CASIA-B (CASIA gait database, dataset B), OU-ISIR LP (OU-ISIR gait database, large population dataset), and OU-MVLP (OU-ISIR gait database, multi-view large population dataset), and future research directions for cross-view gait recognition are pointed out.
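The comparisons above are reported as rank-1 identification accuracy over gallery/probe splits of these databases. As a point of reference, the following minimal Python sketch computes such a score; the random features, Euclidean ranking, and single-gallery setup are illustrative assumptions rather than any particular paper's protocol.

```python
# Minimal sketch of rank-1 identification accuracy for cross-view gait
# evaluation (illustrative only; real protocols fix gallery/probe views).
import numpy as np

def rank1_accuracy(probe_feats, probe_ids, gallery_feats, gallery_ids):
    """Fraction of probes whose nearest gallery sample shares their identity."""
    correct = 0
    for feat, pid in zip(probe_feats, probe_ids):
        # Euclidean distance from this probe to every gallery template.
        dists = np.linalg.norm(gallery_feats - feat, axis=1)
        if gallery_ids[np.argmin(dists)] == pid:
            correct += 1
    return correct / len(probe_ids)

# Toy usage with random 64-D features for 5 identities.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(5, 64))
gallery_ids = np.arange(5)
probes = gallery + 0.1 * rng.normal(size=(5, 64))  # noisy same-ID probes
print(rank1_accuracy(probes, np.arange(5), gallery, gallery_ids))
```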
Abstract: Objective: Existing generative cross-view gait recognition methods transform gait templates only between specific views, and their recognition rates drop steadily as the view gap grows. To address this problem, this paper proposes a cross-view gait recognition method based on a generative adversarial network with a self-attention mechanism. Method: The network consists of a generator, a view discriminator, and an identity preserver, forming a model that can transform gait templates between arbitrary views. The generator adopts an encoder-decoder structure that concatenates the input gait features with a view indicator to realize transformation across view domains, and adversarial training together with a pixel-level loss makes the generated target-view gait templates resemble real ones. In the discriminative network, the view discriminator constrains the generated view to be consistent with the target view, and an identity preserver combined with a hard triplet loss maximally retains the identity information of the input template. In addition, self-attention modules are inserted into both the generative and discriminative networks to capture global feature dependencies and improve the quality of the generated images, and spectral normalization is introduced to stabilize training. Result: Experiments were conducted on the CASIA-B (Chinese Academy of Sciences' Institute of Automation gait database, dataset B) and OU-MVLP (OU-ISIR gait database, multi-view large population dataset) datasets. When the network is trained with the self-attention module and the identity-preserving loss, the recognition rate on CASIA-B improves markedly, with an average rank-1 accuracy 15% higher than that of the GaitGAN (gait generative adversarial network) method. The proposed method also generalizes well to the large-scale OU-MVLP cross-view gait database, achieving an average recognition accuracy of 65.9%. Conclusion: The proposed method improves the quality of generated gait templates and extracts more discriminative view-invariant features; its recognition accuracy surpasses existing methods, effectively addressing the cross-view gait recognition problem.
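To make the self-attention and spectral-normalization ingredients concrete, here is a minimal SAGAN-style self-attention block in PyTorch. The 1x1-convolution query/key/value design and the learnable residual gate gamma follow the common SAGAN formulation and are assumptions, not the authors' exact architecture.

```python
# Minimal SAGAN-style self-attention block with spectral normalization
# (a sketch of the mechanism described above, not the paper's exact network).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.utils import spectral_norm

class SelfAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # 1x1 convolutions produce query/key/value maps; spectral norm
        # constrains their Lipschitz constant to stabilize GAN training.
        self.query = spectral_norm(nn.Conv2d(channels, channels // 8, 1))
        self.key = spectral_norm(nn.Conv2d(channels, channels // 8, 1))
        self.value = spectral_norm(nn.Conv2d(channels, channels, 1))
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual gate

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (b, hw, c//8)
        k = self.key(x).flatten(2)                     # (b, c//8, hw)
        attn = F.softmax(torch.bmm(q, k), dim=-1)      # (b, hw, hw) global deps
        v = self.value(x).flatten(2)                   # (b, c, hw)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                    # residual connection

# Toy usage on a batch of intermediate feature maps.
feats = torch.randn(2, 32, 16, 16)
print(SelfAttention(32)(feats).shape)  # torch.Size([2, 32, 16, 16])
```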
Funding: National Natural Science Foundation of China, Grant/Award Number: 62106177; supported by the Central University Basic Research Fund of China (No. 2042020KF0016); and by the supercomputing system at the Supercomputing Center of Wuhan University.
Abstract: The goal of street-to-aerial cross-view image geo-localization is to determine the location of a query street-view image by retrieving the aerial-view image of the same place. The drastic viewpoint and appearance gap between aerial-view and street-view images poses a huge challenge for this task. In this paper, we propose a novel multiscale attention encoder to capture the multiscale contextual information of aerial/street-view images. To bridge the domain gap between the two views, we first use an inverse polar transform to make the street-view images approximately aligned with the aerial-view images. Then, the proposed multiscale attention encoder converts each image into a feature representation under the guidance of the learnt multiscale information. Finally, we propose a novel global mining strategy that enables the network to pay more attention to hard negative exemplars. Experiments on standard benchmark datasets show that our approach obtains an 81.39% top-1 recall rate on the CVUSA dataset and 71.52% on the CVACT dataset, achieving state-of-the-art performance and outperforming most existing methods significantly.
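As a sketch of the view-alignment step, the following Python snippet resamples a street-view panorama onto an aerial-style grid by inverting a polar mapping. The row/column conventions (columns as azimuth, rows as distance) are illustrative assumptions; the paper's transform may differ in parameterization.

```python
# Minimal sketch of an inverse polar transform that resamples a street-view
# panorama (rows ~ distance, columns ~ azimuth) onto an aerial-style grid.
# The mapping conventions here are illustrative assumptions.
import numpy as np

def inverse_polar_transform(pano, out_size):
    """pano: (H, W) panorama; returns (out_size, out_size) aerial-layout image."""
    h, w = pano.shape
    c = (out_size - 1) / 2.0                      # center of the aerial grid
    ys, xs = np.mgrid[0:out_size, 0:out_size]
    r = np.sqrt((xs - c) ** 2 + (ys - c) ** 2)    # radius from image center
    theta = np.arctan2(ys - c, xs - c)            # azimuth in [-pi, pi]
    # Azimuth indexes panorama columns, radius indexes panorama rows.
    col = ((theta + np.pi) / (2 * np.pi) * (w - 1)).astype(int)
    row = np.clip(r / c * (h - 1), 0, h - 1).astype(int)
    return pano[row, col]

pano = np.tile(np.linspace(0, 1, 256), (64, 1))    # toy 64x256 panorama
print(inverse_polar_transform(pano, 128).shape)    # (128, 128)
```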
Abstract: Remarkable progress has been made in self-supervised monocular depth estimation (SS-MDE) by exploring cross-view consistency, e.g., photometric consistency and 3D point cloud consistency. However, these consistency cues are very vulnerable to illumination variation, occlusions, texture-less regions, and moving objects, making them not robust enough to deal with various scenes. To address this challenge, we study two kinds of robust cross-view consistency in this paper. First, the spatial offset field between adjacent frames is obtained by reconstructing the reference frame from its neighbors via deformable alignment, and is used to align the temporal depth features via a depth feature alignment (DFA) loss. Second, the 3D point clouds of each reference frame and its nearby frames are calculated and transformed into voxel space, where the point density in each voxel is computed and aligned via a voxel density alignment (VDA) loss. In this way, we exploit the temporal coherence in both depth feature space and 3D voxel space for SS-MDE, shifting the "point-to-point" alignment paradigm to a "region-to-region" one. Compared with the photometric consistency loss and the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust owing to the strong representation power of deep features and the high tolerance of voxel density to the aforementioned challenges. Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques. Extensive ablation studies and analysis validate the effectiveness of the proposed losses, especially in challenging scenes. The code and models are available at https://github.com/sunnyHelen/RCVC-depth.
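The voxel density alignment idea can be illustrated in a few lines of numpy: both point clouds are binned into a shared voxel grid, and the per-voxel densities are compared region-to-region. The grid bounds, resolution, and L1 form below are assumptions for the sketch, not the paper's exact loss.

```python
# Minimal sketch of voxel density alignment: bin two point clouds into a
# shared voxel grid and penalize the L1 gap between per-voxel densities.
# Grid bounds, resolution, and the L1 form are illustrative assumptions.
import numpy as np

def voxel_density(points, bounds, res):
    """points: (N, 3); bounds: (min, max) shared by all axes; res: voxels per axis."""
    lo, hi = bounds
    idx = ((points - lo) / (hi - lo) * res).astype(int)
    idx = np.clip(idx, 0, res - 1)
    grid = np.zeros((res, res, res))
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return grid / len(points)  # normalize counts into densities

def vda_loss(pts_ref, pts_src, bounds=(-10.0, 10.0), res=16):
    d_ref = voxel_density(pts_ref, bounds, res)
    d_src = voxel_density(pts_src, bounds, res)
    return np.abs(d_ref - d_src).mean()  # region-to-region, not point-to-point

rng = np.random.default_rng(0)
cloud = rng.uniform(-10, 10, size=(1000, 3))
shifted = cloud + rng.normal(scale=0.05, size=cloud.shape)
print(vda_loss(cloud, shifted))
```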
Funding: This research was funded by the Science and Technology Support Plan Project of Hebei Province (grant numbers 17210803D and 19273703D), the Science and Technology Spark Project of the Hebei Seismological Bureau (grant number DZ20180402056), the Education Department of Hebei Province (grant number QN2018095), and the Polytechnic College of Hebei University of Science and Technology.
Abstract: The appearance of a pedestrian can vary greatly from image to image, and different pedestrians may look similar in a given image. Such similarities and variabilities in the appearance and clothing of individuals make pedestrian re-identification very challenging. Here, a pedestrian re-identification method based on the fusion of local features and gait energy image (GEI) features is proposed. In this method, the human body is divided into four regions according to joint points. The color and texture of each body region are extracted as local features, and GEI features of the pedestrian gait are also obtained; the two kinds of features are then fused. Distance metrics are learned independently for each feature type using the cross-view quadratic discriminant analysis (XQDA) method to measure the similarity of image pairs, and the final similarity is obtained by weighted matching. Evaluation of the experimental results with cumulative matching characteristic (CMC) curves reveals that, after the fusion of local and GEI features, pedestrian re-identification improves over existing methods and is notably better than the recognition rate achieved with any single feature.
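To illustrate the fusion and evaluation steps, the sketch below combines two per-feature distance matrices (such as those produced by XQDA metrics over local and GEI features) with a weight, and reads a CMC curve off the fused ranking. The weight value and toy distance matrices are illustrative assumptions.

```python
# Minimal sketch of weighted score fusion of two distance matrices followed
# by a CMC curve (illustrative; the XQDA metric learning itself is omitted).
import numpy as np

def fuse_and_cmc(dist_local, dist_gei, gallery_ids, probe_ids, w=0.5):
    dist = w * dist_local + (1 - w) * dist_gei     # weighted score fusion
    order = np.argsort(dist, axis=1)               # gallery ranked per probe
    n_probe, n_gallery = dist.shape
    cmc = np.zeros(n_gallery)
    for i in range(n_probe):
        # Rank of the first gallery sample with the correct identity.
        rank = np.argmax(gallery_ids[order[i]] == probe_ids[i])
        cmc[rank:] += 1                            # hit at this rank and beyond
    return cmc / n_probe                           # CMC curve, ranks 1..n

rng = np.random.default_rng(0)
ids = np.arange(10)
d1, d2 = rng.random((10, 10)), rng.random((10, 10))
d1[ids, ids] *= 0.1                                # make true matches closer
print(fuse_and_cmc(d1, d2, ids, ids)[:5])          # rank-1..5 accuracy
```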
Funding: Co-supported by the National Natural Science Foundation of China (Nos. 62175111 and 62001234).
Abstract: This work matches remote sensing images taken by an unmanned aerial vehicle (UAV) against satellite remote sensing images that carry geolocation information, thereby determining the specific geographic location of the target object captured by the UAV. The main challenge lies in the considerable differences in the visual content of remote sensing images acquired by satellites and UAVs, such as dramatic changes in viewpoint and unknown orientations. Much previous work has focused on matching homologous data. To overcome the difficulties caused by the gap between these two data modes and to maintain robustness in visual positioning, a quality-aware template matching method based on scale-adaptive deep convolutional features is proposed, which deeply mines the features common to both modes. The template-sized feature map and the reference image feature map are first obtained, and the similarity between the two feature maps is then measured. Finally, a heat map representing the matching probability is generated to determine the best match in the reference image. The method is evaluated on the latest UAV-based geolocation dataset (University-1652) and on real-scene campus data we collected with UAVs. The experimental results demonstrate the effectiveness and superiority of the method.
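The matching step can be sketched by treating the template feature map as a convolution kernel slid over the reference feature map, with the normalized response serving as the heat map whose peak marks the best match. The feature shapes and cosine-style normalization below are illustrative assumptions, not the paper's exact quality-aware scheme.

```python
# Minimal sketch of feature-space template matching: the template feature map
# is slid over the reference feature map via cross-correlation (conv2d), and
# the cosine-normalized response forms a heat map whose peak is the best match.
import torch
import torch.nn.functional as F

def match_heatmap(ref_feat, tmpl_feat):
    """ref_feat: (C, H, W); tmpl_feat: (C, h, w); returns (H-h+1, W-w+1) map."""
    ref = ref_feat.unsqueeze(0)                    # (1, C, H, W)
    kernel = tmpl_feat.unsqueeze(0)                # (1, C, h, w) as conv kernel
    score = F.conv2d(ref, kernel)[0, 0]            # raw cross-correlation
    # Normalize by local feature energy so the score behaves like cosine sim.
    ones = torch.ones_like(kernel)
    energy = F.conv2d(ref ** 2, ones).clamp_min(1e-6).sqrt()[0, 0]
    return score / (energy * kernel.norm())

ref = torch.randn(8, 32, 32)
tmpl = ref[:, 10:18, 12:20].clone()                # crop a true template
heat = match_heatmap(ref, tmpl)
peak = torch.nonzero(heat == heat.max())[0]
print(peak)                                        # tensor([10, 12])
```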