Abstract: Despite significant developments in 3D multi-view multi-person (3D MM) tracking, current frameworks target either footprint tracking or pose tracking, but not both. Frameworks designed for the former cannot be used for the latter, because they obtain 3D positions on the ground plane directly via a homography projection, which is inapplicable to 3D poses above the ground. In contrast, frameworks designed for pose tracking generally isolate multi-view and multi-frame associations and may not be sufficiently robust for footprint tracking, which uses fewer key points than pose tracking and therefore has weaker multi-view association cues in a single frame. This study presents a unified multi-view multi-person tracking framework to bridge the gap between footprint tracking and pose tracking. Without additional modifications, the framework can take monocular 2D bounding boxes and 2D poses as input to produce robust 3D trajectories for multiple persons. Importantly, multi-frame and multi-view information are jointly employed to improve association and triangulation. Our framework is shown to provide state-of-the-art performance on the Campus and Shelf datasets for 3D pose tracking, with comparable results on the WILDTRACK and MMPTRACK datasets for 3D footprint tracking.
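As a rough illustration of the homography-based footprint localization that the abstract attributes to existing footprint trackers, the sketch below maps the bottom-center pixel of a 2D bounding box to ground-plane coordinates. The function names, the choice of the bottom-center pixel as the foot point, and the matrix values are assumptions made for illustration; none of them are taken from the paper.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]


def project_to_ground(H: Matrix3, u: float, v: float) -> Tuple[float, float]:
    """Apply a 3x3 image-to-ground homography H to pixel (u, v)."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if w == 0:
        raise ValueError("pixel maps to a point at infinity")
    # Divide by the homogeneous coordinate to get planar coordinates.
    return (x / w, y / w)


def footprint_from_box(H: Matrix3, box: Tuple[float, float, float, float]) -> Tuple[float, float]:
    """Estimate a ground-plane footprint from a box (x1, y1, x2, y2).

    Assumption: the person stands where the bottom-center of the box
    touches the ground.
    """
    x1, _y1, x2, y2 = box
    return project_to_ground(H, (x1 + x2) / 2.0, y2)


if __name__ == "__main__":
    # Hypothetical homography: a pure scaling of 1 cm per pixel.
    H_example = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]]
    print(footprint_from_box(H_example, (100, 50, 140, 200)))
```

Because the mapping is only valid for image points that lie on the ground plane, it cannot place joints above the ground, which is the limitation the abstract points out for pose tracking.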
Abstract: To improve the accuracy of person tracking in surveillance video, we propose a deep-learning-based multi-target person tracking algorithm. We use You Only Look Once v5 (YOLOv5) to detect person targets in each video frame and Simple Online and Realtime Tracking with a Deep Association Metric (DeepSORT) to perform cascade matching and Intersection over Union (IoU) matching of person targets between frames. To address the IDSwitch problem caused by the weak feature extraction ability of the Re-Identification (ReID) network during cascade matching, we introduce Spatial Relation-aware Global Attention (RGA-S) and Channel Relation-aware Global Attention (RGA-C) mechanisms into the network structure. Pre-trained weights are loaded for transfer learning on the CUHK03 dataset. To enhance the discriminative performance of the network, we propose a new loss function design that introduces hard negative mining into the benchmark triplet loss. To improve the classification accuracy of the network, we add label smoothing regularization to the cross-entropy loss. To improve convergence stability and speed in the early training stage, and to prevent the model from oscillating around the global optimum due to an excessive learning rate in the later stage, we propose a learning rate schedule that combines linear warmup with exponential decay. Experimental results on CUHK03 show that the mean Average Precision (mAP) of the improved ReID network is 76.5%; in Cumulative Matching Characteristics (CMC), Top 1 is 42.5%, Top 5 is 65.4%, and Top 10 is 74.3%. Compared with the original algorithm, the tracking accuracy of the optimized DeepSORT tracking algorithm is improved by 2.5%, the tracking precision is improved by 3.8%, and the number of identity switches is reduced by 25%. The algorithm effectively alleviates the IDSwitch problem, improves the tracking accuracy of persons, and has a high practical
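The learning rate regulation described in this abstract, linear warmup followed by exponential decay, can be sketched as follows. This is a minimal illustration under assumed settings: the function name `lr_at` and the values of `base_lr`, `warmup_steps`, and `gamma` are hypothetical, since the abstract does not state the actual hyperparameters or the exact form of the schedule.

```python
def lr_at(step: int, base_lr: float = 0.01, warmup_steps: int = 10,
          gamma: float = 0.95) -> float:
    """Learning rate at a 0-based training step.

    All default values are illustrative assumptions, not the paper's settings.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < warmup_steps:
        # Linear warmup: ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Exponential decay: multiply by gamma for every step after warmup.
    return base_lr * gamma ** (step - warmup_steps)


if __name__ == "__main__":
    for s in (0, 5, 9, 10, 11, 50):
        print(s, lr_at(s))
```

The small initial rate is meant to keep early updates stable, while the decaying tail shrinks the step size late in training, matching the two motivations given in the abstract.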
Funding: Supported by the Natural Science Foundation of Shanghai Municipality (Grant No. 18ZR1415100) and the National Natural Science Foundation of China (Grant No. 61703262).
Abstract: This paper presents a multi-person vision tracking approach based on human body localization features to address the problem of interactive object localization and tracking in a home monitoring scenario. First, the human body localization model is used to obtain the 3D position of the human body, which is then used to construct a human body motion model based on the Kalman filter. At the same time, a human appearance model is constructed by fusing color features with histogram of oriented gradients features to better characterize the human body. Second, a human body observation model is built from the motion and appearance models to measure the similarity between the human body state sequence in the historical frames and the human body observations in the current frame, from which the cost matrix is obtained. Third, the Hungarian maximum matching algorithm is employed to match each human body between the current and historical frames, and an exception detection mechanism is constructed at the same time to further reduce the probability of tracking and matching failures. Finally, a multi-person vision tracking verification platform was constructed, and the achieved average accuracy was 96.6% in cases of human body overlapping, occlusion, disappearance, and appearance; this verifies the feasibility and effectiveness of the proposed method.
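The matching step in this abstract, assigning current-frame observations to historical tracks from a cost matrix, can be illustrated with a compact Hungarian (Kuhn-Munkres) solver for minimum-cost assignment. This is a sketch under assumptions: the function name `hungarian` and the example cost values are hypothetical, and the abstract does not specify how the motion and appearance similarities are combined into costs.

```python
from typing import List, Sequence, Tuple


def hungarian(cost: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Minimum-cost assignment of rows (tracks) to columns (detections).

    Requires len(cost) <= len(cost[0]); returns sorted (row, col) pairs.
    """
    n, m = len(cost), len(cost[0])
    if n > m:
        raise ValueError("need at least as many columns as rows")
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (m + 1)   # column potentials (1-based)
    p = [0] * (m + 1)     # p[j]: row currently matched to column j
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so one more reduced cost becomes zero.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0)


if __name__ == "__main__":
    # Hypothetical costs: rows are historical tracks, columns are detections.
    example = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]
    print(hungarian(example))  # [(0, 1), (1, 0), (2, 2)]
```

In a tracker, pairs whose cost exceeds some gating threshold would typically be rejected after the assignment; that step is omitted here because the abstract does not describe it.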