Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62076117 and 61762061), the Natural Science Foundation of Jiangxi Province, China (20161ACB20004), and the Jiangxi Key Laboratory of Smart City (20192BCD40002).
Abstract: Crowd counting, which aims to count the number of people in crowded scenes, has recently become a hot research topic. Existing methods mainly follow a training-testing paradigm and rely on large-scale training data, so they fail to count crowds accurately in unseen real-world scenes because of limited generalization capability. To alleviate this issue, a scene-adaptive crowd counting method based on meta-learning with a Dual-illumination Merging Network (DMNet) is proposed in this paper. Built on learning-to-learn and few-shot learning, the method can adapt to new scenes that contain only a few labeled images. To generate high-quality density maps and count crowds in low-lighting scenes, DMNet contains a Multi-scale Feature Extraction module and an Element-wise Fusion module. The Multi-scale Feature Extraction module extracts image features with multi-scale convolutions, which improves counting accuracy. The Element-wise Fusion module fuses the low-lighting feature with the illumination-enhanced feature, compensating for the illumination missing in low-lighting environments. Experimental results on the WorldExpo'10, DISCO, UCSD, and Mall benchmarks show that the proposed method outperforms existing state-of-the-art methods in accuracy and achieves satisfactory results.
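The abstract does not give implementation details, so the following is only a minimal PyTorch sketch of how the two described modules could be wired together. The kernel sizes, channel widths, shared extractor weights, and element-wise addition as the fusion operator are assumptions, and the class names (MultiScaleFeatureExtraction, ElementWiseFusion, DMNetSketch) are illustrative rather than taken from the paper.

```python
# A minimal sketch of a multi-scale extractor, an element-wise fusion step,
# and a density-map head, under the assumptions stated above.
import torch
import torch.nn as nn

class MultiScaleFeatureExtraction(nn.Module):
    """Parallel convolutions with different kernel sizes, concatenated."""
    def __init__(self, in_ch=3, branch_ch=16):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, k, padding=k // 2) for k in (3, 5, 7)
        ])
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(torch.cat([b(x) for b in self.branches], dim=1))

class ElementWiseFusion(nn.Module):
    """Fuses the low-lighting feature with the illumination-enhanced feature."""
    def forward(self, low_light_feat, enhanced_feat):
        # Element-wise addition as a placeholder fusion operator.
        return low_light_feat + enhanced_feat

class DMNetSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.extract = MultiScaleFeatureExtraction()  # shared for both inputs
        self.fuse = ElementWiseFusion()
        self.head = nn.Conv2d(48, 1, kernel_size=1)   # density-map regressor

    def forward(self, low_light_img, enhanced_img):
        fused = self.fuse(self.extract(low_light_img), self.extract(enhanced_img))
        return self.head(fused)  # summing the density map gives the count
```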
Funding: Project supported by the National Key R&D Program of China (No. 2020AAA010400X) and the Hikvision Open Fund, China.
Abstract: Object detection is one of the most active research directions in computer vision; it has made impressive progress in academia and has many valuable industrial applications. However, mainstream detection methods still have two shortcomings: (1) even a model well trained on large amounts of data generally cannot be reused across different kinds of scenes; (2) once a model is deployed, it cannot autonomously evolve with the unlabeled scene data it accumulates. To address these problems, and inspired by visual knowledge theory, we propose a novel scene-adaptive evolution algorithm for unsupervised video object detection that reduces the impact of scene changes through the concept of object groups. We first extract a large number of object proposals from unlabeled data with a pre-trained detection model. Second, we build a visual knowledge dictionary of object concepts by clustering the proposals, in which each cluster center represents an object prototype. Third, we examine the relations between different clusters and the object information of different groups, and propose a graph-based group information propagation strategy to determine the category of an object concept, which effectively distinguishes positive from negative proposals. With these pseudo labels, the pre-trained model can easily be fine-tuned. The effectiveness of the proposed method is verified in different experiments, and significant improvements are achieved.
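As a rough illustration of the clustering and propagation idea, the sketch below assumes proposal features have already been extracted by a pre-trained detector. The k-means prototype dictionary, the cosine-similarity graph, the single propagation step, and the 0.5 threshold are simplifications introduced here, not the paper's actual strategy; the function name propagate_group_labels is hypothetical.

```python
# A minimal sketch: cluster proposals into prototypes, propagate scores over a
# prototype-similarity graph, and threshold to obtain pseudo labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

def propagate_group_labels(proposal_feats, seed_scores, n_prototypes=50, alpha=0.5):
    """proposal_feats: (N, D) features; seed_scores: (N,) detector confidences."""
    # 1) Visual knowledge dictionary: each cluster centre acts as a prototype.
    km = KMeans(n_clusters=n_prototypes, n_init=10).fit(proposal_feats)
    prototypes = km.cluster_centers_

    # 2) Initial score of each prototype = mean confidence of its members.
    proto_scores = np.array([
        seed_scores[km.labels_ == c].mean() for c in range(n_prototypes)
    ])

    # 3) Similarity graph over prototypes; one step of score propagation.
    W = np.maximum(cosine_similarity(prototypes), 0.0)
    np.fill_diagonal(W, 0.0)
    W = W / (W.sum(axis=1, keepdims=True) + 1e-8)
    proto_scores = alpha * proto_scores + (1 - alpha) * W @ proto_scores

    # 4) Proposals inherit their prototype's propagated score; thresholding
    #    yields positive / negative pseudo labels for fine-tuning.
    return proto_scores[km.labels_] > 0.5
```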
Funding: This work is supported by the National Natural Science Foundation of China (NSFC, No. 61340046), the National High Technology Research and Development Program of China (863 Program, No. 2006AA04Z247), the Scientific and Technical Innovation Commission of Shenzhen Municipality (JCYJ20130331144631730, JCYJ20130331144716089), and the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20130001110011).
Abstract: Indoor multi-target tracking is more challenging than outdoor tracking because of frequent occlusion, view truncation, severe scale change, and pose variation, all of which introduce considerable unreliability and ambiguity into target representation and data association. A discriminative and reliable target representation is therefore vital for accurate data association in multi-tracking. Previous works usually combine a large set of features to increase discriminative power, but this is prone to error accumulation and unnecessary computational cost, and may even increase ambiguity. Moreover, the reliability of the same feature can vary greatly between scenes; for the now widespread network cameras installed in diverse and complex indoor scenes, fixed feature selection schemes cannot meet general requirements. To handle these problems, we first propose a scene-adaptive hierarchical data association scheme that adaptively selects the features most reliable for target representation in the applied scene and gradually combines features up to the minimum needed to discriminate ambiguous targets; second, we propose a novel depth-invariant part-based appearance model using RGB-D data, which makes the appearance model robust to scale change, partial occlusion, and view truncation. The introduction of RGB-D data increases the diversity of features, providing more feature types for selection during data association and enhancing the final multi-tracking performance. We validate the method from several aspects, including the scene-adaptive feature selection scheme, the hierarchical data association scheme, and the RGB-D based appearance model, in various indoor scenes, demonstrating its effectiveness and efficiency in improving multi-tracking performance.
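To make the hierarchical idea concrete, here is a minimal sketch under stated assumptions: features are ranked by an externally supplied per-scene reliability score and combined one at a time, stopping as soon as the best candidate is clearly separated. The weighted-score combination, the margin test, and the function name hierarchical_associate are illustrative choices, not the paper's actual scheme.

```python
# A minimal sketch of scene-adaptive hierarchical data association: use the
# most reliable features first and stop once the match is unambiguous.
def hierarchical_associate(track, candidates, features, reliability, margin=0.2):
    """
    features: dict name -> distance(track, candidate) callable returning [0, 1].
    reliability: dict name -> scene-specific reliability score in (0, 1].
    Returns the matched candidate, or None if the match stays ambiguous.
    """
    if not candidates:
        return None

    # Most reliable features in the applied scene are tried first.
    ordered = sorted(features, key=lambda n: reliability[n], reverse=True)

    scores = {id(c): 0.0 for c in candidates}
    weight_sum = 0.0
    for name in ordered:
        weight_sum += reliability[name]
        for c in candidates:
            scores[id(c)] += reliability[name] * (1.0 - features[name](track, c))

        # Stop combining features once the best match is clearly separated.
        ranked = sorted(candidates, key=lambda c: scores[id(c)], reverse=True)
        if len(ranked) == 1:
            return ranked[0]
        gap = (scores[id(ranked[0])] - scores[id(ranked[1])]) / weight_sum
        if gap > margin:
            return ranked[0]
    return None  # still ambiguous after all features: leave unassociated
```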