Fund: Supported by the National Natural Science Foundation of China (No. 60472036, No. 60431020, No. 60402036), the Natural Science Foundation of Beijing (No. 4042008), and the Ph.D. Foundation of the Ministry of Education (No. 20040005015).
Abstract: A new scheme, a personalized image retrieval technique based on visual perception, is proposed in this letter; it aims to narrow the semantic gap by directly perceiving the user's visual information. It uses a visual attention model to segment image regions and an eye-tracking technique to record fixations. Visual perception is obtained by analyzing the fixations within each region to measure gaze interest. Visual perception is then integrated into the attention model to detect Regions Of Interest (ROIs), whose features are extracted and analyzed; the inferred interests are fed back to optimize the retrieval results and to construct user profiles.
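As a rough illustration of how recorded fixations might be mapped onto segmented regions to measure gaze interest, consider the sketch below. The region-label map, the fixation tuple format, and the duration-weighted interest score are illustrative assumptions, not the letter's actual implementation.

```python
import numpy as np

def gaze_interest(region_labels, fixations):
    """Measure gaze interest per segmented image region.

    region_labels : 2D int array from the attention-model segmentation,
                    where each pixel holds its region id.
    fixations     : list of (x, y, duration_ms) tuples from the eye tracker.

    Returns a dict mapping region id -> normalized interest score,
    here a duration-weighted fixation tally (an assumed measure).
    """
    dwell = {}
    h, w = region_labels.shape
    for x, y, duration in fixations:
        if 0 <= y < h and 0 <= x < w:
            region = int(region_labels[y, x])
            dwell[region] = dwell.get(region, 0.0) + duration
    total = sum(dwell.values()) or 1.0
    return {r: t / total for r, t in dwell.items()}

# Regions whose score exceeds some threshold would be treated as ROIs,
# e.g. rois = [r for r, s in scores.items() if s > 0.2], and their
# visual features extracted for retrieval and user profiling.
```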
Abstract: Traditional search engines do not take into account that users' interests differ, and they do not provide a personalized retrieval service, so retrieval efficiency is low. To solve this problem, a method for personalized web image retrieval based on a user interest model is proposed. First, a formal definition of the user interest model is given. The model then combines explicit tracking and implicit tracking to refine the user's interest information and provide personalized web image retrieval. Experimental results show that the user interest model can be successfully applied to web image retrieval.
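One plausible way to combine explicit and implicit tracking into a single profile is sketched below. The keyword-weighted profile, the blend factor alpha, and the decay term are assumptions for illustration, since the paper does not give its update rule.

```python
from collections import defaultdict

def update_interest_model(profile, explicit_ratings, implicit_events,
                          alpha=0.7, decay=0.9):
    """Blend explicit and implicit evidence into a keyword-weighted profile.

    profile          : dict keyword -> interest weight (the user model).
    explicit_ratings : dict keyword -> rating in [0, 1] given by the user.
    implicit_events  : dict keyword -> observed signal (clicks, dwell time),
                       normalized to [0, 1] by the caller.
    alpha            : trust placed in explicit feedback (assumed value).
    decay            : forgetting factor so stale interests fade.
    """
    updated = defaultdict(float)
    for kw, w in profile.items():
        updated[kw] = w * decay  # old interests gradually decay
    for kw, r in explicit_ratings.items():
        updated[kw] += alpha * r  # explicit feedback weighted more heavily
    for kw, s in implicit_events.items():
        updated[kw] += (1 - alpha) * s  # implicit signals fill the gaps
    return dict(updated)
```

At retrieval time, the highest-weighted keywords would be used to re-rank or expand the image query for that user.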
Abstract: Objective: Text-to-image person re-identification is a sub-task of image-text cross-modal retrieval. Most existing methods add multiple local features on top of global feature matching for cross-modal matching. These local-matching methods are overly complex and greatly slow down retrieval, so a simpler and more effective way to improve the cross-modal alignment ability of text-to-image person re-identification models is needed. To this end, building on contrastive language-image pretraining (CLIP), a model pretrained on large-scale generic image-text pairs, this paper proposes a text-to-image person re-identification method that combines temperature-scaled projection matching with CLIP. Method: Leveraging the cross-modal image-text alignment ability of the pretrained CLIP model, the proposed model uses only global features for fine-grained image-text semantic feature alignment. In addition, a temperature-scaled cross-modal projection matching (TCMPM) loss is proposed for image-text cross-modal feature matching. Result: Experiments against the latest text-to-image person re-identification methods on two datasets in this field, CUHK-PEDES (CUHK person description) and ICFG-PEDES (identity-centric and fine-grained person description), show Rank-1 improvements of 5.92% and 1.21%, respectively, over the best existing local-matching models. Conclusion: The proposed dual-stream Transformer-based text-to-image person re-identification method can directly transfer CLIP's cross-modal matching knowledge without freezing model parameters during training or attaching auxiliary small models. Combined with the proposed TCMPM loss, the method substantially outperforms existing local-feature methods in retrieval performance while using only global feature matching.
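Since TCMPM by its name extends the well-known cross-modal projection matching (CMPM) loss with a temperature scale, a plausible PyTorch sketch is given below. The exact normalization details and the temperature value are assumptions, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def tcmpm_loss(image_feats, text_feats, labels, temperature=0.02, eps=1e-8):
    """Sketch of a temperature-scaled cross-modal projection matching loss.

    image_feats, text_feats : (B, D) global features from the two CLIP branches.
    labels                  : (B,) identity labels; pairs sharing an identity
                              count as matched.
    temperature             : scaling of the projection logits (assumed value).
    """
    # Ground-truth matching distribution: 1 where identities agree,
    # normalized per row so each row sums to 1.
    match = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    match = match / match.sum(dim=1, keepdim=True)

    image_norm = F.normalize(image_feats, dim=1)
    text_norm = F.normalize(text_feats, dim=1)

    # Project each modality's features onto the other's unit directions,
    # then sharpen the resulting logits with the temperature.
    proj_i2t = image_feats @ text_norm.t() / temperature
    proj_t2i = text_feats @ image_norm.t() / temperature

    p_i2t = F.softmax(proj_i2t, dim=1)
    p_t2i = F.softmax(proj_t2i, dim=1)

    # KL divergence between predicted and true matching distributions,
    # symmetrized over both retrieval directions.
    loss_i2t = (p_i2t * (torch.log(p_i2t + eps) - torch.log(match + eps))).sum(1).mean()
    loss_t2i = (p_t2i * (torch.log(p_t2i + eps) - torch.log(match + eps))).sum(1).mean()
    return loss_i2t + loss_t2i
```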
Fund: Supported by the National Natural Science Foundation of China (Grant Nos. 61906168, 62202429), the Zhejiang Provincial Natural Science Foundation of China (Grant No. LY23F020023), and the Construction of Hubei Provincial Key Laboratory for Intelligent Visual Monitoring of Hydropower Projects (2022SDSJ01).
Abstract: Cross-modality pedestrian re-identification has important applications in the field of surveillance. Due to variations in posture, camera perspective, and camera modality, some salient pedestrian features fail to provide effective retrieval cues, so designing an effective strategy to extract more discriminative pedestrian details becomes a challenge. Although many effective methods for detailed feature extraction have been proposed, they still fall short in filtering out background and modality noise. To further purify the features, a pure detail feature extraction network (PDFENet) is proposed for VI-ReID. PDFENet comprises three modules: an adaptive detail mask generation module (ADMG), an inter-detail interaction module (IDI), and a cross-modality cross-entropy (CMCE). ADMG and IDI use human joints and their semantic associations to suppress background noise in the features, while CMCE guides the model to ignore modality noise by generating modality-shared feature labels. Specifically, ADMG generates masks for pedestrian details based on pose estimation; the masks suppress background information and enhance pedestrian detail information. IDI then mines the semantic relations among details to further refine the features. Finally, CMCE cross-combines classifiers and features to generate modality-shared feature labels that guide model training. Extensive ablation experiments and visualization results demonstrate the effectiveness of PDFENet in eliminating background and modality noise, and comparison experiments on two publicly available datasets show the competitiveness of our approach.
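As a rough illustration of the pose-driven masking idea behind ADMG, the sketch below turns pose-estimated joint locations into per-joint Gaussian masks over the backbone feature map. The Gaussian form, the sigma value, and the hypothetical pose_net are assumptions; the paper's exact mask construction is not reproduced here.

```python
import torch

def detail_masks(keypoints, heatmap_size=(24, 8), sigma=1.5):
    """Sketch of ADMG-style detail masks from pose-estimated joints.

    keypoints    : (J, 2) joint (x, y) coordinates, already scaled to the
                   feature-map grid by an off-the-shelf pose estimator.
    heatmap_size : (H, W) spatial size of the backbone feature map.
    sigma        : spread of each joint's Gaussian (assumed value).

    Returns a (J, H, W) tensor; multiplying the feature maps by these masks
    keeps responses near body joints and suppresses background.
    """
    H, W = heatmap_size
    ys = torch.arange(H).view(1, H, 1).float()
    xs = torch.arange(W).view(1, 1, W).float()
    jx = keypoints[:, 0].view(-1, 1, 1)
    jy = keypoints[:, 1].view(-1, 1, 1)
    dist2 = (xs - jx) ** 2 + (ys - jy) ** 2  # squared distance to each joint
    return torch.exp(-dist2 / (2 * sigma ** 2))

# Hypothetical usage: masks = detail_masks(pose_net(img)); each masked
# feature map feats * masks[j] yields one pedestrian-detail feature.
```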