Image classification based on bag-of-words(BOW)has a broad application prospect in pattern recognition field but the shortcomings such as single feature and low classification accuracy are apparent.To deal with this...Image classification based on bag-of-words(BOW)has a broad application prospect in pattern recognition field but the shortcomings such as single feature and low classification accuracy are apparent.To deal with this problem,this paper proposes to combine two ingredients:(i)Three features with functions of mutual complementation are adopted to describe the images,including pyramid histogram of words(PHOW),pyramid histogram of color(PHOC)and pyramid histogram of orientated gradients(PHOG).(ii)An adaptive feature-weight adjusted image categorization algorithm based on the SVM and the decision level fusion of multiple features are employed.Experiments are carried out on the Caltech101 database,which confirms the validity of the proposed approach.The experimental results show that the classification accuracy rate of the proposed method is improved by 7%-14%higher than that of the traditional BOW methods.With full utilization of global,local and spatial information,the algorithm is much more complete and flexible to describe the feature information of the image through the multi-feature fusion and the pyramid structure composed by image spatial multi-resolution decomposition.Significant improvements to the classification accuracy are achieved as the result.展开更多
that are duplicate or near duplicate to a query image.One of the most popular and practical methods in near-duplicate image retrieval is based on bag-of-words(BoW)model.However,the fundamental deficiency of current Bo...that are duplicate or near duplicate to a query image.One of the most popular and practical methods in near-duplicate image retrieval is based on bag-of-words(BoW)model.However,the fundamental deficiency of current BoW method is the gap between visual word and image’s semantic meaning.Similar problem also plagues existing text retrieval.A prevalent method against such issue in text retrieval is to eliminate text synonymy and polysemy and therefore improve the whole performance.Our proposed approach borrows ideas from text retrieval and tries to overcome these deficiencies of BoW model by treating the semantic gap problem as visual synonymy and polysemy issues.We use visual synonymy in a very general sense to describe the fact that there are many different visual words referring to the same visual meaning.By visual polysemy,we refer to the general fact that most visual words have more than one distinct meaning.To eliminate visual synonymy,we present an extended similarity function to implicitly extend query visual words.To eliminate visual polysemy,we use visual pattern and prove that the most efficient way of using visual pattern is merging visual word vector together with visual pattern vector and obtain the similarity score by cosine function.In addition,we observe that there is a high possibility that duplicates visual words occur in an adjacent area.Therefore,we modify traditional Apriori algorithm to mine quantitative pattern that can be defined as patterns containing duplicate items.Experiments prove quantitative patterns improving mean average precision(MAP)significantly.展开更多
目的随着图像检索所依赖的特征愈发精细化,在提高检索精度的同时,也不可避免地产生众多非相关和冗余的特征。针对在大规模图像检索和分类中高维度特征所带来的时间和空间挑战,从减少特征数量这一简单思路出发,提出了一种有效的连通图特...目的随着图像检索所依赖的特征愈发精细化,在提高检索精度的同时,也不可避免地产生众多非相关和冗余的特征。针对在大规模图像检索和分类中高维度特征所带来的时间和空间挑战,从减少特征数量这一简单思路出发,提出了一种有效的连通图特征点选择方法,探寻图像检索精度和特征选择间的平衡。方法基于词袋模型(bag of words,BOW)的图像检索机制,结合最近邻单词交叉核、特征距离和特征尺度等属性,构建包含若干个连通分支和平凡图的像素级特征分离图,利用子图特征点的逆文本频率修正边权值,从各连通分量的节点数量和孤立点最近邻单词相关性两个方面开展特征选择,将问题转化为在保证图像匹配精度情况下,最小化特征分离图的阶。结果实验采用Oxford和Paris公开数据集,在特征存储容量、时间复杂度集和检索精度等方面进行评估,并对不同特征抽取和选择方法进行了对比。实验结果表明选择后的特征数量和存储容量有效约简50%以上;100 k词典的KD-Tree查询时间减少近58%;相对于其他编码方法和全连接层特征,Oxford数据集检索精度平均提升近7.5%;Paris数据集中检索精度平均高于其他编码方法4%,但检索效果不如全连接层特征。大量实验表明了大连通域的冗余性和孤立点的可选择性。结论通过构建特征分离图,摒弃大连通域的冗余特征点,保留具有最近邻单词相关性的孤立特征点,最终形成图像的精简特征点集。整体检索效果稳定,其检索精度基本与原始特征点集持平,且部分类别效果优于原始特征和其他方法。同时,选择后特征的重用性好,方便进一步聚合集成。展开更多
针对传统BOW(Bag of Words)模型用于场景图像分类时的不足,通过引入关联规则的MFI(Maximum Frequent Itemsets)和Topology模型对其进行改进。为了突出同类图像的视觉单词,提取同类图像的MFI后,对其中频繁出现的视觉单词进行加权处理,增...针对传统BOW(Bag of Words)模型用于场景图像分类时的不足,通过引入关联规则的MFI(Maximum Frequent Itemsets)和Topology模型对其进行改进。为了突出同类图像的视觉单词,提取同类图像的MFI后,对其中频繁出现的视觉单词进行加权处理,增强同类图像的共有特征。同时,为了提高视觉词典的生成效率,利用Topology模型对原始模型进行分工并行处理。通过COREL和Caltech-256图像库的实验,证明改进后的模型提高了对场景图像的分类性能,并验证了其Topology模型的有效性和可行性。展开更多
基金Supported by Foundation for Innovative Research Groups of the National Natural Science Foundation of China(61321002)Projects of Major International(Regional)Jiont Research Program NSFC(61120106010)+1 种基金Beijing Education Committee Cooperation Building Foundation ProjectProgram for Changjiang Scholars and Innovative Research Team in University(IRT1208)
文摘Image classification based on bag-of-words(BOW)has a broad application prospect in pattern recognition field but the shortcomings such as single feature and low classification accuracy are apparent.To deal with this problem,this paper proposes to combine two ingredients:(i)Three features with functions of mutual complementation are adopted to describe the images,including pyramid histogram of words(PHOW),pyramid histogram of color(PHOC)and pyramid histogram of orientated gradients(PHOG).(ii)An adaptive feature-weight adjusted image categorization algorithm based on the SVM and the decision level fusion of multiple features are employed.Experiments are carried out on the Caltech101 database,which confirms the validity of the proposed approach.The experimental results show that the classification accuracy rate of the proposed method is improved by 7%-14%higher than that of the traditional BOW methods.With full utilization of global,local and spatial information,the algorithm is much more complete and flexible to describe the feature information of the image through the multi-feature fusion and the pyramid structure composed by image spatial multi-resolution decomposition.Significant improvements to the classification accuracy are achieved as the result.
文摘that are duplicate or near duplicate to a query image.One of the most popular and practical methods in near-duplicate image retrieval is based on bag-of-words(BoW)model.However,the fundamental deficiency of current BoW method is the gap between visual word and image’s semantic meaning.Similar problem also plagues existing text retrieval.A prevalent method against such issue in text retrieval is to eliminate text synonymy and polysemy and therefore improve the whole performance.Our proposed approach borrows ideas from text retrieval and tries to overcome these deficiencies of BoW model by treating the semantic gap problem as visual synonymy and polysemy issues.We use visual synonymy in a very general sense to describe the fact that there are many different visual words referring to the same visual meaning.By visual polysemy,we refer to the general fact that most visual words have more than one distinct meaning.To eliminate visual synonymy,we present an extended similarity function to implicitly extend query visual words.To eliminate visual polysemy,we use visual pattern and prove that the most efficient way of using visual pattern is merging visual word vector together with visual pattern vector and obtain the similarity score by cosine function.In addition,we observe that there is a high possibility that duplicates visual words occur in an adjacent area.Therefore,we modify traditional Apriori algorithm to mine quantitative pattern that can be defined as patterns containing duplicate items.Experiments prove quantitative patterns improving mean average precision(MAP)significantly.
文摘目的随着图像检索所依赖的特征愈发精细化,在提高检索精度的同时,也不可避免地产生众多非相关和冗余的特征。针对在大规模图像检索和分类中高维度特征所带来的时间和空间挑战,从减少特征数量这一简单思路出发,提出了一种有效的连通图特征点选择方法,探寻图像检索精度和特征选择间的平衡。方法基于词袋模型(bag of words,BOW)的图像检索机制,结合最近邻单词交叉核、特征距离和特征尺度等属性,构建包含若干个连通分支和平凡图的像素级特征分离图,利用子图特征点的逆文本频率修正边权值,从各连通分量的节点数量和孤立点最近邻单词相关性两个方面开展特征选择,将问题转化为在保证图像匹配精度情况下,最小化特征分离图的阶。结果实验采用Oxford和Paris公开数据集,在特征存储容量、时间复杂度集和检索精度等方面进行评估,并对不同特征抽取和选择方法进行了对比。实验结果表明选择后的特征数量和存储容量有效约简50%以上;100 k词典的KD-Tree查询时间减少近58%;相对于其他编码方法和全连接层特征,Oxford数据集检索精度平均提升近7.5%;Paris数据集中检索精度平均高于其他编码方法4%,但检索效果不如全连接层特征。大量实验表明了大连通域的冗余性和孤立点的可选择性。结论通过构建特征分离图,摒弃大连通域的冗余特征点,保留具有最近邻单词相关性的孤立特征点,最终形成图像的精简特征点集。整体检索效果稳定,其检索精度基本与原始特征点集持平,且部分类别效果优于原始特征和其他方法。同时,选择后特征的重用性好,方便进一步聚合集成。
文摘针对传统BOW(Bag of Words)模型用于场景图像分类时的不足,通过引入关联规则的MFI(Maximum Frequent Itemsets)和Topology模型对其进行改进。为了突出同类图像的视觉单词,提取同类图像的MFI后,对其中频繁出现的视觉单词进行加权处理,增强同类图像的共有特征。同时,为了提高视觉词典的生成效率,利用Topology模型对原始模型进行分工并行处理。通过COREL和Caltech-256图像库的实验,证明改进后的模型提高了对场景图像的分类性能,并验证了其Topology模型的有效性和可行性。