Abstract
For zero-shot multi-label image classification, a method based on keyword generation and label matching is proposed that predicts an image's keywords and label probabilities from the input image, without any additional training. The image keyword generation module uses a visual encoder and a text decoder to generate a semantic description of the image, then cleans the description and extracts the relevant keywords and their weights. The label matching module encodes the keywords and the labels to be predicted with a word embedding model, and combines the weights to compute the matching probability of the image for any given label, which yields the prediction. Experimental results on five public datasets show that the proposed method significantly improves the image classification performance and effectiveness of different baseline models.
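The label matching step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the scoring formula, so the weight-normalized, clipped cosine similarity used here is an assumption rather than the authors' method. The names `match_labels` and `toy_embed` and the 2-D toy embeddings are hypothetical stand-ins for a real word embedding model.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_labels(
    keywords: Dict[str, float],
    labels: List[str],
    embed: Callable[[str], Vector],
) -> Dict[str, float]:
    """Score each label against the weighted keywords of one image.

    Assumed scoring rule (not taken from the paper): the weight-normalized
    sum of cosine similarities, with negative similarities clipped to zero
    so that every score lies in [0, 1].
    """
    total_weight = sum(keywords.values())
    scores: Dict[str, float] = {}
    for label in labels:
        label_vec = embed(label)
        weighted = sum(
            weight * max(0.0, cosine(embed(word), label_vec))
            for word, weight in keywords.items()
        )
        scores[label] = weighted / total_weight if total_weight else 0.0
    return scores


# Hypothetical 2-D embeddings standing in for a real word embedding model.
TOY_EMBEDDINGS = {
    "dog": (1.0, 0.0),
    "puppy": (0.9, 0.1),
    "grass": (0.0, 1.0),
    "animal": (1.0, 0.2),
    "plant": (0.1, 1.0),
    "car": (-1.0, -1.0),
}


def toy_embed(word: str) -> Vector:
    return TOY_EMBEDDINGS[word]


# Keywords and weights as the keyword generation module might emit them.
keywords = {"dog": 0.5, "puppy": 0.3, "grass": 0.2}
scores = match_labels(keywords, ["animal", "plant", "car"], toy_embed)
for label, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{label}: {score:.3f}")
```

With these toy vectors, "animal" ranks above "plant", and "car" receives a score of zero because all of its similarities are negative and clipped. A multi-label prediction would then keep every label whose score exceeds a chosen threshold, which is also an assumption of this sketch.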
Authors
GAO Liwei (高立卫)
LÜ Xueqiang (吕学强)
MA Denghao (马登豪)
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science & Technology University, Beijing 100101, China
Source
Journal of Beijing Information Science and Technology University (Science and Technology Edition) (《北京信息科技大学学报(自然科学版)》)
2024, No. 6, pp. 9-16 (8 pages)
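Funding
National Natural Science Foundation of China (621710431)
Beijing Natural Science Foundation (4232025)
Qinghai Province Innovation Platform Construction Special Project (2022-ZJ-T02)
Beijing Municipal Education Commission Scientific Research Program, General Science and Technology Projects (KM202311232003, KM202311232002)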