Abstract
Scene recognition is a fundamental task in computer vision. Unlike image classification, scene recognition must jointly account for the global layout (background) information of a scene, local scene features, and object features, which is why classic convolutional neural networks underperform on it. To address this issue, this paper proposes a global and local scene representation method based on deep convolutional features. The method transforms the convolutional features of a scene image to generate a comprehensive representation for each image. Specifically, class activation maps (CAM) are used to discover local key regions, and a long short-term memory (LSTM) network encodes the convolutional features extracted from these regions to produce the local representation of the scene image. An attention mechanism fuses scene features and object features to form the global representation. Finally, evaluation experiments on the MIT Indoor 67 scene recognition dataset show that the proposed method reaches a test accuracy of 87.59%.
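The abstract names three components (CAM-based selection of local key regions, LSTM encoding of those regions, and attention-based fusion of scene and object features) without giving architectural details. The NumPy sketch below wires them together purely for illustration, under assumptions made here rather than taken from the paper: random arrays stand in for real CNN features and trained weights, the K = 8 highest-scoring CAM locations serve as the "local key regions", the LSTM's last hidden state is the local representation, a two-way softmax attention produces the global representation, and the two are concatenated. All function names (`class_activation_map`, `top_k_region_features`, `lstm_encode`, `attention_fuse`, `demo`) are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def class_activation_map(features, fc_weights, class_idx):
    """CAM: weight the last conv feature maps (C, H, W) by the linear
    classifier weights of one class and sum over channels -> (H, W)."""
    return np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))


def top_k_region_features(features, cam, k):
    """Return the C-dim feature vectors at the k highest CAM locations,
    ordered from most to least activated (ordering is an assumption) -> (k, C)."""
    order = np.argsort(cam.ravel())[::-1][:k]
    return features.reshape(features.shape[0], -1)[:, order].T


def lstm_encode(seq, W, U, b):
    """Single-layer LSTM over seq (T, C); returns the last hidden state.
    W: (4h, C), U: (4h, h), b: (4h,); gate order i, f, g, o is an assumption."""
    h = np.zeros(U.shape[1])
    c = np.zeros(U.shape[1])
    for x in seq:
        i, f, g, o = np.split(W @ x + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


def attention_fuse(scene_vec, object_vec, w_att):
    """Softmax attention over the scene and object vectors (both of dim D)."""
    stacked = np.stack([scene_vec, object_vec])  # (2, D)
    scores = np.tanh(stacked) @ w_att            # one score per source, (2,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                         # attention weights sum to 1
    return alpha @ stacked, alpha                # fused (D,), weights (2,)


def demo(seed=0, C=512, H=7, W=7, num_classes=67, hidden=128, D=256, K=8):
    rng = np.random.default_rng(seed)
    # Random stand-ins for real CNN outputs and trained weights (hypothetical).
    features = rng.standard_normal((C, H, W))
    fc = rng.standard_normal((num_classes, C)) * 0.01
    pred = int(np.argmax(fc @ features.mean(axis=(1, 2))))  # GAP + linear classifier

    # Local branch: CAM -> top-K regions -> LSTM.
    cam = class_activation_map(features, fc, pred)
    regions = top_k_region_features(features, cam, K)
    local = lstm_encode(
        regions,
        rng.standard_normal((4 * hidden, C)) * 0.05,
        rng.standard_normal((4 * hidden, hidden)) * 0.05,
        np.zeros(4 * hidden),
    )

    # Global branch: attention over scene and object feature vectors.
    scene_vec = rng.standard_normal(D)   # e.g. from a scene-centric CNN (assumed)
    object_vec = rng.standard_normal(D)  # e.g. from an object-centric CNN (assumed)
    global_rep, alpha = attention_fuse(scene_vec, object_vec, rng.standard_normal(D) * 0.1)

    # Final representation: concatenation of local and global parts (assumed).
    return np.concatenate([local, global_rep]), alpha, regions, cam
```

With the default sizes, `demo()` returns a 384-dimensional vector (128 from the LSTM plus 256 from the fused global features). Concatenation is only one plausible way to combine the two branches; the abstract does not say how the paper does it.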
Authors
LIN Chaowei (林潮威)
LI Feifei (李菲菲)
CHEN Qiu (陈虬)
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Source
Electronic Science and Technology (《电子科技》)
2022, No. 4, pp. 20-27 (8 pages)
Funding
Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (ES2015XX).
Keywords
scene recognition
convolutional neural networks
convolutional features
feature transform
class activation map (CAM)
long short-term memory (LSTM)
attention mechanism
end-to-end network