
A Review of the Cross-Modal Retrieval Model and Feature Extraction Based on Representation Learning (基于表示学习的跨模态检索模型与特征抽取研究综述) Cited by: 20
Abstract Representation learning, exemplified by deep learning, has attracted wide attention and application in speech recognition, image analysis, and natural language processing. It has not only advanced research and development in artificial intelligence but also prompted enterprises to consider new operating and profit models. This paper reviews this body of research to form a reasonably complete survey. Drawing on an investigation of the relevant domestic and international literature, it assesses research on cross-modal retrieval and feature extraction based on representation learning along two dimensions: information extraction and representation, and cross-modal system modeling. The paper first outlines five classic representation learning algorithms: the autoencoder, sparse coding, the restricted Boltzmann machine, deep belief networks, and convolutional neural networks. It then summarizes the state of research on cross-modal system modeling from three angles: associating the modalities through a shared layer, associating the modalities within the representation space, and deep-learning-based cross-modal modeling algorithms. Finally, it summarizes the evaluation metrics for cross-modal retrieval.
The study finds that existing retrieval research concentrates on single-modal information retrieval, in which the query and the candidate set belong to the same modality, and that cross-modal retrieval has been limited to aligned corpora of two modalities, images and text. Future work needs to extend retrieval to multimodal data such as speech, video, images, and text, and to improve deep learning algorithms for building multimodal retrieval models, so that retrieval across three or more modalities becomes possible. In addition, evaluation metrics suited to multimodal retrieval systems still need to be established.
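To make the retrieval setting in the abstract concrete, the following is a minimal sketch, not the authors' method. It assumes that text and image items have already been mapped into one shared representation space, ranks image candidates against a text query by cosine similarity, and scores the ranking with average precision. The abstract does not name specific evaluation metrics, so average precision is an assumption here, chosen because it is commonly used for ranked retrieval. The vectors, the identifiers `img_a`, `img_b`, `img_c`, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(query_vec, candidates):
    """Return candidate ids sorted by descending similarity to the query."""
    scored = [(cosine(query_vec, vec), cid) for cid, vec in candidates.items()]
    return [cid for _, cid in sorted(scored, key=lambda s: s[0], reverse=True)]


def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list, given the set of relevant ids."""
    hits, total = 0, 0.0
    for rank, cid in enumerate(ranked_ids, start=1):
        if cid in relevant_ids:
            hits += 1
            total += hits / rank  # precision at this relevant position
    return total / len(relevant_ids) if relevant_ids else 0.0


# Hypothetical toy data: a text query and three image candidates,
# all assumed to be projected into the same 3-dimensional space.
text_query = [1.0, 0.0, 0.0]
image_embeddings = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.0, 1.0, 0.0],
    "img_c": [0.7, 0.0, 0.7],
}

ranking = rank_candidates(text_query, image_embeddings)
ap = average_precision(ranking, {"img_a", "img_c"})
```

In this sketch the query and the candidates come from different modalities, which is what distinguishes the setting from single-modal retrieval; averaging `average_precision` over many queries would give a mean average precision figure.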
Authors Li Zhiyi (李志义); Huang Zifeng (黄子风); Xu Xiaomian (许晓绵) (Economic & Management College of South China Normal University, Guangzhou 510006)
Source Journal of the China Society for Scientific and Technical Information (《情报学报》), indexed in CSSCI, CSCD, and the Peking University Core Journals list; 2018, No. 4, pp. 422-435 (14 pages)
Funding National Social Science Fund of China project "Research on Cross-Modal Retrieval Models and Feature Extraction Based on Representation Learning" (17BTQ062)
Keywords representation learning; cross-modal retrieval; feature extraction; model; review
