Abstract
Based on statistical theory, a generic method for multi-granularity video semantic analysis is proposed, unifying multi-level semantic analysis and multi-modal information fusion. To represent temporal content, a key-frame selection strategy with temporal semantic context constraints and an attention selection model are first presented. After basic visual semantics are recognized, a multi-level visual semantic analysis framework is applied to extract visual semantics. Hidden Markov models (HMMs) and Bayesian decision are then applied to audio semantic understanding. Finally, a bionic multimodal fusion scheme with a two-level structure fuses the semantic information. Experimental results demonstrate that the proposed method effectively fuses multimodal features and extracts video semantics at different granularities.
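The audio stage pairs HMM likelihood scoring with a Bayesian (MAP) decision rule. A minimal sketch of that pattern, assuming discrete-symbol HMMs (the two toy models, their parameters, and the class labels below are illustrative assumptions, not the paper's trained acoustic models):

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log P(obs | HMM) for a discrete-symbol HMM via the scaled
    forward algorithm (scaling avoids numerical underflow)."""
    alpha = pi * B[:, obs[0]]          # initial forward variables
    scale = alpha.sum()
    log_lik = np.log(scale)
    alpha = alpha / scale
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate, then weight by emission
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik

def bayes_decide(obs, models, priors):
    """MAP rule: pick the class maximizing log P(obs|class) + log P(class)."""
    score = {c: forward_log_likelihood(obs, *m) + np.log(priors[c])
             for c, m in models.items()}
    return max(score, key=score.get)

# Two illustrative 2-state HMMs over a binary symbol alphabet {0, 1}:
# the "applause" class mostly emits symbol 0, the "speech" class symbol 1.
models = {
    "applause": (np.array([0.5, 0.5]),
                 np.array([[0.9, 0.1], [0.1, 0.9]]),
                 np.array([[0.8, 0.2], [0.8, 0.2]])),
    "speech":   (np.array([0.5, 0.5]),
                 np.array([[0.9, 0.1], [0.1, 0.9]]),
                 np.array([[0.2, 0.8], [0.2, 0.8]])),
}
priors = {"applause": 0.5, "speech": 0.5}

decision = bayes_decide([0, 0, 0, 1, 0, 0], models, priors)
print(decision)  # the mostly-0 sequence is classified as "applause"
```

In the paper's setting, each audio semantic class would have its own trained HMM, and the decision rule selects the class whose posterior is maximal.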
Source
Journal of Computer-Aided Design & Computer Graphics (《计算机辅助设计与图形学学报》)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
2008, No. 1, pp. 85-92 (8 pages)
Funding
National Natural Science Foundation of China (60273035)
Youth Fund of the Sichuan Provincial Education Department (2006B063)
Development Fund of Chengdu University of Information Technology (KYTZ20060904)