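Compared with information processing and communication technologies based on bit data, the way humans process and convey information through semantics is more efficient and natural when agents must exchange and process massive amounts of information. However, because there is currently no mathematical description for measuring and characterizing semantics, applications involving semantics cannot achieve both interpretability and generalization, and so cannot exploit the efficiency and naturalness of semantics. This paper focuses on the measurement and characterization of semantics. First, drawing on findings from information science and neuroscience, it discusses what semantics means and points out that semantics is modular, multimodal, and hierarchical. Next, it proposes a mathematical description for characterizing and measuring the semantics of multimodal signals. Then, to verify the feasibility and effectiveness of the proposed characterization and measurement, experiments are conducted on two applications, MNIST (Modified National Institute of Standards and Technology database) handwritten digit recognition and underwater acoustic target recognition, achieving better performance than conventional deep learning. Finally, semantics is applied to video coding, achieving a compression ratio far exceeding that of conventional methods and demonstrating the practical value of semantics in communications. This lays a theoretical and practical foundation for future semantics-based information processing and communication technologies.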
To make artificial intelligence more responsive to user emotions, this project identifies the current type of human emotion through computer vision, semantic recognition, and audio feature classification. For facial expression recognition, deep-learning-based methods suffer from large parameter counts and poor real-time performance. To address this, Wang Weimin and Tang Yang Z. et al. proposed a facial expression recognition method based on multilayer feature fusion with lightweight convolutional networks, which uses an improved inverted residual network as the basic unit to build a lightweight convolutional network model. Building on this method, this experiment optimizes the traditional CNN MobileNet model and constructs a new model framework, ms_model_M, which has about 5% of the parameters of the traditional CNN MobileNet model. Tested on two commonly used real-world expression datasets, FER-2013 and AffectNet, ms_model_M achieves accuracies of 74.35% and 56.67%, respectively, compared with 74.11% and 56.48% for the traditional MobileNet model. The network structure thus balances recognition accuracy and recognition speed well. For semantic emotion detection and audio emotion detection, this experiment uses existing models and APIs.
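The abstract does not describe the layers of ms_model_M, so the following is only a rough sketch of why MobileNet-style units tend to be light on parameters. It counts the weights of a dense convolution, a depthwise separable convolution, and a standard inverted residual unit (1 x 1 expansion, depthwise convolution, 1 x 1 projection). The function names, the channel widths, and the expansion factor of 6 are illustrative assumptions, not values from the paper, and the sketch does not attempt to reproduce the reported 5% figure.

```python
# Hypothetical helpers for illustration only; biases and batch-norm
# parameters are ignored, and the "improved" unit from the paper is not modeled.

def standard_conv_params(c_in, c_out, k=3):
    """Weights in a dense k x k convolution."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out


def inverted_residual_params(c_in, c_out, expansion=6, k=3):
    """Weights in a 1 x 1 expansion, k x k depthwise conv, and 1 x 1 projection."""
    hidden = c_in * expansion  # assumed expansion factor, not from the abstract
    return c_in * hidden + k * k * hidden + hidden * c_out


if __name__ == "__main__":
    c_in, c_out = 64, 64  # assumed channel widths for the comparison
    dense = standard_conv_params(c_in, c_out)
    separable = depthwise_separable_params(c_in, c_out)
    print("dense 3x3 conv:       ", dense)
    print("depthwise separable:  ", separable)
    print("separable / dense:    ", round(separable / dense, 3))
    print("inverted residual unit:", inverted_residual_params(c_in, c_out))
```

Under these assumptions the depthwise separable layer needs roughly one eighth of the weights of the dense layer at the same width, which is the kind of saving that lightweight expression-recognition networks build on. The inverted residual unit spends most of its weights on the two 1 x 1 convolutions, so its total depends strongly on the chosen expansion factor.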