Image-based relocalization in outdoor environments has attracted renewed interest because it is an important problem with many applications. PoseNet was the first to use a Convolutional Neural Network (CNN) to estimate the camera pose in real time from a single image. To address the limited precision and robustness of PoseNet and its improved variants in complex environments, this paper proposes and implements a new visual relocalization method based on deep convolutional neural networks (VNLSTM-PoseNet). First, the method resizes the input image directly, without cropping, to increase the receptive field of the training images. Then, the images and their corresponding pose labels are fed into an improved Long Short-Term Memory based (LSTM-based) PoseNet network for training, and the network is optimized with the Nadam optimizer. Finally, the trained network is used to localize images and obtain the camera pose. Experimental results on public outdoor datasets show that VNLSTM-PoseNet yields drastic improvements in relocalization performance compared with existing state-of-the-art CNN-based methods.
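The abstract names Nadam as the optimizer but gives no hyperparameters. The following is a minimal plain-Python sketch of one common simplified form of the Nadam update (Adam with Nesterov-style momentum, constant β1, no momentum-decay schedule) for a single scalar parameter. The function names, default hyperparameters and the toy quadratic objective are illustrative assumptions, not the paper's settings; framework implementations such as PyTorch's `torch.optim.NAdam`, which adds a momentum-decay schedule, can differ in detail.

```python
import math

def nadam_step(theta, grad, m, v, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One simplified Nadam update for a scalar parameter.

    Assumption: constant beta1 (no momentum-decay schedule).
    t is the 1-based step count. Returns the updated (theta, m, v).
    """
    # Exponential moving averages of the gradient and squared gradient (as in Adam).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias-corrected estimates.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Nesterov look-ahead: mix the corrected momentum with the current gradient.
    nesterov = beta1 * m_hat + (1 - beta1) * grad / (1 - beta1 ** t)
    theta = theta - lr * nesterov / (math.sqrt(v_hat) + eps)
    return theta, m, v

def minimize_quadratic(target=3.0, steps=1000, lr=0.05):
    """Toy usage: minimise f(x) = (x - target)^2 starting from x = 0."""
    x, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * (x - target)  # f'(x)
        x, m, v = nadam_step(x, grad, m, v, t, lr=lr)
    return x

if __name__ == "__main__":
    print(minimize_quadratic())  # should print a value close to 3.0
```

In a pose-regression network the same update would be applied element-wise to every weight, with the gradient coming from the pose loss rather than a toy quadratic.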
Hand gestures are a natural means of human-robot interaction. Vision-based dynamic hand gesture recognition has become a hot research topic due to its various applications. This paper presents a novel deep learning network for hand gesture recognition. The network integrates several well-proven modules to learn both short-term and long-term features from video inputs while avoiding intensive computation. To learn short-term features, each video input is segmented into a fixed number of frame groups. A frame is randomly selected from each group and represented both as an RGB image and as an optical flow snapshot. These two representations are fused and fed into a convolutional neural network (ConvNet) for feature extraction. The ConvNets for all groups share parameters. To learn long-term features, the outputs of all ConvNets are fed into a long short-term memory (LSTM) network, which predicts the final classification result. The new model has been tested on two popular hand gesture datasets, namely the Jester dataset and the Nvidia dataset. Compared with other models, our model produced very competitive results. The robustness of the new model has also been demonstrated on an augmented dataset with enhanced diversity of hand gestures.
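The short-term sampling step can be illustrated with a small, self-contained sketch: split a clip's frame indices into a fixed number of contiguous groups and draw one random index per group. The function name `sample_frame_indices` and the equal-width grouping rule are assumptions for illustration; the abstract does not specify how group boundaries are chosen, and the RGB/optical-flow fusion, shared-weight ConvNet and LSTM stages are not shown.

```python
import random

def sample_frame_indices(num_frames, num_groups, rng=None):
    """Pick one random frame index from each of `num_groups` contiguous groups.

    Assumption: groups have (near-)equal width; requires num_frames >= num_groups.
    """
    if num_groups <= 0 or num_frames < num_groups:
        raise ValueError("need num_frames >= num_groups > 0")
    rng = rng or random.Random()
    indices = []
    for k in range(num_groups):
        start = k * num_frames // num_groups        # first frame of group k
        end = (k + 1) * num_frames // num_groups    # one past the last frame of group k
        indices.append(rng.randrange(start, end))   # one random frame per group
    return indices

if __name__ == "__main__":
    # e.g. a 37-frame clip split into 8 groups; each selected index would then be
    # turned into an RGB image and an optical-flow snapshot for the shared ConvNet.
    print(sample_frame_indices(37, 8, random.Random(0)))
```

Because every group contributes exactly one frame, the selected indices are ordered and span the whole clip, which gives the LSTM a temporally ordered sequence; re-sampling on each pass would also expose the network to different frames of the same video.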
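Current action recognition algorithms based on 3D ConvNets generally use global average pooling (GAP) to compress feature information, but this causes information loss, information redundancy and network overfitting. To address these problems and better preserve the high-level semantic information extracted by the convolutional layers, an action recognition algorithm based on global frequency domain pooling (GFDP) is proposed. First, the discrete cosine transform (DCT) shows that GAP is a special case of feature decomposition in the frequency domain; more frequency components are therefore introduced to increase the specificity between feature channels and reduce the redundancy that remains after compression. Second, to better suppress overfitting, the batch normalization strategy used in convolutional layers is introduced and extended to the fully connected layers of an action recognition model with ERB (efficient residual block)-Res3D as the backbone, in order to optimize the data distribution. Finally, the method is validated on the UCF101 dataset. The results show that the model has a computational cost of 3.5 GFLOPs and 7.4 M parameters, and its final recognition accuracy is 3.9% higher than that of the ERB-Res3D model and 17.4% higher than that of the original Res3D model, efficiently achieving more accurate action recognition.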
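The claim that GAP is a special case of frequency-domain pooling can be checked numerically. For a T×H×W feature map x, the unnormalised 3D DCT-II coefficient at frequency (p, q, r) is F(p, q, r) = Σ_{t,h,w} x[t][h][w] · cos(πp(t + 1/2)/T) · cos(πq(h + 1/2)/H) · cos(πr(w + 1/2)/W). At (0, 0, 0) every cosine equals 1, so F(0, 0, 0) = Σ x = T·H·W · GAP(x). The plain-Python sketch below computes this coefficient and pools each channel at its own assigned frequency. The function names and the channel-to-frequency assignment (`freqs`) are hypothetical; the abstract does not state how GFDP selects or assigns frequency components.

```python
import math

def dct3_coefficient(x, freq):
    """Unnormalised 3D DCT-II coefficient of x (nested lists, T x H x W) at freq=(p, q, r)."""
    p, q, r = freq
    T, H, W = len(x), len(x[0]), len(x[0][0])
    total = 0.0
    for t in range(T):
        ct = math.cos(math.pi * p * (t + 0.5) / T)
        for h in range(H):
            ch = math.cos(math.pi * q * (h + 0.5) / H)
            for w in range(W):
                cw = math.cos(math.pi * r * (w + 0.5) / W)
                total += x[t][h][w] * ct * ch * cw
    return total

def global_average_pool(x):
    """Plain GAP over a T x H x W feature map."""
    T, H, W = len(x), len(x[0]), len(x[0][0])
    return sum(v for plane in x for row in plane for v in row) / (T * H * W)

def frequency_pool(channels, freqs):
    """Compress each channel to one scalar using its own (hypothetical) frequency.

    channels: list of C feature maps (each T x H x W); freqs: list of C (p, q, r) tuples.
    If every freq is (0, 0, 0), this is GAP scaled by T*H*W.
    """
    return [dct3_coefficient(x, f) for x, f in zip(channels, freqs)]

if __name__ == "__main__":
    # Toy 2 x 3 x 4 map with values t + 2h + 3w.
    x = [[[t + 2 * h + 3 * w for w in range(4)] for h in range(3)] for t in range(2)]
    print(dct3_coefficient(x, (0, 0, 0)), 24 * global_average_pool(x))
    print(frequency_pool([x, x], [(0, 0, 0), (0, 0, 1)]))
```

GAP only ever sees the (0, 0, 0) component; assigning other frequencies to some channels is how the pooled vector can carry information that a plain average discards.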
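Loop closure detection is an important component of simultaneous localization and mapping (SLAM). It can effectively reduce the accumulated error in a SLAM system, and if tracking is lost during localization and mapping, loop closure detection can also be used for relocalization. Compared with traditional hand-crafted features, image features learned by neural networks offer better invariance to environmental changes and better semantic recognition. Since landmark-based convolutional features can overcome the sensitivity of whole-image features to viewpoint changes, this paper proposes a new loop closure detection algorithm. It first uses the convolutional layers of a convolutional neural network to directly identify regions of interest in an image and generate landmarks, then extracts convolutional features for each identified landmark, producing the final image representation used to detect loop closures. To verify the effectiveness of the algorithm, comparative experiments were conducted on typical datasets. The results show that the proposed algorithm performs excellently and remains highly robust even under extreme viewpoint and appearance changes.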
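The abstract does not say how the landmark descriptors of two images are compared. As one plausible illustration (an assumption, not the paper's stated method), the sketch below scores an image pair by mutual nearest-neighbour matching of landmark features under cosine similarity and reports a loop closure when the best-scoring earlier keyframe reaches a threshold. The names `image_similarity` and `detect_loop`, the scoring rule and the 0.8 threshold are hypothetical; descriptors are plain lists of floats standing in for the per-landmark convolutional features.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)

def image_similarity(landmarks_a, landmarks_b):
    """Score two images by mutual nearest-neighbour matching of landmark descriptors.

    Matched similarities are summed and divided by the larger landmark count,
    so landmarks without a mutual match lower the score.
    """
    if not landmarks_a or not landmarks_b:
        return 0.0
    sim = [[cosine(a, b) for b in landmarks_b] for a in landmarks_a]
    best_for_a = [max(range(len(landmarks_b)), key=lambda j: sim[i][j])
                  for i in range(len(landmarks_a))]
    best_for_b = [max(range(len(landmarks_a)), key=lambda i: sim[i][j])
                  for j in range(len(landmarks_b))]
    # Keep only pairs that are each other's nearest neighbour.
    matched = [sim[i][j] for i, j in enumerate(best_for_a) if best_for_b[j] == i]
    return sum(matched) / max(len(landmarks_a), len(landmarks_b))

def detect_loop(query_landmarks, keyframes, threshold=0.8):
    """Return (index, score) of the best-matching earlier keyframe, or None.

    Assumption: the caller has already excluded temporally adjacent keyframes.
    """
    best = None
    for idx, kf in enumerate(keyframes):
        score = image_similarity(query_landmarks, kf)
        if score >= threshold and (best is None or score > best[1]):
            best = (idx, score)
    return best
```

Normalising by the larger landmark count is a deliberate choice in this sketch: an image that shares only one of many landmarks with the query scores low, which is one simple way to reduce false loop closures.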
Funding: This work is supported by the National Key R&D Program of China [grant number 2018YFB0505400], the National Natural Science Foundation of China (NSFC) [grant number 41901407], the LIESMARS Special Research Funding [grant number 2021], and the College Students' Innovative Entrepreneurial Training Plan Program [grant number S2020634016].