Funding: Supported by the National Natural Science Foundation of China (61533019, U1811463), the Open Fund of the State Key Laboratory for Management and Control of Complex Systems, Institute of Automation, Chinese Academy of Sciences (Y6S9011F51), and in part by the EPSRC Project (EP/N025849/1).
Abstract: Eye center localization is one of the most crucial and basic requirements for human-computer interaction applications such as eye gaze estimation and eye tracking. A large body of work has addressed this topic in recent years, but accuracy still needs to be improved because of appearance challenges such as the high variability of shapes, lighting conditions, viewing angles, and possible occlusions. To address these problems and limitations, we propose a novel approach to eye center localization based on a fully convolutional network (FCN), an end-to-end, pixels-to-pixels network that can locate the eye center accurately. The key idea is to apply the FCN from the object semantic segmentation task to the eye center localization task, since eye center localization can be regarded as a special semantic segmentation problem. We adapt a contemporary FCN into a shallow structure with a large-kernel convolutional block and transfer its performance from semantic segmentation to eye center localization by fine-tuning. Extensive experiments show that the proposed method outperforms state-of-the-art methods in both accuracy and reliability of eye center localization. The proposed method achieves a large performance improvement on the most challenging database and thus provides a promising solution for some challenging applications.
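To make the "localization as segmentation" idea concrete, the following is a minimal sketch of how a single eye-center coordinate could be read off a per-pixel score map such as an FCN might produce. The abstract does not describe this decoding step, so the function name `eye_center_from_scores`, the `threshold` parameter, and the choice of a score-weighted centroid are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Tuple


def eye_center_from_scores(
    scores: List[List[float]], threshold: float = 0.5
) -> Tuple[float, float]:
    """Hypothetical decoder: map a per-pixel score map to one (row, col) point.

    Pixels with score >= threshold are treated as the "eye center" class and
    their score-weighted centroid is returned. If no pixel passes the
    threshold, the position of the highest-scoring pixel is returned instead.
    """
    total = 0.0
    row_sum = 0.0
    col_sum = 0.0
    best_pos = (0, 0)
    best_score = float("-inf")

    for r, row in enumerate(scores):
        for c, s in enumerate(row):
            # Track the argmax as a fallback.
            if s > best_score:
                best_score = s
                best_pos = (r, c)
            # Accumulate the weighted centroid over confident pixels.
            if s >= threshold:
                total += s
                row_sum += s * r
                col_sum += s * c

    if total == 0.0:
        return (float(best_pos[0]), float(best_pos[1]))
    return (row_sum / total, col_sum / total)


if __name__ == "__main__":
    # Toy 3x4 score map with two equally confident neighbouring pixels.
    toy = [
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.5, 0.5, 0.0],
        [0.0, 0.0, 0.0, 0.0],
    ]
    print(eye_center_from_scores(toy))
```

A centroid gives sub-pixel estimates when several neighbouring pixels are confident, while the argmax fallback keeps the function defined when the network is uncertain everywhere; other decoding rules are equally plausible.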
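Abstract: Objective: Scene text detection is an important task in scene understanding and text recognition. Although deep-learning-based algorithms have significantly improved detection accuracy, existing methods are weak at extracting both the local semantics within text and the global semantics among text instances, so they fail to model the multi-level semantics of text and their detection accuracy remains unsatisfactory. To address this problem, a scene text detection algorithm based on hierarchical semantic fusion is proposed. Methods: The method comprises a text-segment-based local semantic understanding module and a text-instance-based global semantic understanding module, which guide the network to attend to multi-level semantic information within text and among text instances, respectively. First, the local semantic understanding module divides text into multiple segments according to relative position and, under the supervision of a fine-grained optimization objective, strengthens the network's perception of local semantics. Then, the global semantic understanding module uses the coarse text-segment segmentation result to filter out background regions and extract reliable text-region features, and applies an attention mechanism to adaptively capture the global semantic information of arbitrarily shaped text and produce the final segmentation result. In addition, to reduce the interference of prediction noise in boundary regions with the aggregation of hierarchical semantic information, a boundary-aware loss function is proposed to reduce the ambiguity of boundary-region features. Results: The algorithm was evaluated on three commonly used scene text detection datasets and compared with other algorithms, and it achieved a significant performance improvement. On the Total-Text dataset, the F-measure is 87.0%, 1.0% higher than other models; on the MSRA-TD500 (MSRA text detection 500 database) dataset, the F-measure is 88.2%, 1.0% higher than other models; on the ICDAR 2015 (International Conference on Document Analysis and Recognition) dataset, the F-measure is 87.0%. Conclusion: By separately building semantic context at different levels and adding an extra penalty on ambiguous features, the proposed model resolves the insufficient extraction of hierarchical semantics and achieves higher detection accuracy.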
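The results above are reported as F-measure values. As a reminder of what that number summarizes, here is a minimal sketch of the standard F-measure (harmonic mean of precision and recall) computed from detection counts. The function name `f_measure` and its parameters are hypothetical, and the rule for deciding which detections count as correct (typically an overlap criterion specific to each benchmark) is not given in the abstract and is assumed to have been applied beforehand.

```python
def f_measure(true_positives: int, num_detections: int, num_ground_truth: int) -> float:
    """Harmonic mean of precision and recall from raw detection counts.

    true_positives:   detections already matched to a ground-truth text instance
                      (the matching rule is benchmark-specific and assumed done).
    num_detections:   total number of predicted text instances.
    num_ground_truth: total number of annotated text instances.
    """
    if num_detections == 0 or num_ground_truth == 0:
        return 0.0

    precision = true_positives / num_detections
    recall = true_positives / num_ground_truth

    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative counts only; these are not figures from the paper.
    print(f"{f_measure(9, 10, 12):.4f}")
```

Because the harmonic mean is dominated by the smaller of the two terms, a detector cannot reach a high F-measure by trading away recall for precision or the reverse.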