Video text detection is a challenging problem: video backgrounds are generally complex, and subtitles often suffer from color bleeding, fuzzy boundaries, and low contrast caused by lossy video compression and low resolution. In this paper, we propose a robust framework to address these problems. First, we use a gradient amplitude map (GAM) to enhance the edges of the input image, which counteracts color bleeding and fuzzy boundaries. Second, we develop a two-direction morphological filter that suppresses background noise and increases the contrast between text and background. Third, we apply maximally stable extremal regions (MSER) to detect text regions in the two extreme colors, and obtain optimal segmentations with graph cuts, using the mean intensities of the regions as the label set and the Euclidean distance over the three channels of the HSI color space as the smoothness term. Finally, we group the segmented regions into text lines using the geometric characteristics of text, and then eliminate non-text regions with corner detection, multi-frame verification, and several heuristic rules. We test our scheme on several challenging videos, and the results show that our text detection framework is more robust than previous methods.
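The edge-enhancement step can be illustrated with a minimal sketch of a gradient amplitude map, i.e. the per-pixel gradient magnitude. The abstract does not name the gradient operator, so this assumes a 3x3 Sobel operator applied to a grayscale image stored as nested Python lists, with replicated borders; the function name `gradient_amplitude_map` is hypothetical and this is not the authors' implementation.

```python
import math

# Assumed operator: 3x3 Sobel kernels (the abstract does not specify one)
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def gradient_amplitude_map(gray):
    """Return sqrt(gx^2 + gy^2) for every pixel of a 2-D grayscale list.

    Hypothetical helper for illustration; borders replicate the nearest pixel.
    """
    h, w = len(gray), len(gray[0])

    def px(y, x):
        # Clamp coordinates so out-of-range reads reuse the edge pixel
        y = min(max(y, 0), h - 1)
        x = min(max(x, 0), w - 1)
        return gray[y][x]

    gam = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gy = 0
            # Correlate the 3x3 neighbourhood with both Sobel kernels
            for dy in range(3):
                for dx in range(3):
                    v = px(y + dy - 1, x + dx - 1)
                    gx += SOBEL_X[dy][dx] * v
                    gy += SOBEL_Y[dy][dx] * v
            gam[y][x] = math.hypot(gx, gy)
    return gam
```

For a vertical step edge, `gradient_amplitude_map([[0, 0, 255, 255]] * 4)` gives `[0.0, 1020.0, 1020.0, 0.0]` in every row: the response is large only at the boundary, which is why a gradient map makes blurred subtitle edges stand out against flat regions.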
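Natural scene text appears against complex backgrounds, which makes text hard to locate and poses a major challenge for text detection. We therefore propose an improved YOLO (You Only Look Once) model, based on object detection, for detecting multi-oriented text regions in natural scenes. On a collected multi-oriented Uyghur text dataset of 2,500 training images and 500 test images, an improved K-means algorithm generates three preset anchors of fixed width; the model classifies text regions and regresses the positions of multiple vertical rectangular prediction boxes, from which multi-oriented text detection boxes are generated. We examine different ways of connecting and fusing the predicted text boxes in order to detect multi-oriented Uyghur text while reducing the redundant background along the diagonal. In experiments on the test set, the model achieves an accuracy of 77%. The results show that the improved YOLO v3 model is robust and practical for multi-oriented Uyghur scene text region detection.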
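Anchor generation with K-means can be sketched as follows. The abstract does not say how the authors improved K-means, so this shows only the standard YOLO-style clustering with distance d = 1 - IoU, adapted on the assumption that all anchors share one fixed width. Under that assumption the IoU of two centre-aligned boxes reduces to min(h1, h2) / max(h1, h2), so only box heights need clustering. The width value `FIXED_WIDTH = 16`, the quantile initialization, and all function names are hypothetical.

```python
FIXED_WIDTH = 16  # hypothetical anchor width in pixels; not given in the abstract


def iou_same_width(h1, h2):
    """IoU of two centre-aligned boxes that share the same width.

    Intersection = w * min(h1, h2) and union = w * max(h1, h2), so w cancels.
    """
    return min(h1, h2) / max(h1, h2)


def kmeans_anchor_heights(heights, k=3, iters=100):
    """Cluster ground-truth box heights into k anchor heights with d = 1 - IoU."""
    sorted_h = sorted(heights)
    n = len(sorted_h)
    # Deterministic init (an assumption): evenly spaced quantiles of the heights
    centers = [sorted_h[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for h in heights:
            # Smallest 1 - IoU distance is the same as largest IoU
            best = max(range(k), key=lambda i: iou_same_width(h, centers[i]))
            clusters[best].append(h)
        # Move each centre to the mean height of its cluster (keep it if empty)
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)


def make_anchors(heights, k=3):
    """Pair each clustered height with the fixed width to form (w, h) anchors."""
    return [(FIXED_WIDTH, h) for h in kmeans_anchor_heights(heights, k)]
```

For example, `make_anchors([10, 11, 12, 30, 31, 32, 80, 82, 84])` returns `[(16, 11), (16, 31), (16, 82)]`. The quantile initialization is used here only to keep the sketch deterministic; random initialization can settle in a poorer local optimum on small inputs.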