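Abstract: To address the heavy computation and long running time of traditional image matching algorithms, a fast image feature point matching algorithm based on SURF (speeded up robust features) is proposed. First, feature points are extracted from the image with the SURF algorithm. Next, the dominant orientation and the descriptor of each feature point are determined through the Haar wavelet transform, and the feature points are matched with best bin first (BBF), an optimized nearest-neighbor search algorithm. Finally, the n matching pairs with the highest similarity are selected, according to practical needs, for comparison experiments. Experimental results show that the algorithm is robust, fast, and accurate in matching, and has considerable application value.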
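The matching and top-n selection steps can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes descriptors have already been extracted, it uses exhaustive nearest-neighbor search in place of BBF (which approximates the same search over a k-d tree), and the function name `top_n_matches` and the toy descriptors are hypothetical.

```python
import math

def top_n_matches(desc_a, desc_b, n):
    """Return the n closest (index_a, index_b, distance) pairs.

    Each descriptor in desc_a is paired with its nearest neighbor in desc_b
    by Euclidean distance (brute force, standing in for BBF), then the pairs
    are ranked so that the most similar ones come first.
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Exhaustive nearest-neighbor search over the second image's descriptors.
        j, dist = min(
            ((k, math.dist(da, db)) for k, db in enumerate(desc_b)),
            key=lambda pair: pair[1],
        )
        matches.append((i, j, dist))
    matches.sort(key=lambda m: m[2])  # smaller distance = higher similarity
    return matches[:n]

# Toy 2-D "descriptors" (assumption); standard SURF descriptors have 64 dimensions.
image_a = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_b = [(0.9, 1.0), (0.0, 0.5), (9.0, 9.0)]
best = top_n_matches(image_a, image_b, n=2)
```

Ranking by distance and truncating to n keeps only the most reliable correspondences, which is the stated purpose of the final selection step.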
Abstract: Long-duration visual tracking of targets is challenging for computer vision because environments may be cluttered and distracting. Illumination variations and partial occlusions are the two main difficulties in real-world visual tracking. Existing methods based on holistic appearance information cannot solve these problems effectively. This paper proposes a feature-based dynamic tracking approach that can track objects under partial occlusion and varying illumination. The method represents the tracked object by an invariant feature model. During tracking, a new pyramid matching algorithm matches the object template with the observations to determine the observation likelihood. This matching is computationally efficient and also embeds the spatial constraints among the features. Instead of relying on complicated optimization methods, the whole model is incorporated into a Bayesian filtering framework. Experiments on real-world sequences demonstrate that the method tracks objects accurately and robustly even with illumination variations and partial occlusions.
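The role of the observation likelihood in a Bayesian filter can be illustrated with a single update step over a small set of candidate object positions. This is a minimal sketch under stated assumptions: the likelihood values are hard-coded, hypothetical numbers standing in for a matching score, and neither the paper's pyramid matching nor its motion model is reproduced.

```python
def bayes_update(prior, likelihood):
    """Posterior over candidate states: p(x | z) is proportional to p(z | x) * p(x)."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("all candidates have zero likelihood")
    return [u / total for u in unnormalized]

# Four candidate positions with a uniform prior (hypothetical values).
prior = [0.25, 0.25, 0.25, 0.25]
# Observation likelihoods, e.g. derived from template-to-observation matching.
likelihood = [0.1, 0.6, 0.2, 0.1]

posterior = bayes_update(prior, likelihood)
# Index of the most probable candidate after the update.
estimate = max(range(len(posterior)), key=posterior.__getitem__)
```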
Funding: the National High-Tech Research and Development Program of China (No. 2001AA114071)
Abstract: This work demonstrates the use of the nonlinear time-frequency distribution (NLTFD) of a discrete time energy operator (DTEO), based on amplitude modulation-frequency modulation demodulation techniques, as a feature in speech recognition. The duration-distribution-based hidden Markov model in a speaker-independent, large-vocabulary Mandarin speech recognition system was reconstructed from the feature vectors in the front-end detection stage. The goal was to improve the performance of the existing system by adding new features to the baseline feature vector. This paper also deals with errors associated with using a pre-emphasis filter in the front-end processing of the present scheme, which increases the noise energy at high frequencies above 4 kHz and in some cases degrades recognition accuracy. The experimental results show that eliminating the pre-emphasis filters from the pre-processing stage and using NLTFD with compensated DTEO combined with Mel frequency cepstrum coefficients give a 21.95% reduction in the relative error rate compared to the conventional technique, with 25 candidates used in the test.
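The abstract does not define the energy operator. Assuming it is the standard Teager-Kaiser form, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), a minimal sketch looks like this. For a pure tone A·cos(Ωn + φ) the operator returns the constant A²·sin²(Ω), which is what makes it useful for AM-FM demodulation. The compensation step and the NLTFD mentioned above are not reproduced here.

```python
import math

def teager_energy(x):
    """Discrete-time (Teager-Kaiser) energy operator: x[n]^2 - x[n-1]*x[n+1].

    The output is two samples shorter than the input because the first and
    last samples have no neighbor on one side.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Test tone (hypothetical values); omega is in radians per sample.
amplitude, omega = 2.0, 0.3
signal = [amplitude * math.cos(omega * n) for n in range(50)]

energy = teager_energy(signal)
# Closed-form value for a pure tone: A^2 * sin^2(omega).
expected = (amplitude * math.sin(omega)) ** 2
```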
Funding: supported by the National Science Foundation of China (61603040, 61973036, 61433003).
Abstract: Automatic video mosaicking is a challenging task in computer vision. Current research considers either panoramic or mapping tasks on short videos. In this paper, an automatic mosaicking algorithm based on adapted key frames is proposed for both mapping and panoramic tasks on videos of any length. The speeded up robust features (SURF) and grid-based motion statistics (GMS) algorithms are used for feature extraction and matching between consecutive frames, and the matches are used to compute the transformation. To reduce the influence of accumulated error during image stitching, an evaluation metric is put forward for the transformation matrix. In addition, a self-growth method is employed to stitch the global image for long videos. The algorithm is evaluated on aerial-view and panoramic videos on a graphics processing unit (GPU) device, where it satisfies the real-time requirement. The experimental results demonstrate that the proposed algorithm achieves better performance than the state of the art.
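Why error accumulates during stitching can be seen by chaining frame-to-frame transformations into a common reference frame. The sketch below assumes each pairwise transformation is a 3x3 homography mapping frame k+1 into frame k. The helper names and the pure-translation example are hypothetical, and the paper's evaluation metric and key-frame selection are not reproduced.

```python
def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply_h(h, point):
    """Apply homography h to an (x, y) point using homogeneous coordinates."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)

def chain_to_reference(pairwise):
    """pairwise[k] maps frame k+1 into frame k; return maps of every frame into frame 0."""
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    to_ref = [identity]
    for h in pairwise:
        # H(0 <- k+1) = H(0 <- k) * H(k <- k+1)
        to_ref.append(matmul3(to_ref[-1], h))
    return to_ref

def translation(tx, ty):
    """Homography for a pure shift, used here as a toy inter-frame motion."""
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

pairwise = [translation(10.0, 0.0), translation(10.0, 2.0), translation(9.0, -1.0)]
to_ref = chain_to_reference(pairwise)
corner_in_mosaic = apply_h(to_ref[3], (0.0, 0.0))  # origin of frame 3 in frame 0
```

Because every later frame inherits the product of all earlier pairwise estimates, a small error in one estimate is carried into the placement of every subsequent frame.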
Funding: The National Natural Science Foundation of China (No. 61231002, 61273266, 51075068, 60872073, 60975017, 61003131); the Ph.D. Programs Foundation of the Ministry of Education of China (No. 20110092130004); the Science Foundation for Young Talents in the Educational Committee of Anhui Province (No. 2010SQRL018); the 211 Project of Anhui University (No. 2009QN027B)
Abstract: A machine-learning-based speech enhancement method is proposed to improve the intelligibility of whispered speech. A binary mask estimated by a two-class support vector machine (SVM) classifier is used to synthesize the enhanced whisper. A novel noise-robust feature called Gammatone feature cosine coefficients (GFCCs), extracted by an auditory periphery model, is derived and used for the binary mask estimation. The intelligibility performance of the proposed method is evaluated and compared with that of traditional speech enhancement methods. Objective and subjective evaluation results indicate that the proposed method can effectively improve the intelligibility of whispered speech contaminated by noise. Compared with the power subtract algorithm and the log-MMSE algorithm, neither of which improves intelligibility in lower signal-to-noise ratio (SNR) environments, the proposed method performs well in improving the intelligibility of noisy whisper. Additionally, the enhanced whispered speech produced by the proposed method is more intelligible than the corresponding unprocessed noisy whispered speech.
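How a binary mask acts on a time-frequency representation can be sketched as follows. This is an illustration under stated assumptions: the mask is produced by thresholding a known local SNR, which stands in for the SVM classifier with GFCC features. The function names, the 0 dB criterion, and the numbers are all hypothetical.

```python
def threshold_mask(local_snr_db, criterion_db=0.0):
    """Stand-in for the classifier: label a unit 1 if its local SNR exceeds the criterion."""
    return [[1 if snr > criterion_db else 0 for snr in row] for row in local_snr_db]

def apply_binary_mask(spectrogram, mask):
    """Keep time-frequency units where the mask is 1 and zero out the rest."""
    return [[value if keep else 0.0 for value, keep in zip(row, mask_row)]
            for row, mask_row in zip(spectrogram, mask)]

# Rows are frequency channels, columns are time frames (toy magnitudes).
noisy = [[0.8, 0.1, 0.5],
         [0.2, 0.9, 0.05]]
# Local SNR per unit in dB; in practice this is unknown and must be estimated.
local_snr_db = [[6.0, -12.0, 1.5],
                [-3.0, 10.0, -20.0]]

mask = threshold_mask(local_snr_db)
enhanced = apply_binary_mask(noisy, mask)
```

Units dominated by noise are discarded entirely rather than attenuated, so the quality of the result depends on how accurately the mask is estimated.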
Abstract: Devices of many kinds now generate vast amounts of digital video. In recent years, people have been found forging videos to present them as evidence in courts of justice. Many forensic detection methods have been presented, but their accuracy is limited. This paper proposes a novel technique for detecting forgery in the image frames of videos using an enhanced Convolutional Neural Network (CNN). In the initial stage, an input video is taken from the dataset and converted into image frames. Next, pre-sampling is performed with the Adaptive Rood Pattern Search (ARPS) algorithm to discard useless frames. The image frames are then preprocessed to enhance them, and faces are detected in each image with the Viola-Jones algorithm. Finally, the improved Crow Search Algorithm (ICSA) is used to select among the extracted features, which are fed to the Enhanced Convolutional Neural Network (ECNN) classifier to detect forged image frames. In experiments comparing it with other existing methods, the proposed system achieved 97.21% accuracy.
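Abstract: Objective: Insulator detection is an important part of the intelligent inspection and maintenance of power transmission lines, yet in most cases only insulator samples of a single type are available. When a model trained on a single insulator type is applied directly to other types, its detection performance drops sharply because of the domain shift between the training data and the target data. Improving the model's generalization ability so that detection performance is maintained is therefore essential. To this end, a novel unsupervised domain-adaptive insulator detection algorithm with an adversarial consistency constraint is proposed. Method: Two different classifiers are designed, one for source-domain samples and one for target-domain samples, and the network's predictions are constrained to the class of the corresponding insulator, so that the model can extract features unique to each insulator type. In addition, an extra classifier is introduced during adversarial learning to assign the insulator features from the source domain and the features of objects predicted in the target domain to the same class, so that the model can extract robust features shared by different insulator types. Result: Experiments show that the proposed method significantly improves the model's cross-domain detection performance. On the glass→composite and composite→glass tasks, the mean average precision (mAP) reaches 55.1% and 23.4%, respectively, outperforming mainstream unsupervised domain-adaptive object detection methods. Results on the public COCO (common objects in context) dataset are also strong, with an mAP of 61.5%. In the ablation study, the method improves on the baseline by 11.5% and 6.4% on the glass→composite and composite→glass tasks, respectively, demonstrating its effectiveness. Conclusion: The proposed method reduces the domain shift caused by differences between insulator types, improves the model's generalization in cross-domain insulator detection, and raises the efficiency of insulator detection in transmission line inspection and maintenance. The generality experiments on COCO further show that the method also applies to detecting other kinds of objects, with strong performance.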