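Abstract: Using a self-designed feature pattern, this work studies a fast vision-guided algorithm for the autonomous shipboard landing of an unmanned helicopter. The bottleneck that limits real-time estimation of the pose parameters is analyzed, and an image processing method based on information partitioning is proposed. A selective coordinate transformation is introduced to obtain the ideal region for image processing. Coarse recognition of line direction is used to determine the parameter interval, which narrows the search space of the line Hough transform and reduces the computational load of low-level image processing. Tests on real images show that, for image frames of 768 × 576 pixels, target detection and pose parameter recognition take less than 30 ms, so the video stream is processed in real time. At an equivalent helicopter-to-ship distance of 10 m, the root mean square (RMS) error of the position parameters is within 2 cm, and the RMS error of the attitude parameters is less than 1.5°. The algorithm meets the real-time and practicality requirements of autonomous shipboard landing control.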
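The idea of narrowing the Hough search space can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `hough_lines_restricted`, its parameters, and the synthetic edge points are assumptions made for illustration, and the coarse line-direction estimate is simply taken as a given angle interval. The accumulator is built only for angles inside that interval, so the voting cost scales with the width of the interval rather than with the full 180° range.

```python
import numpy as np

def hough_lines_restricted(edge_points, theta_min_deg, theta_max_deg,
                           theta_step_deg=1.0, rho_step=1.0):
    """Line Hough transform that only votes for theta in a given interval.

    edge_points: (N, 2) array of (x, y) edge pixel coordinates.
    Lines are parameterized as rho = x*cos(theta) + y*sin(theta).
    Returns (rho, theta_deg, accumulator) for the strongest line.
    All names here are hypothetical, chosen for this sketch.
    """
    pts = np.asarray(edge_points, dtype=float)
    # Candidate angles restricted to the coarse direction estimate.
    thetas_deg = np.arange(theta_min_deg,
                           theta_max_deg + 0.5 * theta_step_deg,
                           theta_step_deg)
    thetas = np.deg2rad(thetas_deg)

    # rho for every (point, angle) pair, shape (N, T).
    rhos = pts[:, 0:1] * np.cos(thetas) + pts[:, 1:2] * np.sin(thetas)

    # Quantize rho into bins starting at the smallest observed value.
    rho_min = rhos.min()
    idx = np.round((rhos - rho_min) / rho_step).astype(int)
    n_rho = idx.max() + 1

    # Accumulate votes column by column (one column per angle).
    acc = np.zeros((n_rho, len(thetas)), dtype=int)
    for t in range(len(thetas)):
        acc[:, t] = np.bincount(idx[:, t], minlength=n_rho)

    # Strongest cell gives the dominant line within the interval.
    r, t = np.unravel_index(acc.argmax(), acc.shape)
    return rho_min + r * rho_step, thetas_deg[t], acc


# Synthetic example: 50 edge points on the vertical line x = 20,
# with the coarse direction estimate limiting theta to [-5, 5] degrees.
points = np.array([(20.0, float(y)) for y in range(50)])
rho, theta_deg, acc = hough_lines_restricted(points, -5.0, 5.0)
```

With an 11-angle interval the accumulator has 11 columns instead of the 180 needed for a full sweep at the same 1° resolution, which is the kind of saving the abstract attributes to the coarse direction step.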
Funding: Supported by Australian Research Council Projects (FL-170100117, DP-180103424, IH-180100002) and the National Natural Science Foundation of China (NSFC) (61806062).
Abstract: Despite rapid developments in visual image-based road detection, robustly identifying road areas in visual images remains challenging due to issues like illumination changes and blurry images. To this end, LiDAR sensor data can be incorporated to improve visual image-based road detection, because LiDAR data is less susceptible to visual noise. However, the main difficulty in introducing LiDAR information into visual image-based road detection is that LiDAR data and its extracted features do not share the same space as the visual data and visual features. Such gaps between spaces may limit the benefits of LiDAR information for road detection. To overcome this issue, we introduce a novel Progressive LiDAR Adaptation-aided Road Detection (PLARD) approach that adapts LiDAR information into visual image-based road detection and improves detection performance. In PLARD, progressive LiDAR adaptation consists of two successive modules: 1) data space adaptation, which transforms the LiDAR data to the visual data space to align with the perspective view by applying an altitude difference-based transformation; and 2) feature space adaptation, which adapts LiDAR features to visual features through a cascaded fusion structure. Comprehensive empirical studies on the well-known KITTI road detection benchmark demonstrate that PLARD takes advantage of both the visual and LiDAR information, achieving much more robust road detection even in challenging urban scenes. In particular, PLARD outperforms other state-of-the-art road detection models and is currently at the top of the publicly accessible benchmark leaderboard.
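As a rough illustration of what aligning LiDAR data with the perspective view involves, the sketch below projects 3D points into pixel coordinates with a standard pinhole camera model. It does not reproduce PLARD's altitude difference-based transformation, which the abstract does not specify; the function name `project_points`, the intrinsic matrix `K`, and the sample points are all assumptions, and the points are assumed to be already expressed in the camera frame.

```python
import numpy as np

def project_points(points_xyz, K):
    """Project (N, 3) camera-frame points to (M, 2) pixel coordinates.

    Points with non-positive depth (behind the camera) are dropped,
    so M <= N. K is a 3x3 pinhole intrinsic matrix.
    Hypothetical helper for illustration only.
    """
    pts = np.asarray(points_xyz, dtype=float)
    pts = pts[pts[:, 2] > 0]          # keep points in front of the camera
    uvw = pts @ np.asarray(K).T       # homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]   # divide by depth


# Assumed intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# One point 10 m ahead and 1 m to the right, one point behind the camera.
points = np.array([[1.0, 0.0, 10.0],
                   [0.0, 0.0, -5.0]])
pixels = project_points(points, K)
```

Once LiDAR measurements are indexed by pixel in this way, per-pixel quantities derived from them can be stacked with the image, which is the general setting in which a data space adaptation step operates.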
Abstract: Object detection remains one of the most fundamental and difficult problems in computer vision and image understanding. Deep neural network models and improved object representations have led to significant progress in object detection. This research examines in detail how object detection has changed in recent years in the deep learning era. We provide an overview of the literature on a range of cutting-edge object detection algorithms and the theoretical underpinnings of these techniques. Deep learning technologies are driving substantial innovation in the field. While Convolutional Neural Networks (CNNs) have laid a solid foundation, newer models such as You Only Look Once (YOLO) and Vision Transformers (ViTs) have expanded the possibilities further by providing high accuracy and fast detection in a variety of settings. Even with these developments, integrating CNNs, YOLO, and ViTs into a coherent framework still requires balancing computing demand, speed, and accuracy, especially in dynamic contexts. Real-time processing in applications like surveillance and autonomous driving calls for improvements that make use of each model type's advantages. The goal of this work is to provide an object detection system that maximizes detection speed and accuracy while reducing processing requirements by integrating YOLO, CNNs, and ViTs. The goals include improving real-time detection performance under changing weather and lighting conditions, and detecting small or partially occluded objects in crowded cities. We propose a hybrid architecture that uses a CNN for robust feature extraction, YOLO for rapid detection, and ViTs for global context capture via self-attention. The model is trained on an extensive dataset of urban settings using a training regimen that emphasizes flexible learning rates and data augmentation. Compared to solo YOLO, CNN, or ViTs models, the suggested model exhi