Abstract: To address the poor detection accuracy of current human action recognition algorithms caused by occlusion of the human torso, an action recognition algorithm based on weighted three-view motion history images coupled with temporal segmentation is proposed. First, to effectively describe the shape and spatial distribution of an action, motion history images (MHI) are extracted from the video sequence. A depth camera (Kinect) is then used to capture depth images and obtain the foreground silhouette of the human target. To recognize self-occlusion caused by body parts, the foreground silhouette is projected onto three viewing (3V) planes to form the 3V-MHI, which improves correct extraction of the action; the 3V-MHI is used to construct an MHI that records the observed motion trajectory, overcoming the information limitations of a single-view MHI. Next, temporal segmentation (TS) computes changes in motion energy and direction from adjacent 3V-MHIs to detect the start and end of each motion and output the segmented result. In addition, the gradient magnitude of each MHI is computed as the weight of the corresponding plane, yielding the weighted 3V-MHI. Finally, each extracted histogram motion template is compared with a pre-built database to complete action classification. Experiments show that the method effectively handles self-occlusion and achieves high accuracy and robustness in complex environments and under illumination changes.
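For readers unfamiliar with motion history images, the sketch below shows the standard MHI update rule (Bobick and Davis) together with a gradient-magnitude weighting of the three view planes. It is a minimal illustration only: the function names, the use of the mean gradient magnitude as each plane's weight, and the normalization step are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

def update_mhi(mhi, motion_mask, timestamp, duration):
    """Standard MHI update: pixels where motion is detected are stamped with
    the current timestamp; pixels whose last motion falls outside the
    retention window are cleared."""
    mhi = mhi.copy()
    mhi[motion_mask] = timestamp                              # refresh moving pixels
    mhi[~motion_mask & (mhi < timestamp - duration)] = 0      # forget stale motion
    return mhi

def weighted_three_view_mhi(front_mhi, side_mhi, top_mhi):
    """Hypothetical weighting step: each view's MHI is weighted by its mean
    gradient magnitude, standing in for the paper's gradient-based plane weights."""
    views = [front_mhi, side_mhi, top_mhi]
    weights = []
    for v in views:
        gy, gx = np.gradient(v.astype(np.float32))
        weights.append(np.hypot(gx, gy).mean())
    weights = np.array(weights) / (np.sum(weights) + 1e-8)    # normalize weights
    return [w * v for w, v in zip(weights, views)]
```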
Funding: National Science and Technology Council of Taiwan under Grant NSTC 112-2221-E-130-005.
Abstract: This research addresses the challenges of image detection in low-light environments, applying artificial intelligence techniques to machine vision and object recognition systems. The primary goal is to tackle the problem of recognizing objects at low brightness levels. In this study, the Intel RealSense Lidar Camera L515 is used to simultaneously capture color images and 16-bit depth images. Detection scenarios are categorized into normal-brightness and low-brightness situations. When the system determines that the environment has normal brightness, the images are recognized directly using deep learning methods. For low-brightness situations, three recognition methods are proposed. The first is the Segmentation with Depth image (SD) method, which segments the depth image, creates a mask from the segmented depth image, maps the mask onto the true-color (RGB) image to obtain a background-reduced RGB image, and recognizes the segmented image. The second is the HDV (hue, depth, value) method, which converts RGB images to HSV (hue, saturation, value) images and combines them with the depth image D to form HDV images for recognition. The third is the HSD (hue, saturation, depth) method, which similarly combines the converted HSV images with the depth image D to form HSD images for recognition. In the experiments, the average recognition rate in normal-brightness environments using standard image recognition methods is 91%. For low-brightness environments, the SD method, using original images for training and segmented images for recognition, achieves an average recognition rate of over 82%; the HDV method achieves over 70%; and the HSD method achieves over 84%. The HSD method enables a fast and convenient low-light object recognition system. These results can be applied to nighttime surveillance systems or nighttime road safety systems.
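The channel compositions described above can be sketched in a few lines of OpenCV. The snippet below is a minimal illustration under stated assumptions: the 8-bit rescaling of the 16-bit depth map, the choice of which HSV channel is replaced by depth, and the function names are all assumptions made for clarity, not the authors' exact pipeline.

```python
import cv2
import numpy as np

def make_sd(bgr_image, depth_mask):
    """SD step: apply a binary mask (uint8) derived from depth segmentation
    to the color image, producing a background-reduced RGB image."""
    return cv2.bitwise_and(bgr_image, bgr_image, mask=depth_mask)

def make_hdv(bgr_image, depth_16bit):
    """HDV composition: hue, depth, value (saturation channel replaced by depth)."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    depth_8bit = cv2.normalize(depth_16bit, None, 0, 255,
                               cv2.NORM_MINMAX).astype(np.uint8)
    hdv = hsv.copy()
    hdv[:, :, 1] = depth_8bit          # replace saturation (S) with depth (D)
    return hdv

def make_hsd(bgr_image, depth_16bit):
    """HSD composition: hue, saturation, depth (value channel replaced by depth)."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    depth_8bit = cv2.normalize(depth_16bit, None, 0, 255,
                               cv2.NORM_MINMAX).astype(np.uint8)
    hsd = hsv.copy()
    hsd[:, :, 2] = depth_8bit          # replace value (V) with depth (D)
    return hsd
```

The resulting three-channel HDV or HSD arrays can then be fed to the same deep learning classifier used for ordinary RGB images, which is what makes the approach convenient for low-light scenes.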
Abstract: Perception and manipulation tasks involving highly cluttered objects are increasingly in demand for robotic manipulators as a means of solving problems more efficiently in modern industrial environments. However, most available methods for such cluttered tasks perform poorly, mainly because they cannot adapt to changes in the environment and in the handled objects. Here, we propose a new, near real-time approach to suction-based grasp point estimation in highly cluttered environments that employs an affordance-based formulation. Compared with the state of the art, the proposed method makes two distinctive contributions. First, we use a modified deep neural network backbone for semantic segmentation to classify the pixels of the input red, green, blue and depth (RGBD) image, which is then used to produce an affordance map: a pixel-wise probability map representing the probability of a successful grasping action at each pixel. Second, we incorporate a high-speed semantic segmentation network into the system, which lowers the computational time of our solution. The approach requires no prior knowledge or models of the objects, since, unlike most current approaches, it removes the pose estimation and object recognition steps entirely and instead grasps first and recognizes later, making it object-agnostic. The system was designed for household objects but can easily be extended to any kind of object, provided a suitable dataset is used to train the models. Experimental results show the benefit of our approach, which achieves a precision of 88.83%, compared with the 83.4% precision of the current state of the art.
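As a rough illustration of how a pixel-wise affordance map can be turned into a grasp decision, the sketch below smooths the probability map and selects the highest-scoring pixel. The smoothing step and the SciPy-based implementation are assumptions added for clarity; they are not taken from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def select_grasp_point(affordance_map, sigma=2.0):
    """Pick a suction grasp candidate from a pixel-wise affordance map.
    Smoothing before the argmax (a common heuristic, assumed here) discourages
    isolated high-probability pixels in favor of broad, reliable regions."""
    smoothed = gaussian_filter(affordance_map.astype(np.float32), sigma=sigma)
    row, col = np.unravel_index(np.argmax(smoothed), smoothed.shape)
    return (int(row), int(col)), float(smoothed[row, col])
```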