Funding: supported by the General Program of the National Natural Science Foundation of China (Grant No. 61977029).
Abstract: Generating realistic synthetic video from text is a highly challenging task due to the multitude of issues involved, including digit deformation, noise interference between frames, blurred output, and the need for temporal coherence across frames. In this paper, we propose a novel approach for generating coherent videos of moving digits from textual input using a Deep Deconvolutional Generative Adversarial Network (DD-GAN). The DD-GAN comprises a Deep Deconvolutional Neural Network (DDNN) as the Generator (G) and a modified Deep Convolutional Neural Network (DCNN) as the Discriminator (D) to ensure temporal coherence between adjacent frames. The proposed approach involves several steps. First, the input text is fed into a Long Short-Term Memory (LSTM) based text encoder and then smoothed using Conditioning Augmentation (CA) to enhance the effectiveness of the Generator. Next, the DDNN generates video frames from the enhanced text embedding and random noise, while the modified DCNN acts as the Discriminator, distinguishing generated videos from real ones. We evaluate the quality of the generated videos using standard metrics, namely Inception Score (IS), Fréchet Inception Distance (FID), Fréchet Inception Distance for video (FID2vid), and the Generative Adversarial Metric (GAM), along with a human study rating realism, coherence, and relevance. Experiments on Single-Digit Bouncing MNIST GIFs (SBMG), Two-Digit Bouncing MNIST GIFs (TBMG), and a custom dataset of essential mathematics videos with related text show significant improvements in both the metrics and the human study, confirming the effectiveness of DD-GAN. This work also takes on the challenge of generating preschool math videos from text, handling complex structures, digits, and symbols, and demonstrates promising results for generating coherent videos from textual input.
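The Conditioning Augmentation step above smooths the text embedding by sampling a conditioning vector from a Gaussian whose parameters are predicted from the embedding (the reparameterization trick), rather than feeding the embedding to the generator directly. The sketch below illustrates this; the embedding size, latent size, and random projection matrices are illustrative assumptions, standing in for learned layers not detailed in the abstract.

```python
import numpy as np

def conditioning_augmentation(text_embedding, latent_dim, rng):
    """Sample a smoothed conditioning vector from a Gaussian whose mean and
    log-variance are projections of the text embedding (reparameterization).
    W_mu / W_logvar are random stand-ins for learned linear layers."""
    d = text_embedding.shape[0]
    W_mu = rng.standard_normal((latent_dim, d)) * 0.01
    W_logvar = rng.standard_normal((latent_dim, d)) * 0.01
    mu = W_mu @ text_embedding
    logvar = W_logvar @ text_embedding
    eps = rng.standard_normal(latent_dim)
    return mu + np.exp(0.5 * logvar) * eps  # c_hat = mu + sigma * eps

rng = np.random.default_rng(0)
text_embedding = rng.standard_normal(256)  # hypothetical LSTM sentence embedding
noise = rng.standard_normal(100)           # random noise z
c_hat = conditioning_augmentation(text_embedding, 128, rng)
generator_input = np.concatenate([c_hat, noise])  # fed to the DDNN generator
print(generator_input.shape)  # (228,)
```

Because a slightly different conditioning vector is drawn each time, the generator sees small perturbations around each text embedding, which encourages robustness to sparse text-video pairs.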
Abstract: This paper presents a new blind cross-polarization interference canceller (XPIC) and a new adaptive blind deconvolutional algorithm based on higher-order statistics (HOS) processing, which separates and equalizes the signals in real time. Simulation results demonstrate that the proposed adaptive blind algorithm outperforms conventional algorithms, featuring feasibility, stability, and a fast convergence rate.
Abstract: To make object detection more feasible on embedded and mobile devices, this work builds a lightweight SSD object-detection model, called Fast and Accurate SSD (FA-SSD), by combining the SSD (Single Shot MultiBox Detector) framework with lightweight neural networks. The method adopts the lightweight convolutional network ESPNet as the backbone and uses a deconvolution module to fuse deep and shallow feature information, with further lightweight processing to balance model size and detection accuracy. Experimental results show that, compared with the classic SSD algorithm, the method has fewer parameters and lower computational complexity: the parameter count is reduced by 47.3%, and the number of image frames processed per second is 3.7 times that of classic SSD. On the VOC2007 dataset, the mean average precision (mAP) reaches 73.6%, nearly identical to the classic algorithm, thus improving detection speed while preserving detection accuracy.
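The deconvolution module described above upsamples a low-resolution deep feature map via transposed convolution so it can be merged element-wise with a higher-resolution shallow map. A minimal 1-D numpy sketch of the mechanics, with assumed sizes and an all-ones kernel standing in for learned weights:

```python
import numpy as np

def transposed_conv1d(x, kernel, stride=2):
    """Naive 1-D transposed convolution: each input element scatters a
    scaled copy of the kernel into the stride-spaced output."""
    n, k = len(x), len(kernel)
    out = np.zeros((n - 1) * stride + k)
    for i, v in enumerate(x):
        out[i * stride : i * stride + k] += v * kernel
    return out

deep = np.array([1.0, 2.0, 3.0, 4.0])        # low-resolution deep feature
up = transposed_conv1d(deep, np.ones(3))     # upsampled: length (4-1)*2 + 3 = 9
shallow = np.arange(len(up), dtype=float)    # same-resolution shallow feature
fused = up + shallow                         # element-wise FPN-style fusion
print(len(up))  # 9
```

The overlapping kernel placements are what let a transposed convolution learn its own interpolation, instead of fixing one as nearest-neighbor or bilinear upsampling does.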
Funding: supported by the National Natural Science Foundation of China (61872231, 61701297) and the Major Program of the National Social Science Foundation of China (Grant No. 20&ZD130).
Abstract: Image denoising is often used as a preprocessing step in computer vision tasks and can help improve the accuracy of image processing models. Due to imperfections in imaging systems, transmission media, and recording equipment, digital images are often contaminated with various kinds of noise during their formation, which degrades visual quality and can even hinder recognition. Noise directly affects edge detection, feature extraction, pattern recognition, and similar processing, making this a bottleneck that is difficult to overcome by modifying the downstream model alone. Many traditional filtering methods perform poorly because they lack an optimal formulation and do not adapt to specific images, whereas deep learning opens up new possibilities for image denoising. In this paper, we propose a novel neural network for image denoising based on generative adversarial networks. Inspired by U-Net, our method employs a symmetrical encoder-decoder generator: the encoder extracts features with convolutional layers, while the decoder outputs the noise in the image through deconvolutional layers. Shortcuts are added between designated layers, which preserve image texture details and help prevent exploding gradients. In addition, to improve training stability, we add the Wasserstein distance to the loss function. We evaluate the model with the peak signal-to-noise ratio (PSNR), and the experimental results demonstrate its effectiveness; compared to state-of-the-art approaches, our method presents competitive performance.
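The PSNR metric used above is defined as 10·log10(MAX²/MSE), where MAX is the peak pixel value and MSE is the mean squared error between the clean and denoised images. A minimal sketch, assuming 8-bit images and a synthetic test pair:

```python
import numpy as np

def psnr(clean, denoised, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((clean.astype(float) - denoised.astype(float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

clean = np.full((8, 8), 100.0)
noisy = clean + 10.0  # uniform error of 10 gray levels -> MSE = 100
print(round(psnr(clean, noisy), 2))  # 10*log10(255^2/100) ≈ 28.13
```

Higher is better: every 10 dB gain corresponds to a tenfold reduction in mean squared error, which is why denoising papers report PSNR improvements of even a fraction of a dB.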
Funding: This work is funded by the National Natural Science Foundation of China (Grant Nos. 62171114 and 52222810) and the Fundamental Research Funds for the Central Universities (No. DUT22RC(3)099).
Abstract: Shield tunnel lining is prone to water leakage, which may further bring about corrosion and structural damage to the walls, potentially leading to dangerous accidents. To avoid tedious and inefficient manual inspection, many projects use artificial intelligence (AI) to detect cracks and water leakage. This paper introduces a novel deep-learning method for water leakage inspection in shield tunnel lining. Our model comprises a ConvNeXt-S backbone, a deconvolutional feature pyramid network (D-FPN), a spatial attention module (SPAM), and a detection head, and extracts representative features of leaking areas to aid the inspection process. To further improve the model's robustness, we use an inversed low-light enhancement method to convert normally illuminated images into low-light ones and add them to the training samples. Validation experiments achieve an average precision (AP) of 56.8%, outperforming previous work by a margin of 5.7%. Visualizations also support the method's practical effectiveness.
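The inversed low-light augmentation converts well-lit training images into low-light ones so the detector also learns from dim conditions. The abstract does not specify the transform, so the sketch below uses a common stand-in: gamma darkening with brightness scaling and additive sensor noise; all parameter values are illustrative assumptions.

```python
import numpy as np

def to_low_light(img, gamma=3.0, scale=0.5, noise_std=2.0, rng=None):
    """Synthesize a low-light sample from a normally illuminated image:
    gamma curve (gamma > 1 darkens), global brightness scaling, and noise.
    The transform is a hypothetical stand-in, not the paper's exact method."""
    rng = rng or np.random.default_rng(0)
    x = img.astype(float) / 255.0
    x = np.power(x, gamma) * scale                       # darken
    x = x * 255.0 + rng.normal(0.0, noise_std, img.shape)  # add sensor noise
    return np.clip(x, 0, 255).astype(np.uint8)

bright = np.full((4, 4), 200, dtype=np.uint8)
dark = to_low_light(bright)
print(dark.mean() < bright.mean())  # True: the augmented image is darker
```

Training on pairs of original and darkened images is a cheap way to widen the illumination distribution without collecting new tunnel footage.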
Abstract: Purpose: To improve the liver auto-segmentation performance of three-dimensional (3D) U-Net by replacing the conventional up-sampling convolution layers with the Pixel De-convolutional Network (PDN), which considers spatial features. Methods: The U-Net was originally developed to segment neuronal structures with outstanding performance, but it suffers serious artifacts from unrelated adjacent pixels in its up-sampling layers. The hypothesis of this study was that the segmentation quality of the liver could be improved with PDN, in which the up-sampling layer is replaced by a pixel de-convolution layer (PDL). Seventy-eight plans of abdominal cancer patients were anonymized and exported. Sixty-two were chosen for training two networks, 1) 3D U-Net and 2) 3D PDN, by minimizing the Dice loss function; the other sixteen plans were used to test performance. The Dice similarity and Average Hausdorff Distance (AHD) were calculated and compared between the two networks. Results: The computation time for 62 training cases and 200 training epochs was about 30 minutes for both networks. The segmentation performance was evaluated using the remaining 16 cases. For the Dice score, the mean ± standard deviation were 0.857 ± 0.011 and 0.858 ± 0.015 for the PDN and U-Net, respectively. For the AHD, the mean ± standard deviation were 1.575 ± 0.373 and 1.675 ± 0.769, respectively, corresponding to an improvement of 6.0% and 51.5% in mean and standard deviation.
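The Dice score reported above measures volume overlap between predicted and ground-truth masks, Dice = 2|A∩B| / (|A| + |B|); the Dice loss minimized during training is simply 1 − Dice. A minimal sketch on binary masks (the masks are illustrative, not patient data):

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """Dice coefficient for binary masks: 2|A ∩ B| / (|A| + |B|).
    eps guards against division by zero when both masks are empty."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

pred = np.zeros((4, 4), dtype=int)
target = np.zeros((4, 4), dtype=int)
pred[1:3, 1:3] = 1    # 4 predicted voxels
target[1:4, 1:3] = 1  # 6 ground-truth voxels, 4 of which overlap
print(round(dice_score(pred, target), 3))  # 2*4/(4+6) = 0.8
```

Dice is insensitive to how far stray voxels sit from the organ boundary, which is why the study also reports the Average Hausdorff Distance: two segmentations with near-identical Dice (0.857 vs. 0.858 here) can still differ meaningfully in boundary error.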