Funding: Supported by the National Natural Science Foundation of China (No. 61975015) and the Research and Innovation Project for Graduate Students at Zhongyuan University of Technology (No. YKY2024ZK14).
Abstract: Human posture estimation is a prominent research topic in human-computer interaction, motion recognition, and other intelligent applications. However, the high key point localization accuracy that these applications require is at odds with the low detection accuracy of human posture detection models in practical scenarios. To address this issue, a human pose estimation network called AT-HRNet is proposed, which combines convolutional self-attention and cross-dimensional feature transformation. AT-HRNet adaptively captures significant feature information from various regions and aggregates it through convolutional operations within the local receptive field. The residual structures TripNeck and TripBlock of the high-resolution network are designed to further refine the key point locations, with the attention weights adjusted by cross-dimensional interaction to obtain more features. To validate the effectiveness of the network, AT-HRNet was evaluated on the COCO2017 dataset. The results show that AT-HRNet outperforms HRNet, improving mAP by 3.2%, AP75 by 4.0%, and AP^M by 3.9%. This suggests that AT-HRNet can offer a more effective solution for human posture estimation.
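The abstract reports mAP, AP75, and AP^M on COCO2017 without saying how they are computed. COCO keypoint AP is built on object keypoint similarity (OKS), and AP75 counts a predicted pose as correct when its OKS is at least 0.75. The following is a minimal sketch of OKS for a single pose pair, not the evaluation code used for AT-HRNet. The function name, the argument layout, and the example values are assumptions made for illustration, and the per-keypoint constants are supplied by the caller.

```python
import math


def oks(pred, gt, visible, area, sigmas):
    """Object keypoint similarity between one predicted and one ground-truth pose.

    pred, gt : lists of (x, y) keypoint coordinates in pixels
    visible  : per-keypoint flags; values > 0 mean the keypoint is labeled
    area     : ground-truth object area in pixels^2 (assumed > 0)
    sigmas   : per-keypoint falloff constants (hypothetical values below)
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2   # squared pixel distance
        k2 = (2.0 * sigma) ** 2                # per-keypoint tolerance
        total += math.exp(-d2 / (2.0 * area * k2))
        count += 1
    return total / count if count else 0.0


# Illustrative inputs only; they are not taken from the paper.
gt_pose = [(10.0, 10.0), (20.0, 20.0)]
shifted = [(13.0, 14.0), (20.0, 20.0)]
flags = [2, 2]
example_sigmas = [0.05, 0.05]

perfect_score = oks(gt_pose, gt_pose, flags, 1000.0, example_sigmas)
shifted_score = oks(shifted, gt_pose, flags, 1000.0, example_sigmas)
```

A perfect prediction scores 1, and the score decays toward 0 as keypoints drift from their labels, more slowly for larger objects.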
Funding: This work was supported by the National Natural Science Foundation of China (Nos. 51921006 and 52008138) and the Heilongjiang Touyan Innovation Team Program (No. AUEA5640200320).
Abstract: Generative adversarial networks (GANs) are unsupervised generative models that learn data distributions through adversarial training. However, recent experiments indicate that GANs are difficult to train because of the need to optimize in a high-dimensional parameter space and because of the zero gradient problem. In this work, we propose a self-sparse generative adversarial network (Self-Sparse GAN) that reduces the parameter space and alleviates the zero gradient problem. In the Self-Sparse GAN, we design a self-adaptive sparse transform module (SASTM) comprising sparsity decomposition and feature-map recombination, which can be applied to multi-channel feature maps to obtain sparse feature maps. The key idea of Self-Sparse GAN is to add the SASTM after every deconvolution layer in the generator, which adaptively reduces the parameter space by exploiting the sparsity in multi-channel feature maps. We theoretically prove that the SASTM not only reduces the search space of the convolution kernel weights of the generator but also alleviates the zero gradient problem by maintaining meaningful features in the batch normalization layer and driving the weights of the deconvolution layers away from being negative. The experimental results show that our method achieves better Fréchet inception distance (FID) scores for image generation than Wasserstein GAN with gradient penalty (WGAN-GP) on the MNIST, Fashion-MNIST, CIFAR-10, STL-10, mini-ImageNet, CELEBA-HQ, and LSUN bedrooms datasets, with a relative decrease in FID of 4.76% to 21.84%. An architectural sketch dataset (Sketch) is also used to validate the superiority of the proposed method.
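The "relative decrease of FID" quoted above is presumably the drop in FID expressed as a fraction of the baseline (WGAN-GP) score. The abstract does not give the formula, so the sketch below is one plausible reading. The function name and the example scores are hypothetical and are not taken from the paper.

```python
def relative_decrease(baseline_fid, new_fid):
    """Relative FID decrease in percent; lower FID is better."""
    if baseline_fid <= 0:
        raise ValueError("baseline FID must be positive")
    return 100.0 * (baseline_fid - new_fid) / baseline_fid


# Hypothetical scores for illustration only.
print(round(relative_decrease(20.0, 18.0), 2))  # prints 10.0
```

Under this reading, a reported 4.76% decrease means the new FID is 95.24% of the baseline FID on that dataset.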