Funding: Supported in part by collaborative research with Toyota Motor Corporation, in part by ROIS NII Open Collaborative Research under Grant 21S0601, and in part by JSPS KAKENHI under Grants 20H00592 and 21H03424.
Abstract: With the rapid development of artificial intelligence and the widespread use of the Internet of Things, semantic communication has been attracting great interest as an emerging communication paradigm. Taking image transmission as an example, from the semantic communication perspective, not all pixels in an image are equally important to a given receiver. Existing semantic communication systems perform semantic encoding and decoding directly on the whole image and cannot identify the region of interest. In this paper, we propose a novel semantic communication system for image transmission that distinguishes between Regions Of Interest (ROI) and Regions Of Non-Interest (RONI) based on semantic segmentation: a semantic segmentation algorithm classifies each pixel of the image and separates ROI from RONI. The system then enables high-quality transmission of the ROI with lower communication overhead by routing the two region types through different semantic communication networks with different bandwidth requirements. An improved metric, θPSNR, is proposed to evaluate the transmission accuracy of the proposed semantic transmission network. Experimental results show that our system achieves a significant performance improvement over existing semantic communication approaches and the conventional approach without semantics.
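The abstract above does not spell out how θPSNR is computed. Below is a minimal sketch of one plausible definition, assuming θ simply weights the ROI reconstruction error against the RONI error inside an otherwise standard PSNR; the function name `theta_psnr` and the weighting rule are assumptions, not the paper's formula.

```python
import numpy as np

def theta_psnr(ref, rec, roi_mask, theta=0.8, max_val=255.0):
    """Hypothetical ROI-weighted PSNR: theta weights the ROI mean squared
    error, (1 - theta) the RONI error. The paper's actual definition of
    θPSNR may differ; this only illustrates the idea of region weighting."""
    roi = roi_mask.astype(bool)
    mse_roi = np.mean((ref[roi] - rec[roi]) ** 2) if roi.any() else 0.0
    mse_roni = np.mean((ref[~roi] - rec[~roi]) ** 2) if (~roi).any() else 0.0
    mse = theta * mse_roi + (1.0 - theta) * mse_roni
    if mse == 0:
        return float("inf")  # identical images: error-free transmission
    return 10.0 * np.log10(max_val ** 2 / mse)
```

With theta near 1, errors in the RONI barely affect the score, matching the intent of transmitting the ROI at higher quality.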
Funding: Supported by the National Natural Science Foundation of China (No. 61971062) and the BUPT Excellent Ph.D. Students Foundation (CX2022153).
Abstract: Video transmission requires considerable bandwidth, and current widely employed schemes prove inadequate when confronted with scenes in which talking heads feature prominently. Motivated by strides in talking-head generative technology, this paper introduces a semantic transmission system tailored for talking-head videos. The system captures semantic information from the talking-head video and faithfully reconstructs the source video at the receiver; only a one-shot reference frame and compact semantic features are required for the entire transmission. Specifically, we analyze video semantics in the pixel domain frame by frame and jointly process multi-frame semantic information to seamlessly incorporate spatial and temporal information. Variational modeling is utilized to evaluate the diversity of importance among semantic groups, thereby guiding bandwidth allocation for semantics and enhancing system efficiency. The whole end-to-end system is modeled as an optimization problem equivalent to attaining optimal rate-distortion performance. We evaluate our system on both reference-frame and video transmission; experimental results demonstrate that it improves the efficiency and robustness of communications. Compared with classical approaches, our system saves over 90% of bandwidth at comparable user-perceived quality.
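The variational importance model described above is learned end to end. As a rough illustration of importance-guided bandwidth allocation only, the toy rule below shares a fixed symbol budget across semantic groups in proportion to softmax-normalized importance scores; the function and the fixed proportional schedule are assumptions, not the paper's method.

```python
import numpy as np

def allocate_bandwidth(importance, total_symbols):
    """Toy proportional allocator: softmax-normalized importance scores
    decide each semantic group's share of the symbol budget. Any rounding
    remainder is given to the most important group."""
    w = np.exp(importance - np.max(importance))  # stable softmax weights
    w /= w.sum()
    alloc = np.floor(w * total_symbols).astype(int)
    alloc[np.argmax(w)] += total_symbols - alloc.sum()
    return alloc
```

More important semantic groups receive more channel symbols, which is the qualitative behavior the variational model is trained to produce.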
Funding: Supported by the National Key Research and Development Project under Grant 2020YFB1807602, the Key Program of Marine Economy Development Special Foundation of the Department of Natural Resources of Guangdong Province (GDNRC[2023]24), and the National Natural Science Foundation of China under Grant 62271267.
Abstract: Recently, there have been significant advancements in the study of semantic communication in single-modal scenarios. However, the ability to process information in multi-modal environments remains limited. Inspired by the research and applications of natural language processing across different modalities, our goal is to accurately extract frame-level semantic information from videos and ultimately transmit high-quality video. Specifically, we propose a deep learning-based Multi-Modal Mutual Enhancement Video Semantic Communication system, called M3E-VSC. Built upon a Vector-Quantized Generative Adversarial Network (VQGAN), our system leverages mutual enhancement among different modalities by using text as the main carrier of transmission. Semantic information is extracted from the key-frame images and audio of the video, and a differencing step ensures that the extracted text conveys accurate semantic information with fewer bits, thus improving the capacity of the system. Furthermore, a multi-frame semantic detection module is designed to facilitate semantic transitions during video generation. Simulation results demonstrate that our proposed model maintains high robustness in complex noise environments, particularly under low signal-to-noise ratio conditions, improving the accuracy and speed of semantic transmission in video communication by approximately 50 percent.
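M3E-VSC builds on a VQGAN, whose defining bottleneck step is replacing each latent vector with its nearest codebook entry so that only integer indices need to be transmitted. The numpy sketch below shows that quantization step in isolation; the codebook contents and the paper's training procedure are not described in the abstract, so this is illustrative only.

```python
import numpy as np

def vq_quantize(z, codebook):
    """Nearest-neighbour vector quantization as used in a VQGAN bottleneck.
    z: (N, D) latent vectors; codebook: (K, D) learned code vectors.
    Returns the transmitted indices and the quantized latents."""
    # pairwise squared distances between every latent and every code
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]
```

Transmitting `idx` instead of `z` is what makes the representation compact: each D-dimensional float vector becomes a single integer in [0, K).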
Funding: Supported in part by the Tianjin Technology Innovation Guidance Special Fund Project under Grant No. 21YDTPJC00850, in part by the National Natural Science Foundation of China under Grant No. 41906161, and in part by the Natural Science Foundation of Tianjin under Grant No. 21JCQNJC00650.
Abstract: With the development of underwater sonar detection technology, the simultaneous localization and mapping (SLAM) approach has attracted much attention in the underwater navigation field in recent years. However, the weak detection ability of a single vehicle limits SLAM performance over wide areas, so cooperative SLAM using multiple vehicles has become an important research direction. The key factor in cooperative SLAM is timely and efficient sonar image transmission among underwater vehicles, but the limited bandwidth of underwater acoustic channels conflicts with the large volume of sonar image data, making it essential to compress the images before transmission. Deep neural networks have recently shown great value in image compression by virtue of their powerful learning ability, yet existing neural-network-based sonar image compression methods usually focus on pixel-level information and neglect semantic-level information. In this paper, we propose a novel underwater acoustic transmission scheme called UAT-SSIC, which includes a semantic segmentation-based sonar image compression (SSIC) framework and a joint source-channel codec, to improve the accuracy of the semantic information of the reconstructed sonar image at the receiver. The SSIC framework consists of an auto-encoder-based sonar image compression network whose quality is measured by a semantic segmentation network's residual. Considering that sonar images have blurred target edges, the semantic segmentation network uses a dilated convolutional neural network (DiCNN) to enhance segmentation accuracy by expanding the receptive fields. A joint source-channel codec with unequal error protection is proposed that adjusts the power level of the transmitted data to deal with sonar image transmission errors caused by the harsh underwater acoustic channel. Experiment results demonstrate that our method preserves more semantic information than existing methods at the same compression ratio. It also improves the error
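The abstract attributes the segmentation gain to dilated convolutions enlarging the receptive field without adding parameters. The helper below computes the receptive field of a stack of (kernel, stride, dilation) convolution layers using the standard recurrence; the layer configurations in the test are illustrative and are not the DiCNN architecture.

```python
def receptive_field(layers):
    """Receptive field of a stack of conv layers given as
    (kernel_size, stride, dilation) tuples. A dilation d stretches a
    kernel of size k to an effective size d*(k-1)+1, so stacking
    dilated layers grows the field much faster than plain convs."""
    rf, jump = 1, 1
    for k, s, d in layers:
        eff_k = d * (k - 1) + 1       # effective kernel size under dilation
        rf += (eff_k - 1) * jump      # growth scaled by accumulated stride
        jump *= s
    return rf
```

Three plain 3x3 layers see 7 pixels; the same three layers with dilations 1, 2, 4 see 15, which is why dilation helps segment blurred sonar edges that need wide context.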
Funding: This work was supported in part by the National Natural Science Foundation of China (62293481), in part by the Young Elite Scientists Sponsorship Program by CAST (2023QNRC001), in part by the National Natural Science Foundation for Young Scientists of China (62001050), and in part by the Fundamental Research Funds for the Central Universities (2023RC95).
Abstract: With the development of deep learning (DL), joint source-channel coding (JSCC) solutions for end-to-end transmission have gained much attention. Adaptive deep JSCC schemes support dynamically adjusting the rate according to channel conditions during transmission, enhancing robustness in dynamic wireless environments. However, most existing adaptive JSCC schemes consider only channel conditions and ignore the differing importance of features in image processing and transmission. Uniform compression of different image features may compromise critical image details, particularly in low signal-to-noise ratio (SNR) scenarios. To address these issues, this paper introduces a dual attention mechanism and proposes an SNR-adaptive deep JSCC mechanism with a convolutional block attention module (CBAM), in which matrix operations are applied to features in the spatial and channel dimensions respectively. The proposed solution concatenates the pooled feature with the SNR level and passes it sequentially through the channel attention network and the spatial attention network to obtain the importance evaluation result. Experiments show that the proposed solution outperforms baseline schemes in terms of peak SNR (PSNR) and structural similarity (SSIM), particularly in low-SNR scenarios or when dealing with complex image content.
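As a rough sketch of the CBAM-style dual attention described above, the numpy function below applies channel attention computed from pooled statistics through a shared two-layer MLP, followed by a crude spatial attention from channel-wise pooling. The paper additionally concatenates the SNR level and uses learned convolutions; the placeholder weights and the pooling stand-in for the spatial convolution here are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam_like(x, w1, w2):
    """Minimal CBAM-style pass over a (C, H, W) feature map.
    w1 (C//r, C) and w2 (C, C//r) form the shared bottleneck MLP;
    both attention maps lie in (0, 1), so they only rescale features."""
    avg = x.mean(axis=(1, 2))                        # (C,) average pooling
    mx = x.max(axis=(1, 2))                          # (C,) max pooling
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)     # shared ReLU MLP
    ch_att = sigmoid(mlp(avg) + mlp(mx))             # channel attention (C,)
    x = x * ch_att[:, None, None]
    sp = np.stack([x.mean(axis=0), x.max(axis=0)])   # (2, H, W) spatial pooling
    sp_att = sigmoid(sp.mean(axis=0))                # stand-in for the 7x7 conv
    return x * sp_att[None, :, :]
```

Because every attention value is in (0, 1), the module can only attenuate features; what it learns is which channels and positions to attenuate least.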
Funding: Supported in part by the National Natural Science Foundation of China under Grant No. 62293485 and the Fundamental Research Funds for the Central Universities under Grant No. 2022RC18.
Abstract: The emerging new services in the sixth generation (6G) communication system impose increasingly stringent requirements and challenges on video transmission. Semantic communications are envisioned as a promising solution to these challenges. This paper provides a highly efficient solution to video transmission by proposing a scalable semantic transmission algorithm, named the scalable semantic transmission framework for video (SST-V), which jointly considers semantic importance and channel conditions. Specifically, a semantic importance evaluation module is designed to extract more informative semantic features according to the estimated importance level, facilitating high-efficiency semantic coding. By further considering the channel condition, a cascaded learning-based scalable joint semantic-channel coding algorithm is proposed, which autonomously adapts the semantic coding and channel coding strategies to the specific signal-to-noise ratio (SNR). Simulation results show that SST-V achieves better video reconstruction performance while significantly reducing the transmission overhead.
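SST-V learns its SNR-adaptive coding end to end. As a toy illustration of scalable, importance-ordered transmission only, the rule below keeps a fraction of semantic features that grows linearly with SNR, most important first; the linear schedule, thresholds, and function name are assumptions, not the paper's learned policy.

```python
import numpy as np

def scalable_select(features, importance, snr_db, snr_min=0.0, snr_max=20.0):
    """Toy scalable transmission rule: the kept fraction of semantic
    features rises linearly from 10% to 100% across [snr_min, snr_max],
    and features are kept in descending order of importance."""
    frac = np.clip((snr_db - snr_min) / (snr_max - snr_min), 0.1, 1.0)
    k = max(1, int(round(frac * len(features))))
    keep = np.argsort(importance)[::-1][:k]   # indices of the top-k features
    mask = np.zeros(len(features), dtype=bool)
    mask[keep] = True
    return mask
```

At high SNR everything is transmitted; as the channel degrades, the least informative features are dropped first, which is the qualitative behavior a scalable semantic-channel code is trained to achieve.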