Prior studies have demonstrated that deep learning-based approaches can enhance the performance of source code vulnerability detection by training neural networks to learn vulnerability patterns in code representation...Prior studies have demonstrated that deep learning-based approaches can enhance the performance of source code vulnerability detection by training neural networks to learn vulnerability patterns in code representations.However,due to limitations in code representation and neural network design,the validity and practicality of the model still need to be improved.Additionally,due to differences in programming languages,most methods lack cross-language detection generality.To address these issues,in this paper,we analyze the shortcomings of previous code representations and neural networks.We propose a novel hierarchical code representation that combines Concrete Syntax Trees(CST)with Program Dependence Graphs(PDG).Furthermore,we introduce a Tree-Graph-Gated-Attention(TGGA)network based on gated recurrent units and attention mechanisms to build a Hierarchical Code Representation learning-based Vulnerability Detection(HCRVD)system.This system enables cross-language vulnerability detection at the function-level.The experiments show that HCRVD surpasses many competitors in vulnerability detection capabilities.It benefits from the hierarchical code representation learning method,and outperforms baseline in cross-language vulnerability detection by 9.772%and 11.819%in the C/C++and Java datasets,respectively.Moreover,HCRVD has certain ability to detect vulnerabilities in unknown programming languages and is useful in real open-source projects.HCRVD shows good validity,generality and practicality.展开更多
Cross-lingual image description,the task of generating image captions in a target language from images and descriptions in a source language,is addressed in this study through a novel approach that combines neural net...Cross-lingual image description,the task of generating image captions in a target language from images and descriptions in a source language,is addressed in this study through a novel approach that combines neural network models and semantic matching techniques.Experiments conducted on the Flickr8k and AraImg2k benchmark datasets,featuring images and descriptions in English and Arabic,showcase remarkable performance improvements over state-of-the-art methods.Our model,equipped with the Image&Cross-Language Semantic Matching module and the Target Language Domain Evaluation module,significantly enhances the semantic relevance of generated image descriptions.For English-to-Arabic and Arabic-to-English cross-language image descriptions,our approach achieves a CIDEr score for English and Arabic of 87.9%and 81.7%,respectively,emphasizing the substantial contributions of our methodology.Comparative analyses with previous works further affirm the superior performance of our approach,and visual results underscore that our model generates image captions that are both semantically accurate and stylistically consistent with the target language.In summary,this study advances the field of cross-lingual image description,offering an effective solution for generating image captions across languages,with the potential to impact multilingual communication and accessibility.Future research directions include expanding to more languages and incorporating diverse visual and textual data sources.展开更多
以深度学习为代表的数据分析应用越来越多依赖分布式文件系统存储管理大规模数据集.为了增强数据访问的兼容性,现有分布式文件存储系统通常需提供标准POSIX接口,以支持深度学习等应用的无缝对接.然而,以内核模块形态开发提供POSIX接口...以深度学习为代表的数据分析应用越来越多依赖分布式文件系统存储管理大规模数据集.为了增强数据访问的兼容性,现有分布式文件存储系统通常需提供标准POSIX接口,以支持深度学习等应用的无缝对接.然而,以内核模块形态开发提供POSIX接口的文件系统非常复杂耗时.近年来,用户态文件系统(Filesystem in Userspace,FUSE)框架大幅简化了文件系统的开发工作,已被Alluxio和Ceph等诸多知名分布式文件系统使用.目前常用的用户态FUSE库libfuse仅提供C语言编程接口,但现有大数据分布式文件系统基本都是基于Java语言开发的(例如HDFS和Alluxio等),为了使基于Java语言开发的分布式文件系统可以对接C语言开发的FUSE库,需采用跨语言FUSE框架作为中介.跨语言FUSE框架利用跨编程语言的函数回调机制,使操作系统FUSE库的C语言函数可以跨语言的调用分布式文件系统提供的Java语言编程接口,从而为大数据分布式文件系统提供标准POSIX接口的访问能力.但在数据密集型应用中,现有跨语言FUSE框架的执行效率低,导致数据密集型作业(深度学习、大数据分析等)中数据I/O耗时占据了显著的性能开销,成为新的潜在性能瓶颈.针对此问题,本文首先评估分析了重要且广为使用的跨语言FUSE框架JNR-FUSE的性能,发现并定位其在高并发和小文件场景下存在的性能瓶颈;接着从多方面剖析性能瓶颈根因,进而总结出高效跨语言FUSE框架的性能优化方向,并面向Java语言设计实现了跨语言FUSE框架JNI-FUSE.JNI-FUSE利用延迟分离和元信息缓存等优化技术降低跨语言函数回调开销,从而提升跨语言FUSE框架的性能.实验结果表明,对比当前性能最好的Java FUSE框架JNR-FUSE,本文提出的JNI-FUSE带来了1.15~6.04倍的FUSE框架性能提升和1.90~2.71倍的文件系统端到端性能提升,并为上层深度学习训练任务带来了1.06~1.73倍的训练加速.本文设�展开更多
基金funded by the Major Science and Technology Projects in Henan Province,China,Grant No.221100210600.
文摘Prior studies have demonstrated that deep learning-based approaches can enhance the performance of source code vulnerability detection by training neural networks to learn vulnerability patterns in code representations.However,due to limitations in code representation and neural network design,the validity and practicality of the model still need to be improved.Additionally,due to differences in programming languages,most methods lack cross-language detection generality.To address these issues,in this paper,we analyze the shortcomings of previous code representations and neural networks.We propose a novel hierarchical code representation that combines Concrete Syntax Trees(CST)with Program Dependence Graphs(PDG).Furthermore,we introduce a Tree-Graph-Gated-Attention(TGGA)network based on gated recurrent units and attention mechanisms to build a Hierarchical Code Representation learning-based Vulnerability Detection(HCRVD)system.This system enables cross-language vulnerability detection at the function-level.The experiments show that HCRVD surpasses many competitors in vulnerability detection capabilities.It benefits from the hierarchical code representation learning method,and outperforms baseline in cross-language vulnerability detection by 9.772%and 11.819%in the C/C++and Java datasets,respectively.Moreover,HCRVD has certain ability to detect vulnerabilities in unknown programming languages and is useful in real open-source projects.HCRVD shows good validity,generality and practicality.
文摘Cross-lingual image description,the task of generating image captions in a target language from images and descriptions in a source language,is addressed in this study through a novel approach that combines neural network models and semantic matching techniques.Experiments conducted on the Flickr8k and AraImg2k benchmark datasets,featuring images and descriptions in English and Arabic,showcase remarkable performance improvements over state-of-the-art methods.Our model,equipped with the Image&Cross-Language Semantic Matching module and the Target Language Domain Evaluation module,significantly enhances the semantic relevance of generated image descriptions.For English-to-Arabic and Arabic-to-English cross-language image descriptions,our approach achieves a CIDEr score for English and Arabic of 87.9%and 81.7%,respectively,emphasizing the substantial contributions of our methodology.Comparative analyses with previous works further affirm the superior performance of our approach,and visual results underscore that our model generates image captions that are both semantically accurate and stylistically consistent with the target language.In summary,this study advances the field of cross-lingual image description,offering an effective solution for generating image captions across languages,with the potential to impact multilingual communication and accessibility.Future research directions include expanding to more languages and incorporating diverse visual and textual data sources.
文摘以深度学习为代表的数据分析应用越来越多依赖分布式文件系统存储管理大规模数据集.为了增强数据访问的兼容性,现有分布式文件存储系统通常需提供标准POSIX接口,以支持深度学习等应用的无缝对接.然而,以内核模块形态开发提供POSIX接口的文件系统非常复杂耗时.近年来,用户态文件系统(Filesystem in Userspace,FUSE)框架大幅简化了文件系统的开发工作,已被Alluxio和Ceph等诸多知名分布式文件系统使用.目前常用的用户态FUSE库libfuse仅提供C语言编程接口,但现有大数据分布式文件系统基本都是基于Java语言开发的(例如HDFS和Alluxio等),为了使基于Java语言开发的分布式文件系统可以对接C语言开发的FUSE库,需采用跨语言FUSE框架作为中介.跨语言FUSE框架利用跨编程语言的函数回调机制,使操作系统FUSE库的C语言函数可以跨语言的调用分布式文件系统提供的Java语言编程接口,从而为大数据分布式文件系统提供标准POSIX接口的访问能力.但在数据密集型应用中,现有跨语言FUSE框架的执行效率低,导致数据密集型作业(深度学习、大数据分析等)中数据I/O耗时占据了显著的性能开销,成为新的潜在性能瓶颈.针对此问题,本文首先评估分析了重要且广为使用的跨语言FUSE框架JNR-FUSE的性能,发现并定位其在高并发和小文件场景下存在的性能瓶颈;接着从多方面剖析性能瓶颈根因,进而总结出高效跨语言FUSE框架的性能优化方向,并面向Java语言设计实现了跨语言FUSE框架JNI-FUSE.JNI-FUSE利用延迟分离和元信息缓存等优化技术降低跨语言函数回调开销,从而提升跨语言FUSE框架的性能.实验结果表明,对比当前性能最好的Java FUSE框架JNR-FUSE,本文提出的JNI-FUSE带来了1.15~6.04倍的FUSE框架性能提升和1.90~2.71倍的文件系统端到端性能提升,并为上层深度学习训练任务带来了1.06~1.73倍的训练加速.本文设�