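Based on the characteristics of the heterogeneous multi-core digital signal processing (DSP) chip independently developed by the National University of Defense Technology, and on the characteristics of the convolution algorithm itself, a high-performance multi-core parallel convolution implementation for multi-core DSP architectures is proposed. For 1×1 convolutions, a feature-map-level multi-core parallel scheme is proposed; for convolutions with kernels larger than 1, a window-level multi-core parallel optimization design is proposed, together with an intra-core parallel optimization based on element-wise vectorized computation. Experimental results show that the proposed parallel optimization method reaches a single-core computing efficiency of up to 64.95%, that multi-core parallel scaling efficiency reaches 48.36% to 88.52% under bandwidth-limited conditions, and that on the typical network ResNet50 it achieves a 5.39× speedup over an E5-2640 CPU.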
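Matrix-multiplication-based convolution algorithms provide a high-performance baseline implementation for a wide range of convolution configurations and are the first choice when optimizing convolution performance for a given chip. Based on the characteristics of the FeiTeng (飞腾) heterogeneous multi-core digital signal processor (DSP) chip independently developed by the National University of Defense Technology, and on the characteristics of the matrix-multiplication convolution algorithm itself, ftmEConv, a high-performance parallel matrix-multiplication convolution algorithm for multi-core DSP architectures, is proposed. The algorithm consists of four parallelized parts, all running on the general-purpose multi-core DSP: input feature map transformation, convolution kernel transformation, matrix multiplication, and output feature map transformation. The performance of each part is improved by effectively exploiting the potential of the functional units in the general-purpose DSP cores. Experimental results show that ftmEConv achieves a computing efficiency of up to 42.90% and a speedup of up to 7.79× over other matrix-multiplication convolution implementations on the chip.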
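The four-stage structure named in this abstract can be illustrated with a minimal NumPy sketch. This is not ftmEConv itself: the abstract does not describe its transforms, so the sketch assumes the generic im2col formulation with stride 1 and no padding. The function names `im2col`, `conv2d_gemm`, and `conv2d_direct` are hypothetical.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold a (C, H, W) feature map into a (C*kh*kw, out_h*out_w) matrix.

    Assumption for this sketch: stride 1, no padding.
    """
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    row = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # Each row holds one kernel tap, gathered over all output positions.
                cols[row] = x[ci, i:i + out_h, j:j + out_w].reshape(-1)
                row += 1
    return cols, out_h, out_w

def conv2d_gemm(x, weights):
    """Convolution as one matrix multiply; weights has shape (K, C, kh, kw)."""
    k, c, kh, kw = weights.shape
    cols, out_h, out_w = im2col(x, kh, kw)    # 1. input feature map transformation
    w_mat = weights.reshape(k, c * kh * kw)   # 2. convolution kernel transformation
    out_mat = w_mat @ cols                    # 3. matrix multiplication
    return out_mat.reshape(k, out_h, out_w)   # 4. output feature map transformation

def conv2d_direct(x, weights):
    """Reference sliding-window convolution, used only to check the sketch."""
    k, c, kh, kw = weights.shape
    out_h, out_w = x.shape[1] - kh + 1, x.shape[2] - kw + 1
    out = np.zeros((k, out_h, out_w))
    for o in range(k):
        for i in range(out_h):
            for j in range(out_w):
                out[o, i, j] = np.sum(x[:, i:i + kh, j:j + kw] * weights[o])
    return out
```

The row order produced by `im2col` (channel, then kernel row, then kernel column) matches the order in which `weights.reshape` flattens each filter, which is what allows a single matrix product to replace the sliding-window loops.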
The Ring effect refers to the filling in of Fraunhofer lines, which is mainly attributed to the rotational Raman scattering of solar spectra by N2 and O2 molecules in the atmosphere. The Ring effect is one of the most significant factors affecting the accuracy of retrieving concentrations of atmospheric trace gases, such as NO2 and SO2, from satellite observations through differential optical absorption spectroscopy. First, in this study, the solar spectrum measured by the Ozone Monitoring Instrument onboard NASA Aura is convolved with the rotational Raman cross section of the atmosphere, which is calculated from the rotational Raman cross sections of N2 and O2 molecules, and divided by the original solar spectrum. The slowly varying term is removed by fitting it with a cubic polynomial to obtain the differential Ring spectrum. The results agree well with calculations using a radiative transfer model (R²=0.9663). Second, the differential Ring spectrum is computed using two fixed wavelengths of 410 nm and 488 nm; the resulting differential Ring spectra are similar to those calculated with varying wavelengths and agree well with the calculation using the radiative transfer model (R²=0.9624 and 0.9639, respectively). The computation time using a fixed wavelength is about 0.128% of that using a varying wavelength. Finally, we found that the frequency spectra of the Raman cross sections for the atmosphere, N2 molecules, and O2 molecules are similar; thus, the Raman cross section of N2 or O2 molecules can be used to compute the approximate Ring effect for simplicity.
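The processing chain in this abstract (convolve, divide by the original spectrum, remove a cubic baseline) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a uniform wavelength grid, a Raman kernel sampled on that same grid with odd length and centered on zero shift, unit-area normalization of the kernel, and edge padding of the spectrum. The function and argument names are hypothetical.

```python
import numpy as np

def differential_ring_spectrum(wavelength, solar, raman_kernel):
    """Return a differential Ring spectrum on the input wavelength grid.

    Assumptions for this sketch: all inputs are 1-D arrays on one uniform
    grid, and raman_kernel has odd length and is centered on zero shift.
    """
    wavelength = np.asarray(wavelength, dtype=float)
    solar = np.asarray(solar, dtype=float)
    kernel = np.asarray(raman_kernel, dtype=float)
    if kernel.size % 2 == 0:
        raise ValueError("raman_kernel must have odd length")
    kernel = kernel / kernel.sum()  # assumption: unit-area kernel

    # Pad with edge values so the convolution does not darken the borders.
    half = kernel.size // 2
    padded = np.pad(solar, half, mode="edge")
    scattered = np.convolve(padded, kernel, mode="valid")

    # Divide the Raman-scattered spectrum by the original solar spectrum.
    ratio = scattered / solar

    # Fit the slowly varying term with a cubic polynomial and subtract it.
    # The wavelength axis is centered and scaled to keep the fit well conditioned.
    x = (wavelength - wavelength.mean()) / (wavelength.max() - wavelength.min())
    coeffs = np.polyfit(x, ratio, deg=3)
    return ratio - np.polyval(coeffs, x)
```

For a synthetic spectrum with a single narrow absorption line and a broader symmetric kernel, the result peaks at the line center, which is the filling-in behavior the abstract describes.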