Funding: This work was partially supported by the National Key Research and Development Project of China (No. 2018YFB2201901), the National Natural Science Foundation of China (Grant Nos. 61805090 and 62075075), the Shenzhen Science and Technology Innovation Commission (No. SGDX2019081623060558), and the Research Grants Council of Hong Kong SAR (No. PolyU152241/18E).
Abstract: As an important computing operation, photonic matrix-vector multiplication is widely used in photonic neural networks and signal processing. However, conventional incoherent matrix-vector multiplication focuses on real-valued operations, which cannot work well in complex-valued neural networks and the discrete Fourier transform. In this paper, we propose a systematic solution that extends the matrix computation of microring arrays from the real-valued field to the complex-valued field, and from small-scale (i.e., 4×4) to large-scale (i.e., 16×16) matrix computation. By combining matrix decomposition and matrix partition, our photonic complex matrix-vector multiplier chip can support arbitrary large-scale and complex-valued matrix computation. We further demonstrate the Walsh-Hadamard transform, the discrete cosine transform, the discrete Fourier transform, and image convolution. Our scheme provides a path towards breaking the limits of complex-valued computing accelerators in the conventional incoherent optical architecture. More importantly, our results reveal that an integrated photonic platform holds huge potential for large-scale, complex-valued artificial intelligence computing and signal processing.
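As a rough illustration of the decomposition idea (the abstract does not spell out the chip's exact mapping), a complex matrix-vector product can be expanded into four real-valued products, and a large matrix can be partitioned into small tiles that fit a fixed-size real-valued core such as a 4×4 microring array. The sketch below uses an illustrative block size and helper names, and ignores, for simplicity, the non-negativity constraint of intensity-encoded incoherent hardware:

```python
import numpy as np

def complex_mvm_via_real_cores(A: np.ndarray, x: np.ndarray, block: int = 4) -> np.ndarray:
    """Compute y = A @ x for complex A, x using only real-valued
    block matrix-vector products of size `block` (illustrative sketch)."""
    Ar, Ai = A.real, A.imag
    xr, xi = x.real, x.imag

    def real_blocked_mvm(M, v):
        # Partition M into block x block tiles and accumulate partial products,
        # mimicking repeated use of a small real-valued photonic core.
        n = M.shape[0]
        y = np.zeros(n)
        for i in range(0, n, block):
            for j in range(0, n, block):
                y[i:i + block] += M[i:i + block, j:j + block] @ v[j:j + block]
        return y

    # (Ar + jAi)(xr + jxi) = (Ar xr - Ai xi) + j(Ar xi + Ai xr)
    yr = real_blocked_mvm(Ar, xr) - real_blocked_mvm(Ai, xi)
    yi = real_blocked_mvm(Ar, xi) + real_blocked_mvm(Ai, xr)
    return yr + 1j * yi

# Example: a 16x16 DFT matrix applied with 4x4 real-valued blocks
N = 16
F = np.exp(-2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)
x = np.random.randn(N) + 1j * np.random.randn(N)
assert np.allclose(complex_mvm_via_real_cores(F, x), F @ x)
```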
Funding: Supported by the Foundation of Double-Hundred Talents of China Academy of Engineering Physics (Grant No. 2004R0301).
Abstract: For a scintillating-fiber array fast-neutron radiography system, a point-spread-function computing model was introduced and the simulation code was developed. The calculations show that fast-neutron radiographs vary with the size of the fast-neutron source, the cross-section of the fibers, and the imaging geometry. The results suggest that the following conditions are helpful for a good point spread function: a scintillating-fiber cross-section not greater than 200 μm × 200 μm, a neutron source as small as a few millimeters, a distance between the source and the scintillating-fiber array greater than 1 m, and inspected samples placed as close as possible to the array. The results give guidance not only for experiment design but also for estimating the spatial resolution of a specific system.
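The stated geometry conditions can be rationalized with the standard geometric-unsharpness estimate for radiography; the formula and the quadrature combination with the fiber pitch below are a back-of-the-envelope sketch, not the point-spread-function model developed in the paper:

```python
import math

def geometric_unsharpness(source_size_mm, source_to_detector_mm, sample_to_detector_mm):
    """Penumbra width U_g = d * l / (L - l), with d = source size,
    L = source-to-detector distance, l = sample-to-detector distance."""
    L, l = source_to_detector_mm, sample_to_detector_mm
    return source_size_mm * l / (L - l)

def blur_estimate(source_size_mm, L_mm, l_mm, fiber_pitch_mm=0.2):
    # Combine geometric penumbra and fiber sampling in quadrature (an assumption).
    ug = geometric_unsharpness(source_size_mm, L_mm, l_mm)
    return math.hypot(ug, fiber_pitch_mm)

# A 3 mm source, 1 m stand-off, sample 20 mm from the fiber array:
print(blur_estimate(3.0, 1000.0, 20.0))   # ~0.21 mm, dominated by the 200 um fibers
```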
Funding: Supported by the National Natural Science Foundation of China (61272120, 61602377, 61634004), the Natural Science Foundation of Shaanxi Province of China (2015JM6326), the Shaanxi Provincial Co-ordination Innovation Project of Science and Technology (2016KTZDGY02-04-02), and the Project of the Education Department of Shaanxi Provincial Government (15JK1683).
Abstract: To balance computing efficiency and flexibility when calculating transcendental functions, this paper proposes a reconfigurable transcendental function generator. The generator is a reconfigurable array structure composed of 30 processing elements (PEs), on which the coordinate rotation digital computer (CORDIC) algorithm is implemented. Different functions, such as sine, cosine, arctangent, and logarithm, can be calculated on the structure by reconfiguring the functions of the PEs. Functional simulation and field-programmable gate array (FPGA) verification show that the proposed method achieves great flexibility with acceptable performance.
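For reference, the rotation-mode CORDIC iteration that such PEs typically implement uses only shifts, adds, and a small arctangent table; the floating-point sketch below (valid for angles in [-π/2, π/2]) illustrates the recurrence rather than the paper's fixed-point PE design:

```python
import math

def cordic_sin_cos(angle, iterations=24):
    """Rotation-mode CORDIC: drive the residual angle z to zero with
    micro-rotations by atan(2^-i), then correct the accumulated gain."""
    atans = [math.atan(2.0 ** -i) for i in range(iterations)]
    K = 1.0
    for i in range(iterations):
        K /= math.sqrt(1.0 + 2.0 ** (-2 * i))   # total gain correction factor

    x, y, z = 1.0, 0.0, angle                   # start on the x-axis
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0             # rotate toward z = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atans[i]
    return y * K, x * K                         # (sin, cos)

print(cordic_sin_cos(0.6))   # ~ (0.5646, 0.8253)
```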
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 61925402 and 61851402), the Science and Technology Commission of Shanghai Municipality, China (Grant No. 19JC1416600), the National Key Research and Development Program of China (Grant No. 2017YFB0405600), and the Shanghai Education Development Foundation and Shanghai Municipal Education Commission Shuguang Program, China (Grant No. 18SG01).
Abstract: Facing the computing demands of the Internet of things (IoT) and artificial intelligence (AI), the cost of moving data between the central processing unit (CPU) and memory is the key problem, and a chip featuring flexible structural units, ultra-low power consumption, and huge parallelism will be needed. In-memory computing, a non-von Neumann architecture that fuses memory units and computing units, can eliminate the data-transfer time and energy consumption while performing massive parallel computations. Prototype in-memory computing schemes built on different memory technologies have shown orders-of-magnitude improvement in computing efficiency, leading them to be regarded as the ultimate computing paradigm. Here we review the state-of-the-art memory device technologies with potential for in-memory computing, summarize their versatile applications in neural networks, stochastic generation, and hybrid-precision digital computing, with promising solutions for unprecedented computing tasks, and discuss the challenges of stability and integration for general in-memory computing.
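The canonical example of this in-place parallelism is an analog matrix-vector multiplication on a resistive crossbar, where Ohm's law and Kirchhoff's current law accumulate all products in a single read step. The idealized sketch below (no device noise, nonlinearity, or wire resistance, and with hypothetical names) is a conceptual model only, not a description of any specific device reviewed in the paper:

```python
import numpy as np

def crossbar_mvm(weights: np.ndarray, inputs: np.ndarray, g_max: float = 1e-4) -> np.ndarray:
    """Idealized analog MVM: weights are mapped to conductances, inputs to
    row voltages, and each column current is a dot product (Kirchhoff sum)."""
    # Map signed weights to a differential pair of non-negative conductances.
    g_pos = np.clip(weights, 0, None) / np.abs(weights).max() * g_max
    g_neg = np.clip(-weights, 0, None) / np.abs(weights).max() * g_max
    i_pos = inputs @ g_pos            # column currents of the "positive" array
    i_neg = inputs @ g_neg            # column currents of the "negative" array
    return i_pos - i_neg              # differential read-out recovers the sign

W = np.random.randn(4, 3)
v = np.random.randn(4)
scale = np.abs(W).max() / 1e-4        # undo the conductance scaling
assert np.allclose(crossbar_mvm(W, v) * scale, v @ W)
```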
Funding: Supported by the National Natural Science Foundation of China (Nos. 61802304, 61834005, 61772417, 61602377) and the Shaanxi Province Key R&D Plan (No. 2021GY-029).
Abstract: Deep learning algorithms have been widely used in computer vision, natural language processing, and other fields. However, due to the ever-increasing scale of deep learning models, the requirements for storage and computing performance keep growing, and processors based on the von Neumann architecture have gradually exposed significant shortcomings such as high power consumption and long latency. To alleviate this problem, large-scale processing systems are shifting from a traditional computing-centric model to a data-centric model. A near-memory computing array architecture based on a shared buffer is proposed in this paper to improve system performance; it supports instructions with store-calculation integration, reducing data movement between the processor and main memory. Through data reuse, the processing speed of the algorithm is further improved. The proposed architecture is verified and tested through a parallel realization of a convolutional neural network (CNN) algorithm. The experimental results show that, at a frequency of 110 MHz, the speed of a single convolution operation is increased by 66.64% on average compared with a CNN architecture that performs parallel calculations on a field-programmable gate array (FPGA). The processing speed of the whole convolution layer is improved by 8.81% compared with a reconfigurable array processor that does not support near-memory computing.
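To illustrate the kind of data reuse a shared buffer enables, the sketch below stages each input row once in a small row buffer and reuses it across all overlapping convolution windows, instead of re-fetching it from main memory per output; the buffer depth and function names are illustrative assumptions, not the paper's instruction set:

```python
from collections import deque
import numpy as np

def conv2d_with_row_buffer(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid 2D convolution (cross-correlation) where a k-row buffer holds
    only the rows currently needed, so each input row is loaded exactly once."""
    k = kernel.shape[0]
    h, w = image.shape
    out = np.zeros((h - k + 1, w - k + 1))
    rows = deque(maxlen=k)                     # the "shared buffer": k input rows
    for r in range(h):
        rows.append(image[r])                  # one load from "main memory" per row
        if len(rows) < k:
            continue
        window_rows = np.stack(rows)           # reused across every column position
        for c in range(w - k + 1):
            out[r - k + 1, c] = np.sum(window_rows[:, c:c + k] * kernel)
    return out

img = np.random.randn(8, 8)
ker = np.random.randn(3, 3)
# Cross-check against a direct sliding-window implementation
ref = np.array([[np.sum(img[i:i+3, j:j+3] * ker) for j in range(6)] for i in range(6)])
assert np.allclose(conv2d_with_row_buffer(img, ker), ref)
```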