Funding: Supported by the National Key R&D Program of China (No. 2022ZD0119001), the National Natural Science Foundation of China (No. 61834005, 61802304), the Education Department of Shaanxi Province (No. 22JY060), and the Shaanxi Provincial Key Research and Development Plan (No. 2024GX-YBXM-100).
Abstract: With the rapid development of deep learning algorithms, computational complexity and functional diversity are increasing rapidly. However, the gap between high computational density and insufficient memory bandwidth under the traditional von Neumann architecture is widening. An analysis of the algorithmic characteristics of convolutional neural networks (CNNs) shows that the memory access patterns of convolution (CONV) and fully connected (FC) operations differ substantially. Based on this observation, a dual-mode reconfigurable distributed memory architecture for CNN accelerators is designed. It can be configured in Bank mode or first-in first-out (FIFO) mode to meet the access needs of different operations. In addition, a programmable memory control unit is designed, which controls the dual-mode configurable distributed memory architecture through customized access instructions and reduces data access delay. The proposed architecture is verified and tested through parallel implementations of several CNN algorithms. The experimental results show that the peak bandwidth reaches 13.44 GB/s at an operating frequency of 120 MHz, which is 1.40, 1.12, 2.80, and 4.70 times the peak bandwidth of existing works.
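As a quick sanity check on the figures quoted in this abstract, the sketch below converts the peak bandwidth into bytes moved per clock cycle and derives the baseline bandwidths implied by the four ratios. It assumes 1 GB = 10^9 bytes and that each ratio is this work's peak bandwidth divided by a baseline's peak bandwidth. The function names are illustrative and do not come from the paper. Under those assumptions, 13.44 GB/s at 120 MHz works out to 112 bytes (896 bits) per cycle.

```python
# Figures quoted in the abstract.
PEAK_BANDWIDTH_GB_S = 13.44           # peak bandwidth in GB/s
CLOCK_MHZ = 120                       # operating frequency in MHz
RATIOS = (1.40, 1.12, 2.80, 4.70)     # this work vs. four existing works


def bytes_per_cycle(bandwidth_gb_s: float, clock_mhz: float) -> float:
    """Bytes transferred per clock cycle (assumes 1 GB = 1e9 bytes)."""
    return bandwidth_gb_s * 1e9 / (clock_mhz * 1e6)


def implied_baselines(bandwidth_gb_s: float, ratios) -> list:
    """Baseline bandwidths implied by the ratios (assumes ratio = ours / baseline)."""
    return [bandwidth_gb_s / r for r in ratios]


per_cycle = bytes_per_cycle(PEAK_BANDWIDTH_GB_S, CLOCK_MHZ)
baselines = implied_baselines(PEAK_BANDWIDTH_GB_S, RATIOS)

print(f"bytes per cycle: {per_cycle:.2f}")
for ratio, base in zip(RATIOS, baselines):
    print(f"ratio {ratio:.2f} -> implied baseline {base:.2f} GB/s")
```

The implied baselines (roughly 9.6, 12.0, 4.8, and 2.86 GB/s) are derived values, not numbers reported in the abstract.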
Abstract: Based on the characteristics of modular self-reconfigurable (MSR) robots, a novel homogeneous lattice MSR robot, M-Cubes, was designed. Each module of the robot has 12 degrees of freedom and is composed of six rotary joints and one cubic link. An attachment/detachment mechanism was designed on the rotary joints. A novel spatial transmission system was placed inside the cubic link. A motor transmits torque separately to the six joints, which are distributed equally over the six faces of the cubic link. An example of a basic motion of the module was demonstrated. The results show that the robot is concise and compact in structure, highly efficient in transmission, reliable in connection, and simple to control. In addition, a simulator was developed to graphically design the system configuration, the reconfiguration process, and the motion of module clusters. The local-action property of cellular automata (CA) is exploited: each module is simplified as a cell, and CA transition rules, combined with a genetic algorithm (GA), are applied to each module to achieve distributed control. Simulation shows that the method is effective and feasible.
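To illustrate the idea of treating each module as a CA cell governed by a local rule, here is a minimal sketch. It is not the paper's method: the 2D lattice (M-Cubes is a 3D lattice robot), the state names, and the rule itself are hypothetical simplifications, and the GA that the abstract combines with the transition rules is omitted. The point shown is that each module decides its state from its immediate neighbors only.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]

# Four-connected neighborhood on a 2D lattice (a simplifying assumption).
NEIGHBOR_OFFSETS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def occupied_neighbors(cell: Cell, modules: Set[Cell]) -> int:
    """Count lattice neighbors of `cell` that hold a module."""
    x, y = cell
    return sum((x + dx, y + dy) in modules for dx, dy in NEIGHBOR_OFFSETS)


def local_rule(neighbor_count: int) -> str:
    """Hypothetical transition rule that uses only local information."""
    if neighbor_count == 0:
        return "isolated"   # no connection to the cluster
    if neighbor_count == 1:
        return "movable"    # leaf module, could detach and relocate
    return "fixed"          # holds the cluster together


def step(modules: Set[Cell]) -> Dict[Cell, str]:
    """Apply the same local rule to every module, as in a CA update."""
    return {cell: local_rule(occupied_neighbors(cell, modules)) for cell in modules}


cluster = {(0, 0), (1, 0), (2, 0), (1, 1)}   # a small T-shaped cluster
states = step(cluster)
for cell in sorted(states):
    print(cell, states[cell])
```

In this T-shaped example the center module (1, 0) is classified as fixed and the three leaf modules as movable. A real rule set would also need to encode motion primitives and connectivity constraints.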