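Abstract: Sparse matrix-vector multiplication (SMVM), of the form Ab = x, is an important computational kernel in scientific computing, information retrieval, data mining, and other fields. In an FPGA-based SMVM system, the main function of the underlying basic processing element (PE) is to perform multiply-accumulate operations on single-precision floating-point inputs. Based on the characteristics of the SMVM algorithm, this paper proposes a design for a floating-point multiply-accumulate PE and implements it on a Virtex-4 LX60, where it reaches an operating frequency of 123.6 MHz.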
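As a software illustration of the multiply-accumulate pattern that such a PE performs, the following minimal sketch computes x = Ab row by row. The compressed sparse row (CSR) layout and the names `smvm_csr`, `values`, `col_idx`, and `row_ptr` are assumptions made for this example; the abstract does not state the storage format or the PE's internal structure. Python floats are double precision, so this does not reproduce single-precision rounding.

```python
def smvm_csr(values, col_idx, row_ptr, b):
    """Compute x = A b for a sparse A stored in CSR form (assumed layout).

    values  : nonzero entries of A, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits row i inside `values`
    """
    x = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0  # accumulator for this row
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # One multiply-accumulate step: acc <- acc + A[row, col] * b[col]
            acc += values[k] * b[col_idx[k]]
        x.append(acc)
    return x


# Example matrix (hypothetical data):
# A = [[2, 0, 1],
#      [0, 3, 0],
#      [4, 0, 5]]
values = [2.0, 1.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
b = [1.0, 2.0, 3.0]

x = smvm_csr(values, col_idx, row_ptr, b)
print(x)  # [5.0, 6.0, 19.0]
```

Each inner-loop iteration corresponds to one multiply-accumulate operation, so the work per row is proportional to that row's nonzero count.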
Abstract: Cross-correlation (CC) is the most time-consuming step in image matching algorithms based on the correlation method, so computing CC quickly is crucial to real-time image matching. This work shows that the single cascading multiply-accumulate (CAMAC) and concurrent multiply-accumulate (COMAC) architectures, both widely used in the past, do not necessarily deliver satisfactory time performance for CC. To obtain better time performance and higher resource efficiency, this paper proposes a configurable circuit that combines the advantages of CAMAC and COMAC for the large number of multiply-accumulate (MAC) operations that CC requires in exhaustive search. The proposed circuit works in an array manner and adapts better to image matching with changing image sizes in real-time processing. Experimental results demonstrate that this circuit, which combines the two structures, can complete a large number of MAC calculations at very high speed. Compared with existing related work, it further improves computation density and is more flexible to use.
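To make the MAC workload of exhaustive-search CC concrete, here is a minimal software sketch of plain, unnormalized cross-correlation. It is not the CAMAC, COMAC, or configurable circuit from the abstract; the function name `cross_correlate` and the sample data are hypothetical and chosen only for illustration.

```python
def cross_correlate(image, template):
    """Unnormalized cross-correlation of `template` at every valid offset.

    Both arguments are lists of equal-length rows of numbers.
    Returns a 2-D list scores[dy][dx].
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    scores = []
    for dy in range(ih - th + 1):          # exhaustive search over rows
        row = []
        for dx in range(iw - tw + 1):      # exhaustive search over columns
            acc = 0
            for y in range(th):
                for x in range(tw):
                    # One MAC per template pixel per offset
                    acc += image[dy + y][dx + x] * template[y][x]
            row.append(acc)
        scores.append(row)
    return scores


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
template = [[1, 0],
            [0, 1]]

print(cross_correlate(image, template))  # [[6, 8], [12, 14]]
```

The total MAC count is (ih - th + 1) × (iw - tw + 1) × th × tw, which is why the arrangement of MAC units dominates the running time for larger images and templates.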
Funding: The National Natural Science Foundation of China (No. 90407009); the National High Technology Research and Development Program of China (863 Program) (No. 2003AA1Z1340)
Abstract: An efficient design method for a 24 × 24 bit + 48 bit parallel saturating multiply-accumulate (MAC) unit is described. The augend of the MAC is merged into the Wallace tree array as a partial product, and optimized saturation detection logic is proposed. An area of 679.2 μm × 132.5 μm is achieved in a 0.18 μm, 1.8 V 1P6M CMOS technology using full-custom circuit layout design. Simulation results show that the design has significantly less area (about a 23.52% reduction) and less delay than a common saturating MAC based on a standard cell library.
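The arithmetic behaviour of a saturating 24 × 24 + 48 bit MAC can be sketched as a small behavioural model. This is only an illustration under stated assumptions: operands are taken to be signed two's-complement values (the abstract does not state signedness), and the model clamps the final sum rather than modelling the Wallace tree merging or the saturation detection logic. The name `saturating_mac` is hypothetical.

```python
ACC_BITS = 48
ACC_MAX = (1 << (ACC_BITS - 1)) - 1   # largest signed 48-bit value
ACC_MIN = -(1 << (ACC_BITS - 1))      # smallest signed 48-bit value
OP_MAX = (1 << 23) - 1                # largest signed 24-bit value
OP_MIN = -(1 << 23)                   # smallest signed 24-bit value


def saturating_mac(a, b, addend):
    """Return a * b + addend, clamped to the signed 48-bit range.

    Assumes a and b are signed 24-bit integers and addend is a signed
    48-bit integer (an assumption of this sketch).
    """
    if not (OP_MIN <= a <= OP_MAX and OP_MIN <= b <= OP_MAX):
        raise ValueError("multiplier operands must fit in signed 24 bits")
    if not ACC_MIN <= addend <= ACC_MAX:
        raise ValueError("addend must fit in signed 48 bits")
    total = a * b + addend  # Python integers do not overflow
    # Saturate instead of wrapping around on overflow
    return max(ACC_MIN, min(ACC_MAX, total))


print(saturating_mac(3, 4, 5))                  # 17
print(saturating_mac(OP_MAX, OP_MAX, ACC_MAX))  # 140737488355327 (ACC_MAX)
```

Under these assumptions the 24 × 24 bit product always fits in 48 bits, so overflow can only arise from adding the 48-bit addend, and that addition is the only point where clamping takes effect.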