This paper presents an efficient algorithm that implements one-to-many, or multicast, communication in one-port wormhole-routed cube-connected cycles (CCCs) in the absence of hardware multicast support. By exploiting the properties of the switching technology and the use of virtual channels, a minimum-time multicast algorithm is presented for n-dimensional CCCs that use deterministic routing of unicast messages. The algorithm can deliver a multicast message to m - 1 destinations in ⌈log₂ m⌉ message-passing steps, while avoiding contention among the constituent unicast messages. Performance results of a simulation study on CCCs with up to 10,240 nodes are also given.
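The logarithmic step count in the abstract above follows from a recursive-doubling argument: on a one-port network each node that already holds the message can forward it to one new node per step, so the number of holders at most doubles each step. The Python sketch below only illustrates that counting argument. It is not the paper's algorithm: the function name `multicast_schedule` and the list-splitting order are assumptions made for illustration, and the sketch ignores CCC topology, deterministic routing, and the contention-avoiding ordering that the paper contributes.

```python
def multicast_schedule(nodes):
    """Build a unicast-based multicast schedule by recursive halving.

    nodes[0] is the source; the remaining entries are destinations.
    Returns a list of steps, each a list of (sender, receiver) pairs.
    Hypothetical helper: the ordering of `nodes` is arbitrary here,
    whereas a real CCC algorithm must order them to avoid contention.
    """
    steps = []
    # Each (lo, hi) range means nodes[lo] holds the message and is
    # responsible for delivering it to nodes[lo + 1 : hi].
    ranges = [(0, len(nodes))]
    while any(hi - lo > 1 for lo, hi in ranges):
        step, next_ranges = [], []
        for lo, hi in ranges:
            if hi - lo > 1:
                mid = lo + (hi - lo + 1) // 2  # first node of the upper half
                step.append((nodes[lo], nodes[mid]))  # one send per holder
                next_ranges += [(lo, mid), (mid, hi)]
            else:
                next_ranges.append((lo, hi))
        steps.append(step)
        ranges = next_ranges
    return steps


if __name__ == "__main__":
    for step_no, step in enumerate(multicast_schedule(list("ABCDE")), 1):
        print(step_no, step)
```

With a source and four destinations (m = 5), the schedule takes three steps, matching ⌈log₂ 5⌉ = 3; in every step each node sends at most one message and receives at most one, which is the one-port constraint.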
CCC has lower hardware complexity than the hypercube and is suited to current VLSI technology. LC-permutations are a large set of important permutations frequently used in various parallel computations. Existing routing algorithms for CCC cannot realize LC-permutations without network conflict. We present an algorithm to realize LC-permutations on CCC. The algorithm consists of two periods of inter-cycle transmissions and one period of inner-cycle transmissions. In the inter-cycle transmissions the dimensional links of the CCC are used in a 'pipeline' manner, and in the inner-cycle transmissions the data packets are sorted by a part of their destination addresses. The algorithm is fast (O(log2 N)) and no conflict will occur.
Let A be an m-by-n matrix, and let M and N be positive definite matrices of order m and n respectively. This paper presents an efficient method for computing the (M-N) singular value decomposition ((M-N) SVD) of A on a cube-connected single instruction stream-multiple data stream (SIMD) parallel computer. This method is based on a one-sided orthogonalization algorithm due to Hestenes. On the cube-connected SIMD parallel computer with O(n) processors, the (M-N) SVD of a matrix A requires a computation time of O(m^3 log m/n).
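As background for the one-sided orthogonalization mentioned above, the following Python sketch shows the sequential Hestenes (one-sided Jacobi) iteration for the ordinary SVD, that is, the special case where M and N are identity matrices. It is a minimal illustration under stated assumptions, not the paper's method: the weighted (M-N) formulation and the cube-connected SIMD parallelization are not modelled, and the function name `hestenes_singular_values`, the tolerance, and the sweep limit are choices made here for illustration.

```python
import math


def hestenes_singular_values(a, tol=1e-12, max_sweeps=60):
    """Singular values of `a` (a list of rows) by one-sided Jacobi rotations.

    Pairs of columns are rotated until all columns are mutually
    orthogonal; the column norms are then the singular values.
    """
    m, n = len(a), len(a[0])
    cols = [[a[i][j] for i in range(m)] for j in range(n)]  # column-wise copy
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in cols[p])
                beta = sum(y * y for y in cols[q])
                gamma = sum(x * y for x, y in zip(cols[p], cols[q]))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # this pair is already orthogonal enough
                rotated = True
                # Rotation angle that zeroes the inner product of the pair.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                new_p = [c * x - s * y for x, y in zip(cols[p], cols[q])]
                new_q = [s * x + c * y for x, y in zip(cols[p], cols[q])]
                cols[p], cols[q] = new_p, new_q
        if not rotated:
            break  # a full sweep with no rotation means convergence
    return sorted((math.sqrt(sum(x * x for x in col)) for col in cols), reverse=True)


if __name__ == "__main__":
    print(hestenes_singular_values([[1.0, 1.0], [0.0, 1.0]]))
```

Because each rotation touches only two columns, disjoint column pairs can be processed at the same time, which is the property that makes this family of methods attractive on parallel machines.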
Funding: The work of this paper is supported by the National Natural Science Foundation of China under grant No. 69896250.