Funding: This work is partially supported by the Special Project on High-Performance Computing of the National Key R&D Program under No. 2016YFB0200604, the National Natural Science Foundation of China (NSFC) Grant No. 11731006, and the NSFC/Hong Kong RGC Joint Research Scheme (NSFC/RGC 11961160718). The work of J. Yang is supported by NSFC-11871264 and the Natural Science Foundation of Guangdong Province (2018A0303130123).
Abstract: A hybrid method is presented for determining the maximal eigenvalue and its eigenvector (together called the maximal eigenpair) of a large, dense, symmetric matrix. Many problems require only a small subset of the eigenpairs, and some require only the maximal one. In a series of papers, Mufa Chen developed efficient algorithms for computing the maximal eigenpairs of tridiagonal matrices with positive off-diagonal elements. The key idea is to explicitly construct an effective initial guess of the maximal eigenpair and then to employ a self-closed iterative algorithm. In this paper we extend Mufa Chen's algorithm to find the maximal eigenpair of a large-scale, dense, symmetric matrix. Our strategy is to first convert the underlying matrix into tridiagonal form by using similarity transformations. We then handle the cases that prevent us from applying Chen's algorithm directly, e.g., the cases with zero or negative super- or sub-diagonal elements. Several numerical experiments are carried out to demonstrate the efficiency of the proposed hybrid method.
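One way to picture the treatment of negative off-diagonal elements is a diagonal similarity transformation with entries ±1, which leaves the eigenvalues of a symmetric tridiagonal matrix unchanged while replacing each off-diagonal entry by its absolute value. The sketch below illustrates only that step. It is not the paper's algorithm: the function names are hypothetical, and a plain shifted power iteration stands in for Chen's iterative scheme, whose details the abstract does not give.

```python
import math


def flip_signs(b):
    """Signs d_i in {+1, -1} such that D T D has off-diagonals |b_i|."""
    d = [1.0]
    for bi in b:
        # Keep the sign when b_i >= 0, flip it when b_i < 0.
        d.append(d[-1] if bi >= 0 else -d[-1])
    return d


def tridiag_matvec(a, b, x):
    """Multiply the symmetric tridiagonal matrix (diagonal a, off-diagonal b) by x."""
    n = len(a)
    y = [a[i] * x[i] for i in range(n)]
    for i in range(n - 1):
        y[i] += b[i] * x[i + 1]
        y[i + 1] += b[i] * x[i]
    return y


def max_eigenpair(a, b, iters=500):
    """Largest eigenvalue and a unit eigenvector of the tridiagonal matrix (a, b).

    Assumption: shifted power iteration is used purely as a stand-in solver.
    """
    n = len(a)
    d = flip_signs(b)
    b_abs = [abs(bi) for bi in b]
    # Gershgorin bound plus one, so that T + s*I is positive definite and its
    # dominant eigenvalue corresponds to the largest eigenvalue of T.
    s = 1.0 + max(
        abs(a[i])
        + (b_abs[i - 1] if i > 0 else 0.0)
        + (b_abs[i] if i < n - 1 else 0.0)
        for i in range(n)
    )
    x = [1.0] * n
    for _ in range(iters):
        y = tridiag_matvec(a, b_abs, x)
        y = [yi + s * xi for yi, xi in zip(y, x)]
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    # Rayleigh quotient on the sign-flipped matrix gives the eigenvalue.
    tx = tridiag_matvec(a, b_abs, x)
    lam = sum(xi * ti for xi, ti in zip(x, tx))
    # Map the eigenvector back to the original matrix: v = D x.
    v = [di * xi for di, xi in zip(d, x)]
    return lam, v
```

For instance, `max_eigenpair([2.0, 2.0, 2.0], [-1.0, -1.0])` operates on the matrix with off-diagonals `[1.0, 1.0]` internally and then restores the signs of the eigenvector. Zero off-diagonal entries split the matrix into independent blocks, which this sketch does not treat separately.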
Funding: This research was supported by the Natural Science Foundation of China under grant number 61872422 and the Natural Science Foundation of Zhejiang Province, China, under grant number LY19F020028.
Abstract: Given that real applications often require solving many L1-minimization (L1-min) problems concurrently, we investigate how to solve them in parallel on GPUs. First, we propose a novel self-adaptive warp implementation of the matrix-vector multiplication (Ax) and a novel self-adaptive thread implementation of the matrix-vector multiplication (A^T x) on the GPU. Vector-operation and inner-product decision trees are adopted to choose the optimal vector-operation and inner-product kernels for vectors of any size. Second, based on these kernels, the iterative shrinkage-thresholding algorithm is used to build two concurrent L1-min solvers, one organized around streams and one around thread blocks on a GPU, and their performance is optimized with newer GPU features such as the shuffle instruction and the read-only data cache. Finally, we design a concurrent L1-min solver on multiple GPUs. The experimental results validate the effectiveness and good performance of the proposed methods.
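To show where the two matrix-vector products enter, here is a minimal sequential sketch of the iterative shrinkage-thresholding algorithm. It assumes the unconstrained form min 0.5*||Ax - b||^2 + lam*||x||_1, which the abstract does not specify, and it runs on the CPU in plain Python rather than on a GPU. All function names are hypothetical.

```python
import math


def matvec(A, x):
    """Compute A x for a dense matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r without forming the transpose."""
    out = [0.0] * len(A[0])
    for row, ri in zip(A, r):
        for j, aij in enumerate(row):
            out[j] += aij * ri
    return out


def soft_threshold(v, t):
    """Shrink every entry of v toward zero by t (the shrinkage step)."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def ista(A, b, lam, iters=500):
    """Approximately minimize 0.5*||Ax - b||^2 + lam*||x||_1 (A must be nonzero)."""
    # The squared Frobenius norm bounds the largest eigenvalue of A^T A from
    # above, so 1/L is a safe (if conservative) step size.
    L = sum(aij * aij for row in A for aij in row)
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [axi - bi for axi, bi in zip(matvec(A, x), b)]  # uses A x
        grad = rmatvec(A, residual)                                # uses A^T r
        x = soft_threshold([xi - gi / L for xi, gi in zip(x, grad)], lam / L)
    return x
```

Each iteration costs one product with A and one with A^T plus a few vector operations, which is why those kernels dominate the running time. In the concurrent setting many such independent solves would run side by side.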