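Abstract: For the structural optimization problem of Lennard-Jones (LJ) clusters, a new unbiased optimization algorithm is proposed on the basis of previous work: the DLS-TPIO (dynamic lattice searching method with two-phase local search and interior operation) algorithm. Optimization runs were carried out on 666 instances: LJ2-650, LJ660, and LJ665-680. For every one of these instances, the potential energy of the configuration found matches the best record published in the Cambridge Cluster Database. For the two instances LJ533 and LJ536, the potential energy reached is better than the previous best record. The DLS-TPIO algorithm combines an interior operation, a two-phase local search method, and the dynamic lattice searching method. In the first stage of the optimization, the interior operation moves several higher-energy surface atoms into the interior of the cluster, which lowers the cluster's energy and gradually makes its configuration ordered. At the same time, the two-phase local search method guides the search into more promising regions of the configuration space. This significantly improves the success rate of the algorithm. In the second stage, the dynamic lattice searching method is used to further optimize the positions of the surface atoms and lower the cluster's energy once more. In addition, the paper gives a simple new method for identifying the central atom of an icosahedral configuration. Compared with several well-known unbiased optimization algorithms from the literature, the DLS-TPIO algorithm achieves higher computing speed and a higher success rate.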
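The quantities this abstract refers to can be illustrated with a minimal sketch. It assumes reduced units (epsilon = sigma = 1) and the standard pairwise LJ form. The helper names `lj_energy`, `per_atom_energy`, and `move_worst_atom_inward` are hypothetical and are not taken from the paper. The last function only gestures at the idea of relocating a high-energy atom toward the interior. It omits the local minimization that would follow such a move, and it is not the authors' interior operation.

```python
import numpy as np

def lj_energy(coords):
    """Total LJ potential energy in reduced units (epsilon = sigma = 1)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    r2 = np.sum(diff * diff, axis=-1)
    iu = np.triu_indices(len(coords), k=1)   # count each pair once
    inv6 = 1.0 / r2[iu] ** 3                 # r^-6 for every pair
    return float(np.sum(4.0 * (inv6 * inv6 - inv6)))

def per_atom_energy(coords):
    """Energy share of each atom: half of every pair term it takes part in."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    r2 = np.sum(diff * diff, axis=-1)
    np.fill_diagonal(r2, np.inf)             # exclude self-interaction
    inv6 = 1.0 / r2 ** 3
    return 0.5 * np.sum(4.0 * (inv6 * inv6 - inv6), axis=1)

def move_worst_atom_inward(coords, rng, radius=0.5):
    """Hypothetical interior move: put the highest-energy atom near the centroid.

    The `radius` parameter is an assumption made for illustration only.
    """
    coords = np.array(coords, dtype=float)
    worst = int(np.argmax(per_atom_energy(coords)))
    center = np.delete(coords, worst, axis=0).mean(axis=0)
    coords[worst] = center + rng.uniform(-radius, radius, size=3)
    return coords, worst
```

In a full search, a move of this kind would be followed by a local minimization and an acceptance test, neither of which is shown here.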
Abstract: In this paper we discuss degeneracy in nonlinear programming with linear constraints, and give a technique for dealing with degeneracy in a general model of reduced gradient algorithms. Under the assumption that the objective function is continuously differentiable, we prove that either the iterative sequence {x_k} generated by the method terminates at a Kuhn-Tucker point after a finite number of iterations, or any cluster point of the sequence {x_k} is a Kuhn-Tucker point.
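For orientation, the terms in this abstract can be written out under one common formulation. The standard form below (equality constraints with nonnegative variables) and the symbols $A$, $b$, $B$, $N$, $\lambda$, $\mu$, and $r_N$ are assumptions made for illustration, since the abstract does not state the paper's exact model.

```latex
\begin{aligned}
&\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad Ax = b,\; x \ge 0, \\[4pt]
&\text{Kuhn-Tucker point } x^*:\quad
\nabla f(x^*) - A^{\top}\lambda - \mu = 0,\quad Ax^* = b,\quad x^* \ge 0,\quad
\mu \ge 0,\quad \mu^{\top}x^* = 0, \\[4pt]
&\text{Reduced gradient for } A = [\,B \;\; N\,],\ B \text{ nonsingular}:\quad
r_N = \nabla_N f(x) - N^{\top}B^{-\top}\nabla_B f(x).
\end{aligned}
```

In this setting, a point is usually called degenerate when some basic variable, that is, a component of $x_B$, is equal to zero, so that a step along the reduced gradient direction may have zero length.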
Funding: funded by Hewlett-Packard (Project "Improving UPC Usability and Performance in Constellation Systems: Implementation/Extensions of UPC Libraries"); partially supported by the Ministry of Science and Innovation of Spain under Project No. TIN2010-16735 and by the Galician Government (Consolidation of Competitive Research Groups, Xunta de Galicia ref. 2010/6).
Abstract: Unified Parallel C (UPC) is a parallel extension of ANSI C based on the Partitioned Global Address Space (PGAS) programming model, which provides a shared memory view that simplifies code development while taking advantage of the scalability of distributed memory architectures. UPC therefore allows programmers to write parallel applications for hybrid shared/distributed memory architectures, such as multi-core clusters, more productively, accessing remote memory through high-level language constructs such as assignments to shared variables or collective primitives. However, the standard UPC collectives library includes only eight basic primitives with quite limited functionality. This work presents the design and implementation of extended UPC collective functions that overcome the limitations of the standard collectives library, allowing, for example, the use of a specific source and destination thread or the definition of the amount of data transferred by each particular thread. The library fulfills demands made by the UPC developer community and implements portable algorithms that are independent of the specific UPC compiler/runtime being used. A representative set of these extended collectives has been evaluated using two applications and four kernels as case studies. The results confirm that the new library provides easier programming without trading off performance, thus achieving high productivity in parallel programming while harnessing the performance of hybrid shared/distributed memory architectures in high performance computing.
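The two extensions this abstract names, a specific destination thread and a per-thread amount of data, can be pictured with a small single-process simulation. This is only a sketch in plain Python. `gather_variable` is a hypothetical helper, not a function of the UPC collectives library or of the extended library described here, and the "threads" are simply list entries.

```python
from itertools import accumulate

def gather_variable(per_thread_data, counts, dst_thread):
    """Simulate a gather where thread i contributes its first counts[i] elements
    and only dst_thread receives the result (hypothetical helper, not a UPC API)."""
    if len(per_thread_data) != len(counts):
        raise ValueError("one count per thread is required")
    if not 0 <= dst_thread < len(per_thread_data):
        raise ValueError("dst_thread out of range")
    # Offset of each thread's chunk inside the destination buffer.
    displs = [0] + list(accumulate(counts))[:-1]
    gathered = []
    for data, n in zip(per_thread_data, counts):
        if n > len(data):
            raise ValueError("count exceeds available data")
        gathered.extend(data[:n])
    # Every simulated thread except the destination ends up with no buffer.
    result = [gathered if t == dst_thread else None for t in range(len(counts))]
    return result, displs
```

For example, calling `gather_variable([[1, 2, 3], [4, 5], [6, 7, 8]], [2, 1, 3], 1)` places the six selected elements on simulated thread 1 only, with each thread's chunk starting at the offset given in `displs`.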