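Abstract: To address the low classification accuracy on multi-class imbalanced data, a support vector machine learning algorithm based on space spreading (SS-SVM) is proposed. Following the space-spreading principle, minority-class data are upsampled by spreading them in multidimensional Euclidean space, which reduces the influence of small blocks during data processing. The degree of imbalance is reduced to optimize the classifier group, and SVM classifiers are then trained on the expanded dataset. Experimental results on standard datasets show that, compared with several classical algorithms, SS-SVM achieves satisfactory results on multi-class imbalanced data classification. It is particularly effective for problems that demand high classification accuracy on the minority classes.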
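The abstract does not spell out the space-spreading rule. The sketch below therefore substitutes a different, well-known technique: SMOTE-style interpolation oversampling. It illustrates the general step of synthesizing minority samples in Euclidean space before SVM training. It is not the SS-SVM method. The function name `oversample_minority` and its parameters `k` and `seed` are hypothetical.

```python
import math
import random
from collections import Counter


def oversample_minority(X, y, k=3, seed=0):
    """SMOTE-style oversampling: grow every class to the size of the largest one.

    Illustrative stand-in only; this is NOT the space-spreading rule of SS-SVM.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, count in counts.items():
        members = [list(x) for x, lab in zip(X, y) if lab == label]
        if count == target or len(members) < 2:
            continue  # already the largest class, or no neighbour to interpolate with
        for _ in range(target - count):
            i = rng.randrange(len(members))
            base = members[i]
            # k nearest same-class neighbours of the chosen point (excluding itself)
            others = [m for j, m in enumerate(members) if j != i]
            neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
            partner = rng.choice(neighbours)
            t = rng.random()  # random position on the segment base -> partner
            X_out.append([b + t * (p - b) for b, p in zip(base, partner)])
            y_out.append(label)
    return X_out, y_out


# Hypothetical three-class toy data with imbalance 6 : 3 : 2
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 2], [2, 0],  # class "a"
     [5, 5], [6, 5], [5, 6],                           # class "b"
     [9, 0], [9, 1]]                                   # class "c"
y = ["a"] * 6 + ["b"] * 3 + ["c"] * 2
X_bal, y_bal = oversample_minority(X, y)
print(Counter(y_bal))
```

The balanced set `X_bal, y_bal` would then be passed to any multi-class SVM trainer. Interpolating between same-class neighbours keeps the synthetic points inside the region the minority class already occupies.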
Funding: Project supported by the National Key Research and Development Program of China (Grant No. 2023YFF1204402), the National Natural Science Foundation of China (Grant Nos. 12074079 and 12374208), the Natural Science Foundation of Shanghai (Grant No. 22ZR1406800), and the China Postdoctoral Science Foundation (Grant No. 2022M720815).
Abstract: The rapid advancement and broad application of machine learning (ML) have driven a groundbreaking revolution in computational biology. One of the most cutting-edge and important applications of ML is its integration with molecular simulations to improve the efficiency of sampling the vast conformational space of large biomolecules. This review focuses on recent studies that use ML-based techniques to explore the protein conformational landscape. We first highlight recent developments in ML-aided enhanced sampling methods. These include heuristic algorithms and neural networks designed either to refine the selection of reaction coordinates for constructing bias potentials or to facilitate exploration of unsampled regions of the energy landscape. Further, we review autoencoder-based methods that combine molecular simulations and deep learning to expand the search for protein conformations. Lastly, we discuss cutting-edge methodologies for the one-shot generation of protein conformations with precise Boltzmann weights. Collectively, this review demonstrates the promising potential of machine learning to revolutionize our insight into the complex conformational ensembles of proteins.
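As generic background, not a method taken from the review, here is what "constructing a bias potential along a reaction coordinate" typically looks like. In metadynamics-style schemes, Gaussian hills are accumulated at visited values of a one-dimensional collective variable (CV). When a simulation is run under a fixed bias V, unbiased Boltzmann weights are recovered by reweighting each frame with exp(βV).

This is a minimal sketch under those assumptions. The names `gaussian_bias` and `reweight`, the hill parameters, and the visited CV values are all hypothetical.

```python
import math


def gaussian_bias(s, centers, height=1.0, width=0.1):
    """Metadynamics-style bias V(s) = sum_i height * exp(-(s - s_i)^2 / (2 * width^2))."""
    return sum(height * math.exp(-((s - c) ** 2) / (2 * width ** 2)) for c in centers)


def reweight(bias_values, beta=1.0):
    """Normalized weights proportional to exp(beta * V) for frames sampled under a FIXED bias V."""
    vmax = max(bias_values)  # shift the exponent for numerical stability
    raw = [math.exp(beta * (v - vmax)) for v in bias_values]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical visited CV values; one hill is deposited at each of them
visited = [0.0, 0.05, 0.1, 0.0]
print(gaussian_bias(0.0, visited))
print(reweight([0.0, math.log(2.0)]))  # the second frame gets twice the weight of the first
```

The hills raise the free energy of regions that have already been visited, which pushes the system toward unsampled regions. The ML methods surveyed in the review aim mainly at choosing a better CV `s` for this construction.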
Funding: Funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) (No. RGPIN-2014-04100).
Abstract: Sampling design (SD) plays a crucial role in providing reliable input for digital soil mapping (DSM) and increasing its efficiency. Given a predetermined sample size, and taking budget and spatial variability into account, an SD is a selection procedure for identifying a set of sample locations that are spread over geographical space or that provide good feature space coverage. Good feature space coverage ensures accurate estimation of regression parameters, while spatial coverage contributes to effective spatial interpolation. First, we review several statistical and geometric SDs that mainly optimize the sampling pattern in geographical space and illustrate their strengths and weaknesses in terms of spatial coverage, simplicity, accuracy, and efficiency. Furthermore, Latin hypercube sampling, which obtains a full representation of the multivariate distribution in geographical space, is described in detail, covering its development, improvement, and application. In addition, we discuss fuzzy k-means sampling, response surface sampling, and Kennard-Stone sampling, which optimize sampling patterns in feature space. We then discuss practical applications, which are mainly addressed by conditioned Latin hypercube sampling because it offers the flexibility and feasibility of adding multiple optimization criteria. We also discuss different validation methods, an important stage of DSM, and conclude that an independent dataset selected by probability sampling is superior because it is free of model assumptions. For future work, we recommend: 1) exploring SDs with both good spatial coverage and good feature space coverage; 2) uncovering the real impacts of an SD on the entire DSM procedure; and 3) testing the feasibility and contribution of SDs in three-dimensional (3D) DSM with variability across multiple layers.
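Two of the designs named above reduce to short algorithms, sketched here in the textbook form of each.

- **Plain Latin hypercube sampling** (not the conditioned variant) splits each of the d dimensions into n equal strata and places exactly one sample in each stratum of each dimension.
- **Kennard-Stone sampling** starts from the two most distant candidates. It then repeatedly adds the candidate whose nearest already-selected point is farthest away.

Assumptions: coordinates are taken as already scaled to the unit hypercube (or to comparable feature units), and the function names are hypothetical.

```python
import math
import random


def latin_hypercube(n, d, seed=0):
    """Plain LHS in [0, 1)^d: one point per stratum [s/n, (s+1)/n) in every dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(d):
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        columns.append([(s + rng.random()) / n for s in strata])
    return [[columns[j][i] for j in range(d)] for i in range(n)]


def kennard_stone(points, m):
    """Return indices of m points chosen by the Kennard-Stone max-min rule."""
    n = len(points)
    if not 2 <= m <= n:
        raise ValueError("need 2 <= m <= len(points)")
    # Seed the selection with the most distant pair of candidates
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda p: math.dist(points[p[0]], points[p[1]]))
    selected = [i0, j0]
    while len(selected) < m:
        remaining = [i for i in range(n) if i not in selected]
        # Pick the candidate farthest from its nearest selected point
        nxt = max(remaining,
                  key=lambda i: min(math.dist(points[i], points[s]) for s in selected))
        selected.append(nxt)
    return selected


design = latin_hypercube(10, 3, seed=42)
print(kennard_stone(design, 4))
```

In a DSM setting, `points` would be covariate vectors of candidate locations. This is why Kennard-Stone is grouped with the designs that target feature space coverage rather than geographic spread.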
Funding: Supported in part by the National Natural Science Foundation of China (51975236), the National Key Research and Development Program of China (2018YFA0703203), and the Innovation Project of Optics Valley Laboratory (OVL2021BG007).
Abstract: The sampling process of sampling-based motion planning algorithms is very inefficient because excessive random samples are generated in the planning space. In this paper, we propose an adaptive space expansion (ASE) approach, which belongs to the informed sampling category, to improve sampling efficiency so that a feasible path is found quickly. The ASE method enlarges the search space gradually and restricts the sampling process to a sequence of small hyper-ellipsoid ring subsets to avoid exploring unnecessary space. Specifically, if the algorithm cannot find a feasible path in a constructed small hyper-ellipsoid ring subset, the subset is expanded. Thus, the ASE method alternates between space exploration and space expansion until the final path is found. In addition, we present a particular construction of the hyper-ellipsoid ring from which uniform random samples can be generated directly. Finally, we present a feasible motion planner, BiASE, and an asymptotically optimal motion planner, BiASE*, both built on the bidirectional exploring method and the ASE strategy. Simulations demonstrate that their computation speed is much faster than that of state-of-the-art algorithms. The source code is available at https://github.com/shshlei/ompl.
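A hyper-ellipsoid ring can be pictured concretely in 2-D. With foci at the start and the goal, the ring is the set of points x such that c_inner ≤ |x − start| + |x − goal| ≤ c_outer. The paper's own construction samples this ring directly. The sketch below instead uses a simpler, openly substituted technique:

1. Draw a uniform point in the outer ellipse by mapping a uniform unit-disk sample.
2. Reject it if it falls inside the inner ellipse.

Rejection keeps the accepted samples uniform over the ring but wastes some draws. The function name and the 2-D restriction are assumptions made for illustration.

```python
import math
import random


def sample_ellipse_ring(start, goal, c_inner, c_outer, rng):
    """Uniform 2-D sample from {x : c_inner <= |x-start| + |x-goal| <= c_outer} by rejection."""
    c_min = math.dist(start, goal)
    if not c_min <= c_inner < c_outer:
        raise ValueError("need dist(start, goal) <= c_inner < c_outer")
    a = c_outer / 2.0                                 # semi-major axis along the focal line
    b = math.sqrt(c_outer ** 2 - c_min ** 2) / 2.0    # semi-minor axis
    cx, cy = (start[0] + goal[0]) / 2.0, (start[1] + goal[1]) / 2.0
    theta = math.atan2(goal[1] - start[1], goal[0] - start[0])
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    while True:
        # Uniform point in the unit disk, stretched to the outer ellipse and rotated
        r = math.sqrt(rng.random())
        phi = 2.0 * math.pi * rng.random()
        u, v = a * r * math.cos(phi), b * r * math.sin(phi)
        x = (cx + cos_t * u - sin_t * v, cy + sin_t * u + cos_t * v)
        # Reject points inside the inner ellipse (same foci, smaller distance sum)
        if math.dist(x, start) + math.dist(x, goal) >= c_inner:
            return x


rng = random.Random(1)
ring_samples = [sample_ellipse_ring((0.0, 0.0), (4.0, 0.0), 5.0, 6.0, rng) for _ in range(200)]
print(ring_samples[:3])
```

In an ASE-like loop, a planner would draw from one ring and widen it (raise `c_outer`, with the old outer bound becoming the new `c_inner`) whenever no feasible path is found there.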
Funding: This work is supported in part by the National Natural Science Foundation of China (10771190, 10801136), the Mathematical Tianyuan Foundation of China NSF (10526036), the China Postdoctoral Science Foundation (20060391063), and the Natural Science Foundation of Guangdong Province (07300434).
Abstract: A general A-P iterative algorithm in a shift-invariant space is presented. We use the algorithm to reconstruct signals from weighted samples and show that the general improved algorithm has a better convergence rate than the existing one. An explicit estimate for a guaranteed rate of convergence is given.
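The general algorithm is not spelled out in the abstract. To show how an approximation-projection iteration reconstructs a signal, the sketch below uses a deliberately simple discrete instance. All of its ingredients are assumptions:

- **Space:** N-periodic signals whose DFT is supported on |k| ≤ K. This is a shift-invariant space.
- **Samples:** plain point samples rather than general weighted samples.
- **Approximation operator A:** nearest-sample (piecewise-constant) interpolation.
- **Projection P:** truncation of the DFT.

The iteration is f_{k+1} = f_k + P A (f − f_k), where A only ever sees the sample values of the residual.

```python
import cmath
import math


def project_bandlimited(x, K):
    """Orthogonal projection onto N-periodic signals with DFT support |k| <= K."""
    N = len(x)
    kept = [k for k in range(N) if min(k, N - k) <= K]
    coeffs = {k: sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)) for k in kept}
    return [sum(c * cmath.exp(2j * math.pi * k * n / N) for k, c in coeffs.items()).real / N
            for n in range(N)]


def nearest_sample_interp(positions, values, N):
    """Piecewise-constant approximation: each grid point takes its nearest sample (circularly)."""
    def circ(n, p):
        return min(abs(n - p), N - abs(n - p))
    return [values[min(range(len(positions)), key=lambda i: circ(n, positions[i]))]
            for n in range(N)]


def ap_reconstruct(positions, samples, N, K, iterations=60):
    """Iterate f <- f + P(A(samples of the residual)), starting from f = 0."""
    f = [0.0] * N
    for _ in range(iterations):
        residual = [s - f[p] for p, s in zip(positions, samples)]
        correction = project_bandlimited(nearest_sample_interp(positions, residual, N), K)
        f = [a + b for a, b in zip(f, correction)]
    return f


N, K = 64, 3
true = [math.cos(2 * math.pi * n / N) + 0.5 * math.sin(2 * math.pi * 3 * n / N) for n in range(N)]
positions = sorted({(3 * j + j % 2) % N for j in range(N // 3)})  # irregular grid, max gap 4
samples = [true[p] for p in positions]
rec = ap_reconstruct(positions, samples, N, K)
print(max(abs(a - b) for a, b in zip(rec, true)))
```

Convergence here rests on a standard contraction argument. The sampling gaps are small relative to the bandwidth, so ‖(I − PA)g‖ < ‖g‖ for every g in the space, and the error shrinks geometrically. The paper's explicit rate estimate addresses the general weighted-sample setting, which this toy does not reproduce.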