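Oversampling and undersampling are common ways to handle classification on imbalanced datasets, but relying on a single sampling algorithm may cause overfitting on minority-class samples or discard samples that carry important information. This paper proposes SVM_HS (hybrid sampling algorithm based on support vector machine), a hybrid sampling algorithm based on the separating hyperplane, which aims to overcome the tendency of the SVM separating hyperplane to skew toward the minority class when the data are imbalanced. The algorithm first trains an SVM to obtain the separating hyperplane. It then performs hybrid sampling iteratively, which mainly consists of: (1) deleting some majority-class samples that lie far from the separating hyperplane; (2) oversampling the minority-class samples near the true class boundary with SMOTE (synthetic minority oversampling technique), so that the separating hyperplane shifts toward the true class boundary. Experimental results show that, compared with other related algorithms, the algorithm achieves considerably higher F-value and G-mean.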
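One round of the two sampling steps described in this abstract might look roughly like the sketch below. It is not the authors' implementation: it assumes a linear decision function `w·x + b` is already available from a trained SVM, and it approximates "near the true class boundary" by "closest to the current hyperplane". The function names and the parameters `drop_frac`, `n_new`, and `k` are hypothetical.

```python
import math
import random

def decision_value(x, w, b):
    """Signed score of a linear classifier; |score| grows with distance from the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def smote_point(x, neighbor, rng):
    """SMOTE interpolation: a random point on the segment between x and a minority neighbor."""
    gap = rng.random()
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]

def hybrid_round(majority, minority, w, b, drop_frac=0.2, n_new=5, k=2, seed=0):
    """One hybrid sampling round (assumed form): undersample majority, oversample minority."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)

    # (1) Undersample: drop the majority samples farthest from the hyperplane.
    by_distance = sorted(majority, key=lambda x: abs(decision_value(x, w, b)))
    n_keep = len(majority) - int(drop_frac * len(majority))
    kept_majority = by_distance[:n_keep]

    # (2) Oversample: seed SMOTE from the minority samples closest to the hyperplane
    #     (assumption: the closest half of the minority class).
    n_seed = max(1, len(minority) // 2)
    near_boundary = sorted(minority, key=lambda x: abs(decision_value(x, w, b)))[:n_seed]
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(near_boundary)
        others = [m for m in minority if m is not x]
        neighbors = sorted(others, key=lambda m: math.dist(x, m))[:k]
        synthetic.append(smote_point(x, rng.choice(neighbors), rng))
    return kept_majority, minority + synthetic
```

In the iterative scheme the abstract describes, the SVM would be retrained on the resampled data after each round, so `w` and `b` would change between calls.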
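Random forest is an ensemble algorithm composed of multiple decision trees, built with Bagging sampling and a random feature-subset splitting strategy. Compared with other classification algorithms, random forest offers higher classification accuracy, lower generalization error, and fast training, and it has therefore been widely applied in data mining. However, its classification performance is severely limited on data that are both high-dimensional and imbalanced. To handle high-dimensional imbalanced data better, this paper proposes an improved random forest algorithm based on hybrid sampling and feature selection (Hybrid Sampling & Feature Selection Random Forest, HF_RF). At the data level, the algorithm preprocesses the high-dimensional imbalanced dataset by combining SMOTE with random undersampling, and it introduces a clustering algorithm to improve SMOTE and to handle negative-class samples better. At the algorithm level, it uses the ReliefF algorithm to assign different weights to the features of the balanced high-dimensional data, removing irrelevant and redundant features to reduce dimensionality. Finally, it adopts weighted voting to further improve classification performance. Experimental results show that the improved algorithm scores higher than the original on every evaluation metric for high-dimensional imbalanced data, demonstrating that HF_RF classifies such data better than the traditional random forest algorithm.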
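The weighted voting step can be illustrated with a small sketch. The abstract does not say how the tree weights are derived, so the weights below are hypothetical per-tree validation scores, and `weighted_vote` is an illustrative name rather than part of HF_RF.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across trees."""
    if len(predictions) != len(weights):
        raise ValueError("need one weight per tree prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

# Three trees vote; the weights are hypothetical validation scores.
tree_predictions = ["majority", "majority", "minority"]
tree_weights = [0.30, 0.35, 0.90]
print(weighted_vote(tree_predictions, tree_weights))
```

With these numbers the two weak trees total 0.65 against 0.90 for the single strong tree, so the weighted vote returns `minority` where an unweighted majority vote would return `majority`.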
This paper focuses on the use of models for increasing the precision of estimators in large-area forest surveys. It is motivated by the increasing availability of remotely sensed data, which facilitates the development of models predicting the variables of interest in forest surveys. We present, review, and compare three estimation frameworks in which models play a core role: model-assisted, model-based, and hybrid estimation. The first two are well known, whereas the third has only recently been introduced in forest surveys. Hybrid inference mixes design-based and model-based inference, since it relies on a probability sample of auxiliary data and a model predicting the target variable from the auxiliary data. We review studies on large-area forest surveys based on model-assisted, model-based, and hybrid estimation, and discuss the advantages and disadvantages of each approach. We conclude that no general recommendation can be made about whether model-assisted, model-based, or hybrid estimation should be preferred. The choice depends on the objective of the survey and on the possibilities for acquiring appropriate field and remotely sensed data. We also conclude that modelling approaches can be applied successfully only to target variables, such as growing stock volume or biomass, that are adequately related to commonly available remotely sensed data; purely field-based surveys therefore remain important for several forest parameters.
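The difference between the three frameworks can be made concrete with simplified estimators of a population mean. This is a sketch under stated assumptions, not the paper's formulation: sampling is taken to be equal-probability, the model predictions are given as inputs, and the function names are hypothetical. Variance estimation, where the frameworks differ most, is left out.

```python
from statistics import fmean

def model_based_mean(pred_population):
    """Synthetic estimator: mean of model predictions over every population unit."""
    return fmean(pred_population)

def model_assisted_mean(pred_population, y_sample, pred_sample):
    """Difference estimator: population prediction mean corrected by the mean
    residual observed on the field sample (assumes simple random sampling)."""
    residuals = [y - p for y, p in zip(y_sample, pred_sample)]
    return fmean(pred_population) + fmean(residuals)

def hybrid_mean(pred_aux_sample):
    """Hybrid estimator: mean of model predictions over a probability sample
    of auxiliary data instead of wall-to-wall coverage."""
    return fmean(pred_aux_sample)
```

The model-assisted form keeps a design-based correction term, which is why it needs field observations from a probability sample, whereas the model-based and hybrid forms rest entirely on the model's predictions.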
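As engineering problems become increasingly nonlinear, computationally complex, and high-dimensional, and as low-cost, high-fidelity simulation models are increasingly demanded, multi-objective optimization design of engineering structures under multidisciplinary coupling has become markedly harder to solve and computationally expensive, which has attracted extensive research. To address this challenge, this paper proposes a method for multi-objective optimization design of engineering structures based on a surrogate model built with hybrid-index adaptive sampling. To reduce the cost of optimization design, a hybrid-index adaptive sampling method based on Voronoi partitioning is proposed for building the global surrogate model; it balances global exploration and local exploitation of the design space and, in comparison tests across different cases and methods, significantly reduces the number of samples while preserving accuracy. To solve the multi-objective optimization problem, NSGA-Ⅱ-RD (improved non-dominated sorting genetic algorithm Ⅱ based on a rotation and density operator) is proposed, which uses a new crowding operator based on rotated projection of the dominant front and region partitioning; in comparison tests against other algorithms, it converges faster and yields accurate results. Finally, the proposed hybrid-index sampling surrogate modeling method is combined with NSGA-Ⅱ-RD and applied to the structural design of an insulated-gate bipolar transistor busbar, with multi-objective optimization of the busbar's mass, circuit voltage drop, and fatigue damage. The results show that the method not only keeps the busbar lightweight with good electrical conductivity but also gives it better resistance to ultrasonic-welding fatigue. They also confirm that the method can effectively solve multi-objective optimization design problems in practical engineering while maintaining a low-cost, high-accuracy simulation model.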
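For context on what the new crowding operator replaces, the sketch below computes the standard NSGA-II crowding distance for one non-dominated front. It is the baseline operator only; the rotation and region-partitioning operator of NSGA-Ⅱ-RD is not specified in the abstract and is not reproduced here. The function name is illustrative.

```python
def crowding_distance(front):
    """Standard NSGA-II crowding distance for a list of objective vectors (one front)."""
    n = len(front)
    if n == 0:
        return []
    distance = [0.0] * n
    for m in range(len(front[0])):
        # Sort solution indices by the m-th objective.
        order = sorted(range(n), key=lambda i: front[i][m])
        lo, hi = front[order[0]][m], front[order[-1]][m]
        # Boundary solutions get infinite distance so they are always preferred.
        distance[order[0]] = distance[order[-1]] = float("inf")
        if hi == lo:
            continue  # all solutions share this objective value; it adds nothing
        for rank in range(1, n - 1):
            i = order[rank]
            gap = front[order[rank + 1]][m] - front[order[rank - 1]][m]
            distance[i] += gap / (hi - lo)
    return distance
```

A larger value means a solution sits in a sparser part of the front, so selection favors it to keep the front well spread.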