Funding: Sponsored by the National Key Research and Development Program of China (2021YFB3802100, 2021YFB3802105), the Major Project of Sichuan Science and Technology Department (2022ZDZX0029), and the Miaozi Project of Sichuan Science and Technology Department (2023JDRC0097).
Abstract: Biomaterials with surface nanostructures effectively enhance protein secretion and stimulate tissue regeneration. When nanoparticles (NPs) enter a living system, they quickly interact with proteins in the body fluid, forming the protein corona (PC). Accurate prediction of the PC composition is critical for analyzing the osteoinductivity of biomaterials and guiding the reverse design of NPs, but it remains a significant challenge. Although several machine learning (ML) models such as Random Forest (RF) have been used for PC prediction, they often fail to account for the extreme values in the abundance range of PC adsorption, and their accuracy is limited by the imbalanced data distribution. In this study, resampling embedding was introduced to resolve the imbalanced distribution of PC data. Various ML models were evaluated, and an RF model was ultimately used for prediction, yielding good coefficient of determination (R²) and root-mean-square error (RMSE) values. Ablation experiments showed that the proposed method achieved an R² of 0.68, an improvement of approximately 10%, and an RMSE of 0.90, a reduction of approximately 10%. Furthermore, in verification by label-free quantification of four NPs, hydroxyapatite (HA), titanium dioxide (TiO₂), silicon dioxide (SiO₂), and silver (Ag), a prediction performance of R² > 0.70 was achieved using Random Oversampling. Feature analysis additionally revealed that the composition of the PC is most strongly influenced by the incubation plasma concentration, PDI, and surface modification.
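The abstract does not spell out how the resampling is applied to a continuous target, so the following is only a minimal sketch of one common reading: bin the target values, then randomly duplicate samples from sparse bins until every non-empty bin is as large as the most populated one. The function names (`random_oversample`, `r2_score`, `rmse`) and the `n_bins` parameter are assumptions made for illustration, not the authors' implementation, and the metric helpers simply restate the standard definitions of R² and RMSE.

```python
import math
import random
from collections import defaultdict


def random_oversample(X, y, n_bins=5, seed=0):
    """Duplicate rows from sparse target bins until every non-empty bin
    is as large as the most populated bin (hypothetical helper)."""
    rng = random.Random(seed)
    lo, hi = min(y), max(y)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all targets are equal
    bins = defaultdict(list)
    for i, target in enumerate(y):
        b = min(int((target - lo) / width), n_bins - 1)
        bins[b].append(i)
    largest = max(len(idx) for idx in bins.values())
    chosen = []
    for idx in bins.values():
        chosen.extend(idx)  # keep every original sample
        chosen.extend(rng.choice(idx) for _ in range(largest - len(idx)))
    return [X[i] for i in chosen], [y[i] for i in chosen]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy data: eight low-abundance samples and two rare high-abundance ones.
X = [[i] for i in range(10)]
y = [0.1] * 8 + [5.0, 9.9]
X_bal, y_bal = random_oversample(X, y)
```

In practice the resampling would normally be applied to the training split only, so that duplicated rows never leak into the data used to report R² and RMSE.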
Abstract: Imbalanced data classification is one of the major problems in machine learning. An imbalanced dataset typically has significant differences in the number of samples between its classes. In most cases, the performance of a machine learning algorithm such as the Support Vector Machine (SVM) degrades on an imbalanced dataset: classification accuracy is skewed toward the majority class, and minority-class samples are predicted poorly. This paper proposes a hybrid approach that combines a data preprocessing technique with an SVM algorithm tuned by an improved Simulated Annealing (SA) algorithm. First, a data preprocessing technique is proposed as the resampling strategy for handling imbalanced datasets. In this technique, data are first synthetically generated to equalize the number of samples between classes, followed by a reduction step that removes redundant and duplicated data. Next, an SVM is trained on the balanced dataset. Because this algorithm requires an iterative search for the best penalty parameter during training, an improved SA algorithm is proposed for this task. The improvement introduces a new acceptance criterion for candidate solutions in the SA algorithm to enhance the accuracy of the optimization process. Experiments on ten publicly available imbalanced datasets demonstrated higher classification accuracy for the proposed approach than for the conventional implementation of SVM. An average accuracy of 89.65% on binary classification demonstrates the good performance of the proposed approach.
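To make the penalty-parameter search concrete, here is a minimal simulated-annealing sketch. It uses the standard Metropolis acceptance rule, because the abstract does not describe the paper's new acceptance criterion. The function `anneal_penalty`, its parameters, and the stand-in objective `toy_score` are all hypothetical; in a real pipeline `score(C)` would be something like the cross-validated accuracy of an SVM trained on the balanced data with penalty `C`.

```python
import math
import random


def anneal_penalty(score, c_init=1.0, t_init=1.0, cooling=0.95, steps=200, seed=0):
    """Search for a penalty value C that maximizes score(C).

    Standard Metropolis acceptance is assumed here; it is not the
    improved acceptance criterion proposed in the paper.
    """
    rng = random.Random(seed)
    current = best = c_init
    current_score = best_score = score(c_init)
    temp = t_init
    for _ in range(steps):
        # Multiplicative proposal keeps C strictly positive.
        candidate = current * math.exp(rng.gauss(0.0, 0.5))
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with prob. exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
        temp *= cooling  # geometric cooling schedule
    return best, best_score


def toy_score(c):
    """Stand-in objective with its maximum at C = 10 (illustration only)."""
    return -(math.log10(c) - 1.0) ** 2


best_c, best_score = anneal_penalty(toy_score)
```

Searching in multiplicative steps reflects the common practice of tuning the SVM penalty on a logarithmic scale; the cooling rate and step size shown are arbitrary choices.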
Funding: This paper was supported by the National Natural Science Foundation of China (Grant No. 30370254).
Abstract: Direction-dependence, or anisotropy, of spatial distribution patterns of vegetation is rarely explored due to neglect of this ecological phenomenon and the paucity of methods dealing with this issue. This paper proposes a new approach to anisotropy analysis of spatial distribution patterns of plant populations on the basis of the data resampling technique (DRT) combined with Ripley's L index. Using the ArcView Geographic Information System (GIS) platform, a case study was carried out by selecting the population of Pinus massoniana from a needle- and broad-leaved mixed forest community in the Heishiding Nature Reserve, Guangdong Province. Results showed that the spatial pattern of the P. massoniana population was typically anisotropic, with different patterns in different directions. The DRT was found to be an effective approach to the anisotropy analysis of spatial patterns of plant populations. By resampling sub-datasets from the original dataset in different directions, we could overcome the difficulty in the direct use of current non-angular methods of pattern analysis.
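Abstract (translated from the Chinese): Because concise and effective analytical methods are lacking, few studies have reported on the anisotropy of vegetation spatial patterns. This paper proposes a new approach to anisotropy analysis of population patterns based on a data resampling technique combined with Ripley's L index and, on the ArcView GIS platform, presents a case study of the anisotropy of the distribution pattern of the Masson pine (Pinus massoniana) population in a needle- and broad-leaved mixed forest in the Heishiding Nature Reserve, Guangdong Province. The results show that the distribution pattern of the Masson pine population is typically anisotropic, with different patterns in different directions. The case study shows that sampling along typical directions with the data resampling technique resolves the lack of a direction parameter in existing pattern analysis methods; it is an effective and practical route to anisotropy analysis of population patterns.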
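The abstract does not specify how the directional sub-datasets are drawn, so the sketch below shows only one plausible reading: keep the points that fall inside a strip (transect) oriented at a chosen angle through the plot, then compute Ripley's L on that subset. The helper names `directional_subset` and `ripley_l`, the strip geometry, and the absence of edge correction are all assumptions for illustration. The estimator used is K(r) = A / (n(n-1)) times the number of ordered pairs within distance r, reported in the centred form sqrt(K(r)/π) - r.

```python
import math


def directional_subset(points, angle_deg, center, half_width):
    """Keep points within half_width of a line through `center`
    oriented at angle_deg (hypothetical resampling step)."""
    theta = math.radians(angle_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    cx, cy = center
    subset = []
    for x, y in points:
        # Signed perpendicular distance from the transect axis.
        offset = -(x - cx) * uy + (y - cy) * ux
        if abs(offset) <= half_width:
            subset.append((x, y))
    return subset


def ripley_l(points, r, area):
    """Centred Ripley's L at distance r, without edge correction."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    pairs = sum(
        1
        for i, p in enumerate(points)
        for q in points[i + 1:]
        if math.dist(p, q) <= r
    )
    k = area * 2 * pairs / (n * (n - 1))  # 2 * pairs = ordered pairs
    return math.sqrt(k / math.pi) - r


# Toy 5 x 5 grid of stems; resample along the 0-degree and 90-degree directions.
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
east_west = directional_subset(grid, 0, (2.0, 2.0), 0.5)
north_south = directional_subset(grid, 90, (2.0, 2.0), 0.5)
```

Values near zero are what complete spatial randomness would give, positive values suggest clustering, and negative values suggest regular spacing. Comparing the curves obtained for subsets drawn at different angles is then one way to look for anisotropy; the `area` argument must be the area of the resampled strip, not of the whole plot.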