This paper presents a fast hybrid fault location method for active distribution networks with distributed generation (DG) and microgrids. The method uses the voltage and current data from the measurement points at the main substation and at the connection points of DG and microgrids. The data is used in a single feedforward artificial neural network (ANN) to estimate the distances to the fault from all the measuring points. A k-nearest neighbors (KNN) classifier then interprets the ANN outputs and estimates a single fault location. Simulation results validate the accuracy of the fault location method under different fault conditions, including fault types, fault points, and fault resistances. The performance is also validated for non-synchronized measurements and measurement errors.
Hepatitis B virus (HBV)-induced liver failure is an emergent liver disease leading to high mortality. The severity of liver failure may be reflected by the profile of some metabolites. This study assessed the potential of using metabolites as biomarkers for liver failure by identifying metabolites with good discriminative performance for its phenotype. Serum samples from 24 HBV-induced liver failure patients and 23 healthy volunteers were collected and analyzed by gas chromatography-mass spectrometry (GC-MS) to generate metabolite profiles. The 24 patients were further grouped into two classes according to the severity of liver failure. Twenty-five commensal peaks in all metabolite profiles were extracted, and the relative area values of these peaks were used as features for each sample. Three algorithms, F-test, k-nearest neighbor (KNN), and fuzzy support vector machine (FSVM), each combined with exhaustive search (ES), were employed to identify a subset of metabolites (biomarkers) that best predicts liver failure. On the experimental dataset, FSVM-ES selected 6 features that achieved 93.62% predictive accuracy, and three key metabolites, glyceric acid, cis-aconitic acid, and citric acid, were identified as potential diagnostic biomarkers.
Accurate information about forest volumes is essential for forest management planning. The survey interval of the Forest Resource Inventory of China (FRIC) is too long to meet the demand for timely decision-making required for forest protection, management, and utilization. Analysis of satellite imagery provides good potential for more frequent reporting of forest parameters. In this study, we describe an application of the k-nearest neighbors (kNN) method to Landsat TM imagery for improving estimation of forest volumes. Several spectral features were tested and compared in forest volume estimations, including normalized difference vegetation index, environmental vegetation index, and the combination of the spectral features. The combined index resulted in the most accurate volume estimations. The kNN estimator and the combined index were then used in forest volume estimation. The estimation error (RMSE) of the total volume was 44.2%, much lower than those for Larix forest (the RMSE was 51.7%) and those for the Korean pine and broadleaved forests (the estimation errors were over 71.7% and 88.19%, respectively). This preliminary study demonstrates the potential of forest volume estimations with remote sensing data to provide useful information for forest management if only limited ground information is available.
In this paper, a memetic algorithm with competition (MAC) is proposed to solve the capacitated green vehicle routing problem (CGVRP). Firstly, a permutation array called the traveling salesman problem (TSP) route is used to encode the solution, and an effective decoding method to construct the CGVRP route is presented accordingly. Secondly, a k-nearest neighbor (kNN) based initialization is presented to make use of the location information of the customers. Thirdly, according to the characteristics of the CGVRP, the search operators in the variable neighborhood search (VNS) framework and the simulated annealing (SA) strategy are executed on the TSP route for all solutions. Moreover, the customer adjustment operator and the alternative fuel station (AFS) adjustment operator on the CGVRP route are executed for the elite solutions after competition. In addition, the crossover operator is employed to share information among different solutions. The effect of parameter setting is investigated using the Taguchi method of design-of-experiment to suggest suitable values. Numerical tests demonstrate the effectiveness of both the competitive search and the decoding method. Moreover, extensive comparative results show that the proposed algorithm is more effective and efficient than the existing methods in solving the CGVRP.
Short-term traffic flow forecasting is one of the core technologies for realizing traffic flow guidance. In this article, in view of the repetitive nature of traffic flow changes, a short-term traffic flow forecasting method based on a three-layer K-nearest neighbor non-parametric regression algorithm is proposed. Specifically, two screening layers based on shape similarity were introduced into the K-nearest neighbor non-parametric regression method, and the forecasting results were output using weighted averaging on the reciprocal values of the shape similarity distances together with the most-similar-point distance adjustment method. According to the experimental results, the proposed algorithm improves the predictive ability of the traditional K-nearest neighbor non-parametric regression method and greatly enhances the accuracy and real-time performance of short-term traffic flow forecasting.
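The reciprocal-distance weighting mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' three-layer algorithm: it assumes plain Euclidean distance in place of their shape similarity distance, omits both screening layers and the most-similar-point adjustment, and the names `knn_forecast`, `history`, and `eps` are hypothetical.

```python
import math

def knn_forecast(history, query, k=3, eps=1e-9):
    """Forecast the next value for `query` from (pattern, next_value) pairs.

    Assumption: Euclidean distance stands in for the shape similarity
    distance described in the abstract.
    """
    # Distance from the query pattern to every stored pattern.
    scored = sorted(
        (math.dist(pattern, query), next_value)
        for pattern, next_value in history
    )
    nearest = scored[:k]
    # Weight each neighbour by the reciprocal of its distance;
    # eps avoids division by zero for an exact match.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
```

A query that lies exactly between two stored patterns receives the plain average of their next values, while a query that coincides with a stored pattern is dominated by that pattern.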
Recent developments in wireless communication technologies and the popularity of smartphones are making location-based services (LBS) popular. However, sending queries to LBS servers with users' exact locations may threaten the privacy of users. Therefore, there has been much research on generating a cloaked query region for user privacy protection. Consequently, an efficient query processing algorithm for a query region is required. In this paper, we propose k-nearest neighbor (k-NN) query processing algorithms for a query region in road networks. To efficiently retrieve k-NN points of interest (POIs), we make use of the Island index. We also propose a method that generates an adaptive Island index to improve the query processing performance and storage usage. Finally, we show by our performance analysis that our k-NN query processing algorithms outperform the existing k-Range Nearest Neighbor (kRNN) algorithm in terms of network expansion cost and query processing time.
This paper proposes a novel grading method for apples, in an automated grading device that uses convolutional neural networks to extract the size, color, texture, and roundness of an apple. The developed machine learning method uses the ability of a convolutional neural network (CNN) to learn representative features in order to determine suitable features of apples for the grading process. This information is fed into a one-to-one classifier that uses a support vector machine (SVM) instead of the softmax output layer of the CNN. In this manner, Yantai apples with similar shapes and low discrimination are graded using four different approaches. The fusion model using both CNN and SVM classifiers is much more accurate than the simple k-nearest neighbor (KNN), SVM, and CNN models when used separately for grading, and the learning ability and the generalization ability of the model are correspondingly increased by the combined method. Grading tests are carried out using the automated grading device that is developed in the present work. It is verified that apple grading using the combined CNN-SVM model is fast and accurate, which greatly reduces the manpower and labor costs of manual grading and has important commercial prospects.
Early stroke prediction is vital to prevent damage. A stroke happens when the blood flow to the brain is disrupted by a clot or bleeding, resulting in brain death or injury. However, early diagnosis and treatment reduce long-term needs and lower health costs. This research aims to develop a machine-learning method for forecasting early warning signs of stroke. The methodology employed feature selection techniques and multiple algorithms. Using the XGBoost algorithm, the proposed model achieved an accuracy rate of 96.45%. This research shows that machine learning can effectively predict early warning signs of stroke, which can help reduce long-term treatment and rehabilitation needs and lower health costs.
Compositional data, which carry relative information, are a crucial aspect of machine learning and other related fields. They are typically recorded as closed data, that is, data that sum to a constant such as 100%. The statistical linear model is the most used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data are present. The linear regression model is a commonly used statistical modeling technique applied in various fields to find relationships between variables of interest. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested as a solution for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on variables or data that have not been observed. Using the present estimate as input, the expectation (E) step constructs the expected log-likelihood function. The maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm performed on a simulated compositional dataset with missing observations, using both robust least squares and ordinary least squares regression techniques. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
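The Aitchison distance used as a comparison criterion in this abstract can be sketched as follows. This is the standard definition via the centred log-ratio (clr) transform, not code from the paper; the function names are hypothetical, and the sketch assumes every part of each composition is strictly positive.

```python
import math

def closure(x, total=1.0):
    """Rescale a composition so that its parts sum to `total`."""
    s = sum(x)
    return [total * v / s for v in x]

def clr(x):
    """Centred log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of two compositions."""
    cx, cy = clr(closure(x)), clr(closure(y))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(cx, cy)))
```

Because only ratios between parts matter, rescaling a composition leaves the distance unchanged, which is why the measure suits closed data.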
Flowing bottom-hole pressure (FBHP) is a key parameter in evaluating the performance of oil and gas production wells. Accurate prediction of FBHP is needed in the petroleum industry for many applications, such as hydrocarbon production optimization, oil lifting cost, and assessment of workover operations. Production and reservoir engineers rely on empirical correlations and mechanistic models available in open sources to estimate the FBHP. Several empirical models have been developed from simulation and laboratory results, but they involve many assumptions that reduce their accuracy in field applications. Machine learning (ML), a discipline of artificial intelligence (AI), provides promising tools for solving complex problems. This study develops ML-based models to predict the multiphase FBHP using three techniques: random forest, K-nearest neighbors (KNN), and artificial neural network (ANN). The results show that the ANN model estimates the FBHP with an error of 2.5%, which is lower than the random forest and KNN models, with errors of 3.6% and 4%, respectively. The ML models were developed from surface production data, so the FBHP is predicted using actual field data. The accuracy of the proposed ML models was evaluated by comparing their results with the actual dataset values to ensure the effectiveness of the work. The results of this study show the potential of artificial intelligence in predicting the most complex parameter in the multiphase petroleum production process.
Liquid leakage from pipelines is a critical issue in large-scale process plants. Damage in pipelines affects the normal operation of the plant and increases maintenance costs. Furthermore, it causes unsafe and hazardous situations for operators. Therefore, the detection and localization of leakages is a crucial task for maintenance and condition monitoring. Recently, the use of infrared (IR) cameras was found to be a promising approach for leakage detection in large-scale plants. IR cameras can capture leaking liquid if it has a higher (or lower) temperature than its surroundings. In this paper, a method based on IR video data and machine vision techniques is proposed to detect and localize liquid leakages in a chemical process plant. Since the proposed method is vision-based and does not consider the physical properties of the leaking liquid, it is applicable to any type of liquid leakage (e.g., water, oil). In this method, subsequent frames are subtracted and divided into blocks. Then, principal component analysis is performed in each block to extract features from the blocks. All subtracted frames within the blocks are individually transferred to feature vectors, which are used as a basis for classifying the blocks. The k-nearest neighbor algorithm is used to classify the blocks as normal (without leakage) or anomalous (with leakage). Finally, the positions of the leakages are determined in each anomalous block. In order to evaluate the approach, two datasets with two different formats, consisting of video footage of a laboratory demonstrator plant captured by an IR camera, are considered. The results show that the proposed method is a promising approach to detect and localize leakages from pipelines using IR videos. The proposed method has high accuracy and a reasonable detection time for leakage detection. The possibility of extending the proposed method to a real industrial plant and the limitations of this method are discussed at the end.
The k-nearest neighbor (k-NN) method was evaluated for predicting the influent flow rate and four water quality parameters, namely chemical oxygen demand (COD), suspended solids (SS), total nitrogen (T-N), and total phosphorus (T-P), at a wastewater treatment plant (WWTP). The search range and the approach for determining the number of nearest neighbors (NNs) under dry and wet weather conditions were initially optimized based on the root mean square error (RMSE). The optimum search range for considering data size was one year. The square root-based (SR) approach was superior to the distance factor-based (DF) approach in determining the appropriate number of NNs. However, the results for both approaches varied slightly depending on the water quality and the weather conditions. The influent flow rate was accurately predicted within one standard deviation of measured values. Influent water qualities were well predicted, as measured by the mean absolute percentage error (MAPE), under both wet and dry weather conditions. For the seven-day prediction, the difference in predictive accuracy was less than 5% in dry weather conditions and slightly worse in wet weather conditions. Overall, the k-NN method was verified to be useful for predicting WWTP influent characteristics.
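The quantities named in this abstract can be made concrete with a short sketch. The abstract does not define the square root-based approach, so `sqrt_rule_k` assumes one common reading (k is the rounded square root of the number of candidate records); that rule and all function names here are hypothetical. RMSE and MAPE follow their standard definitions.

```python
import math

def sqrt_rule_k(n_samples):
    """Assumed square-root rule: k = round(sqrt(n)), never below 1."""
    return max(1, round(math.sqrt(n_samples)))

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)
```

Under this assumed rule, a one-year search range of daily records (365 samples) gives k = 19.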
The EM algorithm is a very popular maximum likelihood estimation method: it is an iterative algorithm for computing the maximum likelihood estimator when the observed data are incomplete, and it is also a very effective algorithm for estimating the parameters of finite mixture models. However, the EM algorithm cannot guarantee finding the global optimum and often falls into a local optimum, so it is sensitive to the choice of initial values for the iteration. The traditional EM algorithm selects the initial values at random; we propose an improved method for selecting them. First, we use the k-nearest-neighbor method to delete outliers. Second, we use k-means to initialize the EM algorithm. Numerical experiments comparing this method with the original random initialization show that the parameter estimates obtained with the proposed initialization are significantly better than those of the original EM algorithm.
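The first step described in this abstract, deleting outliers with a k-nearest-neighbor criterion, can be sketched as below. The abstract does not give the exact rule, so this sketch assumes that a point is an outlier when its mean distance to its k nearest neighbours exceeds a multiple of the median of that score; the `factor` cut-off and the function name are hypothetical. The k-means initialization step is not shown.

```python
import math
import statistics

def remove_outliers_knn(points, k=2, factor=2.0):
    """Drop points whose mean k-NN distance is unusually large.

    Assumption: the cut-off is `factor` times the median score.
    Requires more than k points.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    cutoff = factor * statistics.median(scores)
    return [p for p, s in zip(points, scores) if s <= cutoff]
```

The remaining points would then be passed to k-means, whose cluster centres and sizes can seed the mixture parameters for EM.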
Recently, great attention has been paid to geopolymer concrete due to its advantageous mechanical and environmentally friendly properties. Much effort has been made in experimental studies to advance the understanding of geopolymer concrete, in which compressive strength is one of the most important properties. To facilitate engineering work on the material, an efficient predicting model is needed. In this study, three machine learning (ML)-based models, namely deep neural network (DNN), K-nearest neighbors (KNN), and support vector machines (SVM), are developed for forecasting the compressive strength of the geopolymer concrete. A total of 375 experimental samples are collected from the literature to build a database for the development of the predicting models. A careful procedure for data preprocessing is implemented, by which outliers are examined and removed from the database and input variables are standardized before feeding to the fitting process. The standard K-fold cross-validation approach is applied for evaluating the performance of the models so that overfitting status is well managed, thus the generalizability of the models is ensured. The effectiveness of the models is assessed via statistical metrics including root mean squared error (RMSE), mean absolute error (MAE), correlation coefficient (R), and the recently proposed performance index (PI). The basic mean square error (MSE) is used as the loss function to be minimized during the model fitting process. The three ML-based models are successfully developed for estimating the compressive strength, for which good correlations between the predicted and the true values are obtained for DNN, KNN, and SVM. The numerical results suggest that the DNN model generally outperforms the other two models.
Slurry electrolysis (SE), as a hydrometallurgical process, has the characteristic of a multitank series connection, which leads to various stirring conditions and a complex solid suspension state. A combination of computational fluid dynamics (CFD), which requires high computing resources, and machine learning was proposed to construct a rapid prediction model for the liquid flow and solid concentration fields in an SE tank. Through scientific selection of calculation samples via orthogonal experiments, a comprehensive dataset covering a wide range of conditions was established while effectively reducing the number of simulations and providing reasonable weights for each factor. Then, a prediction model of the SE tank was constructed using the K-nearest neighbor algorithm. The results show that with the increase in levels of orthogonal experiments, the prediction accuracy of the model improved remarkably. The model established with four factors and nine levels can accurately predict the flow and concentration fields, and the regression coefficients of average velocity and solid concentration were 0.926 and 0.937, respectively. Compared with traditional CFD, the response time of field information prediction in this model was reduced from 75 h to 20 s, which solves the problem of serious lag when CFD is applied alone to actual production and meets real-time production control requirements.
The growing usage of Android smartphones has led to a significant rise in incidents of Android malware and privacy breaches. This escalating security concern necessitates the development of advanced technologies capable of automatically detecting and mitigating malicious activities in Android applications (apps). Such technologies are crucial for safeguarding user data and maintaining the integrity of mobile devices in an increasingly digital world. Current methods employed to detect sensitive data leaks in Android apps are hampered by two major limitations: they require substantial computational resources and are prone to a high frequency of false positives. This means that while attempting to identify security breaches, these methods often consume considerable processing power and mistakenly flag benign activities as malicious, leading to inefficiencies and reduced reliability in malware detection. The proposed approach includes a data preprocessing step that removes duplicate samples, manages unbalanced datasets, corrects inconsistencies, and imputes missing values to ensure data accuracy. The Minimax method is then used to normalize numerical data, followed by feature vector extraction using the Gain ratio and Chi-squared test to identify and extract the most significant characteristics using an appropriate prediction model. This study focuses on extracting a subset of attributes best suited for the task and recommending a predictive model based on domain expert opinion. The proposed method is evaluated using Drebin and TUANDROMD datasets containing 15,036 and 4,464 benign and malicious samples, respectively. The empirical result shows that the Random Forest (RF) and Support Vector Machine (SVC) classifiers achieved impressive accuracy rates of 98.9% and 98.8%, respectively, in detecting unknown Android malware. A sensitivity analysis experiment was also carried out on all three ML-based classifiers based on MAE, MSE, R2, and sensitivity parameters, resulting in a flawless performance for both datasets. This approach has substantial potential for real-world a...
In this study, our aim is to address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization (GWO) with Harris Hawks Optimization (HHO) for feature selection. The motivation for utilizing GWO and HHO stems from their bio-inspired nature and their demonstrated success in optimization problems. We aim to leverage the strengths of these algorithms to enhance the effectiveness of feature selection in microarray-based cancer classification. We selected leave-one-out cross-validation (LOOCV) to evaluate the performance of two widely used classifiers, k-nearest neighbors (KNN) and support vector machine (SVM), on high-dimensional cancer microarray data. The proposed method is extensively tested on six publicly available cancer microarray datasets, and a comprehensive comparison with recently published methods is conducted. Our hybrid algorithm demonstrates its effectiveness in improving classification performance, surpassing alternative approaches in terms of precision. The outcomes confirm the capability of our method to substantially improve both the precision and efficiency of cancer classification, thereby advancing the development of more efficient treatment strategies. The proposed hybrid method offers a promising solution to the gene selection problem in microarray-based cancer classification. It improves the accuracy and efficiency of cancer diagnosis and treatment, and its superior performance compared to other methods highlights its potential applicability in real-world cancer classification tasks. By harnessing the complementary search mechanisms of GWO and HHO, we leverage their bio-inspired behavior to identify informative genes relevant to cancer diagnosis and treatment.
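The evaluation protocol named in this abstract, leave-one-out cross-validation of a KNN classifier, can be sketched in a few lines. This shows only the evaluation loop under simple assumptions (Euclidean distance, majority vote, lists as inputs); the GWO-HHO gene selection itself is not shown, and the function names are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def loocv_accuracy(X, y, k=3):
    """Hold out each sample once, train on the rest, report the hit rate."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)
```

In a wrapper-style feature selection, a score of this kind would be computed on the columns chosen by each candidate gene subset, which is why LOOCV is attractive for microarray datasets with few samples.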
Accurate and efficient urban traffic flow prediction can help drivers identify road traffic conditions in real-time, consequently helping them avoid congestion and accidents to a certain extent. However, the existing methods for real-time urban traffic flow prediction focus on improving the model prediction accuracy or efficiency while ignoring the training efficiency, which results in a prediction system that lacks the scalability to integrate real-time traffic flow into the training procedure. To conduct accurate and real-time urban traffic flow prediction while considering the latest historical data and avoiding time-consuming online retraining, herein, we propose a scalable system for Predicting short-term URban traffic flow in real-time based on license Plate recognition data (PURP). First, to ensure prediction accuracy, PURP constructs the spatio-temporal contexts of traffic flow prediction from License Plate Recognition (LPR) data as effective characteristics. Subsequently, to utilize the recent data without retraining the model online, PURP uses the nonparametric method k-Nearest Neighbor (KNN) as the prediction framework because KNN can efficiently identify the top-k most similar spatio-temporal contexts and make predictions based on these contexts without time-consuming online model retraining. The experimental results show that PURP retains strong prediction efficiency as the prediction period increases.
Funding: Project supported by the Postdoctoral Science Foundation of China (No. 20070410397), the National Natural Science Foundation of China (No. 60705002), and the Science and Technology Project of Zhejiang Province, China (No. 2005C13026).
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 30470302 and 70373044).
基金supported by the National Science Fund for Distinguished Young Scholars of China (61525304) and the National Natural Science Foundation of China (61873328).
文摘In this paper, a memetic algorithm with competition (MAC) is proposed to solve the capacitated green vehicle routing problem (CGVRP). Firstly, a permutation array called the traveling salesman problem (TSP) route is used to encode the solution, and an effective decoding method to construct the CGVRP route is presented accordingly. Secondly, a k-nearest neighbor (kNN) based initialization is presented to make use of the location information of the customers. Thirdly, according to the characteristics of the CGVRP, the search operators in the variable neighborhood search (VNS) framework and the simulated annealing (SA) strategy are executed on the TSP route for all solutions. Moreover, the customer adjustment operator and the alternative fuel station (AFS) adjustment operator on the CGVRP route are executed for the elite solutions after competition. In addition, the crossover operator is employed to share information among different solutions. The effect of parameter setting is investigated using the Taguchi method of design-of-experiment to suggest suitable values. Numerical tests demonstrate the effectiveness of both the competitive search and the decoding method. Moreover, extensive comparative results show that the proposed algorithm is more effective and efficient than the existing methods in solving the CGVRP.
文摘Short-term traffic flow forecasting is one of the core technologies for realizing traffic flow guidance. In this article, in view of the repetitive nature of traffic flow changes, a short-term traffic flow forecasting method based on a three-layer K-nearest neighbor non-parametric regression algorithm is proposed. Specifically, two screening layers based on shape similarity were introduced into the K-nearest neighbor non-parametric regression method, and the forecasting results were output using weighted averaging on the reciprocal values of the shape similarity distances and the most-similar-point distance adjustment method. According to the experimental results, the proposed algorithm improved the predictive ability of the traditional K-nearest neighbor non-parametric regression method, and greatly enhanced the accuracy and real-time performance of short-term traffic flow forecasting.
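The reciprocal-distance weighting mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: plain Euclidean distance stands in for the paper's shape-similarity distance, the two screening layers and the most-similar-point adjustment are omitted, and `knn_forecast` and its parameters are hypothetical names.

```python
import math

def knn_forecast(history, current, k=3, eps=1e-9):
    """Forecast the next value as a reciprocal-distance weighted average.

    history: list of (pattern, next_value) pairs from past observations
    current: the most recent pattern, same length as each stored pattern
    eps:     small constant that avoids division by zero on exact matches
    """
    # Distance from the current pattern to every stored pattern.
    scored = [(math.dist(pattern, current), next_value)
              for pattern, next_value in history]
    scored.sort(key=lambda item: item[0])
    nearest = scored[:k]
    # Closer neighbours receive larger weights (reciprocal of distance).
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / total
```

For example, with `history = [((0.0,), 10.0), ((2.0,), 30.0)]` and `current = (1.0,)`, both neighbours are equally distant, so `knn_forecast(history, current, k=2)` returns their plain average.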
基金supported by the Korea Institute of Science and Technology Information (KISTI)
文摘Recent developments in wireless communication technologies and the popularity of smartphones are making location-based services (LBS) popular. However, sending queries to LBS servers with users' exact locations may threaten the privacy of users. Therefore, there has been much research on generating a cloaked query region for user privacy protection. Consequently, an efficient query processing algorithm for a query region is required. In this paper, we propose k-nearest neighbor (k-NN) query processing algorithms for a query region in road networks. To efficiently retrieve k-NN points of interest (POIs), we make use of the Island index. We also propose a method that generates an adaptive Island index to improve the query processing performance and storage usage. Finally, we show by our performance analysis that our k-NN query processing algorithms outperform the existing k-Range Nearest Neighbor (kRNN) algorithm in terms of network expansion cost and query processing time.
文摘This paper proposes a novel grading method of apples,in an automated grading device that uses convolutional neural networks to extract the size,color,texture,and roundness of an apple.The developed machine learning method uses the ability of learning representative features by means of a convolutional neural network(CNN),to determine suitable features of apples for the grading process.This information is fed into a one-to-one classifier that uses a support vector machine(SVM),instead of the softmax output layer of the CNN.In this manner,Yantai apples with similar shapes and low discrimination are graded using four different approaches.The fusion model using both CNN and SVM classifiers is much more accurate than the simple k-nearest neighbor(KNN),SVM,and CNN model when used separately for grading,and the learning ability and the generalization ability of the model is correspondingly increased by the combined method.Grading tests are carried out using the automated grading device that is developed in the present work.It is verified that the actual effect of apple grading using the combined CNN-SVM model is fast and accurate,which greatly reduces the manpower and labor costs of manual grading,and has important commercial prospects.
文摘Early stroke prediction is vital to prevent damage. A stroke happens when the blood flow to the brain is disrupted by a clot or bleeding, resulting in brain death or injury. Early diagnosis and treatment reduce long-term needs and lower health costs. This research aims to develop a machine-learning method for forecasting early warning signs of stroke. The methodology employed feature selection techniques and multiple algorithms. Using the XGBoost algorithm, the proposed model achieved an accuracy rate of 96.45%. This research shows that machine learning can effectively predict early warning signs of stroke, which can help reduce long-term treatment and rehabilitation needs and lower health costs.
文摘Compositional data, which carry relative information, are a crucial aspect of machine learning and related fields. Such data are typically recorded as closed data, that is, as parts that sum to a constant such as 100%. The statistical linear model is the most widely used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data are present. The linear regression model is a commonly used statistical modeling technique applied in various settings to find relationships between variables of interest. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, which can lead to costly and time-consuming data recovery. To address this issue, the expectation-maximization (EM) algorithm has been suggested as a solution for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current parameter estimate as input, the expectation (E) step constructs the expected log-likelihood function. The maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm performed on a simulated compositional dataset with missing observations, using both robust least squares and ordinary least squares regression techniques. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
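Because the comparison in this abstract is reported in Aitchison distances, a short sketch of that metric may help. It uses the standard definition via the centred log-ratio (clr) transform; the function names are illustrative, and all parts are assumed to be strictly positive (zeros would need separate treatment).

```python
import math

def closure(x, total=1.0):
    """Rescale a vector of positive parts so that it sums to `total`."""
    s = sum(x)
    return [total * v / s for v in x]

def clr(x):
    """Centred log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the clr-transformed compositions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))
```

The distance depends only on ratios between parts, so rescaling a composition (for example from proportions to percentages) leaves it unchanged.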
文摘Flowing bottom-hole pressure (FBHP) is a key parameter in evaluating the performance of oil and gas production wells. Accurate prediction of FBHP is required in the petroleum industry for many applications, such as hydrocarbon production optimization, oil lifting cost, and assessment of workover operations. Production and reservoir engineers rely on empirical correlations and mechanistic models available in open sources to estimate the FBHP. Several empirical models have been developed from simulation and laboratory results, and they involve many assumptions that reduce their accuracy in field applications. Machine learning (ML), a discipline within artificial intelligence (AI), provides promising tools for solving complex problems. This study develops ML-based models to predict the multiphase FBHP using three techniques: random forest, K-nearest neighbors (KNN), and artificial neural network (ANN). The results showed that the ANN model estimated the FBHP with an error of 2.5%, lower than the random forest and KNN models, with errors of 3.6% and 4%, respectively. The ML models were developed from surface production data, so the FBHP is predicted using actual field data. The accuracy of the proposed models was evaluated by comparing their results with the actual dataset values. The results of this study show the potential of artificial intelligence in predicting the most complex parameter in the multiphase petroleum production process.
基金funded by the German Federal Ministry for Economic Affairs and Energy(BMWi)(01MD15009F).
文摘Liquid leakage from pipelines is a critical issue in large-scale process plants. Damage in pipelines affects the normal operation of the plant and increases maintenance costs. Furthermore, it causes unsafe and hazardous situations for operators. Therefore, the detection and localization of leakages is a crucial task for maintenance and condition monitoring. Recently, the use of infrared (IR) cameras was found to be a promising approach for leakage detection in large-scale plants. IR cameras can capture leaking liquid if it has a higher (or lower) temperature than its surroundings. In this paper, a method based on IR video data and machine vision techniques is proposed to detect and localize liquid leakages in a chemical process plant. Since the proposed method is a vision-based method and does not consider the physical properties of the leaking liquid, it is applicable for any type of liquid leakage (i.e., water, oil, etc.). In this method, subsequent frames are subtracted and divided into blocks. Then, principal component analysis is performed in each block to extract features from the blocks. All subtracted frames within the blocks are individually transferred to feature vectors, which are used as a basis for classifying the blocks. The k-nearest neighbor algorithm is used to classify the blocks as normal (without leakage) or anomalous (with leakage). Finally, the positions of the leakages are determined in each anomalous block. In order to evaluate the approach, two datasets with two different formats, consisting of video footage of a laboratory demonstrator plant captured by an IR camera, are considered. The results show that the proposed method is a promising approach to detect and localize leakages from pipelines using IR videos. The proposed method has high accuracy and a reasonable detection time for leakage detection. The possibility of extending the proposed method to a real industrial plant and the limitations of this method are discussed at the end.
文摘The k-nearest neighbor (k-NN) method was evaluated to predict the influent flow rate and four water qualities, namely chemical oxygen demand (COD), suspended solid (SS), total nitrogen (T-N) and total phosphorus (T-P) at a wastewater treatment plant (WWTP). The search range and approach for determining the number of nearest neighbors (NNs) under dry and wet weather conditions were initially optimized based on the root mean square error (RMSE). The optimum search range for considering data size was one year. The square root-based (SR) approach was superior to the distance factor-based (DF) approach in determining the appropriate number of NNs. However, the results for both approaches varied slightly depending on the water quality and the weather conditions. The influent flow rate was accurately predicted within one standard deviation of measured values. Influent water qualities were well predicted with the mean absolute percentage error (MAPE) under both wet and dry weather conditions. For the seven-day prediction, the difference in predictive accuracy was less than 5% in dry weather conditions and slightly worse in wet weather conditions. Overall, the k-NN method was verified to be useful for predicting WWTP influent characteristics.
文摘The EM algorithm is a very popular maximum likelihood estimation method. It is an iterative algorithm for computing the maximum likelihood estimator when the observed data are incomplete, and it is also very effective for estimating the parameters of finite mixture models. However, the EM algorithm cannot guarantee finding the global optimum and often falls into a local optimum, so it is sensitive to the choice of initial values for the iteration. The traditional EM algorithm selects the initial values at random; we propose an improved method for selecting them. First, we use the k-nearest-neighbor method to delete outliers. Second, we use k-means to initialize the EM algorithm. Numerical experiments comparing this method with the original random initialization show that the parameter estimates obtained with the proposed initialization are significantly better than those of the original EM algorithm.
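The two-stage initialization described above (k-nearest-neighbor outlier removal, then k-means, then EM) can be sketched for a one-dimensional Gaussian mixture. This is a simplified illustration, not the authors' procedure: the outlier rule (mean distance to the k nearest neighbours exceeding `factor` times the median score), the deterministic k-means seeding, and all function names are assumptions.

```python
import math

def knn_filter(xs, k=3, factor=5.0):
    """Drop points whose mean distance to their k nearest neighbours is unusually large."""
    scores = []
    for i, x in enumerate(xs):
        dists = sorted(abs(x - y) for j, y in enumerate(xs) if j != i)
        scores.append(sum(dists[:k]) / k)
    median = sorted(scores)[len(scores) // 2]
    return [x for x, s in zip(xs, scores) if s <= factor * median + 1e-12]

def kmeans_1d(xs, n_clusters, iters=20):
    """Plain 1-D k-means with centres seeded from evenly spaced order statistics."""
    srt = sorted(xs)
    centres = [srt[int((i + 0.5) * len(srt) / n_clusters)] for i in range(n_clusters)]
    for _ in range(iters):
        groups = [[] for _ in centres]
        for x in xs:
            idx = min(range(n_clusters), key=lambda c: abs(x - centres[c]))
            groups[idx].append(x)
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(xs, n_components=2, iters=50):
    """EM for a 1-D Gaussian mixture, initialised from filtered data and k-means."""
    xs = knn_filter(xs)
    means, groups = kmeans_1d(xs, n_components)
    weights = [len(g) / len(xs) for g in groups]
    variances = [max(sum((x - m) ** 2 for x in g) / max(len(g), 1), 1e-6)
                 for g, m in zip(groups, means)]
    for _ in range(iters):
        # E step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pi / s for pi in p])
        # M step: re-estimate weights, means and variances.
        for c in range(n_components):
            nk = sum(r[c] for r in resp) or 1e-300
            weights[c] = nk / len(xs)
            means[c] = sum(r[c] * x for r, x in zip(resp, xs)) / nk
            variances[c] = max(sum(r[c] * (x - means[c]) ** 2
                                   for r, x in zip(resp, xs)) / nk, 1e-6)
    return weights, means, variances
```

The variance floor of `1e-6` is a practical guard against a component collapsing onto a single point; it is a choice made for this sketch rather than something stated in the abstract.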
文摘Recently,great attention has been paid to geopolymer concrete due to its advantageous mechanical and environmentally friendly properties.Much effort has been made in experimental studies to advance the understanding of geopolymer concrete,in which compressive strength is one of the most important properties.To facilitate engineering work on the material,an efficient predicting model is needed.In this study,three machine learning(ML)-based models,namely deep neural network(DNN),K-nearest neighbors(KNN),and support vector machines(SVM),are developed for forecasting the compressive strength of the geopolymer concrete.A total of 375 experimental samples are collected from the literature to build a database for the development of the predicting models.A careful procedure for data preprocessing is implemented,by which outliers are examined and removed from the database and input variables are standardized before feeding to the fitting process.The standard K-fold cross-validation approach is applied for evaluating the performance of the models so that overfitting status is well managed,thus the generalizability of the models is ensured.The effectiveness of the models is assessed via statistical metrics including root mean squared error(RMSE),mean absolute error(MAE),correlation coefficient(R),and the recently proposed performance index(PI).The basic mean square error(MSE)is used as the loss function to be minimized during the model fitting process.The three ML-based models are successfully developed for estimating the compressive strength,for which good correlations between the predicted and the true values are obtained for DNN,KNN,and SVM.The numerical results suggest that the DNN model generally outperforms the other two models.
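The evaluation setup named in this abstract (K-fold cross-validation scored with RMSE, MAE, and the correlation coefficient R) can be sketched with standard definitions. The helper names are hypothetical, the recently proposed performance index (PI) is not reproduced because the abstract does not define it, and the fold split here is unshuffled for simplicity, whereas real data would usually be shuffled first.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between true and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in y_true))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in y_pred))
    return cov / (std_t * std_p)

def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size
```

In a full pipeline, input standardization would be fitted on each training fold only and then applied to the matching test fold, so that no information from the held-out samples leaks into the model.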
基金financially supported by the National Natural Science Foundation of China (No. 51974018) and the Open Foundation of the State Key Laboratory of Process Automation in Mining and Metallurgy (No. BGRIMM-KZSKL-2022-9).
文摘Slurry electrolysis(SE),as a hydrometallurgical process,has the characteristic of a multitank series connection,which leads to various stirring conditions and a complex solid suspension state.The computational fluid dynamics(CFD),which requires high computing resources,and a combination with machine learning was proposed to construct a rapid prediction model for the liquid flow and solid concentration fields in a SE tank.Through scientific selection of calculation samples via orthogonal experiments,a comprehensive dataset covering a wide range of conditions was established while effectively reducing the number of simulations and providing reasonable weights for each factor.Then,a prediction model of the SE tank was constructed using the K-nearest neighbor algorithm.The results show that with the increase in levels of orthogonal experiments,the prediction accuracy of the model improved remarkably.The model established with four factors and nine levels can accurately predict the flow and concentration fields,and the regression coefficients of average velocity and solid concentration were 0.926 and 0.937,respectively.Compared with traditional CFD,the response time of field information prediction in this model was reduced from 75 h to 20 s,which solves the problem of serious lag in CFD applied alone to actual production and meets real-time production control requirements.
基金Princess Nourah bint Abdulrahman University and Researchers Supporting Project Number(PNURSP2024R346)Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.
文摘The growing usage of Android smartphones has led to a significant rise in incidents of Android malware and privacy breaches. This escalating security concern necessitates the development of advanced technologies capable of automatically detecting and mitigating malicious activities in Android applications (apps). Such technologies are crucial for safeguarding user data and maintaining the integrity of mobile devices in an increasingly digital world. Current methods employed to detect sensitive data leaks in Android apps are hampered by two major limitations: they require substantial computational resources and are prone to a high frequency of false positives. This means that while attempting to identify security breaches, these methods often consume considerable processing power and mistakenly flag benign activities as malicious, leading to inefficiencies and reduced reliability in malware detection. The proposed approach includes a data preprocessing step that removes duplicate samples, manages unbalanced datasets, corrects inconsistencies, and imputes missing values to ensure data accuracy. The Minimax method is then used to normalize numerical data, followed by feature vector extraction using the Gain ratio and Chi-squared test to identify and extract the most significant characteristics using an appropriate prediction model. This study focuses on extracting a subset of attributes best suited for the task and recommending a predictive model based on domain expert opinion. The proposed method is evaluated using Drebin and TUANDROMD datasets containing 15,036 and 4,464 benign and malicious samples, respectively. The empirical result shows that the Random Forest (RF) and Support Vector Machine (SVC) classifiers achieved impressive accuracy rates of 98.9% and 98.8%, respectively, in detecting unknown Android malware. A sensitivity analysis experiment was also carried out on all three ML-based classifiers based on MAE, MSE, R2, and sensitivity parameters, resulting in a flawless performance for both datasets. This approach has substantial potential for real-world a
基金the Deputyship for Research and Innovation,“Ministry of Education”in Saudi Arabia for funding this research(IFKSUOR3-014-3).
文摘In this study, our aim is to address the problem of gene selection by proposing a hybrid bio-inspired evolutionary algorithm that combines Grey Wolf Optimization (GWO) with Harris Hawks Optimization (HHO) for feature selection. The motivation for utilizing GWO and HHO stems from their bio-inspired nature and their demonstrated success in optimization problems. We aim to leverage the strengths of these algorithms to enhance the effectiveness of feature selection in microarray-based cancer classification. We selected leave-one-out cross-validation (LOOCV) to evaluate the performance of two widely used classifiers, k-nearest neighbors (KNN) and support vector machine (SVM), on high-dimensional cancer microarray data. The proposed method is extensively tested on six publicly available cancer microarray datasets, and a comprehensive comparison with recently published methods is conducted. Our hybrid algorithm demonstrates its effectiveness in improving classification performance, surpassing alternative approaches in terms of precision. The outcomes confirm the capability of our method to substantially improve both the precision and efficiency of cancer classification, thereby advancing the development of more efficient treatment strategies. The proposed hybrid method offers a promising solution to the gene selection problem in microarray-based cancer classification. It improves the accuracy and efficiency of cancer diagnosis and treatment, and its superior performance compared to other methods highlights its potential applicability in real-world cancer classification tasks. By harnessing the complementary search mechanisms of GWO and HHO, we leverage their bio-inspired behavior to identify informative genes relevant to cancer diagnosis and treatment.
基金This work was supported by the National Natural Science Foundation of China(Nos.62072405 and 62276233)the Key Research Project of Zhejiang Province(No.2023C01048).
文摘Accurate and efficient urban traffic flow prediction can help drivers identify road traffic conditions in real-time,consequently helping them avoid congestion and accidents to a certain extent.However,the existing methods for real-time urban traffic flow prediction focus on improving the model prediction accuracy or efficiency while ignoring the training efficiency,which results in a prediction system that lacks the scalability to integrate real-time traffic flow into the training procedure.To conduct accurate and real-time urban traffic flow prediction while considering the latest historical data and avoiding time-consuming online retraining,herein,we propose a scalable system for Predicting short-term URban traffic flow in real-time based on license Plate recognition data(PURP).First,to ensure prediction accuracy,PURP constructs the spatio-temporal contexts of traffic flow prediction from License Plate Recognition(LPR)data as effective characteristics.Subsequently,to utilize the recent data without retraining the model online,PURP uses the nonparametric method k-Nearest Neighbor(namely KNN)as the prediction framework because the KNN can efficiently identify the top-k most similar spatio-temporal contexts and make predictions based on these contexts without time-consuming model retraining online.The experimental results show that PURP retains strong prediction efficiency as the prediction period increases.