The object-oriented information extraction technique was used to improve classification accuracy, and addressed the problem that HJ-1 CCD remote sensing images have only four spectral bands with moderate spatial resolution. We used two key techniques: the selection of the optimum image segmentation scale and the development of an appropriate object-oriented information extraction strategy. With the principle of minimizing the merge cost of merging neighboring pixels/objects, we used the spatial autocorrelation index Moran's I and the variance index to select the optimum segmentation scale. The Nearest Neighborhood (NN) classifier based on sampling and a knowledge-based fuzzy classifier were used in the object-oriented information extraction strategy. In this classification step, feature optimization was used to improve information extraction accuracy using reduced data dimension. These two techniques were applied to land cover information extraction for Shanghai city using an HJ-1 CCD image. Results indicate that the information extraction accuracy of the object-oriented method was much higher than that of the pixel-based method.
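As a concrete illustration of the scale-selection idea, the hedged sketch below computes a global Moran's I over per-object mean values under a rook-adjacency matrix; the object values and adjacency are invented toy data, not the paper's implementation, and the variance index mentioned above is only referenced in a comment.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I of per-object mean values under a binary adjacency matrix."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    return (len(x) / w.sum()) * (w * np.outer(z, z)).sum() / (z ** 2).sum()

# Invented toy example: 4 image objects with their mean band values and rook adjacency.
object_means = [0.21, 0.25, 0.78, 0.80]          # e.g. mean NIR reflectance per object (assumed)
adjacency = np.array([[0, 1, 1, 0],
                      [1, 0, 0, 1],
                      [1, 0, 0, 1],
                      [0, 1, 1, 0]])

# In scale selection, low between-object autocorrelation (Moran's I), paired with
# a low within-object variance index, points to a well-separated segmentation.
print("Moran's I:", morans_i(object_means, adjacency))
```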
The time dependent vehicle routing problem with time windows (TDVRPTW) is considered. A multi-type ant system (MTAS) algorithm hybridized with the ant colony system (ACS) and the max-min ant system (MMAS) algorithms is proposed. This combination absorbs the merits of the two algorithms in solution construction and optimization, respectively. In order to improve the efficiency of the insertion procedure, a nearest neighbor selection (NNS) mechanism, an insertion local search procedure and a local optimization procedure are specified in detail. In order to find a balance between good scouting performance and a fast convergence rate, an adaptive pheromone updating strategy is proposed in the MTAS. Computational results confirm the MTAS algorithm's good performance with all these strategies on classic vehicle routing problem with time windows (VRPTW) benchmark instances and the TDVRPTW instances, and some better results, especially for the number of vehicles and travel times of the best solutions, are obtained in comparison with previous research.
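To make the nearest neighbor selection idea concrete, the hedged sketch below greedily picks the closest unrouted customer that can still be reached before its time window closes; the travel-time matrix, window format, and greedy criterion are simplifying assumptions and not the MTAS insertion procedure itself.

```python
from math import inf

def nearest_feasible_customer(current, time_now, unrouted, travel, windows):
    """Return the unrouted customer with minimum travel time whose time window is still reachable."""
    best, best_cost = None, inf
    for c in unrouted:
        arrival = time_now + travel[current][c]
        open_t, close_t = windows[c]
        if arrival <= close_t:                  # feasible: we arrive before the window closes
            if travel[current][c] < best_cost:
                best, best_cost = c, travel[current][c]
    return best

# Invented toy instance: depot 0, customers 1-3, symmetric travel times and time windows.
travel = {0: {1: 4, 2: 9, 3: 6}, 1: {2: 3, 3: 5}, 2: {3: 2}}
for a in list(travel):                          # mirror the entries to make the matrix symmetric
    for b, t in list(travel[a].items()):
        travel.setdefault(b, {})[a] = t
windows = {1: (0, 10), 2: (0, 8), 3: (0, 5)}

print(nearest_feasible_customer(0, 0, {1, 2, 3}, travel, windows))  # -> 1
```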
Support vector machine (SVM), as a novel approach in pattern recognition, has demonstrated success in face detection and face recognition. In this paper, a face recognition approach based on the SVM classifier with the nearest neighbor classifier (NNC) is proposed. The principal component analysis (PCA) is used to reduce the dimension and extract features. Then a one-against-all strategy is used to train the SVM classifiers. At the testing stage, we propose an al-…
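A minimal sketch of the PCA-plus-one-against-all-SVM pipeline described above, assuming scikit-learn and random data in place of face images; the dimensionality and parameter choices are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 1024))      # stand-in for 120 flattened 32x32 face images (assumption)
y = rng.integers(0, 4, size=120)      # 4 hypothetical subjects

# PCA reduces the dimension / extracts features; OneVsRestClassifier realizes
# the one-against-all training of the SVM classifiers.
model = make_pipeline(PCA(n_components=40),
                      OneVsRestClassifier(SVC(kernel="linear")))
model.fit(X[:100], y[:100])
print("hold-out accuracy:", model.score(X[100:], y[100:]))
```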
There is growing concern about the traffic accident rate in recent years. Using Mashhad city (Iran's second most populous city) traffic accident records as a case study, this paper applied the combination of geo-information technology and spatial-statistical analysis to bring out the influence of spatial factors in their formation. The aim of the study is to examine 4 clustering analyses to gain a better understanding of traffic accident patterns in a complex urban network. In order to deploy the clustering technique on urban roads, 9331 point features for inner city traffic accidents during 12 months were registered according to their x and y locations in a geographic information system (GIS). The mentioned areas were analyzed by kernel density estimation (KDE) using ARCMAP and by two other analyses using SANET 4th edition software, so that the results of network analysis could be compared with the traditional KDE method. In addition, this research introduces five classifications for determining the eventfulness of the study area based on standard deviation and for prioritizing the creation of security in the area. The nearest neighbor and K-function output analyses consist of four curves; for all fatal, injury and property-damage-only crashes, the observed value curve is above the 5% confidence interval, so accidents in the study region are more clustered than expected by random chance. The importance of this study is to use GIS as a management system for accident analysis by combination of spatial-statistical methods.
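The nearest neighbor analysis mentioned above can be illustrated with the hedged sketch below, which computes the average nearest neighbor distance of accident points with scipy's KD-tree and compares it with the value expected under complete spatial randomness; the coordinates and study-area size are invented for the example.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
points = rng.uniform(0, 1000, size=(500, 2))   # hypothetical accident locations (metres)
area = 1000 * 1000                             # assumed study-area extent

# Observed mean distance to the nearest other point.
tree = cKDTree(points)
dists, _ = tree.query(points, k=2)             # column 0 is the point itself, column 1 its nearest neighbor
observed = dists[:, 1].mean()

# Expected mean nearest neighbor distance under complete spatial randomness.
expected = 0.5 / np.sqrt(len(points) / area)

# Ratio < 1 suggests clustering, > 1 suggests dispersion.
print("nearest neighbor ratio:", observed / expected)
```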
It is well known that in order to build a strong ensemble, the component learners should have high diversity as well as high accuracy. If perturbing the training set can cause significant changes in the component learners constructed, then Bagging can effectively improve accuracy. However, for stable learners such as nearest neighbor classifiers, perturbing the training set can hardly produce diverse component learners, and therefore Bagging does not work well. This paper adapts Bagging to nearest neighbor classifiers by injecting randomness into the distance metrics. In constructing the component learners, both the training set and the distance metric employed for identifying the neighbors are perturbed. A large-scale empirical study reported in this paper shows that the proposed BagInRand algorithm can effectively improve the accuracy of nearest neighbor classifiers.
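A hedged sketch of the idea of perturbing both the training set and the distance metric, assuming scikit-learn's k-NN with a randomly drawn Minkowski exponent per component learner; this illustrates the principle only and is not the BagInRand algorithm itself.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)                        # stand-in dataset (assumption)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
rng = np.random.default_rng(42)

def build_component(X, y):
    """One component learner: bootstrap sample plus a randomly perturbed Minkowski metric."""
    idx = rng.integers(0, len(X), size=len(X))           # perturb the training set (bootstrap)
    p = int(rng.choice([1, 2, 3, 4]))                    # perturb the distance metric exponent
    return KNeighborsClassifier(n_neighbors=3, p=p).fit(X[idx], y[idx])

ensemble = [build_component(Xtr, ytr) for _ in range(15)]

# Majority vote over the component nearest neighbor classifiers.
votes = np.array([clf.predict(Xte) for clf in ensemble])
pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("ensemble accuracy:", (pred == yte).mean())
```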
A novel traffic sign recognition system is presented in this work. Firstly, color segmentation and a shape classifier based on the signature feature of the region are used to detect traffic signs in input video sequences. Secondly, the traffic sign color image is preprocessed with gray scaling and normalized to 64×64 size. Then, image features are obtained from four-level DT-CWT images. Thirdly, 2DICA and a nearest neighbor classifier are combined to recognize traffic signs. The whole recognition algorithm is implemented for the classification of 50 categories of traffic signs and its recognition accuracy reaches 90%. Comparing the image representation DT-CWT with well-established image representations such as template, Gabor, and 2DICA with feature selection techniques such as PCA, LPP and 2DPCA at the same time, the results show that the combination of DT-CWT and 2DICA is useful in traffic sign recognition. Experimental results indicate that the proposed algorithm is robust, effective and accurate.
Many data mining applications have a large amount of data, but labeling data is usually difficult, expensive, or time consuming, as it requires human experts for annotation. Semi-supervised learning addresses this problem by using unlabeled data together with labeled data in the training process. Co-Training is a popular semi-supervised learning algorithm that has the assumptions that each example is represented by multiple sets of features (views) and that these views are sufficient for learning and independent given the class. However, these assumptions are strong and are not satisfied in many real-world domains. In this paper, a single-view variant of Co-Training, called Co-Training by Committee (CoBC), is proposed, in which an ensemble of diverse classifiers is used instead of redundant and independent views. We introduce a new labeling confidence measure for unlabeled examples based on estimating the local accuracy of the committee members on its neighborhood. Then we introduce two new learning algorithms, QBC-then-CoBC and QBC-with-CoBC, which combine the merits of committee-based semi-supervised learning and active learning. The random subspace method is applied on both C4.5 decision trees and 1-nearest neighbor classifiers to construct the diverse ensembles used for semi-supervised learning and active learning. Experiments show that these two combinations can outperform other non-committee-based ones.
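The random subspace construction used above can be sketched with scikit-learn's BaggingClassifier, which subsamples features for each committee member; CART stands in for C4.5, and the dataset and parameters are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)               # stand-in dataset (assumption)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

# Random subspace method: each committee member sees a random 50% of the features
# (no sample bootstrapping). "estimator" is called "base_estimator" in scikit-learn < 1.2.
for name, base in [("1-NN", KNeighborsClassifier(n_neighbors=1)),
                   ("decision tree (CART in place of C4.5)", DecisionTreeClassifier())]:
    committee = BaggingClassifier(estimator=base, n_estimators=10,
                                  max_features=0.5, bootstrap=False, random_state=0)
    committee.fit(Xtr, ytr)
    print(name, "committee accuracy:", committee.score(Xte, yte))
```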
Attacks such as APT usually hide communication data in massive legitimate network traffic, and mining structurally complex and latent relationships among flow-based network traffic to detect attacks has become the focus of many initiatives. Effectively analyzing massive, high-dimensional network security data for suspicious flow diagnosis is a huge challenge. In addition, the uneven distribution of network traffic does not fully reflect the differences of class sample features, resulting in low accuracy of attack detection. To solve these problems, a novel approach called the fuzzy entropy weighted natural nearest neighbor (FEW-NNN) method is proposed to enhance the accuracy and efficiency of flow-based network traffic attack detection. First, the FEW-NNN method uses the Fisher score and a deep graph feature learning algorithm to remove unimportant features and reduce the data dimension. Then, according to the proposed natural nearest neighbor searching algorithm (NNN_Searching), the density of data points, each class center and the smallest enclosing sphere radius are determined correspondingly. Finally, a fuzzy entropy weighted KNN classification method based on affinity is proposed, which mainly includes the following three steps: (1) the feature weights of samples are calculated based on fuzzy entropy values; (2) the fuzzy memberships of samples are determined based on affinity among samples; and (3) K-neighbors are selected according to the class-conditional weighted Euclidean distance, the fuzzy membership value of the testing sample is calculated based on the membership of the k-neighbors, and then all testing samples are classified according to the fuzzy membership value of the samples belonging to each class; that is, the attack type is determined. The method has been applied to the problem of attack detection and validated on the well-known KDD99 and CICIDS-2017 datasets. From the experimental results shown in this paper, it is observed that the FEW-NNN method improves the accuracy and efficiency of f…
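The hedged sketch below illustrates only the last step in a much simplified form: a fuzzy k-NN in which neighbors vote with inverse-distance memberships under a feature-weighted Euclidean distance. The feature weights are fixed constants here rather than being derived from fuzzy entropy, so this is not the FEW-NNN algorithm itself.

```python
import numpy as np

def fuzzy_weighted_knn(X_train, y_train, x, feature_w, k=5, m=2):
    """Classify x by fuzzy memberships accumulated from its k nearest neighbors."""
    d = np.sqrt(((X_train - x) ** 2 * feature_w).sum(axis=1))   # feature-weighted Euclidean distance
    nn = np.argsort(d)[:k]
    weights = 1.0 / (d[nn] ** (2 / (m - 1)) + 1e-12)            # inverse-distance fuzzifier
    classes = np.unique(y_train)
    membership = np.array([weights[y_train[nn] == c].sum() for c in classes])
    membership /= membership.sum()                              # fuzzy membership per class
    return classes[membership.argmax()], membership

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])   # synthetic flows (assumed)
y = np.array([0] * 50 + [1] * 50)
# Feature weights are illustrative constants, not fuzzy-entropy-derived values.
label, mu = fuzzy_weighted_knn(X, y, np.array([2.5, 2.5, 2.5]), feature_w=np.array([1.0, 0.5, 0.5]))
print(label, mu)
```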
AIM: To support probe-based confocal laser endomicroscopy (pCLE) diagnosis by designing software for the automated classification of colonic polyps. METHODS: Intravenous fluorescein pCLE imaging of colorectal lesions was performed on patients undergoing screening and surveillance colonoscopies, followed by polypectomies. All resected specimens were reviewed by a reference gastrointestinal pathologist blinded to pCLE information. Histopathology was used as the criterion standard for the differentiation between neoplastic and non-neoplastic lesions. The pCLE video sequences, recorded for each polyp, were analyzed off-line by 2 expert endoscopists who were blinded to the endoscopic characteristics and histopathology. These pCLE videos, along with their histopathology diagnosis, were used to train the automated classification software, which is a content-based image retrieval technique followed by k-nearest neighbor classification. The performance of the off-line diagnosis of pCLE videos established by the 2 expert endoscopists was compared with that of automated pCLE software classification. All evaluations were performed using leave-one-patient-out cross-validation to avoid bias. RESULTS: Colorectal lesions (135) were imaged in 71 patients. Based on histopathology, 93 of these 135 lesions were neoplastic and 42 were non-neoplastic. The study found no statistical significance for the difference between the performance of automated pCLE software classification (accuracy 89.6%, sensitivity 92.5%, specificity 83.3%, using leave-one-patient-out cross-validation) and the performance of the off-line diagnosis of pCLE videos established by the 2 expert endoscopists (accuracy 89.6%, sensitivity 91.4%, specificity 85.7%). There was very low power (< 6%) to detect the observed differences. The 95% confidence intervals for equivalence testing were: -0.073 to 0.073 for accuracy, -0.068 to 0.089 for sensitivity and -0.18 to 0.13 for specificity. The classification software proposed in this study is not a "black box" but an informa…
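Leave-one-patient-out cross-validation, as used above, can be expressed with scikit-learn's LeaveOneGroupOut splitter; the feature vectors here are random stand-ins for the retrieval-based video descriptors, so the sketch only shows the evaluation protocol.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(135, 16))                 # one descriptor per lesion video (assumed)
y = rng.integers(0, 2, size=135)               # 0 = non-neoplastic, 1 = neoplastic (random stand-in)
patients = rng.integers(0, 71, size=135)       # patient ID of each lesion (assumed)

# Every fold holds out all lesions of one patient, so a patient's own
# videos never appear in both the training and the test set.
scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y,
                         cv=LeaveOneGroupOut(), groups=patients)
print("mean accuracy:", scores.mean())
```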
The purpose of this paper is to study the convergence problem of the iteration scheme x_{n+1} = λ_{n+1} y + (1 − λ_{n+1}) T_{n+1} x_n for a family of infinitely many nonexpansive mappings T_1, T_2, ... in a Hilbert space. It is proved that under suitable conditions this iteration scheme converges strongly to the nearest common fixed point of this family of nonexpansive mappings. The results presented in this paper extend and improve some recent results.
In order to reveal the complex network characteristics and evolution principle of the China aviation network, the probability distribution and evolution trace of the arithmetic average of the edge vertices' nearest neighbor average degree values of the China aviation network were studied based on the statistical data of the China civil aviation network in 1988, 1994, 2001, 2008 and 2015. According to the theory and methods of complex networks, the network system was constructed with the city where the airport was located as the network node and the route between cities as the edge of the network. Based on the statistical data, the arithmetic averages of the edge vertices' nearest neighbor average degree values of the China aviation network in 1988, 1994, 2001, 2008 and 2015 were calculated. Using the probability statistical analysis method, it was found that the arithmetic average of the edge vertices' nearest neighbor average degree values follows the probability distribution of a normal function, and that the position parameters and scale parameters of the probability distribution show a linear evolution trace.
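The nearest neighbor average degree quantity can be computed with networkx as in the hedged sketch below; the toy route list is invented, and averaging the per-node values over the two endpoints of every edge is one plausible reading of the "edge vertices' arithmetic average" used in the abstract.

```python
import networkx as nx

# Toy air-route network: nodes are cities, edges are routes (invented data).
routes = [("Beijing", "Shanghai"), ("Beijing", "Guangzhou"), ("Shanghai", "Guangzhou"),
          ("Beijing", "Chengdu"), ("Chengdu", "Urumqi"), ("Shanghai", "Shenzhen")]
G = nx.Graph(routes)

# Average degree of each node's neighbors.
knn = nx.average_neighbor_degree(G)

# Arithmetic average over the two end vertices of every edge, then over all edges.
edge_avg = [(knn[u] + knn[v]) / 2 for u, v in G.edges()]
print(sum(edge_avg) / len(edge_avg))
```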
Due to the large number of submodules (SMs) in modular multilevel converters (MMCs) for high-voltage applications, MMCs are usually regulated by the nearest level modulation (NLM). Moreover, the large number of SMs poses a challenge for the fault diagnosis strategy (FDS). This paper proposes a currentless FDS for the MMC with NLM. In the FDS, the voltage sensor is relocated to measure the output voltage of the SM. To acquire the capacitor voltage without adding extra sensors, a capacitor voltage calculation method is proposed. Based on the measurement of output voltages, faults can be detected and the number of different-type switch open-circuit faults among the numerous SMs in an arm can be confirmed, which narrows the scope of fault localization. Then, the faulty SMs and the faulty switches in these SMs are further located without the arm current, according to the sorting of capacitor voltages in the voltage balancing algorithm. The FDS is independent of the arm current, which can reduce the communication cost in the hierarchical control system of the MMC. Furthermore, the proposed FDS not only simplifies the identification of switch open-circuit faults by confirming the scope of the faults, but also detects and locates multiple different-type faults in an arm. The effectiveness of the proposed strategy is verified by simulation results.
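Nearest level modulation itself reduces to rounding the reference voltage to the closest available submodule level, as in the hedged sketch below; the arm structure and voltage values are generic assumptions rather than the paper's converter parameters.

```python
import numpy as np

def nlm_levels(v_ref, n_sm, v_dc):
    """Nearest level modulation: number of inserted SMs that best matches the reference voltage."""
    v_c = v_dc / n_sm                                   # nominal capacitor voltage per SM
    n_on = np.round(v_ref / v_c).astype(int)            # round to the nearest level
    return np.clip(n_on, 0, n_sm)                       # cannot insert fewer than 0 or more than N SMs

# One fundamental period of a sinusoidal reference for an arm with 10 SMs (assumed values).
t = np.linspace(0, 0.02, 200)
v_ref = 5000 * (1 + np.sin(2 * np.pi * 50 * t)) / 2     # 0..5 kV reference
print(nlm_levels(v_ref, n_sm=10, v_dc=5000)[:10])
```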
Coastal marshes are transitional areas between terrestrial and aquatic ecosystems. They are sensitive to climate change and anthropogenic activities. In recent decades, the reclamation of coastal marshes has greatly increased, and its effects on microbial communities in coastal marshes have been studied with great interest. Most of these studies have explained the short-term spatiotemporal variation in soil microbial community dynamics. However, the impact of reclamation on the community composition and assembly processes of functional microbes (e.g., ammonia-oxidizing prokaryotes) is often ignored. In this study, using quantitative polymerase chain reaction and the Ion S5™ XL sequencing platform, we investigated the spatiotemporal dynamics, assembly processes, and diversity patterns of ammonia-oxidizing prokaryotes in 1000-year-old reclaimed coastal salt marshes. The taxonomic and phylogenetic diversity and composition of ammonia oxidizers showed apparent spatiotemporal variations with soil reclamation. Phylogenetic null modelling-based analysis showed that across all sites, the archaeal ammonia-oxidizing community was assembled by a deterministic process (84.71%), and deterministic processes were also dominant (55.2%) for ammonia-oxidizing bacterial communities except for communities at 60 years of reclamation. The assembly process and nitrification activity in reclaimed soils were positively correlated. The abundance of the amoA gene and changes in ammonia-oxidizing archaeal and bacterial diversities significantly affected the nitrification activity in reclaimed soils. These findings suggest that long-term coastal salt marsh reclamation affects nitrification by modulating the activities of ammonia-oxidizing microorganisms and regulating their community structures and assembly processes. These results provide a better understanding of the effects of long-term land reclamation on soil nitrogen-cycling microbial communities.
Intrusion detection aims to detect intrusion behavior and serves as a complement to firewalls. It can detect attack types of malicious network communications and computer usage that cannot be detected by conventional firewalls. Many intrusion detection methods are processed through machine learning. Previous literature has shown that the performance of an intrusion detection method based on hybrid learning or an integration approach is superior to that of a single learning technology. However, almost no studies focus on how additional representative and concise features can be extracted to perform effective intrusion detection among massive and complicated data. In this paper, a new hybrid learning method is proposed on the basis of features such as density, cluster centers, and nearest neighbors (DCNN). In this algorithm, data is represented by the local density of each sample point and the sum of distances from each sample point to the cluster centers and to its nearest neighbor. A k-NN classifier is adopted to classify the new feature vectors. Our experiment shows that DCNN, which combines K-means, clustering-based density, and a k-NN classifier, is effective in intrusion detection.
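A hedged sketch of the DCNN-style feature construction: each sample is re-represented by a local density estimate, its distance to its nearest neighbor, and the sum of its distances to K-means cluster centers, then fed to a k-NN classifier. The density definition, dataset, and all parameters are simplifying assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

X, y = make_classification(n_samples=600, n_features=20, random_state=0)   # synthetic traffic (assumed)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

def dcnn_features(X_fit, X, n_clusters=5, k_density=10):
    """Local density + nearest neighbor distance + summed distances to cluster centers."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_fit)
    center_sum = km.transform(X).sum(axis=1)             # sum of distances to the cluster centers
    nn = NearestNeighbors(n_neighbors=k_density + 1).fit(X_fit)
    d, _ = nn.kneighbors(X)                               # column 0 is the sample itself for training points
    density = 1.0 / (d[:, 1:].mean(axis=1) + 1e-12)       # local density: inverse mean k-NN distance
    nearest = d[:, 1]                                      # distance to the nearest neighbor
    return np.column_stack([density, nearest, center_sum])

Ftr = dcnn_features(Xtr, Xtr)
Fte = dcnn_features(Xtr, Xte)
clf = KNeighborsClassifier(n_neighbors=5).fit(Ftr, ytr)
print("accuracy on DCNN-style features:", clf.score(Fte, yte))
```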
Malware attacks on Windows machines pose significant cybersecurity threats, necessitating effective detection and prevention mechanisms. Supervised machine learning classifiers have emerged as promising tools for malware detection. However, there remains a need for comprehensive studies that compare the performance of different classifiers specifically for Windows malware detection. Addressing this gap can provide valuable insights for enhancing cybersecurity strategies. While numerous studies have explored malware detection using machine learning techniques, there is a lack of systematic comparison of supervised classifiers for Windows malware detection. Understanding the relative effectiveness of these classifiers can inform the selection of optimal detection methods and improve overall security measures. This study aims to bridge the research gap by conducting a comparative analysis of supervised machine learning classifiers for detecting malware on Windows systems. The objectives include investigating the performance of various classifiers, such as Gaussian Naïve Bayes, K Nearest Neighbors (KNN), Stochastic Gradient Descent Classifier (SGDC), and Decision Tree, in detecting Windows malware; evaluating the accuracy, efficiency, and suitability of each classifier for real-world malware detection scenarios; identifying the strengths and limitations of different classifiers to provide insights for cybersecurity practitioners and researchers; and offering recommendations for selecting the most effective classifier for Windows malware detection based on empirical evidence. The study employs a structured methodology consisting of several phases: exploratory data analysis, data preprocessing, model training, and evaluation. Exploratory data analysis involves understanding the dataset's characteristics and identifying preprocessing requirements. Data preprocessing includes cleaning, feature encoding, dimensionality reduction, and optimization to prepare the data for training. Model training utilizes various supervised classifiers, and their performance…
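A hedged sketch of the comparison protocol: the four classifier families named above are trained and scored on the same split; the synthetic feature matrix stands in for the (unspecified) Windows malware dataset, so only the comparison loop is illustrated.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for extracted malware features (assumption).
X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.25, random_state=0)

classifiers = {
    "Gaussian Naive Bayes": GaussianNB(),
    "K Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "SGD Classifier": SGDClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

for name, clf in classifiers.items():
    clf.fit(Xtr, ytr)
    print(f"{name}: accuracy = {clf.score(Xte, yte):.3f}")
```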
A common necessity for prior unsupervised domain adaptation methods that can improve the domain adaptation on an unlabeled target domain dataset is access to the source domain dataset and the target domain dataset simultaneously. However, data privacy makes it not always possible to access the source domain dataset and the target domain dataset in actual industrial equipment simultaneously, especially for aviation components like the Electro-Mechanical Actuator (EMA), whose datasets are often not shareable due to data copyright and confidentiality. To address this problem, this paper proposes a source-free unsupervised domain adaptation framework for EMA fault diagnosis. The proposed framework is a combination of a feature network and a classifier. Firstly, the source domain datasets are only applied to train a source model. Secondly, the well-trained source model is transferred to the target domain and the classifier is frozen based on the source domain hypothesis. Thirdly, nearest centroid filtering is introduced to filter the reliable pseudo labels for the unlabeled target domain dataset, and finally, supervised learning and pseudo label clustering are applied to fine-tune the transferred model. In comparison with several traditional unsupervised domain adaptation methods, case studies based on low- and high-frequency monitoring signals on EMA indicate the effectiveness of the proposed method.
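The nearest centroid filtering step can be sketched as below: class centroids are built from the source model's pseudo labels on the target features, and only the samples whose nearest centroid agrees with their pseudo label are kept for fine-tuning. The feature dimensionality and the threshold-free rule are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np

def nearest_centroid_filter(features, pseudo_labels):
    """Keep target samples whose nearest class centroid agrees with their pseudo label."""
    classes = np.unique(pseudo_labels)
    centroids = np.stack([features[pseudo_labels == c].mean(axis=0) for c in classes])
    # Distance of every sample to every class centroid.
    d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    nearest = classes[d.argmin(axis=1)]
    return nearest == pseudo_labels

rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 1, (40, 8)), rng.normal(4, 1, (40, 8))])  # target-domain features (assumed)
pseudo = rng.integers(0, 2, size=80)                                       # noisy pseudo labels from the source model
mask = nearest_centroid_filter(feats, pseudo)
print("reliable pseudo-labelled samples:", mask.sum(), "of", len(mask))
```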
The study presents the Half Max Insertion Heuristic (HMIH) as a novel approach to solving the Travelling Salesman Problem (TSP). The goal is to outperform existing techniques such as the Farthest Insertion Heuristic (FIH) and Nearest Neighbour Heuristic (NNH). The paper discusses the limitations of current construction tour heuristics, focusing particularly on the significant margin of error in FIH. It then proposes HMIH as an alternative that minimizes the increase in tour distance and includes more nodes. HMIH improves tour quality by starting with an initial tour consisting of a 'minimum' polygon and iteratively adding nodes using our novel Half Max routine. The paper thoroughly examines and compares HMIH with FIH and NNH via rigorous testing on standard TSP benchmarks. The results indicate that HMIH consistently delivers superior performance, particularly with respect to tour cost and computational efficiency. HMIH's tours were sometimes 16% shorter than those generated by FIH and NNH, showcasing its potential and value as a novel benchmark for TSP solutions. The study used statistical methods, including Friedman's Non-parametric Test, to validate the performance of HMIH over FIH and NNH. This guarantees that the identified advantages are statistically significant and consistent in various situations. This comprehensive analysis emphasizes the reliability and efficiency of the heuristic, making a compelling case for its use in solving TSP issues. The research shows that, in general, HMIH fared better than FIH in all cases studied, except for a few instances (pr439, eil51, and eil101) where FIH either performed equally or slightly better than HMIH. HMIH's efficiency is shown by its improvements in error percentage (δ) and goodness values (g) compared to FIH and NNH. In the att48 instance, HMIH had an error rate of 6.3%, whereas FIH had 14.6% and NNH had 20.9%, indicating that HMIH was closer to the optimal solution. HMIH consistently showed superior performance across many benchmarks, with lower percentage error and highe…
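For reference, the Nearest Neighbour Heuristic baseline mentioned above is easy to state in code; the hedged sketch below builds a tour by repeatedly visiting the closest unvisited city on a small invented coordinate set (it is the NNH baseline, not HMIH).

```python
import math

def nnh_tour(coords, start=0):
    """Nearest Neighbour Heuristic: repeatedly move to the closest unvisited city."""
    dist = lambda a, b: math.dist(coords[a], coords[b])
    unvisited = set(range(len(coords))) - {start}
    tour = [start]
    while unvisited:
        nxt = min(unvisited, key=lambda c: dist(tour[-1], c))
        tour.append(nxt)
        unvisited.remove(nxt)
    # Close the tour back to the start when reporting its length.
    length = sum(dist(tour[i], tour[i + 1]) for i in range(len(tour) - 1)) + dist(tour[-1], start)
    return tour, length

cities = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3)]   # invented coordinates
tour, length = nnh_tour(cities)
print(tour, round(length, 2))
```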
A standard plenoptic camera can be used to capture multi-dimensional radiation information of a high-temperature luminous flame to reconstruct the temperature distribution. In this study, a novel method for reconstructing the three-dimensional temperature field is proposed. This method is based on optical tomography combined with a standard plenoptic camera. The flame projection information from different planes is contained in one radiation image. In this model, we introduced the concept of the nearest neighbor method in the frequency domain to strip the interference of redundant information in the projection and to realize three-dimensional deconvolution. The flame emission intensity received by the pixels on the charge-coupled device sensor can be obtained according to the optical tomographic model. The temperature distributions of axisymmetric and non-axisymmetric flames can be reconstructed by solving the mathematical model with the nearest neighbor method. The numerical results show that three-dimensional temperature fields of high-temperature luminous flames can be retrieved, proving the validity of the proposed method.
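The general nearest neighbor deconvolution idea (subtracting the blurred contributions of the adjacent planes from each plane, implemented via FFTs) can be sketched as below; the Gaussian blur kernel, the subtraction weight c, and the synthetic stack are all assumptions, so this illustrates only the generic frequency-domain operation rather than the paper's tomographic model.

```python
import numpy as np

def nearest_neighbor_deblur(stack, sigma=2.0, c=0.45):
    """Subtract the defocused nearest planes from each plane in the frequency domain."""
    nz, ny, nx = stack.shape
    ky = np.fft.fftfreq(ny)[:, None]
    kx = np.fft.fftfreq(nx)[None, :]
    H = np.exp(-2 * (np.pi * sigma) ** 2 * (kx ** 2 + ky ** 2))   # Gaussian defocus transfer function (assumed)
    out = np.empty_like(stack)
    for k in range(nz):
        below = stack[max(k - 1, 0)]
        above = stack[min(k + 1, nz - 1)]
        blurred = np.fft.ifft2(np.fft.fft2(0.5 * (below + above)) * H).real
        out[k] = np.clip(stack[k] - c * blurred, 0, None)         # strip neighbor-plane interference
    return out

stack = np.random.default_rng(0).random((5, 64, 64))              # synthetic projection stack (assumed)
print(nearest_neighbor_deblur(stack).shape)
```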