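Abstract: The deep embedding clustering (DEC) algorithm relies solely on an autoencoder and embeds data into a low-dimensional vectorized feature space for clustering through single-instance reconstruction. It ignores the relationships between different instances, so instances in the embedding space may not be well separated. To address this problem, a vectorized feature space embedded clustering method based on contrastive learning (VECCL) is proposed. By using contrastive learning to identify the similarities and differences between data instances, the method extracts features that carry the clustering semantics of "similar instances close, dissimilar instances far apart" and passes them to DEC as prior knowledge, guiding the autoencoder to initialize a low-dimensional clustering feature space that carries deep data information. In addition, an entropy loss is constructed from the soft assignment labels and, together with the reconstruction loss of the autoencoder, is introduced into the clustering loss function as a regularization term to jointly refine the clustering. Experimental results show that the proposed method extracts features more effectively: compared with DEC on the CIFAR10, CIFAR100, and STL10 datasets, ACC improves by 48.1, 23.1, and 41.8 percentage points, NMI by 41.0, 25.2, and 39.0 percentage points, and ARI by 45.4, 16.4, and 41.8 percentage points, respectively.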
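The loss structure described in this abstract can be illustrated with a minimal sketch. The Student's t soft assignment and the sharpened target distribution follow the standard DEC formulation; the exact form of the entropy term and the weights `alpha` and `beta` are assumptions made for illustration, since the abstract does not specify them, and `x_hat` stands in for the output of an autoencoder that is not modeled here.

```python
import math

def soft_assign(z, centers):
    """DEC soft assignment q_ij: Student's t kernel with one degree of freedom."""
    q = []
    for zi in z:
        row = [1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(zi, c))) for c in centers]
        total = sum(row)
        q.append([v / total for v in row])
    return q

def target_distribution(q):
    """DEC target p_ij: q_ij^2 / f_j, renormalised per instance (f_j = sum_i q_ij)."""
    f = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(len(row))]
        total = sum(w)
        p.append([v / total for v in w])
    return p

def kl_clustering_loss(p, q):
    """KL(P || Q) summed over all instances and clusters."""
    return sum(pij * math.log(pij / qij)
               for pr, qr in zip(p, q) for pij, qij in zip(pr, qr))

def entropy_loss(q):
    """Mean entropy of the soft labels (assumed form of the entropy term)."""
    return -sum(qij * math.log(qij) for row in q for qij in row) / len(q)

def reconstruction_loss(x, x_hat):
    """Squared reconstruction error averaged over instances."""
    return sum((a - b) ** 2 for xi, xh in zip(x, x_hat) for a, b in zip(xi, xh)) / len(x)

def total_loss(z, centers, x, x_hat, alpha=0.1, beta=0.1):
    """Clustering loss plus entropy and reconstruction regularisers.
    alpha and beta are hypothetical weights, not values from the paper."""
    q = soft_assign(z, centers)
    p = target_distribution(q)
    return (kl_clustering_loss(p, q)
            + alpha * entropy_loss(q)
            + beta * reconstruction_loss(x, x_hat))
```

In a full training loop the embeddings `z` and the centers would be updated by gradient descent on this quantity; the sketch only evaluates it for fixed inputs.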
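Abstract: With the development of the Internet and service-oriented technology, a new type of Web application, the Mashup service, has become popular on the Internet and is growing rapidly. How to find high-quality services among the many Mashup services has become a widely discussed problem. Finding functionally similar services and clustering them can effectively improve the precision and efficiency of service discovery. Current mainstream methods mine the functional information hidden in Mashup services and then apply a specific clustering algorithm such as K-means. However, Mashup service documents are usually short texts, and traditional mining algorithms such as LDA cannot handle short texts effectively, so the clustering results are unsatisfactory. To address this problem, a TWE-NMF (nonnegative matrix factorization combining tags and word embedding) model based on nonnegative matrix factorization is proposed for topic modeling of Mashup services. The method first normalizes the Mashup services, then uses a Dirichlet process mixture model based on improved Gibbs sampling to estimate the number of topics automatically, then combines word embeddings, service tags, and other information with nonnegative matrix factorization to obtain the topic features of the Mashup services, and finally clusters the services with a spectral clustering algorithm. A comprehensive evaluation shows that, compared with existing service clustering methods, the proposed method achieves significant improvements in precision, recall, F-measure, purity, and entropy.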
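As a rough illustration of the factorization step, the sketch below runs plain nonnegative matrix factorization with the classical Lee-Seung multiplicative updates on a toy service-term matrix. It does not reproduce TWE-NMF: the tag and word-embedding terms, the Dirichlet process estimate of the topic count, and the spectral clustering stage are all omitted, and the matrix `v`, the topic count `k`, and the helper names are hypothetical.

```python
import random

def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor V (n x m) into nonnegative W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h

# Toy service-term counts: rows are services, columns are terms.
v = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 4, 1], [0, 0, 1, 4]]
w, h = nmf(v, k=2)
```

Each row of `w` can be read as the topic features of one service; in the pipeline described above these features would then be passed to a clustering algorithm.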
Funding: Supported by the Fundamental Research Funds for the Central Universities under Grant NS2020045. Y.L.G received the grant.
Abstract: Weather is a key factor affecting air traffic control. Accurate recognition and classification of similar weather scenes in the terminal area helps rapid decision-making in air traffic flow management. Current research mostly uses traditional machine learning methods to extract features of weather scenes and clustering algorithms to group similar scenes. Inspired by the excellent performance of deep learning in image recognition, this paper proposes a method for classifying similar weather scenes in the terminal area based on improved deep convolution embedded clustering (IDCEC). The method uses the encoding layer and the decoding layer together to reduce the dimensionality of the weather image while retaining as much useful information as possible, and then uses the pre-trained encoding layer together with the clustering layer to train the clustering model for similar scenes in the terminal area. Finally, the terminal area of Guangzhou Airport is selected as the research object, the proposed method is used to classify historical weather data into similar scenes, and its performance is compared with other state-of-the-art methods. The experimental results show that the proposed IDCEC method identifies similar scenes more accurately based on the spatial distribution characteristics and severity of weather; in addition, when checked against the actual flight volume in the Guangzhou terminal area, IDCEC's recognition results for similar weather scenes are consistent with those of experts in the field.
Abstract: Purpose: When a large number of project proposals are evaluated to allocate available funds, grouping them based on their similarities is beneficial. Current approaches to grouping proposals are primarily based on manual matching of similar topics, discipline areas, and keywords declared by project applicants. When the number of proposals increases, this task becomes complex and requires excessive time. This paper aims to demonstrate how to effectively use the rich information in the titles and abstracts of Turkish project proposals to group them automatically. Design/methodology/approach: This study proposes a model that effectively groups Turkish project proposals by combining word embedding, clustering, and classification techniques. The proposed model uses FastText, BERT, and term frequency/inverse document frequency (TF/IDF) word-embedding techniques to extract terms from the titles and abstracts of project proposals in Turkish. The extracted terms were grouped using both the clustering and classification techniques. Natural groups contained within the corpus were discovered using k-means, k-means++, k-medoids, and agglomerative clustering algorithms. Additionally, this study employs classification approaches to predict the target class for each document in the corpus. To classify project proposals, various classifiers, including k nearest neighbors (KNN), support vector machines (SVM), artificial neural networks (ANN), classification and regression trees (CART), and random forest (RF), are used. Empirical experiments were conducted to validate the effectiveness of the proposed method by using real data from the Istanbul Development Agency. Findings: The results show that the generated word embeddings can effectively represent proposal texts as vectors, and can be used as inputs for clustering or classification algorithms. Using clustering algorithms, the document corpus is divided into five groups. In addition, the results demonstrate that the proposals can easily be categorized into predefined categories using classification algorithms. SVM-Linear achieved the highest predicti
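The vectorize-then-cluster idea in this abstract can be sketched with TF-IDF weighting and Lloyd's k-means algorithm. This is a simplified illustration rather than the study's pipeline: the smoothed IDF formula, the whitespace tokenizer, the choice to seed centroids with the first `k` documents, and the English toy proposals are all assumptions, and FastText, BERT, and the classifiers are not covered.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for whitespace-tokenised docs."""
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    n = len(docs)
    df = Counter(t for doc in tokens for t in set(doc))
    # Smoothed inverse document frequency (an assumed variant).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in tokens:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def kmeans(vectors, k, iters=20):
    """Lloyd's algorithm; the first k vectors seed the centroids (a simplification)."""
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centers[j])))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical proposal titles, not data from the study.
proposals = [
    "solar energy grid project",
    "tourism culture festival",
    "solar grid storage energy",
    "culture tourism heritage festival",
]
vocab, vectors = tfidf(proposals)
labels = kmeans(vectors, k=2)
```

Proposals that share vocabulary end up with the same label, which is the grouping behaviour the abstract describes at a much larger scale.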
Funding: Funded by the State Grid Limited Science and Technology Project of China, Grant Number SGSXDK00DJJS2200144.
Abstract: At present, the proportion of new energy in the power grid is increasing, and the random fluctuations in power output increase the risk of cascading failures in the power grid. In this paper, we propose a method for identifying high-risk scenarios of interlocking faults in new energy power grids based on a deep embedding clustering (DEC) algorithm and apply it in a risk assessment of cascading failures in different operating scenarios for new energy power grids. First, considering the real-time operation status and system structure of new energy power grids, the scenario cascading failure risk indicator is established. Based on this indicator, the risk of cascading failure is calculated for the scenario set, the scenarios are clustered based on the DEC algorithm, and the scenarios with the highest indicators are selected as the significant risk scenario set. The results of simulations with an example power grid show that our method can effectively identify scenarios with a high risk of cascading failures from a large number of scenarios.
Abstract: With the improvement of current online communication schemes, it is now possible to successfully distribute and transport secured digital content via the communication channel at a faster transmission rate. Traditional steganography and cryptography concepts are used to achieve the goal of concealing secret content on a medium and encrypting it before transmission. Both of these techniques aid in the confidentiality of feature content. The proposed approach concerns secret content embodiment in selected pixels on digital image layers such as Red, Green, and Blue. The private content originated from a medical client and was forwarded to a medical practitioner on the server end through the internet. The K-Means clustering principle uses the contouring approach to frame the pixel clusters on the image layers. The content embodiment procedure is performed on the selected pixel groups of all layers of the image using the Least Significant Bit (LSB) substitution technique to build the secret content embedded image known as the stego image, which is subsequently transmitted across the internet medium to the server end. The experimental results are computed using the inputs from “Open-Access Medical Image Repositories (aylward.org)” and demonstrate the scheme’s impudence as the content concealing procedure progresses.
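The LSB substitution step mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming the cover image is a list of `(R, G, B)` tuples with 8-bit channels; it writes the message bits into consecutive channel values, whereas the paper embeds only in pixel groups selected by K-Means clustering, a selection step not reproduced here. The function names are hypothetical.

```python
def embed_lsb(pixels, message: bytes):
    """Write message bits (most significant bit first) into the least
    significant bit of successive R, G, B channel values."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    flat = [channel for px in pixels for channel in px]
    if len(bits) > len(flat):
        raise ValueError("message too long for cover image")
    for i, bit in enumerate(bits):
        flat[i] = (flat[i] & ~1) | bit  # clear the LSB, then set it to the message bit
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]

def extract_lsb(pixels, n_bytes):
    """Read n_bytes back from the least significant bits of the stego image."""
    flat = [channel for px in pixels for channel in px]
    out = bytearray()
    for b in range(n_bytes):
        byte = 0
        for value in flat[8 * b: 8 * b + 8]:
            byte = (byte << 1) | (value & 1)
        out.append(byte)
    return bytes(out)

# Hypothetical 8-pixel cover image: 24 channel values, room for 3 bytes.
cover = [((10 * i) % 256, (20 * i + 1) % 256, (30 * i + 2) % 256) for i in range(8)]
stego = embed_lsb(cover, b"Hi")
recovered = extract_lsb(stego, 2)
```

Because only the lowest bit of each channel can change, every channel value in the stego image differs from the cover by at most 1, which is why the change is hard to see.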