Funding: Supported by the National Natural Science Foundation of China (No. 62203390) and the Science and Technology Project of China Tobacco Zhejiang Industrial Co., Ltd (No. ZJZY2022E004).
Abstract: In the tobacco industry, insider attacks by employees are a thorny problem that is difficult to detect. To address this, this paper proposes an insider threat detection method based on heterogeneous graph embedding. First, the interrelationships between logs are taken into account, and log entries are converted into heterogeneous graphs based on these relationships. Second, heterogeneous graph embedding is applied so that each log entry is represented as a low-dimensional feature vector. Then, a clustering algorithm separates normal logs and malicious logs into different clusters to identify the malicious ones. Finally, the effectiveness of the method is verified through experiments on the CERT dataset. The experimental results show that the method performs better than several baseline methods.
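The final step described above (clustering log embeddings to separate normal from malicious entries) can be illustrated with a minimal sketch. The abstract does not name the clustering algorithm or say how the malicious cluster is chosen, so everything here is an assumption: the sketch uses a two-cluster k-means on hypothetical 2-D embedding vectors and treats the smaller cluster as suspicious.

```python
import math

def kmeans2(points, iters=20):
    """Two-cluster k-means with a deterministic start (an illustrative choice)."""
    # Seed with the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))
    centroids = [c0, c1]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1]) else 1
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels

# Hypothetical low-dimensional log embeddings (not taken from the CERT dataset).
embeddings = [(0.1, 0.2), (0.0, -0.1), (0.2, 0.1), (-0.1, 0.0), (5.0, 5.1), (4.9, 5.2)]
labels = kmeans2(embeddings)

# Assumption: malicious logs are rare, so the minority cluster is flagged.
minority = min((0, 1), key=labels.count)
flagged = [i for i, lab in enumerate(labels) if lab == minority]
```

In practice the embeddings would come from the heterogeneous graph embedding step, and the choice of cluster count and flagging rule would need to be validated against labeled data.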
Funding: This work was supported by the National Key Research and Development Program of China under Grant No. 2021YFB3100600, the Youth Innovation Promotion Association of Chinese Academy of Sciences under Grant No. 2021153, and the State Key Program of National Natural Science Foundation of China under Grant No. U2336202.
Abstract: Event detection (ED) seeks to recognize event triggers and classify them into predefined event types. Chinese ED is formulated as a character-level task owing to uncertain word boundaries. Prior methods try to incorporate word-level information into characters to enhance their semantics. However, they face two problems. First, they fail to incorporate word-level information into each character the word encompasses, causing the insufficient word-character interaction problem. Second, they struggle to distinguish events of similar types with limited annotated instances, which is called the event confusing problem. This paper proposes a novel model named Label-Aware Heterogeneous Graph Attention Network (L-HGAT) to address these two problems. Specifically, we first build a heterogeneous graph of two node types and three edge types to maximally preserve word-character interactions, and then deploy a heterogeneous graph attention network to enhance the semantic propagation between characters and words. Furthermore, we design a pushing-away game to enlarge the prediction gap between the ground-truth event type and its confusing counterpart for each character. Experimental results show that our L-HGAT model consistently achieves superior performance over prior competitive methods.
Funding: Supported by the National Key Research and Development Plan of China (2017YFB0503700, 2016YFB0501801), the National Natural Science Foundation of China (61170026, 62173157), the Thirteen Five-Year Research Planning Project of National Language Committee (No. YB135-149), and the Fundamental Research Funds for the Central Universities (Nos. CCNU20QN022, CCNU20QN021, CCNU20ZT012).
Abstract: Real-world complex networks are inherently heterogeneous; they have different types of nodes, attributes, and relationships. In recent years, various methods have been proposed to automatically learn how to encode the structural and semantic information contained in heterogeneous information networks (HINs) into low-dimensional embeddings; this task is called heterogeneous network embedding (HNE). Efficient HNE techniques can benefit various HIN-based machine learning tasks such as node classification, recommender systems, and information retrieval. Here, we provide a comprehensive survey of key advancements in the area of HNE. First, we define an encoder-decoder-based HNE model taxonomy. Then, we systematically overview, compare, and summarize various state-of-the-art HNE models and analyze the advantages and disadvantages of various model categories to identify more potentially competitive HNE frameworks. We also summarize the application fields, benchmark datasets, open source tools, and performance evaluation in the HNE area. Finally, we discuss open issues and suggest promising future directions. We anticipate that this survey will provide deep insights into research in the field of HNE.
Funding: This work was supported by the National Natural Science Foundation of China under Grant No. 61672161 and the Youth Research Fund of Shanghai Municipal Health and Family Planning Commission of China under Grant No. 2015Y0195.
Abstract: Heterogeneous information network (HIN)-structured data provide an effective model for practical purposes in the real world. Network embedding is fundamental for supporting network-based analysis and prediction tasks. Currently popular network embedding methods often fail to preserve the semantics of HINs effectively. In this study, we propose AGA2Vec, a generative adversarial model for HIN embedding that uses attention mechanisms and meta-paths. To capture the semantic information from multi-typed entities and relations in a HIN, we develop a weighted meta-path strategy to preserve the proximity of the HIN. We then use an autoencoder and a generative adversarial model to obtain robust representations of the HIN. The results of experiments on several real-world datasets show that the proposed approach outperforms state-of-the-art approaches for HIN embedding.
Funding: Funded by the National Natural Science Foundation of China (grant number 61402220), the Key Program of the Scientific Research Fund of Hunan Provincial Education Department (grant number 19A439), and the Natural Science Foundation of Hunan Province, China (grant numbers 2020J4525 and 2022J30495).
Abstract: Predicting interactions between drugs and target proteins has become an essential task in the drug discovery process. Although validation via wet-lab experiments is available, experimental methods for drug-target interaction (DTI) identification remain either time consuming or heavily dependent on domain expertise. Therefore, various computational models have been proposed to predict possible interactions between drugs and target proteins. However, most prediction methods do not consider the topological characteristics of the relationships. In this paper, we propose a relational topology-based heterogeneous network embedding method to predict drug-target interactions, abbreviated as RTHNE_DTI. We first construct a heterogeneous information network based on the interactions between different types of nodes, to enhance the ability of association discovery by fully considering the topology of the network. Drug and target protein nodes can then be represented by the other types of nodes. According to the different topological structures of the relationships between nodes, we divide the relationships in the heterogeneous network into two categories and model them separately. In extensive experiments on real-world drug datasets, RTHNE_DTI is efficient and outperforms other state-of-the-art methods. RTHNE_DTI can further be used to predict interactions for drug-target pairs whose interaction is unknown.
Funding: Supported by the Key Scientific Guiding Project for the Central Universities Research Funds (No. N2008005), the Major Science and Technology Project of Liaoning Province of China (No. 2020JH1/10100008), and the National Key Research and Development Program of China (No. 2018YFB1701104).
Abstract: Graph Neural Networks (GNNs), a family of powerful tools built on homogeneous networks for learning embedding representations of graph-structured data, have been widely used in various data mining tasks. Applying a GNN to embed a Heterogeneous Information Network (HIN) is a major challenge, mainly because HINs contain many different types of nodes and different types of relationships between nodes. A HIN contains rich semantic and structural information, which requires a specially designed graph neural network. However, existing HIN-based graph neural network models rarely consider the interactive information hidden between the meta-paths of a HIN, which results in poor node embeddings. In this paper, we propose an Attention-aware Heterogeneous graph Neural Network (AHNN) model to effectively extract useful information from a HIN and use it to learn the embedding representation of nodes. Specifically, we first use node-level attention to aggregate and update the embedding representation of nodes, and then concatenate the embedding representations of the nodes on different meta-paths. Finally, a semantic-level neural network is proposed to extract the feature interaction relationships on different meta-paths and learn the final embedding of nodes. Experimental results on three widely used datasets show that the AHNN model significantly outperforms state-of-the-art models.
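The node-level attention step mentioned above can be sketched as a softmax-weighted sum over a node's neighbors. This is only an illustration: the abstract does not give AHNN's scoring function, so the sketch assumes plain dot-product scores, and `attention_aggregate` and the toy vectors are hypothetical.

```python
import math

def attention_aggregate(target, neighbors):
    """Return softmax attention weights and the weighted sum of neighbor vectors."""
    # Assumed scoring function: dot product between target and each neighbor.
    scores = [sum(t * n for t, n in zip(target, nb)) for nb in neighbors]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of neighbor vectors, one output value per dimension.
    aggregated = [
        sum(w * nb[d] for w, nb in zip(weights, neighbors))
        for d in range(len(target))
    ]
    return weights, aggregated

# Toy 2-D embeddings: the first neighbor is more similar to the target.
target = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0]]
weights, aggregated = attention_aggregate(target, neighbors)
```

In a meta-path-based model, one such aggregation would typically be computed per meta-path, and the per-meta-path results would then be concatenated as the abstract describes.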
Abstract: We search for a variety of things over the Internet in our daily lives, and numerous search engines are available to return relevant results. With rapid technological advancement, the Internet has become a major source of information. Further, the advent of the Web 2.0 era has led to increased interaction between users and websites, and it has become challenging to provide information to users according to their interests. Because of copyright restrictions, most existing studies lack access to the content of candidate articles for recommendation. The content of such articles is not always freely available, which leads to inadequate recommendation results. Moreover, various studies base recommendations on user profiles, so they need a significant number of registered users in the system. In recent years, research has shown that knowledge graphs yield better-quality recommendation results and alleviate sparsity and cold-start issues. Network embedding techniques try to learn high-quality feature vectors automatically from network structures, enabling vector-based measures of node relatedness. Building on the strength of network embedding techniques, the proposed citation-based recommendation approach uses heterogeneous network embedding to generate recommendation results. The novelty of this paper lies in exploiting a network embedding approach, metapath2vec, to generate paper recommendations. Unlike existing approaches, the proposed method is capable of learning low-dimensional latent representations of nodes (i.e., research papers) in a network. We apply metapath2vec to a knowledge network built from the ACL Anthology Network (all about NLP) and use node relatedness to generate item (research article) recommendations.
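To make the idea behind metapath2vec more concrete, here is a minimal sketch of a meta-path-guided random walk, the sampling step that precedes skip-gram training. It is not the metapath2vec implementation: the toy paper-author graph, the node names, and the `metapath_walk` helper are all hypothetical, and the schema of the network built from the ACL Anthology Network is not given in the abstract.

```python
import random

def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Random walk whose node types repeat the pattern in `metapath`."""
    if node_type[start] != metapath[0] or metapath[0] != metapath[-1]:
        raise ValueError("meta-path must start at the start node's type and be symmetric")
    cycle = metapath[:-1]  # e.g. paper-author-paper repeats as paper, author, ...
    walk = [start]
    while len(walk) < walk_length:
        wanted = cycle[len(walk) % len(cycle)]
        # Only neighbors of the required type are eligible for the next step.
        candidates = [n for n in adj[walk[-1]] if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbor of the required type
        walk.append(rng.choice(candidates))
    return walk

# Hypothetical toy network: three papers and two authors.
node_type = {"p1": "paper", "p2": "paper", "p3": "paper", "a1": "author", "a2": "author"}
adj = {
    "p1": ["a1"], "p2": ["a1", "a2"], "p3": ["a2"],
    "a1": ["p1", "p2"], "a2": ["p2", "p3"],
}
walk = metapath_walk(adj, node_type, "p1", ["paper", "author", "paper"], 7, random.Random(0))
```

Walks like this would then be treated as sentences for a skip-gram model, so that papers reachable through shared authors end up with nearby embedding vectors.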
Funding: Supported by the National Natural Science Foundation of China Youth Fund under Grant No. 61902001.
Abstract: Heterogeneous information networks, which consist of multi-typed vertices representing objects and multi-typed edges representing relations between objects, are ubiquitous in the real world. In this paper, we study the problem of entity matching for heterogeneous information networks based on distributed network embedding and a multi-layer perceptron with a highway network, and we propose a new method named DEM, short for Deep Entity Matching. In contrast to traditional entity matching methods, DEM uses the multi-layer perceptron with a highway network to explore hidden relations and improve matching performance. Importantly, we combine DEM with the network embedding methodology, enabling highly efficient computation in a vectorized manner. DEM's generic modeling of both the network structure and the entity attributes enables it to model various heterogeneous information networks flexibly. To illustrate its functionality, we apply the DEM algorithm to two real-world entity matching applications: user linkage in social network analysis, which predicts the same or matched users across different social platforms, and record linkage, which predicts the same or matched records across different citation networks. Extensive experiments on real-world datasets demonstrate DEM's effectiveness and rationality.
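For readers unfamiliar with highway networks, the sketch below shows a single generic highway layer, which mixes a transformed signal with the untouched input through a learned gate: y = T(x) * H(x) + (1 - T(x)) * x. It is not DEM's architecture, whose layer sizes and activations are not given in the abstract; the ReLU transform, the weights, and the `highway_layer` name are assumptions for illustration.

```python
import math

def highway_layer(x, w_h, b_h, w_t, b_t):
    """One highway layer: gate t blends a ReLU transform h with the input x."""
    def affine(w, b):
        # Matrix-vector product plus bias, using plain Python lists.
        return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    h = [max(0.0, v) for v in affine(w_h, b_h)]                 # transform H(x)
    t = [1.0 / (1.0 + math.exp(-v)) for v in affine(w_t, b_t)]  # gate T(x) in (0, 1)
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]

x = [1.0, -2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# A strongly negative gate bias closes the gate: the input is carried through.
carried = highway_layer(x, identity, [0.0, 0.0], zeros, [-30.0, -30.0])
# A strongly positive gate bias opens the gate: the output is the transform.
transformed = highway_layer(x, identity, [0.0, 0.0], zeros, [30.0, 30.0])
```

The carry path is what lets deep stacks of such layers train more easily, since each layer can fall back to passing its input along unchanged.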
Funding: The work was supported by the National Key Research and Development Program of China under Grant No. 2018YFB1003404 and the National Natural Science Foundation of China under Grant Nos. 61672142, U1435216 and 61602103.
Abstract: Community discovery is an important task in social network analysis. However, most existing methods for community discovery rely on the topological structure alone and ignore the rich information available in the content data. To solve this issue, we present a community discovery method based on heterogeneous information network decomposition and embedding. Unlike traditional methods, our method takes into account topology, node content, and edge content, which can supply abundant evidence for community discovery. First, an embedding-based similarity evaluation method is proposed, which decomposes the heterogeneous information network into several subnetworks and extracts their latent deep representations to evaluate the similarities between nodes. Second, a bottom-up community discovery algorithm is proposed. Via leader node selection, initial community generation, and community expansion, communities can be found more efficiently. Third, some incremental maintenance strategies for network changes are proposed. We conduct experimental studies based on three real-world social networks. Experiments demonstrate the effectiveness and efficiency of our proposed method. Compared with traditional methods, our method improves normalized mutual information (NMI) and modularity by an average of 12% and 37%, respectively.
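Since the results above are reported in terms of normalized mutual information, a short sketch of how NMI compares two community assignments may help. The abstract does not state which normalization the authors use; this sketch assumes the common form NMI = 2 I(A; B) / (H(A) + H(B)), and the `nmi` function and toy labels are illustrative only.

```python
import math
from collections import Counter

def nmi(a, b):
    """Normalized mutual information between two labelings of the same nodes."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))

    def entropy(counts):
        return -sum((v / n) * math.log(v / n) for v in counts.values())

    # Mutual information from the joint and marginal label frequencies.
    mutual = sum(
        (v / n) * math.log((v / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), v in count_ab.items()
    )
    denom = entropy(count_a) + entropy(count_b)
    # Convention assumed here: two single-community labelings agree perfectly.
    return 1.0 if denom == 0 else 2.0 * mutual / denom

truth = [0, 0, 1, 1]
same_up_to_renaming = nmi(truth, [1, 1, 0, 0])  # identical partition, labels swapped
unrelated = nmi(truth, [0, 1, 0, 1])            # partition independent of the truth
```

Because NMI depends only on how nodes are grouped, not on the label names, it is well suited to comparing discovered communities against ground-truth ones.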