Funding: Supported by the Eleventh Five-Year Development Planning Project for Instructional Science in Hubei Province (2006B131).
Abstract: MATLAB software and an optimal complete subgraph algorithm were used to extract and reveal the microsatellite distribution features in the complete genome of the tobacco vein clearing virus (NC_003378.1) from the NCBI database. The results showed that the repetition counts and locations of the N-base groups were extracted and displayed. The largest repetition count of an N-base group in the complete genome of the tobacco vein clearing virus decreased exponentially with increasing N. The method used in this study could be applied to extract and reveal the microsatellite distribution features in the complete genomes of other viruses, thereby providing a basis for research on the structure and the laws of function, inheritance, and variation using microsatellite distribution features.
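To make the idea of "largest repetition count of an N-base group" concrete, here is a minimal Python sketch. It uses a plain left-to-right scan, not the optimal complete subgraph algorithm or the MATLAB code of the study; the function name `max_tandem_repeats` and the toy sequence are illustrative assumptions.

```python
def max_tandem_repeats(seq: str, n: int):
    """Return (motif, start, count) for the longest tandem run of an n-base motif.

    Assumption: a brute-force scan for illustration only; this is NOT the
    optimal complete subgraph algorithm described in the abstract.
    """
    best = ("", -1, 0)
    for start in range(len(seq) - n + 1):
        motif = seq[start:start + n]
        count = 1
        pos = start + n
        # Count how many times the motif repeats back to back.
        while seq[pos:pos + n] == motif:
            count += 1
            pos += n
        # Strict comparison keeps the earliest run when counts tie.
        if count > best[2]:
            best = (motif, start, count)
    return best


if __name__ == "__main__":
    toy = "ACACACGTTTT"  # hypothetical sequence, not viral data
    for n in (1, 2, 3):
        print(n, max_tandem_repeats(toy, n))
```

Running the scan for N = 1, 2, 3, ... on a real genome and plotting the maximum count against N would be one way to inspect the decay that the abstract reports.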
Abstract: Graph pattern matching (GPM) can be used to mine key information in graphs. Exact GPM, one of the most commonly used GPM methods, aims to find exactly all subgraphs of a data graph that match a given query graph. Exact GPM has been widely used in biological data analysis, social network analysis, and other fields. In this paper, the applications of exact GPM are first introduced and its research progress is summarized. The related algorithms are then introduced in detail, and experiments on state-of-the-art exact GPM algorithms are conducted to compare their performance. Based on the experimental results, the applicable scenarios of the algorithms are pointed out, and new research opportunities in this area are proposed.
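As a rough illustration of what exact GPM computes, the following Python sketch enumerates every embedding of a small query graph in a data graph by plain backtracking with a degree filter. It is a textbook-style baseline under stated assumptions (undirected, unlabeled graphs stored as adjacency sets; non-induced matching), not one of the algorithms evaluated in the survey, and the name `subgraph_matches` is hypothetical.

```python
def subgraph_matches(query, data):
    """Yield dicts mapping each query vertex to a distinct data vertex.

    Graphs are dicts: vertex -> set of neighbours (undirected, unlabeled).
    Every query edge must map onto a data edge (non-induced matching).
    """
    order = list(query)  # fixed matching order; real systems optimise this

    def extend(mapping):
        if len(mapping) == len(order):
            yield dict(mapping)
            return
        u = order[len(mapping)]
        for v in data:
            if v in mapping.values():
                continue  # keep the mapping injective
            if len(data[v]) < len(query[u]):
                continue  # degree filter: v has too few neighbours
            # Every already-matched neighbour of u must be adjacent to v.
            if all(mapping[w] in data[v] for w in query[u] if w in mapping):
                mapping[u] = v
                yield from extend(mapping)
                del mapping[u]

    yield from extend({})


if __name__ == "__main__":
    triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
    data = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    for match in subgraph_matches(triangle, data):
        print(match)
```

The sketch reports each automorphic copy of a match separately, so one triangle in the data graph yields six mappings. Practical exact GPM systems add candidate filtering, smarter matching orders, and indexing on top of this basic search.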
Funding: This work was supported in part by the National Natural Science Foundation of China under Grants 62273272, 62303375, and 61873277; in part by the Key Research and Development Program of Shaanxi Province under Grant 2023-YBGY-243; in part by the Natural Science Foundation of Shaanxi Province under Grants 2022JQ-606 and 2020-JQ758; in part by the Research Plan of the Department of Education of Shaanxi Province under Grant 21JK0752; and in part by the Youth Innovation Team of Shaanxi Universities.
Abstract: Social robot accounts controlled by artificial intelligence or humans are active in social networks, with negative impacts on network security and social life. Existing social robot detection methods based on graph neural networks struggle with the large number of nodes and the complex relationships in social networks, which makes it difficult to accurately describe the differences between the topological relations of nodes and results in low detection accuracy. This paper proposes a social robot detection method that uses an improved neural network. First, social relationship subgraphs are constructed from the user's social network to disentangle intricate social relationships effectively. Then, a linear modulated graph attention residual network model is devised to extract the node and network topology features of the social relationship subgraph, thereby generating comprehensive social relationship subgraph features; the feature-wise linear modulation module of the model can better learn the differences between nodes. Next, user text content and behavioral gene sequences are extracted to construct social behavioral features, which are combined with the social relationship subgraph features. Finally, social robots can be identified more accurately by combining user behavioral and relationship features. In experiments on the publicly available datasets TwiBot-20 and Cresci-15, the proposed method achieves detection accuracies of 86.73% and 97.86%, respectively. Compared with existing mainstream approaches, the accuracy of the proposed method is 2.2% and 1.35% higher on the two datasets. The results show that the proposed method can effectively detect social robots and help maintain a healthy ecological environment in social networks.
Funding: Supported by the Special Fund for Taishan Scholars Project; the IC Program of Shandong Institutions of Higher Learning For Youth Innovative Talents; the National Natural Science Foundation of China (Grant No. 12371353); and the Science Fund of the Republic of Serbia, grant number 7749676: Spectrally Constrained Signed Graphs with Applications in Coding Theory and Control Theory (SCSG-ctct).
Abstract: The eccentricity matrix of a graph is obtained from the distance matrix by keeping the entries that are largest in their row or column and replacing the remaining entries with zero. This matrix can be interpreted as the opposite of the adjacency matrix, which is instead obtained from the distance matrix by keeping only the entries equal to 1. In this paper, we determine the graphs whose eccentricity matrix has second largest eigenvalue less than 1.
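The construction described in this abstract can be sketched in a few lines of Python. The sketch assumes a connected, unweighted, undirected graph given as adjacency lists over vertices 0..n-1; the function names are illustrative, and the eigenvalue analysis of the paper is not reproduced here.

```python
from collections import deque


def distance_matrix(adj):
    """All-pairs shortest-path distances by BFS (assumes a connected graph)."""
    n = len(adj)
    dist = [[None] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] is None:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist


def eccentricity_matrix(adj):
    """Keep d(u, v) only where it is the largest entry of row u or column v."""
    dist = distance_matrix(adj)
    ecc = [max(row) for row in dist]  # eccentricity = row maximum
    n = len(adj)
    return [
        [dist[u][v] if dist[u][v] in (ecc[u], ecc[v]) else 0 for v in range(n)]
        for u in range(n)
    ]


if __name__ == "__main__":
    path4 = [[1], [0, 2], [1, 3], [2]]  # path on four vertices
    for row in eccentricity_matrix(path4):
        print(row)
```

For the four-vertex path, all entries equal to 1 (the adjacency entries) are zeroed and only the distances 2 and 3 that realise an eccentricity survive, which illustrates the "opposite of the adjacency matrix" reading.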
Abstract: gStore is an open-source native Resource Description Framework (RDF) triple store that answers SPARQL queries by subgraph matching over RDF graphs. However, there are some deficiencies in the original system design, such as in answering simple queries (including one-triple pattern queries). To improve the efficiency of the system, we reconsider the system design in this paper. Specifically, we propose a new query plan generation module that generates different query plans according to the structures of query graphs. Furthermore, we redesign our vertex encoding strategy to achieve more pruning power and introduce a new multi-join algorithm to speed up the subgraph matching process. Extensive experiments on synthetic and real RDF datasets show that our method significantly outperforms the state-of-the-art algorithms.
Funding: Financially supported by the National Key Research and Development Program of China (2021YFF1201400), the National Natural Science Foundation of China (22220102001), and the Natural Science Foundation of Zhejiang Province (LZ19H300001, LD22H300001, China).
Abstract: The acid-base dissociation constant (pKa) is a key physicochemical parameter in chemical science, especially in organic synthesis and drug discovery. Current methodologies for pKa prediction still suffer from a limited applicability domain and a lack of chemical insight. Here we present MF-SuP-pKa (multi-fidelity modeling with subgraph pooling for pKa prediction), a novel pKa prediction model that utilizes subgraph pooling, multi-fidelity learning, and data augmentation. In our model, a knowledge-aware subgraph pooling strategy was designed to capture the local and global environments around the ionization sites for micro-pKa prediction. To overcome the scarcity of accurate pKa data, low-fidelity data (computational pKa) was used to fit the high-fidelity data (experimental pKa) through transfer learning. The final MF-SuP-pKa model was constructed by pre-training on the augmented ChEMBL data set and fine-tuning on the DataWarrior data set. Extensive evaluation on the DataWarrior data set and three benchmark data sets shows that MF-SuP-pKa achieves superior performance to the state-of-the-art pKa prediction models while requiring much less high-fidelity training data. Compared with Attentive FP, MF-SuP-pKa achieves 23.83% and 20.12% improvement in terms of mean absolute error (MAE) on the acidic and basic sets, respectively.
Funding: This work was partially supported by the National Key Research and Development Program (No. 2018YFB1800203), the National Natural Science Foundation of China (No. U19B2024), and the Postgraduate Scientific Research Innovation Project of Hunan Province (No. CX20210038).
Abstract: Discovering regularities between entities in temporal graphs is vital for many real-world applications (e.g., social recommendation, emergency event detection, and cyberattack event detection). This paper proposes temporal graph association rules (TGARs) that extend traditional graph-pattern association rules in a static graph by incorporating the unique temporal information and constraints. We introduce quality measures (e.g., support, confidence, and diversification) to characterize meaningful TGARs that are useful and diversified. In addition, the proposed support metric is an upper bound for alternative metrics, allowing us to guarantee a superset of patterns. We extend conventional confidence measures in terms of maximal occurrences of TGARs. The diversification score strikes a balance between interestingness and diversity. Although the problem is NP-hard, we develop an effective discovery algorithm for TGARs that integrates TGAR generation and TGAR selection, and show that mining TGARs is feasible over a temporal graph. We propose pruning strategies to filter out, as early as possible, TGARs that have low support or cannot reach the top-k. Moreover, we design an auxiliary data structure to prune the TGARs that do not meet the constraints during the TGAR generation process, which avoids repeated subgraph matching for each extension in the search space. We experimentally verify the effectiveness, efficiency, and scalability of our algorithms in discovering diversified top-k TGARs from temporal graphs in real-life applications.