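Abstract: This paper uses a subset of the papers published by the Chinese Academy of Sciences (CAS) and indexed in the Web of Science (WOS) Core Collection from 1900 to 2019 as panel data to study hot disciplines, research communities, and the authoritative experts within them. First, data from five disciplinary categories (Arts and Humanities, Life Sciences and Biomedicine, Natural Sciences, Social Sciences, and Applied Sciences) are analyzed, showing that Applied Sciences (Technology) has the fastest annual growth in published papers and that its research hotspot is Computer Science. Second, for this hotspot, a semantic network graph of the papers is built in the Neo4j graph database, and the entity relationships are optimized to increase connectivity within communities; the Louvain community detection algorithm is then adapted and used for data mining to identify the strong research teams behind the communities. Finally, within the detected communities, the PageRank algorithm is used to select highly productive, authoritative researchers, providing a reference for research collaboration, talent discovery, and even national discipline planning. Experiments show that index design for entity data in Neo4j improves query performance by up to 16 times, and that adding an institutional-influence dimension to the relationship property weight used by the Louvain algorithm raises community modularity by 84%.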
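To make the link between relationship weights and modularity concrete, the following is a minimal sketch rather than the paper's implementation. It assumes an undirected weighted graph given as (u, v, weight) triples and a fixed community assignment. The function name modularity, the toy graph of two triangles joined by one edge, and the doubled weight standing in for an institutional-influence factor are all illustrative assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected weighted graph.

    edges: iterable of (u, v, weight); each undirected edge listed once, no self-loops.
    community: dict mapping every vertex to a community label.
    """
    total = 0.0                      # m: total edge weight
    internal = defaultdict(float)    # edge weight inside each community
    degree = defaultdict(float)      # summed weighted degree of each community
    for u, v, w in edges:
        total += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    # Q = sum over communities of  L_c / m - (d_c / 2m)^2
    return sum(internal[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree)


# Toy co-authorship graph: two triangles joined by a single bridge edge.
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
intra = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
bridge = (2, 3)


def build(intra_weight):
    """Edge list in which links inside a triangle carry intra_weight (hypothetical)."""
    return [(u, v, intra_weight) for u, v in intra] + [(bridge[0], bridge[1], 1.0)]


q_uniform = modularity(build(1.0), community)  # every relationship weighted equally
q_boosted = modularity(build(2.0), community)  # assumed boost for same-institution links
print(q_uniform, q_boosted)
```

On this toy graph the modularity of the same partition rises from 5/14 to 11/26 when the links inside each triangle are given double weight. This only illustrates the direction of the effect; it says nothing about the 84% figure reported in the abstract.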
Funding: Supported by the Key Program of the National Natural Science Foundation of China under Grant No. 61732018, the National Natural Science Foundation of China under Grant No. 61902415, and the Open Foundation of Science and Technology on Parallel and Distributed Laboratory (School of Computer, National University of Defense Technology) under Grant No. 6142110190201.
Abstract: Community detection is a vital task in many fields, such as social networks and financial analysis. The Louvain method, the main workhorse of community detection, is a popular heuristic. To apply it to large-scale graphs, researchers have proposed several parallel Louvain methods (PLMs), which suffer from two problems: latency in information synchronization, and community swap. To tackle both, we propose an isolate-set-based parallel Louvain method (IPLM) and a fusion of IPLM with the hashtable-based Louvain method (FIPLM), both built on a novel graph partition algorithm. The partition algorithm divides the graph into subgraphs called isolate sets, in which the vertices are relatively decoupled from one another. First, we describe the concept and properties of the isolate set. Second, we propose an algorithm that divides the graph into isolate sets with the same computational complexity as breadth-first search. Third, we propose IPLM, which can efficiently calculate and update vertex information in parallel without latency or community swap. Finally, we achieve further acceleration with FIPLM, which maintains high-quality community detection while reaching a higher speedup than IPLM. Both methods target shared-memory architectures, and we implement them on an 8-core PC. The experiments show that IPLM achieves a maximum speedup of 4.62x and outputs higher modularity (up to 4.76% higher) than the serial Louvain method on 14 of 18 datasets. Moreover, FIPLM achieves a maximum speedup of 7.26x.
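The abstract does not define isolate sets precisely, so the following sketch should be read as one possible interpretation and not as the authors' algorithm. It assumes the simplest reading, namely that no two vertices in the same set share an edge, and builds such sets by greedy colouring in breadth-first order, which runs in time linear in the number of vertices and edges. The function name partition_into_independent_sets and the toy adjacency list are hypothetical.

```python
from collections import deque


def partition_into_independent_sets(adj):
    """Split vertices into sets such that no edge joins two vertices of one set.

    adj: dict mapping vertex -> list of neighbours (undirected, symmetric, no self-loops).
    Vertices are visited in BFS order; each takes the lowest-numbered set
    that none of its already-labelled neighbours occupies (greedy colouring).
    """
    label = {}
    seen = set()
    for start in adj:                # loop over all vertices to cover every component
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            used = {label[n] for n in adj[v] if n in label}
            k = 0
            while k in used:         # at most deg(v) + 1 iterations
                k += 1
            label[v] = k
            for n in adj[v]:
                if n not in seen:
                    seen.add(n)
                    queue.append(n)
    sets = [set() for _ in range(max(label.values(), default=-1) + 1)]
    for v, k in label.items():
        sets[k].add(v)
    return sets


# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
sets = partition_into_independent_sets(adj)
print(sets)
```

Under this assumed definition, the vertices within one set have no edges to each other, so their neighbour lists can be scanned concurrently without one vertex reading another's in-flight move. Whether this matches the properties the paper relies on is not stated in the abstract.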
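Abstract: Point-of-interest (POI) recommendation is a location-aware form of personalized recommendation based on contextual information. User check-in behavior is highly sparse, which makes accurate POI recommendation very challenging. To address this problem, a POI recommendation model that fuses similarity and geographic information, called SIGFM, is proposed. First, a Latent Dirichlet Allocation (LDA) model is used to mine users' interest features and measure the similarity between users, and the Louvain Community Detection (LCD) algorithm is applied to user check-in data to obtain a second similarity measure; the two similarities are then fused. Next, geographic information is used to capture users' check-in characteristics. Finally, the fused similarity and the geographic information are combined into a new model. Experiments on real-world datasets show that SIGFM effectively addresses the data sparsity and cold-start problems and outperforms other POI recommendation algorithms.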
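The abstract does not say how the two similarities are fused, so the sketch below is only an illustration under stated assumptions. It takes per-user topic distributions as given (standing in for LDA output), treats two users as community-similar when they carry the same community label (standing in for Louvain output on check-in data), and fuses the two with a convex combination controlled by a hypothetical parameter alpha. None of the names or values come from the paper.

```python
import math


def cosine(p, q):
    """Cosine similarity between two equal-length non-negative vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def fused_similarity(u, v, topics, community, alpha=0.5):
    """Assumed fusion rule: alpha * topic similarity + (1 - alpha) * community similarity."""
    topic_sim = cosine(topics[u], topics[v])
    community_sim = 1.0 if community[u] == community[v] else 0.0
    return alpha * topic_sim + (1 - alpha) * community_sim


# Hypothetical inputs: topic distributions per user and community labels per user.
topics = {
    "u1": [0.7, 0.2, 0.1],
    "u2": [0.6, 0.3, 0.1],
    "u3": [0.1, 0.1, 0.8],
}
community = {"u1": 0, "u2": 0, "u3": 1}

close = fused_similarity("u1", "u2", topics, community)
far = fused_similarity("u1", "u3", topics, community)
print(close, far)
```

With these toy inputs, u1 is more similar to u2 (similar topics, same community) than to u3. A weighted sum is only one of several plausible fusion rules; the choice of alpha would need to be tuned on held-out check-in data.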