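With the rapid growth of text information, data mining has become an important method of knowledge discovery. Short text similarity (STSim) measurement is an important technique in data mining research. To improve the accuracy of short text similarity measurement, this paper proposes a new model for Chinese short text similarity based on an improved weighted network. First, the semantic network is weighted by the co-occurrence frequency between words, and the resulting weighted complex network is used to represent the short text. Second, taking into account the low discriminability of the weights in a short text's weighted complex network and the position of each word node, a composite weighted-complex-network feature value is computed for every word in the short text. Finally, short text similarity is computed with the new model and evaluated through clustering experiments. The experimental results show that the proposed similarity measure outperforms the STSim model.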
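The pipeline described in this abstract (co-occurrence weighting, per-word feature values, similarity) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' model: the abstract does not define the composite feature value, so plain node strength (sum of incident edge weights) stands in for it, cosine similarity stands in for the final similarity formula, and the function names and the `window` parameter are hypothetical. Word segmentation is assumed to have been done already.

```python
from collections import Counter
from math import sqrt


def cooccurrence_network(tokens, window=2):
    """Build an undirected weighted network from a token list.

    Edge weight = number of times two distinct words co-occur
    within `window` positions of each other (assumed definition).
    """
    edges = Counter()
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[i] != tokens[j]:
                edges[tuple(sorted((tokens[i], tokens[j])))] += 1
    return edges


def node_strength(edges):
    """Sum of incident edge weights per word (stand-in feature value)."""
    strength = Counter()
    for (a, b), w in edges.items():
        strength[a] += w
        strength[b] += w
    return strength


def cosine(u, v):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def short_text_similarity(tokens_a, tokens_b, window=2):
    """Compare two short texts through their network feature vectors."""
    feat_a = node_strength(cooccurrence_network(tokens_a, window))
    feat_b = node_strength(cooccurrence_network(tokens_b, window))
    return cosine(feat_a, feat_b)
```

Calling `short_text_similarity(["short", "text", "similarity"], ["short", "text", "clustering"])` returns a value between 0 and 1 because the two texts share some, but not all, of their words.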
A better understanding of previous accidents is an effective way to reduce the occurrence of similar accidents in the future. In this paper, a complex network approach is adopted to construct a directed weighted hazard network (DWHN) in order to analyze the topological features and evolution of accidents in subway construction. The nodes are hazards and accidents, the edges are the relationships among these nodes, and the weight of an edge is the number of times that relationship recurs. The results indicate that the DWHN has the small-world property, with a small average path length and a large clustering coefficient, which means that hazards are well connected and can spread widely and quickly through the network. Moreover, the DWHN has the scale-free property, because its cumulative degree distribution follows a power law. This makes the DWHN more vulnerable to targeted attacks. Controlling key nodes with high degree, strength, and betweenness centrality will destroy the connectivity of the DWHN and mitigate the spreading of accidents in the network. This study helps to uncover the inner relationships and evolutionary features of hazards and accidents in subway construction.
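The construction described in this abstract can be illustrated with a small sketch. It assumes accident records have already been reduced to (cause, effect) pairs, which is a simplification the abstract does not spell out, and all function names are hypothetical. It covers only edge weighting, degree, strength, and the cumulative degree distribution; betweenness centrality, path length, and clustering are left out to keep the example short.

```python
from collections import Counter


def build_dwhn(records):
    """Directed weighted network: weight = times a (cause, effect) pair recurs."""
    weights = Counter()
    for cause, effect in records:
        weights[(cause, effect)] += 1
    return weights


def degree_and_strength(weights):
    """Total degree (in + out edges) and strength (in + out weights) per node."""
    degree = Counter()
    strength = Counter()
    for (cause, effect), w in weights.items():
        degree[cause] += 1
        degree[effect] += 1
        strength[cause] += w
        strength[effect] += w
    return degree, strength


def cumulative_degree_distribution(degree):
    """P(K >= k) for each observed degree k; a power law is a straight line in log-log."""
    n = len(degree)
    return {
        k: sum(1 for d in degree.values() if d >= k) / n
        for k in sorted(set(degree.values()))
    }


def key_nodes(strength, top=3):
    """Nodes ranked by strength, as candidates for control measures."""
    return [node for node, _ in strength.most_common(top)]
```

Ranking by strength alone is a design shortcut here; the abstract identifies key nodes using degree, strength, and betweenness centrality together.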
Funding: Supported by the Co-Funding of the National Natural Science Foundation of China and Shenhua Group Corporation Ltd (Grant No. U1261212) and the Program of Major Achievements Transformation and Industrialization of Beijing Education Commission, China (Grant No. ZDZH20141141301).