Info-association topology based social relationship mining on Internet

Cited by: 3
Abstract Relation extraction methods based on supervised learning require large amounts of labeled training data and predefined relation types. To address this, a method for extracting personal relations is proposed that builds a correlation network from word co-occurrence information and performs graph clustering analysis on that network. First, 500 highly correlated person pairs are obtained from news title data for the relation extraction study. Second, the news articles containing these person pairs are crawled and preprocessed, and Term Frequency-Inverse Document Frequency (TF-IDF) is used to obtain the keywords of the sentences in which each person pair co-occurs. Third, correlations between words are derived from word co-occurrence information, and a keyword correlation network is built. Finally, graph clustering analysis on the correlation network yields the personal relations. In the relation extraction experiments, compared with the traditional Chinese entity relation extraction method based on word co-occurrence and pattern matching, the proposed method improves precision, recall, and F-score by 5.5, 3.7, and 4.4 percentage points, respectively. The experimental results show that the proposed algorithm can effectively extract rich, high-quality personal relation data from news data without labeled training data.
Source Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal (北大核心), 2016, Issue 7, pp. 1875-1880 (6 pages)
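Funding National Natural Science Foundation of China (61332004); Suzhou Science and Technology Program, Industrial Technology Innovation Special Project (People's Livelihood Science and Technology) (SS201509)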
Keywords social relation extraction; co-occurrence statistics; word correlation; correlation network; graph clustering
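
The pipeline outlined in the abstract (TF-IDF keywords, a keyword co-occurrence network, then clustering on that network) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the word-correlation measure or the graph clustering algorithm, so the sketch assumes a raw co-occurrence count threshold for edges and uses connected components as a stand-in for graph clustering. The function names (`tfidf_keywords`, `cooccurrence_graph`, `connected_components`) and the parameters `top_k` and `min_count` are hypothetical, and the input is assumed to be sentences that have already been segmented into word lists.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def tfidf_keywords(sentences, top_k=3):
    """Return the top_k TF-IDF terms of each tokenized sentence."""
    n = len(sentences)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for sent in sentences for term in set(sent))
    keywords = []
    for sent in sentences:
        tf = Counter(sent)
        scores = {t: (c / len(sent)) * math.log(n / df[t]) for t, c in tf.items()}
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_k])
    return keywords


def cooccurrence_graph(keyword_lists, min_count=2):
    """Link two keywords if they co-occur in at least min_count sentences.

    Assumption: a raw count threshold replaces the paper's correlation measure.
    """
    counts = Counter()
    for kws in keyword_lists:
        for a, b in combinations(sorted(set(kws)), 2):
            counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), c in counts.items():
        if c >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group keywords into clusters (a simple stand-in for graph clustering)."""
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return clusters


if __name__ == "__main__":
    # Toy, made-up sentences in which one person pair co-occurs.
    sentences = [
        ["pair", "married", "wife", "ceremony"],
        ["pair", "wife", "married", "anniversary"],
        ["pair", "coach", "team", "season"],
        ["pair", "team", "coach", "training"],
    ]
    graph = cooccurrence_graph(tfidf_keywords(sentences, top_k=2))
    for cluster in connected_components(graph):
        print(sorted(cluster))
```

Each resulting cluster is a group of keywords that tend to appear together around a person pair and can be read as a candidate relation description. A community detection algorithm and a weighted correlation score would be natural replacements for the two simplifications noted above.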