
Research of Attribute Extraction in Intangible Cultural Heritage Texts Based on Distant Supervision and Deep Learning
(Original title: 基于远程监督和深度学习的非物质文化遗产文本属性抽取研究)
Cited by: 4
Abstract [Purpose/significance] Attribute extraction from intangible cultural heritage (ICH) texts supports the construction of ICH knowledge graphs and the dissemination of ICH culture. [Method/process] First, starting from an ICH attribute table, we use distant supervision to build a large-scale annotated corpus of ICH text attributes, which contains five attributes. Second, we build a deep learning model, CNN-BiLSTM-Att-CRF, to extract the attribute values from the annotated corpus, and we compare it with related baseline models. [Result/conclusion] A sampling inspection of the annotated corpus shows that the corpus built with distant supervision is of high quality. The proposed model performs best on several ICH attributes and also achieves the best average attribute extraction performance.
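The distant supervision step described in the abstract can be illustrated with a minimal sketch. It assumes that annotation means matching known attribute values from an attribute table against raw text and emitting character-level BIO tags; the abstract does not give the actual matching rules, tag scheme, or attribute names, so the function `distant_label`, the attribute name `REGION`, and the sample sentence are all hypothetical.

```python
from typing import Dict, List, Tuple


def distant_label(text: str, attributes: Dict[str, str]) -> List[Tuple[str, str]]:
    """Tag each character of `text` with a BIO label by exact-matching attribute values.

    `attributes` maps a (hypothetical) attribute name to its known value,
    as it might appear in one row of an attribute table.
    """
    tags = ["O"] * len(text)
    # Match longer values first so a short value cannot claim part of a longer one.
    for attr, value in sorted(attributes.items(), key=lambda kv: -len(kv[1])):
        if not value:
            continue  # an empty value would match everywhere, so skip it
        start = text.find(value)
        while start != -1:
            end = start + len(value)
            # Only label spans that are still entirely unlabeled.
            if all(tag == "O" for tag in tags[start:end]):
                tags[start] = f"B-{attr}"
                for i in range(start + 1, end):
                    tags[i] = f"I-{attr}"
            start = text.find(value, end)
    return list(zip(text, tags))


# Hypothetical table row and sentence, for illustration only.
sample = distant_label("昆曲发源于苏州昆山", {"REGION": "苏州昆山"})
for char, tag in sample:
    print(char, tag)
```

Exact string matching of this kind produces noisy labels, for example when a value appears in the text with a different meaning. That would be consistent with the abstract's report that corpus quality was checked by sampling before any model was trained on it.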
Authors: Fan Tao (范涛), Wang Hao (王昊), Zhang Baolong (张宝隆)
Source: 《情报理论与实践》 (Information Studies: Theory & Application), indexed in CSSCI and the Peking University Core Journals list, 2021, No. 10, pp. 1-7, 17 (8 pages in total)
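Funding: An outcome of the National Social Science Fund of China key project "Research on Domain Knowledge Processing and Organization Models in the Big Data Environment" (project No. 20ATQ006) and of the Nanjing University special project for young interdisciplinary teams in the humanities and social sciences, "Semantic Analysis and Knowledge Graph Research on Local Gazetteer Texts for Humanities Computing". Supported by talent programs including the Jiangsu Young Social Science Talents program and the Nanjing University Zhongying Young Scholars (Tang Scholar) program.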
Keywords: intangible cultural heritage; natural language processing; distant supervision; attribute extraction; neural network; deep learning
