Abstract
[Purpose/Significance] Research on attribute extraction from intangible cultural heritage (ICH) texts supports the construction of ICH knowledge graphs and the dissemination of ICH culture. [Method/Process] First, starting from an ICH attribute table and applying distant supervision, we build a large-scale annotated corpus of ICH texts covering five attributes. Then, we design and propose a deep-learning model, CNN-BiLSTM-Att-CRF, to extract attribute values from the annotated corpus, and compare it with related baseline models. [Result/Conclusion] Sample-based inspection of the annotated corpus shows that the distantly supervised corpus is of high quality. The proposed model performs best on several ICH attributes and achieves the best average attribute-extraction performance.
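The abstract does not spell out how the attribute table is aligned with the text, so the following is only a minimal sketch of the general distant-supervision idea: every known attribute value from the table is matched against the raw text, and matches become BIO tags for training a sequence labeller such as a CRF-based model. It assumes character-level tokens (natural for Chinese), exact string matching, and longest-match-first conflict resolution; the function `distant_label` and the attribute names `NAME` and `REGION` are hypothetical and are not the paper's five attributes or its actual procedure.

```python
from typing import Dict, List, Tuple


def distant_label(tokens: List[str], attributes: Dict[str, List[str]]) -> List[str]:
    """Return one BIO tag per token by matching known attribute values.

    Hypothetical helper: `attributes` maps an attribute name to the values
    recorded for this heritage item in an attribute table.
    """
    tags = ["O"] * len(tokens)
    # Flatten to (attribute, value-as-characters) pairs and try longer values
    # first, so a full value like "江苏昆山" wins over a shorter one like "昆山".
    candidates: List[Tuple[str, List[str]]] = sorted(
        ((attr, list(value)) for attr, values in attributes.items() for value in values),
        key=lambda pair: len(pair[1]),
        reverse=True,
    )
    for attr, value_tokens in candidates:
        n = len(value_tokens)
        if n == 0:
            continue  # skip empty table entries
        for start in range(len(tokens) - n + 1):
            # Label only exact matches that do not overlap an earlier label.
            if tokens[start:start + n] == value_tokens and all(
                tags[i] == "O" for i in range(start, start + n)
            ):
                tags[start] = f"B-{attr}"
                for i in range(start + 1, start + n):
                    tags[i] = f"I-{attr}"
    return tags


if __name__ == "__main__":
    text = "昆曲起源于江苏昆山"  # illustrative sentence, not from the paper's corpus
    table = {"NAME": ["昆曲"], "REGION": ["江苏昆山"]}
    for char, tag in zip(text, distant_label(list(text), table)):
        print(char, tag)
```

Exact matching keeps the sketch simple but is also the usual source of noise in distantly supervised corpora (missed paraphrases, spurious matches), which is why sample-based quality checks of the resulting labels are worthwhile.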
Source
Information Studies: Theory & Application (《情报理论与实践》)
CSSCI
Peking University Core Journal (北大核心)
2021, Issue 10, pp. 1-7, 17 (8 pages in total)
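Funding
Key Project of the National Social Science Fund of China, "Research on Domain Knowledge Processing and Organization Models in the Big Data Environment" (Project No. 20ATQ006)
An outcome of the Nanjing University Young Interdisciplinary Team Special Project in the Humanities and Social Sciences, "Semantic Analysis and Knowledge Graph Research on Local Gazetteer (Fangzhi) Texts for Humanities Computing"
Supported by talent-development programs including the Jiangsu Young Social Science Talents program and the Nanjing University Zhongying Young Scholars (Tang Scholar) program.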
Keywords
intangible cultural heritage
natural language processing
distant supervision
attribute extraction
neural network
deep learning