Identifying Structural Function of Scientific Literature Abstracts Based on Deep Active Learning
(Original Chinese title: 基于深度主动学习的科技文献摘要结构功能识别研究)
Abstract
[Objective] This paper examines how different deep active learning (DeepAL) methods affect both recognition performance and annotation cost when identifying the structural function of scientific literature abstracts.
[Methods] We propose a method for identifying the structural function of abstracts that combines active learning with sequence labeling. First, we constructed a SciBERT-BiLSTM-CRF model for abstracts (SBCA) that uses the contextual sequence information between sentences. Then, we developed uncertainty-based active learning strategies at two levels: the single sentence and the full abstract. Finally, we conducted experiments on the PubMed 20K dataset.
[Results] The SBCA model achieved the best recognition performance, raising the F1 value by 11.93 percentage points over a SciBERT-only model that ignores sequence information. With the Least Confidence strategy based on whole abstracts, the SBCA model reached its optimal F1 value using only 60% of the data. With the Least Confidence strategy based on single sentences, it reached its optimal F1 value using only 65% of the data.
[Limitations] This study built only uncertainty-based active learning query strategies and did not consider other categories of query strategy. Future work should examine different active learning strategies in more fields or on multilingual datasets.
[Conclusions] Methods based on deep active learning help identify the structural function of scientific literature abstracts at a lower annotation cost.
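The two query strategies named in the abstract can be illustrated with a minimal sketch. It assumes the model exposes a per-sentence probability distribution over structural-function labels, and it assumes that the abstract-level score is the mean of the sentence-level scores. The record does not give the authors' exact formula, so both assumptions, along with every function name below, are hypothetical. With a CRF output layer, an alternative abstract-level score would be 1 minus the probability of the best label sequence.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical pool format: abstract id -> one probability vector per sentence.
Pool = Dict[str, List[List[float]]]


def sentence_least_confidence(probs: Sequence[float]) -> float:
    """Least-confidence score for one sentence: 1 - max label probability."""
    return 1.0 - max(probs)


def abstract_least_confidence(sentence_probs: Sequence[Sequence[float]]) -> float:
    """Abstract-level score, ASSUMED here to be the mean sentence-level score."""
    scores = [sentence_least_confidence(p) for p in sentence_probs]
    return sum(scores) / len(scores)


def select_abstracts(pool: Pool, k: int) -> List[str]:
    """Return the ids of the k abstracts the model is least confident about."""
    ranked = sorted(
        pool, key=lambda aid: abstract_least_confidence(pool[aid]), reverse=True
    )
    return ranked[:k]


def select_sentences(pool: Pool, k: int) -> List[Tuple[str, int]]:
    """Return (abstract id, sentence index) for the k least confident sentences."""
    scored = [
        (sentence_least_confidence(probs), aid, idx)
        for aid, sentences in pool.items()
        for idx, probs in enumerate(sentences)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(aid, idx) for _, aid, idx in scored[:k]]


# Toy unlabeled pool with made-up probabilities (not data from the paper).
pool: Pool = {
    "a1": [[0.90, 0.05, 0.05], [0.80, 0.10, 0.10]],
    "a2": [[0.40, 0.30, 0.30], [0.50, 0.25, 0.25]],
    "a3": [[0.70, 0.20, 0.10], [0.34, 0.33, 0.33]],
}
abstracts_to_label = select_abstracts(pool, 2)
sentences_to_label = select_sentences(pool, 2)
```

In an active learning loop, the selected items would be sent to annotators, added to the training set, and the model retrained before the pool is scored again. Abstract-level selection keeps each abstract's sentences together, which suits a model that relies on inter-sentence context.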
Authors: Mao Jin (毛进); Chen Ziyang (陈子洋) (Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China; School of Information Management, Wuhan University, Wuhan 430072, China)
Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》), indexed in EI, CSCD, and the Peking University Core Journals list; 2024, Issue 6, pp. 44-55 (12 pages)
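Funding: This work is one of the research outcomes of the National Natural Science Foundation of China project (Grant No. 72174154) and the Major Project of Key Research Bases of Humanities and Social Sciences in Universities (Grant No. 22JJD870005).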
Keywords: Deep Learning; Document Structural Function Identification; Move; Active Learning; Knowledge Organization