A Study on an Extractive Machine Reading Comprehension Dataset for Tibetan Medicine

(Published English title: A Study on Reading Comprehension Dataset of Tibetan Medicine Extractive Machine)
Abstract: Tibetan machine reading comprehension is still in its infancy, and building a high-quality corpus is an urgent prerequisite for progress in the field. This study used crowdsourcing to annotate in detail the sections on Tibetan medicinal plant materials and on terminology explanations in the Tibetan medical classic "The Four Medical Tantras". A Tibetan masked data augmentation strategy was then applied to enlarge the dataset, yielding a final set of about 13,000 valid question-answer pairs. Based on this dataset, an efficient Tibetan machine reading comprehension model is proposed by optimizing the traditional attention mechanism. The work contributes to the further development of Tibetan information processing technology and improves the ability of machines to understand Tibetan text, thereby supporting the inheritance and preservation of Tibetan culture.
Authors: Danzeng Luobu (旦增罗布); Laba Ciren (拉巴次仁); Wang Haochang (王浩畅); Xiao Ciren (小次仁). Affiliations: Shannan Power Supply Company of State Grid Tibet Electric Power Company Limited, Lhoka 856000, China; University of Tibetan Medicine, Lhasa 850000, China; School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China
Source: Xizang Science and Technology (《西藏科技》), 2024, No. 9, pp. 73-80 (8 pages)
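Funding: 2023 Research Funding Program for the Construction of the Tibetan Medicine Doctoral Program and the Cultivation of the Chinese and Tibetan Materia Medica Doctoral Program (BSDJS-23-15); National Natural Science Foundation of China (61402099).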
Keywords: Tibetan machine reading comprehension; The Four Medical Tantras; Tibetan corpus; attention mechanism