Abstract
Dissertation databases are an important digital resource in the digital libraries of Chinese universities. However, their metadata has so far been entered by hand, which is inefficient and labor-intensive. To address this problem, the paper applies a method based on document features and rule-based pattern matching, using regular expressions to extract dissertation metadata automatically. The algorithm consists of two modules: information field location and metadata extraction. Experimental data show that the algorithm achieves high precision and recall, as well as a high overall performance index F.
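The two modules named in the abstract, field location and metadata extraction, might be sketched roughly as follows. This is only an illustration under stated assumptions: the field labels, the regular expressions, and the function names `locate_fields` and `extract_metadata` are hypothetical and are not taken from the paper, whose actual rules are not given here.

```python
import re

# Hypothetical field labels as they might appear on a thesis cover page.
# These patterns are illustrative assumptions, not the rules used in the paper.
FIELD_PATTERNS = {
    "title": re.compile(r"^(?:论文题目|题\s*目)\s*[::]\s*(?P<value>.+)$"),
    "author": re.compile(r"^(?:作者姓名|作\s*者|姓\s*名)\s*[::]\s*(?P<value>.+)$"),
    "advisor": re.compile(r"^(?:指导教师|导\s*师)\s*[::]\s*(?P<value>.+)$"),
    "keywords": re.compile(r"^关键词\s*[::]\s*(?P<value>.+)$"),
}


def locate_fields(text):
    """Location step: record the first line that carries each field label."""
    located = {}
    for line_no, line in enumerate(text.splitlines()):
        stripped = line.strip()
        for name, pattern in FIELD_PATTERNS.items():
            if name not in located and pattern.match(stripped):
                located[name] = (line_no, stripped)
    return located


def extract_metadata(located):
    """Extraction step: pull the value out of each located line."""
    metadata = {}
    for name, (_, line) in located.items():
        value = FIELD_PATTERNS[name].match(line).group("value").strip()
        if name == "keywords":
            # Keywords are assumed to be separated by semicolons or commas.
            value = [k for k in re.split(r"\s*[;;,,、]\s*", value) if k]
        metadata[name] = value
    return metadata
```

Separating location from extraction keeps the label-matching rules in one table, so supporting a different cover-page layout would mostly mean editing `FIELD_PATTERNS` rather than the control flow.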
Source
Journal of Library and Information Sciences in Agriculture (《农业图书情报学刊》)
2015, No. 2, pp. 57-59 (3 pages)
Keywords
Dissertation
Metadata
Information extraction
Regular expression
Pattern matching