Abstract
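To obtain food-safety information efficiently, accurately, and comprehensively, an automatic food-safety question answering system was built on food-safety texts using the Lucene full-text retrieval framework and a long short-term memory (LSTM) neural network. With text crawled from the Internet serving as an unstructured dataset, the retrieval framework was used to expand the set of manually annotated question-answer pairs, and these pairs were used to train an LSTM model that judges how well a question matches a candidate answer sentence. Candidate answer sets are extracted with Lucene's retrieval mechanism and the answer is then extracted with the LSTM model, so that the system returns the sentence containing the answer to a food-safety question. Answer extraction by direct Lucene retrieval was compared with LSTM-based answer extraction. The results show that, as the number of candidate documents increases, the average accuracy of the LSTM-based question-answer matching method is always higher than that of the Lucene retrieval method; when the number of candidate sentences is small, the LSTM-based method also achieves a higher average accuracy than the Lucene retrieval method.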
Food safety is now a concern for both governments and consumers. However, the growing number of food-safety articles makes it difficult to retrieve useful information quickly and accurately. To improve the efficiency and accuracy of access to food-safety information, a question answering system based on long short-term memory (LSTM) and information retrieval techniques is proposed. The system relies on unstructured food-safety texts obtained with Web crawlers. Question-answer pairs were selected using Lucene, and an LSTM was used to predict answers according to the degree of matching between a question and each candidate sentence. Building on Lucene's retrieval mechanism and the LSTM model, the system selects the sentences most likely to contain the answer to a given question. The results show that the proposed system outperforms a baseline based on the retrieval mechanism alone. In addition, the performance of the two methods was analyzed with respect to the number of candidate articles and the number of candidate sentences.
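The two-stage design described in the abstract (retrieve candidate sentences, then pick the one that best matches the question) can be illustrated with a minimal Python sketch. It is not the authors' implementation. A small TF-IDF scorer stands in for Lucene retrieval, and `jaccard_matcher` is a hypothetical placeholder for the trained LSTM matching model. The function names, the `top_k` parameter, and the whitespace tokenizer are all assumptions made for illustration; Chinese text would need a proper word segmenter.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Sequence


def tokenize(text: str) -> List[str]:
    # Assumption: whitespace tokenization; Chinese text needs a segmenter.
    return text.lower().split()


def tfidf_scores(query: str, sentences: Sequence[str]) -> List[float]:
    """Score each sentence by summed TF-IDF weight of the query terms.

    Stand-in for Lucene scoring, which uses a different ranking function.
    """
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    query_terms = set(tokenize(query))
    scores = []
    for doc in docs:
        score = 0.0
        for term in query_terms:
            df = sum(1 for d in docs if term in d)
            if df == 0:
                continue  # term never occurs in the corpus
            idf = math.log((n + 1) / (df + 1)) + 1.0  # smoothed IDF
            score += doc[term] * idf
        scores.append(score)
    return scores


def retrieve_candidates(query: str, sentences: Sequence[str], top_k: int) -> List[str]:
    """Stage 1: keep the top_k sentences with a non-zero retrieval score."""
    scores = tfidf_scores(query, sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ranked[:top_k] if scores[i] > 0]


def jaccard_matcher(question: str, sentence: str) -> float:
    # Hypothetical placeholder for the LSTM question-sentence matching score.
    a, b = set(tokenize(question)), set(tokenize(sentence))
    return len(a & b) / len(a | b) if a | b else 0.0


def answer(
    query: str,
    sentences: Sequence[str],
    matcher: Callable[[str, str], float],
    top_k: int = 3,
) -> Optional[str]:
    """Stage 2: rerank the retrieved candidates with the matcher."""
    candidates = retrieve_candidates(query, sentences, top_k)
    if not candidates:
        return None
    return max(candidates, key=lambda s: matcher(query, s))
```

Passing the matcher as a parameter keeps the retrieval stage unchanged when the placeholder is swapped for a trained model, and `top_k` plays the role of the candidate-set size whose effect on accuracy the abstract reports.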
Authors
陈瑛
陈昂轩
董玉博
赵筱钰
侯文俊
CHEN Ying; CHEN Angxuan; DONG Yubo; ZHAO Xiaoyu; HOU Wenjun (College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China)
Source
《农业机械学报》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2019, Issue B07, pp. 380-384 (5 pages)
Transactions of the Chinese Society for Agricultural Machinery
Funding
National Natural Science Foundation of China (61503386)