Abstract
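[Objective/Significance] The rural revitalization strategy places new demands on agricultural technology extension, and the way agricultural extension knowledge is supplied needs further innovation. Driven by the demand for fruit and vegetable agricultural technology knowledge services, and building on state-of-the-art large language model (LLM) technology, this study constructs an intelligent question answering system for fruit and vegetable agricultural technology knowledge, aimed at extension services such as new forms of agricultural knowledge guidance and knowledge question answering. [Methods] Based on an analysis of strawberry growers' needs, strawberry cultivation knowledge was divided into different themes, yielding two downstream tasks for the large model: knowledge object recognition and knowledge question answering. A small, high-quality training corpus was built by combining automatic machine annotation with manual annotation. The performance of four existing LLMs (Baichuan2-13B-Chat, ChatGLM2-6B, Llama-2-13B-Chat, and ChatGPT) was compared, and the best-performing model was selected as the base model. Following a "high-quality corpus + pre-trained large model + fine-tuning" approach, a deep neural network with semantic analysis, contextual association, and generation capabilities, adaptable to multiple downstream tasks, was trained to build an agricultural knowledge question answering model. Several strategies, including data optimization and retrieval-augmented generation, were used to mitigate LLM hallucination. An intelligent question answering system for fruit and vegetable agricultural technology knowledge was then developed; it generates high-precision, unambiguous agricultural knowledge answers and supports multi-turn question answering. [Results and Discussion] With precision and recall as the performance metrics for the named entity recognition task, the mainstream domestic models evaluated all achieved an average precision above 85% across the knowledge themes after fine-tuning, while average recall varied; factors such as the number of knowledge entity types and the amount of annotated corpus affect model performance. With hallucination rate and semantic similarity as the performance metrics for the knowledge question answering task, strategies such as data optimization and retrieval-augmented generation reduced the hallucination rate by 10% to 40% and improved semantic similarity. [Conclusions] In named entity recognition and knowledge question answering tasks in the agricultural domain, the pre-trained model ChatGLM performed best. Fine-tuning for downstream tasks and model optimization based on retrieval-augmented generation (RAG) can mitigate LLM hallucination and significantly improve model performance. Large model technology offers innovation in agricultural technology knowledge service models,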
[Objective] The rural revitalization strategy presents new requirements for agricultural technology extension. However, the conventional approach suffers from a mismatch between supply and demand, so the form in which agricultural knowledge is supplied needs further innovation. Recent advances in artificial intelligence technologies, such as deep learning and large-scale neural networks, and particularly the advent of large language models (LLMs), make anthropomorphic and intelligent agricultural technology extension feasible. With agricultural technology knowledge services for fruits and vegetables as the demand orientation, an intelligent agricultural technology question answering system was built in this research based on an LLM, providing agricultural technology extension services that include guidance on new agricultural knowledge and question-and-answer sessions. This helps farmers access high-quality agricultural knowledge at their convenience. [Methods] Through an analysis of the demands of strawberry farmers, the agricultural technology knowledge related to strawberry cultivation was categorized into six themes: basic production knowledge, variety screening, interplanting knowledge, pest diagnosis and control, disease diagnosis and control, and drug damage diagnosis and control. Considering the current situation of agricultural technology, two primary tasks were formulated: named entity recognition and question answering related to agricultural knowledge. A training corpus comprising entity type annotations and question-answer pairs was constructed using a combination of automatic machine annotation and manual annotation, ensuring a small yet high-quality sample. After comparing four existing large language models (Baichuan2-13B-Chat, ChatGLM2-6B, Llama 2-13B-Chat, and ChatGPT), the model exhibiting the best performance was chosen as the base LLM to develop the intelligent question answering system for agricultural technology knowledge. Utilizing a high-quality corpus, pre-training of a Large Lang
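The abstract names precision and recall as the metrics for the named entity recognition task. As a minimal sketch of how such entity-level scores are commonly computed (the function name, the exact-match criterion, and the sample annotations are illustrative assumptions, not the authors' evaluation code or data):

```python
def precision_recall(gold, predicted):
    """Entity-level precision and recall over (entity text, entity type) pairs.

    Assumption: a prediction counts as correct only when both the text and
    the type match a gold annotation exactly.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical annotations for illustration only (not data from the paper).
gold = [
    ("gray mold", "disease"),
    ("spider mite", "pest"),
    ("powdery mildew", "disease"),
]
predicted = [
    ("gray mold", "disease"),
    ("spider mite", "disease"),  # right entity text, wrong type
]

precision, recall = precision_recall(gold, predicted)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Under this exact-match assumption, a model can score high precision while recall stays low if it labels only a few entities, which is one way the pattern of uniformly high precision and varying recall described in the abstract can arise.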
Authors
王婷
王娜
崔运鹏
刘娟
WANG Ting; WANG Na; CUI Yunpeng; LIU Juan (Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China; Key Laboratory of Big Agri-data, Ministry of Agriculture and Rural Affairs, Beijing 100081, China; Unit 96962, Beijing 102206, China)
Source
《智慧农业(中英文)》 (Smart Agriculture)
CSCD
2023, Issue 4, pp. 105-116 (12 pages)
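Funding
Beijing Digital Agriculture Innovation Team Project (BAIC10-2023)
Fundamental Research Funds of the Chinese Academy of Agricultural Sciences (JBYW-AII-2023-31)
National Key Research and Development Program of China (2022YFF0711902)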
Keywords
large language model (LLM)
generative pre-trained transformer
agricultural technology knowledge
intelligent question answering
named entity recognition