大语言模型(large language model,LLM),包括ChatGPT,在理解和响应人类指令方面表现突出,对自然语言问答影响深远。然而,由于缺少针对垂直领域的训练,LLM在垂直领域的表现并不理想。此外,由于对硬件的高要求,训练和部署LLM仍然具有一定...大语言模型(large language model,LLM),包括ChatGPT,在理解和响应人类指令方面表现突出,对自然语言问答影响深远。然而,由于缺少针对垂直领域的训练,LLM在垂直领域的表现并不理想。此外,由于对硬件的高要求,训练和部署LLM仍然具有一定困难。为了应对这些挑战,以中医药方剂领域的应用为例,收集领域相关数据并对数据进行预处理,基于LLM和知识图谱设计了一套垂直领域的问答系统。该系统具备以下能力:(1)信息过滤,过滤出垂直领域相关的问题,并输入LLM进行回答;(2)专业问答,基于LLM和自建知识库来生成更具备专业知识的回答,相比专业数据的微调方法,该技术无需重新训练即可部署垂直领域大模型;(3)抽取转化,通过强化LLM的信息抽取能力,利用生成的自然语言回答,从中抽取出结构化知识,并和专业知识图谱匹配以进行专业验证,同时可以将结构化知识转化成易读的自然语言,实现了大模型与知识图谱的深度结合。最后展示了该系统的效果,并通过专家主观评估与选择题客观评估两个实验,从主客观两个角度验证了系统的性能。展开更多
Knowledge Discovery in Databases is gaining attention and raising new hopes for traditional Chinese medicine (TCM) researchers. It is a useful tool in understanding and deciphering TCM theories. Aiming for a better ...Knowledge Discovery in Databases is gaining attention and raising new hopes for traditional Chinese medicine (TCM) researchers. It is a useful tool in understanding and deciphering TCM theories. Aiming for a better understanding of Chinese herbal property theory (CHPT), this paper performed an improved association rule learning to analyze semistructured text in the book entitled Shennong's Classic of Materia Medica. The text was firstly annotated and transformed to well-structured multidimensional data. Subsequently, an Apriori algorithm was employed for producing association rules after the sensitivity analysis of parameters. From the confirmed 120 resulting rules that described the intrinsic relationships between herbal property (qi, flavor and their combinations) and herbal efficacy, two novel fundamental principles underlying CHPT were acquired and further elucidated: (1) the many-to-one mapping of herbal efficacy to herbal property; (2) the nonrandom overlap between the related efficacy of qi and flavor. This work provided an innovative knowledge about CHPT, which would be helpful for its modern research.展开更多
文摘大语言模型(large language model,LLM),包括ChatGPT,在理解和响应人类指令方面表现突出,对自然语言问答影响深远。然而,由于缺少针对垂直领域的训练,LLM在垂直领域的表现并不理想。此外,由于对硬件的高要求,训练和部署LLM仍然具有一定困难。为了应对这些挑战,以中医药方剂领域的应用为例,收集领域相关数据并对数据进行预处理,基于LLM和知识图谱设计了一套垂直领域的问答系统。该系统具备以下能力:(1)信息过滤,过滤出垂直领域相关的问题,并输入LLM进行回答;(2)专业问答,基于LLM和自建知识库来生成更具备专业知识的回答,相比专业数据的微调方法,该技术无需重新训练即可部署垂直领域大模型;(3)抽取转化,通过强化LLM的信息抽取能力,利用生成的自然语言回答,从中抽取出结构化知识,并和专业知识图谱匹配以进行专业验证,同时可以将结构化知识转化成易读的自然语言,实现了大模型与知识图谱的深度结合。最后展示了该系统的效果,并通过专家主观评估与选择题客观评估两个实验,从主客观两个角度验证了系统的性能。
文摘Knowledge Discovery in Databases is gaining attention and raising new hopes for traditional Chinese medicine (TCM) researchers. It is a useful tool in understanding and deciphering TCM theories. Aiming for a better understanding of Chinese herbal property theory (CHPT), this paper performed an improved association rule learning to analyze semistructured text in the book entitled Shennong's Classic of Materia Medica. The text was firstly annotated and transformed to well-structured multidimensional data. Subsequently, an Apriori algorithm was employed for producing association rules after the sensitivity analysis of parameters. From the confirmed 120 resulting rules that described the intrinsic relationships between herbal property (qi, flavor and their combinations) and herbal efficacy, two novel fundamental principles underlying CHPT were acquired and further elucidated: (1) the many-to-one mapping of herbal efficacy to herbal property; (2) the nonrandom overlap between the related efficacy of qi and flavor. This work provided an innovative knowledge about CHPT, which would be helpful for its modern research.