Funding: Supported by the Science and Technology Program of Sichuan Province (Grant no. 2023YFS0424), the "Open bidding for selecting the best candidates" Science and Technology Project of Chengdu (Grant no. 2023-JB00-00020-GX), and the National Natural Science Foundation (Grant nos. 61902324, 11426179, and 61872298).
Abstract: The expansion of Chinese natural language processing (NLP) has stimulated research in the broader NLP domain. However, existing large language models have limitations in comprehending and reasoning in Chinese. This paper addresses these limitations by enhancing Chinese language models' comprehension and reasoning capabilities while minimizing resource requirements. We propose LLaMA-LoRA, a neural prompt engineering framework that builds upon the LLaMA-13B model and incorporates the Low-Rank Adaptation (LoRA) technique for refinement. Chain-of-Thought (CoT) prompts are crucial for generating intermediate reasoning chains in language models, but their effectiveness can be limited by isolated language patterns. Erroneous reasoning resulting from conventional prompts negatively impacts model performance. Automatic prompts are introduced to encourage reasoning-chain generation and accurate answer inference. Training the model on an extensive corpus of Chinese CoT data enhances its comprehension and reasoning abilities. The LLaMA-LoRA model demonstrates exceptional performance across numerous Chinese language tasks, surpassing benchmarks set by related language models such as GPT-3.5, ChatGLM, and OpenAssistant, and delivering accurate, comprehensive, and professional answers. Our open-source model code facilitates further research on chain-of-thought logical reasoning for Chinese text.
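As a rough illustration of the LoRA refinement step named in the abstract, the sketch below attaches low-rank adapters to a LLaMA-13B checkpoint with the HuggingFace transformers and peft libraries and shows the shape of a Chinese CoT training example. The checkpoint name, target modules, and hyperparameters are illustrative assumptions, not the paper's reported settings.

```python
# Minimal LoRA fine-tuning sketch (assumed settings, not the paper's).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "decapoda-research/llama-13b-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

# LoRA injects trainable low-rank matrices into selected projections,
# so only a small fraction of the 13B parameters is updated.
lora_cfg = LoraConfig(
    r=8,                                  # rank of the low-rank update
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # prints the trainable-parameter ratio

# A Chinese CoT training example pairs a question with an explicit
# reasoning chain followed by the final answer (content is illustrative).
example = (
    "问题:小明有3个苹果,又买了5个,他现在有几个苹果?\n"
    "推理:小明原有3个苹果,加上买来的5个,3 + 5 = 8。\n"
    "答案:8个"
)
inputs = tokenizer(example, return_tensors="pt")
# ... batches like `inputs` would feed a standard causal-LM training loop.
```

Because only the adapter matrices are trained, this kind of setup keeps the resource requirements low, which matches the abstract's stated goal.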
Abstract: Chain-of-thought (CoT) prompting enables large language models to work through complex tasks via explicit reasoning steps, giving them stronger capabilities in commonsense reasoning, mathematical and logical reasoning, and interpretability. However, the main drawback of CoT methods is their dependence on very large language models, which typically have tens of billions of parameters and are difficult to deploy at scale. To address this, this paper proposes a CoT-based knowledge distillation method for large models, whose main goal is to fully exploit the reasoning ability of large language models and, through knowledge distillation, guide small models to solve complex tasks. A large model serves as the teacher and a small model as the student; reasoning data obtained from the teacher is used to fine-tune the student. A series of carefully designed techniques, including a modified data generation procedure, cluster-based sampling of question-answer exemplars, heuristic error correction of exemplars, and adaptive answer generation, makes the teacher's generation process more efficient and yields reasoning data of higher quality and in greater quantity, so the student can be fine-tuned more effectively, acquire strong reasoning ability, and achieve efficient knowledge distillation. This framework aims to establish an effective knowledge-transfer mechanism in which the deep reasoning of large models guides small models, providing more intelligent and efficient solutions to complex tasks. In this way, we hope to overcome the deployment challenges of large models and advance the real-world application and progress of language models.
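One way the cluster-based exemplar sampling step described above could be realized is sketched below: embed the candidate questions, cluster them, and take the question nearest each centroid so the teacher's prompts cover diverse problem types. The embedding model, function name, and nearest-to-centroid rule are assumptions for illustration, not the paper's exact procedure.

```python
# Hypothetical cluster-based question sampling for teacher prompting.
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer

def sample_diverse_exemplars(questions, k=8):
    """Return k questions spread across semantic clusters."""
    encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    emb = encoder.encode(questions)                # (n, d) embeddings
    km = KMeans(n_clusters=k, n_init=10).fit(emb)  # group similar questions
    exemplars = []
    for c in range(k):
        idx = np.where(km.labels_ == c)[0]
        # choose the member nearest the centroid as the cluster representative
        d = np.linalg.norm(emb[idx] - km.cluster_centers_[c], axis=1)
        exemplars.append(questions[idx[d.argmin()]])
    return exemplars

# The selected exemplars would seed the teacher model's CoT prompts; the
# teacher's step-by-step outputs, after heuristic error correction, then
# form the fine-tuning set for the student model.
```

Sampling one representative per cluster, rather than sampling uniformly, is one plausible way to keep the distilled reasoning data diverse while keeping the number of expensive teacher calls small.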