
Construction and Investigation of Evaluation System for Large Language Model Capabilities
Abstract [Purpose/significance] Large language models (LLMs) are an emerging technology in artificial intelligence. Because of their powerful and specialized capabilities, they have been applied in many domains, and investigating their capability system and evaluating them benefits both research and application. [Method/process] The study collects 20 LLM evaluation leaderboards from various domains, constructs an evaluation system for LLM capabilities based on grounded theory, and uses it to empirically analyze 12 selected LLMs. [Result/conclusion] The LLM capability evaluation system, built on the basis of the human capability system, is reasonable and feasible. Current evaluations of LLM capabilities suffer from uncontrolled variables, non-standardized procedures, and results of questionable feasibility. The study proposes countermeasures to these problems and offers a theoretical reference for LLM evaluation.
Authors: Fu Peng (符鹏), Yang Haiping (杨海平) (School of Information Management, Nanjing University, Nanjing, Jiangsu 210023)
Source: Information Research (《情报探索》), 2024, No. 11, pp. 34-40 (7 pages)
Keywords: large language model; artificial intelligence; system construction; grounded theory