Abstract
[Purpose/significance] Large language models are an emerging technology in the field of artificial intelligence. Owing to their powerful and specialized capabilities, they have been applied across many domains. Investigating the capability system of large language models and evaluating the models supports both their research and their application. [Method/process] This study collects 20 leaderboards for evaluating large language models from different domains, constructs a capability evaluation system for large language models based on grounded theory, and empirically analyzes 12 selected large language models. [Result/conclusion] The capability evaluation system for large language models, built on the basis of the human capability system, is reasonable and feasible. Current evaluations of large language model capabilities suffer from problems such as uncontrolled variables, non-standardized processes, and questionable feasibility of results. The study proposes solutions to these problems and offers a theoretical reference for the evaluation of large language models.
Authors
符鹏
杨海平
Fu Peng; Yang Haiping (School of Information Management, Nanjing University, Nanjing, Jiangsu 210023)
Source
《情报探索》
2024, No. 11, pp. 34-40 (7 pages in total)
Information Research
Keywords
large language model
artificial intelligence
system construction
grounded theory