Abstract
The application of large language models (LLMs) in education has attracted wide attention and offers a new technical opportunity to improve traditional approaches to educational assessment. Taking English writing scoring and feedback as an example, this study is a preliminary exploration of the use of LLMs in formative assessment, and it aims to evaluate how likely schoolteachers are to adopt these models under few-shot learning conditions. To determine which prompt design most effectively improves the feasibility and reliability of LLMs on English writing scoring and feedback tasks, the study used a graduated prompt design in which components were added step by step. The results showed that prompts composed of "essay prompt + scoring rubric + human-scored samples" agreed most closely with human scoring. Using this prompt type, GPT-3.5 and GPT-4 each scored and provided feedback on the 166 writing samples in the test set, and the results were validated with several indicators, including the Pearson correlation coefficient, adjacent agreement, exact agreement, and the quadratic weighted kappa coefficient. GPT-4 outperformed GPT-3.5 in both scoring accuracy and consistency, but GPT-3.5 remains a feasible option given its lower cost. In addition, LLMs can provide detailed writing feedback, and an expert panel unanimously endorsed the quality of that feedback. The study therefore suggests that teachers actively explore and apply LLMs in daily teaching and assessment. Teachers in schools with scarce educational resources in particular can use this tool to improve educational quality and narrow the gap with high-quality schools.
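The four agreement indicators named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the integer score scale, the sample scores, and the choice to count adjacent agreement as "within one point, exact matches included" are all assumptions made here for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def exact_agreement(x, y):
    """Share of essays on which both raters give the same score."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def adjacent_agreement(x, y, tolerance=1):
    """Share of essays whose two scores differ by at most `tolerance`.

    Assumption: exact matches count as adjacent as well.
    """
    return sum(abs(a - b) <= tolerance for a, b in zip(x, y)) / len(x)


def quadratic_weighted_kappa(x, y, min_score, max_score):
    """Quadratic weighted kappa for integer scores in [min_score, max_score]."""
    k = max_score - min_score + 1
    n = len(x)
    # Observed confusion matrix: rows are rater x, columns are rater y.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(x, y):
        observed[a - min_score][b - min_score] += 1
    hist_x = [sum(row) for row in observed]
    hist_y = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2
            expected = hist_x[i] * hist_y[j] / n  # chance agreement
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on a 1-4 scale (not data from the study).
human_scores = [1, 2, 3, 4]
model_scores = [2, 2, 3, 3]

r = pearson(human_scores, model_scores)
exact = exact_agreement(human_scores, model_scores)
adjacent = adjacent_agreement(human_scores, model_scores)
qwk = quadratic_weighted_kappa(human_scores, model_scores, 1, 4)
print(f"r={r:.3f} exact={exact:.2f} adjacent={adjacent:.2f} qwk={qwk:.3f}")
```

Quadratic weighting penalizes a disagreement by the square of its distance on the score scale, so a two-point gap costs four times as much as a one-point gap. This is why the indicator is often reported alongside the simpler exact and adjacent agreement rates.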
Authors
Huang Xiaoting (黄晓婷)
Guo Liting (郭丽婷)
Source
Education Research Monthly (《教育学术月刊》)
PKU Core Journal (北大核心)
2024, Issue 7, pp. 74-80 (7 pages)
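Funding
Major Project of the Key Research Bases of Humanities and Social Sciences, Ministry of Education: "Research on Educational Quality Evaluation in the Context of Digitalization" (Grant No. 22JJD880002).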
Keywords
large language models (LLMs)
prompt engineering
formative assessment
English writing scoring
English writing feedback