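Abstract: With the rapid development of artificial intelligence, large language models (LLMs) have shown strong capabilities in natural language processing and in a range of knowledge applications. This study examines the use of Chinese domestic LLMs for the automatic annotation of subject knowledge graphs in primary and secondary education, taking Morality and the Rule of Law at the compulsory education stage and senior high school mathematics as case studies. In education, knowledge graph construction is important for organizing and systematizing subject knowledge, but traditional construction methods suffer from inefficient data annotation and high labor costs. The study aims to address these problems with LLMs and so raise the level of automation and intelligence in knowledge graph construction. Building on the current state of domestic LLMs, it presents the annotation methods and experimental results for the two subjects. It first discusses the research background and significance, then reviews the development of domestic LLMs and of automatic annotation techniques for subject knowledge graphs. The methods and models part explores an automatic annotation method based on domestic LLMs, with the goal of improving its application to subject knowledge graphs, and also describes a manual annotation method that serves as the baseline for evaluating the automatic method in practice. In the experiments and analysis part, automatic annotation experiments on Morality and the Rule of Law and on mathematics show that knowledge graph annotation achieved relatively high accuracy and efficiency in both subjects; an in-depth comparison with the manual annotation results yields a series of conclusions and verifies the effectiveness and accuracy of the proposed method. Finally, future research directions are outlined. Overall, the study offers a new approach to the automatic annotation of subject knowledge graphs and is expected to promote further development in related fields.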
Funding: The Deanship of Scientific Research at Shaqra University supported this work.
Abstract: The Internet revolution has produced abundant data from many sources, including social media and traditional media. Although the availability of data is no longer an issue, labelling that data so it can be used in supervised machine learning is still an expensive process that involves tedious human effort. The overall purpose of this study is to propose a strategy for automatically labelling unlabelled textual data using active learning in combination with deep learning. More specifically, the study assesses the performance of different active learning strategies in the automatic labelling of textual datasets at the sentence and document levels. To this end, experiments were performed on a publicly available dataset. In the first set of experiments, we randomly choose a subset of instances from the training dataset and train a deep neural network, then assess its performance on the test set. In the second set of experiments, we replace the random selection with different active learning strategies to choose the training subset, train the same model, and reassess its performance on the test set. The experimental results suggest that the active learning strategies yield a performance improvement of 7% on document-level datasets and 3% on sentence-level datasets for automatic labelling.
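The difference between the two experimental settings can be illustrated with a minimal sketch. The abstract does not name the active learning strategies that were evaluated, so least-confidence sampling is used here purely as an assumed example; the function names and the probability values are hypothetical and are not taken from the study.

```python
import random
from typing import List, Sequence


def random_select(pool_size: int, k: int, seed: int = 0) -> List[int]:
    """Baseline setting: pick k pool indices uniformly at random, without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(pool_size), k))


def least_confidence_select(probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Active learning setting (assumed strategy): pick the k items whose
    top predicted class probability is lowest."""
    # Confidence of an item = probability assigned to its most likely class.
    confidence = [max(p) for p in probs]
    # Rank indices from least to most confident; ties keep the lower index first.
    ranked = sorted(range(len(probs)), key=lambda i: confidence[i])
    return sorted(ranked[:k])


# Hypothetical class probabilities from a model for five unlabelled texts.
pool_probs = [
    [0.98, 0.02],  # confident prediction
    [0.55, 0.45],  # uncertain prediction
    [0.90, 0.10],
    [0.51, 0.49],  # most uncertain prediction
    [0.70, 0.30],
]

picked = least_confidence_select(pool_probs, k=2)
baseline = random_select(len(pool_probs), k=2)
print("least-confidence picks:", picked)
print("random picks:", baseline)
```

With these made-up probabilities, the least-confidence rule selects items 1 and 3, the two texts the model is least sure about, whereas the random baseline ignores the model's predictions entirely. In a full pipeline the selected items would be labelled, added to the training subset, and the model retrained before the next selection round.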