Abstract
Based on the T5 large language model, this study explores the automatic recognition of Chinese characteristic discourse and its features. Through pre-training and fine-tuning on large-scale corpora, the study builds a T5 model suited to automatically recognizing Chinese characteristic discourse and extracts multidimensional features, including semantic, cultural, and emotional features, to distinguish Chinese characteristic discourse from other types of text. Experimental results show that the T5 model achieves high accuracy on the automatic recognition task. Feature analysis reveals the distinctive expression patterns and linguistic characteristics of Chinese characteristic discourse, and discourse-theoretic analysis explains its construction features. The method can be applied to mining unstructured text on Chinese characteristic discourse and can support the construction of databases, knowledge graphs, and knowledge-based question answering systems for it. The study has theoretical and practical significance for cross-cultural language research and natural language processing.
Authors
DENG Yunhua (邓云华)
XU Qun'ai (许群爱)
LUO Jian (罗坚)
Source
Foreign Languages in China (《中国外语》)
CSSCI
Peking University Core Journal (北大核心)
2024, No. 1, pp. 58-67 (10 pages)
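Funding
An interim research outcome of the Hunan Provincial Key Laboratory "Artificial Intelligence and Precision International Communication" (湖南省重点实验室“人工智能与精准国际传播”).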
Keywords
Chinese characteristic discourse
T5 language model
pre-training
automatic recognition
feature analysis