Abstract
A script is a special kind of text composed of dialogue between characters and descriptions of scenes. Unsupervised script summarization compresses a lengthy script and extracts from it a short text that summarizes the script's content. This paper proposes an unsupervised script summarization method based on a pre-trained model. First, pre-training tasks for text-sequence processing are added during pre-training, so that the resulting model fully accounts for the scene descriptions surrounding the dialogue and the emotional characteristics of the characters' speech. The pre-trained model is then used to compute the similarity between sentences in the script, and the TextRank algorithm uses these similarities to score and rank key sentences. Finally, the highest-scoring sentences are extracted as the summary. Experimental results show that the proposed method outperforms the baseline models, with significant improvements in ROUGE scores.
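The ranking step described in the abstract (pairwise sentence similarity fed into TextRank, then extraction of the top-scoring sentences) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sentence vectors stand in for embeddings produced by the paper's pre-trained model, which is not reproduced here, and the use of cosine similarity, the damping factor 0.85, the normalized `(1 - d) / n` teleport term, and the function names `textrank_scores` and `textrank_summary` are all hypothetical choices made for the example.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def textrank_scores(vectors: List[Sequence[float]], damping: float = 0.85,
                    max_iter: int = 100, tol: float = 1e-8) -> List[float]:
    """Weighted TextRank (PageRank power iteration) over a sentence-similarity graph.

    Assumption: the vectors are sentence embeddings from some encoder
    (hypothetical here); edge weights are cosine similarities clipped at 0.
    """
    n = len(vectors)
    # Undirected weighted graph without self-loops.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] = max(0.0, cosine(vectors[i], vectors[j]))
    out_sum = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on its score in proportion to edge weight;
            # isolated sentences (out_sum == 0) pass nothing on.
            incoming = sum(w[j][i] / out_sum[j] * scores[j]
                           for j in range(n) if out_sum[j] > 0.0)
            new_scores.append((1.0 - damping) / n + damping * incoming)
        diff = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if diff < tol:
            break
    return scores


def textrank_summary(sentences: List[str], vectors: List[Sequence[float]],
                     top_k: int = 2) -> List[str]:
    """Return the top_k highest-scoring sentences, kept in script order."""
    scores = textrank_scores(vectors)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    # Toy vectors: three near-parallel sentences and one nearly unrelated one.
    sents = ["A", "B", "C", "D"]
    vecs = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.0, 1.0]]
    print(textrank_summary(sents, vecs, top_k=3))
```

Returning the selected sentences in their original order, rather than by score, keeps the extracted summary readable as a condensed script; the paper itself only states that the highest-scoring sentences are extracted.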
Authors
SU Qi, WANG Hongling, WANG Zhongqing
School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal
2023, Issue 2, pp. 310-316 (7 pages)
Funding
National Natural Science Foundation of China (61976146).
Keywords
Pre-trained model
Pre-training task
Script summarization
Unsupervised
Sentence similarity
Dialogue