Abstract
Keyphrases are a set of phrases that summarize the core theme and key content of a given text. As information overload grows more severe, predicting keyphrases that capture the central ideas of large volumes of text has become crucial. Keyphrase prediction, as one of the basic tasks of natural language processing, has therefore received increasing attention from researchers. Existing methods fall into two categories: keyphrase extraction and keyphrase generation. Keyphrase extraction quickly and accurately extracts salient phrases that appear in the given text as keyphrases. Unlike keyphrase extraction, keyphrase generation can predict both keyphrases that appear in the given text and keyphrases that do not. Each approach has its own advantages and disadvantages. However, most existing work on keyphrase generation ignores the potential benefits that extractive features may bring to keyphrase generation models. Extractive features can indicate important fragments of the original text and play an important role in helping the model learn a deep semantic representation of that text. Therefore, combining the advantages of extractive and generative approaches, this paper proposes a new keyphrase generation model that incorporates multi-granularity extractive features (incorporating Multi-Granularity Extractive features for keyphrase generation, MGE-Net). Experimental results on a series of publicly available datasets show that, compared with recent keyphrase generation models, the proposed model achieves significant performance improvements on most evaluation metrics.
Authors
ZHEN Tiange (甄田歌)
SONG Mingyang (宋明阳)
JING Liping (景丽萍)
School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journals (北大核心)
2023, Issue 4, pp. 181-187 (7 pages)
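Funding
National Natural Science Foundation of China (61822601, 61773050, 61632004)
Beijing Natural Science Foundation (Z180006)
Beijing Municipal Science and Technology Commission Project (Z181100008918012)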
Keywords
Natural language processing
Sequence-to-sequence
Keyphrase generation
Extractive features
Multi-task learning