Abstract
News topic texts are short but rich in meaning. Most traditional methods use only one granularity of representation, either word-level or sentence-level vectors, and so fail to exploit the correlations between the different granularities of a news topic text. To mine the dependencies between word vectors and sentence vectors, a news topic classification method based on XLNet and multi-granularity feature contrastive learning is proposed. First, XLNet extracts features from the news topic text, yielding word-level and sentence-level feature representations together with their latent spatial relationships. Then, the R-Drop strategy is used within contrastive learning to generate positive and negative sample pairs of features at different granularities, and weighted feature-similarity learning is performed on word-word, word-sentence, and sentence-sentence vector pairs. This lets the model mine the associations between character-level and sentence-level attributes more deeply and improves its representational capacity. Experiments on the THUCNews, Toutiao, and SHNews datasets show that, compared with the baseline models, the proposed method performs better in both accuracy and F1 score, reaching F1 scores of 93.88%, 90.08%, and 87.35% on the three datasets, respectively, which verifies the effectiveness and soundness of the method.
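The abstract does not state the loss formula, so the following is only a minimal sketch of how a weighted multi-granularity contrastive loss of this kind could be computed. Everything in it is an assumption rather than the authors' implementation: (1) an R-Drop-style positive pair is the same sample encoded twice under different dropout masks, with the other samples in the batch serving as negatives; (2) each of the three similarity terms is an in-batch InfoNCE loss over cosine similarity; (3) the word-level vector is the mean of the token states and the sentence-level vector is the last token state (XLNet appends its classification token at the end of the sequence); (4) the helper names, the weights (1.0, 0.5, 1.0), the dropout rate 0.1 and the temperature 0.05 are hypothetical. The hidden states are random stand-ins for XLNet outputs, and dropout is applied to them directly rather than inside the encoder.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dropout(vec: Sequence[float], p: float, rng: random.Random) -> Vector:
    """Inverted dropout: zero each element with probability p, scale survivors by 1/(1-p)."""
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in vec]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)  # epsilon guards against all-zero vectors


def info_nce(anchors: List[Vector], positives: List[Vector], tau: float) -> float:
    """In-batch InfoNCE: anchors[i] should match positives[i]; every other row is a negative."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / tau for p in positives]
        m = max(logits)  # subtract the max before exp for numerical stability
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]  # equals -log softmax(logits)[i]
    return total / len(anchors)


def stochastic_view(hidden: List[List[Vector]], p: float,
                    rng: random.Random) -> Tuple[List[Vector], List[Vector]]:
    """One dropout pass: returns (word-level, sentence-level) vectors for each sample.

    Hypothetical pooling: word-level = mean of all token states except the last,
    sentence-level = the last token state.
    """
    words, sents = [], []
    for tokens in hidden:
        dropped = [dropout(t, p, rng) for t in tokens]
        body = dropped[:-1]
        words.append([sum(col) / len(body) for col in zip(*body)])
        sents.append(dropped[-1])
    return words, sents


def multi_granularity_loss(hidden: List[List[Vector]],
                           weights: Tuple[float, float, float] = (1.0, 0.5, 1.0),
                           p: float = 0.1, tau: float = 0.05, seed: int = 0) -> float:
    """Weighted word-word, word-sentence and sentence-sentence contrastive loss."""
    rng = random.Random(seed)
    w1, s1 = stochastic_view(hidden, p, rng)  # first pass
    w2, s2 = stochastic_view(hidden, p, rng)  # second pass, fresh dropout masks
    a, b, c = weights
    return (a * info_nce(w1, w2, tau)
            + b * info_nce(w1, s2, tau)
            + c * info_nce(s1, s2, tau))


if __name__ == "__main__":
    data_rng = random.Random(42)
    # 4 samples x 5 tokens x 8 dims of stand-in "hidden states"
    batch = [[[data_rng.gauss(0, 1) for _ in range(8)] for _ in range(5)] for _ in range(4)]
    print(multi_granularity_loss(batch))
```

In training, a term like this would typically be added to the cross-entropy classification loss with some coefficient; the abstract says only that the three similarities are learned "with certain weights". Note that the original R-Drop method regularizes the two dropout passes with a KL-divergence term; reusing the two passes as contrastive pairs, as here, follows the description in the abstract.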
Authors
CHEN Min (陈敏)
WANG Leichun (王雷春)
XU Rui (徐瑞)
SHI Hanxiao (史含笑)
XU Miao (徐渺)
College of Computer Science, Hubei University, Wuhan 430062, China
Source
Journal of Zhengzhou University: Natural Science Edition (《郑州大学学报(理学版)》)
CAS
Peking University Core Journal (北大核心)
2025, No. 2, pp. 16-23 (8 pages)
Funding
National Natural Science Foundation of China (Grant No. 62106069).