
Weakly supervised text classification based on class name guidance
Abstract: To address the limitation that weakly supervised text classification relies too heavily on expert-generated seed words, a weakly supervised text classification method that generates seed words under class name guidance is proposed. Word vector representations are learned with the Skip-Gram model, and the von Mises-Fisher (vMF) distribution is used to model the relationship between the user-provided class names and the corpus. By jointly considering semantic relevance and semantic specificity, a set of high-quality seed words is generated without relying on expert experience. The seed words are then used iteratively to generate pseudo-labels and train a document classifier, and the seed words are expanded to further improve model performance. Experimental results (F1-score) on two public datasets, NYT and 20 Newsgroups, show the effectiveness of the proposed method.
Authors: ZHOU Yue-yao; XI Xue-feng; CUI Zhi-ming; SHENG Sheng-li; QIU Ya-jin (School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215000, China; Suzhou Key Laboratory of Virtual Reality Intelligent Interaction and Application Technology, Suzhou Science and Technology Bureau, Suzhou 215000, China; Suzhou Smart City Research Institute, Suzhou University of Science and Technology, Suzhou 215000, China; School of Computer Science, Texas Institute of Technology, Lubbock 79401, USA)
Source: Computer Engineering and Design (Peking University Core Journal), 2023, No. 8, pp. 2329-2336, 8 pages
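Funding: National Natural Science Foundation of China (61876217, 62176175); Jiangsu Province "Six Talent Peaks" High-Level Talent Fund (XYDXX-086); Suzhou Science and Technology Plan Fund (SGC2021078).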
Keywords: weak supervision; text classification; word embedding; vMF distribution; semantic relevance; semantic specificity; deep learning
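
The pipeline summarized in the abstract (embed words, score candidates against each class name by relevance and specificity, then pseudo-label documents from the resulting seed words) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the 2-D vectors in `EMBEDDINGS` are hypothetical stand-ins for Skip-Gram embeddings, plain cosine similarity replaces the vMF-based modeling, and the scoring rule and the function names `generate_seed_words` and `pseudo_label` are not taken from the paper.

```python
import math

# Hypothetical toy vectors standing in for learned Skip-Gram embeddings.
EMBEDDINGS = {
    "sports": (1.0, 0.0),
    "football": (0.9, 0.1),
    "goal": (0.8, 0.3),
    "politics": (0.0, 1.0),
    "election": (0.1, 0.9),
    "vote": (0.3, 0.8),
    "news": (0.7, 0.7),
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def generate_seed_words(class_names, embeddings, top_k=2):
    """Pick top_k seed words per class from the class name alone.

    Assumed scoring rule (not the paper's formula):
    relevance   = similarity to the target class name
    specificity = relevance minus the best similarity to any other class name
    score       = relevance + specificity
    """
    seeds = {}
    for name in class_names:
        scores = {}
        for word, vec in embeddings.items():
            if word in class_names:
                continue  # class names are added as seeds directly
            relevance = cosine(vec, embeddings[name])
            others = [cosine(vec, embeddings[o]) for o in class_names if o != name]
            specificity = relevance - max(others) if others else relevance
            scores[word] = relevance + specificity
        ranked = sorted(scores, key=scores.get, reverse=True)
        seeds[name] = [name] + ranked[:top_k]
    return seeds


def pseudo_label(tokens, seeds):
    """Assign the class whose seed words occur most often; None if no seed occurs."""
    counts = {c: sum(tokens.count(w) for w in words) for c, words in seeds.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


seeds = generate_seed_words(["sports", "politics"], EMBEDDINGS)
label = pseudo_label("the football team scored a late goal".split(), seeds)
```

In this toy setup the ambiguous word "news" is equally close to both class names, so its specificity term is zero and it is not selected as a seed for either class. In the method described by the abstract, the pseudo-labeled documents would then train a document classifier, and the seed set would be expanded over further iterations; those steps are omitted here.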
