
A Word Representation Method Based on Hownet (cited by: 11)
Abstract: Pre-trained word embeddings remain weak in the representation quality of low-frequency words and in stability. To address this, the authors propose a Hownet-based word representation method (H-WRL). First, under a sememe independence assumption, all N sememes in Hownet are assigned to a standard orthonormal basis of a Euclidean space, which initializes the sememe vectors. Next, using the word-sememe definitions in Hownet, each word vector is treated as a projection onto the subspace spanned by its related sememes, and a deep neural network model is proposed to learn the word representations. Experiments show that the resulting representations perform well on two standard evaluation tasks: word similarity computation and word sense disambiguation.
Authors: CHEN Yang; LUO Zhiyong (College of Information Science, Beijing Language and Culture University, Beijing 100083; Institute of Linguistic Information Processing, Beijing Language and Culture University, Beijing 100083)
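Source: Acta Scientiarum Naturalium Universitatis Pekinensis (Journal of Peking University, Natural Science Edition), 2019, No. 1, pp. 22-28 (7 pages). Indexed in EI, CAS, CSCD, and the PKU Core Journals list.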
Keywords: word embedding; Hownet; word similarity computation; word sense disambiguation
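
The two geometric steps named in the abstract (sememes as a standard orthonormal basis, word vectors as projections onto the subspace spanned by a word's sememes) can be illustrated with a minimal NumPy sketch. This is not the authors' H-WRL model: the abstract gives no equations, so the function names, the toy sememe indices, and the input vector `v` are all assumptions made for illustration, and the deep neural network that learns the representations is not reproduced here.

```python
import numpy as np


def sememe_basis(n_sememes):
    """Standard orthonormal basis of R^N: row i is the vector of sememe i."""
    return np.eye(n_sememes)


def project_onto_sememes(vector, basis, sememe_ids):
    """Orthogonally project `vector` onto span{basis[i] for i in sememe_ids}.

    Because the selected basis vectors are orthonormal, the projector is
    B @ B.T, where the columns of B are the selected sememe vectors.
    """
    B = basis[sememe_ids].T  # shape (N, k), orthonormal columns
    return B @ (B.T @ vector)


# Toy setup (hypothetical): 5 sememes, and a word defined by sememes 1 and 3.
N = 5
basis = sememe_basis(N)
word_sememes = [1, 3]

# Hypothetical dense vector standing in for whatever a learned model produces.
v = np.array([0.5, 2.0, -1.0, 3.0, 0.25])

w = project_onto_sememes(v, basis, word_sememes)
print(w)
```

With a one-hot basis the projection simply keeps the coordinates of the word's own sememes and zeroes the rest, so two words that share no sememes end up with orthogonal vectors under this simplification.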
