
A Lao Word Segmentation Method Based on a Bidirectional Long Short-Term Memory Neural Network (基于双向长短期记忆神经网络的老挝语分词方法)

Cited by: 17
Abstract: Words are the smallest meaningful units of language that can be used independently, so segmenting continuous Lao text into words is a necessary step. This paper proposes a Lao word segmentation method based on a bidirectional long short-term memory (BLSTM) neural network, trained on a manually segmented corpus of 913,487 words. Word segmentation is recast as a syllable-based sequence labeling task in which each Lao syllable receives one of four tags: word-initial (B), word-medial (M), word-final (E), or single-syllable word (S). Lao sentences are first split into syllables, and the syllables are trained into vectors. These vectors are then fed to the BLSTM model, which estimates the tag of each syllable, and a sequence inference algorithm determines the final tags. Experiments on the manually annotated segmentation corpus show that the method reaches an accuracy of 87.48%, which the authors report as clearly better than earlier segmentation methods.
Authors: HE Li (何力); ZHOU Lan-jiang (周兰江); ZHOU Feng (周枫); GUO Jian-yi (郭剑毅). Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
Source: Computer Engineering & Science (《计算机工程与科学》), indexed in CSCD and the Peking University Core Journals list, 2019, No. 7, pp. 1312-1317 (6 pages)
Funding: National Natural Science Foundation of China (61662040, 61562049)
Keywords: neural network; syllable; bidirectional long short-term memory; Lao word segmentation
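
The B/M/E/S labeling scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`words_to_tags`, `tags_to_words`, `constrained_viterbi`) are hypothetical, the syllables are placeholders, and the per-syllable scores are made-up stand-ins for BLSTM outputs. The abstract does not name its sequence inference algorithm, so the sketch assumes a Viterbi search restricted to legal tag transitions (for example, B can only be followed by M or E).

```python
from typing import Dict, List

TAGS = ["B", "M", "E", "S"]

# Assumed hard constraints: which tags may follow each tag.
ALLOWED = {"B": {"M", "E"}, "M": {"M", "E"}, "E": {"B", "S"}, "S": {"B", "S"}}


def words_to_tags(words: List[List[str]]) -> List[str]:
    """Turn segmented words (each a list of syllables) into syllable tags."""
    tags: List[str] = []
    for syllables in words:
        if len(syllables) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(syllables) - 2) + ["E"])
    return tags


def tags_to_words(syllables: List[str], tags: List[str]) -> List[List[str]]:
    """Rebuild words from syllables and their tags; E and S close a word."""
    words: List[List[str]] = []
    current: List[str] = []
    for syllable, tag in zip(syllables, tags):
        current.append(syllable)
        if tag in ("E", "S"):
            words.append(current)
            current = []
    if current:  # keep an unfinished trailing word rather than dropping it
        words.append(current)
    return words


def constrained_viterbi(scores: List[Dict[str, float]]) -> List[str]:
    """Pick the best legal tag sequence given per-syllable log-scores."""
    if not scores:
        return []
    neg_inf = float("-inf")
    # A sentence can only start with B or S.
    best = {t: (scores[0][t] if t in ("B", "S") else neg_inf) for t in TAGS}
    back: List[Dict[str, str]] = []
    for step in scores[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in TAGS:
            candidates = [p for p in TAGS if cur in ALLOWED[p]]
            prev = max(candidates, key=lambda p: best[p])
            new_best[cur] = best[prev] + step[cur]
            pointers[cur] = prev
        back.append(pointers)
        best = new_best
    # A sentence can only end with E or S.
    path = [max(("E", "S"), key=lambda t: best[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Placeholder syllables: one three-syllable word followed by a one-syllable word.
gold_words = [["s1", "s2", "s3"], ["s4"]]
gold_tags = words_to_tags(gold_words)

# Made-up scores where per-syllable argmax would give the illegal sequence B, B, S.
example_scores = [
    {"B": -0.1, "M": -3.0, "E": -3.0, "S": -2.5},
    {"B": -0.2, "M": -2.0, "E": -1.0, "S": -3.0},
    {"B": -3.0, "M": -3.0, "E": -3.0, "S": -0.1},
]
decoded = constrained_viterbi(example_scores)
```

With these placeholder inputs, `gold_tags` is `["B", "M", "E", "S"]` and `decoded` is `["B", "E", "S"]`. Taking the highest-scoring tag for each syllable independently would instead produce B, B, S, which cannot be converted back into words. This is the reason for a separate inference step after the tagger.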
