
Chinese Spelling Check Method Based on a Deep Model
Abstract: Learners of Chinese inevitably make spelling errors. This study proposes a Chinese spelling check model that detects and corrects such errors in sentences. The model combines the visual and phonetic features of Chinese characters and consists of a detection network and a correction network. The detection network, built from a bidirectional long short-term memory network (BiLSTM) and a conditional random field (CRF), identifies the erroneous characters in a sentence. The correction network, based on the bidirectional encoder representations from transformers (BERT) model, uses global contextual information to correct the characters that were flagged. Experiments on the CLP-2014, SIGHAN-2013, and SIGHAN-2015 datasets show that the proposed model improves on existing methods in both error detection and error correction, and that the phonetic features of Chinese characters improve error detection more than the visual features do.
Authors: CHEN Zhe (陈哲); CAO Yang (曹阳) (School of Cyberspace Security, Southeast University, Nanjing 211102, China)
Source: Journal of Nantong University (Natural Science Edition) (《南通大学学报(自然科学版)》), CAS, 2023, No. 4, pp. 69-78 (10 pages)
Keywords: Chinese spelling check; long short-term memory network; conditional random field (CRF); bidirectional encoder representations from transformers (BERT)
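
The two-stage design described in the abstract (detect first, then correct only the flagged characters) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `toy_detector` and `toy_corrector` are hypothetical stand-ins for the BiLSTM-CRF detection network and the BERT-based correction network, and `TOY_WRONG_BIGRAMS` is an invented lookup table used only so the example runs without any trained model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy data: a wrong two-character sequence mapped to the
# correct second character. A real system would learn this behaviour.
TOY_WRONG_BIGRAMS: Dict[str, str] = {"高心": "兴"}

Detector = Callable[[str], List[int]]   # one 0/1 label per character
Corrector = Callable[[str, int], str]   # replacement for the character at a position


def toy_detector(sentence: str) -> List[int]:
    """Stand-in for the detection network: label 1 marks a suspected error."""
    labels = [0] * len(sentence)
    for i in range(len(sentence) - 1):
        if sentence[i:i + 2] in TOY_WRONG_BIGRAMS:
            labels[i + 1] = 1
    return labels


def toy_corrector(sentence: str, pos: int) -> str:
    """Stand-in for the correction network: propose a character for `pos`."""
    bigram = sentence[pos - 1:pos + 1] if pos > 0 else ""
    return TOY_WRONG_BIGRAMS.get(bigram, sentence[pos])


def spell_check(sentence: str, detect: Detector, correct: Corrector) -> Tuple[str, List[int]]:
    """Run detection, then correct only the positions that were flagged."""
    labels = detect(sentence)
    chars = list(sentence)
    for pos, flagged in enumerate(labels):
        if flagged:
            chars[pos] = correct(sentence, pos)
    return "".join(chars), labels


if __name__ == "__main__":
    fixed, labels = spell_check("我今天很高心", toy_detector, toy_corrector)
    print(fixed, labels)
```

In the toy sentence, 心 (xin) is a phonetically similar misspelling of 兴 (xing), the kind of error the abstract says phonetic features help detect. Keeping detection and correction behind separate callables mirrors the split into two networks, so either stage could be swapped for a trained model without changing `spell_check`.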