Abstract
This paper studies clause alignment on a parallel corpus of the ancient Chinese text of Shi Ji (《史记》) and its modern Chinese translation, using a maximum entropy model and a Back Propagation (BP) neural network. The maximum entropy model combines clause length, clause alignment mode, and co-occurring Chinese character features; the BP neural network combines clause length, clause position, and co-occurring Chinese character features. Experimental results show that the maximum entropy model achieves its highest precision and recall when all three features (clause length, clause alignment mode, and co-occurring Chinese characters) are used together, and that the precision and recall of the maximum entropy model are higher than those of the BP neural network.
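As a rough illustration of the three features named for the maximum entropy model, the sketch below computes them for one candidate alignment. The abstract does not give the paper's actual feature definitions, so the function names, the length-ratio form, the character-overlap measure, and the modern-Chinese example clause are all assumptions made for illustration only.

```python
def length_ratio(ancient: str, modern: str) -> float:
    """Shorter length divided by longer length, counted in characters (assumed form)."""
    if not ancient or not modern:
        return 0.0
    shorter, longer = sorted((len(ancient), len(modern)))
    return shorter / longer


def cooccurrence(ancient: str, modern: str) -> float:
    """Share of distinct ancient-side characters that also occur on the modern side (assumed form)."""
    chars = set(ancient)
    if not chars:
        return 0.0
    return len(chars & set(modern)) / len(chars)


def alignment_mode(n_ancient: int, n_modern: int) -> str:
    """Alignment mode as 'm-n': m ancient clauses aligned to n modern clauses."""
    return f"{n_ancient}-{n_modern}"


def clause_pair_features(ancient_clauses, modern_clauses):
    """Collect the three features for one candidate alignment of clause groups."""
    ancient = "".join(ancient_clauses)
    modern = "".join(modern_clauses)
    return {
        "length_ratio": length_ratio(ancient, modern),
        "mode": alignment_mode(len(ancient_clauses), len(modern_clauses)),
        "cooccurrence": cooccurrence(ancient, modern),
    }


# Hypothetical 1-1 candidate: one ancient clause and an illustrative modern rendering.
features = clause_pair_features(["项籍者"], ["项籍是下相人"])
print(features)
```

A classifier such as a maximum entropy model would then score each candidate alignment from feature dictionaries like this one; how the paper discretizes or weights these features is not stated in the abstract.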
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
PKU Core (Peking University Core Journals)
2015, No. 7, pp. 112-117 (6 pages)
Funding
National Natural Science Foundation of China (No. 61171114)
Ministry of Education Independent Research Project (No. 20111081010)
Keywords
clause alignment
maximum entropy model
Back Propagation (BP) neural network
Shi Ji (《史记》)