Abstract
Authorship attribution is a complex problem that traditional natural language processing models struggle to solve. To investigate it further, we first use a denoising autoencoder, a deep model, to extract structural features of the text, and then use a support vector machine (SVM) classifier to identify the author. The advantage of the model is that it accounts for the diversity and complexity of noise in the features of unseen text, and that it can reconstruct the original text input from its noise-corrupted version. Applied to identifying the authors of poems by Wu Chengen, Wang Tingchen, Xue Hui, and others, the method reaches a highest recognition accuracy of 78.2%, which verifies its effectiveness. The method is then further applied to identifying the authors of the poems in "Journey to the West".
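The two-stage pipeline described in the abstract (denoising autoencoder features followed by an SVM) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the paper's configuration: a single hidden layer with tied weights, masking noise, a linear SVM trained by subgradient descent on the hinge loss, all hyperparameters, and synthetic feature vectors standing in for encoded poems.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(X, n_hidden=8, noise=0.3, lr=0.5, epochs=200, seed=0):
    """Train a one-hidden-layer denoising autoencoder with tied weights."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b_h = np.zeros(n_hidden)
    b_o = np.zeros(d)
    for _ in range(epochs):
        # Masking noise (assumed): zero out roughly `noise` of the inputs.
        X_noisy = X * (rng.random(X.shape) > noise)
        H = sigmoid(X_noisy @ W + b_h)        # encode the corrupted input
        R = sigmoid(H @ W.T + b_o)            # reconstruct
        # Squared-error gradients, measured against the CLEAN input.
        dR = (R - X) * R * (1 - R) / n
        dH = (dR @ W) * H * (1 - H)
        W -= lr * (X_noisy.T @ dH + dR.T @ H)
        b_h -= lr * dH.sum(axis=0)
        b_o -= lr * dR.sum(axis=0)
    return W, b_h

def encode(X, W, b_h):
    """Map inputs to the learned hidden representation."""
    return sigmoid(X @ W + b_h)

def train_linear_svm(F, y, lam=0.01, lr=0.1, epochs=300):
    """Binary linear SVM via subgradient descent; labels y are in {-1, +1}."""
    w = np.zeros(F.shape[1])
    b = 0.0
    for _ in range(epochs):
        active = y * (F @ w + b) < 1          # samples violating the margin
        grad_w = lam * w - (y[active][:, None] * F[active]).sum(axis=0) / len(y)
        grad_b = -y[active].sum() / len(y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(F, w, b):
    return np.where(F @ w + b >= 0, 1, -1)

# Synthetic stand-in data: two hypothetical "authors", 12 features per text.
rng = np.random.default_rng(1)
X = np.vstack([rng.random((20, 12)) * 0.5, 0.5 + rng.random((20, 12)) * 0.5])
y = np.array([-1] * 20 + [1] * 20)

W, b_h = train_dae(X)
F = encode(X, W, b_h)          # features learned by the autoencoder
w, b = train_linear_svm(F, y)
pred = predict(F, w, b)
```

A real experiment with more than two candidate authors would need a multi-class scheme (for example one-vs-rest) and a held-out test set; both are omitted here to keep the sketch short.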
Authors
范亚超
罗天健
周昌乐
FAN Yachao; LUO Tianjian; ZHOU Changle (Fujian Key Lab of Brain-like Computing and Applications, School of Information Science and Engineering, Xiamen University, Xiamen 361005, China)
Source
《厦门大学学报(自然科学版)》
CAS
CSCD
PKU Core (Peking University Core Journals)
2018, No. 6, pp. 884-889 (6 pages)
Journal of Xiamen University: Natural Science
Funding
National Natural Science Foundation of China (61673322, 61573294).
Keywords
denoising autoencoder
encoding features
authorship identification