Abstract
To improve the classification of expression techniques found in Chinese compositions written by primary and secondary school students, a variance-augmented TF×IWF×IWF algorithm is used for feature selection. Its advantage is that the variance characterizes how evenly a feature word is distributed across categories, which helps determine the importance of that word. Because the text feature vectors produced by the variance-augmented TF×IWF×IWF algorithm are too sparse, the Word2vec model is used to expand the word features. Because Word2vec alone does not reflect the importance of words in a text, the word vectors are then weighted with the weighting algorithm above. The two methods are combined to represent the features of expression-technique texts, and a support vector machine (SVM) classifier is used to classify those texts. Experimental results show that combining the two methods raises classification precision by 3% on average.
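The pipeline described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the formulas, so the IWF definition (log of total tokens over a word's count), the corpus-level TF, and the way the variance enters the weight (a plain multiplication) are all assumptions, and the tiny `embeddings` table is a hypothetical stand-in for trained Word2vec vectors.

```python
import math
from collections import Counter

def tf_iwf_iwf_var(docs, labels):
    """Illustrative word weights: TF x IWF x IWF x between-class variance."""
    classes = sorted(set(labels))
    total = Counter()                            # corpus-wide word counts
    per_class = {c: Counter() for c in classes}  # word counts per category
    for tokens, label in zip(docs, labels):
        total.update(tokens)
        per_class[label].update(tokens)
    n_tokens = sum(total.values())
    class_sizes = {c: sum(per_class[c].values()) for c in classes}
    weights = {}
    for word, count in total.items():
        tf = count / n_tokens                    # assumed: corpus-level TF
        iwf = math.log(n_tokens / count)         # assumed IWF definition
        # Relative frequency of the word inside each category.
        freqs = [per_class[c][word] / class_sizes[c] for c in classes]
        mean = sum(freqs) / len(freqs)
        variance = sum((f - mean) ** 2 for f in freqs) / len(freqs)
        # Assumed: variance simply scales the weight, so a word spread
        # evenly over all categories gets weight 0.
        weights[word] = tf * iwf * iwf * variance
    return weights

def weighted_doc_vector(tokens, weights, embeddings):
    """Weighted average of the word vectors of the tokens in one text."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    norm = 0.0
    for t in tokens:
        if t in embeddings and t in weights:
            w = weights[t]
            norm += w
            for i, x in enumerate(embeddings[t]):
                vec[i] += w * x
    return [v / norm for v in vec] if norm > 0 else vec

def combined_features(tokens, weights, embeddings, vocab):
    """Concatenate the sparse weight vector with the weighted dense vector."""
    present = set(tokens)
    sparse = [weights[w] if w in present else 0.0 for w in vocab]
    return sparse + weighted_doc_vector(tokens, weights, embeddings)

# Toy, hypothetical data: two categories of expression technique.
docs = [["the", "moon", "like", "plate"], ["wind", "like", "knife"],
        ["the", "flower", "is", "smiling"], ["tree", "is", "waving"]]
labels = ["simile", "simile", "personification", "personification"]
embeddings = {"like": [1.0, 0.0], "is": [0.0, 1.0], "moon": [0.5, 0.5]}

weights = tf_iwf_iwf_var(docs, labels)
vocab = sorted(weights)
features = [combined_features(d, weights, embeddings, vocab) for d in docs]
```

In this sketch, each row of `features` is the kind of vector that would then be passed to an SVM classifier; that step is left out to keep the example dependency-free.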
Authors
Ma Xiaoli
Liu Jie
Zhou Jianshe
Luo Liming
Shi Jinsheng
(School of Information Engineering, Capital Normal University, Beijing 100048, China; Beijing Advanced Innovation Center for Imaging Technology, Capital Normal University, Beijing 100048, China)
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2018, Issue 10, pp. 49-54 (6 pages)
Funding
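National Natural Science Foundation of China (61371194, 61672361)
Beijing Natural Science Foundation (4152012)
Beijing Advanced Innovation Center for Imaging Technology (BAICIT-2016004)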