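This paper uses Python to classify 25,000 English-language movie reviews. A bag-of-words model is first used to classify the text data. Word2Vec is then added to build new word-vector features, and precision and recall are used to compare the classification performance of the two models. Finally, conclusions are drawn by comparing the classification performance of logistic regression and naive Bayes classifiers. The results show that, for English movie-review classification under the same conditions, the Word2Vec word-vector model achieves precision and recall that are 0.02 and 0.026 percentage points higher, respectively, than those of the bag-of-words model. With Word2Vec features, the naive Bayes classifier achieves precision and recall that are 0.027 and 0.028 percentage points higher, respectively, than those of logistic regression.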
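As a rough illustration of two building blocks named in this abstract, the sketch below builds a bag-of-words count vector and computes precision and recall for a binary classifier. It uses only the Python standard library. The vocabulary, the sample sentence, and the label lists are made-up values for illustration, not data from the paper, and the helper names `bag_of_words` and `precision_recall` are hypothetical.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Map a text to a vector of word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical vocabulary and review text (not taken from the paper).
vocabulary = ["good", "bad", "movie"]
vector = bag_of_words("Good movie good cast", vocabulary)

# Hypothetical gold labels and predictions (1 = positive review).
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
precision, recall = precision_recall(y_true, y_pred)

print(vector)
print(round(precision, 3), round(recall, 3))
```

Comparing two feature sets or two classifiers, as the abstract describes, would amount to calling `precision_recall` once per model on the same held-out labels and differencing the results.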
This paper presents simple and multiple linear regression models of verbs in the Quran. The model analyzes how the frequency of words of the form (—un, ---) is influenced by the frequency of plural present-tense verbs (t—un, ---) or (y—un, ---), and it models the relationship between the independent variables and the dependent variable by fitting a linear equation to the observed data with a simple linear regression model. MATLAB functions are used to find the parameters of the linear regression model and to plot the fits. The results show that the parameter vector of the model is (1, 1) and the mean of the dataset is (6, 7). This corresponds to the verb whose inputs are the frequency of "they enter" and the frequency of "enter" (yadkolun ?dakilun); 17 other points also lie on the line, out of a dataset of 387 verbs and their derived verbs in the Quran. The name of Allah () appeared when three variables were used and plotted in 3D with the "Show Text" option for a multiple regression model.
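The fitting step can be sketched with ordinary least squares. The paper uses MATLAB; the version below is a plain-Python stand-in, not the authors' code. It assumes the reported parameter vector (1, 1) means intercept 1 and slope 1, which is one reading of the abstract, and the five data points are invented so that their mean is (6, 7). The function name `fit_simple_linear_regression` is hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, (mean_x, mean_y)).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return intercept, slope, (mean_x, mean_y)


# Invented frequencies (not from the paper), chosen so the mean is (6, 7).
xs = [2, 4, 6, 8, 10]
ys = [3, 5, 7, 9, 11]

intercept, slope, mean_point = fit_simple_linear_regression(xs, ys)
print(intercept, slope, mean_point)
```

Under this reading, a line with intercept 1 and slope 1 passes through the mean point because 7 = 1 + 1 × 6, which is consistent with the two figures the abstract reports.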