Abstract
To improve classification accuracy and reduce feature dimensionality for a special type of short text, a short-text classification method is proposed that combines an improved TF-IDF method with Binary Gray Wolf Optimization (BGWO). To make the weights of the text feature vectors more accurate, a likes ranking factor is proposed and combined with text feature concentration to compute weights for the special type of text that carries like counts, yielding the improved TF-IDF-RANK method for feature weighting. Meanwhile, starting from the initially selected feature vectors, an optimized BGWO algorithm is designed to search for the optimal feature subset; an attenuation coefficient vector and a multi-optimal-solution iteration mechanism are introduced to improve the gray wolf search performance. The results show that the method effectively improves weighting accuracy, better characterizes the initially selected feature vectors, and strengthens the ability to find the global optimum during feature selection, thereby improving short-text classification. In tests on the LABIC dataset and a Douyin (TikTok) open platform dataset, the comprehensive F1 score improved by 14.76% and 14.02%, respectively, which verifies the effectiveness of the method for classifying this special type of text.
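As a rough illustration of the feature-subset search mentioned in the abstract, the following is a minimal sketch of a baseline binary grey wolf optimizer with a sigmoid transfer function. It is not the paper's improved BGWO: the abstract does not specify the attenuation coefficient vector or the multi-optimal-solution iteration mechanism, so the standard linear decay of `a` is used instead. The names `bgwo_select` and `toy_fitness`, the parameter defaults, and the toy fitness function are all assumptions made for this example; in practice the fitness would more plausibly be a classifier score on the selected features.

```python
import math
import random


def bgwo_select(fitness, n_features, n_wolves=8, n_iter=30, seed=0):
    """Baseline binary GWO (sigmoid transfer); maximizes `fitness(mask)`.

    Assumption: standard linear decay of `a`, not the paper's attenuation vector.
    Requires n_wolves >= 3 so that alpha, beta and delta leaders exist.
    """
    rng = random.Random(seed)
    wolves = [[rng.randint(0, 1) for _ in range(n_features)]
              for _ in range(n_wolves)]
    best, best_score = None, -math.inf
    for t in range(n_iter):
        # Rank the pack; the three fittest wolves lead the search.
        ranked = sorted(wolves, key=fitness, reverse=True)
        alpha, beta, delta = ranked[0], ranked[1], ranked[2]
        score = fitness(alpha)
        if score > best_score:
            best, best_score = alpha[:], score
        a = 2.0 * (1 - t / n_iter)  # decays linearly from 2 towards 0
        new_wolves = []
        for w in wolves:
            pos = []
            for d in range(n_features):
                x = 0.0
                for leader in (alpha, beta, delta):
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    x += leader[d] - A * abs(C * leader[d] - w[d])
                x /= 3.0  # average of the three leader-guided positions
                # Sigmoid transfer maps the continuous position to a bit probability.
                prob = 1 / (1 + math.exp(-10 * (x - 0.5)))
                pos.append(1 if rng.random() < prob else 0)
            new_wolves.append(pos)
        wolves = new_wolves
    # Also evaluate the final pack before returning the best mask seen.
    for w in wolves:
        s = fitness(w)
        if s > best_score:
            best, best_score = w[:], s
    return best, best_score


def toy_fitness(mask):
    # Hypothetical fitness: features 0-2 are useful, every other one costs 0.5.
    return sum(mask[:3]) - 0.5 * sum(mask[3:])


best_mask, best_score = bgwo_select(toy_fitness, n_features=10)
print(best_mask, best_score)
```

The returned mask marks which features are kept (1) or dropped (0). Because the random generator is seeded, repeated runs with the same arguments give the same result.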
Authors
杨东
毋涛
赵雪青
李猛
YANG Dong; WU Tao; ZHAO Xue-qing; LI Meng (School of Computer Science, Xi’an Polytechnic University, Xi’an 710048, China)
Source
《计算机技术与发展》
2024, No. 8, pp. 37-41 (5 pages)
Computer Technology and Development
Funding
National Natural Science Foundation of China, Young Scientists Fund (61806160).