Abstract
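This paper studies text categorization with the aim of improving text retrieval efficiency. High-dimensional text features slow the convergence of neural networks and lower classification accuracy. To address this, an automatic text categorization algorithm (RS-BPNN) is proposed that combines rough set (RS) attribute reduction with the classification strengths of a BP neural network. RS-BPNN first preprocesses the text features with rough-set attribute reduction to lower the vector dimensionality, then deletes the redundant attributes from the decision table, and finally classifies the documents with the neural network. Simulation experiments in MATLAB show that the recognition accuracy of RS-BPNN is about 4% higher than that of a traditional BP neural network, improving both the accuracy of text categorization and retrieval efficiency.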
Although rough sets can derive clear categorization rules through information reduction without compromising the accuracy of text categorization, they are sensitive to noisy data. Neural networks have a strong ability to learn from fuzzy data, but they cannot remove uncertain and vague information, and their performance is weakened because text vectors are very large. A hybrid classifier is presented that combines rough set theory with a BP neural network. First, the documents are represented with the vector space model. Second, the feature vectors are reduced using rough sets. Finally, the documents are classified by the BP neural network. Experimental results show that the Rough-ANN algorithm is effective for text classification and outperforms the traditional BP neural network in classification precision, stability, and fault tolerance.
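The attribute-reduction step described in the abstract can be illustrated with a minimal sketch. It assumes a greedy, dependency-based heuristic (commonly called QuickReduct) on a toy decision table; the abstract does not say which reduction procedure the paper uses, so this is an illustration only. The names `partition`, `dependency`, `quick_reduct`, and the binary term-presence data in `docs` and `labels` are hypothetical.

```python
def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    classes = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(i)
    return list(classes.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows whose equivalence class has one label."""
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)  # block lies in the positive region
    return positive / len(rows)


def quick_reduct(rows, labels):
    """Greedily add attributes until the full table's dependency is reached."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        candidates = [a for a in all_attrs if a not in reduct]
        best = max(candidates,
                   key=lambda a: dependency(rows, labels, reduct + [a]))
        reduct.append(best)
    return sorted(reduct)


# Hypothetical decision table: each row marks the presence (1) or absence (0)
# of four terms in a document; the label is the document's category.
docs = [
    (1, 0, 1, 0),
    (1, 1, 0, 0),
    (0, 1, 1, 0),
    (0, 0, 0, 0),
    (1, 1, 1, 0),
    (0, 0, 1, 0),
]
labels = ["sports", "sports", "finance", "finance", "sports", "finance"]

reduct = quick_reduct(docs, labels)
# Project every document onto the retained attributes only.
reduced = [tuple(row[a] for a in reduct) for row in docs]
print(reduct, reduced)
```

In a pipeline like the one the abstract outlines, the shortened vectors in `reduced` would then serve as inputs to the BP neural network in place of the full feature vectors.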
Source
《计算机仿真》
CSCD
PKU Core (Peking University core journals list)
2011, No. 6, pp. 219-222, 283 (5 pages)
Computer Simulation
Keywords
Rough sets (RS)
Neural network (NN)
Text categorization
Reduction