Abstract
With the continuing development of Internet technology, Web information keeps changing and growing. To find the information users need effectively, traditional information retrieval must be extended to Web information retrieval. If web page text is classified in advance, a user's query can be searched within the corresponding category, which greatly improves retrieval efficiency. This paper preprocesses web pages, performs Chinese word segmentation and feature extraction, and then classifies the pages with the KNN classification algorithm, using the PSO algorithm to find the K nearest neighbors quickly. Experimental results show that the method not only reduces web page classification time but also clearly improves precision, recall, and the F1 measure, effectively improving the efficiency of intelligent web page classification.
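The classification step named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes pages have already been preprocessed and segmented into tokens, uses raw term counts as features and cosine similarity as the neighbor measure, and finds neighbors by brute-force search instead of the PSO search the paper describes. The names `cosine`, `knn_classify`, and `TRAINING`, and the toy data, are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query_tokens, training, k=3):
    """Label a tokenized page by a similarity-weighted vote of its k nearest neighbors.

    training is a list of (tokens, label) pairs. Neighbors are found by
    brute force here (an assumption); the paper uses PSO for this search.
    """
    query = Counter(query_tokens)
    scored = [(cosine(query, Counter(tokens)), label) for tokens, label in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


# Toy, already-segmented training pages (hypothetical data).
TRAINING = [
    (["football", "match", "goal"], "sports"),
    (["basketball", "match", "score"], "sports"),
    (["stock", "market", "price"], "finance"),
    (["bank", "stock", "loan"], "finance"),
]

print(knn_classify(["match", "goal", "score"], TRAINING, k=3))
print(knn_classify(["stock", "price"], TRAINING, k=3))
```

Weighting each vote by similarity keeps distant neighbors from outvoting close ones when k is large relative to the training set. The brute-force scan costs one similarity computation per training page, which is the cost the paper's PSO search aims to reduce.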
Source
《太原师范学院学报(自然科学版)》
2010, No. 4, pp. 55-58 (4 pages)
Journal of Taiyuan Normal University:Natural Science Edition
Funding
Jiangsu Province High-Tech Key Research Project (BE2006357)
Keywords
Chinese word segmentation
feature extraction
intelligent classification
KNN classification algorithm
PSO algorithm