Abstract
Web page classification requires labeled Web pages to train the classification algorithm, but labeling Web pages is time-consuming and laborious. With the rapid growth of the Web, unlabeled Web pages have become relatively easy to obtain. To make effective use of unlabeled Web pages for improving classification performance, a Web page classification algorithm based on ensemble learning is proposed. It iteratively runs support vector machine (SVM), Centroid, and Naive Bayes classifiers, combines their predictions, and repeatedly labels pages from the unlabeled set for use in training. Experimental results show that the proposed algorithm effectively improves Web page classification performance.
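The iterative procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not give the combination rule, the stopping criterion, or the feature representation, so the sketch assumes term-count feature vectors, binary labels (+1 / -1), unanimous agreement among the three classifiers before an unlabeled page is added to the training set, and a Pegasos-style sub-gradient trainer standing in for the SVM. All function names and parameters are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_centroid(X, y):
    """Centroid classifier: pick the class whose mean vector has the highest cosine."""
    cents = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
    def predict(x):
        def cos(c):
            v = cents[c]
            return dot(x, v) / (math.sqrt(dot(x, x) * dot(v, v)) or 1.0)
        return max(cents, key=cos)
    return predict

def train_naive_bayes(X, y):
    """Multinomial Naive Bayes over term counts with Laplace smoothing."""
    classes = sorted(set(y))
    log_prior, log_like = {}, {}
    for c in classes:
        rows = [x for x, t in zip(X, y) if t == c]
        counts = [sum(col) + 1.0 for col in zip(*rows)]
        total = sum(counts)
        log_prior[c] = math.log(len(rows) / len(X))
        log_like[c] = [math.log(k / total) for k in counts]
    return lambda x: max(classes, key=lambda c: log_prior[c] + dot(x, log_like[c]))

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM for labels in {+1, -1}, trained with Pegasos-style updates (assumed)."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)          # last weight acts as the bias term
    idx, t = list(range(len(X))), 0
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            t += 1
            eta = 1.0 / (lam * t)
            xi = list(X[i]) + [1.0]
            margin = y[i] * dot(w, xi)
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:               # hinge loss is active for this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return lambda x: 1 if dot(w, list(x) + [1.0]) >= 0 else -1

def ensemble_self_training(X_lab, y_lab, X_unlab, max_iter=10):
    """Iteratively label unlabeled pages on which all three classifiers agree."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    trainers = (train_linear_svm, train_centroid, train_naive_bayes)
    models = [train(X_lab, y_lab) for train in trainers]
    for _ in range(max_iter):
        remaining = []
        for x in pool:
            votes = [m(x) for m in models]
            if len(set(votes)) == 1:     # assumed rule: unanimous agreement
                X_lab.append(x)
                y_lab.append(votes[0])
            else:
                remaining.append(x)
        if len(remaining) == len(pool):  # nothing new was labeled, so stop
            break
        pool = remaining
        models = [train(X_lab, y_lab) for train in trainers]
    def predict(x):
        votes = [m(x) for m in models]
        return max(set(votes), key=votes.count)  # majority vote of the three models
    return predict, len(X_lab)
```

Calling `ensemble_self_training(X_lab, y_lab, X_unlab)` returns a majority-vote predictor together with the final size of the labeled set. Requiring unanimity is a conservative choice that limits label noise in the pages added to the training set; a weighted or majority rule would be an equally plausible reading of "combines their predictions".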
Source
《郑州大学学报(理学版)》
CAS
Peking University Core Journals (北大核心)
2009, No. 3, pp. 26-29 (4 pages)
Journal of Zhengzhou University: Natural Science Edition
Keywords
ensemble learning
support vector machine (SVM)
Web page classification