
Web Page Classification Algorithm Based on Ensemble Learning (基于集成学习的网页分类算法)

Cited by: 1
Abstract: Web page classification requires labeled Web pages to train the classification algorithm, but labeling pages is time-consuming and laborious. With the rapid growth of the Web, unlabeled pages have become relatively easy to obtain. To make effective use of unlabeled pages and improve classification performance, a Web page classification algorithm based on ensemble learning is proposed. It iteratively runs support vector machine (SVM), centroid, and naive Bayes classifiers, combines their predictions, and repeatedly labels pages from the unlabeled set and adds them to the training data. Experimental results show that the proposed algorithm effectively improves Web page classification performance.
Source: Journal of Zhengzhou University: Natural Science Edition (《郑州大学学报(理学版)》), indexed in CAS and the PKU Core list (北大核心), 2009, No. 3, pp. 26-29 (4 pages)
Keywords: ensemble learning; support vector machine (SVM); Web page classification
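
The iterative procedure summarized in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the three predictions are combined, how many pages are labeled per iteration, or when the loop stops. Here the scikit-learn classes `LinearSVC`, `NearestCentroid`, and `MultinomialNB` stand in for the SVM, centroid, and naive Bayes classifiers; a page is added to the training set only when all three agree on its label; and `rounds`, `per_round`, `ensemble_self_train`, and `predict_vote` are hypothetical names and parameters chosen for the example.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import NearestCentroid
from sklearn.svm import LinearSVC


def fit_all(X, y):
    """Fit fresh SVM, centroid, and naive Bayes classifiers on (X, y)."""
    return [m.fit(X, y) for m in (LinearSVC(), NearestCentroid(), MultinomialNB())]


def ensemble_self_train(X_lab, y_lab, X_unl, rounds=5, per_round=10):
    """Grow the labeled set with unlabeled pages on which all classifiers agree.

    X_lab, X_unl: non-negative term-count matrices (one row per page).
    Unanimous agreement, `rounds`, and `per_round` are assumptions made
    for this sketch; the source abstract does not specify them.
    """
    X_lab, y_lab, X_unl = np.asarray(X_lab), np.asarray(y_lab), np.asarray(X_unl)
    for _ in range(rounds):
        if len(X_unl) == 0:
            break
        models = fit_all(X_lab, y_lab)
        preds = np.array([m.predict(X_unl) for m in models])
        # Indices of unlabeled pages where every classifier predicts the same label.
        agree = np.flatnonzero((preds == preds[0]).all(axis=0))[:per_round]
        if agree.size == 0:
            break
        # Move the agreed-upon pages, with their predicted labels, into the training set.
        X_lab = np.vstack([X_lab, X_unl[agree]])
        y_lab = np.concatenate([y_lab, preds[0, agree]])
        X_unl = np.delete(X_unl, agree, axis=0)
    # Refit on the final, enlarged training set.
    return fit_all(X_lab, y_lab), X_lab, y_lab


def predict_vote(models, X):
    """Majority vote across classifiers; ties go to the smallest label."""
    preds = np.array([m.predict(np.asarray(X)) for m in models])
    voted = []
    for column in preds.T:
        labels, counts = np.unique(column, return_counts=True)
        voted.append(labels[np.argmax(counts)])
    return np.array(voted)
```

A call might look like `models, X_all, y_all = ensemble_self_train(X_lab, y_lab, X_unl)` followed by `predict_vote(models, X_test)`, where the matrices hold term counts per page. Requiring unanimous agreement is a conservative choice that limits label noise entering the training set; a looser rule such as majority voting would label more pages per iteration at a higher risk of propagating errors.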

