3 articles found
A Semi-Structured Document Model for Text Mining (cited by: 5)
1
Authors: YANG Jianwu (杨建武), CHEN Xiaoou (陈晓鸥). Journal of Computer Science & Technology (SCIE, EI, CSCD), 2002, No. 5, pp. 603-610 (8 pages)
A semi-structured document carries more structural information than an ordinary document, and the relations among semi-structured documents can be fully utilized. To take advantage of the structure and link information in semi-structured documents for better mining, this paper presents a structured link vector model (SLVM), in which a vector represents a document and the vector's elements are determined by terms, document structure, and neighboring documents. For brevity and clarity, text mining based on SLVM is described through the two steps of the K-means procedure: calculating document similarity and calculating the cluster center. In the experiments, clustering based on SLVM performs significantly better than clustering based on a conventional vector space model, with the F value increasing from 0.65-0.73 to 0.82-0.86.
Keywords: HTML, XML, semi-structured document model, text mining, structural information
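The two K-means steps named in the abstract, calculating document similarity and calculating the cluster center, can be illustrated with a minimal sketch. It assumes a conventional vector space model with sparse term-weight vectors and cosine similarity; it is not the SLVM itself, whose structure and link weighting the abstract does not specify. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_center(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    n = len(vectors)
    return {t: w / n for t, w in total.items()}


def assign(docs, centers):
    """For each document, return the index of its most similar center."""
    return [
        max(range(len(centers)), key=lambda k: cosine_similarity(d, centers[k]))
        for d in docs
    ]


# Hypothetical toy data: plain term weights only (no structure or links).
docs = [
    {"xml": 2.0, "html": 1.0},
    {"xml": 1.0, "html": 1.0},
    {"svm": 3.0, "kernel": 1.0},
]
centers = [docs[0], docs[2]]
labels = assign(docs, centers)
new_centers = [
    cluster_center([d for d, lab in zip(docs, labels) if lab == k])
    for k in range(len(centers))
]
```

A full K-means run would repeat the assignment and center update until the labels stop changing. Under the SLVM described in the abstract, the vector elements would additionally depend on document structure and neighboring documents.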
A Chinese Web Page Clustering Algorithm Based on the Suffix Tree (cited by: 4)
2
Author: YANG Jian-wu. Wuhan University Journal of Natural Sciences (EI, CAS), 2004, No. 5, pp. 817-822 (6 pages)
This paper proposes an improved algorithm, named STC-I, for clustering Chinese Web pages. Based on the characteristics of the Chinese language, it adopts a new unit choice principle and a novel suffix tree construction policy. The experimental results show that the new algorithm keeps the advantages of STC and outperforms STC in precision and speed when clustering Chinese Web pages.
CLC number: TP 311
Foundation item: Supported by the National Information Industry Development Foundation of China
Biography: YANG Jian-wu (1973-), male, Ph.D.; research direction: information retrieval and text mining.
Keywords: clustering, suffix tree, Web mining
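As a rough illustration of the idea behind suffix tree clustering, the sketch below finds base clusters, that is, phrases shared by at least two documents. It is a simplification under stated assumptions: it indexes phrases in a dictionary rather than building an actual suffix tree, it follows the general STC idea rather than STC-I, and the unit choice (here, pre-segmented words) and the names `base_clusters`, `max_len`, and `min_docs` are hypothetical.

```python
from collections import defaultdict


def base_clusters(docs, max_len=3, min_docs=2):
    """Map each phrase (tuple of units) to the set of documents containing it.

    Only phrases of up to max_len units that occur in at least
    min_docs documents are kept.
    """
    phrase_docs = defaultdict(set)
    for doc_id, units in enumerate(docs):
        for start in range(len(units)):
            stop = min(start + max_len, len(units))
            for end in range(start + 1, stop + 1):
                phrase_docs[tuple(units[start:end])].add(doc_id)
    return {p: ids for p, ids in phrase_docs.items() if len(ids) >= min_docs}


# Hypothetical toy input: each document is already split into units.
docs = [
    ["web", "page", "clustering"],
    ["chinese", "web", "page"],
    ["suffix", "tree"],
]
clusters = base_clusters(docs)
```

In STC, base clusters with heavily overlapping document sets are then merged into final clusters; that step is omitted here.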
Incremental Training for SVM-Based Classification with Keyword Adjusting
3
Authors: SUN Jin-wen, YANG Jian-wu, LU Bin, XIAO Jian-guo. Wuhan University Journal of Natural Sciences (EI, CAS), 2004, No. 5, pp. 805-811 (7 pages)
This paper analyzes the theory of incremental learning for SVM (support vector machine) and points out a shortcoming of existing research on SVM incremental learning: it considers only support vector optimization. Given the significance of keywords in training, a new incremental training method with keyword adjusting is proposed, which eliminates the difference between incremental learning and batch learning through the keyword adjusting. The experimental results show that the improved method outperforms the method without keyword adjusting and achieves the same precision as the batch method.
CLC number: TP 18
Foundation item: Supported by the National Information Industry Development Foundation of China
Biography: SUN Jin-wen (1972-), male, post-doctoral researcher; research direction: artificial intelligence, data mining and system integration.
Keywords: SVM (support vector machine), incremental training, classification, keyword adjusting