
Web Text Clustering Based on Concept Lattice (cited by: 3)
Abstract: Most Web text clustering methods are based on the vector space model of text representation. This model ignores the semantic relations between feature terms, and the dimensionality of the term space is very high, which causes a loss of text semantics and an increase in time complexity. In this paper, each text is treated as an object and the feature terms in the text as its attributes, which yields a text-based formal context. Concepts extracted from this formal context are used to represent the texts and to measure the similarity between them. This reduces the dimensionality of the feature terms and lowers the computational complexity. The approach achieves good clustering results, and theoretical analysis shows that the clustering method is effective.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2008, No. 23, pp. 169-171, 186 (4 pages)
Funding: National Natural Science Foundation of China (No. 60575035, No. 60673060); Natural Science Foundation of Jiangsu Province (No. BK2004052)
Keywords: Web document; clustering; concept lattice; reduction
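
The abstract's idea of treating texts as objects and feature terms as attributes can be illustrated with a small sketch. The abstract names neither the concept-extraction algorithm nor the similarity measure, so everything below is an assumption made for illustration. The function names (`build_context`, `formal_concepts`, `concept_similarity`) are hypothetical, the concepts are found by brute-force closure over all subsets of documents (workable only for tiny inputs), and the similarity is a Jaccard overlap over shared concept intents, used here as a stand-in for the paper's own measure.

```python
from itertools import combinations


def build_context(docs):
    """Formal context: each document (object) maps to its set of terms (attributes)."""
    return {doc_id: frozenset(terms) for doc_id, terms in docs.items()}


def shared_terms(context, doc_ids):
    """Derivation operator: the terms common to every document in doc_ids."""
    result = set(frozenset().union(*context.values()))
    for doc_id in doc_ids:
        result &= context[doc_id]
    return frozenset(result)


def docs_with_terms(context, terms):
    """Derivation operator: the documents that contain every term in terms."""
    return frozenset(d for d, t in context.items() if terms <= t)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of documents.

    Brute force, exponential in the number of documents; illustration only.
    """
    doc_ids = sorted(context)
    concepts = set()
    for size in range(len(doc_ids) + 1):
        for subset in combinations(doc_ids, size):
            intent = shared_terms(context, subset)
            extent = docs_with_terms(context, intent)
            concepts.add((extent, intent))
    return concepts


def concept_similarity(concepts, doc_a, doc_b):
    """Assumed measure: Jaccard overlap of the non-empty intents covering each document."""
    def intents_of(doc_id):
        return {intent for extent, intent in concepts if doc_id in extent and intent}

    a, b = intents_of(doc_a), intents_of(doc_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy corpus (made-up terms, not data from the paper).
docs = {
    "d1": ["web", "cluster", "lattice"],
    "d2": ["web", "cluster"],
    "d3": ["lattice", "concept"],
}
context = build_context(docs)
concepts = formal_concepts(context)
```

In this toy corpus, `d2` and `d3` share no terms, so no concept with a non-empty intent covers both and their similarity under the assumed measure is zero. A real implementation would replace the brute-force enumeration with an incremental lattice construction algorithm and apply the reduction step mentioned in the keywords.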

