Clustering Web Images by Correlation Mining of Image-Text (cited by: 10)
Abstract: To cluster Web image retrieval results, this paper proposes a graph clustering method for Web images. It mines the text surrounding each image for two kinds of correlations: heterogeneous links between word nodes and image nodes, and homogeneous links between word nodes. Standard tf-idf cannot directly measure the semantic correlation between a word and an image, so the paper defines a word attribute called visibility and integrates it into the tf-idf model to weight the word-image links. The weights of the word-word links are computed with a topic relevance function defined on an LDA (latent Dirichlet allocation) model. Finally, complex graph clustering and bipartite spectral co-clustering are applied to verify that introducing these two kinds of correlations into the graph model improves Web image clustering performance. The paper reports encouraging experimental results.
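The word-image weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not give the visibility formula or say how it is combined with tf-idf, so everything specific here is an assumption made for illustration: the function names, the per-word visibility scores in [0, 1], the multiplicative combination, and the toy data. It is not the authors' method.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Plain tf-idf. docs is a list of token lists, one per image's surrounding text."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each word once per text
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        weights.append(
            {w: (c / total) * math.log(n / df[w]) for w, c in tf.items()}
        )
    return weights


def word_image_weights(docs, visibility, default=0.0):
    """Scale tf-idf by a per-word visibility score.

    ASSUMPTION: multiplying the two scores is a placeholder; the paper's actual
    combination is not stated in the abstract. Words without a score get `default`.
    """
    return [
        {w: score * visibility.get(w, default) for w, score in doc.items()}
        for doc in tf_idf(docs)
    ]


# Hypothetical toy data: three images, each with a short surrounding text.
docs = [
    ["red", "apple", "photo"],
    ["green", "apple", "photo"],
    ["red", "car", "photo"],
]
# Hypothetical visibility scores (how strongly a word denotes something visible).
visibility = {"red": 0.9, "green": 0.9, "apple": 0.8, "car": 0.8, "photo": 0.1}

weights = word_image_weights(docs, visibility)
for i, doc in enumerate(weights):
    print(i, sorted(doc.items(), key=lambda kv: -kv[1]))
```

Each resulting dictionary holds the edge weights from one image node to its word nodes, which is the bipartite structure a spectral co-clustering step would take as input. A word that occurs in every text, such as "photo" here, gets zero weight from the idf term regardless of its visibility.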
Source: Journal of Software (《软件学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2010, No. 7, pp. 1561-1575 (15 pages)
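Funding: National Natural Science Foundation of China (Nos. 60603096, 60533090); National High Technology Research and Development Program of China (863 Program, No. 2006AA010107); Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT, No. IRT0652)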
Keywords: graph clustering; complex graph; visibility; latent Dirichlet allocation (LDA); spectral clustering