
A DOM-Based Method for Topic Segmentation of Web Documents

TOPIC SEGMENTATION OF WEB DOCUMENT BASED ON DOM
Abstract: Topic segmentation is an important problem in the automatic summarization of multi-topic documents. This paper proposes a topic segmentation method for web pages that uses page structure as its guide. The method relies on the nodes of the page's DOM tree as natural dividing points, then compares the semantic similarity of adjacent boundary nodes to decide where one topic ends and the next begins. Experimental results show that the method achieves high segmentation accuracy. Summaries extracted from the resulting segments significantly increase the abstract's coverage of the original document and effectively solve the problem of unbalanced summary distribution in Web document summarization.
Source: Computer Applications and Software (计算机应用与软件), CSCD-indexed, 2009, No. 8, pp. 59-61 (3 pages)
Funding: Natural Science Foundation of Jiangsu Province (BK2005046)
Keywords: topic segmentation; Document Object Model; semantic similarity; automatic summarization
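The abstract names two ingredients. First, the nodes of the DOM tree act as natural split points. Second, the semantic similarity of adjacent boundary nodes decides where the topic changes. The Python sketch below illustrates that idea, but its assumptions are illustrative and do not come from the paper:

- The hypothetical `BLOCK_TAGS` set marks the candidate boundaries.
- Bag-of-words cosine similarity stands in for the paper's semantic similarity measure.
- A hypothetical `THRESHOLD` of 0.2 decides whether the next block continues the current topic.
- Tokenizing with `\w+` only suits space-delimited text, so Chinese pages would need a word segmenter first.

The names `BlockExtractor`, `term_vector`, `cosine`, and `segment_topics` are likewise hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical: tags treated as the DOM's natural segment boundaries.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "td", "table"}
# Hypothetical: minimum boundary similarity for two adjacent blocks to share a topic.
THRESHOLD = 0.2


class BlockExtractor(HTMLParser):
    """Collects the text of block-level DOM nodes in document order."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []

    def _flush(self):
        # Emit the text gathered since the last block boundary, if any.
        text = " ".join(self._buf)
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()  # opening a block node ends the previous block

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()  # closing a block node ends the current block

    def handle_data(self, data):
        if data.strip():
            self._buf.append(data.strip())

    def close(self):
        super().close()
        self._flush()  # emit trailing text not followed by a block tag


def term_vector(text):
    """Bag-of-words term counts (a stand-in for a semantic representation)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity of two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def segment_topics(html, threshold=THRESHOLD):
    """Split a page at block-level DOM nodes, then merge each block into the
    current segment when it is similar enough to the segment's last block
    (the adjacent boundary node). Returns a list of segments of block texts."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    segments = []
    for block in parser.blocks:
        if segments and cosine(term_vector(segments[-1][-1]), term_vector(block)) >= threshold:
            segments[-1].append(block)  # similar boundary: same topic continues
        else:
            segments.append([block])  # dissimilar boundary: a new topic starts
    return segments


if __name__ == "__main__":
    page = (
        "<html><body>"
        "<h2>Python release</h2>"
        "<p>Python 3.12 release adds faster Python startup.</p>"
        "<p>The Python release also improves error messages.</p>"
        "<h2>Football</h2>"
        "<p>The football team won the league final.</p>"
        "</body></html>"
    )
    for segment in segment_topics(page):
        print(segment)
```

On the sample page, the sketch yields two segments. The first holds the "Python release" heading with its two paragraphs, and the second holds the "Football" heading with its paragraph. Comparing only the last block of a segment with the next block mirrors the abstract's "adjacent boundary nodes" wording. A practical version would also need to skip `script`/`style` text and tune the threshold on labelled pages.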