Abstract
Topic segmentation is an important problem in the automatic summarization of multi-topic documents. This paper proposes a method for segmenting web pages into topics that is guided by page structure: it uses the natural partitioning provided by the nodes of the page's DOM tree and compares the semantic similarity of adjacent boundary nodes. Experimental results show that the method achieves high segmentation accuracy, and that summaries extracted on this basis significantly increase coverage of the original text and effectively address the unbalanced distribution of summary content for Web documents.
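The abstract does not give the algorithm in detail, so the following is only a minimal sketch of the general idea under stated assumptions. It splits a page into text blocks at block-level DOM elements, then merges a block into the preceding segment when the two blocks on either side of the boundary are similar enough. The tag set `BLOCK_TAGS`, the function names, the threshold value, and the use of bag-of-words cosine similarity in place of a true semantic similarity measure are all assumptions made for illustration, not the paper's method.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Assumed set of block-level tags treated as "natural" segment boundaries.
BLOCK_TAGS = {"div", "p", "td", "li", "h1", "h2", "h3", "section", "article"}


class BlockCollector(HTMLParser):
    """Collects page text as a flat list of blocks, split at block-level tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = []

    def _flush(self):
        # Join buffered text, normalize whitespace, and store non-empty blocks.
        text = " ".join(" ".join(self._buffer).split())
        if text:
            self.blocks.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()


def cosine_similarity(a, b):
    """Bag-of-words cosine similarity (a simple stand-in for a semantic measure)."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment_topics(html, threshold=0.2):
    """Group adjacent blocks into topic segments by comparing boundary blocks."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    topics = []
    for block in parser.blocks:
        # Compare the new block with the last block of the current segment.
        if topics and cosine_similarity(topics[-1][-1], block) >= threshold:
            topics[-1].append(block)
        else:
            topics.append([block])
    return topics
```

Calling `segment_topics(page_html)` returns a list of segments, each a list of block texts. The word-based tokenization here assumes whitespace-delimited text; Chinese pages would need a word segmenter, and a real system would likely replace the cosine measure with a lexical or embedding-based semantic similarity.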
Source
Computer Applications and Software (《计算机应用与软件》)
CSCD
2009, Issue 8, pp. 59-61 (3 pages)
Funding
Natural Science Foundation of Jiangsu Province (BK2005046)
Keywords
Topic segmentation
Document object model
Semantic similarity
Automatic summarization