Automatic Text Summarization Based on Web (基于Web的文摘技术研究)
Abstract: The rapid growth in the number of Web documents has made Web document summarization (WDS) an active research topic in the text summarization field. WDS differs from traditional automatic text summarization because it processes hyperlinked text. This paper analyzes the characteristics of Web documents, gives a definition of WDS, and presents a WDS algorithm based on sentence extraction. The algorithm decomposes the weight of each sentence into a Web feature-word weight and a Web sentence-structure weight, and uses a machine learning approach to determine the proportion that each contributes. The feature-word weight is adjusted according to a document classification tree graph, and the sentence-structure weight takes both layout formatting and hyperlink attributes into account. Experiments on 1,000 Web documents show that the proposed algorithm is feasible.
Source: Journal of Chinese Information Processing (《中文信息学报》; indexed in CSCD and the Peking University Core list), 2006, No. 6, pp. 54-60, 108 (8 pages)
Funding: Supported by a national ministry-level fund project (2003WL01)
Keywords: computer application; Chinese information processing; Web document summarization; automatic text summarization; preprocessing of Web documents; postprocessing of summaries
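
The abstract describes each sentence's weight as a weighted sum of a feature-word weight and a sentence-structure weight, followed by extraction of the best sentences. The record does not give the actual formulas, so the following is only a minimal sketch under stated assumptions: both component scores are taken as already computed and normalised to [0, 1], the mixing proportion `alpha` is a fixed hypothetical parameter (the paper learns the proportion with a machine learning method not specified here), the two proportions are assumed to sum to 1, and the names `Sentence`, `sentence_weight`, and `extract_summary` are illustrative rather than the authors'.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sentence:
    text: str
    word_weight: float       # assumed: feature-word score, pre-normalised to [0, 1]
    structure_weight: float  # assumed: layout/hyperlink score, pre-normalised to [0, 1]


def sentence_weight(s: Sentence, alpha: float) -> float:
    """Weighted sum of the two components; alpha is the share given to words.

    Assumption: the proportions sum to 1. In the paper the proportion is
    learned; here it is simply passed in.
    """
    return alpha * s.word_weight + (1.0 - alpha) * s.structure_weight


def extract_summary(sentences: List[Sentence], alpha: float = 0.5, k: int = 2) -> List[str]:
    """Pick the k highest-weighted sentences, then restore document order."""
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sentence_weight(sentences[i], alpha),
        reverse=True,
    )
    chosen = sorted(ranked[:k])  # indices back in original order
    return [sentences[i].text for i in chosen]


# Toy input with made-up scores, for illustration only.
doc = [
    Sentence("A", word_weight=0.9, structure_weight=0.2),
    Sentence("B", word_weight=0.1, structure_weight=0.3),
    Sentence("C", word_weight=0.4, structure_weight=0.8),
]
summary = extract_summary(doc, alpha=0.5, k=2)
```

Restoring document order after ranking is a design choice made here so that an extract reads in the original sequence; the record does not state how the paper orders its extracted sentences.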