Abstract
To address shortcomings of current automatic abstracting methods, an approach based on text clustering and natural language understanding is proposed. Text clustering is introduced into automatic abstracting so that abstracts can be generated for multiple documents. A two-pass word segmentation algorithm based on titles and the first sentences of paragraphs is also proposed; experiments show that its segmentation precision and recall both exceed 95%. An automatic abstracting system for the plastics domain was implemented with this approach, and the precision and recall of its multi-document abstracts both exceed 75%. The experiments indicate that the method is feasible, offers a useful reference for the design of automatic abstracting systems, and merits further study.
Source
《北京理工大学学报》
EI
CAS
CSCD
PKU Core Journal (北大核心)
2005, No. 8, pp. 705-709 (5 pages)
Transactions of Beijing Institute of Technology
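Funding
Supported by the National Natural Science Foundation of China (60305009)
Supported by the Research Fund for Teachers with Doctoral Degrees, North China Electric Power University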
Keywords
automatic abstracting
text clustering
natural language understanding