期刊文献+
共找到127篇文章
< 1 2 7 >
每页显示 20 50 100
基于PageRank的中文多文档文本情感摘要 被引量:19
1
作者 林莉媛 王中卿 +1 位作者 李寿山 周国栋 《中文信息学报》 CSCD 北大核心 2014年第2期85-90,共6页
文本情感摘要任务旨在对带有情感的文本数据进行浓缩、提炼进而产生文本所表达的关于情感意见的摘要。该文主要研究基于多文档的文本情感摘要问题,重点针对网络上存在同一个产品的多个评论产生相应的摘要。首先,为了进行关于文本情感摘... 文本情感摘要任务旨在对带有情感的文本数据进行浓缩、提炼进而产生文本所表达的关于情感意见的摘要。该文主要研究基于多文档的文本情感摘要问题,重点针对网络上存在同一个产品的多个评论产生相应的摘要。首先,为了进行关于文本情感摘要的研究,该文收集并标注了一个基于产品评论的中文多文档文本情感摘要语料库。其次,该文提出了一种基于情感信息的PageRank算法框架用于实现多文档文本情感摘要,该算法同时考虑了情感和主题相关两方面的信息。实验结果表明,该文采用的方法和已有的方法相比在ROUGE值上有显著提高。 展开更多
关键词 摘要 情感 多文档
下载PDF
基于联合权重的多文档关键词抽取技术 被引量:15
2
作者 杨洁 季铎 +2 位作者 蔡东风 林晓庆 白宇 《中文信息学报》 CSCD 北大核心 2008年第6期75-79,共5页
该文提出一种多文档关键词抽取方法,该方法提出ATF×PDF(Average Term Frequency×ProportionalDocument Frequency)来计算词语权重,并根据候选关键词之间的语义相似度,采用联合权重方法重新计算候选关键词的权重来抽取关键词... 该文提出一种多文档关键词抽取方法,该方法提出ATF×PDF(Average Term Frequency×ProportionalDocument Frequency)来计算词语权重,并根据候选关键词之间的语义相似度,采用联合权重方法重新计算候选关键词的权重来抽取关键词。该方法综合考虑了词语的频率,词性以及词语之间的语义相似性等信息,实验表明,该方法能有效抽取多个文档的关键词,同基于关键词的聚类标记方法相比,其准确率提高3%,召回率提高7%,F-measure提高4.4%。 展开更多
关键词 计算机应用 中文信息处理 ATF×PDF 联合权重 多文档 语义相似度
下载PDF
Chinese multi-document personal name disambiguation 被引量:8
3
作者 Wang Houfeng(王厚峰) Mei Zheng 《High Technology Letters》 EI CAS 2005年第3期280-283,共4页
This paper presents a new approach to determining whether an interested personal name across doeuments refers to the same entity. Firstly,three vectors for each text are formed: the personal name Boolean vectors deno... This paper presents a new approach to determining whether an interested personal name across doeuments refers to the same entity. Firstly,three vectors for each text are formed: the personal name Boolean vectors denoting whether a personal name occurs the text the biographical word Boolean vector representing title, occupation and so forth, and the feature vector with real values. Then, by combining a heuristic strategy based on Boolean vectors with an agglomeratie clustering algorithm based on feature vectors, it seeks to resolve multi-document personal name coreference. Experimental results show that this approach achieves a good performance by testing on "Wang Gang" corpus. 展开更多
关键词 personal name disambiguation Chinese multi-document heuristic strategy. agglomerative clustering
下载PDF
文本聚类在自动文摘中的应用研究 被引量:4
4
作者 郭庆琳 樊孝忠 柳长安 《计算机应用》 CSCD 北大核心 2005年第5期1036-1038,共3页
针对当前自动文摘方法的不足,提出了基于文本聚类的自动文摘实现方法。将文本聚类引入自动文摘中,能实现多文档的自动文摘。实现了面向“塑料”行业的基于文本聚类的自动文摘系统TCAAS,其单文档自动文摘的正确率和召回率在80%以上,多文... 针对当前自动文摘方法的不足,提出了基于文本聚类的自动文摘实现方法。将文本聚类引入自动文摘中,能实现多文档的自动文摘。实现了面向“塑料”行业的基于文本聚类的自动文摘系统TCAAS,其单文档自动文摘的正确率和召回率在80%以上,多文档自动文摘的正确率和召回率在75%以上。实验表明该方法可行,对自动文摘系统的设计具有借鉴意义和深入研究的价值。 展开更多
关键词 自动文摘 文本聚类 多文档
下载PDF
结合预训练的多文档摘要:研究
5
作者 丁一 王中卿 《计算机科学》 CSCD 北大核心 2024年第S01期174-181,共8页
新闻文本摘要任务旨在从庞大复杂的新闻文本中快速准确地提炼出简明扼要的摘要。基于预训练语言模型对多文档摘要进行研究,重点研究结合预训练任务的具体模型训练方式对模型效果提升的作用,强化多文档之间的信息交流,以生成更全面、更... 新闻文本摘要任务旨在从庞大复杂的新闻文本中快速准确地提炼出简明扼要的摘要。基于预训练语言模型对多文档摘要进行研究,重点研究结合预训练任务的具体模型训练方式对模型效果提升的作用,强化多文档之间的信息交流,以生成更全面、更简练的摘要。对于结合预训练任务,提出对基线模型、预训练任务内容、预训练任务数量、预训练任务顺序的对比实验,探索标记了行之有效的预训练任务,总结归纳了强化多文档之间的信息交流的具体方法,精炼提出了简明高效的预训练流程。在公开新闻多文档数据集上进行训练和测试,实验结果表明预训练任务的内容、数量、顺序对ROUGE值都有一定提升,并且整合三者结论提出的特定预训练组合对ROUGE值有明显提升。 展开更多
关键词 新闻 摘要: 预训练 多文档 信息交流
下载PDF
主题信息的中文多文档自动文摘系统 被引量:5
6
作者 王红玲 张明慧 周国栋 《计算机工程与应用》 CSCD 2012年第25期132-136,共5页
多文档自动文摘能够帮助人们自动、快速地获取信息,使用主题模型构建多文档自动文摘系统是一种新的尝试,其中主题模型采用浅层狄利赫雷分配(LDA)。该模型是一个多层的产生式概率模型,能够检测文档中的主题分布。使用LDA为多文档集合建模... 多文档自动文摘能够帮助人们自动、快速地获取信息,使用主题模型构建多文档自动文摘系统是一种新的尝试,其中主题模型采用浅层狄利赫雷分配(LDA)。该模型是一个多层的产生式概率模型,能够检测文档中的主题分布。使用LDA为多文档集合建模,通过计算句子在不同主题上的概率分布之间的相似度作为句子的重要度,并根据句子重要度进行文摘句的抽取。实验结果表明,该方法所得到的文摘性能优于传统的文摘方法。 展开更多
关键词 中文自动文摘 浅层狄利赫雷分配(LDA) 主题模型 多文档
下载PDF
基于双向注意力机制的多文档神经阅读理解 被引量:5
7
作者 唐竑轩 武恺莉 +1 位作者 朱朦朦 洪宇 《计算机工程》 CAS CSCD 北大核心 2020年第12期43-51,共9页
机器阅读理解是一项针对给定文本和特定问题自动生成或抽取相应答案的问答任务,该任务是评估计算机系统对自然语言理解程度的重要任务之一。相比于传统的阅读理解任务,多文档阅读理解需要计算模型具备更高的推理和理解能力。为此,提出... 机器阅读理解是一项针对给定文本和特定问题自动生成或抽取相应答案的问答任务,该任务是评估计算机系统对自然语言理解程度的重要任务之一。相比于传统的阅读理解任务,多文档阅读理解需要计算模型具备更高的推理和理解能力。为此,提出一种基于多任务联合训练的阅读理解模型,该模型是由一组功能各异的神经网络构成的联合学习模型,其仿效人们推理和回答问题的基本方式分别执行文档选择和答案抽取两个关键步骤。文档选择过程融入了基于注意力矩阵的关联性判别机制,旨在建立各文档间的联系,而答案抽取过程则使用了语篇级的双向注意力机制,来找寻与答案相关的文字线索,将两者附着于一套神经阅读理解模型上,可形成一种基于联合学习的多文档阅读理解方法。在HotpotQA数据集上的实验结果表明,与基线模型相比,该模型的EM值和F1值分别提升了2.1%和1.7%。 展开更多
关键词 机器阅读理解 多文档 推理 联合训练 注意力机制
下载PDF
Using AdaBoost Meta-Learning Algorithm for Medical News Multi-Document Summarization 被引量:1
8
作者 Mahdi Gholami Mehr 《Intelligent Information Management》 2013年第6期182-190,共9页
Automatic text summarization involves reducing a text document or a larger corpus of multiple documents to a short set of sentences or paragraphs that convey the main meaning of the text. In this paper, we discuss abo... Automatic text summarization involves reducing a text document or a larger corpus of multiple documents to a short set of sentences or paragraphs that convey the main meaning of the text. In this paper, we discuss about multi-document summarization that differs from the single one in which the issues of compression, speed, redundancy and passage selection are critical in the formation of useful summaries. Since the number and variety of online medical news make them difficult for experts in the medical field to read all of the medical news, an automatic multi-document summarization can be useful for easy study of information on the web. Hence we propose a new approach based on machine learning meta-learner algorithm called AdaBoost that is used for summarization. We treat a document as a set of sentences, and the learning algorithm must learn to classify as positive or negative examples of sentences based on the score of the sentences. For this learning task, we apply AdaBoost meta-learning algorithm where a C4.5 decision tree has been chosen as the base learner. In our experiment, we use 450 pieces of news that are downloaded from different medical websites. Then we compare our results with some existing approaches. 展开更多
关键词 multi-document SUMMARIZATION Machine Learning Decision Trees ADABOOST C4.5 MEDICAL document SUMMARIZATION
下载PDF
MFC文档/视图研究
9
作者 郭清菊 《软件导刊》 2007年第3期17-18,共2页
以Visual Studio 6.0为例,研究了MFC下文档视图结构关系、单文档应用程序和多文档应用程序的应用。
关键词 MFC 文档 视图 一档多视
下载PDF
基于评论质量的多文档文本情感摘要 被引量:2
10
作者 林莉媛 王中卿 +1 位作者 李寿山 周国栋 《中文信息学报》 CSCD 北大核心 2015年第4期33-39,共7页
任务旨在对带有情感的文本数据进行浓缩、提炼进而产生文本所表达的关于情感意见的摘要,用以帮助用户更好地阅读、理解情感文本的内容。该文主要研究多文档的文本情感摘要问题,重点针对网络上存在的同一个产品的多个评论进行摘要抽取。... 任务旨在对带有情感的文本数据进行浓缩、提炼进而产生文本所表达的关于情感意见的摘要,用以帮助用户更好地阅读、理解情感文本的内容。该文主要研究多文档的文本情感摘要问题,重点针对网络上存在的同一个产品的多个评论进行摘要抽取。在情感文本中,情感相关性是一个重要的特点,该文将充分考虑情感信息对文本情感摘要的重要影响。同时,对于评论语料,质量高的评论或者说可信度高的评论可以帮助用户更好的了解评论中所评价的对象。因此,该文将充分考虑评论质量对文本情感摘要的影响。并且为了进行关于文本情感摘要的研究,该文收集并标注了一个基于产品评论的英文多文档文本情感摘要语料库。实验证明,情感信息和评论质量能够帮助多文档文本情感摘要,提高摘要效果。 展开更多
关键词 情感摘要 多文档 评论质量
下载PDF
Density peaks clustering based integrate framework for multi-document summarization 被引量:2
11
作者 BaoyanWang Jian Zhang +1 位作者 Yi Liu Yuexian Zou 《CAAI Transactions on Intelligence Technology》 2017年第1期26-30,共5页
We present a novel unsupervised integrated score framework to generate generic extractive multi- document summaries by ranking sentences based on dynamic programming (DP) strategy. Considering that cluster-based met... We present a novel unsupervised integrated score framework to generate generic extractive multi- document summaries by ranking sentences based on dynamic programming (DP) strategy. Considering that cluster-based methods proposed by other researchers tend to ignore informativeness of words when they generate summaries, our proposed framework takes relevance, diversity, informativeness and length constraint of sentences into consideration comprehensively. We apply Density Peaks Clustering (DPC) to get relevance scores and diversity scores of sentences simultaneously. Our framework produces the best performance on DUC2004, 0.396 of ROUGE-1 score, 0.094 of ROUGE-2 score and 0.143 of ROUGE-SU4 which outperforms a series of popular baselines, such as DUC Best, FGB [7], and BSTM [10]. 展开更多
关键词 multi-document summarization Integrated score framework Density peaks clustering Sentences rank
下载PDF
多文档短摘要生成技术研究 被引量:2
12
作者 张随远 薛源海 +2 位作者 俞晓明 刘悦 程学旗 《广西师范大学学报(自然科学版)》 CAS 北大核心 2019年第2期60-74,共15页
自动摘要技术用于将较长篇幅的文章压缩为一段较短的能概括原文中心内容的文本。多文档冗余度高,电子设备所展示的空间有限,成为摘要发展面临的挑战。本文提出融合图卷积特征的句子粗粒度排序方法。首先将句子之间的相似度矩阵视为拓扑... 自动摘要技术用于将较长篇幅的文章压缩为一段较短的能概括原文中心内容的文本。多文档冗余度高,电子设备所展示的空间有限,成为摘要发展面临的挑战。本文提出融合图卷积特征的句子粗粒度排序方法。首先将句子之间的相似度矩阵视为拓扑关系图,对其进行图卷积计算得到图卷积特征。然后通过排序模型融合图卷积特征以及主流的抽取式多文档摘要技术对句子进行重要度排序,选取排名前四的句子作为摘要。最后提出基于Seq2seq框架的短摘要生成模型:①在Encoder部分采用基于卷积神经网络(CNN)的方法;②引入基于注意力的指针机制,并将主题向量融入其中。实验结果表明,在本文场景下,相较于循环神经网络(RNN),在Encoder部分基于CNN能够更好地进行并行化,在效果基本一致的前提下,显著提升效率。此外,相较于传统的基于抽取和压缩的模型,本文提出的模型在ROUGE指标以及可读性(信息度和流利度)方面均取得了显著的效果提升。 展开更多
关键词 多文档 短摘要生成 Seq2seq
下载PDF
Multi-Document Summarization Model Based on Integer Linear Programming
13
作者 Rasim Alguliev Ramiz Aliguliyev Makrufa Hajirahimova 《Intelligent Control and Automation》 2010年第2期105-111,共7页
This paper proposes an extractive generic text summarization model that generates summaries by selecting sentences according to their scores. Sentence scores are calculated using their extensive coverage of the main c... This paper proposes an extractive generic text summarization model that generates summaries by selecting sentences according to their scores. Sentence scores are calculated using their extensive coverage of the main content of the text, and summaries are created by extracting the highest scored sentences from the original document. The model formalized as a multiobjective integer programming problem. An advantage of this model is that it can cover the main content of source (s) and provide less redundancy in the generated sum- maries. To extract sentences which form a summary with an extensive coverage of the main content of the text and less redundancy, have been used the similarity of sentences to the original document and the similarity between sentences. Performance evaluation is conducted by comparing summarization outputs with manual summaries of DUC2004 dataset. Experiments showed that the proposed approach outperforms the related methods. 展开更多
关键词 multi-document SUMMARIZATION Content COVERAGE LESS REDUNDANCY INTEGER Linear Programming
下载PDF
Unsupervised Graph-Based Tibetan Multi-Document Summarization
14
作者 Xiaodong Yan Yiqin Wang +3 位作者 Wei Song Xiaobing Zhao A.Run Yang Yanxing 《Computers, Materials & Continua》 SCIE EI 2022年第10期1769-1781,共13页
Text summarization creates subset that represents the most important or relevant information in the original content,which effectively reduce information redundancy.Recently neural network method has achieved good res... Text summarization creates subset that represents the most important or relevant information in the original content,which effectively reduce information redundancy.Recently neural network method has achieved good results in the task of text summarization both in Chinese and English,but the research of text summarization in low-resource languages is still in the exploratory stage,especially in Tibetan.What’s more,there is no large-scale annotated corpus for text summarization.The lack of dataset severely limits the development of low-resource text summarization.In this case,unsupervised learning approaches are more appealing in low-resource languages as they do not require labeled data.In this paper,we propose an unsupervised graph-based Tibetan multi-document summarization method,which divides a large number of Tibetan news documents into topics and extracts the summarization of each topic.Summarization obtained by using traditional graph-based methods have high redundancy and the division of documents topics are not detailed enough.In terms of topic division,we adopt two level clustering methods converting original document into document-level and sentence-level graph,next we take both linguistic and deep representation into account and integrate external corpus into graph to obtain the sentence semantic clustering.Improve the shortcomings of the traditional K-Means clustering method and perform more detailed clustering of documents.Then model sentence clusters into graphs,finally remeasure sentence nodes based on the topic semantic information and the impact of topic features on sentences,higher topic relevance summary is extracted.In order to promote the development of Tibetan text summarization,and to meet the needs of relevant researchers for high-quality Tibetan text summarization datasets,this paper manually constructs a Tibetan summarization dataset and carries out relevant experiments.The experiment results show that our method can effectively improve the quality of summarization and our method is com 展开更多
关键词 multi-document summarization text clustering topic feature fusion graphic model
下载PDF
基于综合权重的多文档关键词抽取算法 被引量:1
15
作者 胡志敏 《计算机与数字工程》 2010年第6期45-48,共4页
多文档关键词抽取是进行在多篇文献中找出最能反映整体主题的关键词。对几种关键词抽取算法进行了介绍,分析了各自的优缺点,在TF/PDF算法的基础上,采用文献内和文献间综合权重的方法。
关键词 抽取算法ITF/PDF.将ITF/PDF的抽取结果与TF/PDF抽取结果进行了比较 实验结果表明 ITF/PDF能够更准确的在多文档中抽取合适的关键词.关键词多文档 关键词 抽取 综合权重
下载PDF
Research on multi-document summarization based on latent semantic indexing
16
作者 秦兵 刘挺 +1 位作者 张宇 李生 《Journal of Harbin Institute of Technology(New Series)》 EI CAS 2005年第1期91-94,共4页
A multi-document summarization method based on Latent Semantic Indexing (LSI) is proposed. The method combines several reports on the same issue into a matrix of terms and sentences, and uses a Singular Value Decompos... A multi-document summarization method based on Latent Semantic Indexing (LSI) is proposed. The method combines several reports on the same issue into a matrix of terms and sentences, and uses a Singular Value Decomposition (SVD) to reduce the dimension of the matrix and extract features, and then the sentence similarity is computed. The sentences are clustered according to similarity of sentences. The centroid sentences are selected from each class. Finally, the selected sentences are ordered to generate the summarization. The evaluation and results are presented, which prove that the proposed methods are efficient. 展开更多
关键词 multi-document summarization LSI (latent semantic indexing) CLUSTERING
下载PDF
Constructing a taxonomy to support multi-document summarization of dissertation abstracts
17
作者 KHOO Christopher S.G. GOH Dion H. 《Journal of Zhejiang University-Science A(Applied Physics & Engineering)》 SCIE EI CAS CSCD 2005年第11期1258-1267,共10页
This paper reports part of a study to develop a method for automatic multi-document summarization. The current focus is on dissertation abstracts in the field of sociology. The summarization method uses macro-level an... This paper reports part of a study to develop a method for automatic multi-document summarization. The current focus is on dissertation abstracts in the field of sociology. The summarization method uses macro-level and micro-level discourse structure to identify important information that can be extracted from dissertation abstracts, and then uses a variable-based framework to integrate and organize extracted information across dissertation abstracts. This framework focuses more on research concepts and their research relationships found in sociology dissertation abstracts and has a hierarchical structure. A taxonomy is constructed to support the summarization process in two ways: (1) helping to identify important concepts and relations expressed in the text, and (2) providing a structure for linking similar concepts in different abstracts. This paper describes the variable-based framework and the summarization process, and then reports the construction of the taxonomy for supporting the summarization process. An example is provided to show how to use the constructed taxonomy to identify important concepts and integrate the concepts extracted from different abstracts. 展开更多
关键词 Text summarization Automatic multi-document summarization Variable-based framework Digital library
下载PDF
基于VC++.NET的多种类型文件管理
18
作者 陈海宇 《郑州轻工业学院学报(自然科学版)》 CAS 2009年第6期18-22,共5页
提出了用VC++.NET实现多种类型文件管理的方法:通过在AppWizard生成的MDI应用程序框架中修改多文档应用程序的实例化函数,实现文本文件与位图图像2种不同类型文件的同时管理.实例表明,应用该方法不但可以实现同种类型文件的管理,还可有... 提出了用VC++.NET实现多种类型文件管理的方法:通过在AppWizard生成的MDI应用程序框架中修改多文档应用程序的实例化函数,实现文本文件与位图图像2种不同类型文件的同时管理.实例表明,应用该方法不但可以实现同种类型文件的管理,还可有效解决多种类型文件管理接口的问题,为应用程序对多种类型文件的进一步编辑处理奠定了基础,具有较强的实用性. 展开更多
关键词 VC++.NET 多文档 文件类型
下载PDF
BHLM:Bayesian theory-based hybrid learning model for multi-document summarization
19
作者 S.Suneetha A.Venugopal Reddy 《International Journal of Modeling, Simulation, and Scientific Computing》 EI 2018年第2期229-250,共22页
In order to understand and organize the document in an efficient way,the multidocument summarization becomes the prominent technique in the Internet world.As the information available is in a large amount,it is necess... In order to understand and organize the document in an efficient way,the multidocument summarization becomes the prominent technique in the Internet world.As the information available is in a large amount,it is necessary to summarize the document for obtaining the condensed information.To perform the multi-document summarization,a new Bayesian theory-based Hybrid Learning Model(BHLM)is proposed in this paper.Initially,the input documents are preprocessed,where the stop words are removed from the document.Then,the feature of the sentence is extracted to determine the sentence score for summarizing the document.The extracted feature is then fed into the hybrid learning model for learning.Subsequently,learning feature,training error and correlation coefficient are integrated with the Bayesian model to develop BHLM.Also,the proposed method is used to assign the class label assisted by the mean,variance and probability measures.Finally,based on the class label,the sentences are sorted out to generate the final summary of the multi-document.The experimental results are validated in MATLAB,and the performance is analyzed using the metrics,precision,recall,F-measure and rouge-1.The proposed model attains 99.6%precision and 75%rouge-1 measure,which shows that the model can provide the final summary efficiently. 展开更多
关键词 multi-document text feature sentence score hybrid learning model Bayesian theory
原文传递
Automatic Multi-Document Summarization Based on Keyword Density and Sentence-Word Graphs
20
作者 YE Feiyue XU Xinchen 《Journal of Shanghai Jiaotong university(Science)》 EI 2018年第4期584-592,共9页
As a fundamental and effective tool for document understanding and organization, multi-document summarization enables better information services by creating concise and informative reports for large collections of do... As a fundamental and effective tool for document understanding and organization, multi-document summarization enables better information services by creating concise and informative reports for large collections of documents. In this paper, we propose a sentence-word two layer graph algorithm combining with keyword density to generate the multi-document summarization, known as Graph & Keywordp. The traditional graph methods of multi-document summarization only consider the influence of sentence and word in all documents rather than individual documents. Therefore, we construct multiple word graph and extract right keywords in each document to modify the sentence graph and to improve the significance and richness of the summary. Meanwhile, because of the differences in the words importance in documents, we propose to use keyword density for the summaries to provide rich content while using a small number of words. The experiment results show that the Graph & Keywordp method outperforms the state of the art systems when tested on the Duc2004 data set. Key words: multi-document, graph algorithm, keyword density, Graph & Keywordp, Due2004 展开更多
关键词 multi-document graph algorithm keyword density Graph & Keywordρ Duc2004
原文传递
上一页 1 2 7 下一页 到第
使用帮助 返回顶部