Abstract: Social media’s explosive growth has resulted in a massive influx of electronic documents influencing various facets of daily life. However, the enormous and complex nature of this content makes extracting valuable insights challenging. Long document summarization emerges as a pivotal technique in this context, serving to distill extensive texts into concise and comprehensible summaries. This paper presents a novel three-stage pipeline for effective long document summarization. The proposed approach combines unsupervised and supervised learning techniques, efficiently handling large document sets while requiring minimal computational resources. Our methodology introduces a unique process for forming semantic chunks through spectral dynamic segmentation, effectively reducing redundancy and repetitiveness in the summarization process. Contrary to previous methods, our approach aligns each semantic chunk with the entire summary paragraph, allowing the abstractive summarization model to process documents without truncation and enabling it to deduce missing information from other chunks. To enhance summary generation, we utilize a rewrite model based on Bidirectional and Auto-Regressive Transformers (BART), rearranging and reformulating summary constructs to improve their fluidity and coherence. Empirical studies conducted on the long documents from the Webis-TLDR-17 dataset demonstrate that our approach significantly enhances the efficiency of abstractive summarization transformers. The contributions of this paper thus offer significant advancements in the field of long document summarization, providing a novel and effective methodology for summarizing extensive texts in the context of social media.
Funding: Supported by the National Natural Science Foundation of China (62276058, 61902057, 41774063), the Fundamental Research Funds for the Central Universities (N2217003), and the Joint Fund of Science & Technology Department of Liaoning Province and State Key Laboratory of Robotics, China (2020-KF-12-11).
Abstract: A large variety of complaint reports reflect subjective information expressed by citizens. A key challenge of text summarization for complaint reports is to ensure the factual consistency of the generated summary. Therefore, in this paper, a simple and weakly supervised framework considering factual consistency is proposed to generate a summary of city-based complaint reports without pre-labeled sentences or words. Furthermore, it considers the importance of entities in complaint reports to ensure the factual consistency of the summary. Experimental results on the customer review datasets (Yelp and Amazon) and a complaint report dataset (complaint reports of Shenyang in China) show that the proposed framework outperforms state-of-the-art approaches in ROUGE scores and human evaluation. This demonstrates the effectiveness of our approach in handling complaint reports.
Funding: Supported by the National Social Science Foundation of China (2017CG29), the Science and Technology Research Project of Chongqing Municipal Education Commission (2019CJ50), and the Natural Science Foundation of Chongqing (2017CC29).
Abstract: Existing abstractive text summarisation models only consider the word sequence correlations between the source document and the reference summary, and the summaries they generate lack coverage of the subject of the source document because of the models' narrow perspective. To address these shortcomings, a multi-domain attention pointer (MDA-Pointer) abstractive summarisation model is proposed in this work. First, the model uses bidirectional long short-term memory to encode the word sequence and the sentence sequence of the source document separately, obtaining semantic representations at the word and sentence levels. Furthermore, a multi-domain attention mechanism between the semantic representations and the summary word is established, and the proposed model can generate summary words under this attention mechanism based on the words and sentences. Then, the words are extracted from the vocabulary or the original word sequences through the pointer network to form the summary, and a coverage mechanism is introduced at both the word and sentence levels to reduce the redundancy of the summary content. Finally, experimental validation is conducted on the CNN/Daily Mail dataset. The ROUGE scores of the model both without and with the coverage mechanism are improved, and the results verify the validity of the model proposed in this paper.
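The copy-or-generate step and the coverage idea mentioned in this abstract can be sketched as follows. This is a minimal illustration of the widely used pointer-generator formulation, not the MDA-Pointer equations, which the abstract does not give. The function names, the generation probability `p_gen`, and the toy numbers are all assumptions made for the example.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix generating from the vocabulary with copying from the source.

    p_gen is the (assumed) probability of generating rather than copying.
    """
    # Scale the vocabulary distribution by the generation probability.
    mixed = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Add copy probability mass for every source position, weighted by attention.
    for token, weight in zip(source_tokens, attention):
        mixed[token] = mixed.get(token, 0.0) + (1.0 - p_gen) * weight
    return mixed


def coverage_penalty(attention, coverage):
    """Penalise attending again to positions that earlier steps already covered.

    coverage is the running sum of attention from previous decoding steps.
    """
    return sum(min(a, c) for a, c in zip(attention, coverage))


# Toy example: "pointer" is not in the vocabulary but can still be copied.
source_tokens = ["the", "pointer", "network", "copies", "rare", "words"]
vocab_dist = {"the": 0.5, "network": 0.3, "copies": 0.2}
attention = [0.1, 0.6, 0.1, 0.1, 0.05, 0.05]
dist = final_distribution(0.7, vocab_dist, attention, source_tokens)
```

In this sketch an out-of-vocabulary source word receives probability only through the copy term, which is why a pointer mechanism helps with rare words.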
Abstract: With the continuous growth of online news articles, there arises the necessity for an efficient abstractive summarization technique to address the problem of information overload. Abstractive summarization is highly complex and requires deeper understanding and proper reasoning to come up with its own summary outline. The abstractive summarization task is framed as seq2seq modeling. Existing seq2seq methods perform better on short sequences; however, for long sequences, the performance degrades due to high computation. Hence, a two-phase self-normalized deep neural document summarization model, consisting of improvised extractive cosine normalization and seq2seq abstractive phases, is proposed in this paper. The novelty is to parallelize the sequence computation training by incorporating a feed-forward, self-normalized neural network in the extractive phase using Intra Cosine Attention Similarity (Ext-ICAS) with sentence dependency position. Also, it does not require any explicit normalization technique. Our proposed abstractive Bidirectional Long Short-Term Memory (Bi-LSTM) encoder sequence model performs better than the Bidirectional Gated Recurrent Unit (Bi-GRU) encoder, with minimum training loss and fast convergence. The proposed model was evaluated on the Cable News Network (CNN)/Daily Mail dataset, and an average ROUGE score of 0.435 was achieved; computational training in the extractive phase was also reduced by 59% with an average number of similarity computations.
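As a rough illustration of similarity-based sentence scoring in an extractive phase, the sketch below ranks sentences by their average cosine similarity to the other sentences in the document. It is not the Ext-ICAS method, whose details the abstract does not give. The bag-of-words vectors and the names `cosine` and `rank_sentences` are assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def rank_sentences(sentences):
    """Return sentence indices ordered from most to least central."""
    vectors = [Counter(s.lower().split()) for s in sentences]
    scores = []
    for i, vec in enumerate(vectors):
        # Average similarity of sentence i to every other sentence.
        others = [cosine(vec, w) for j, w in enumerate(vectors) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)


order = rank_sentences(
    ["cats chase mice", "cats chase birds", "stocks fell sharply"]
)
```

This naive version compares every pair of sentences, so the number of similarity computations grows quadratically with document length. That cost is the kind of overhead an extractive phase has reason to reduce.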
Funding: Funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant Number 102.05-2020.26.
Abstract: Text summarization aims to generate a concise version of the original text. The longer the summary is, the more detail it retains from the original text, and the appropriate length depends on the intended use. Therefore, generating summaries with desired lengths is a vital task for putting the research into practice. To solve this problem, in this paper, we propose a new method to integrate the desired length of the summarized text into the encoder-decoder model for abstractive text summarization. This length parameter is integrated into the encoding phase at each self-attention step and into the decoding process by preserving the remaining length for calculating head attention in the generation process and using it as length embeddings added to the word embeddings. We conducted experiments for the proposed model on two data sets, Cable News Network (CNN) Daily and NEWSROOM, with different desired output lengths. The obtained results show the proposed model’s effectiveness compared with related studies.
Funding: This work was supported by the National Natural Science Foundation of China under Grant Nos. 62172149, 61632009, 62172159, and 62172372, the Natural Science Foundation of Hunan Province of China under Grant No. 2021JJ30137, and the Open Project of ZHEJIANG LAB under Grant No. 2019KE0AB02.
Abstract: Text summarization is an important task in natural language processing and has been applied in many applications. Recently, abstractive summarization has attracted much attention. However, traditional evaluation metrics, which consider little semantic information, are unsuitable for evaluating the quality of deep learning based abstractive summarization models, since these models may generate new words that do not exist in the original text. Moreover, the out-of-vocabulary (OOV) problem, which affects the evaluation results, has not been well solved yet. To address these issues, we propose a novel model called ENMS to enhance existing N-gram based evaluation metrics with semantics. To be specific, we present two types of methods, N-gram based Semantic Matching (NSM for short) and N-gram based Semantic Similarity (NSS for short), to improve several widely used evaluation metrics including ROUGE (Recall-Oriented Understudy for Gisting Evaluation), BLEU (Bilingual Evaluation Understudy), etc. NSM and NSS work in different ways. The former calculates the matching degree directly, while the latter mainly improves the similarity measurement. Moreover, we propose an N-gram representation mechanism to explore the vector representation of N-grams (including skip-grams). It serves as the basis of our ENMS model, in which we exploit some simple but effective integration methods to solve the OOV problem efficiently. Experimental results over the TAC AESOP dataset show that the metrics improved by our methods are well correlated with human judgements and can be used to better evaluate abstractive summarization methods.
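To see why purely lexical N-gram metrics can penalize abstractive output, consider a minimal sketch of ROUGE-N recall computed by exact N-gram overlap. This is a simplified illustration, not the official ROUGE implementation and not the ENMS model. The whitespace tokenization and the function names are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also appear in the candidate."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match by how often the n-gram occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


exact = rouge_n_recall("the cat sat", "the cat sat")
synonym = rouge_n_recall("the feline sat", "the cat sat")
synonym_bigram = rouge_n_recall("the feline sat", "the cat sat", n=2)
```

Here a candidate that swaps "cat" for "feline" loses unigram credit for that word and all bigram credit, even though the meaning is unchanged. Semantic matching aims to close exactly this gap.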
Funding: This research was funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2022R113), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: In recent research, deep learning algorithms have presented effective representation learning models for natural languages. Deep learning-based models create better data representations than classical models. They are capable of automated extraction of distributed representations of texts. In this research, we introduce a new tree-based extractive text summarization model that is characterized by fitting the text structure representation in the knowledge base training module, and that also addresses memory issues that were not addressed before. The proposed model employs a tree-structured mechanism to generate the phrase and text embeddings. The proposed architecture mimics the tree configuration of the text-texts and provides better feature representation. It also incorporates an attention mechanism that offers an additional information source to conduct better summary extraction. The novel model addresses text summarization as a classification process, where the model calculates the probabilities of phrase and text-summary association. The model classification is divided into the recognition of multiple features such as information entropy, significance, redundancy and position. The model was assessed on two datasets, the Multi-Doc Composition Query (MCQ) dataset and the Dual Attention Composition (DAC) dataset. The experimental results show that our proposed model has better summarization precision than other models by a considerable margin.
Abstract: Nowadays, data is increasing very rapidly in every domain, such as social media, news, education, banking, etc. Most of the data and information is in the form of text. Most of the text contains little valuable information and knowledge, with lots of unwanted content. To fetch this valuable information out of a huge text document, we need a summarizer that can extract data automatically and at the same time summarize the document, particularly textual content in a novel document, without losing any of its vital information. Summarization can be extractive or abstractive. Extractive summarization involves picking sentences of high rank from the text, ranked by using sentence and word features, and then putting them together to produce a summary. Abstractive summarization is based on understanding the key ideas in the given text and then expressing those ideas in pure natural language. Abstractive summarization is the latest problem area for NLP (natural language processing), ML (Machine Learning) and NN (Neural Network). In this paper, the foremost techniques for automatic text summarization processes are defined. The different existing methods have been reviewed. Their effectiveness and limitations are described. Further, a novel approach based on Neural Network and LSTM has been discussed. In the Machine Learning approach, the architecture of the underlying concept is called Encoder-Decoder.
Funding: Supported by the National Key Research and Development Program of China (2018YFC0830105, 2018YFC0830101, 2018YFC0830100), the National Natural Science Foundation of China (Grant Nos. 61972186, 61762056, 61472168), the Yunnan Provincial Major Science and Technology Special Plan Projects (202002AD080001), and the General Projects of Basic Research in Yunnan Province (202001AT070046, 202001AT070047).
Abstract: Automatically generating a brief summary for legal-related public opinion news (LPO-news, which contains legal words or phrases) plays an important role in rapid and effective public opinion disposal. For LPO-news, the critical case elements, which are significant parts of the summary, may be mentioned several times in the reader comments. Consequently, we investigate the task of comment-aware abstractive text summarization for LPO-news, which can generate a salient summary by learning pivotal case elements from the reader comments. In this paper, we present a hierarchical comment-aware encoder (HCAE), which contains four components: 1) a traditional sequence-to-sequence framework as our baseline; 2) a selective denoising module to filter noise from the comments and distinguish the case elements; 3) a merge module that couples the source article and comments to yield a comment-aware context representation; 4) a recoding module to capture the interaction among the source article words conditioned on the comments. Extensive experiments are conducted on a large dataset of legal public opinion news collected from micro-blogs, and the results show that the proposed model outperforms several existing state-of-the-art baseline models under the ROUGE metrics.