Funding: This work was supported by the Sichuan Science and Technology Program (2021YFQ0003).
Abstract: Visual question answering (VQA) has attracted growing attention in computer vision and natural language processing. Researchers study how to better integrate image features and text features to improve results on VQA tasks. Analyzing all features may cause information redundancy and a heavy computational burden. An attention mechanism is an effective way to address this problem. However, a single attention mechanism may attend to the features incompletely. This paper improves on the attention mechanism and proposes a hybrid attention mechanism that combines spatial attention and channel attention. Because attention can cause loss of the original features, a small portion of the image features is added back as compensation. For the text features, a self-attention mechanism is introduced, and the internal structural features of sentences are strengthened to improve the overall model. The results show that the attention mechanism and feature compensation add 6.1% accuracy to the multimodal low-rank bilinear pooling network.
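The combination of channel attention, spatial attention, and feature compensation described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives no formulas, so the scoring by mean activation, the function name `hybrid_attention`, and the compensation ratio `alpha` are all assumptions made for illustration, and the question-guided part of the model is omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_attention(features, alpha=0.1):
    """Apply channel and spatial attention, then add back part of the input.

    features: list of C channels, each a list of N spatial values.
    alpha: hypothetical compensation ratio (not taken from the paper).
    """
    num_channels = len(features)
    num_locations = len(features[0])

    # Channel attention: one weight per channel.
    # Scoring by mean activation is an assumption for this sketch.
    channel_w = softmax([sum(ch) / num_locations for ch in features])

    # Spatial attention: one weight per location.
    # Scoring by the mean across channels is likewise an assumption.
    spatial_w = softmax([
        sum(features[c][n] for c in range(num_channels)) / num_channels
        for n in range(num_locations)
    ])

    # Weight every value by both attentions, then compensate for lost
    # information by adding back a small portion of the original feature.
    return [
        [
            features[c][n] * channel_w[c] * spatial_w[n] + alpha * features[c][n]
            for n in range(num_locations)
        ]
        for c in range(num_channels)
    ]
```

With `alpha=0` the output is the purely attended feature map; a small positive `alpha` keeps a fraction of every original value, which is the role the abstract assigns to compensation.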
Funding: Funding support from the Vilas Associates Competition Award at the University of Wisconsin-Madison (UW-Madison) and the National Science Foundation [grant number 1940091].
Abstract: To find disaster-relevant social media messages, current approaches use natural language processing methods or machine learning algorithms that rely on text only. These approaches remain imperfect because of the variability and uncertainty of the language used on social media and because they ignore the geographic context in which messages are posted. Meanwhile, a disaster-relevant social media message is highly sensitive to its posting location and time. However, few studies have explored which spatial features, and to what extent temporal and especially spatial features, can aid text classification. This paper proposes a geographic context-aware text mining method that incorporates spatial and temporal information derived from social media and authoritative datasets, along with the text information, for classifying disaster-relevant social media posts. This work designs and demonstrates how diverse types of spatial and temporal features can be derived from spatial data and then used to enhance text mining. A deep learning-based method and commonly used machine learning algorithms were used to assess the accuracy of the enhanced text-mining method. The performance of the classification models generated from various combinations of textual, spatial, and temporal features indicates that additional spatial and temporal features help improve the overall classification accuracy.
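The idea of feeding a classifier textual, spatial, and temporal features together can be sketched as a simple feature concatenation. This is only an illustration under stated assumptions: the abstract does not name the specific features, so `distance_km` (distance from the post to an affected area), `hours_since_event`, the bag-of-words text representation, and the function name `build_feature_vector` are hypothetical.

```python
def build_feature_vector(text, vocabulary, distance_km, hours_since_event):
    """Concatenate textual, spatial, and temporal features for one post.

    All feature choices here are hypothetical examples, not the
    features used in the paper.
    """
    tokens = text.lower().split()

    # Textual part: word counts over a fixed vocabulary (bag of words).
    text_part = [float(tokens.count(word)) for word in vocabulary]

    # Spatial part: proximity score that decays with distance (assumed form).
    spatial_part = [1.0 / (1.0 + distance_km)]

    # Temporal part: recency score that decays with elapsed time (assumed form).
    temporal_part = [1.0 / (1.0 + hours_since_event)]

    return text_part + spatial_part + temporal_part
```

The resulting vector could be passed to any standard classifier; comparing models trained with and without the last two entries mirrors the feature-combination comparison the abstract reports.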