Social media websites allow users to exchange short texts such as tweets via microblogs and user status in friendship networks. Their limited length, pervasive abbrevi- ations, and coined acronyms and words exacerbate...Social media websites allow users to exchange short texts such as tweets via microblogs and user status in friendship networks. Their limited length, pervasive abbrevi- ations, and coined acronyms and words exacerbate the prob- lems of synonymy and polysemy, and bring about new chal- lenges to data mining applications such as text clustering and classification. To address these issues, we dissect some poten- tial causes and devise an efficient approach that enriches data representation by employing machine translation to increase the number of features from different languages. Then we propose a novel framework which performs multi-language knowledge integration and feature reduction simultaneously through matrix factorization techniques. The proposed ap- proach is evaluated extensively in terms of effectiveness on two social media datasets from Facebook and Twitter. With its significant performance improvement, we further investi- gate potential factors that contribute to the improved perfor- mance.展开更多
To remove handwritten texts from an image of a document taken by smart phone,an intelligent removal method was proposed that combines dewarping and Fully Convolutional Network with Atrous Convolutional and Atrous Spat...To remove handwritten texts from an image of a document taken by smart phone,an intelligent removal method was proposed that combines dewarping and Fully Convolutional Network with Atrous Convolutional and Atrous Spatial Pyramid Pooling(FCN-AC-ASPP).For a picture taken by a smart phone,firstly,the image is transformed into a regular image by the dewarping algorithm.Secondly,the FCN-AC-ASPP is used to classify printed texts and handwritten texts.Lastly,handwritten texts can be removed by a simple algorithm.Experiments show that the classification accuracy of the FCN-AC-ASPP is better than FCN,DeeplabV3+,FCN-AC.For handwritten texts removal effect,the method of combining dewarping and FCN-AC-ASPP is superior to FCN-AC-ASP alone.展开更多
The assessment of translation quality in political texts is primarily based on achieving effective communication.Throughout the translation process,it is essential to not only accurately convey the original content bu...The assessment of translation quality in political texts is primarily based on achieving effective communication.Throughout the translation process,it is essential to not only accurately convey the original content but also effectively transform the structural mechanisms of the source language.In the translation reconstruction of political texts,various textual cohesion methods are often employed,with conjunctions serving as a primary means for semantic coherence within text units.展开更多
End-to-end text spotting is a vital computer vision task that aims to integrate scene text detection and recognition into a unified framework.Typical methods heavily rely on region-of-interest(Rol)operations to extrac...End-to-end text spotting is a vital computer vision task that aims to integrate scene text detection and recognition into a unified framework.Typical methods heavily rely on region-of-interest(Rol)operations to extract local features and complex post-processing steps to produce final predictions.To address these limitations,we propose TextFormer,a query-based end-to-end text spotter with a transformer architecture.Specifically,using query embedding per text instance,TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multitask modeling.It allows for mutual training and optimization of classification,segmentation and recognition branches,resulting in deeper feature sharing without sacrificing flexibility or simplicity.Additionally,we design an adaptive global aggregation(AGG)module to transfer global features into sequential features for reading arbitrarilyshaped texts,which overcomes the suboptimization problem of Rol operations.Furthermore,potential corpus information is utilized from weak annotations to full labels through mixed supervision,further improving text detection and end-to-end text spotting results.Extensive experiments on various bilingual(i.e.,English and Chinese)benchmarks demonstrate the superiority of our method.Especially on the TDA-ReCTS dataset,TextFormer surpasses the state-of-the-art method in terms of 1-NED by 13.2%.展开更多
This study aims to construct an automotive English corpus to comprehensively compare the differences between English automotive marketing texts and their Chinese translations.The objective is to reveal challenges and ...This study aims to construct an automotive English corpus to comprehensively compare the differences between English automotive marketing texts and their Chinese translations.The objective is to reveal challenges and opportunities in cultural and contextual translation.The research holds significant importance for understanding the impact of cross-cultural communication in the automotive market and providing more effective translation strategies for multinational automotive manufacturers.Through corpus analysis,focusing on common marketing phrases and text features,employing both quantitative and qualitative analysis methods,and examining the accuracy,naturalness,and cultural adaptability of translated texts,we delve into the similarities and differences in conveying automotive information between the two languages.The study finds that expressive and emotional expressions commonly used in English automotive contexts may encounter challenges in Chinese translations due to language and cultural differences.This necessitates the adoption of more flexible translation strategies.Additionally,Chinese translations tend to emphasize the practicality and safety of products more than their English counterparts,placing a greater emphasis on technical and functional descriptions.The primary conclusion of this research is that the translation of automotive marketing texts requires heightened cross-cultural sensitivity and an understanding of the target audience.When translating automotive advertisements and promotions,translators should consider consumer expectations and cultural values in different contexts to ensure the effectiveness and adaptability of the translation.Furthermore,the formulation of more flexible translation strategies,integrating local culture and market demands,will contribute to enhancing the image and influence of automotive brands in the international market.Through this study,we provide deeper insights for automotive manufacturers,assisting them in leveraging the power of language for successful globa展开更多
When someone threatens or humiliates another person online by sending those unpleasant messages or comments, this is known as Cyberbullying. Recently, Bangla text has been used much more often on social media. People ...When someone threatens or humiliates another person online by sending those unpleasant messages or comments, this is known as Cyberbullying. Recently, Bangla text has been used much more often on social media. People communicate with others on social media through messages and comments. So bullies use social media as a rich environment to bully others, especially on political issues. Fights over Cyberbullying on political and social media posts are common today. Most of the time, it does a lot of damage. However, few works have been done for monitoring Bangla text on social media & no work has been done yet for detecting the bullying Bangla text on political issues due to the lack of annotated corpora and morphologic analyzers. In this work, we used several machine learning classifiers & a model. That will help to detect the Bangla bullying texts on social media. For this work, 11,000 Bangla texts have been collected from the comments section of political Facebook posts to make a new dataset and labelled the data as either bullied or not. This dataset has been used to train the machine learning classifier. The results indicate that Random Forest achieves superior accuracy of 91.08%.展开更多
文摘Social media websites allow users to exchange short texts such as tweets via microblogs and user status in friendship networks. Their limited length, pervasive abbrevi- ations, and coined acronyms and words exacerbate the prob- lems of synonymy and polysemy, and bring about new chal- lenges to data mining applications such as text clustering and classification. To address these issues, we dissect some poten- tial causes and devise an efficient approach that enriches data representation by employing machine translation to increase the number of features from different languages. Then we propose a novel framework which performs multi-language knowledge integration and feature reduction simultaneously through matrix factorization techniques. The proposed ap- proach is evaluated extensively in terms of effectiveness on two social media datasets from Facebook and Twitter. With its significant performance improvement, we further investi- gate potential factors that contribute to the improved perfor- mance.
基金Sponsored by the Scientific Research Project of Zhejiang Provincial Department of Education(Grant No.KYY-ZX-20210329).
文摘To remove handwritten texts from an image of a document taken by smart phone,an intelligent removal method was proposed that combines dewarping and Fully Convolutional Network with Atrous Convolutional and Atrous Spatial Pyramid Pooling(FCN-AC-ASPP).For a picture taken by a smart phone,firstly,the image is transformed into a regular image by the dewarping algorithm.Secondly,the FCN-AC-ASPP is used to classify printed texts and handwritten texts.Lastly,handwritten texts can be removed by a simple algorithm.Experiments show that the classification accuracy of the FCN-AC-ASPP is better than FCN,DeeplabV3+,FCN-AC.For handwritten texts removal effect,the method of combining dewarping and FCN-AC-ASPP is superior to FCN-AC-ASP alone.
基金This article is a phased achievement of the 2020 research project“Research on Chinese-Russian Translation of Political Terminology Based on Corpora”(YB2020005)by CNTERM.
文摘The assessment of translation quality in political texts is primarily based on achieving effective communication.Throughout the translation process,it is essential to not only accurately convey the original content but also effectively transform the structural mechanisms of the source language.In the translation reconstruction of political texts,various textual cohesion methods are often employed,with conjunctions serving as a primary means for semantic coherence within text units.
基金supported by the National Natural Science Foundation of China(No.61902027).
文摘End-to-end text spotting is a vital computer vision task that aims to integrate scene text detection and recognition into a unified framework.Typical methods heavily rely on region-of-interest(Rol)operations to extract local features and complex post-processing steps to produce final predictions.To address these limitations,we propose TextFormer,a query-based end-to-end text spotter with a transformer architecture.Specifically,using query embedding per text instance,TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multitask modeling.It allows for mutual training and optimization of classification,segmentation and recognition branches,resulting in deeper feature sharing without sacrificing flexibility or simplicity.Additionally,we design an adaptive global aggregation(AGG)module to transfer global features into sequential features for reading arbitrarilyshaped texts,which overcomes the suboptimization problem of Rol operations.Furthermore,potential corpus information is utilized from weak annotations to full labels through mixed supervision,further improving text detection and end-to-end text spotting results.Extensive experiments on various bilingual(i.e.,English and Chinese)benchmarks demonstrate the superiority of our method.Especially on the TDA-ReCTS dataset,TextFormer surpasses the state-of-the-art method in terms of 1-NED by 13.2%.
文摘This study aims to construct an automotive English corpus to comprehensively compare the differences between English automotive marketing texts and their Chinese translations.The objective is to reveal challenges and opportunities in cultural and contextual translation.The research holds significant importance for understanding the impact of cross-cultural communication in the automotive market and providing more effective translation strategies for multinational automotive manufacturers.Through corpus analysis,focusing on common marketing phrases and text features,employing both quantitative and qualitative analysis methods,and examining the accuracy,naturalness,and cultural adaptability of translated texts,we delve into the similarities and differences in conveying automotive information between the two languages.The study finds that expressive and emotional expressions commonly used in English automotive contexts may encounter challenges in Chinese translations due to language and cultural differences.This necessitates the adoption of more flexible translation strategies.Additionally,Chinese translations tend to emphasize the practicality and safety of products more than their English counterparts,placing a greater emphasis on technical and functional descriptions.The primary conclusion of this research is that the translation of automotive marketing texts requires heightened cross-cultural sensitivity and an understanding of the target audience.When translating automotive advertisements and promotions,translators should consider consumer expectations and cultural values in different contexts to ensure the effectiveness and adaptability of the translation.Furthermore,the formulation of more flexible translation strategies,integrating local culture and market demands,will contribute to enhancing the image and influence of automotive brands in the international market.Through this study,we provide deeper insights for automotive manufacturers,assisting them in leveraging the power of language for successful globa
文摘When someone threatens or humiliates another person online by sending those unpleasant messages or comments, this is known as Cyberbullying. Recently, Bangla text has been used much more often on social media. People communicate with others on social media through messages and comments. So bullies use social media as a rich environment to bully others, especially on political issues. Fights over Cyberbullying on political and social media posts are common today. Most of the time, it does a lot of damage. However, few works have been done for monitoring Bangla text on social media & no work has been done yet for detecting the bullying Bangla text on political issues due to the lack of annotated corpora and morphologic analyzers. In this work, we used several machine learning classifiers & a model. That will help to detect the Bangla bullying texts on social media. For this work, 11,000 Bangla texts have been collected from the comments section of political Facebook posts to make a new dataset and labelled the data as either bullied or not. This dataset has been used to train the machine learning classifier. The results indicate that Random Forest achieves superior accuracy of 91.08%.