In recent years,the usage of social networking sites has considerably increased in the Arab world.It has empowered individuals to express their opinions,especially in politics.Furthermore,various organizations that op...In recent years,the usage of social networking sites has considerably increased in the Arab world.It has empowered individuals to express their opinions,especially in politics.Furthermore,various organizations that operate in the Arab countries have embraced social media in their day-to-day business activities at different scales.This is attributed to business owners’understanding of social media’s importance for business development.However,the Arabic morphology is too complicated to understand due to the availability of nearly 10,000 roots and more than 900 patterns that act as the basis for verbs and nouns.Hate speech over online social networking sites turns out to be a worldwide issue that reduces the cohesion of civil societies.In this background,the current study develops a Chaotic Elephant Herd Optimization with Machine Learning for Hate Speech Detection(CEHOML-HSD)model in the context of the Arabic language.The presented CEHOML-HSD model majorly concentrates on identifying and categorising the Arabic text into hate speech and normal.To attain this,the CEHOML-HSD model follows different sub-processes as discussed herewith.At the initial stage,the CEHOML-HSD model undergoes data pre-processing with the help of the TF-IDF vectorizer.Secondly,the Support Vector Machine(SVM)model is utilized to detect and classify the hate speech texts made in the Arabic language.Lastly,the CEHO approach is employed for fine-tuning the parameters involved in SVM.This CEHO approach is developed by combining the chaotic functions with the classical EHO algorithm.The design of the CEHO algorithm for parameter tuning shows the novelty of the work.A widespread experimental analysis was executed to validate the enhanced performance of the proposed CEHOML-HSD approach.The comparative study outcomes established the supremacy of the proposed CEHOML-HSD model over other approaches.展开更多
Large Language Models(LLMs)are increasingly demonstrating their ability to understand natural language and solve complex tasks,especially through text generation.One of the relevant capabilities is contextual learning...Large Language Models(LLMs)are increasingly demonstrating their ability to understand natural language and solve complex tasks,especially through text generation.One of the relevant capabilities is contextual learning,which involves the ability to receive instructions in natural language or task demonstrations to generate expected outputs for test instances without the need for additional training or gradient updates.In recent years,the popularity of social networking has provided a medium through which some users can engage in offensive and harmful online behavior.In this study,we investigate the ability of different LLMs,ranging from zero-shot and few-shot learning to fine-tuning.Our experiments show that LLMs can identify sexist and hateful online texts using zero-shot and few-shot approaches through information retrieval.Furthermore,it is found that the encoder-decoder model called Zephyr achieves the best results with the fine-tuning approach,scoring 86.811%on the Explainable Detection of Online Sexism(EDOS)test-set and 57.453%on the Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter(HatEval)test-set.Finally,it is confirmed that the evaluated models perform well in hate text detection,as they beat the best result in the HatEval task leaderboard.The error analysis shows that contextual learning had difficulty distinguishing between types of hate speech and figurative language.However,the fine-tuned approach tends to produce many false positives.展开更多
Detecting hate speech automatically in social media forensics has emerged as a highly challenging task due tothe complex nature of language used in such platforms. Currently, several methods exist for classifying hate...Detecting hate speech automatically in social media forensics has emerged as a highly challenging task due tothe complex nature of language used in such platforms. Currently, several methods exist for classifying hatespeech, but they still suffer from ambiguity when differentiating between hateful and offensive content and theyalso lack accuracy. The work suggested in this paper uses a combination of the Whale Optimization Algorithm(WOA) and Particle Swarm Optimization (PSO) to adjust the weights of two Multi-Layer Perceptron (MLPs)for neutrosophic sets classification. During the training process of the MLP, the WOA is employed to exploreand determine the optimal set of weights. The PSO algorithm adjusts the weights to optimize the performanceof the MLP as fine-tuning. Additionally, in this approach, two separate MLP models are employed. One MLPis dedicated to predicting degrees of truth membership, while the other MLP focuses on predicting degrees offalse membership. The difference between these memberships quantifies uncertainty, indicating the degree ofindeterminacy in predictions. The experimental results indicate the superior performance of our model comparedto previous work when evaluated on the Davidson dataset.展开更多
Automatic identification of cyberbullying is a problem that is gaining traction,especially in the Machine Learning areas.Not only is it complicated,but it has also become a pressing necessity,considering how social me...Automatic identification of cyberbullying is a problem that is gaining traction,especially in the Machine Learning areas.Not only is it complicated,but it has also become a pressing necessity,considering how social media has become an integral part of adolescents’lives and how serious the impacts of cyberbullying and online harassment can be,particularly among teenagers.This paper contains a systematic literature review of modern strategies,machine learning methods,and technical means for detecting cyberbullying and the aggressive command of an individual in the information space of the Internet.We undertake an in-depth review of 13 papers from four scientific databases.The article provides an overview of scientific literature to analyze the problem of cyberbullying detection from the point of view of machine learning and natural language processing.In this review,we consider a cyberbullying detection framework on social media platforms,which includes data collection,data processing,feature selection,feature extraction,and the application ofmachine learning to classify whether texts contain cyberbullying or not.This article seeks to guide future research on this topic toward a more consistent perspective with the phenomenon’s description and depiction,allowing future solutions to be more practical and effective.展开更多
Purpose-Hate speech is an expression of intense hatred.Twitter has become a popular analytical tool for the prediction and monitoring of abusive behaviors.Hate speech detection with social media data has witnessed spe...Purpose-Hate speech is an expression of intense hatred.Twitter has become a popular analytical tool for the prediction and monitoring of abusive behaviors.Hate speech detection with social media data has witnessed special research attention in recent studies,hence,the need to design a generic metadata architecture and efficient feature extraction technique to enhance hate speech detection.Design/methodology/approach-This study proposes a hybrid embeddings enhanced with a topic inference method and an improved cuckoo search neural network for hate speech detection in Twitter data.The proposed method uses a hybrid embeddings technique that includes Term Frequency-Inverse Document Frequency(TF-IDF)for word-level feature extraction and Long Short Term Memory(LSTM)which is a variant of recurrent neural networks architecture for sentence-level feature extraction.The extracted features from the hybrid embeddings then serve as input into the improved cuckoo search neural network for the prediction of a tweet as hate speech,offensive language or neither.Findings-The proposed method showed better results when tested on the collected Twitter datasets compared to other related methods.In order to validate the performances of the proposed method,t-test and post hoc multiple comparisons were used to compare the significance and means of the proposed method with other related methods for hate speech detection.Furthermore,Paired Sample t-Test was also conducted to validate the performances of the proposed method with other related methods.Research limitations/implications-Finally,the evaluation results showed that the proposed method outperforms other related methods with mean F1-score of 91.3.Originality/value-The main novelty of this study is the use of an automatic topic spotting measure based on na€ıve Bayes model to improve features representation.展开更多
基金Princess Nourah bint Abdulrahman University Researchers Supporting Project Number(PNURSP2024R263)Princess Nourah bint Abdulrahman University,Riyadh,Saudi Arabia.This study is supported via funding from Prince Sattam bin Abdulaziz University Project Number(PSAU/2024/R/1445).
文摘In recent years,the usage of social networking sites has considerably increased in the Arab world.It has empowered individuals to express their opinions,especially in politics.Furthermore,various organizations that operate in the Arab countries have embraced social media in their day-to-day business activities at different scales.This is attributed to business owners’understanding of social media’s importance for business development.However,the Arabic morphology is too complicated to understand due to the availability of nearly 10,000 roots and more than 900 patterns that act as the basis for verbs and nouns.Hate speech over online social networking sites turns out to be a worldwide issue that reduces the cohesion of civil societies.In this background,the current study develops a Chaotic Elephant Herd Optimization with Machine Learning for Hate Speech Detection(CEHOML-HSD)model in the context of the Arabic language.The presented CEHOML-HSD model majorly concentrates on identifying and categorising the Arabic text into hate speech and normal.To attain this,the CEHOML-HSD model follows different sub-processes as discussed herewith.At the initial stage,the CEHOML-HSD model undergoes data pre-processing with the help of the TF-IDF vectorizer.Secondly,the Support Vector Machine(SVM)model is utilized to detect and classify the hate speech texts made in the Arabic language.Lastly,the CEHO approach is employed for fine-tuning the parameters involved in SVM.This CEHO approach is developed by combining the chaotic functions with the classical EHO algorithm.The design of the CEHO algorithm for parameter tuning shows the novelty of the work.A widespread experimental analysis was executed to validate the enhanced performance of the proposed CEHOML-HSD approach.The comparative study outcomes established the supremacy of the proposed CEHOML-HSD model over other approaches.
基金This work is part of the research projects LaTe4PoliticES(PID2022-138099OBI00)funded by MICIU/AEI/10.13039/501100011033the European Regional Development Fund(ERDF)-A Way of Making Europe and LT-SWM(TED2021-131167B-I00)funded by MICIU/AEI/10.13039/501100011033the European Union NextGenerationEU/PRTR.Mr.Ronghao Pan is supported by the Programa Investigo grant,funded by the Region of Murcia,the Spanish Ministry of Labour and Social Economy and the European Union-NextGenerationEU under the“Plan de Recuperación,Transformación y Resiliencia(PRTR).”。
文摘Large Language Models(LLMs)are increasingly demonstrating their ability to understand natural language and solve complex tasks,especially through text generation.One of the relevant capabilities is contextual learning,which involves the ability to receive instructions in natural language or task demonstrations to generate expected outputs for test instances without the need for additional training or gradient updates.In recent years,the popularity of social networking has provided a medium through which some users can engage in offensive and harmful online behavior.In this study,we investigate the ability of different LLMs,ranging from zero-shot and few-shot learning to fine-tuning.Our experiments show that LLMs can identify sexist and hateful online texts using zero-shot and few-shot approaches through information retrieval.Furthermore,it is found that the encoder-decoder model called Zephyr achieves the best results with the fine-tuning approach,scoring 86.811%on the Explainable Detection of Online Sexism(EDOS)test-set and 57.453%on the Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter(HatEval)test-set.Finally,it is confirmed that the evaluated models perform well in hate text detection,as they beat the best result in the HatEval task leaderboard.The error analysis shows that contextual learning had difficulty distinguishing between types of hate speech and figurative language.However,the fine-tuned approach tends to produce many false positives.
文摘Detecting hate speech automatically in social media forensics has emerged as a highly challenging task due tothe complex nature of language used in such platforms. Currently, several methods exist for classifying hatespeech, but they still suffer from ambiguity when differentiating between hateful and offensive content and theyalso lack accuracy. The work suggested in this paper uses a combination of the Whale Optimization Algorithm(WOA) and Particle Swarm Optimization (PSO) to adjust the weights of two Multi-Layer Perceptron (MLPs)for neutrosophic sets classification. During the training process of the MLP, the WOA is employed to exploreand determine the optimal set of weights. The PSO algorithm adjusts the weights to optimize the performanceof the MLP as fine-tuning. Additionally, in this approach, two separate MLP models are employed. One MLPis dedicated to predicting degrees of truth membership, while the other MLP focuses on predicting degrees offalse membership. The difference between these memberships quantifies uncertainty, indicating the degree ofindeterminacy in predictions. The experimental results indicate the superior performance of our model comparedto previous work when evaluated on the Davidson dataset.
文摘Automatic identification of cyberbullying is a problem that is gaining traction,especially in the Machine Learning areas.Not only is it complicated,but it has also become a pressing necessity,considering how social media has become an integral part of adolescents’lives and how serious the impacts of cyberbullying and online harassment can be,particularly among teenagers.This paper contains a systematic literature review of modern strategies,machine learning methods,and technical means for detecting cyberbullying and the aggressive command of an individual in the information space of the Internet.We undertake an in-depth review of 13 papers from four scientific databases.The article provides an overview of scientific literature to analyze the problem of cyberbullying detection from the point of view of machine learning and natural language processing.In this review,we consider a cyberbullying detection framework on social media platforms,which includes data collection,data processing,feature selection,feature extraction,and the application ofmachine learning to classify whether texts contain cyberbullying or not.This article seeks to guide future research on this topic toward a more consistent perspective with the phenomenon’s description and depiction,allowing future solutions to be more practical and effective.
文摘Purpose-Hate speech is an expression of intense hatred.Twitter has become a popular analytical tool for the prediction and monitoring of abusive behaviors.Hate speech detection with social media data has witnessed special research attention in recent studies,hence,the need to design a generic metadata architecture and efficient feature extraction technique to enhance hate speech detection.Design/methodology/approach-This study proposes a hybrid embeddings enhanced with a topic inference method and an improved cuckoo search neural network for hate speech detection in Twitter data.The proposed method uses a hybrid embeddings technique that includes Term Frequency-Inverse Document Frequency(TF-IDF)for word-level feature extraction and Long Short Term Memory(LSTM)which is a variant of recurrent neural networks architecture for sentence-level feature extraction.The extracted features from the hybrid embeddings then serve as input into the improved cuckoo search neural network for the prediction of a tweet as hate speech,offensive language or neither.Findings-The proposed method showed better results when tested on the collected Twitter datasets compared to other related methods.In order to validate the performances of the proposed method,t-test and post hoc multiple comparisons were used to compare the significance and means of the proposed method with other related methods for hate speech detection.Furthermore,Paired Sample t-Test was also conducted to validate the performances of the proposed method with other related methods.Research limitations/implications-Finally,the evaluation results showed that the proposed method outperforms other related methods with mean F1-score of 91.3.Originality/value-The main novelty of this study is the use of an automatic topic spotting measure based on na€ıve Bayes model to improve features representation.