Abstract: The rapid growth of social networks has produced an unprecedented amount of user-generated data, which provides an excellent opportunity for text mining. Authorship analysis, an important part of text mining, attempts to learn about the author of a text through subtle variations in writing style that occur across gender, age, and social groups. Such information has a variety of applications, including advertising and law enforcement. One of the most accessible sources of user-generated data is Twitter, which makes the majority of its user data freely available through its data access API. In this study we seek to identify the gender of users on Twitter using Perceptron and Naive Bayes with selected 1- through 5-gram features from tweet text. Stream applications of these algorithms were employed for gender prediction to handle the speed and volume of tweet traffic. Because informal text, such as tweets, cannot be easily evaluated using traditional dictionary methods, n-gram features were implemented in this study to represent streaming tweets. The large number of 1- through 5-grams requires that only a subset of them be used in gender classification; for this reason, informative n-gram features were chosen using multiple selection algorithms. In the best case, the Naive Bayes and Perceptron algorithms produced accuracy, balanced accuracy, and F-measure above 99%.
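As a rough illustration of the approach this abstract describes, the sketch below extracts 1- through 5-gram features from a text and updates a perceptron one example at a time, as a streaming setting requires. It is not the study's implementation: the use of word-level (rather than character-level) n-grams, whitespace tokenization, the +1/-1 label coding, and the names `ngrams` and `StreamingPerceptron` are all assumptions made for the example, and the feature-selection step is omitted.

```python
from collections import defaultdict


def ngrams(text, n_min=1, n_max=5):
    """Return all word n-grams of length n_min..n_max (assumed: word-level)."""
    tokens = text.lower().split()  # assumed: simple whitespace tokenization
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


class StreamingPerceptron:
    """Binary perceptron updated one example at a time (labels are +1 or -1)."""

    def __init__(self):
        self.weights = defaultdict(float)
        self.bias = 0.0

    def score(self, features):
        # Unseen features contribute nothing to the score.
        return self.bias + sum(self.weights.get(f, 0.0) for f in features)

    def predict(self, features):
        return 1 if self.score(features) >= 0 else -1

    def update(self, features, label):
        # Classic perceptron rule: change the weights only after a mistake.
        if self.predict(features) != label:
            for f in features:
                self.weights[f] += label
            self.bias += label


# Placeholder stream of (text, label) pairs; the data is made up for the demo.
stream = [("c d", -1), ("a b", 1)]
model = StreamingPerceptron()
for text, label in stream:
    model.update(ngrams(text), label)
```

Because each example is seen once and then discarded, memory use depends on the number of distinct n-grams rather than on the number of tweets, which is one reason a feature-selection step matters at this scale.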
Funding: This research is supported by "Research on Legal Issues Caused by Sora from the Perspective of Copyright Law" (YK20240094) of the Xihua University Science and Technology Innovation Competition Project for Postgraduate Students (cultivation project).
Abstract: Text-to-video artificial intelligence (AI) is a new product that has arisen from the continuous development of digital technology over recent years. The emergence of various text-to-video AI models, including Sora, is driving the proliferation of content generated through concrete imagery. However, the content generated by text-to-video AI raises significant issues such as unclear work identification, ambiguous copyright ownership, and widespread copyright infringement. These issues can hinder the development of text-to-video AI in the creative fields and impede the prosperity of China's social and cultural arts. Therefore, this paper proposes three recommendations within a legal framework: (a) categorizing the content generated by text-to-video AI as audiovisual works; (b) clarifying the copyright ownership model for text-to-video AI works; and (c) reasonably delineating the responsibilities of the parties involved in text-to-video AI works. The aim is to mitigate the copyright risks associated with content generated by text-to-video AI and to promote the healthy development of text-to-video AI in the creative fields.