The topic recognition for dynamic topic number can realize the dynamic update of super parameters,and obtain the probability distribution of dynamic topics in time dimension,which helps to clear the understanding and ...The topic recognition for dynamic topic number can realize the dynamic update of super parameters,and obtain the probability distribution of dynamic topics in time dimension,which helps to clear the understanding and tracking of convection text data.However,the current topic recognition model tends to be based on a fixed number of topics K and lacks multi-granularity analysis of subject knowledge.Therefore,it is impossible to deeply perceive the dynamic change of the topic in the time series.By introducing a novel approach on the basis of Infinite Latent Dirichlet allocation model,a topic feature lattice under the dynamic topic number is constructed.In the model,documents,topics and vocabularies are jointly modeled to generate two probability distribution matrices:Documentstopics and topic-feature words.Afterwards,the association intensity is computed between the topic and its feature vocabulary to establish the topic formal context matrix.Finally,the topic feature is induced according to the formal concept analysis(FCA)theory.The topic feature lattice under dynamic topic number(TFL DTN)model is validated on the real dataset by comparing with the mainstream methods.Experiments show that this model is more in line with actual needs,and achieves better results in semi-automatic modeling of topic visualization analysis.展开更多
To improve the accuracy of text clustering, fuzzy c-means clustering based on topic concept sub-space (TCS2FCM) is introduced for classifying texts. Five evaluation functions are combined to extract key phrases. Con...To improve the accuracy of text clustering, fuzzy c-means clustering based on topic concept sub-space (TCS2FCM) is introduced for classifying texts. Five evaluation functions are combined to extract key phrases. Concept phrases, as well as the descriptions of final clusters, are presented using WordNet origin from key phrases. Initial centers and membership matrix are the most important factors affecting clustering performance. Orthogonal concept topic sub-spaces are built with the topic concept phrases representing topics of the texts and the initialization of centers and the membership matrix depend on the concept vectors in sub-spaces. The results show that, different from random initialization of traditional fuzzy c-means clustering, the initialization related to text content contributions can improve clustering precision.展开更多
基金the Key Projects of Social Sciences of Anhui Provincial Department of Education(SK2018A1064,SK2018A1072)the Natural Scientific Project of Anhui Provincial Department of Education(KJ2019A0371)Innovation Team of Health Information Management and Application Research(BYKC201913),BBMC。
文摘The topic recognition for dynamic topic number can realize the dynamic update of super parameters,and obtain the probability distribution of dynamic topics in time dimension,which helps to clear the understanding and tracking of convection text data.However,the current topic recognition model tends to be based on a fixed number of topics K and lacks multi-granularity analysis of subject knowledge.Therefore,it is impossible to deeply perceive the dynamic change of the topic in the time series.By introducing a novel approach on the basis of Infinite Latent Dirichlet allocation model,a topic feature lattice under the dynamic topic number is constructed.In the model,documents,topics and vocabularies are jointly modeled to generate two probability distribution matrices:Documentstopics and topic-feature words.Afterwards,the association intensity is computed between the topic and its feature vocabulary to establish the topic formal context matrix.Finally,the topic feature is induced according to the formal concept analysis(FCA)theory.The topic feature lattice under dynamic topic number(TFL DTN)model is validated on the real dataset by comparing with the mainstream methods.Experiments show that this model is more in line with actual needs,and achieves better results in semi-automatic modeling of topic visualization analysis.
基金The National Natural Science Foundation of China(No60672056)Open Fund of MOE-MS Key Laboratory of Multime-dia Computing and Communication(No06120809)
文摘To improve the accuracy of text clustering, fuzzy c-means clustering based on topic concept sub-space (TCS2FCM) is introduced for classifying texts. Five evaluation functions are combined to extract key phrases. Concept phrases, as well as the descriptions of final clusters, are presented using WordNet origin from key phrases. Initial centers and membership matrix are the most important factors affecting clustering performance. Orthogonal concept topic sub-spaces are built with the topic concept phrases representing topics of the texts and the initialization of centers and the membership matrix depend on the concept vectors in sub-spaces. The results show that, different from random initialization of traditional fuzzy c-means clustering, the initialization related to text content contributions can improve clustering precision.