Abstract
Traditional topic models have been widely used to extract semantic topics from electronic documents. However, the topic words they produce are often poor in readability and consistency, so that only domain experts can guess their meaning. In fact, phrases are the main unit through which people express semantics. This paper presents Distributed Representation-Phrase Latent Dirichlet Allocation (DR-Phrase LDA), a phrase topic model. Specifically, the model enhances the semantic information of phrases via distributed representation. Experimental results show that the topics acquired by our model are more readable and consistent than those of other similar topic models.
Funding
This work was supported by the Project of Industry and University Cooperative Research of Jiangsu Province, China (No. BY2019051).
Ma, J. would like to thank the Jiangsu Eazytec Information Technology Company (www.eazytec.com) for their financial support.