Abstract
Web service descriptions are short and carry little useful information, which makes it hard to discover services that meet user needs efficiently. To address this, a Web service clustering method based on Word2Vec and the LDA topic model is proposed. First, the Wikipedia corpus is used as the expansion source, and Word2Vec is used to expand the content of each Web service description document. Next, the expanded description documents are modeled with the topic model, which turns short-text topic modeling into long-text topic modeling and expresses the topics of the service content more accurately. Finally, similar services are found from the document-topic distribution matrix and clustered. Experiments were carried out on real data collected from ProgrammableWeb. The results show that, compared with the TFIDF-K, LDA, WT-LDA and LDA-K methods, the F value of the proposed method increases by 419.74%, 20.11%, 15.60% and 27.80%, respectively. Clustering on the expanded Web service description documents can effectively improve Web service clustering.
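The expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny hand-written vectors in `TOY_VECTORS` are hypothetical stand-ins for Word2Vec embeddings trained on Wikipedia, and the function names (`cosine`, `most_similar`, `expand_description`) and the `topn` parameter are assumptions made for illustration only.

```python
import math

# Hypothetical toy embeddings; a real pipeline would load Word2Vec
# vectors trained on the Wikipedia corpus instead.
TOY_VECTORS = {
    "map": [0.9, 0.1, 0.0],
    "location": [0.8, 0.2, 0.1],
    "geocoding": [0.85, 0.15, 0.05],
    "payment": [0.1, 0.9, 0.0],
    "invoice": [0.15, 0.85, 0.1],
    "billing": [0.05, 0.95, 0.05],
    "photo": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors, topn=2):
    """Return the topn vocabulary words closest to `word` (excluding itself)."""
    if word not in vectors:
        return []
    scored = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:topn]]


def expand_description(tokens, vectors, topn=2):
    """Append each token's nearest neighbours to a short description.

    Original tokens keep their order; neighbours are added once each.
    """
    expanded = list(tokens)
    for token in tokens:
        for neighbour in most_similar(token, vectors, topn):
            if neighbour not in expanded:
                expanded.append(neighbour)
    return expanded


if __name__ == "__main__":
    print(expand_description(["map"], TOY_VECTORS, topn=2))
```

In a fuller pipeline, the expanded token lists would presumably be passed to an LDA model, and the resulting document-topic distributions would be clustered; those stages are omitted here.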
Authors
XIAO Qiaoxiang
CAO Buqing
ZHANG Xiangping
LIU Jianxun
LI Yanxinwen
(Hunan University of Science & Technology, Xiangtan 411201, China; State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China; College of Navigation, Quanzhou Normal University, Quanzhou 362699, China)
Source
Journal of Central South University (Science and Technology)
EI
CAS
CSCD
Peking University Core Journals
2018, No. 12, pp. 2979-2985 (7 pages)
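Funding
National Natural Science Foundation of China (61873316, 61872139)
Natural Science Foundation of Hunan Province (2017JJ2098)
Open Project of the State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications) (SKLNST-2016-2-26)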