Abstract
In customer-centric modern marketing, analysing customer reviews to extract the needs of important customers is limited by traditional word-segmentation and keyword-extraction algorithms. To address this, a keyword extraction algorithm that fuses information entropy with multi-weight TF-IDF is proposed. The algorithm first segments product titles and user reviews with a segmentation algorithm that combines mutual information with left and right entropy, which also discovers new words. It then applies TF-IDF to extract keywords from the reviews and from the titles, and assigns different feature weights according to each keyword's position factor, part-of-speech factor and word-length factor. This keeps the different importance of titles and reviews from being ignored and improves the accuracy of the results. Finally, the cosine similarity between the title keywords and the review keywords is computed to determine the quality of the review. Experimental results show that taking mutual information, left and right entropy, word position, part of speech and word length into account improves the efficiency of keyword extraction and effectively identifies important reviews, which makes it easier to select important customers.
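The abstract does not give the paper's formulas or weight values, so the following Python sketch only illustrates the three steps it names, under stated assumptions. `new_word_scores` computes pointwise mutual information (PMI) and left/right boundary entropy for a candidate string; `keyword_scores` multiplies a smoothed TF-IDF score by position, part-of-speech and word-length factors; `cosine` compares title and review keyword vectors. All function names, the POS tag set, and every numeric factor are hypothetical. The position factor is assumed here to mean "the review word also appears in the title"; the paper may define it differently.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of a Counter of neighbouring characters."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def new_word_scores(text, word):
    """Return (PMI, left entropy, right entropy) for a candidate string in raw text.

    PMI compares the candidate's frequency with the product of its characters'
    frequencies (standard PMI for a two-character candidate). High PMI means the
    characters stick together; high left/right entropy means the candidate occurs
    in many different contexts, i.e. it behaves like a free-standing word.
    """
    n, k = len(text), len(word)
    positions = [i for i in range(n - k + 1) if text.startswith(word, i)]
    if not positions:
        return float("-inf"), 0.0, 0.0
    char_counts = Counter(text)
    p_word = len(positions) / (n - k + 1)
    p_indep = math.prod(char_counts[ch] / n for ch in word)
    pmi = math.log2(p_word / p_indep)
    left = Counter(text[i - 1] for i in positions if i > 0)
    right = Counter(text[i + k] for i in positions if i + k < n)
    return pmi, entropy(left), entropy(right)


# Hypothetical part-of-speech factors: nouns weighted above adjectives and verbs.
POS_FACTOR = {"n": 1.2, "a": 1.0, "v": 0.9}


def feature_weight(word, pos, title_words):
    """Hypothetical multiplicative weight: position x part of speech x word length."""
    position_factor = 1.5 if word in title_words else 1.0  # assumed: word also in title
    pos_factor = POS_FACTOR.get(pos, 0.5)
    length_factor = 1.0 + 0.1 * (min(len(word), 4) - 1)  # favours longer words, capped
    return position_factor * pos_factor * length_factor


def keyword_scores(doc, corpus, title_words, top_k=5):
    """Weighted TF-IDF keywords of one document.

    doc: list of (word, pos) pairs; corpus: list of documents, each a list of words.
    """
    tf = Counter(word for word, _ in doc)
    pos_of = dict(doc)
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for d in corpus if word in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF, always >= 1
        weight = feature_weight(word, pos_of[word], title_words)
        scores[word] = count / len(doc) * idf * weight
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k])


def cosine(a, b):
    """Cosine similarity of two sparse keyword-weight vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    print(new_word_scores("ABxAByABz", "AB"))
    corpus = [["battery", "screen"], ["battery", "delivery"], ["price"]]
    title = [("battery", "n"), ("screen", "n")]
    review = [("battery", "n"), ("lasts", "v"), ("long", "a")]
    title_words = {w for w, _ in title}
    title_kw = keyword_scores(title, corpus, title_words)
    review_kw = keyword_scores(review, corpus, title_words)
    print(cosine(title_kw, review_kw))
```

In a fuller pipeline, a candidate would typically be accepted as a new word only when its PMI and the smaller of its two boundary entropies both exceed thresholds, and a review whose keyword vector is close to the title's would be ranked as higher quality. The thresholds, like the factor values above, are assumptions that would need tuning on real review data.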
Authors
LI Lu; HE Lili (School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China)
Source
Intelligent Computer and Applications, 2020, No. 9, pp. 69-72, 76 (5 pages in total)
Funding
National Key R&D Program of China (2018YFB1700702).
Keywords
TF-IDF algorithm
feature weight
mutual information
left and right entropy
cosine similarity