
Chinese Page Keyword Extraction Method Based on Query Log Analysis (cited by: 1)
Abstract Web search engines built on full-text indexing return results of low relevance. To address this problem, this paper proposes a keyword extraction method for Chinese web pages based on query log analysis. The method selects keywords according to users' judgments of the relevance between a page and the query terms. To quantify these judgments, three indicators are proposed (dwell time per unit of page length, inverse click rate, and a rank compensation factor) and combined by weighting. Query string segmentation, synonym recognition, polysemy disambiguation, and keyphrase matching also receive special treatment. Experimental results show that the extracted keywords have high precision and that the overall performance exceeds that of the TF-IDF and SVM methods. The method achieves satisfactory keyword extraction results.
Source Journal of Guangxi Normal University: Natural Science Edition (《广西师范大学学报(自然科学版)》), indexed in CAS and the Peking University Core Journals list, 2015, No. 2, pp. 42-48 (7 pages)
Funding Supported by the National Social Science Fund of China (14CJL001)
Keywords query log; keyword extraction; keyphrase matching; synonym recognition; polysemy disambiguation
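
The weighted combination of the three indicators described in the abstract can be illustrated with a minimal Python sketch. The abstract gives neither the formulas for the indicators nor the weights, so everything here is an assumption: the indicators are taken as precomputed values already normalized to [0, 1], the weights are placeholders, and the names `WEIGHTS`, `relevance_score`, and `top_keywords` are hypothetical. It shows only the general idea of scoring candidate query terms for a page and keeping the best ones, not the paper's actual procedure.

```python
from typing import Dict, List, Tuple

# Placeholder weights (assumption): the abstract does not report the values used.
WEIGHTS: Dict[str, float] = {
    "dwell": 0.5,          # dwell time per unit of page length
    "inverse_click": 0.3,  # inverse click rate
    "rank_comp": 0.2,      # rank compensation factor
}


def relevance_score(indicators: Dict[str, float],
                    weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the three indicators, each assumed pre-normalized to [0, 1]."""
    return sum(weights[name] * indicators[name] for name in weights)


def top_keywords(candidates: Dict[str, Dict[str, float]],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Score every candidate query term for one page and keep the k best."""
    scored = [(term, relevance_score(ind)) for term, ind in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up indicator values for three query terms that led users to one page.
sample = {
    "keyword extraction": {"dwell": 0.9, "inverse_click": 0.6, "rank_comp": 0.5},
    "query log": {"dwell": 0.4, "inverse_click": 0.8, "rank_comp": 0.2},
    "weather": {"dwell": 0.1, "inverse_click": 0.1, "rank_comp": 0.9},
}

for term, score in top_keywords(sample, k=2):
    print(f"{term}: {score:.2f}")
```

With these made-up values, "keyword extraction" and "query log" rank above "weather": under the placeholder weights, a long dwell time counts for more than a favorable rank compensation factor.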


