
Improvement of the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm based on Document Triage (cited by: 14)
Abstract: The Term Frequency-Inverse Document Frequency (TF-IDF) algorithm does not consider the importance of a feature term itself within a document when computing term weights. To address this problem, users' reading behaviors are used to improve TF-IDF. Document Triage is introduced into TF-IDF, and the Interest Profile Manager (IPM) is used to collect data about users' reading behaviors, from which document scores are computed. Because the content a user annotates is often an important part of the text or reflects the user's interests, annotated terms are given a greater weight. Document scores and users' annotation information are introduced into TF-IDF as factors, yielding an improved weighting algorithm named Document Triage-Term Frequency-Inverse Document Frequency (DT-TF-IDF). Experimental results show that, compared with the traditional TF-IDF algorithm, DT-TF-IDF achieves higher recall, higher precision, and a higher harmonic mean of the two. DT-TF-IDF is therefore more effective than traditional TF-IDF and improves the accuracy of text similarity calculation.
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 12, pp. 3506-3510, 3514 (6 pages)
Keywords: Term Frequency-Inverse Document Frequency (TF-IDF); Document Triage; annotation; weighting
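
The abstract does not give the DT-TF-IDF formula, so the following is only a minimal sketch of the general idea it describes: a classic TF-IDF weight scaled by a per-document score and by an extra factor for terms the user annotated. The function names, the `(1 + doc_score)` scaling, the `annotation_boost` parameter, and the smoothed IDF are all assumptions made for illustration, not the authors' method. The harmonic mean of precision and recall mentioned in the abstract is the usual F1 measure.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """Classic TF-IDF weight of `term` in one tokenized document.

    Uses a smoothed IDF (an assumption) so unseen terms do not divide by zero.
    """
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for doc in corpus if term in doc)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf


def dt_tf_idf(term, doc_tokens, corpus, doc_score, annotated_terms,
              annotation_boost=2.0):
    """Hypothetical DT-style weight: TF-IDF scaled by reading-behavior factors.

    doc_score       -- assumed document score in [0, 1] derived from reading behavior
    annotated_terms -- set of terms the user annotated in this document
    annotation_boost-- assumed multiplier (> 1) applied to annotated terms
    """
    base = tf_idf(term, doc_tokens, corpus)
    boost = annotation_boost if term in annotated_terms else 1.0
    return base * (1.0 + doc_score) * boost


def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    corpus = [
        ["tf", "idf", "weight"],
        ["user", "annotation", "weight"],
        ["document", "triage", "user"],
    ]
    doc = corpus[1]
    print(tf_idf("annotation", doc, corpus))
    print(dt_tf_idf("annotation", doc, corpus, doc_score=0.5,
                    annotated_terms={"annotation"}))
    print(f1(0.5, 0.5))
```

With `doc_score=0` and no annotated terms, `dt_tf_idf` reduces to plain `tf_idf`, which makes the two added factors easy to compare against the baseline.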
