
Cross Language Retrieval Based on Weighted Patterns Mining and Rule Hybrid Expansion (cited by 11)
Abstract To address query topic drift and word mismatch in information retrieval, this paper first gives a method for computing itemset availability, together with corresponding pruning strategies. It then constructs an evaluation framework for association patterns, ACSC (Availability_Chis-Square_Confidence), and a Rule Hybrid Expansion Model (RHEM) based on ACSC mining. Finally, it proposes a Cross Language Information Retrieval (CLIR) algorithm that fuses weighted association pattern mining with the RHEM. The algorithm compares itemset weights to mine frequent itemsets containing the original query terms from the set of relevant documents returned by the initial cross-language retrieval, prunes these itemsets with the availability-based pruning method to obtain Effective Frequent Itemsets (EFI), mines weighted association rules from the EFI, and performs query expansion according to the RHEM. The expansion terms are combined with the original query terms into a new query, which is used to retrieve documents again and produce the final cross-language retrieval results. Compared with existing CLIR algorithms, the experimental results show that the proposed algorithm effectively reduces query drift and word mismatch and improves CLIR performance. Furthermore, availability and confidence give the algorithm its best R-prec and P@10 values, respectively.
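The pipeline outlined in the abstract (mine itemsets containing query terms from the initially retrieved documents, prune them, derive rules, and add the rule consequents to the query) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: plain unweighted support and confidence stand in for the paper's itemset weights, availability measure, and ACSC framework, which the abstract does not define. The function name `expand_query`, its thresholds, and the toy documents are all hypothetical.

```python
from itertools import combinations


def expand_query(query_terms, feedback_docs, min_support=0.5,
                 min_confidence=0.6, max_size=3):
    """Return expansion terms mined from pseudo-relevant documents.

    Assumption: unweighted support/confidence replace the paper's
    weighted itemsets, availability-based pruning, and ACSC scoring.
    """
    query = set(query_terms)
    docs = [set(doc) for doc in feedback_docs]
    n_docs = len(docs)

    def support(itemset):
        # Fraction of feedback documents containing every term of the itemset.
        return sum(1 for doc in docs if itemset <= doc) / n_docs

    vocabulary = sorted(set().union(*docs))
    scores = {}
    for size in range(2, max_size + 1):
        for combo in combinations(vocabulary, size):
            itemset = frozenset(combo)
            antecedent = itemset & query   # original query terms
            consequent = itemset - query   # candidate expansion terms
            # Keep only itemsets mixing query terms with candidate terms.
            if not antecedent or not consequent:
                continue
            sup = support(itemset)
            # Pruning step: drop infrequent itemsets (stand-in for the
            # paper's availability-based pruning).
            if sup == 0 or sup < min_support:
                continue
            # Rule: antecedent (query terms) -> consequent (expansion terms).
            confidence = sup / support(antecedent)
            if confidence >= min_confidence:
                for term in consequent:
                    scores[term] = max(scores.get(term, 0.0), confidence)
    # Highest-confidence terms first; ties broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))


# Toy feedback set standing in for the top documents of an initial retrieval.
docs = [
    ["cross", "language", "retrieval", "translation"],
    ["cross", "language", "query", "translation"],
    ["language", "model", "translation"],
    ["cross", "retrieval", "index"],
]
query = ["cross", "language"]
expanded = expand_query(query, docs)
new_query = query + expanded  # the query used for the second retrieval pass
print(new_query)
```

The sketch enumerates itemsets by brute force, which is only workable for a small feedback vocabulary. A real system would use an Apriori-style level-wise search and, following the paper, weight terms rather than counting document occurrences.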
Authors HUANG Ming-xuan (黄名选); XIA Guo-en (夏国恩); GAO Rong (高荣); JIANG Cao-qing (蒋曹清) (Guangxi (ASEAN) Financial Research Center, Guangxi University of Finance and Economics, Nanning 530003, China; Guangxi Key Laboratory of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning 530003, China; School of Information and Statistics, Guangxi University of Finance and Economics, Nanning 530003, China; School of Business Administration, Guangxi University of Finance and Economics, Nanning 530003, China)
Source Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2019, No. 9, pp. 2013-2020 (8 pages)
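Funding Supported by the National Natural Science Foundation of China (61762006, 71862003, 61662003); the Open Project of the Guangxi First-Class Discipline of Applied Economics (Cultivation) (2018MA07); the Open Project of the Guangxi (ASEAN) Financial Research Center (2018DMCJYB08); and the Natural Science Foundation of Guangxi (2015GXNSFAA139310).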
Keywords information retrieval; cross language retrieval; text mining; query expansion; natural language processing
