Abstract
To address query topic drift and word mismatch in information retrieval, this paper presents a method for computing itemset availability together with availability-based pruning strategies, constructs an association pattern evaluation framework, ACSC (Availability_Chi-Square_Confidence), and a Rule Hybrid Expansion Model (RHEM) based on ACSC mining, and proposes a Cross Language Information Retrieval (CLIR) algorithm that combines weighted association pattern mining with RHEM. The algorithm compares itemset weights to mine frequent itemsets containing the original query terms from the set of relevant documents returned by the initial cross-language retrieval, prunes these itemsets with the availability-based method to obtain Effective Frequent Itemsets (EFI), mines weighted association rules from the EFI, and performs query expansion according to RHEM. The expansion terms are combined with the original query terms into a new query, which is used to retrieve documents again and produce the final cross-language retrieval results. Experimental comparisons with existing CLIR algorithms show that the proposed algorithm effectively reduces query drift and word mismatch and improves CLIR performance. Availability and confidence yield the algorithm's best R-prec and P@10 values, respectively.
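The pipeline outlined in the abstract (mine patterns containing the original query terms from the first-pass documents, prune them, derive rules, then expand the query) can be illustrated with a much-simplified sketch. The abstract does not define the availability measure, the ACSC framework, the itemset weights, or RHEM, so this sketch substitutes plain document-level support and confidence and considers only rules of the form "query term -> candidate term". The function names `expand_query` and `build_new_query`, the thresholds, and the toy documents are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import product


def expand_query(query_terms, feedback_docs, min_support=0.5, min_confidence=0.6):
    """Rank candidate expansion terms from first-pass (feedback) documents.

    Assumption: plain support/confidence stand in for the paper's
    availability-based pruning and ACSC evaluation, which the abstract
    does not define.
    """
    doc_sets = [set(doc) for doc in feedback_docs]
    n_docs = len(doc_sets)
    if n_docs == 0:
        return []
    query = set(query_terms)

    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing (query term, candidate)
    for terms in doc_sets:
        term_df.update(terms)
        # Only 2-itemsets that contain an original query term are counted.
        pair_df.update(product(terms & query, terms - query))

    scores = {}
    for (q, t), both in pair_df.items():
        support = both / n_docs          # share of documents with q and t
        confidence = both / term_df[q]   # estimate of P(t | q) for rule q -> t
        if support >= min_support and confidence >= min_confidence:
            scores[t] = max(scores.get(t, 0.0), confidence)

    # Highest confidence first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))


def build_new_query(query_terms, feedback_docs, top_k=2, **thresholds):
    """Append the top_k expansion terms to the original query terms."""
    expansion = expand_query(query_terms, feedback_docs, **thresholds)[:top_k]
    return list(query_terms) + expansion


# Toy feedback set (hypothetical tokens, already translated and tokenized).
docs = [
    ["query", "expansion", "retrieval", "rule"],
    ["query", "expansion", "mining"],
    ["query", "retrieval", "expansion"],
    ["translation", "mining"],
]
new_query = build_new_query(["query"], docs, top_k=1)
```

In a real cross-language setting the new query would be submitted to the retrieval engine for a second pass. Requiring every counted pair to contain an original query term mirrors the abstract's restriction to itemsets "containing the original query terms", which is what keeps the expansion anchored to the user's topic.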
Authors
HUANG Ming-xuan (黄名选)
XIA Guo-en (夏国恩)
GAO Rong (高荣)
JIANG Cao-qing (蒋曹清)
Affiliations: Guangxi (ASEAN) Financial Research Center, Guangxi University of Finance and Economics, Nanning 530003, China; Guangxi Key Laboratory of Cross-border E-commerce Intelligent Information Processing, Guangxi University of Finance and Economics, Nanning 530003, China; School of Information and Statistics, Guangxi University of Finance and Economics, Nanning 530003, China; School of Business Administration, Guangxi University of Finance and Economics, Nanning 530003, China
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journals (北大核心)
2019, Issue 9, pp. 2013-2020 (8 pages)
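Funding
Supported by the National Natural Science Foundation of China (61762006, 71862003, 61662003)
Supported by the Open Project of the Guangxi First-Class Discipline of Applied Economics (Cultivation) (2018MA07)
Supported by the Open Project of the Guangxi (ASEAN) Financial Research Center (2018DMCJYB08)
Supported by the Guangxi Natural Science Foundation (2015GXNSFAA139310)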
Keywords
information retrieval
cross language retrieval
text mining
query expansion
natural language processing