Funding: the Specialized Research Fund for the Doctoral Program of Higher Education of China (20050007023); the Natural Science Foundation of Shandong Province (Y2004G04)
Abstract: In Chinese question answering systems, natural language questions carry richer semantic relations than bare query keywords, so retrieval precision can be improved by expanding the query when natural language questions are used to retrieve documents. This paper proposes a new query expansion approach based on both semantics and statistics. First, an automatic relevance feedback method is used to generate a set of candidate expansion words. Then, the expansion words are selected from this set according to the semantic similarity and semantic relatedness between the candidate words and the original query words. Experiments show that the new approach is effective for Web retrieval and outperforms conventional expansion approaches.
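The two-stage pipeline in this abstract (feedback-generated candidates, then semantic filtering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: automatic relevance feedback is modeled as pseudo-relevance feedback that takes the most frequent non-query terms of the top-ranked documents, and the paper's semantic similarity and relatedness measures are represented by hypothetical callables `similarity` and `relatedness` returning values in [0, 1]. Blending them linearly with a weight `alpha` and a cut-off `threshold` is our assumption.

```python
from collections import Counter
from typing import Callable, List

Score = Callable[[str, str], float]  # hypothetical term-pair scorer returning a value in [0, 1]

def candidate_terms(top_docs: List[List[str]], query: List[str], k: int = 10) -> List[str]:
    """Pseudo-relevance feedback: the k most frequent non-query terms in the top-ranked documents."""
    query_set = set(query)
    counts = Counter(t for doc in top_docs for t in doc if t not in query_set)
    # most_common orders equal counts by first appearance
    return [term for term, _ in counts.most_common(k)]

def select_expansion(candidates: List[str], query: List[str], similarity: Score,
                     relatedness: Score, alpha: float = 0.5,
                     threshold: float = 0.5, n: int = 3) -> List[str]:
    """Keep candidates whose best combined score against any original query term passes the threshold."""
    scored = []
    for cand in candidates:
        # Assumed combination rule: linear blend of semantic similarity and semantic relatedness.
        best = max(alpha * similarity(cand, q) + (1 - alpha) * relatedness(cand, q) for q in query)
        if best >= threshold:
            scored.append((best, cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort: ties keep feedback order
    return [cand for _, cand in scored[:n]]
```

Separating candidate generation from selection mirrors the abstract's ordering: the statistical step proposes terms cheaply, and the semantic step discards candidates that co-occur with the query but are unrelated to it in meaning.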
Abstract: Because users increasingly describe their information needs with vague and imprecise keywords, it has become necessary to expand their original search queries with additional words that better capture their actual intent. Whether a term is suitable as an additional word generally depends on its degree of relatedness to the query keywords. In this paper, we propose two criteria for evaluating the relatedness between a candidate expansion word and the query keywords: (1) co-occurrence frequency, which gives more importance to terms that occur in as many as possible of the documents in which the query keywords appear; and (2) proximity, which gives more importance to terms that lie close to the query terms within documents. We also employ strength Pareto fitness assignment to satisfy both criteria simultaneously. Numerical experiments on MEDLINE, the online medical information database, show that the proposed approach significantly improves retrieval performance compared with the baseline.
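A minimal sketch of how the two criteria and a strength Pareto fitness could be combined, under assumptions not stated in the abstract: co-occurrence counts documents containing the candidate and at least one query keyword, proximity is the mean (over those documents) of the minimum token distance to a query keyword, and fitness follows the raw-fitness step of SPEA2 (strength S = number of candidates dominated; raw fitness R = sum of the strengths of a candidate's dominators), with SPEA2's density term omitted. The function names are illustrative only.

```python
from typing import Dict, List, Tuple

Criteria = Tuple[int, float]  # (co-occurrence count, mean distance to nearest query term)

def criteria(docs: List[List[str]], query: List[str], term: str) -> Criteria:
    """Co-occurrence: documents containing the term and at least one query keyword.
    Proximity: mean over those documents of the minimum token distance to a query keyword."""
    query_set = set(query)
    cooc, dists = 0, []
    for doc in docs:
        q_pos = [i for i, tok in enumerate(doc) if tok in query_set]
        t_pos = [i for i, tok in enumerate(doc) if tok == term]
        if q_pos and t_pos:
            cooc += 1
            dists.append(min(abs(i - j) for i in t_pos for j in q_pos))
    mean_dist = sum(dists) / len(dists) if dists else float("inf")
    return cooc, mean_dist

def dominates(a: Criteria, b: Criteria) -> bool:
    """a dominates b: at least as frequent and at least as close, and strictly better in one."""
    return a[0] >= b[0] and a[1] <= b[1] and a != b

def strength_pareto_fitness(scores: Dict[str, Criteria]) -> Dict[str, int]:
    """SPEA2-style raw fitness (density term omitted); 0 means non-dominated, lower is better."""
    terms = list(scores)
    # Strength S(t): how many other candidates t dominates.
    strength = {t: sum(dominates(scores[t], scores[u]) for u in terms) for t in terms}
    # Raw fitness R(t): sum of the strengths of every candidate that dominates t.
    return {t: sum(strength[u] for u in terms if dominates(scores[u], scores[t])) for t in terms}
```

Candidates with fitness 0 form the Pareto front: no other term both co-occurs more often and appears closer to the query, so they are the natural first choices for expansion without having to weight one criterion against the other.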
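Abstract: This paper investigates the application of weighted association pattern mining to Vietnamese-English cross-language query expansion. First, it proposes a weighted association pattern mining algorithm for cross-language query expansion based on a support-CPIR (Conditional Probability Increment Ratio)-interest evaluation framework (WARM-SCPIRI-CLQE), together with a Vietnamese-English cross-language query expansion model. It then proposes a Vietnamese-English cross-language query expansion algorithm based on user relevance feedback and weighted association pattern mining between terms. The algorithm translates the Vietnamese query into English with a machine translation system and retrieves English documents; the user then judges the relevance of the top-ranked documents from this initial retrieval, yielding a set of relevant documents. The WARM-SCPIRI-CLQE algorithm mines weighted association rules from this document set, and expansion terms related to the original query are extracted from the rules to perform post-translation expansion of the Vietnamese-English cross-language query. Experiments on the NTCIR-5 CLIR corpus, comparing the proposed algorithm with existing algorithms, show that it improves Vietnamese-English cross-language information retrieval performance and is more effective for long queries.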
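The rule-mining step described above can be sketched roughly as follows, assuming the input is the user-judged relevant English document set and the machine-translated English query terms. CPIR is computed in its usual form, CPIR(B|A) = (P(B|A) - P(B)) / (1 - P(B)), and interest as |s(AB) - s(A)s(B)|. The weighted-support formula (minimum term weight per document, averaged over all documents), weights normalized to [0, 1], the threshold values, and the restriction to one-term rules are all our assumptions and not necessarily the paper's definitions.

```python
from typing import Dict, List, Tuple

Doc = Dict[str, float]  # term -> weight normalized to [0, 1] (assumed, e.g. scaled tf-idf)

def wsup(docs: List[Doc], items: Tuple[str, ...]) -> float:
    """Assumed weighted support: per document, the minimum weight of the items (0 if any is absent),
    averaged over all documents. Using the minimum keeps support anti-monotone."""
    total = 0.0
    for doc in docs:
        if all(t in doc for t in items):
            total += min(doc[t] for t in items)
    return total / len(docs)

def cpir(s_ab: float, s_a: float, s_b: float) -> float:
    """CPIR(B|A) = (P(B|A) - P(B)) / (1 - P(B)) = (s(AB) - s(A)s(B)) / (s(A)(1 - s(B)))."""
    if s_a == 0.0 or s_b >= 1.0:
        return 0.0
    return (s_ab - s_a * s_b) / (s_a * (1.0 - s_b))

def mine_expansion_terms(docs: List[Doc], query: List[str], min_sup: float = 0.1,
                         min_cpir: float = 0.1, min_interest: float = 0.01
                         ) -> List[Tuple[str, str, float]]:
    """Mine positive one-term rules q -> t (q a query term); return (q, t, CPIR), strongest first."""
    vocab = sorted({t for doc in docs for t in doc} - set(query))
    rules = []
    for q in query:
        s_q = wsup(docs, (q,))
        for t in vocab:
            s_qt = wsup(docs, (q, t))
            if s_qt < min_sup:  # support filter
                continue
            s_t = wsup(docs, (t,))
            c = cpir(s_qt, s_q, s_t)
            interest = abs(s_qt - s_q * s_t)
            if c >= min_cpir and interest >= min_interest:  # CPIR and interest filters
                rules.append((q, t, c))
    rules.sort(key=lambda r: r[2], reverse=True)
    return rules
```

The consequents of the surviving rules are the candidate expansion terms appended to the translated English query; requiring a positive CPIR keeps only terms whose presence becomes more likely when the query term is present.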