Funding: Supported by the Research Grants Council of the Hong Kong SAR, China, under Grant Nos. 419008 and 419109.
Abstract: Ranking is a central research issue in IR-style keyword search over a set of documents. In this paper, we study a new keyword search problem, called context-sensitive document ranking, which is to rank documents using an additional context that provides information about the application domain in which the documents are searched and ranked. The work is motivated by the fact that additional information associated with the documents can help users find more relevant documents when they cannot find what they need from the documents alone. Here, a context is a multi-attribute graph, which can represent any information maintained in a relational database: multi-attribute nodes represent tuples, and edges represent primary key and foreign key references among nodes. Context-sensitive ranking raises several research issues: how to score documents, how to evaluate the additional information obtained from the context that may contribute to the document ranking, and how to rank the documents by combining the scores/costs from the documents and the context. More importantly, the relationships between documents and the information stored in a relational database may be uncertain, because the two come from different data sources and the relationships are determined systematically using similarity matching, which introduces uncertainty. We concentrate on these research issues and provide a solution for ranking documents in a context where uncertainty exists between the documents and the context. We confirm the effectiveness of our approaches through extensive experimental studies using real datasets, and we present our findings in this paper.
Funding: This work was supported by the National Natural Science Foundation of China under Grant No. 61732008 and by the Tsinghua University Guoqiang Research Institute.
Abstract: Document ranking is one of the most studied yet most challenging problems in information retrieval (IR). A growing number of studies address this problem through fine-grained document modeling. However, most of them focus on context-independent passage-level relevance signals and ignore context information. In this paper, we investigate how information gain accumulates across passages and propose the context-aware Passage Cumulative Gain (PCG). The fine-grained PCG avoids the need to split documents into independent passages. We investigate PCG patterns at the document level (DPCG) and the query level (QPCG). Based on these patterns, we propose a BERT-based sequential model called the Passage-level Cumulative Gain Model (PCGM) and show that PCGM can effectively predict PCG sequences. Finally, we apply PCGM to the document ranking task using two approaches. The first leverages DPCG sequences to estimate the gain of an individual document; experimental results on two public ad hoc retrieval datasets show that PCGM outperforms most existing ranking models. The second considers cross-document effects and leverages QPCG sequences to estimate marginal relevance; experimental results show that the predicted results are highly consistent with users' preferences. We believe that this work contributes to improving ranking performance and providing more explainability for document ranking.