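Expert finding is an important aspect of entity retrieval. Classical expert finding models rest on the assumption that candidate experts and terms are conditionally independent. In practice this assumption usually does not hold, which limits the effectiveness of expert finding. This paper proposes an expert finding method based on topic models that does not rely on the conditional independence assumption between candidate experts and terms and is easier to apply in practice than the classical models. In addition, a rank cutoff technique is used, which greatly reduces the computational complexity of the model. The model is evaluated on the CERC (CSIRO Enterprise Research Collection) dataset. The experimental results show that the topic-model-based method outperforms the classical expert finding models on every evaluation metric and effectively improves expert finding performance.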
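The abstract does not give the scoring formula or say what the rank cutoff is applied to, so the following is only a minimal sketch under stated assumptions: experts are scored by marginalizing over topics, score(e, q) = Σ_z p(z|q) · p(e|z), and the cutoff keeps only the `top_k` most probable topics for the query. The function name, the probability tables, and the choice to truncate topics are all hypothetical and are not taken from the paper.

```python
def rank_experts(p_topic_given_query, p_expert_given_topic, top_k):
    """Rank experts by summing p(z|q) * p(e|z) over the top_k query topics.

    Assumption: truncating to the top_k topics stands in for the paper's
    rank cutoff; the paper may truncate a different ranked list.
    """
    # Keep only the top_k topics most associated with the query.
    top_topics = sorted(
        p_topic_given_query.items(), key=lambda kv: kv[1], reverse=True
    )[:top_k]

    scores = {}
    for topic, p_topic in top_topics:
        for expert, p_expert in p_expert_given_topic[topic].items():
            scores[expert] = scores.get(expert, 0.0) + p_topic * p_expert

    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up distributions for illustration only.
p_zq = {"ir": 0.7, "db": 0.2, "ml": 0.1}
p_ez = {
    "ir": {"alice": 0.6, "bob": 0.4},
    "db": {"bob": 0.9, "carol": 0.1},
    "ml": {"carol": 1.0},
}

ranked = rank_experts(p_zq, p_ez, top_k=2)
print([name for name, _ in ranked])
```

With `top_k=2` the "ml" topic is never visited, which is where the saving in computation comes from in this sketch: the work grows with the number of retained topics rather than with the full topic set.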
Most entity ranking research aims to retrieve a ranked list of entities from a Web corpus given a user query. The rank order of entities is determined by the relevance between the query and the contexts of the entities. However, entities can also be ranked directly by their relative importance in a document collection, independent of any query. In this paper, we introduce an entity ranking algorithm named NERank+. Given a document collection, NERank+ first constructs a graph model called the Topical Tripartite Graph, consisting of document, topic and entity nodes. We design separate ranking functions to compute the prior ranks of entities and topics, respectively. A meta-path constrained random walk algorithm is proposed to propagate prior entity and topic ranks based on the graph model. We evaluate NERank+ over real-life datasets and compare it with baselines. Experimental results illustrate the effectiveness of our approach.
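To make the idea of a meta-path constrained walk over a document-topic-entity graph concrete, here is a small sketch. It is not the NERank+ algorithm: the abstract does not specify the meta-paths, the prior ranking functions, or the update rule. The sketch assumes a single meta-path entity → document → topic → document → entity, a uniform entity prior, and a PageRank-style damping factor; it propagates entity ranks only, and all names and numbers are hypothetical.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1 (rows must have a non-zero sum)."""
    return [[value / sum(row) for value in row] for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    b_t = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]


def propagate(prior, doc_topic, doc_entity, damping=0.85, iters=50):
    """Propagate entity ranks along entity -> doc -> topic -> doc -> entity."""
    p_ed = row_normalize(transpose(doc_entity))  # entity -> document
    p_dt = row_normalize(doc_topic)              # document -> topic
    p_td = row_normalize(transpose(doc_topic))   # topic -> document
    p_de = row_normalize(doc_entity)             # document -> entity
    # One step of the constrained walk is the product of the four hops.
    walk = matmul(matmul(matmul(p_ed, p_dt), p_td), p_de)

    rank = list(prior)
    n = len(rank)
    for _ in range(iters):
        moved = [sum(rank[i] * walk[i][j] for i in range(n)) for j in range(n)]
        # Mix the walked mass with the prior, as in personalized PageRank.
        rank = [(1 - damping) * p + damping * m for p, m in zip(prior, moved)]
    return rank


# Toy graph: 3 documents, 2 topics, 3 entities (made-up weights).
doc_topic = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
doc_entity = [[2.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
prior = [1 / 3, 1 / 3, 1 / 3]

entity_rank = propagate(prior, doc_topic, doc_entity)
print(entity_rank)
```

Because each hop matrix is row-stochastic and the prior sums to 1, the ranks remain a probability distribution after every iteration.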
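Key entities in a document can summarize the main subject of the event (or topic) that the text describes, and they support research on entity-oriented retrieval, question answering systems, and related areas. However, the entities in a document are unordered, so ranking them is particularly important. Entity features are extracted from the text, and external features are introduced using Wikipedia and distributed word representations. On this basis, a ranking model based on the Forward Stagewise Algorithm (FSAM) is proposed, called LA-FSAM (FSAM based on AUC Metric and Logistic Function). The model constructs its loss function from the Area Under the Curve (AUC) criterion, combines entity features with a logistic function, and solves for the model parameters with stochastic gradient descent. Experimental comparisons between LA-FSAM and baseline methods demonstrate the effectiveness of the proposed method.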
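The combination of an AUC-style objective, a logistic function, and stochastic gradient descent can be illustrated with a generic pairwise ranker. This is a sketch under assumptions, not LA-FSAM itself: the forward stagewise component is omitted, the score is a plain linear function of the features, and the loss is the standard pairwise logistic surrogate log(1 + exp(-(f(p) - f(n)))) over (key entity, non-key entity) pairs. The feature vectors and all names are made up for illustration.

```python
import math
import random


def score(weights, features):
    return sum(w * x for w, x in zip(weights, features))


def auc(weights, positives, negatives):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    total = 0.0
    for p in positives:
        for n in negatives:
            diff = score(weights, p) - score(weights, n)
            total += 1.0 if diff > 0 else 0.5 if diff == 0 else 0.0
    return total / (len(positives) * len(negatives))


def train(positives, negatives, dim, steps=2000, lr=0.1, seed=0):
    """SGD on the pairwise logistic loss, one random pair per step."""
    rng = random.Random(seed)
    weights = [0.0] * dim
    for _ in range(steps):
        p = rng.choice(positives)
        n = rng.choice(negatives)
        margin = score(weights, p) - score(weights, n)
        # Gradient of log(1 + exp(-margin)) w.r.t. weights is -g * (p - n),
        # where g = 1 / (1 + exp(margin)).
        g = 1.0 / (1.0 + math.exp(margin))
        weights = [w + lr * g * (pi - ni) for w, pi, ni in zip(weights, p, n)]
    return weights


# Made-up two-dimensional features for key and non-key entities.
key_entities = [[1.0, 0.2], [0.9, 0.1], [0.8, 0.3]]
other_entities = [[0.1, 0.2], [0.2, 0.9], [0.3, 0.1]]

learned = train(key_entities, other_entities, dim=2)
print(auc([0.0, 0.0], key_entities, other_entities))  # untrained: all ties
print(auc(learned, key_entities, other_entities))
```

The surrogate loss is used because AUC itself is a step function of the score differences and has no useful gradient; the logistic form gives a smooth objective that SGD can minimize pair by pair.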