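[Objective] To review and summarize research progress on semantic novelty in China and abroad, together with the related techniques, as a reference for future research. [Coverage] We searched Web of Science, Elsevier, Springer, Google Scholar, CNKI, Wanfang, and VIP using keywords such as "novelty of the literature", "semantic novelty", and "文献新颖性" (literature novelty), along with search queries such as "语义新颖性and文献评价" (semantic novelty AND literature evaluation). After reading and organizing the results and tracing representative theories back to their sources, we selected 70 papers for review. [Methods] We surveyed research on semantic novelty in China and abroad and analyzed the current state and future trends of semantic novelty evaluation for scientific and technical literature, focusing on definitions of novelty, novelty evaluation indicators, and the different evaluation methods. [Results] Semantic novelty evaluation is attracting growing attention in the academic community. Existing studies mine and evaluate semantic content, but no unified measurement indicator has been established. [Limitations] Existing evaluations of literature novelty rely mostly on external features, and few studies take semantic novelty directly as their subject, which limits the evidence available to support this review. [Conclusions] Semantic novelty evaluation of scientific and technical literature rests fundamentally on the novelty of the semantic content. Quantitative research has become the mainstream approach, but how the evaluation indicators are calculated still needs to be clarified. Future novelty evaluation should combine qualitative and quantitative methods for a comprehensive analysis, in order to achieve scientific and reasonable overall academic evaluation.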
Novelty detection aims to retrieve new information and filter out redundancy from given sentences that are relevant to a specific topic. In TREC 2003, the authors tried an approach to novelty detection based on semantic distance computation. The motivation is to expand a sentence by introducing semantic information. The semantic distance between sentences is computed by combining WordNet with statistical information. Novelty detection is treated as a binary classification problem: a sentence is either new or not. The feature vector, used in the vector space model for classification, consists of various factors, including the semantic distance from the sentence to the topic and the distance from the sentence to the relevant context that precedes it. New sentences are then detected with Winnow and support vector machine classifiers, respectively. Several experiments are conducted to examine the relationship between the different factors and performance. The results show that semantic computation is promising for novelty detection. The ratio of the number of new sentences to the number of relevant sentences is further studied for different relevant document sizes. The ratio is found to decrease at a certain rate (about 0.86). Another group of experiments is then performed, supervised by this ratio. These experiments demonstrate that the ratio helps to improve novelty detection performance.
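The feature construction described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a bag-of-words cosine distance stands in for the WordNet-based semantic distance, a fixed threshold stands in for the trained Winnow and SVM classifiers, and all function names and the threshold value are hypothetical.

```python
import math
from collections import Counter


def bow(sentence):
    """Lowercased bag-of-words term counts (assumption: whitespace tokens)."""
    return Counter(sentence.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity of two term-count vectors.

    Stand-in for the WordNet-based semantic distance in the abstract.
    """
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 if norm == 0 else 1.0 - dot / norm


def novelty_features(topic, sentences):
    """Return one (distance to topic, distance to prior context) pair per sentence.

    The context distance is the minimum distance to any earlier sentence;
    the first sentence has no context and gets the maximum distance 1.0.
    """
    topic_vec = bow(topic)
    seen = []
    features = []
    for sentence in sentences:
        vec = bow(sentence)
        d_topic = cosine_distance(vec, topic_vec)
        d_context = min((cosine_distance(vec, prev) for prev in seen), default=1.0)
        features.append((d_topic, d_context))
        seen.append(vec)
    return features


def label_new(features, threshold=0.5):
    """Hypothetical decision rule: a sentence is new if it is far from its context.

    A trained binary classifier would replace this threshold in practice.
    """
    return [d_context >= threshold for _, d_context in features]
```

In this sketch, a sentence that repeats an earlier one gets a context distance near zero and is labeled as not new, while a sentence sharing no words with its predecessors gets a distance of 1.0 and is labeled as new.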