Funding: Supported by the Major Program of the Natural Science Foundation of Jiangsu Higher Education Institutions of China under Grant Nos. 19KJA610002 and 19KJB520050, and the National Natural Science Foundation of China under Grant No. 61902270.
Abstract: Author name disambiguation (AND) is a central task in academic search, and it has received growing attention as the number of authors and academic publications increases. To tackle the AND problem, existing studies have proposed various approaches based on different types of information, such as raw document features (e.g., co-authors, titles, and keywords), the fusion feature (e.g., a hybrid publication embedding based on multiple raw document features), local structural information (e.g., a publication's neighborhood information on a graph), and global structural information (e.g., interactive information between a node and others on a graph). However, no work so far has taken all of the above information into account while also taking full advantage of the contribution of each raw document feature to the AND problem. To fill this gap, we propose a novel framework named EAND (Towards Effective Author Name Disambiguation by Hybrid Attention). Specifically, we design a novel feature extraction model, consisting of three hybrid attention mechanism layers, to extract key information from the global and local structural information that are generated from six similarity graphs constructed based on different similarity coefficients, raw document features, and the fusion feature. Each hybrid attention mechanism layer contains three key modules: a local structural perception, a global structural perception, and a feature extractor. Additionally, the mean absolute error function in the joint loss function is used to introduce the structural information loss of the vector space. Experimental results on two real-world datasets demonstrate that EAND achieves superior performance, outperforming state-of-the-art methods by at least +2.74% in micro-F1 score and +3.31% in macro-F1 score.
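As an illustration of what a similarity graph over publications might look like, the following is a minimal sketch, not EAND's actual construction. It assumes the Jaccard coefficient as the similarity coefficient, co-author sets as the raw document feature, and a hypothetical edge threshold of 0.3; the abstract does not specify any of these choices, and the function names are invented for this example.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_similarity_graph(features: dict, threshold: float = 0.3) -> dict:
    """Connect two publications when their feature sets are similar enough.

    `features` maps a publication id to a set of feature values
    (here: co-author names). The threshold is an assumed, illustrative value.
    Returns an undirected graph as an adjacency dict.
    """
    graph = {pid: set() for pid in features}
    for p, q in combinations(features, 2):
        if jaccard(features[p], features[q]) >= threshold:
            graph[p].add(q)
            graph[q].add(p)
    return graph


# Toy data: three publications carrying the same ambiguous author name.
coauthors = {
    "p1": {"A. Smith", "B. Chen"},
    "p2": {"B. Chen", "C. Kumar"},
    "p3": {"D. Lopez"},
}
graph = build_similarity_graph(coauthors, threshold=0.3)
print(graph)
```

In this toy data, p1 and p2 share one of three distinct co-authors (Jaccard of about 0.33), so they are linked, while p3 stays isolated. Repeating the construction with other features (titles, keywords, an embedding) or other coefficients would yield further graphs, which is one plausible reading of how several similarity graphs could arise.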
Funding: Support from the US National Science Foundation under Award 1645237.
Abstract: Purpose: The ability to identify the scholarship of individual authors is essential for performance evaluation. A number of factors hinder this endeavor. Common and similarly spelled surnames make it difficult to isolate the scholarship of individual authors indexed in large databases. Variations in the spelling of individual scholars' names further complicate matters. Common family names in scientific powerhouses like China make it problematic to distinguish between authors possessing ubiquitous and/or anglicized surnames (as well as the same or similar first names). The assignment of unique author identifiers provides a major step toward resolving these difficulties. We maintain, however, that in and of themselves, author identifiers are not sufficient to fully address the author uncertainty problem. In this study we build on the author identifier approach by considering commonalities in fielded data between authors sharing the same surname and first initial of their first name. We illustrate our approach using three case studies.
Design/methodology/approach: The approach we advance in this study is based on commonalities among fielded data in search results. We cast a broad initial net, i.e., a Web of Science (WOS) search for a given author's last name, followed by a comma, followed by the first initial of his or her first name (e.g., a search for 'John Doe' would assume the form 'Doe, J'). Results for this search typically contain all of the scholarship legitimately belonging to this author in the given database (i.e., all of his or her true positives), along with a large amount of noise, or scholarship not belonging to this author (i.e., a large number of false positives). From this corpus we proceed to iteratively weed out false positives and retain true positives. Author identifiers provide a good starting point: e.g., if 'Doe, J' and 'Doe, John' share the same author identifier, this would be sufficient for us to conclude these are one and the same individual.
We find email addresses similarly adequate
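
The broad-net query form and the identifier/email matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the record fields `author_id` and `email`, the sample values, and both function names are hypothetical, and the sketch simply treats any shared identifier or shared email as sufficient evidence that two records belong to the same person.

```python
def broad_net_query(full_name: str) -> str:
    """Turn 'John Doe' into the 'Doe, J' form: last name, comma, first initial."""
    parts = full_name.split()
    first, last = parts[0], parts[-1]
    return f"{last}, {first[0]}"


def group_records(records: list) -> list:
    """Group record indices that share an author identifier or an email.

    Uses a small union-find so that links are transitive: if record A shares
    an identifier with B, and B shares an email with C, all three are grouped.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # (field, value) -> index of first record carrying it
    for idx, rec in enumerate(records):
        for field in ("author_id", "email"):  # hypothetical field names
            value = rec.get(field)
            if not value:
                continue
            key = (field, value.lower())
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    groups = {}
    for idx in range(len(records)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


query = broad_net_query("John Doe")

# Invented sample records, as might come back from the broad search.
records = [
    {"name": "Doe, J", "author_id": "ID-1", "email": None},
    {"name": "Doe, John", "author_id": "ID-1", "email": "jdoe@example.org"},
    {"name": "Doe, J", "author_id": None, "email": "jdoe@example.org"},
    {"name": "Doe, J", "author_id": "ID-2", "email": None},
]
groups = group_records(records)
print(query, groups)
```

Records left in a group of their own (such as the last one here) are not automatically false positives; they are simply the ones that would need further fielded data to be resolved.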