Suche

Wo soll gesucht werden?
Erweiterte Literatursuche

Ariadne Pfad:

Inhalt

Literaturnachweis - Detailanzeige

 
Autor/inn/enMihaljević, Helena; Santamaría, Lucía
TitelDisambiguation of author entities in ADS using supervised learning and graph theory methods.
QuelleIn: Scientometrics, (2021) 5, S.3893-3917
PDF als Volltext Verfügbarkeit 
Dokumenttypgedruckt; online; Zeitschriftenaufsatz
ISSN0138-9130
DOI10.1007/s11192-021-03951-w
SchlagwörterAuthor name disambiguation; Record linkage; Supervised learning; Label Propagation; Information retrieval; Digital libraries
AbstractAbstract Disambiguation of authors in digital libraries is essential for many tasks, including efficient bibliographical searches and scientometric analyses to the level of individuals. The question of how to link documents written by the same person has been given much attention by academic publishers and information retrieval researchers alike. Usual approaches rely on publications’ metadata such as affiliations, email addresses, co-authors, or scholarly topics. Lack of homogeneity in the structure of bibliographic collections and discipline-specific dissimilarities between them make the creation of general-purpose disambiguators arduous. We present an algorithm to disambiguate authorships in the Astrophysics Data System (ADS) following an established semi-supervised approach of training a classifier on authorship pairs and clustering the resulting graphs. Due to the lack of high-signal features such as email addresses and citations, we engineer additional content- and location-based features via text embeddings and named-entity recognition. We train various nonlinear tree-based classifiers and detect communities from the resulting weighted graphs through label propagation, a fast yet efficient algorithm that requires no tuning. The resulting procedure reaches reasonable complexity and offers possibilities for interpretation. We apply our method to the creation of author entities in a recent ADS snapshot. The algorithm is evaluated on 39 manually-labeled author blocks comprising 9545 authorships from 562 author profiles. Our best approach utilizes the Random Forest classifier and yields a micro- and macro-averaged BCubed $$\mathrm {F}_1$$ score of 0.95 and 0.87, respectively. We release our code and labeled data publicly to foster the development of further disambiguation procedures for ADS.
Erfasst vonOLC
Update2023/2/05
Literaturbeschaffung und Bestandsnachweise in Bibliotheken prüfen
 

Standortunabhängige Dienste
Bibliotheken, die die Zeitschrift "Scientometrics" besitzen:
Link zur Zeitschriftendatenbank (ZDB)

Artikellieferdienst der deutschen Bibliotheken (subito):
Übernahme der Daten in das subito-Bestellformular

Tipps zum Auffinden elektronischer Volltexte im Video-Tutorial

Trefferlisten Einstellungen

Permalink als QR-Code

Permalink als QR-Code

Inhalt auf sozialen Plattformen teilen (nur vorhanden, wenn Javascript eingeschaltet ist)

Teile diese Seite: