Figure 3 | EPJ Data Science


From: Exploiting citation networks for large-scale author name disambiguation


Optimizing disambiguation parameters. (a) 10,000 random sets of disambiguation parameters were tested on the 3,000 family names that can be validated against Google Scholar profiles. Results (black dots) close to the origin (0,0) give the best trade-off between precision and h-index correctness. For samples A, B, C and D (500 family names each), the parameters were further optimized independently and cross-validated. (b) Curves show a lower-hull estimate of the results of random parameter sampling when only certain metadata features are used (C – Citations, R – References, A – Authors, S – Self-citations). The closer a curve comes to the origin, the smaller the error. Combining all four features yields the best h-index reconstruction.
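The procedure in the caption (random parameter sampling, then preferring results near the origin, summarized by a lower-hull curve) can be sketched as below. This is a minimal illustration, not the authors' code. It assumes each trial yields two error values that should both be minimized. It reads the "lower hull" as the set of non-dominated (Pareto-optimal) points, which is one plausible interpretation, and it picks the single best point by Euclidean distance to the origin. The names `random_search`, `pareto_front`, `closest_to_origin`, the parameter `threshold`, and the toy error model are all hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

# (error_1, error_2): both errors are to be minimized (assumption).
Point = Tuple[float, float]


def random_search(evaluate: Callable[[Dict[str, float]], Point],
                  bounds: Dict[str, Tuple[float, float]],
                  n_trials: int,
                  seed: int = 0) -> List[Tuple[Point, Dict[str, float]]]:
    """Sample parameter sets uniformly within bounds and record their errors."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        results.append((evaluate(params), params))
    return results


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by the first error.

    A point is dropped if an earlier point (smaller or equal first error)
    already has a second error at least as small. Exact duplicates collapse.
    """
    front: List[Point] = []
    best_second = math.inf
    for x, y in sorted(points):
        if y < best_second:
            front.append((x, y))
            best_second = y
    return front


def closest_to_origin(points: List[Point]) -> Point:
    """Pick the point with the smallest Euclidean distance to (0, 0)."""
    return min(points, key=lambda p: math.hypot(p[0], p[1]))


if __name__ == "__main__":
    # Hypothetical toy error model with one parameter trading off two errors.
    def toy_evaluate(params: Dict[str, float]) -> Point:
        t = params["threshold"]
        return (t * t, (1.0 - t) ** 2)

    trials = random_search(toy_evaluate, {"threshold": (0.0, 1.0)}, n_trials=100)
    errors = [err for err, _ in trials]
    print("frontier size:", len(pareto_front(errors)))
    print("best trade-off:", closest_to_origin(errors))
```

Sorting by the first error makes the frontier a single linear pass after the sort. Repeating the search with only some metadata features enabled would give one frontier per feature combination, which is how curves like those in panel (b) could be compared.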
