Quantifying the similarity between two articles based on their bibliographies. The similarity between two articles can be defined in terms of the overlap between their reference lists. Articles P1 and P2 in panel (a) share only one citation and should therefore be considered less similar than articles P3 and P4 in panel (b), which share four citations. The Jaccard index captures this difference: it equals 0.2 in the former case and 1.0 in the latter. However, the Jaccard index also equals 1.0 for the two articles in panel (c), which share only two citations. Moreover, if citations are interpreted as proxies for knowledge flows, the similarity between articles P7 and P8 in panel (d), which both cite the same highly cited article, should be smaller than the similarity between articles P9 and P10 in panel (e), which are the only two articles citing P11. Our similarity measure, based on statistical validation, takes these heterogeneities into account.
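As a rough illustration of the Jaccard values quoted in the caption, the sketch below computes the index for hypothetical reference lists. The function name `jaccard`, the reference labels (R1, R2, ...), and the sizes of the non-shared parts of each list are assumptions chosen only so that the results match the values stated for panels (a) to (c); they are not read off the figure.

```python
def jaccard(refs_a, refs_b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two reference lists."""
    a, b = set(refs_a), set(refs_b)
    union = a | b
    if not union:
        # Convention assumed here: two empty lists have similarity 0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical reference lists (labels are illustrative only).
# Panel (a): one shared reference out of five distinct ones.
panel_a = jaccard(["R1", "R2", "R3"], ["R3", "R4", "R5"])
# Panel (b): four shared references, identical lists.
panel_b = jaccard(["R1", "R2", "R3", "R4"], ["R1", "R2", "R3", "R4"])
# Panel (c): two shared references, identical lists.
panel_c = jaccard(["R1", "R2"], ["R1", "R2"])

print(panel_a, panel_b, panel_c)
```

Under these assumptions the index gives the same value for panels (b) and (c) even though the number of shared references differs, and it takes no account of how often a shared reference is cited by other articles, which is the situation contrasted in panels (d) and (e).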