
Table 2 Basic characteristics of the datasets

From: Ground truth? Concept-based communities versus the external classification of physics manuscripts

| Dataset | N | V | \(V_{\mathrm{gen}}\) | 〈k〉 | \(L^{\mathrm{in}}_{\mathrm{idf}}\) | \(L_{\mathrm{idf}}\) | \(L^{\mathrm{in}}_{\mathrm{bp}}\) | \(L_{\mathrm{bp}}\) |
|---|---|---|---|---|---|---|---|---|
| arxivPhys2013 | 36,386 | 12,200 | 347 | 37 | \(5.9 \times 10^{8}\) | \(3.3 \times 10^{8}\) | \(2.1 \times 10^{6}\) | \(1.3 \times 10^{6}\) |
| arxivPhys2014 | 41,848 | 12,728 | 344 | 38 | \(7.8 \times 10^{8}\) | \(4.5 \times 10^{8}\) | \(2.5 \times 10^{6}\) | \(1.6 \times 10^{6}\) |

Total number of articles (N), total number of identified concepts (V), and the number of generic concepts among them (\(V_{\mathrm{gen}}\)); 〈k〉 is the average number of non-generic concepts in an arbitrarily chosen article. The number of links in the unipartite network with generic concepts included (\(L^{\mathrm{in}}_{\mathrm{idf}}\)) or excluded (\(L_{\mathrm{idf}}\)) is more than two orders of magnitude larger (roughly 250 to 320 times) than the corresponding number of links in the bipartite network (\(L^{\mathrm{in}}_{\mathrm{bp}}\) and \(L_{\mathrm{bp}}\), respectively). As a result, community detection on the unipartite networks requires substantially more computational resources.
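The sketch below is a rough illustration of the link-count gap described in the note. It is not the authors' pipeline. Two assumptions are involved. First, the table does not state how the unipartite network is built, so the sketch treats it as an article-article network in which two articles are linked when they share at least one concept. The tf-idf weighting implied by the "idf" subscript is ignored. Second, the helper names (`link_ratios`, `article_pair_density`, `project_articles`, `bipartite_links`) and the `TOY` corpus are hypothetical and exist only for illustration. The code computes the unipartite/bipartite ratios from the Table 2 values and checks that \(L_{\mathrm{bp}} \approx N\langle k\rangle\). It then shows, on a toy corpus, how a single generic concept shared by every article turns the projection into a complete graph.

```python
from itertools import combinations

# Values transcribed from Table 2 (link counts are rounded in the table).
TABLE2 = {
    "arxivPhys2013": {"N": 36_386, "k": 37, "L_in_idf": 5.9e8, "L_idf": 3.3e8,
                      "L_in_bp": 2.1e6, "L_bp": 1.3e6},
    "arxivPhys2014": {"N": 41_848, "k": 38, "L_in_idf": 7.8e8, "L_idf": 4.5e8,
                      "L_in_bp": 2.5e6, "L_bp": 1.6e6},
}

# Hypothetical toy corpus: "physics" plays the role of a generic concept.
TOY = {
    "a1": {"physics", "graphene"},
    "a2": {"physics", "graphene", "phonon"},
    "a3": {"physics", "qcd"},
    "a4": {"physics", "qcd", "lattice"},
}


def link_ratios(row):
    """Unipartite/bipartite link ratios with and without generic concepts."""
    return row["L_in_idf"] / row["L_in_bp"], row["L_idf"] / row["L_bp"]


def article_pair_density(row, key):
    """Fraction of all N(N-1)/2 article pairs that are linked.

    Assumes (hypothetically) that the unipartite network is article-article.
    """
    n = row["N"]
    return row[key] / (n * (n - 1) / 2)


def project_articles(article_concepts, exclude=frozenset()):
    """Hypothetical one-mode projection: link two articles when they share
    at least one concept that is not in `exclude`."""
    articles_by_concept = {}
    for article, concepts in article_concepts.items():
        for concept in concepts - exclude:
            articles_by_concept.setdefault(concept, set()).add(article)
    edges = set()
    for articles in articles_by_concept.values():
        # A concept shared by d articles contributes up to d*(d-1)/2 edges,
        # so projected links grow quadratically while bipartite links grow linearly.
        edges.update(combinations(sorted(articles), 2))
    return edges


def bipartite_links(article_concepts, exclude=frozenset()):
    """Article-concept links: one per (article, concept) incidence."""
    return sum(len(concepts - exclude) for concepts in article_concepts.values())


if __name__ == "__main__":
    for name, row in TABLE2.items():
        with_gen, without_gen = link_ratios(row)
        print(f"{name}: unipartite/bipartite = {with_gen:.0f} (generic included), "
              f"{without_gen:.0f} (generic excluded); "
              f"N<k>/L_bp = {row['N'] * row['k'] / row['L_bp']:.2f}")

    generic = frozenset({"physics"})
    print("toy, generic included:", bipartite_links(TOY), "bipartite vs",
          len(project_articles(TOY)), "projected links")
    print("toy, generic excluded:", bipartite_links(TOY, generic), "bipartite vs",
          len(project_articles(TOY, generic)), "projected links")
```

In the toy corpus, keeping the generic concept connects all 4 articles pairwise (6 of 6 possible links). Dropping it leaves only 2 links. On the real datasets, the same quadratic growth per concept is consistent with the unipartite networks linking roughly half to nine tenths of all article pairs, if the article-article reading is correct.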