A roadmap for the computation of persistent homology

EPJ Data Science

Table 3 Performance of the software packages, measured in wall-time seconds (i.e., elapsed time) and, for the computations running on the cluster, in CPU seconds

(a) Computations on cluster: wall-time seconds
Data set	eleg	Klein	HIV	drag 2	random	genome
Size of complex	4.4 × 10⁶	1.1 × 10⁷	2.1 × 10⁸	1.3 × 10⁹	3.1 × 10⁹	4.5 × 10⁸
Max. dim.	2	2	2	2	8	2
javaPlex (st)	84	747	-	-	-	-
Dionysus (st)	474	1,830	-	-	-	-
DIPHA (st)	6	90	1,631	142,559	-	9,110
Perseus	543	1,978	-	-	-	-
Dionysus (d)	513	145	-	-	-	-
DIPHA (d)	4	6	81	2,358	5,096	232
Gudhi	36	89	1,798	14,368	-	4,753
Ripser	1	1	2	6	349	3

(b) Computations on cluster: CPU seconds
Data set	eleg	Klein	HIV	drag 2	random	genome
Size of complex	4.4 × 10⁶	1.1 × 10⁷	2.1 × 10⁸	1.3 × 10⁹	3.1 × 10⁹	4.5 × 10⁸
Max. dim.	2	2	2	2	8	2
javaPlex (st)	284	1,031	-	-	-	-
Dionysus (st)	473	1,824	-	-	-	-
DIPHA (st)	68	1,360	25,950	1,489,615	-	130,972
Perseus	542	1,974	-	-	-	-
Dionysus (d)	513	145	-	-	-	-
DIPHA (d)	39	73	1,276	37,572	79,691	3,622
Gudhi	36	88	1,794	14,351	-	4,764
Ripser	1	1	2	5	348	2

(c) Computations on shared-memory system: wall-time seconds
Data set	eleg	Klein	HIV	drag 2	genome	fract r
Size of complex	3.2 × 10⁸	1.1 × 10⁷	2.1 × 10⁸	1.3 × 10⁹	4.5 × 10⁸	2.8 × 10⁹
Max. dim.	3	2	2	2	2	3
javaPlex (st)	13,607	1,358	43,861	-	28,064	-
Perseus	-	1,271	-	-	-	-
Dionysus (d)	-	100	142,055	35,366	-	572,764
DIPHA (d)	926	13	773	4,482	1,775	3,923
Gudhi	381	6	177	1,518	442	4,590
Ripser	2	1	2	5	3	1,517

For each data set, we indicate the size of the simplicial complex and the maximum dimension up to which we construct the VR complex. For all data sets, we construct the filtered VR complex up to the maximum distance between any two points. The abbreviation ‘st’ following the name of a package indicates its implementation of the standard algorithm, and ‘d’ indicates its implementation of the dual algorithm. The symbol ‘-’ signifies that we were unable to finish computations for this data set because the machine ran out of memory. Perseus implements only the standard algorithm, and Gudhi and Ripser implement only the dual algorithm.

(a), (b) We run DIPHA on one node with 16 cores for the data sets eleg, Klein, and genome; on 2 nodes with 16 cores each for the HIV data set; on 2 and 3 nodes with 16 cores each for the dual and standard implementations, respectively, for drag 2; and on 8 nodes with 16 cores each for random. (The maximum number of processes that we could use at any one time was 128.)

(c) We run DIPHA on a single core.
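Because DIPHA runs on several cores in panels (a) and (b), its CPU seconds are much larger than its wall-time seconds. One rough way to read the two panels together is to divide CPU seconds by wall-time seconds for each run, which gives approximately the average number of cores kept busy. The sketch below does this for the DIPHA rows only, using values copied from the table. The names `DATA_SETS`, `WALL`, `CPU`, `cpu_to_wall_ratio`, and `ratios_by_package` are hypothetical and chosen for illustration, and reading the ratio as "cores kept busy" is an assumption of this sketch, not a claim made in the table notes.

```python
# Values copied from the DIPHA rows of panels (a) and (b) of Table 3.
# None stands for a '-' entry (the computation ran out of memory).
DATA_SETS = ["eleg", "Klein", "HIV", "drag 2", "random", "genome"]

WALL = {  # wall-time seconds, panel (a)
    "DIPHA (st)": [6, 90, 1631, 142559, None, 9110],
    "DIPHA (d)": [4, 6, 81, 2358, 5096, 232],
}
CPU = {  # CPU seconds, panel (b)
    "DIPHA (st)": [68, 1360, 25950, 1489615, None, 130972],
    "DIPHA (d)": [39, 73, 1276, 37572, 79691, 3622],
}


def cpu_to_wall_ratio(wall, cpu):
    """Return CPU seconds / wall-time seconds per entry, or None for unfinished runs."""
    return [
        None if w is None or c is None else c / w
        for w, c in zip(wall, cpu)
    ]


def ratios_by_package():
    """Map each package name to a {data set: ratio} dictionary."""
    return {
        name: dict(zip(DATA_SETS, cpu_to_wall_ratio(WALL[name], CPU[name])))
        for name in WALL
    }


if __name__ == "__main__":
    for name, ratios in ratios_by_package().items():
        for data_set, ratio in ratios.items():
            text = "-" if ratio is None else f"{ratio:.1f}"
            print(f"{name:<11} {data_set:<7} {text}")
```

For instance, DIPHA (st) on Klein gives 1,360 / 90 ≈ 15.1, which is close to the 16 cores used for that data set. The reported times are rounded to whole seconds, so the ratios for the shortest runs (such as eleg) are coarse and should be read with caution.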