Our approach consists of three steps. First, we map the temporal network to a static network between active nodes through a modified supra-adjacency representation that, despite being static, retains the full information of the temporal network. Second, we apply standard embedding techniques for static graphs to this supra-adjacency network. We will consider embeddings based on random walks, as they explore the temporal paths along which transmission between nodes can occur. Finally, we train a classifier to predict the dynamical state of all active nodes, based on the vector representations of the active nodes and the partially observed states. We now give details on each of these steps.
3.1 Supra-adjacency representation
We first map the temporal network to a supra-adjacency representation. The supra-adjacency representation was first developed for multilayer networks [30, 31], in which nodes interact on different layers (for instance, different communication channels in a social network). It has been generalized to temporal networks, seen as special multilayer networks in which every timestamp is a layer [23]: each node of the supra-adjacency representation is identified by the pair of indices \((i,t)\), corresponding to the node label i and the time frame t of the original temporal network. In this representation, the nodes \((i,t)\) are present for all nodes i and timestamps t, even if i is isolated at t.
We propose here to use a modified version in which we consider only the active times of each node. This results in a supra-adjacency representation whose nodes are the active nodes of the temporal network. More precisely, we define the supra-adjacency network as \(\mathcal{G} = (\mathcal{V}, \mathcal{E})\), where \(\mathcal{V}\) is the set of active nodes and \(\mathcal{E}\) the set of (weighted, directed) edges joining them. The mapping from the temporal network to the supra-adjacency network consists of the following two procedures (Fig. 1):
- For each node i, we connect its successive active versions: for each active time \(t_{i, a}\) of i, we draw a directed “self-coupling” edge from \((i, t_{i, a})\) to \((i, t_{i, a+1})\) (recall that the active times of each node are sorted in increasing order).
- For each temporal edge \((i, j, t)\), the time t corresponds by definition to an active time of both i and j, which we denote by \(t_{i, a}\) and \(t_{j, b}\), respectively. We then map \((i, j, t) \in E\) to two directed edges of \(\mathcal{E}\), namely \(((i, t_{i, a}), (j, t_{j, b+1}) )\) and \(((j, t_{j, b}), (i, t_{i, a+1}) )\). In other words, the active copy of i at t, \((i,t)\), is linked to the next active copy of j, and vice-versa.
The first procedure makes each active node adjacent to its nearest past and future versions (i.e., at the previous and next active times). This ensures that a node carrying information at a certain time can propagate it to its future self along the self-coupling edges, and it is useful from an embedding perspective, as it favors temporal continuity. The second procedure encodes the temporal interactions. Crucially, all nodes are represented at all the times at which they interact, and all temporal edges are represented: the supra-adjacency representation does not involve any loss of temporal information, and the initial temporal network can be reconstructed from it. In particular, this yields the crucial property that any time-respecting path existing in the original temporal network, along which a dynamical process can occur, is also represented in the supra-adjacency representation. Indeed, if an interaction between two nodes i and j occurs at time t and potentially modifies their states, e.g., by contagion or by opinion exchange or modification, this can be observed and will have consequences only at their next respective active times: for instance, if i transmits a disease to j at t, j can propagate it further to other neighbours only at its next active time, and not immediately at t. This is reflected in the supra-adjacency representation we propose.
The edges in \(\mathcal{E}\) are thus of two types, joining two active nodes corresponding either to the same original node or to distinct ones. For each type, we can consider various ways of assigning weights to the edges. For simplicity, we first consider that all self-coupling edges carry the same weight ω, which thus becomes a parameter of the procedure. Moreover, we simply report the weight \(w_{ij}(t)\) of each original temporal edge \((i,j,t)\) on the two supra-adjacency edges \(((i, t_{i, a}), (j, t_{j, b+1}) )\) and \(((j, t_{j, b}), (i, t_{i, a+1}) )\) (with \(t= t_{i, a} = t_{j, b}\)).
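As an illustration, the following minimal sketch (in Python) builds the supra-adjacency edges from a list of weighted temporal edges; the function name, the edge format and the handling of nodes with no later active time are illustrative assumptions, not the reference implementation.

```python
from collections import defaultdict

def build_dyn_supra(temporal_edges, omega=1.0):
    """Sketch of the dyn-supra construction described above.

    temporal_edges: iterable of (i, j, t, w) tuples, one per temporal edge (i, j, t)
    with weight w = w_ij(t).  Returns a dict {((i, t), (j, t')): weight} of directed
    supra-adjacency edges.  Names and conventions are illustrative only.
    """
    # Active times of each node, sorted in increasing order
    active_times = defaultdict(set)
    for i, j, t, w in temporal_edges:
        active_times[i].add(t)
        active_times[j].add(t)
    active_times = {n: sorted(ts) for n, ts in active_times.items()}

    # Next active time of node n strictly after t (undefined for the last active time)
    next_active = {}
    for n, ts in active_times.items():
        for a in range(len(ts) - 1):
            next_active[(n, ts[a])] = ts[a + 1]

    supra_edges = {}

    # Procedure 1: self-coupling edges (i, t_{i,a}) -> (i, t_{i,a+1}), weight omega
    for n, ts in active_times.items():
        for t_a, t_b in zip(ts[:-1], ts[1:]):
            supra_edges[((n, t_a), (n, t_b))] = omega

    # Procedure 2: each temporal edge (i, j, t) yields (i, t) -> (j, next active time
    # of j after t), and symmetrically (j, t) -> (i, next active time of i after t)
    for i, j, t, w in temporal_edges:
        for u, v in ((i, j), (j, i)):
            t_next = next_active.get((v, t))
            if t_next is not None:  # skipped if v is never active again (a choice made here)
                supra_edges[((u, t), (v, t_next))] = w

    return supra_edges
```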
In the following, we will refer to the above supra-adjacency representation as dyn-supra. We will moreover consider two variations of this representation. First, we can ignore the direction of time of the original temporal network by making all links of \(\mathcal{E}\) undirected. We will refer to this representation as dyn-supra-undirected. Another possible variation consists of encoding the time delay between active nodes into the edge weights, with decreasing weights for increasing temporal differences. This decay of edge weights is consistent with the idea that successive active nodes that are temporally far apart are less likely to influence one another (which is the case for many important dynamical processes). As a simple implementation of this concept, we multiply the original weight of an edge \(((i, t), (j, t') )\) in the dyn-supra representation by the reciprocal of the time difference between the active nodes, i.e., \(1/ \vert t- t' \vert \). Each self-coupling edge thus has weight \(\omega /(t_{i, a+1}- t_{i, a})\), while a temporal edge \((i,j,t)\) with \(t=t_{i,a}=t_{j,b}\) yields the edges \(((i, t_{i, a}), (j, t_{j, b+1}) )\) with weight \(w_{ij}(t) / (t_{j, b+1}- t_{i, a})\) and \(((j, t_{j, b}), (i, t_{i, a+1}) )\) with weight \(w_{ij}(t) / (t_{i, a+1}- t_{j, b})\). We will refer to this representation as dyn-supra-decay.
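Under the same illustrative conventions as the sketch above, the dyn-supra-decay variant amounts to rescaling each supra-adjacency edge weight by the inverse of the time difference between its endpoints:

```python
def apply_decay(supra_edges):
    """Sketch of the dyn-supra-decay reweighting: the weight of each edge
    ((i, t), (j, t')) is divided by |t' - t| (illustrative helper)."""
    return {((u, t), (v, t2)): w / abs(t2 - t)
            for ((u, t), (v, t2)), w in supra_edges.items()}
```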
3.2 Node embedding
The central idea of the embedding method we propose for temporal networks, which we call DyANE (Dynamics-Aware Node Embeddings), is to apply embedding methods developed for static networks to the supra-adjacency representation \(\mathcal{G}\) of the temporal network. Numerous embedding techniques have been proposed for static networks, as surveyed in recent reviews [15, 16]. Most techniques consider as a measure of proximity or similarity between nodes either first-order proximity (the similarity of two nodes increases with the strength of the edge connecting them) or second-order proximity (the similarity of two nodes increases with the overlap of their network neighborhoods). In particular, a popular way to probe the (structural) similarity of nodes relies on random walks rooted at all nodes. Two of the most well-known embedding techniques, DeepWalk [26] and node2vec [27], are based on such an approach.
Methods based on random walks seem particularly appropriate to our framework as well: indeed, in the supra-adjacency representation, random walks explore, for each active node, both the self-coupling edges, connecting instances of the same node at different times, and the edges representing the interactions between nodes. As explained above, these edges encode the paths along which information can flow over time, meaning that the final embedding will preserve structural similarities relevant to dynamical processes on the original network. Here we will use DeepWalk [26], as it is a simple and paradigmatic algorithm that is known to yield high performance in node classification tasks [16]. Note that DeepWalk does not support weighted edges, but it can easily be generalized so that the random walks take edge weights into account [27]. We remark that this choice is driven by the simplicity and popularity of DeepWalk, but the embedding methodology we propose here can readily benefit from any other embedding technique for static graphs.
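As an illustration of this step, the sketch below generates weight-biased random walks on the directed supra-adjacency network and feeds them to a skip-gram model via gensim's Word2Vec; this is only one possible DeepWalk-style implementation under assumed parameter choices, not necessarily the one used to produce the results reported here.

```python
import random
from collections import defaultdict

from gensim.models import Word2Vec  # gensim >= 4 interface assumed

def weighted_walks(supra_edges, num_walks=10, walk_length=80, seed=0):
    """Weight-biased random walks over the directed supra-adjacency edges."""
    rng = random.Random(seed)
    neighbors = defaultdict(list)
    for (u, v), w in supra_edges.items():
        neighbors[u].append((v, w))
    nodes = list(neighbors)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for start in nodes:
            walk, current = [start], start
            for _ in range(walk_length - 1):
                out_edges = neighbors.get(current)
                if not out_edges:
                    break  # dead end (e.g. last active time of a node)
                targets, weights = zip(*out_edges)
                current = rng.choices(targets, weights=weights, k=1)[0]
                walk.append(current)
            walks.append([str(node) for node in walk])  # Word2Vec expects string tokens
    return walks

def embed_walks(walks, dim=128):
    """Skip-gram embedding of the walks (DeepWalk-style), with dimension d = dim."""
    model = Word2Vec(walks, vector_size=dim, window=10, sg=1, min_count=0, workers=4)
    return {token: model.wv[token] for token in model.wv.index_to_key}
```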
3.3 Prediction of dynamical states
Once we have obtained an embedding for the supra-adjacency representation of the temporal network, we can turn to the task of predicting the dynamical states of active nodes. Since we assume that the set of possible states is known, this is naturally cast as a (supervised) classification task, in which each active node should be classified into one of the possible states. In our specific case, the three possible node states are S, I, and R. We recall that the classification task is not informed by the actual dynamical process (except for the set of possible node states). In particular, no information is available on the possible transitions between states or on the parameters of the actual process.
We will use here a one-vs-rest logistic regression classifier, which is customarily used in multi-label node classification tasks based on embedding vectors. Naturally, we could use any other suitable classifier.
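For concreteness, a minimal sketch of this step using scikit-learn, assuming that embeddings maps each active node (or its string token) to its embedding vector and that observed gives the known state of the observed active nodes; both names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

def predict_states(embeddings, observed):
    """Train a one-vs-rest logistic regression on the observed active nodes and
    predict the state (S, I or R) of all the remaining active nodes."""
    train_nodes = [n for n in embeddings if n in observed]
    X_train = np.array([embeddings[n] for n in train_nodes])
    y_train = np.array([observed[n] for n in train_nodes])

    classifier = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    classifier.fit(X_train, y_train)

    unknown = [n for n in embeddings if n not in observed]
    predictions = classifier.predict(np.array([embeddings[n] for n in unknown]))
    return dict(zip(unknown, predictions))
```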
We remark that we seek to predict active node states for individual realizations of the dynamics. This is relevant to several applications: for example, in the context of epidemic spreading, and given a temporal interaction network, one might use such a predictive capability to infer the history of the states of all nodes from the observed states of a few active nodes (“sentinel” nodes). The task, however, does not concern the future evolution of the epidemic after the end of the temporal network data.
3.4 Evaluation
The performance of our method can be evaluated along different lines. On the one hand, we can use standard measures for prediction tasks, counting for each active node whether its state has been correctly predicted. We then construct a confusion matrix C, in which the element \(C_{ss'}\) is the number of active nodes that are in state s in the simulated spread and predicted to be in state \(s'\) by the classification method. The number \(TP_{s}\) of true positives for state s is then the diagonal element \(C_{ss}\) (and the total number of true positives is \(TP = \sum_{s} C_{ss}\)), while the number of false negatives \(FN_{s}\) for state s is \(\sum_{s' \ne s} C_{ss'}\). Similarly, the number of false positives \(FP_{s}\) is \(\sum_{s'\ne s} C_{s's}\) (active nodes predicted to be in state s while they are in a different state in the actual simulation).
The standard performance metrics for each state s, namely precision and recall, are given respectively by \(\mathrm{PRE}_{s} = TP_{s} / (TP_{s} + FP_{s})\) and \(\mathrm{REC}_{s} = TP_{s} / (TP_{s} + FN_{s})\), and the F1-score is \(F1_{s} = 2 \mathrm{PRE}_{s} \cdot \mathrm{REC}_{s} / (\mathrm{PRE}_{s} + \mathrm{REC}_{s})\). In order to obtain overall performance metrics, it is customary to combine the per-class F1-scores into a single number, the classifier’s overall F1-score. There are, however, several ways to do so, and we resort here to the Macro-F1 and Micro-F1 indices, which are widely used for evaluating multi-label node classification tasks [16]. Both indices range between 0 and 1, with higher values indicating better performance.
Macro-F1 is the unweighted average of the F1-scores of the individual labels, \(\sum_{s \in {\mathcal{S}}} F1_{s} / |{ {\mathcal{S}}}|\). Micro-F1, on the other hand, is obtained from the total numbers of true and false positives and negatives. The total number of true positives is \(TP = \sum_{s} C_{ss}\), and, since any classification error is both a false positive and a false negative, the total numbers of false positives and of false negatives are both equal to \(FP=FN=\sum_{s \ne s'} C_{ss'}\). As a result, Micro-F1 is \(\sum_{s} C_{ss} / \sum_{s, s'} C_{ss'}\) (the sum of the diagonal elements divided by the sum of all the elements). In the case of imbalanced classes, Micro-F1 thus gives more importance to the largest classes, while Macro-F1 gives the same importance to each class, whatever its size. In our specific case of the SIR model, the three classes S, I, R might indeed be very imbalanced, depending on the model parameters, so that it is important to use both Macro- and Micro-F1 to evaluate the method’s performance in a broad range of conditions.
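Both indices can be computed directly from the confusion matrix; a short sketch (with states indexed as the rows and columns of C; the function name is illustrative):

```python
import numpy as np

def macro_micro_f1(C):
    """Macro- and Micro-F1 from a confusion matrix C, where C[s, s2] counts the
    active nodes in true state s predicted to be in state s2."""
    tp = np.diag(C).astype(float)          # true positives per state
    fn = C.sum(axis=1) - tp                # false negatives per state
    fp = C.sum(axis=0) - tp                # false positives per state

    precision = np.divide(tp, tp + fp, out=np.zeros_like(tp), where=(tp + fp) > 0)
    recall = np.divide(tp, tp + fn, out=np.zeros_like(tp), where=(tp + fn) > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)

    macro_f1 = f1.mean()                   # unweighted average over states
    micro_f1 = tp.sum() / C.sum()          # diagonal sum over total, as derived above
    return macro_f1, micro_f1
```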
From an epidemiological point of view, it is also interesting to focus on global measures corresponding to an evaluation of the correctness of the prediction of the overall impact of the spread, as measured by the epidemic curve and the final epidemic size. For instance, if we denote by \(I_{a}^{\mathrm{real}}(t)\) the number of infectious active nodes at time t in the simulated spread, and by \(I_{a}^{\mathrm{pred}}(t)\) the number predicted in the classification task, we can define the following measure of discrepancy between the real and predicted epidemic curves:
$$ \Delta _{I} = \frac{1}{T} \sum _{t=1}^{T} g(t),\quad g(t) = \textstyle\begin{cases} 0 & \text{if } \vert V_{t} \vert = 0, \\ \frac{ \vert I_{a}^{\mathrm{pred}}(t) - I_{a}^{\mathrm{real}}(t) \vert }{ \vert V_{t} \vert } & \text{otherwise}. \end{cases} $$
We can also focus on the final impact of the spread on the population and compute the discrepancy in the final epidemic size:
$$ \Delta _{\mathrm{size}} = \bigl[\bigl(I^{\mathrm{pred}}(T) + R^{\mathrm{pred}}(T) \bigr) - \bigl(I^{\mathrm{real}}(T)+ R^{\mathrm{real}}(T) \bigr)\bigr]/N \ . $$
Note that not all nodes might be active at the last timestamp T; in this case, for simplicity, we consider for each node its last active time and assume that its state does not change until T.
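A sketch of the two discrepancy measures, assuming that real_states and pred_states map each active node \((i, t)\) to its state, that active_nodes_at[t] gives the set of active nodes at time t, and that final states are carried over to T as just described (all container names are illustrative):

```python
def delta_I(real_states, pred_states, active_nodes_at, T):
    """Average discrepancy between the real and predicted epidemic curves."""
    total = 0.0
    for t in range(1, T + 1):
        V_t = active_nodes_at.get(t, set())
        if not V_t:
            continue  # g(t) = 0 when no node is active at t
        i_real = sum(1 for v in V_t if real_states[v] == "I")
        i_pred = sum(1 for v in V_t if pred_states[v] == "I")
        total += abs(i_pred - i_real) / len(V_t)
    return total / T

def delta_size(real_final, pred_final, N):
    """Discrepancy in the final epidemic size; real_final / pred_final map each of
    the N nodes to its state at its last active time (assumed unchanged until T)."""
    size_real = sum(1 for s in real_final.values() if s in ("I", "R"))
    size_pred = sum(1 for s in pred_final.values() if s in ("I", "R"))
    return (size_pred - size_real) / N
```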
3.5 Comparison with other methods and sensitivity analysis
Our framework entails two choices of procedures: the way in which the temporal network is represented as a static supra-adjacency object, and the choice of the node embedding method.
First, we consider as an alternative to our proposed supra-adjacency representation (dyn-supra) a “baseline” supra-adjacency representation, which we denote by mlayer-supra: in this representation, we simply map each temporal edge \((i,j,t)\) to an edge between the corresponding active nodes, namely \(((i,t), (j, t) )\), similarly to the original supra-adjacency representation developed for multilayer networks [31]. Self-coupling edges are drawn as in dyn-supra. This static representation of the temporal network is also lossless.
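Under the same illustrative conventions as the dyn-supra sketch of Sect. 3.1, the mlayer-supra baseline only changes the mapping of the interaction edges; keeping each such edge in both directions is an assumption made here for symmetry.

```python
from collections import defaultdict

def build_mlayer_supra(temporal_edges, omega=1.0):
    """Sketch of the baseline mlayer-supra representation: self-coupling edges as in
    dyn-supra, while each temporal edge (i, j, t) is mapped to an edge between the
    two active copies at the same time, ((i, t), (j, t))."""
    supra_edges = {}
    active_times = defaultdict(set)
    for i, j, t, w in temporal_edges:
        active_times[i].add(t)
        active_times[j].add(t)
        supra_edges[((i, t), (j, t))] = w   # interaction edge within the same timestamp
        supra_edges[((j, t), (i, t))] = w
    for n, ts in active_times.items():
        ts = sorted(ts)
        for t_a, t_b in zip(ts[:-1], ts[1:]):
            supra_edges[((n, t_a), (n, t_b))] = omega  # self-coupling edges as in dyn-supra
    return supra_edges
```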
Moreover, for both dyn-supra and mlayer-supra, we consider an alternative embedding method to DeepWalk, namely LINE [32], which embeds nodes so as to preserve both first- and second-order proximity.
In addition, we consider four state-of-the-art embedding methods for temporal networks, which do not rely on an intermediate supra-adjacency representation but directly embed the temporal network, namely: (i) DynamicTriad (DTriad) [33], which embeds the temporal network by modeling triadic closure events; (ii) DynGEM [34], which is based on a deep learning model and outputs an embedding for the network of each timestamp, initializing the model at timestamp \(t+1\) with the weights found at time t, thus transferring knowledge from t to \(t+1\) and learning about the changes from \(G_{t}\) to \(G_{t+1}\); (iii) StreamWalk [35], which uses time-respecting walks and online machine learning to capture temporal changes in the network structure; (iv) online learning of second-order node similarity (Online-neighbor) [35], which optimizes the embedding to match the neighborhood similarity of pairs of nodes, as measured by the Jaccard index of their neighborhoods.
Overall, we obtain eight methods to create an embedding of the temporal network – four variations of DyANE and four methods that directly embed temporal networks – which we denote respectively dyn-supra + DeepWalk, dyn-supra + LINE, mlayer-supra + DeepWalk, mlayer-supra + LINE, DTriad, DynGEM, StreamWalk and Online-neighbor.
Each variation of DyANE moreover has two parameters whose values can a priori be chosen arbitrarily, namely the self-coupling weight ω and the embedding dimension d. In each of these variations of DyANE, it is also possible, as explained above, to consider undirected edges and to take into account the time difference between linked active nodes.
For each obtained embedding, we thus evaluate the performance of the classification task, in order to assess the robustness of the results and their potential dependence on the specific choice of embedding method and parameter values.
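A minimal sketch of such a sensitivity sweep for dyn-supra + DeepWalk, reusing the illustrative helpers sketched in the previous sections (build_dyn_supra, weighted_walks, embed_walks, predict_states, macro_micro_f1), with observed and true states keyed by the same string tokens as the embedding; the parameter grids are placeholders.

```python
from itertools import product

import numpy as np

def sensitivity_sweep(temporal_edges, observed, true_states,
                      omegas=(0.5, 1.0, 2.0), dims=(64, 128)):
    """Macro-/Micro-F1 of dyn-supra + DeepWalk over a grid of (omega, d).
    observed and true_states map str(active node) -> state in ('S', 'I', 'R')."""
    states = ("S", "I", "R")
    results = {}
    for omega, d in product(omegas, dims):
        supra = build_dyn_supra(temporal_edges, omega=omega)
        vectors = embed_walks(weighted_walks(supra), dim=d)   # keyed by str(active node)
        predicted = predict_states(vectors, observed)
        # confusion matrix over the active nodes whose state was not observed
        C = np.zeros((len(states), len(states)))
        for node, pred in predicted.items():
            C[states.index(true_states[node]), states.index(pred)] += 1
        results[(omega, d)] = macro_micro_f1(C)
    return results
```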