Our approach consists of three steps. First, we map the temporal network to a static network between active nodes through a modified supra-adjacency representation that, despite being static, retains the full information of the temporal network. Second, we apply standard embedding techniques for static graphs to this supra-adjacency network; we consider embeddings based on random walks, as they explore the temporal paths along which transmission between nodes can occur. Finally, we train a classifier to predict the dynamical state of all active nodes, based on the vector representations of the active nodes and the partially observed states. We now give details on each of these steps.
3.1 Supra-adjacency representation
We first map the temporal network to a supra-adjacency representation. Supra-adjacency representations were first developed for multilayer networks [30, 31], in which nodes interact on different layers (for instance, different communication channels in a social network). They have been generalized to temporal networks, seen as special multilayer networks in which every timestamp is a layer [23]: each node of the supra-adjacency representation is identified by a pair of indices \((i,t)\), corresponding to the node label i and the time frame t of the original temporal network. In this representation, the nodes \((i,t)\) are present for all nodes i and timestamps t, even if i is isolated at t.
We propose here to use a modified version in which we consider only the active times of each node. This results in a supra-adjacency representation whose nodes are the active nodes of the temporal network. More precisely, we define the supra-adjacency network as \(\mathcal{G} = (\mathcal{V}, \mathcal{E})\), where \(\mathcal{V}\) is the set of active nodes and \(\mathcal{E}\) is a set of (weighted, directed) edges joining active nodes. The mapping from the temporal network to the supra-adjacency network consists of the following two procedures (Fig. 1):

For each node i, we connect its successive active versions: for each active time \(t_{i, a}\) of i, we draw a directed “self-coupling” edge from \((i, t_{i, a})\) to \((i, t_{i, a+1})\) (recall that active times are ordered in increasing temporal order).

For each temporal edge \((i, j, t)\), the time t corresponds by definition to an active time for both i and j, which we denote respectively by \(t_{i, a}\) and \(t_{j, b}\). We then map \((i, j, t) \in E\) to two directed edges \(\in \mathcal{E}\), namely \(((i, t_{i, a}), (j, t_{j, b+1}) )\) and \(((j, t_{j, b}), (i, t_{i, a+1}) )\). In other words, the active copy of i at t, \((i,t)\), is linked to the next active copy of j, and vice versa.
The first procedure makes each active node adjacent to its nearest past and future versions (i.e., at the previous and next active times). This ensures that a node carrying information at a certain time can propagate it to its future self along the self-coupling edges, and is useful from an embedding perspective to favor temporal continuity. The second procedure encodes the temporal interactions. Crucially, all nodes are represented at all the times at which they interact, and all temporal edges are represented: the supra-adjacency representation does not involve any loss of temporal information, and the initial temporal network can be reconstructed from it. In particular, it yields the crucial property that any time-respecting path existing on the original temporal network, on which a dynamical process can occur, is also represented in the supra-adjacency representation. Indeed, if an interaction between two nodes i and j occurs at time t and potentially modifies their states, e.g., by contagion or by opinion exchange or modification, this can be observed and will have consequences only at their next respective active times: for instance, if i transmits a disease to j at t, j can propagate it further to other neighbours only at its next active time, and not immediately at t. This is reflected in the supra-adjacency representation we propose.
The edges in \(\mathcal{E}\) are thus of two types, joining two active nodes corresponding either to the same original node or to distinct ones. For each type, we can consider various ways of assigning weights to the edges. For simplicity, we first consider that all self-coupling edges carry the same weight ω, which thus becomes a parameter of the procedure. Moreover, we simply report the weight \(w_{ij}(t)\) of each original temporal edge \((i,j,t)\) on the two supra-adjacency edges \(((i, t_{i, a}), (j, t_{j, b+1}) )\) and \(((j, t_{j, b}), (i, t_{i, a+1}) )\) (with \(t= t_{i, a} = t_{j, b}\)).
In the following, we will refer to the above supra-adjacency representation as dyn-supra. We will moreover consider two variations of this representation. First, we can ignore the direction of time of the original temporal network in the supra-adjacency representation by making all links of \(\mathcal{E}\) undirected. We will refer to this representation as dyn-supra-undirected. Another possible variation consists in encoding the time delay between active nodes into edge weights, with decreasing weights for increasing temporal differences. This decay of edge weights is consistent with the idea that successive active nodes that are temporally far apart are less likely to influence one another (which is the case for many important dynamical processes). In our case, we will consider, as a simple implementation of this concept, that the original weight of an edge \(((i, t), (j, t') )\) in the dyn-supra representation is multiplied by the reciprocal of the time difference between the active nodes, i.e., by \(1/\vert t - t' \vert \). Each self-coupling edge thus has weight \(\omega /(t_{i, a+1} - t_{i, a})\), while a temporal edge \((i,j,t)\) with \(t=t_{i,a}=t_{j,b}\) yields the edges \(((i, t_{i, a}), (j, t_{j, b+1}) )\) with weight \(w_{ij}(t) / (t_{j, b+1} - t_{i, a})\) and \(((j, t_{j, b}), (i, t_{i, a+1}) )\) with weight \(w_{ij}(t) / (t_{i, a+1} - t_{j, b})\). We will refer to this representation as dyn-supra-decay.
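The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; the `(i, j, t, w)` tuple format, the function name, and the `decay` flag are our own conventions (`decay=True` corresponds to the dyn-supra-decay variant):

```python
from collections import defaultdict

def build_dyn_supra(temporal_edges, omega=1.0, decay=False):
    """Map a temporal network to the dyn-supra representation.

    temporal_edges: iterable of (i, j, t, w) tuples.
    Returns a dict {((i, t), (j, t')): weight} of directed edges
    between active nodes (i.e., (node, time) pairs).
    """
    # Collect the active times of each node, in increasing order.
    times = defaultdict(set)
    for i, j, t, w in temporal_edges:
        times[i].add(t)
        times[j].add(t)
    times = {n: sorted(ts) for n, ts in times.items()}
    # nxt[(n, t)] = next active time of node n strictly after t.
    nxt = {(n, t): ts[k + 1]
           for n, ts in times.items()
           for k, t in enumerate(ts) if k + 1 < len(ts)}

    edges = {}
    # Procedure 1: self-coupling edges (i, t_a) -> (i, t_{a+1}).
    for n, ts in times.items():
        for t in ts:
            if (n, t) in nxt:
                t2 = nxt[(n, t)]
                edges[((n, t), (n, t2))] = omega / (t2 - t) if decay else omega
    # Procedure 2: each temporal edge (i, j, t) yields two directed
    # edges towards the *next* active copy of the other node.
    for i, j, t, w in temporal_edges:
        for a, b in ((i, j), (j, i)):
            if (b, t) in nxt:
                t2 = nxt[(b, t)]
                edges[((a, t), (b, t2))] = w / (t2 - t) if decay else w
    return edges
```

Note that a temporal edge whose endpoint has no later active time yields no cross edge, consistent with the definition (the index \(b+1\) would fall outside the range of active times).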
3.2 Node embedding
The central idea of the embedding method we propose for temporal networks, which we call DyANE (Dynamics-Aware Node Embeddings), is to apply embedding methods developed for static networks to the supra-adjacency representation \(\mathcal{G}\) of the temporal graph. Numerous embedding techniques have been proposed for static networks, as surveyed in recent reviews [15, 16]: most techniques consider as their measure of proximity or similarity between nodes either first-order proximity (the similarity of two nodes increases with the strength of the edge connecting them) or second-order proximity (the similarity of two nodes increases with the overlap of their network neighborhoods). In particular, a popular technique to probe the (structural) similarity of nodes relies on random walks rooted at all nodes. Two of the most well-known embedding techniques, DeepWalk [26] and node2vec [27], are based on such an approach.
Methods based on random walks seem particularly appropriate to our framework as well: indeed, in the supra-adjacency representation, random walks explore, for each active node, both self-coupling edges, connecting instances of the same node at different times, and edges representing the interactions between nodes. As explained above, these edges encode the paths along which information can flow over time, meaning that the final embedding will preserve structural similarities relevant to dynamical processes on the original network. Here we use DeepWalk [26], as it is a simple and paradigmatic algorithm that is known to yield high performance in node classification tasks [16]. Note that DeepWalk does not support weighted edges, but it can easily be generalized so that the random walks take edge weights into account [27]. We remark that this choice is driven by the simplicity and popularity of DeepWalk, but the embedding methodology we propose here can readily benefit from any other embedding technique for static graphs.
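The weight-aware walk generation can be sketched as follows (a minimal illustration under our own conventions; the resulting walks would then be fed as "sentences" to a skip-gram model, e.g. gensim's Word2Vec, to obtain the embedding vectors):

```python
import random

def weighted_walks(edges, num_walks=10, walk_length=20, seed=0):
    """Generate weight-biased random walks on the directed, weighted
    supra-adjacency graph, to be used as a corpus for skip-gram.

    edges: dict {(u, v): weight}; nodes u, v are (node, time) pairs.
    """
    rng = random.Random(seed)
    # Build adjacency lists: u -> (neighbours, weights).
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, ([], []))
        adj[u][0].append(v)
        adj[u][1].append(w)
    roots = sorted(adj)  # walks rooted at every non-sink node
    walks = []
    for _ in range(num_walks):
        rng.shuffle(roots)
        for root in roots:
            walk, cur = [root], root
            while len(walk) < walk_length and cur in adj:
                nbrs, ws = adj[cur]
                # Next step chosen proportionally to edge weights.
                cur = rng.choices(nbrs, weights=ws, k=1)[0]
                walk.append(cur)
            walks.append(walk)
    return walks
```

Walks stop early when they reach a node with no outgoing edges (e.g., the last active copy of a node at the end of the observation window), which is the expected behavior on a time-directed graph.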
3.3 Prediction of dynamical states
Once we have obtained an embedding of the supra-adjacency representation of the temporal network, we can turn to the task of predicting the dynamical states of active nodes. Since we assume that the set of possible states is known, this is naturally cast as a (supervised) classification task, in which each active node should be classified into one of the possible states. In our specific case, the three possible node states are S, I, and R. We recall that the classification task is not informed by the actual dynamical process (except for knowing the set of possible node states). In particular, no information is available about the possible transitions or about the parameters of the actual process.
We will use here a one-vs-rest logistic regression classifier, which is customarily used in multi-label node classification tasks based on embedding vectors. Naturally, any other suitable classifier could be used instead.
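As an illustration, the classification step could look like the following sketch, with synthetic data standing in for the embedding vectors and the partially observed states (the dimensions, the fraction of observed states, and all variable names here are hypothetical, not taken from the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(0)
# Stand-in for the embedding: one 16-d vector per active node.
X = rng.normal(size=(300, 16))
# Stand-in for the true states: three classes (e.g. S=0, I=1, R=2).
y = (X[:, 0] > 0).astype(int) + (X[:, 1] > 0).astype(int)

observed = rng.random(300) < 0.3  # states observed for ~30% of active nodes
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
clf.fit(X[observed], y[observed])        # train on the observed states
pred = clf.predict(X[~observed])         # predict the remaining states
```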
We remark that we seek to predict active node states for individual realizations of the dynamics. This is relevant to several applications: for example, in the context of epidemic spreading, and given a temporal interaction network, one might use such a predictive capability to infer the history of the states of all nodes from the observed states of a few active nodes (“sentinel” nodes). The task, however, does not concern the future evolution of the epidemic after the end of the temporal network data.
3.4 Evaluation
The performance of our method can be evaluated along different lines. On the one hand, we can use standard measures employed in prediction tasks, counting for each active node whether its state has been correctly predicted. We then construct a confusion matrix C, in which the element \(C_{ss'}\) is the number of active nodes that are in state s in the simulated spread and predicted to be in state \(s'\) by the classification method. The number \(TP_{s}\) of true positives for state s is then the diagonal element \(C_{ss}\) (and the total number of true positives is \(TP = \sum_{s} C_{ss}\)), while the number of false negatives \(FN_{s}\) for state s is \(\sum_{s' \ne s} C_{ss'}\). Similarly, the number of false positives \(FP_{s}\) is \(\sum_{s'\ne s} C_{s's}\) (active nodes predicted to be in state s while being in a different state in the actual simulation).
The standard performance metrics for each state s, namely precision and recall, are given respectively by \(\mathrm{PRE}_{s} = TP_{s} / (TP_{s} + FP_{s})\) and \(\mathrm{REC}_{s} = TP_{s} / (TP_{s} + FN_{s})\), and the F1-score is \(F1_{s} = 2 \mathrm{PRE}_{s} \cdot \mathrm{REC}_{s} / (\mathrm{PRE}_{s} + \mathrm{REC}_{s})\). To obtain overall performance metrics, it is customary to combine the per-class F1-scores into a single number, the classifier’s overall F1-score. There are, however, several ways to do so; we resort here to the Macro-F1 and Micro-F1 indices, which are widely used for evaluating multi-label node classification tasks [16]. Both indices range between 0 and 1, with higher values indicating better performance.
Macro-F1 is an unweighted average of the F1-scores of the labels, \(\sum_{s \in \mathcal{S}} F1_{s} / \vert \mathcal{S} \vert \). Micro-F1, on the other hand, is obtained from the total numbers of true and false positives and negatives. The total number of true positives is \(TP = \sum_{s} C_{ss}\), and, since any classification error is both a false positive and a false negative, the total numbers of false positives and of false negatives are both equal to \(FP=FN=\sum_{s \ne s'} C_{ss'}\). As a result, Micro-F1 is \(\sum_{s} C_{ss} / \sum_{s, s'} C_{ss'}\) (the sum of the diagonal elements divided by the sum of all the elements). In the case of imbalanced classes, Micro-F1 thus gives more importance to the largest classes, while Macro-F1 gives the same importance to each class, whatever its size. In our specific case of the SIR model, the three classes S, I, R might indeed be very imbalanced, depending on the model parameters, so it is important to use both Macro- and Micro-F1 to evaluate the method’s performance in a broad range of conditions.
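The definitions above translate directly into code; a minimal sketch (our own function name, operating on a confusion matrix given as nested lists):

```python
def f1_scores(C):
    """Per-class F1 plus Macro-F1 and Micro-F1 from a confusion matrix.

    C[s][s2] = number of active nodes in true state s predicted as s2.
    Returns (per_class_f1, macro_f1, micro_f1).
    """
    S = range(len(C))
    per_class = []
    for s in S:
        tp = C[s][s]
        fn = sum(C[s][s2] for s2 in S if s2 != s)  # row, off-diagonal
        fp = sum(C[s2][s] for s2 in S if s2 != s)  # column, off-diagonal
        pre = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        per_class.append(2 * pre * rec / (pre + rec) if pre + rec else 0.0)
    macro = sum(per_class) / len(per_class)
    # Micro-F1 reduces to the trace divided by the total count.
    total = sum(sum(row) for row in C)
    micro = sum(C[s][s] for s in S) / total
    return per_class, macro, micro
```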
From an epidemiological point of view, it is also interesting to focus on global measures evaluating the correctness of the prediction of the overall impact of the spread, as measured by the epidemic curve and the final epidemic size. For instance, if we denote by \(I_{a}^{\mathrm{real}}(t)\) the number of infectious active nodes at time t in the simulated spread, by \(I_{a}^{\mathrm{pred}}(t)\) the number predicted in the classification task, and by \(V_{t}\) the set of nodes active at time t, we can define the following measure of discrepancy between the real and predicted epidemic curves:
$$ \Delta _{I} = \frac{1}{T} \sum _{t=1}^{T} g(t),\quad g(t) = \textstyle\begin{cases} 0 & \text{if } \vert V_{t} \vert = 0, \\ \frac{ \vert I_{a}^{\mathrm{pred}}(t) - I_{a}^{\mathrm{real}}(t) \vert }{ \vert V_{t} \vert } & \text{otherwise}. \end{cases} $$
We can also focus on the final impact of the spread, as an evaluation of the global impact on the population, and compute the discrepancy in the final epidemic size
$$ \Delta _{\mathrm{size}} = \bigl[\bigl(I^{\mathrm{pred}}(T) + R^{\mathrm{pred}}(T) \bigr) - \bigl(I^{\mathrm{real}}(T)+ R^{\mathrm{real}}(T) \bigr)\bigr]/N \ . $$
Note that not all nodes are necessarily active at the last timestamp T; in this case, for simplicity, we consider for each node its last active time and assume that it does not change state until T.
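Both discrepancy measures are straightforward to compute from the predicted and real counts; a minimal sketch (function and argument names are our own):

```python
def delta_I(I_pred, I_real, V_size):
    """Discrepancy between predicted and real epidemic curves.

    I_pred[t], I_real[t]: numbers of infectious active nodes at time t;
    V_size[t]: number of active nodes |V_t| at time t.
    """
    T = len(V_size)
    # g(t) = 0 when no node is active at t, else normalized difference.
    g = [abs(I_pred[t] - I_real[t]) / V_size[t] if V_size[t] else 0.0
         for t in range(T)]
    return sum(g) / T

def delta_size(I_pred_T, R_pred_T, I_real_T, R_real_T, N):
    """Signed discrepancy in final epidemic size, as a fraction of N."""
    return ((I_pred_T + R_pred_T) - (I_real_T + R_real_T)) / N
```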
3.5 Comparison with other methods and sensitivity analysis
Our framework entails two choices of procedures: the way in which the temporal network is represented as a static supra-adjacency object, and the choice of the node embedding method.
First, we consider a variation of our proposed supra-adjacency representation (dyn-supra), using a “baseline” supra-adjacency representation, which we denote by mlayer-supra: in this representation, we simply map each temporal edge \((i,j,t)\) to an edge between active nodes, namely \(((i,t), (j, t) )\), similarly to the original supra-adjacency representation developed for multilayer networks [31]. Self-coupling edges are drawn as in dyn-supra. This static representation of the temporal network is also lossless.
Moreover, for both dyn-supra and mlayer-supra, we consider an alternative embedding method to DeepWalk, namely LINE [32], which embeds nodes so as to preserve both first- and second-order proximity.
In addition, we consider four state-of-the-art embedding methods for temporal networks, which do not rely on the intermediate step of a supra-adjacency representation but directly embed the temporal network, namely: (i) DynamicTriad (DTriad) [33], which embeds the temporal network by modeling triadic closure events; (ii) DynGEM [34], which is based on a deep learning model and outputs an embedding for the network of each timestamp, initializing the model at timestamp \(t+1\) with the weights found at time t, thus transferring knowledge from t to \(t+1\) and learning about the changes from \(G_{t}\) to \(G_{t+1}\); (iii) StreamWalk [35], which uses time-respecting walks and online machine learning to capture temporal changes in the network structure; (iv) online learning of second-order node similarity (Online-neighbor) [35], which optimizes the embedding to match the neighborhood similarity of pairs of nodes, as measured by the Jaccard index of their neighborhoods.
Overall, we obtain eight methods to create an embedding of the temporal network (four variations of DyANE and four methods that directly embed temporal networks), which we denote respectively dyn-supra + DeepWalk, dyn-supra + LINE, mlayer-supra + DeepWalk, mlayer-supra + LINE, DTriad, DynGEM, StreamWalk, and Online-neighbor.
Each variation of DyANE moreover has two parameters whose values can a priori be chosen arbitrarily, namely the self-coupling weight ω and the embedding dimension d. In each of these variations of DyANE, it is also possible, as explained above, to consider undirected edges and to take into account the time differences between linked active nodes.
For each resulting embedding, we thus evaluate the performance of the classification task, in order to assess the robustness of the results and their potential dependence on the specific choices of embedding method and parameter values.