The Network Picture of Labor Flow

We construct a data-driven model of flows in graphs that captures the essential elements of the movement of workers between jobs in the companies (firms) of entire economic systems such as countries. The model is based on the observation that certain job transitions between firms are often repeated over time, showing persistent behavior, and suggesting the construction of static graphs to act as the scaffolding for job mobility. Individuals in the job market (the workforce) are modelled by a discrete-time random walk on graphs, where each individual at a node can possess two states: employed or unemployed, and the rates of becoming unemployed and of finding a new job are node dependent parameters. We calculate the steady state solution of the model and compare it to extensive micro-datasets for Mexico and Finland, comprised of hundreds of thousands of firms and individuals. We find that our model possesses the correct behavior for the numbers of employed and unemployed individuals in these countries down to the level of individual firms. Our framework opens the door to a new approach to the analysis of labor mobility at high resolution, with the tantalizing potential for the development of full forecasting methods in the future.


1. Introduction.
High employment is one of the central goals of any economic policy, because this is associated with economic, social and political prosperity of countries. Of the many perspectives that need to be considered to understand the problem of employment, job search has attracted a large amount of interest for its relatively well defined nature, and the perception that economic policies can have an important impact in its optimization; numerous important results have been obtained and are well summarized in reviews such as [1,2]. The main approach to understand job search is known as search and matching modeling [3,4]. Search and matching models broadly consist of a stochastic process by which two kinds of entities (e.g., unemployed individuals and vacancies) join to create a new match, and this joining is mediated by a success rate called the aggregate matching function [1,5]. These models have been successful at predicting quantitative and qualitative features of employment, and are well accepted [1]. However, despite their success, search and matching models have inherent limitations in the way they are constructed. One of those limitations, the notion of aggregation, eliminates from consideration the role played in the dynamics of employment by specific companies (firms) in the economy. At first glance, this may not seem critical because any country has a large number of firms, many of which are quite similar. But upon more detailed consideration, one finds it is also true that in most countries there are firms that play central roles, and when these particular firms are affected by any number of factors such as technological change, new economic policies, or competition, the impact on employment can be considerable and have downstream effects on the entire economy of the country.
Indeed, empirical evidence has shown that the shocks experienced by the largest firms are responsible for the majority of fluctuations in the total production of an economy, and not necessarily because of firms' sizes, but because of the propagation effects through the entire system [6,7].
For some decades, governments from several countries have stored highly granular micro-data about firms and workers, constructed from social security records [8]. However, detailed analysis of coupled firm and labor dynamics is not common in the economics literature, due to the limitations of commonly employed methods. Such data, in conjunction with the framework here proposed, offers the opportunity to uncover the specific roles that firms play in employment.
By approaching the problem of job mobility in a novel way, and using data already available, it is possible to construct more detailed models of job transitions with resolution at the level of individuals and firms in entire countries, offering a new approach to the study of labour dynamics. This is the focus of the current article. Specifically, we introduce a stochastic process on graphs that accurately represents observed employment and unemployment patterns in two comprehensive micro-datasets. Labor mobility occurs within graphs (or networks), which summarize the constraints that agents encounter while moving between jobs. These graphs are constructed as follows: vertices (nodes) represent firms, and edges represent previously observed job transitions between the firms (one or more workers changed jobs from one of the nodes to the other in a chosen time period). Workers are modeled as performing a set of simple decisions: when employed, they separate from their job with a firm-dependent probability, and when unemployed they choose to apply to one of the neighboring firms on the graph that is open to hire. These rules amount to a version of random walks on graphs (for reviews on this topic, see e.g. [9,10,11]). Although the model is simple, it is able to reconstruct relevant detailed employment features of the micro-data.
One of the advantages of such a model is that it can be directly calibrated from real data down to the level of firms. This calibration, together with plausible scenarios based on the introduction of economic policies, market, or technological changes affecting particular firms, or changes in labor laws, to name a few, can potentially lead to more accurate and highly resolved forecasting of job mobility trends. Such a forecasting tool would be very valuable for those responsible for economic and labor policy-making.
As a technical consequence of our approach, we find it useful to introduce an innovative concept: firm specific unemployment. In the model, this concept is necessary because individuals that have recently stopped working at a particular firm engage in job searching only along the edges adjacent to their most recent employer, which we term local search. Individuals then remain associated with a firm from the moment their employment finishes, to the date when they find a new job in a different firm. This concept leads to a set of new considerations about the way in which we interpret unemployment.
Finally, we present statistical evidence derived from the micro-datasets that supports our approach. First, we corroborate that the assumptions of the model are consistent with reality. This corroboration includes verifying that the structure of the graphs used is persistent, i.e., that job transitions do not simply occur randomly but are instead regularly repeated over time, which supports our use of static graphs to model job movements. To our knowledge, this is the first time such a test has been performed on large-scale disaggregated data. With our assumptions, we show that a restricted version of our model is consistent with the general statistical features of the data, including the typical number of employees at each firm and the number of people looking for jobs after being separated from their previous firm.
Graph approaches to the problem of job search have been considered before, albeit not with our focus. Notably, in Refs. [12,13,14,15,16] and related work, job search is analyzed as a social network, where information about vacancies travels along social ties. This approach, related to the ideas of other social scientists such as Granovetter [17], has been shown to be consistent with empirical observations. However, the disadvantage of the social network framework is that social ties are not usually susceptible to the tools of economic policy, and are also hard to characterize empirically. Our approach is fundamentally different in that it focuses on the very entities in which employment takes place: firms. Somewhat related work was carried out in Ref. [18], where the authors consider a purely theoretical model of worker transitions between firms as a Markov process, but their approach mixes aggregate and disaggregate features, and does not attempt any empirical verification. Two of the authors of the current manuscript proposed the framework of labour flow networks in a previous publication [19], but focused on studying their empirical properties and modeling them as the result of economic interactions. In contrast, we use the networks as a static and persistent structure that shapes labour mobility. In this publication, we attempt to build the basic modeling framework that can lead to predictions of job mobility at high resolution (down to the firm level) and show that this approach is consistent with the collected empirical evidence.
The article is structured as follows: Sec. 2 is dedicated to the construction and calculations of our labor flow model, including the derivation of the equations that broadly govern the problem, the main model predictions, and a sketch of the algorithm necessary to apply the model to data; Sec. 3 is concerned with the empirical analysis of the data, both to justify our choices in building the model and to compare the data with the predictions of the model; and Sec. 4 presents the final discussion and conclusions of our work.
2. Modelling worker movement. In order to provide a clear framework, we begin our detailed discussion by first introducing the assumptions of the model. This is followed by the calculation of the generating functions and moments of the distributions of numbers of employees and unemployed agents associated with a firm. We then present a treatment of the evolution of an individual agent, including job and unemployment times, which offers an alternative way to calculate the properties of the model. We finalize the section by explaining how the model can be written directly from measurable quantities in available data, and how such data needs to be used in order to predict labor flows.

2.1. Modelling rules and assumptions.
Job search is a complicated problem, influenced by a number of factors such as skills of an agent, the type of business undertaken by a firm, effectiveness in advertising and recruiting workers by firms, etc. In order to model the search process in a tractable but efficient way, we must build a framework that is at the same time rich enough to avoid losing critical behavior, but simple enough that it can shed light on the qualitative features of the problem 2 .
Broadly speaking, there are two basic elements that need to be modelled: the structure of the economy, and the behavior of the agents.
To represent an economy, we construct a graph G that encodes N firms as nodes, and edges that represent allowed job transitions agents can undertake (for a review of graph theory, see [21]). The graph is assumed to be both undirected and unweighted. When dealing with real data, we develop a procedure to construct such graphs (see Sec. 3.2), but for the purpose of modelling the agent's behavior, the graph is taken as input to the model. For a theoretical investigation, one can, for instance, study job mobility in a graph sampled from an ensemble of random graphs with features relevant to a research question of interest; for a fully empirical study, one can use a single graph constructed from data for a specific economic system such as a country. Either way, we consider the graph to be static, which is to say, not changing in time. This assumption is in fact driven by our empirical findings (see Sec. 3).
Firms are also characterized by a number of parameters that govern the agent dynamics, and these are also considered constant in time. One of these parameters is the probability $\lambda_i$ that an agent at firm $i$ becomes unemployed at any given time step. While the agent is employed at $i$ it is said to be in state $L_i$, and if it is unemployed with $i$ being its last employer, it is in state $U_i$. Probability $\lambda_i$ corresponds to the rate at which agents at $i$ move from state $L_i$ to state $U_i$. This probability can vary from firm to firm, but any agent employed in a specific firm has the same probability to become unemployed. An equal probability to become unemployed at a given firm is equivalent to having equal average employment time (tenure) for all employees of that firm (see Sec. 2.5).
Another parameter that is associated with a firm $i$ is the probability $v_i$ per time step that it will be accepting applications. This parameter is also assumed to be firm dependent, and it may be interpreted as a combination of a firm's financial strength, need for personnel, aggressiveness of recruitment, etc.
The last parameter that we must define for a firm $i$ is its rate $h_i$ of hiring applicants, i.e., the probability that any individual that applies for a job at $i$ becomes employed. Parameters $h_i$ and $v_i$ play an important role in regulating the size of a firm. In real economies, even though detailed and systematic data is not available to determine $h_i$ and $v_i$, they are sensible parameters that one expects to find associated with firms. We assume their values are in the interval $(0, 1]$ in order to be meaningful in the model.
The behavior of agents is governed by the following rules. First, an agent employed at firm $i$ at time step $t$ either moves to state $U_i$ with probability $\lambda_i$ or remains employed (in state $L_i$) with probability $1 - \lambda_i$. If it remains in $L_i$, it continues onto the next time step $t + 1$. If it moves to $U_i$, it waits one time step and then looks for a job at step $t + 1$. To search for a job, an agent in state $U_i$ identifies all neighbors $j$ that belong to $\Gamma_i$ (the set of neighbors of $i$ in $G$) and that are accepting applications at that time step, where each neighbor $j$ is open with probability $v_j$. The agent then applies to one of those open neighbors with uniform probability. If none of the neighbors is open, the agent does not submit any applications and remains in $U_i$ for an additional time step, when it again tries to find a job. The constraint that agents look for a job only along the edges of the graph leads us to define a firm specific unemployment, which reflects the continued "association" that agents have with their most recent employer. We assume all agents are fully aware of all neighbors that are currently accepting applications.
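The rules above can be sketched as a single simulation step. This is only an illustrative sketch, not the authors' implementation: the function name `step`, the `(firm, employed)` representation of an agent, the toy triangle graph, and all parameter values are hypothetical. It also assumes that each firm draws its open or closed status once per time step, shared by all applicants.

```python
import random

def step(agents, neighbors, lam, v, h, rng):
    """Advance all agents one time step. Each agent is a (firm, employed) pair."""
    # Assumption: each firm draws its open/closed status once per time step.
    is_open = {j: rng.random() < v[j] for j in neighbors}
    updated = []
    for firm, employed in agents:
        if employed:
            # Separation with probability lam[firm]; a newly separated agent
            # only starts searching at the next call to step().
            updated.append((firm, rng.random() >= lam[firm]))
            continue
        # Local search: only open neighbors of the last employer are candidates.
        candidates = [j for j in neighbors[firm] if is_open[j]]
        if candidates:
            target = rng.choice(candidates)  # uniform choice among open neighbors
            if rng.random() < h[target]:
                updated.append((target, True))  # hired at the chosen neighbor
                continue
        updated.append((firm, False))  # no open neighbor, or application rejected
    return updated

# Toy economy: three firms on a triangle; all values are illustrative.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
lam = {0: 0.10, 1: 0.20, 2: 0.05}
v = {0: 0.5, 1: 0.8, 2: 0.3}
h = {0: 0.6, 1: 0.4, 2: 0.9}
rng = random.Random(0)
agents = [(0, True)] * 30
for _ in range(200):
    agents = step(agents, neighbors, lam, v, h, rng)
employed_per_firm = {i: sum(1 for f, e in agents if f == i and e) for i in neighbors}
```

Because agents do not interact, repeating `step` many times and averaging the counts per firm gives a Monte Carlo estimate of the firm-level employment and unemployment that the analytical treatment targets.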
perspective is contained in an upcoming publication [20] by the authors.
With the model defined as above, we calculate analytical solutions for the average numbers of employed and unemployed agents at the firms of the graph, the probability of any agent to be employed or unemployed at a given firm, and provide the recipe for calculating other quantities of the model. Since most economies spend large proportions of time in states of small overall change, we focus on the steady state behavior of the model. This serves as a reasonable starting point for comparing the predictions of our approach to data from the real world.
Our model, at its core, corresponds to a random walk process on graphs in which some of the time scales have been modified by the waiting times that occur both in the employed and unemployed states. Formally, the process is a Markov chain, as the state of the system depends only on the previous time step.
2.2. Evolution of the state of a firm. To begin our detailed study, consider a given connected undirected graph $G$ with $N$ nodes, and $H$ agents distributed among the nodes of the graph (the workforce). We focus on the evolution of the system, captured by the probability distribution $Q_{i,t}(U_{i,t}, L_{i,t})$ of there being $U_{i,t}$ unemployed and $L_{i,t}$ employed agents at $i$ at time $t$, where $U_{i,t}$, $L_{i,t}$ are random variables that can take on values from 0 to $H$. To learn about the steady state of the system, we must first write down the explicit evolution equation, and consider its behavior in the steady state where $Q_{i,t} = Q_i^{(s)}$ for all $t$, i.e., the distribution becomes stationary in time.
To specify the evolution equation of the system, we break down each individual mechanism of the flow process for node $i$ between time steps $t$ and $t + 1$, where the numbers of employed and unemployed agents at $i$ at time $t$ are $L_{i,t}$ and $U_{i,t}$, respectively. Consider first $\Delta_u$, the random variable that represents the number of agents becoming unemployed at a given time step. Because each agent acts independently, $\Delta_u$ has a binomial distribution, i.e.,
$$\Pr(\Delta_u \mid L_{i,t}) = \binom{L_{i,t}}{\Delta_u} \lambda_i^{\Delta_u} (1-\lambda_i)^{L_{i,t}-\Delta_u}. \tag{2.1}$$
Another mechanism affecting the number of employed agents is the acceptance or hiring rate $h_i$ of a firm. The number of new employees depends on both the number of agents that apply for a job at firm $i$, and those that are accepted. Given a number of applicants $A_{i,t}$, the probability to accept $\Delta_l$ of them is also given by a binomial,
$$\Pr(\Delta_l \mid A_{i,t}) = \binom{A_{i,t}}{\Delta_l} h_i^{\Delta_l} (1-h_i)^{A_{i,t}-\Delta_l}. \tag{2.2}$$
The processes related to (2.1) and (2.2) are responsible for the number of agents that are employed at $i$ at time $t + 1$, namely $L_{i,t+1} = L_{i,t} - \Delta_u + \Delta_l$, with probability given by the product of the two binomials above.
From the standpoint of the number of unemployed agents at $t + 1$, $U_{i,t+1}$ depends upon $U_{i,t}$, $\Delta_u$, and the agents in state $U_i$ that find employment elsewhere, which we specify in detail below. For that purpose, we define $\gamma_{i,t}$, the subset of $\Gamma_i$ of neighbors of $i$ that are accepting job applications at time step $t$. The probability to draw any given subset $\gamma_{i,t}$ is given by the joint distribution
$$\Pr(\gamma_{i,t}) = \prod_{j \in \gamma_{i,t}} v_j \prod_{j \in \bar{\gamma}_{i,t}} (1 - v_j),$$
where the set $\bar{\gamma}_{i,t}$ is the complement set of $\gamma_{i,t}$ with respect to $\Gamma_i$, i.e., $\gamma_{i,t} \cup \bar{\gamma}_{i,t} = \Gamma_i$ and $\gamma_{i,t} \cap \bar{\gamma}_{i,t} = \emptyset$. The use of $t$ when referring to any $\gamma_{i,t}$ is not strictly necessary, as the configurations of open neighbors are sampled independently each time step, and thus we drop reference to $t$ for these sets. When at least one neighbor is accepting applications, the probability for any agent to apply to a specific open neighbor of $i$ is equal to $1/|\gamma_i|$. Therefore, job applications are distributed among $\gamma_i$ according to a multinomial distribution. Given $U_{i,t}$ unemployed agents, with $\nu_{ij}$ applying to neighbor $j \in \gamma_i$, and using the symbol $\nu_i$ to represent the entire application allocation to all nodes in $\gamma_i$, the distribution of applications to the neighbors is given by
$$\Pr(\nu_i \mid U_{i,t}, \gamma_i) = \binom{U_{i,t}}{\nu_i} \left(\frac{1}{|\gamma_i|}\right)^{U_{i,t}}, \qquad \sum_{j \in \gamma_i} \nu_{ij} = U_{i,t},$$
where we have used a shorthand notation for the multinomial coefficient given by
$$\binom{U_{i,t}}{\nu_i} := \frac{U_{i,t}!}{\nu_{ij_1}! \cdots \nu_{ij_{|\gamma_i|}}!},$$
with $j_1, \dots, j_{|\gamma_i|}$ the elements of $\gamma_i$. Given an acceptance rate of $h_j$ for neighbor $j$, $\eta_{ij}$ agents are hired at $j$ out of the $\nu_{ij}$ that apply, and this random variable is also distributed in binomial fashion,
$$\Pr(\eta_{ij} \mid \nu_{ij}) = \binom{\nu_{ij}}{\eta_{ij}} h_j^{\eta_{ij}} (1 - h_j)^{\nu_{ij} - \eta_{ij}}.$$
Altogether, representing the total accepted applications by $\eta_i := (\eta_{ij_1}, \dots, \eta_{ij_{|\gamma_i|}})$, the probability for those acceptances is
$$\Pr(\eta_i \mid \nu_i, \gamma_i) = \prod_{j \in \gamma_i} \Pr(\eta_{ij} \mid \nu_{ij}).$$
If in a given time step all neighbors are closed to new applicants then, by construction, $\nu_{ij} = 0$ for all $j$, and similarly for $\eta_{ij}$. Symbolically, $\gamma_i = \emptyset$ and $\bar{\gamma}_i = \Gamma_i$, and this occurs with probability $\prod_{j \in \Gamma_i} (1 - v_j)$. In this case,
$$U_{i,t+1} = U_{i,t} + \Delta_u$$
when all neighbors are closed. Let $|\eta_i|$ represent the total number of agents accepted into other positions, given by $|\eta_i| = \sum_{j \in \gamma_i} \eta_{ij}$.
Then the number of agents in state $U_i$ at time $t + 1$ is given by
$$U_{i,t+1} = U_{i,t} + \Delta_u - |\eta_i|.$$
To summarize the evolution, we must collect all the previous mechanisms, summing over all possible $\gamma_i$, $A_{i,t}$, $\Delta_u$, $\Delta_l$, $\nu_i$, $\eta_i$; in addition, since there are multiple states at time $t$ compatible with a given state at time $t + 1$, one must also sum over $U_{i,t}$, $L_{i,t}$. Writing a single summation symbol for the previous variables, the full expression for the evolution of $Q_{i,t}$ is given by Eq. (2.8) (omitting the conditionals on the distributions), where $\delta[\gamma_i, \emptyset] = 1$ only when $\gamma_i = \emptyset$ and 0 otherwise, and similarly $\delta[\nu_i, 0] = 1$ only when all $\nu_{ij} = 0$ and 0 otherwise. The use of $|\eta_i|$ in both terms of the brackets is a shorthand for the fact that in order to have a net outflow of agents equal to $|\eta_i|$, one must take all possible combinations of $\{\eta_{ij}\}_{j \in \gamma_i}$ for given $\gamma_i$, and keep those for which the overall flow is $|\eta_i|$; in other words, we are implicitly using an additional sum over $\{\eta_{ij}\}_{j \in \gamma_i}$ restricted to $\sum_{j \in \gamma_i} \eta_{ij} = |\eta_i|$.
It is convenient to employ the generating function formalism [22] for calculating moments of the distribution. By definition, the generating function of $Q_{i,t}$ is $\sum_{U_{i,t}, L_{i,t}} Q_{i,t}(U_{i,t}, L_{i,t})\, x^{U_{i,t}} y^{L_{i,t}}$, and similarly for $Q_{i,t+1}$. Using this definition on (2.8) applied to time $t + 1$, one obtains the relation (2.10), where $\phi$ is the generating function associated with the distribution $\Pr(A_{i,t})$, $h_{\gamma_i} := \sum_{j \in \gamma_i} h_j / |\gamma_i|$, and the notation $\sum_{\gamma_i \neq \emptyset}$ means that the sum runs over all possible configurations $\{\gamma_i\}$ of open neighbors of $i$ except for the case when all neighbors are not accepting applications.
The previous results can be specialized to the steady state, where $Q_{i,t} \to Q_i^{(s)}$ is independent of time. For now, we assume that this steady state exists and determine some of the statistical properties of the process such as the number of employed and unemployed agents; the existence of a steady state solution is shown later (Secs. 2.3 and 2.4).
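The generating function (2.10) can be used to calculate moments of $Q_{i,t}$, although the algebra can be cumbersome for higher moments. For the average unemployment associated with firm $i$, obtained from the derivative of the generating function with respect to $x$ at $x = y = 1$, we have
$$\langle U_{i,t} \rangle = \sum_{U_{i,t}, L_{i,t}} U_{i,t}\, Q_{i,t}(U_{i,t}, L_{i,t}), \tag{2.11}$$
and for the average employment, obtained from the derivative with respect to $y$,
$$\langle L_{i,t} \rangle = \sum_{U_{i,t}, L_{i,t}} L_{i,t}\, Q_{i,t}(U_{i,t}, L_{i,t}). \tag{2.12}$$
By substituting the steady state distribution $Q_i^{(s)}$ on both sides of (2.10), and using the chain rule when taking derivatives with respect to $x$ and $y$, we obtain from (2.11) and (2.12)
$$\lambda_i \langle L_i \rangle = \langle U_i \rangle \sum_{\gamma_i \neq \emptyset} h_{\gamma_i} \Pr(\gamma_i), \tag{2.13}$$
where we drop $t$ since we are in the steady state, and the sum is over all possible $\gamma_i$ except $\gamma_i = \emptyset$. Note that this expression indicates how average employment and unemployment relate to each other, but does not provide a solution that is solely based on the basic parameters of the problem. To construct a full solution, we must analyze in detail the flows of agents in the system, which we proceed to tackle next.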
2.3. Average employment and unemployment of a firm. In order to make progress, we study the full distribution $\Pr(|\eta_i|)$ of outgoing agents from firm $i$. Let us recall that the distribution of outgoing application allocations is governed by $\Pr(\nu_i \mid U_{i,t}, \gamma_i)$ and the hirings by $\Pr(\eta_i \mid \nu_i, \gamma_i)$. Furthermore, the overall flow is also dependent on $\gamma_i$ and $U_{i,t}$ (through $Q_{i,t}(U_{i,t}, L_{i,t})$). We must also keep in mind that $\gamma_i$ can be the empty set when no neighbors are receiving applicants. Therefore, summing over $U_{i,t}$, $L_{i,t}$, $\nu_i$ and $\{\gamma_i\}$ (the set of all possible configurations of open and closed neighbors to $i$), we have Eq. (2.14), where we have kept the conditionals to avoid confusion. The corresponding generating function $\psi(x)$ for $\Pr(|\eta_i|)$ is given by Eq. (2.15). Since $\psi(x) = \sum_{|\eta_i|} \Pr(|\eta_i|)\, x^{|\eta_i|}$, the sum over $L_{i,t}$ remains expressed since there is no additional variable $y$ that sums over the second argument of $Q_{i,t}(U_{i,t}, L_{i,t})$. Despite this, we still use the generating function notation for $Q_{i,t}$ to represent the generating function summed only over $U_{i,t}$. In the steady state, the average outflow is given by the first derivative $d\psi/dx$ evaluated at $x = 1$, which produces
$$\langle |\eta_i| \rangle = \langle U_i \rangle \sum_{\gamma_i \neq \emptyset} h_{\gamma_i} \Pr(\gamma_i), \tag{2.16}$$
and with the use of (2.13),
$$\langle |\eta_i| \rangle = \lambda_i \langle L_i \rangle, \tag{2.17}$$
which is intuitively sound, as the number of agents that become unemployed and look for jobs is on average $\lambda_i \langle L_i \rangle$ and therefore they must flow elsewhere for the steady state to be achieved. A similar calculation leads to the average steady state agent flow along a particular edge, which is
$$\langle \eta_{ij} \rangle = \langle U_i \rangle\, h_j \sum_{\gamma_i \ni j} \frac{\Pr(\gamma_i)}{|\gamma_i|}, \tag{2.18}$$
where the sum runs over all configurations $\gamma_i$ that contain $j$. The steady state condition is satisfied if the average flows into and out of a node (firm) are equal. This implies
$$\sum_{j \in \Gamma_i} \langle \eta_{ji} \rangle = \langle |\eta_i| \rangle.$$
Using (2.13), (2.17), and (2.18), one can restate this as
$$\lambda_i \langle L_i \rangle = h_i \sum_{j \in \Gamma_i} \lambda_j \langle L_j \rangle\, \frac{\sum_{\gamma_j \ni i} \Pr(\gamma_j)/|\gamma_j|}{\sum_{\gamma_j \neq \emptyset} h_{\gamma_j} \Pr(\gamma_j)}. \tag{2.20}$$
This expression provides a system of equations that can in principle be solved for all $\langle L_i \rangle$, provided such a solution exists.
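To understand this further, we write (2.20) in matrix form making use of the adjacency matrix of the graph, $A$, for which $A_{ij} = A_{ji} = 1$ if $i$ and $j$ have an edge connecting them, and zero otherwise. This produces the expression
$$\sum_{j=1}^{N} \left[ A_{ij}\, h_i\, \frac{\sum_{\{\gamma_j^{(i)}\}} \Pr(\gamma_j^{(i)})/|\gamma_j^{(i)}|}{\sum_{\gamma_j \neq \emptyset} h_{\gamma_j} \Pr(\gamma_j)} - \delta_{ij} \right] \lambda_j \langle L_j \rangle = 0$$
for all $i$, where $\{\gamma_j^{(i)}\}$ denotes the set of configurations of open neighbors of $j$ that contain $i$. This represents a homogeneous system of linear equations, which always has the trivial null solution, and has non-trivial solutions if and only if the matrix contained inside brackets is singular, i.e., does not have full rank [23]. To show that our model indeed has non-trivial solutions, we define the matrix $\Lambda$, with element $\Lambda_{ij}$ corresponding to the expression inside brackets,
$$\Lambda_{ij} := A_{ij}\, h_i\, \frac{\sum_{\{\gamma_j^{(i)}\}} \Pr(\gamma_j^{(i)})/|\gamma_j^{(i)}|}{\sum_{\gamma_j \neq \emptyset} h_{\gamma_j} \Pr(\gamma_j)} - \delta_{ij}. \tag{2.22}$$
This matrix does not possess full rank, as can be explicitly seen from the fact that all columns add to zero. To show this, we first sum $\Lambda_{ij}$ over $i$,
$$\sum_{i=1}^{N} \Lambda_{ij} = \frac{\sum_{i} A_{ij}\, h_i \sum_{\{\gamma_j^{(i)}\}} \Pr(\gamma_j^{(i)})/|\gamma_j^{(i)}|}{\sum_{\gamma_j \neq \emptyset} h_{\gamma_j} \Pr(\gamma_j)} - 1. \tag{2.23}$$
We can now show that the numerator and denominator of the fraction are indeed equal. To see this in detail, we organize the configurations $\gamma_j^{(i)}$ by their size $c = |\gamma_j^{(i)}|$, and rewrite the numerator as
$$\sum_{i} A_{ij}\, h_i \sum_{c=1}^{k_j} \frac{1}{c} \sum_{|\gamma_j^{(i)}| = c} \Pr(\gamma_j^{(i)}),$$
where $k_j = |\Gamma_j|$ and the last sum is over all elements of $\{\gamma_j^{(i)}\}$ with equal size $c$. Now, the sum over $i$ guarantees that each neighbor of $j$ belonging to a particular configuration is summed, along with the corresponding $h_r$, where $r$ is a member of that configuration. Therefore, the sum over $i$ can be rewritten as
$$\sum_{i} A_{ij}\, h_i \sum_{|\gamma_j^{(i)}| = c} \Pr(\gamma_j^{(i)}) = \sum_{|\gamma_j| = c} \Pr(\gamma_j) \sum_{r \in \gamma_j} h_r,$$
and inserting this into the sum over $c$ leads to
$$\sum_{c=1}^{k_j} \frac{1}{c} \sum_{|\gamma_j| = c} \Pr(\gamma_j) \sum_{r \in \gamma_j} h_r = \sum_{\gamma_j \neq \emptyset} h_{\gamma_j} \Pr(\gamma_j).$$
Therefore,
$$\sum_{i} A_{ij}\, h_i \sum_{\{\gamma_j^{(i)}\}} \frac{\Pr(\gamma_j^{(i)})}{|\gamma_j^{(i)}|} = \sum_{\gamma_j \neq \emptyset} h_{\gamma_j} \Pr(\gamma_j),$$
which means that for all $j$, (2.23) is identically zero.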
The fact that $\Lambda$ has reduced rank can also be seen from (2.16) and (2.18), which imply
$$\Lambda_{ij} = \frac{\langle \eta_{ji} \rangle}{\langle |\eta_j| \rangle} - \delta_{ij}, \tag{2.28}$$
and because, by definition, $\langle |\eta_j| \rangle = \sum_{i} \langle \eta_{ji} \rangle$, one arrives at $\sum_{i=1}^{N} \Lambda_{ij} = 0$ as before. But matrix $\Lambda$, as expressed in (2.28), manifestly represents the Laplacian matrix of a random walk with heterogeneous transition probabilities on the edges of the graph, a well-understood process [21]. Such walks are known to be ergodic, and their convergence rate can be calculated through the spectral properties of $G$.
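The zero column sums can be checked numerically on a small graph. The sketch below assumes the form $\Lambda_{ij} = A_{ij} h_i \big[\sum_{\gamma_j \ni i} \Pr(\gamma_j)/|\gamma_j|\big] / \big[\sum_{\gamma_j \neq \emptyset} h_{\gamma_j}\Pr(\gamma_j)\big] - \delta_{ij}$, as read from the surrounding text, and enumerates all configurations of open neighbors by brute force. The function names, the four-node graph, and the parameter values are hypothetical.

```python
from itertools import combinations

def open_set_probability(open_set, all_neighbors, v):
    """Pr(gamma): product of v_j over open neighbors and (1 - v_j) over closed ones."""
    p = 1.0
    for j in all_neighbors:
        p *= v[j] if j in open_set else 1.0 - v[j]
    return p

def build_lambda(neighbors, v, h):
    """Assemble Lambda[i][j] by enumerating every non-empty open set of each node j."""
    n = len(neighbors)
    matrix = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xi_j = 0.0                               # sum of h_gamma * Pr(gamma)
        share = {i: 0.0 for i in neighbors[j]}   # sum of Pr(gamma)/|gamma| over gamma containing i
        for size in range(1, len(neighbors[j]) + 1):
            for gamma in combinations(neighbors[j], size):
                p = open_set_probability(gamma, neighbors[j], v)
                xi_j += p * sum(h[r] for r in gamma) / size
                for i in gamma:
                    share[i] += p / size
        for i in neighbors[j]:
            matrix[i][j] = h[i] * share[i] / xi_j
        matrix[j][j] -= 1.0
    return matrix

# Illustrative undirected graph with edges 0-1, 0-2, 1-2, 1-3.
neighbors = [[1, 2], [0, 2, 3], [0, 1], [1]]
v = [0.3, 0.6, 0.9, 0.5]
h = [0.2, 0.7, 0.4, 1.0]
Lam = build_lambda(neighbors, v, h)
column_sums = [sum(Lam[i][j] for i in range(len(Lam))) for j in range(len(Lam))]
```

Each entry of `column_sums` vanishes up to floating-point error, which is the numerical counterpart of the rank argument. The enumeration cost grows as $2^{k_j}$, so this is only practical for nodes of small degree.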
To develop the rest of the theory, we focus on graphs with a single connected component containing all $N$ nodes (a connected graph), and explain the more general case below (see Sec. 2.6). Defining the column matrix
$$X := (\lambda_1 \langle L_1 \rangle, \dots, \lambda_N \langle L_N \rangle)^{T}$$
for the average employment in the firms of the system, one obtains the homogeneous system of equations
$$\Lambda X = 0, \tag{2.30}$$
where the right hand side is the column matrix of dimension $N \times 1$ of zeros. The non-trivial solutions to this system, if they exist, depend on $\Lambda$ being singular, which is valid in our case. Since the matrix for a connected graph has rank $N - 1$, its kernel is one-dimensional, and thus, to choose a unique solution that belongs to the kernel of $\Lambda$ one needs a single additional condition. In our case, this condition corresponds to the total number of agents $H$ in the system, i.e.,
$$\sum_{i=1}^{N} \left( \langle L_i \rangle + \langle U_i \rangle \right) = H. \tag{2.31}$$
Application of (2.31), as illustrated below, leads to the desired unique solution.
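Solving (2.30) and (2.31) in the general case does not produce compact solutions. However, it is possible to obtain some explicit solutions for simple cases, such as when the probability that a firm is open to hire is homogeneous over all nodes ($v_j = v$ for all $j$). Explicitly, note that in the homogeneous case
$$\Pr(\gamma_j) = v^{|\gamma_j|} (1 - v)^{|\Gamma_j| - |\gamma_j|}.$$
It is common in the networks and graph theory literature to use the notation $k_j = |\Gamma_j|$, and refer to $k_j$ as the degree of node $j$. Then,
$$\sum_{\gamma_j \ni i} \frac{\Pr(\gamma_j)}{|\gamma_j|} = \sum_{c=1}^{k_j} \binom{k_j - 1}{c - 1} \frac{v^{c} (1 - v)^{k_j - c}}{c} = \frac{1 - (1 - v)^{k_j}}{k_j}.$$
For the sum $\sum_{\gamma_j \neq \emptyset} h_{\gamma_j} \Pr(\gamma_j)$, we note that each acceptance rate $h_i$ for $i \in \Gamma_j$ appears $\binom{k_j - 1}{|\gamma_j| - 1}$ times among all the terms where there are $|\gamma_j|$ open neighbors to $j$. One can then write in the homogeneous case
$$\sum_{\gamma_j \neq \emptyset} h_{\gamma_j} \Pr(\gamma_j) = \langle h \rangle_{\Gamma_j} \left[ 1 - (1 - v)^{k_j} \right], \tag{2.33}$$
where $\langle h \rangle_{\Gamma_j} := \sum_{i \in \Gamma_j} h_i / k_j$, i.e., the average hiring rate of the full neighbor set of $j$.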
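As a quick numerical check of (2.33), the sum over non-empty configurations can be enumerated directly and compared with the closed form $\langle h \rangle_{\Gamma_j}[1 - (1 - v)^{k_j}]$, the expression that the counting argument above yields. The function names and the sample hiring rates below are hypothetical.

```python
from itertools import combinations
from math import isclose

def xi_by_enumeration(h_neighbors, v):
    """Sum of h_gamma * Pr(gamma) over all non-empty open sets, with homogeneous v."""
    k = len(h_neighbors)
    total = 0.0
    for c in range(1, k + 1):
        weight = v ** c * (1 - v) ** (k - c)   # Pr(gamma) for any set of size c
        for subset in combinations(h_neighbors, c):
            total += weight * sum(subset) / c   # h_gamma is the mean hiring rate in gamma
    return total

def xi_closed_form(h_neighbors, v):
    """Average neighbor hiring rate times the probability that some neighbor is open."""
    k = len(h_neighbors)
    return (sum(h_neighbors) / k) * (1 - (1 - v) ** k)

h_neighbors = [0.2, 0.7, 0.4, 1.0, 0.55]   # illustrative hiring rates of the neighbors of j
enumerated = xi_by_enumeration(h_neighbors, v=0.35)
closed = xi_closed_form(h_neighbors, v=0.35)
```

The two values agree to floating-point precision, which reflects the identity $\frac{k}{c}\binom{k-1}{c-1} = \binom{k}{c}$ used implicitly in the counting argument.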
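In this case, the matrix $\Lambda$ takes on the form
$$\Lambda^{(v)}_{ij} = \frac{A_{ij}\, h_i}{k_j \langle h \rangle_{\Gamma_j}} - \delta_{ij},$$
and in the very simple example where all $h_i$ are equal, $\Lambda$ is equal to the usual normalized Laplacian for random walks on an unweighted graph. To refer to this model, we introduce the superscript $(v)$ as a reminder that this quantity is now constant. By inspection, we can find a solution for $X$, which provides
$$\langle L_i \rangle^{(v)} = \rho\, \frac{h_i \langle h \rangle_{\Gamma_i} k_i}{\lambda_i}, \tag{2.35}$$
$$\langle U_i \rangle^{(v)} = \rho\, \frac{h_i k_i}{1 - (1 - v)^{k_i}}, \tag{2.36}$$
where $\rho$ is a constant that can be obtained by imposing (2.31), and is given by
$$\rho = \frac{H}{\sum_{i} h_i \langle h \rangle_{\Gamma_i} k_i \left[ \dfrac{1}{\lambda_i} + \dfrac{1}{\langle h \rangle_{\Gamma_i} \left( 1 - (1 - v)^{k_i} \right)} \right]}. \tag{2.37}$$
This quantity has an intuitive interpretation, in that it captures the average flow rate of workers all through the system.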
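The homogeneous solution can be verified numerically. The sketch below assumes the closed forms $\langle L_i \rangle = \rho\, h_i \langle h \rangle_{\Gamma_i} k_i / \lambda_i$ and $\langle U_i \rangle = \rho\, h_i \langle h \rangle_{\Gamma_i} k_i / \xi_i$ with $\xi_i = \langle h \rangle_{\Gamma_i}[1 - (1 - v)^{k_i}]$, as read from the surrounding text. It checks that they satisfy both the normalization to $H$ agents and the flow balance $X_i = h_i \sum_{j \in \Gamma_i} X_j / (k_j \langle h \rangle_{\Gamma_j})$. Function names, graph, and parameter values are hypothetical.

```python
def neighbor_stats(neighbors, h):
    """Degrees k_i and average hiring rate over the neighbors of each node."""
    k = [len(nbrs) for nbrs in neighbors]
    h_nbr = [sum(h[j] for j in nbrs) / len(nbrs) for nbrs in neighbors]
    return k, h_nbr

def homogeneous_steady_state(neighbors, h, lam, v, H):
    """Closed-form average employment L and unemployment U when v_i = v for all firms."""
    n = len(neighbors)
    k, h_nbr = neighbor_stats(neighbors, h)
    xi = [h_nbr[i] * (1 - (1 - v) ** k[i]) for i in range(n)]
    weight = [h[i] * h_nbr[i] * k[i] for i in range(n)]
    rho = H / sum(weight[i] * (1 / lam[i] + 1 / xi[i]) for i in range(n))
    L = [rho * weight[i] / lam[i] for i in range(n)]
    U = [rho * weight[i] / xi[i] for i in range(n)]
    return rho, L, U

def balance_residuals(neighbors, h, lam, L):
    """Inflow minus outflow for each firm, using X_i = lam_i * L_i."""
    n = len(neighbors)
    k, h_nbr = neighbor_stats(neighbors, h)
    X = [lam[i] * L[i] for i in range(n)]
    return [h[i] * sum(X[j] / (k[j] * h_nbr[j]) for j in neighbors[i]) - X[i]
            for i in range(n)]

# Illustrative undirected graph with edges 0-1, 0-2, 1-2, 1-3.
neighbors = [[1, 2], [0, 2, 3], [0, 1], [1]]
h = [0.2, 0.7, 0.4, 1.0]
lam = [0.10, 0.20, 0.05, 0.15]
rho, L, U = homogeneous_steady_state(neighbors, h, lam, v=0.5, H=1000)
residuals = balance_residuals(neighbors, h, lam, L)
```

In this sketch the firm sizes scale with $h_i \langle h \rangle_{\Gamma_i} k_i / \lambda_i$, so a firm that is well connected, hires readily, and separates slowly holds more workers.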
2.4. The agent perspective. The results from the previous section were derived with the population of agents in mind. In that context we showed that there are non-trivial solutions for $\langle L_i \rangle$ in the model, and derived the equation that describes the system.
An alternative approach to the solution of the model is to consider the single agent perspective. This approach is a valid alternative to solve the model because agents are non-interacting, and therefore the dynamics of any one of them are sufficient to rederive the results above. In this section, we elaborate on this approach still within the context of connected graphs.
Taking the view of an individual agent, it is convenient to define the probabilities $r(i, t)$ and $s(i, t)$ that the agent would be, respectively, employed or unemployed at node $i$ at time $t$. These two probabilities, explained in detail below, satisfy the equations
$$r(i, t) = (1 - \lambda_i)\, r(i, t - 1) + h_i \sum_{j \in \Gamma_i} s(j, t - 1) \sum_{\gamma_j \ni i} \frac{\Pr(\gamma_j)}{|\gamma_j|},$$
$$s(i, t) = \lambda_i\, r(i, t - 1) + s(i, t - 1) \left[ \prod_{j \in \Gamma_i} (1 - v_j) + \sum_{\gamma_i \neq \emptyset} (1 - h_{\gamma_i}) \Pr(\gamma_i) \right],$$
where the square brackets of the second equation can be simplified to $1 - \sum_{\gamma_i \neq \emptyset} h_{\gamma_i} \Pr(\gamma_i)$. The first equation states that the probability for an agent to be employed at node $i$ at time $t$ is given by the probability to be employed at node $i$ at time $t - 1$ and not become unemployed, plus the probability that the agent is unemployed at one of the neighbors of $i$, that $i$ is accepting applications, that the agent chooses to apply to $i$, and that the application by the agent leads to being hired. The second equation states that the probability to be unemployed at $i$ at time $t$ is given by the probability to be employed at $i$ at time $t - 1$ and be separated with probability $\lambda_i$, or to have been unemployed at time $t - 1$ at $i$ but not find a job among the neighbors of $i$, either because they are all closed, or because the agent chooses to apply to one of the neighbors and is not hired.
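The previous results lead to a set of difference equations that can be written as a matrix equation with block structure. In the steady state, this matrix equation is simplified because the conditions $r(i, t) - r(i, t - 1) = 0$ and $s(i, t) - s(i, t - 1) = 0$ are satisfied. Given that in the steady state $r$ and $s$ no longer depend on time, we write the equations for $r(i, t) \to r_\infty(i)$ and $s(i, t) \to s_\infty(i)$ in the steady state,
$$\lambda_i\, r_\infty(i) = h_i \sum_{j \in \Gamma_i} s_\infty(j) \sum_{\gamma_j \ni i} \frac{\Pr(\gamma_j)}{|\gamma_j|}, \tag{2.40}$$
$$\lambda_i\, r_\infty(i) = s_\infty(i) \sum_{\gamma_i \neq \emptyset} h_{\gamma_i} \Pr(\gamma_i), \tag{2.41}$$
which can be solved by first expressing $s_\infty(i)$ in terms of $r_\infty(i)$, and substituting into (2.40) to produce
$$\sum_{j=1}^{N} \left[ A_{ij}\, h_i\, \frac{\sum_{\gamma_j \ni i} \Pr(\gamma_j)/|\gamma_j|}{\sum_{\gamma_j \neq \emptyset} h_{\gamma_j} \Pr(\gamma_j)} - \delta_{ij} \right] \lambda_j\, r_\infty(j) = 0.$$
The matrix in brackets is simply $\Lambda$ defined in (2.22). As we have seen, the matrix does not have full rank, guaranteeing the existence of non-trivial solutions. The steady state with homogeneous probability $v_i = v$ for firms to be open leads to solutions similar to those above for the entire population of agents, but with a different $\rho$ which we relabel as $\chi$, i.e.,
$$r_\infty(i) = \chi\, \frac{h_i \langle h \rangle_{\Gamma_i} k_i}{\lambda_i}, \qquad s_\infty(i) = \chi\, \frac{h_i k_i}{1 - (1 - v)^{k_i}},$$
$$\chi = \frac{1}{\sum_{i} h_i \langle h \rangle_{\Gamma_i} k_i \left[ \dfrac{1}{\lambda_i} + \dfrac{1}{\langle h \rangle_{\Gamma_i} \left( 1 - (1 - v)^{k_i} \right)} \right]}, \tag{2.46}$$
where $\langle h \rangle_{\Gamma_i} := \sum_{j \in \Gamma_i} h_j / k_i$ and the normalization condition is $\sum_i [r(i) + s(i)] = 1$ (independent of the steady state or the condition $v_i = v$).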
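The single-agent equations can also be iterated numerically until they stop changing. The sketch below assumes the update rules $r(i,t) = (1-\lambda_i)\,r(i,t-1) + h_i \sum_{j \in \Gamma_i} s(j,t-1) \sum_{\gamma_j \ni i} \Pr(\gamma_j)/|\gamma_j|$ and $s(i,t) = \lambda_i\,r(i,t-1) + (1-\xi_i)\,s(i,t-1)$, with $\xi_i = \sum_{\gamma_i \neq \emptyset} h_{\gamma_i}\Pr(\gamma_i)$, as read from the verbal description of the two equations. Function names, the graph, the parameter values, and the fixed number of iterations are hypothetical choices.

```python
from itertools import combinations

def search_terms(neighbors, v, h):
    """xi[j] and share[j][i] = sum of Pr(gamma)/|gamma| over open sets of j containing i."""
    xi, share = [], []
    for nbrs in neighbors:
        xi_j, share_j = 0.0, {i: 0.0 for i in nbrs}
        for size in range(1, len(nbrs) + 1):
            for gamma in combinations(nbrs, size):
                p = 1.0
                for r in nbrs:
                    p *= v[r] if r in gamma else 1.0 - v[r]
                xi_j += p * sum(h[r] for r in gamma) / size
                for i in gamma:
                    share_j[i] += p / size
        xi.append(xi_j)
        share.append(share_j)
    return xi, share

def iterate_agent(neighbors, lam, v, h, steps=5000):
    """Iterate the employed (r) and unemployed (s) probabilities of a single agent."""
    n = len(neighbors)
    xi, share = search_terms(neighbors, v, h)
    r = [1.0 / n] * n   # start employed and spread uniformly over firms
    s = [0.0] * n
    for _ in range(steps):
        r_new = [(1 - lam[i]) * r[i]
                 + h[i] * sum(s[j] * share[j][i] for j in neighbors[i])
                 for i in range(n)]
        s_new = [lam[i] * r[i] + (1 - xi[i]) * s[i] for i in range(n)]
        r, s = r_new, s_new   # both updates use the previous time step
    return r, s, xi

# Illustrative undirected graph with edges 0-1, 0-2, 1-2, 1-3.
neighbors = [[1, 2], [0, 2, 3], [0, 1], [1]]
lam = [0.10, 0.20, 0.05, 0.15]
v = [0.3, 0.6, 0.9, 0.5]
h = [0.2, 0.7, 0.4, 1.0]
r_inf, s_inf, xi = iterate_agent(neighbors, lam, v, h)
```

At convergence the iterates satisfy $\lambda_i r_\infty(i) = \xi_i s_\infty(i)$ for every firm, and the total probability stays equal to one at every step because each update only moves probability between states.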
Once $r_\infty(i)$ and $s_\infty(i)$ have been determined for the model of interest (homogeneous or heterogeneous $h$, $v$, etc.), the number of employed and unemployed agents at firm $i$ can then be computed via
$$\langle L_i \rangle = H\, r_\infty(i), \tag{2.47}$$
$$\langle U_i \rangle = H\, s_\infty(i). \tag{2.48}$$
These expressions reproduce the results presented in the previous sections, and can also be used to calculate higher moments of the distributions on the basis of the steady state distributions for a single agent. For instance, the variance for $L_i$ and $U_i$ can be calculated via well-known expressions for binomial distributions, yielding
$$\mathrm{Var}(L_i) = H\, r_\infty(i) \left[ 1 - r_\infty(i) \right], \tag{2.49}$$
$$\mathrm{Var}(U_i) = H\, s_\infty(i) \left[ 1 - s_\infty(i) \right]. \tag{2.50}$$
From the practical standpoint, if $r_\infty(i)$, $s_\infty(i)$ need to be estimated (say numerically), then (2.47), (2.48), (2.49), (2.50), and other quantities that can be calculated as functions of $r_\infty(i)$, $s_\infty(i)$ become particularly useful, because it is no longer necessary to solve (2.30) and (2.31) directly, which could be demanding for very large economies. Instead, estimates of $r_\infty(i)$, $s_\infty(i)$ could be utilized to arrive at meaningful results.
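A minimal sketch of this step, assuming the binomial forms mean $= H p$ and variance $= H p (1 - p)$ with $p = r_\infty(i)$ or $p = s_\infty(i)$, which follow from treating the $H$ agents as independent. The function name `firm_moments` and the sample probabilities are hypothetical.

```python
def firm_moments(H, r_inf, s_inf):
    """Binomial mean and variance of employed (L) and unemployed (U) counts per firm."""
    return [
        {"mean_L": H * r, "var_L": H * r * (1 - r),
         "mean_U": H * s, "var_U": H * s * (1 - s)}
        for r, s in zip(r_inf, s_inf)
    ]

# Illustrative single-agent steady state for two firms (probabilities sum to one).
moments = firm_moments(H=1000, r_inf=[0.5, 0.2], s_inf=[0.2, 0.1])
```

The relative fluctuation $\sqrt{\mathrm{Var}(L_i)}/\langle L_i \rangle$ scales as $1/\sqrt{H\, r_\infty(i)}$, so small firms are expected to show proportionally larger fluctuations in size than large ones.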
2.5. Employment tenure and unemployment spells. In our model, an agent employed in firm $i$ has a probability $\lambda_i$ to be separated per time step, so job separation is characterized by a geometric distribution. Therefore, the distribution $\Pr(t^{(l)})$ of employment duration $t^{(l)}$ (also known as job tenure) is given by
$$\Pr(t^{(l)}) = \lambda_i (1 - \lambda_i)^{t^{(l)} - 1}.$$
The average time of employment in firm $i$ is given by
$$\langle t^{(l)} \rangle = \frac{1}{\lambda_i}.$$
A similar calculation provides us with the duration $t^{(u)}$ of unemployment spells. In particular, the probability for an agent to find a job among the neighbors of firm $i$ (i.e., the effective rate of hiring) depends on $\xi_i := \sum_{\gamma_i \neq \emptyset} h_{\gamma_i} \Pr(\gamma_i)$, and therefore, the distribution of unemployment spells is given by
$$\Pr(t^{(u)}) = \xi_i (1 - \xi_i)^{t^{(u)} - 1}.$$
In the case of homogeneous probability for firms to accept applications ($v_i = v$ for all $i$), unemployment spells are characterized by
$$\xi_i^{(v)} = \langle h \rangle_{\Gamma_i} \left[ 1 - (1 - v)^{k_i} \right].$$
This allows us to rewrite (2.36) and (2.45) in the more intuitive forms
$$\langle U_i \rangle^{(v)} = \rho\, \frac{h_i \langle h \rangle_{\Gamma_i} k_i}{\xi_i^{(v)}}, \tag{2.55}$$
$$s_\infty(i) = \chi\, \frac{h_i \langle h \rangle_{\Gamma_i} k_i}{\xi_i^{(v)}},$$
which also exposes the symmetrical nature of the values for average employment and unemployment as we see in detail in (2.57) below.
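The mean of a geometric duration can be confirmed by summing the distribution directly. This is a small sketch assuming the form $\Pr(t) = q (1 - q)^{t-1}$ for $t = 1, 2, \dots$, where $q$ stands for either $\lambda_i$ or $\xi_i$. The function names and the truncation point `t_max` are hypothetical.

```python
def geometric_pmf(rate, t):
    """Probability that a spell lasts exactly t time steps (t = 1, 2, ...)."""
    return rate * (1 - rate) ** (t - 1)

def mean_duration(rate, t_max=10_000):
    """Mean spell length from a truncated sum; the neglected tail is negligible here."""
    return sum(t * geometric_pmf(rate, t) for t in range(1, t_max + 1))

mean_tenure = mean_duration(0.2)    # separation rate lambda_i = 0.2, mean close to 5
mean_spell = mean_duration(0.25)    # effective hiring rate xi_i = 0.25, mean close to 4
```

Reading this in reverse is how the parameters would be calibrated: measured average tenure and average unemployment duration at a firm give estimates of $1/\lambda_i$ and $1/\xi_i$.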
The characteristic times calculated above help us provide an intuitive understanding of (2.37) and (2.46). Specifically, note that the total time spent by an agent who is employed for $t^{(l)}$ time steps and subsequently unemployed for $t^{(u)}$ time steps is distributed as the convolution of the two geometric distributions above, and the average of this total time is $1/\lambda_i + 1/\xi_i$. From this, one realizes that the terms in brackets in the denominators of (2.37) and (2.46) correspond to the average duration of employment plus unemployment of agents at firm i. The factor $h_i \langle h \rangle_{\Gamma_i} k_i$ corresponds to the probability to enter and exit each edge connected to i. Therefore, ρ measures the amount of overall job mobility in the entire economy, and χ corresponds to the per-agent ρ.
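The characteristic times discussed above can be checked numerically. The sketch below assumes geometric spell lengths starting at one time step, and uses hypothetical rates lam and xi; it sums the distribution directly rather than relying on the closed-form mean.

```python
def geometric_pmf(t, rate):
    """Probability that a spell lasts exactly t >= 1 time steps."""
    return rate * (1.0 - rate) ** (t - 1)


def mean_duration(rate, t_max=10000):
    """Numerical mean spell length, truncated at t_max time steps."""
    return sum(t * geometric_pmf(t, rate) for t in range(1, t_max + 1))


lam = 0.02  # hypothetical separation rate per time step
xi = 0.05   # hypothetical effective hiring rate per time step

mean_tenure = mean_duration(lam)   # approximately 1 / lam
mean_spell = mean_duration(xi)     # approximately 1 / xi
cycle = mean_tenure + mean_spell   # mean of the convolution of both spells
```

The sum of the two means is the average length of one employment-plus-unemployment cycle, since the mean of a convolution is the sum of the means.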
Expressions (2.35) and (2.55) are very similar, the only difference being the exchange of $\lambda_i$ and $\xi_i^{(h)}$. By taking the quotient of (2.35) and (2.55) we obtain

$$\frac{L_i}{U_i} = \frac{\xi_i^{(h)}}{\lambda_i}. \qquad (2.57)$$

In Sec. 3.4, we compare this result with empirical data (see Eq. (3.7)). This relation is potentially useful because (2.57) only involves model quantities that can be measured in the empirical data. Note that the ratio in (2.57) measures how much time an agent spends employed at firm i compared to the time the agent spends looking for a job among the neighbors of i.
2.6. Application of the model. An attempt to apply our model to real-world situations runs into the difficulty that data is not available for all parameters. In particular, we do not have data to determine the rates of opening of positions {v i } nor hiring rates {h i } of firms. These parameters, relevant from the economic standpoint as they allow calculation of endogenous effects for each firm, are not usually collected by statistical authorities. These difficulties, however, can be overcome by expressing the equations of the system in terms of available information.
As observed above in (2.28), Λ can be written in terms of outflows, which can be directly measured from data and then inserted into (2.30). We also require a method to express the uniqueness condition (2.31) in terms of the data. Note that (2.31), with the use of (2.13) and the definition of $\xi_i$, can be rewritten as (2.58), in which both $\lambda_i$ and $\xi_i$ can be measured via the average employment and unemployment times of workers at firm i. Using (2.58), one can construct a modified matrix $\tilde{\Lambda}$ in which one of the rows of Λ (any row) is eliminated and substituted by a row based on (2.31). This leads to a non-homogeneous set of linear equations with a unique solution. One possible concrete form is to eliminate the last row of Λ to generate $\tilde{\Lambda}_{ij}$. With $\tilde{\Lambda}$ defined in this way, one can further introduce the corresponding non-homogeneous term, leading to a matrix equation in which X is still the column matrix of employment sizes as in (2.30).
As a final ingredient, we relax the condition that G is a connected graph, and allow for the presence of C disconnected components. We continue to assume that the structure of the components is generally non-trivial in real data, which is to say that in what follows we do not expand on the cases where G has isolated nodes or very small components (containing, say, only two nodes), where the behavior is trivially simple but requires more technicalities to be described with rigor. It is known that for a graph with C connected components, the rank of the Laplacian matrix is N − C. This reflects the fact that the dynamics of walkers on each component runs independently of the other components due to the lack of connections among them. This reduced rank corresponds to the need for C distinct conditions stemming from the number of workers on each isolated component. For a given initial distribution of such workers $\{H_c\}_{c=1,\dots,C}$ over the components, where $\sum_{c=1}^{C} H_c = H$, one obtains a set of C conditions, one per component, with $X_i = \lambda_i L_i$ as before.
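The row-replacement idea described above can be sketched as follows. This is an illustration under explicit assumptions, not the construction of the article: the toy matrix Lam is a graph Laplacian used as a stand-in for Λ (whose actual form comes from (2.28)), and the normalization row assumes the condition takes the form sum_i X_i (1/lam_i + 1/xi_i) = H, i.e., that U_i = X_i / xi_i in the steady state. The function name solve_flows is hypothetical.

```python
import numpy as np


def solve_flows(Lam, lam, xi, H):
    """Solve Lam @ X = 0 after replacing the last row by a normalization row.

    Assumed normalization: sum_i X_i * (1/lam_i + 1/xi_i) = H, which follows
    from X_i = lam_i * L_i together with the assumption U_i = X_i / xi_i.
    """
    A = np.array(Lam, dtype=float)
    lam = np.asarray(lam, dtype=float)
    xi = np.asarray(xi, dtype=float)
    A[-1, :] = 1.0 / lam + 1.0 / xi   # substitute the last row
    b = np.zeros(A.shape[0])
    b[-1] = H                         # only non-zero right-hand side entry
    return np.linalg.solve(A, b)


# Toy stand-in for Lambda: Laplacian of a connected 3-node path graph.
Lam = [[1, -1, 0],
       [-1, 2, -1],
       [0, -1, 1]]
lam = np.array([0.1, 0.2, 0.25])   # hypothetical separation rates
xi = np.array([0.5, 0.5, 0.5])     # hypothetical hiring rates
X = solve_flows(Lam, lam, xi, H=100)
L = X / lam                         # employed agents per firm
U = X / xi                          # unemployed agents per firm (assumption)
```

For a graph with several components, the same substitution would be repeated once per component, each with its own total H_c.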
The previous comments regarding the number of connected components can be related to the dynamics of the individual agent treated in Sec. 2.4. It is clear that, in the case of C components, a prerequisite to analyzing the problem is to provide an initial condition that specifies the location of the agent. If the agent is placed at component c at time t = 0, solving for r ∞ (i) and s ∞ (i) provides information relevant to c only. If, on the other hand, we specify that at time t = 0 the agent can be found in component c with a probability H c /H, then we can develop analysis to describe the more general case of agents distributed across the graph.
In this section, we have developed one particular approach for solving (2.30), but other approaches are possible. Among those, one of the most general is to apply singular value decomposition to determine a decomposition of the space of solutions of Λ, including the kernel of the matrix, and then apply the uniqueness conditions. Ultimately, the choice of a particular method is a practical matter that takes into consideration aspects of the problem that may go beyond the theoretical ones.
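A minimal sketch of the singular value decomposition route is given below. The matrix M is a toy block-diagonal Laplacian with two components, used only as a stand-in for Λ; the function name kernel_basis and the tolerance are assumptions made for this illustration.

```python
import numpy as np


def kernel_basis(M, tol=1e-10):
    """Orthonormal basis of the null space of M, obtained from its SVD."""
    M = np.asarray(M, dtype=float)
    _, sing, vh = np.linalg.svd(M)
    rank = int((sing > tol * sing.max()).sum())
    return vh[rank:].T   # columns span the kernel of M


# Toy stand-in: Laplacian of a graph with two disconnected edges (C = 2).
M = np.array([[1, -1, 0, 0],
              [-1, 1, 0, 0],
              [0, 0, 1, -1],
              [0, 0, -1, 1]], dtype=float)
basis = kernel_basis(M)
n_free = basis.shape[1]   # number of conditions needed to fix the solution
```

Here the kernel has one dimension per connected component, so two uniqueness conditions are needed to select a single solution.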
3. Empirical analysis. The theory developed above rests on a set of assumptions described in the previous section; namely, the presence of a well defined network, a steady state flow of workers during significant periods of time, and a simple approximation to the possible behavior of workers as they leave firms and seek new employment. The aim of this section is to explore the model empirically, both by corroborating that the assumptions we make are close to the observed behavior of the system, and by studying how well the model predicts the statistical characteristics of real systems. We find that both our assumptions and the results that the model predicts fit the data well.
We now describe our approach in more detail. The first assumption employed is that the edges of the graph can reasonably be considered static. We test this by introducing the notion of persistence of the graph, i.e., the property that over successive time intervals, many edges produced by agents' job transitions between firms recur, indicating that the graph does not change over time in a random way, but instead exhibits (at least partially) static behavior. By restricting our model to static networks, our approach assumes that labor markets are static; although this is not entirely true, our analysis indicates that it is sufficiently true to capture the main behavior of the system (see footnote 5).
The second assumption tested is that the system is close to the steady state. To confirm this assumption, we study the histograms of agent in and out flows at firms, finding that, by far, the most typical situation is that these flows are virtually balanced for each firm, indicating that in fact the system is typically not growing or declining, but rather remaining steady on a firm by firm basis.
In order to characterize the agreement between empirical data and our model, we also measure the per-firm values of the separation rate λ i and the job finding rate ξ i . These two quantities, which we assume to be related to the firms rather than the individual agents, appear to be satisfactory quantities when comparing model and data.
As a way to illustrate the fact that our framework captures some of the relevant features of job mobility, we test the consequences of the homogeneous opening rates model ($v_i = v$) against data. In particular, we study the ratio between $L_i$ and $U_i$ vs. the ratio between $\lambda_i$ and $\xi_i$ (in effect checking Eq. (2.57)), and also whether $L_i$ and $U_i$ are consistent with the predictions of (2.35) and (2.55). We find that the model and data agree sufficiently to accept our approach as a plausible way to model job mobility.
We begin our detailed analysis with a description of the data we use, and then proceed to present the tests mentioned above.
3.1. The data. We use a high-resolution dataset and a support dataset where the information is more aggregated. The main dataset consists of employer-employee matched records at a daily resolution. It is a sample of ≈ $4 \times 10^5$ workers and ≈ $8 \times 10^4$ firms provided by the "Instituto Mexicano del Seguro Social" (Mexican Social Security Institute or IMSS). The workers were sampled from the universe of individuals who were registered at IMSS between 2000 and 2008 (all individuals who work in the private sector are registered at IMSS). Then, the complete employment history of each worker was extracted from the database (including any activity before 2000). The fraction of workers captured in this dataset is approximately 1% of the total workforce of Mexico in the private sector.
These employer-employee matched records are constructed in the following way. For each worker, every time there is a job transition between employment and unemployment (in either direction), the worker's record is updated: i) when hired into a firm, the record contains the day at which the worker starts employment and a unique identifier for the firm (consistent for all workers in the data set), and ii) when separated from a firm, the day in which this occurred. Note that the dataset does not track firms directly, and thus the only means of tracking them is through individuals in the dataset.
We employ the IMSS dataset for our main analysis due to its high resolution. The support dataset consists of employer-employee matched records from the universe of employed workers and firms in Finland, constructed from social security records, and provided by Statistics Finland. These records consist of annual observations that track each employed worker in the economy between 2000 and 2008. Each year contains approximately 200,000 firms and 1.5 million individuals.

3.2. Persistent flows.
There is some empirical evidence that, whenever a person leaves firm i and then gets a job at firm j, transitions between i and j (in either direction) are likely to be repeated in the future [24]. If indeed such transitions are repeated, we consider them to be persistent. We employ our datasets in order to measure persistence in both Mexico and Finland.
Let t and t + 1 be the starting and ending times of a given period of job transitions, t + 1 to t + 2 a second period, etc. (see footnote 6). We denote such time intervals by I(t; 1) := [t, t + 1] and use I(t) := I(t; 1) for simplicity. To construct a graph of firms and transitions based on empirical data, we proceed as follows: when we observe a worker transitioning from firm i to j or vice versa, we introduce an undirected edge between i and j; we do not consider weights, so once an edge has been created, additional i ↔ j transitions in the same period have no further consequences for the graph structure. In addition, our data allows us to observe firms that may not have had any incoming or outgoing job transitions, and these firms are encoded as isolated nodes. The graph $G_{I(t)}$ is constituted by all the edges occurring in the period I(t), the nodes to which those edges are incident, and the isolated nodes that display no transitions. To measure persistence, the relevant question is: how many edges in period I(t + 1) also occurred in the previous period I(t)? To assess this, we define the set of common edges between graphs $G_{I(t)}$ and $G_{I(t+1)}$,

$$PE_t := E(G_{I(t)}) \cap E(G_{I(t+1)}), \qquad (3.1)$$

where E(·) is the set of edges of the argument graph. Then, $|PE_t| / |E(G_{I(t+1)})|$ is the fraction of the edges $E(G_{I(t+1)})$ that are persistent. This concept of persistence captures repetition of job transitions. However, a random job search process can produce repeated job transitions by chance. Therefore, persistence is only meaningful to the extent that it occurs more frequently than a random process would produce. Furthermore, one should be able to define confidence intervals addressing whether the persistence found could emerge as a consequence of random fluctuations. A natural random (null) model for comparing persistence in real vs. random job search is to allow any individual looking for a job to apply to, and potentially fill, any of the vacancies offered by the firms of the graph.
In this model, firms have a defined number of vacancies and of job-seekers (both determined from our datasets), and individuals are allowed to apply and potentially be hired into any of those jobs (except the ones of their last employer). Below, we develop a set of statistical tests to determine confidence intervals for persistence using this approach, and apply them to the IMSS data from Mexico. Given that the absence of an edge is potentially meaningful because it may signal a genuine lack of affinity between firms that never connect, we perform additional confidence testing to take this into account. We find that there is a large degree of confidence that persistence is indeed present, and that adding tests to account for lack of connections only increases the confidence levels. We should briefly mention here that our null model is an appropriate point of comparison with current economic thinking, which assumes aggregate matching processes that ignore firms and their contributions to the heterogeneity of the real job transition process.
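The persistence measure described above, i.e., the fraction of edges in one period that also occurred in the previous period, can be sketched with plain sets. The function names and the firm labels are hypothetical; transitions are given as (origin firm, destination firm) pairs, and edges are treated as undirected and unweighted, as in the text.

```python
def edge_set(transitions):
    """Undirected, unweighted edges from (origin, destination) firm pairs."""
    return {frozenset((i, j)) for i, j in transitions if i != j}


def persistence_fraction(transitions_t, transitions_t1):
    """Fraction of the edges of period t+1 that also occurred in period t."""
    edges_t = edge_set(transitions_t)
    edges_t1 = edge_set(transitions_t1)
    if not edges_t1:
        return 0.0
    return len(edges_t & edges_t1) / len(edges_t1)


# Hypothetical transitions observed in two consecutive periods.
period_t = [("A", "B"), ("B", "C"), ("A", "B"), ("C", "D")]
period_t1 = [("B", "A"), ("C", "D"), ("D", "E"), ("A", "E")]
fraction = persistence_fraction(period_t, period_t1)
```

In this example two of the four edges of the second period (A-B and C-D) are persistent, so the fraction is 0.5.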
3.2.1. Hypothesis testing for persistence. The null models are constructed independently for every pair of consecutive time intervals. For one such pair of time intervals, we first determine, for each firm i and time interval I(t) or I(t + 1), the number of hires into i coming from other firms ($\eta_{\cdot i}(t)$ and $\eta_{\cdot i}(t+1)$) and the number of job separations ($\Delta_{u,i}(t)$ and $\Delta_{u,i}(t+1)$) that occur. For each of the intervals, for instance I(t), a random job transition graph $G^{(r)}_{I(t)}$ is built by taking for each node the number $\eta_{\cdot i}(t)$ as vacancies that need to be filled by i, and $\Delta_{u,i}(t)$ as the number of individuals that leave i and seek other jobs. With these two numbers as constraints for every node, the vacancies and job-seekers are then randomly matched over the entire set of nodes, forbidding job-seekers from going back to their previous employer. This approach is basically equivalent to a random configuration model (for a review, see [26]). A number M = 300 of such random realizations is computed for each interval I(t), generating an ensemble of random graphs. We use this ensemble to obtain the distribution of the statistic $\psi_t$ of (3.2), where $PE'$ and $PE$ are defined via (3.1), with $PE'$ representing the fraction of persistent edges between random graph samples, and $PE$ the fraction of persistent edges between the corresponding empirical graphs. There are $M^2$ values of $PE'$ generated by our procedure. The statistic $\psi_t$ measures the extent to which the global random matching mechanism explains the observed transitions over the ensemble built for multiple pairs of years covering an overall span of 8 years.
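One simple way to generate random realizations of the kind described above is rejection sampling: shuffle the vacancies, pair them with job-seekers, and discard any matching in which someone returns to the previous employer. This is a sketch of one possible implementation, not necessarily the procedure used for the results reported here; it assumes that the total number of vacancies equals the total number of job-seekers, and all names and numbers are hypothetical.

```python
import random


def random_matching(vacancies, seekers, rng, max_tries=1000):
    """Randomly match job-seekers to vacancies, never to the last employer.

    vacancies: dict firm -> number of positions to be filled by that firm
    seekers:   dict firm -> number of workers who left that firm
    Returns a list of (origin_firm, destination_firm) transitions.
    """
    slots = [f for f, n in vacancies.items() for _ in range(n)]
    origins = [f for f, n in seekers.items() for _ in range(n)]
    if len(slots) != len(origins):
        raise ValueError("sketch assumes equal totals of vacancies and seekers")
    for _ in range(max_tries):
        rng.shuffle(slots)
        pairs = list(zip(origins, slots))
        if all(o != d for o, d in pairs):   # reject returns to last employer
            return pairs
    raise RuntimeError("no valid matching found within max_tries")


vacancies = {"A": 1, "B": 2, "C": 1}   # hypothetical hires per firm
seekers = {"A": 2, "B": 1, "C": 1}     # hypothetical separations per firm
rng = random.Random(0)
ensemble = [random_matching(vacancies, seekers, rng) for _ in range(300)]
```

Each element of the ensemble is one random transition list, from which a random graph and its persistence with respect to another realization can be computed.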
As mentioned above, the absence of an edge can contain relevant information about the lack of affinity between pairs of nodes. In the economics literature this would be thought of as a friction. Therefore, to assess whether the global search model explains both the observed persistence and the persistent lack of labor flows between firms, one requires an additional statistic that captures the persistence of the lack of connections. We therefore define the statistic ϱ of (3.3), where PF denotes the set of persistent frictions (the pairs of firms that are not connected in the network). Using Monte Carlo simulation, we performed a one-sided test for ψ and ϱ. The null hypothesis is that, under a global search process, we would expect ψ = 1 and ϱ = 1. The global search hypothesis was rejected with 99% confidence in both cases. For an illustration, the top panel in Figure 1 shows the probability distribution of ψ (the one on the far left) generated from the Monte Carlo procedure. Clearly, the confidence interval of the distribution is far below 1 (the mean is ψ = 0.001), implying that global search fails to explain the persistent labor flows between firms.
Figure 1. Top: These distributions were generated from a Monte Carlo procedure. As the probability of local search q increases, the distribution of ψ shifts to the right. When ψ falls inside the distribution (e.g., below the 90th percentile) it means that the corresponding level of q is enough to not reject the null model. This level is approximately q = 0.5 for ψ and q = 0.8 for ϱ. Bottom: Rejection zones. The panels show the levels of q for which the null hypothesis is rejected. The black area represents the case in which the null model is rejected for both ψ and ϱ. For a q in the gray zone, the null model is rejected only for ψ. In the white area the null model explains the empirical levels of both ψ and ϱ. Synthetic distributions were created for values of q ∈ [0, 1] equally spaced by 0.1.

3.2.2. A tunable model for the contribution of persistence.
The global search mechanism fails to explain both the persistent edges and frictions across the eight years of data. This is consistent with intuition, given the large space of possible matches that can emerge when all job vacancies are accessible to all job-seekers. If in reality, as our results suggest, job-seekers use a subset of possible job transitions, the matching mechanism of our null model should restrict the job search. Therefore, we introduce an additional mechanism: with probability q a job seeker searches through the graph and with probability 1 − q searches globally. Clearly, when q → 1 we obtain the mechanism proposed in section 2 and when q → 0, the search is global over all firms.
With the local search mechanism in place, we need an additional assumption for the null model. Consider the null networks $G_{I(t)}$ and $G_{I(t+1)}$ with corresponding sets of edges $E(G_{I(t)})$ and $E(G_{I(t+1)})$. Since we are in a steady state, it is reasonable to assume that any edge in $G_{I(t)}$ can also exist in $G_{I(t+1)}$ (and vice versa), even if it is not observed in the data. Then, when a worker searches locally under the null model, it does so by using the network $G^*_t$, such that $E^*_t = E(G_{I(t)}) \cup E(G_{I(t+1)})$. This assumption captures the time-invariant aspect of the steady state, and allows ϱ to take values higher than 1.
We compute the null model for different levels of q, so we can answer the question: what is the minimum q needed to generate at least the level of persistence observed in empirical data? First, we randomize the matches between job seekers and vacancies, generating new datasets. Then, we construct null networks from these datasets. Next, we compute (3.2) and (3.3) for each pair of null networks to generate their distributions. Finally, we use these distributions to perform a one-sided test for each statistic. If the statistic falls beyond the 90th percentile, the null model does not explain the persistence of edges or frictions. When we find a q such that the statistic is below the 90th percentile, we cannot reject the null model. The smallest q under which we cannot reject the null model for either ψ or ϱ indicates at least how frequently people should search locally in a model in order to explain the structure of the empirical data. Figure 1 shows the results from this analysis. In general, a higher q is needed to explain both persistent edges and frictions than to explain edges alone. An approximate estimate suggests that, in order to explain empirical persistence, a job-seeker needs to search on the network at least 75% of the time. This result is consistent across both datasets and strongly suggests that the network approach is much more empirically relevant than the global search one, providing a solid motivation for the model developed in this paper.
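The mixed search rule, local with probability q and global otherwise, can be sketched as below. This is a simplified illustration: it builds the search network from the union of the edges of two consecutive periods, as assumed for the null model, but it ignores vacancy constraints, and all function names and firm labels are hypothetical.

```python
import random


def union_neighbors(edges_t, edges_t1):
    """Neighbor sets of the network whose edges are the union of both periods."""
    nbrs = {}
    for i, j in set(edges_t) | set(edges_t1):
        nbrs.setdefault(i, set()).add(j)
        nbrs.setdefault(j, set()).add(i)
    return nbrs


def choose_destination(origin, nbrs, all_firms, q, rng):
    """Pick a firm: local search with probability q, global search otherwise."""
    local = sorted(f for f in nbrs.get(origin, ()) if f != origin)
    if local and rng.random() < q:
        return rng.choice(local)               # search on the network
    candidates = [f for f in all_firms if f != origin]
    return rng.choice(candidates)              # search over all other firms


edges_t = [("A", "B"), ("B", "C")]
edges_t1 = [("A", "C"), ("C", "D")]
firms = ["A", "B", "C", "D"]
nbrs = union_neighbors(edges_t, edges_t1)
rng = random.Random(1)
local_picks = [choose_destination("A", nbrs, firms, 1.0, rng) for _ in range(100)]
global_picks = [choose_destination("A", nbrs, firms, 0.0, rng) for _ in range(100)]
```

With q = 1 every destination is a network neighbor of the origin firm, while with q = 0 any other firm can be chosen.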

3.3. Model validation.
In this article we concentrate on the steady state behavior of the system. From this point on, we concentrate on the IMSS data, since its daily resolution allows us to identify the duration of employment and unemployment spells of each individual, which is crucial to our analysis. In order to validate the steady state assumption, we first study the distribution of $\Delta_{l,i} - |\eta_i|$ from the data: if the system is close to the steady state, this distribution should be concentrated around 0. Figure 2 shows the distribution of average daily agent flows into and out of a firm over a period of 1 year; it has a pronounced peak around zero, in line with our intuition. The averages have been taken by using the periods of observation of the workers associated with firms.
Next, we determine the rates of separation and hiring. To estimate the separation rates, we track all employees of a firm that are observed to enter and exit that firm. Separation of an agent from a job is characterized by (2.51). In order to estimate $\lambda_i$ for a firm, we perform a maximum likelihood estimation. For a sample of $S_i$ agents with observed tenures $t^{(l)}_s$, $s = 1, \dots, S_i$, the log-likelihood derived from (2.51) is

$$\log \mathcal{L}(\lambda_i) = \Big( \sum_{s=1}^{S_i} t^{(l)}_s - S_i \Big) \log(1 - \lambda_i) + S_i \log \lambda_i, \qquad (3.4)$$

and the maximum likelihood (ML) estimator is the value of $\lambda_i$ that maximizes (3.4), namely

$$\hat{\lambda}_i = \frac{S_i}{\sum_{s=1}^{S_i} t^{(l)}_s}.$$

The effective rate of hiring $\xi_i$ can be estimated in the same way from the observed unemployment spells $t^{(u)}_s$, with the ML estimator given by

$$\hat{\xi}_i = \frac{S^{(u)}_i}{\sum_{s=1}^{S^{(u)}_i} t^{(u)}_s},$$

where $S^{(u)}_i$ denotes the number of unemployment spells in the sample. The measurements of $\hat{\lambda}_i$ and $\hat{\xi}_i$ can be studied via their distributions, as shown in Figs. 3(a) and (b). For the distribution of $\hat{\lambda}_i$, the sample of individuals was restricted to those who began and ended their tenure of employment within the time frame of the data; similar considerations were applied to the distribution of $\hat{\xi}_i$, restricting the sample to individuals that became unemployed and subsequently found employment during the window of observation. The distributions of $\hat{\lambda}_i$ and $\hat{\xi}_i$ both exhibit decaying heavy tails, indicating a wide variation in the rates of agent separation or hiring.
3.4. An illustration: homogeneous opening rates. The analysis presented above supports a picture of considerable heterogeneity in real economic systems. Therefore, a full treatment of the data is likely to require detailed application of our model, accompanied by robust statistical analysis that is yet to be fully developed.
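The maximum likelihood estimation of the separation and hiring rates described above can be sketched as follows, assuming that completed spell lengths are geometric with support starting at one time step. The function names and the sample of tenures are hypothetical.

```python
import math


def log_likelihood(rate, durations):
    """Log-likelihood of a geometric rate given completed spell lengths."""
    return sum((t - 1) * math.log(1.0 - rate) + math.log(rate)
               for t in durations)


def ml_rate(durations):
    """ML estimate of a geometric rate: number of spells over total time."""
    if not durations or min(durations) < 1:
        raise ValueError("durations must be positive integers")
    return len(durations) / sum(durations)


tenures = [120, 45, 300, 15, 20]   # hypothetical completed tenures, in days
lambda_hat = ml_rate(tenures)
```

The same function applied to completed unemployment spells would give the estimate of the effective hiring rate. The estimator is simply the inverse of the average spell length.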
However, for the purposes of illustration in this article, it is useful to perform some basic comparisons between the data and some version of our model. Given the absence of information for {v i } and {h i }, it seems reasonable to compare a model that simplifies at least one of these parameters while assuming the other continues to be heterogeneous. This provides some flexibility so that the model is able to cope with at least some level of complexity from the real data. Therefore, we chose to compare the data with the model characterized by homogeneous rates v i = v for firms to accept applications (opening rates). We find that, even for this simple case, there is evidence to support the plausibility of our approach.
As a first test, we explore the ratio (2.57), which is convenient because it only contains directly measured parameters. Note that here, since all the parameters emerge from measurement, we are not concerned with using the superindex (v) to symbolize the homogeneity in v. Using the dataset from Mexico, we estimated $L_i$ and $U_i$ for 2008. In order to ensure independence across the errors, we estimate $\hat{\xi}_i$ and $\hat{\lambda}_i$ from observations of employment and unemployment spells that concluded at least three years prior to 2008. We excluded firms for which $U_i = 0$ and estimate $\hat{\alpha}$ and $\hat{\beta}$ defined by (3.7). Due to the large variance heterogeneity in the data, we make use of the random sample consensus (RANSAC) algorithm [25] in order to estimate α and β. The algorithm randomly samples the data in order to discriminate the outliers, and iteratively fits (3.7) via OLS to the inliers. Since the RANSAC algorithm is non-deterministic, the estimators vary from run to run. In order to illustrate the coherence of the model, we performed 10,000 estimations using this procedure and analyzed the distributions of $\hat{\alpha}$ and $\hat{\beta}$. Figure 4 shows the histogram of the estimator $\hat{\beta}$. The average $\hat{\beta}$ is 0.98, while the most frequent value is 1.0031. The average estimator $\hat{\alpha}$ of the intercept is 1.1425 ± 0.0007. These results are quite close to the theoretical prediction of (2.57).
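To make the RANSAC step concrete, the following is a bare-bones sketch of the general idea for a straight-line fit y = α + βx: repeatedly fit a line to a random pair of points, keep the largest set of inliers, and refit by OLS on those inliers. It is not the implementation used for the results above, it does not reproduce the specific form of (3.7), and the data, threshold, and function names are hypothetical.

```python
import random


def ols(points):
    """Ordinary least squares line fit; returns (intercept, slope)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return my - slope * mx, slope


def ransac_line(points, threshold, n_iter, rng):
    """Minimal RANSAC: keep the largest inlier set, then refit by OLS."""
    best_inliers = []
    for _ in range(n_iter):
        sample = rng.sample(points, 2)
        if sample[0][0] == sample[1][0]:
            continue                      # vertical pair, no line y(x)
        a, b = ols(sample)
        inliers = [(x, y) for x, y in points
                   if abs(y - (a + b * x)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return ols(best_inliers)


# Hypothetical data: 20 points on y = x plus three gross outliers.
data = [(float(x), float(x)) for x in range(20)]
data += [(2.0, 30.0), (5.0, -20.0), (10.0, 50.0)]
alpha_hat, beta_hat = ransac_line(data, threshold=0.5, n_iter=200,
                                  rng=random.Random(42))
```

Because the sampling is random, repeated runs on noisy data generally give slightly different estimates, which is why the distribution of the estimators over many runs is informative.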
To perform a second test, we consider whether (2.35) and (2.55) may be consistent with the data. For this, we concentrate on two conditional probabilities: i) $\Pr(L_i | k_i/\lambda_i)$ for the number of employed individuals at a firm, given the firm is characterized by the ratio $k_i/\lambda_i$, and ii) $\Pr(U_i | k_i/\xi_i)$ for the number of unemployed individuals at a firm, given the firm is characterized by the ratio $k_i/\xi_i$. In particular, we want to learn whether the basic predictions contained in (2.35) and (2.55) are satisfied, i.e., that $L_i \sim k_i/\lambda_i$ and $U_i \sim k_i/\xi_i$.
In Figs. 5 and 6, we present contour and 3-dimensional plots of $\log_{10}[\Pr(L_i | k_i/\lambda_i) / \Pr(L^*_i | k_i/\lambda_i)]$ and $\log_{10}[\Pr(U_i | k_i/\xi_i) / \Pr(U^*_i | k_i/\xi_i)]$, respectively. Here, $\Pr(L^*_i | k_i/\lambda_i)$ and $\Pr(U^*_i | k_i/\xi_i)$ correspond to the probabilities associated with the conditional modes of $L_i$ and $U_i$. The reason to plot the ratios just defined is that (2.35) and (2.55) are concerned with averages rather than distributions, and we therefore must devise a way to relate the empirical analysis to our predictions. To interpret the plots, we introduce a line of slope 1 (linear relation) in Figs. 5(a) and 6(a). Such lines, by definition, scale as $k_i/\lambda_i$ and $k_i/\xi_i$. The relevant feature that the plots show is that these lines run parallel to the contour for the largest value of $\Pr(L_i | k_i/\lambda_i)$ and $\Pr(U_i | k_i/\xi_i)$, or in other words, $L^*_i \sim k_i/\lambda_i$ and $U^*_i \sim k_i/\xi_i$. Figures 5(b) and 6(b), showing in 3 dimensions the surface of the logarithm of the distribution ratios, reinforce our interpretation: in these plots the location of the maxima of the surfaces is cut by the planes that have been constructed to coincide with the linear maps of Figs. 5(a) and 6(a). These relations hold for small and intermediate values of $k_i/\lambda_i$ and $k_i/\xi_i$, but eventually fail for the largest values of both ratios, probably due to poorer sampling at such large values of $k_i/\lambda_i$ and $k_i/\xi_i$.
The results presented in this section focus on a very simple comparison and, notwithstanding the partial differences we encounter between our equations and the measurements, support the plausibility of our model as a way to explain job mobility.

Conclusion.
Detailed high resolution data on employment at large scale is rapidly becoming available, and this provides an opportunity to revisit the way in which job mobility and labor flows are studied. In particular, it makes it possible to move away from aggregate models that, while having been very useful, have been unable to address some important outstanding problems, such as the construction of realistic shock scenarios, which are necessary if one is to attempt to design real-time forecasting models of high resolution employment flow. This task, which has not yet been possible, may be within our reach for the first time, with considerable potential value for economic policy design that is well grounded empirically and for which impacts can be forecast in great detail.
In this manuscript, we have introduced a new basic framework that takes into account the role of firms in employment, and makes extensive use of real data. By performing a number of tests, we have been able to see that indeed the model behaves in similar ways to the data. Furthermore, we have provided the basic ingredients for algorithms to calculate the average numbers of employed and unemployed agents associated with a firm. The notion of firm specific unemployment, which we have introduced here, is a new concept that allows us to keep track of the information that is implicitly contained in the fact that an agent has held a job in a certain firm, indicating that agent's affinity to some firms but not others of the economy.
An interesting consequence arising from (2.57) is that in the steady state the numbers of employed and unemployed agents of a firm are not independent of each other and therefore, firms that have large numbers of employees could contribute large numbers of unemployed people if the ratio between the average times of employment and post-employment search is low. This is a question worth further exploration.
Finally, our introduction of a framework based on random walks on graphs to study job mobility can be a useful development. Random walks on graphs have a considerably long history, and a great deal is known about them (see, e.g. [10] for a review). Being able to deploy such a toolkit on questions regarding employment may lead to new results with potential academic and practical impact.

Acknowledgements.
We thank Andrew Elliott, José Javier Ramasco, Felix Reed-Tsochas, Gesine Reinert, Jari Saramäki, and Margaret Stevens for helpful discussions about the research and manuscript.