### 2.1 Social Contact Survey data

A cross-sectional study was conducted between May 2009 and October 2010, recruiting households and individuals through postal and online questionnaires, supported by a large random-address mailshot and a modest online and media promotion [30, 31]. Questionnaires asked respondents to report the number of distinct individuals they had encountered on the previous day: their contacts. Respondents could report contacts either as individuals or as members of a group of reported size. Allowing group reports was a deliberate methodological choice. It permits easy reporting of large numbers of contacts, avoids the approach of previous studies [11], which imposed a high burden on respondents with many contacts, and ensures the best possible capture of the right-hand tail of the degree distribution. In general, we expect such data to become increasingly available because of the epidemiological importance of this tail (e.g. the study of Read et al. [21]).

In total, completed questionnaires were received from 5,388 participants in Great Britain, 3,901 of them from postal surveys. There was some bias in demographic representation; most notably, younger age groups and males were generally under-represented (see Danon et al. [31] for details). The data are available at http://wrap.warwick.ac.uk/54273/.

### 2.2 Generalised preferential attachment

As noted by Durrett [32], Barabási [28], and Simkin and Roychowdhury [33], the basic idea behind the preferential attachment model is close to the population model of Yule [34]. We consider a Yule-like stochastic process described precisely as follows. In a population of *N* individuals indexed by *i*, each individual has an integer-valued random variable \(K_{i}(t)\) for its number of contacts. Individual *i* starts with \(K_{i}(0)=0\) and makes new contacts over a time period \(T_{i}\), a positive real-valued random variable with probability density function \(\rho(t)\). The generation of new social contacts is modelled by a continuous-time Markov chain with the following event and rate:

$$ K_{i} \rightarrow K_{i} + 1 \quad\text{at rate } f_{K_{i}} := 1 + \tau K_{i} . $$

(3)
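A direct way to see what the birth process (3) does is to simulate it for a fixed holding time. The following is a minimal Python sketch using the Gillespie (stochastic simulation) algorithm; the function names `simulate_contacts` and `mean_contacts` are hypothetical, and the comparison mean is \(m_{1}(t)\) from (8).

```python
import math
import random


# Hypothetical helpers: a Gillespie-style sketch of the birth process (3),
# not the authors' implementation.
def simulate_contacts(tau: float, t_end: float, rng: random.Random) -> int:
    """Return K(t_end) for K -> K + 1 at rate f_K = 1 + tau * K, starting from K(0) = 0."""
    k, t = 0, 0.0
    while True:
        rate = 1.0 + tau * k          # f_K from equation (3)
        t += rng.expovariate(rate)    # exponential waiting time to the next contact
        if t > t_end:                 # holding time exhausted: no more contacts
            return k
        k += 1


def mean_contacts(tau: float, t: float) -> float:
    """Closed-form mean (e^{tau t} - 1) / tau, with the Poisson limit t when tau = 0."""
    return t if tau == 0 else math.expm1(tau * t) / tau
```

Averaging `simulate_contacts(0.5, 2.0, rng)` over many runs should approach `mean_contacts(0.5, 2.0)`; with \(\tau = 0\) the process reduces to a Poisson process of unit rate.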

We take the preferential attachment hypothesis PA to be stated mathematically as

$$ \text{PA} \Leftrightarrow\tau>0 . $$

(4)

Writing \(p_{k}(t)\) for the probability that \(K_{i}=k\) at time \(t< T_{i}\), we can use the method of characteristics to derive an expression for the probability generating function of \(K_{i}\),

$$ g(t,s) = \sum_{k} p_{k}(t) s^{k} = \bigl(s - (s-1) \mathrm{e}^{\tau t} \bigr)^{-1/\tau} ,\quad s \in[0,1] . $$

(5)

From this, we can derive expressions for the probability mass function,

$$\begin{aligned} p_{k}(t) =& \frac{1}{k!} \frac{\partial^{k} g}{\partial s^{k}} \bigg|_{s=0} = \frac{\Gamma(k+\frac{1}{\tau})}{\Gamma(\frac{1}{\tau })\Gamma(k+1)} \mathrm{e}^{-t}\bigl(1 - \mathrm{e}^{-\tau t}\bigr)^{k} \\ \rightarrow& \kappa_{t} \bigl(1 - \mathrm{e}^{-\tau t} \bigr)^{k} k^{{(1-\tau)}/{\tau}} , \end{aligned}$$

(6)

where \(\kappa_{t}\) depends on *t* but not on *k*, and the asymptote holds as *k* becomes large. Because this is not a simple power law, the asymptotic behaviour of the moments is not determined by a power-law exponent. Instead, we obtain the moments from the moment generating function \(M(t,z) = g(t,\mathrm{e}^{z})\), \(z\in(-\infty,0]\), so that the *r*th moment of the degree distribution, conditional on \(t< T\), is

$$ m_{r}(t) = \frac{\partial^{r} M}{\partial z^{r}} \bigg|_{z=0} . $$

(7)

In particular,

$$ m_{1}(t) = \frac{1}{\tau}\bigl(\mathrm{e}^{\tau t} -1\bigr) , \quad\quad m_{2}(t) = m_{1}(t) + (\tau+1) \bigl(m_{1}(t)\bigr)^{2} ,\quad\quad \ldots. $$

(8)
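Differentiating (5) *k* times at \(s=0\) shows that, for fixed *t*, \(K_{i}\) is negative binomial with shape \(1/\tau\) and success probability \(\mathrm{e}^{-\tau t}\). The sketch below (hypothetical helper names, not from the original work) evaluates that pmf on the log scale, and checks the moments (8) against (7) by differentiating \(M(t,z)\) numerically with central finite differences.

```python
import math


# Hypothetical helpers illustrating (5)-(8); a sketch, not the authors' code.
def contact_pmf(k: int, t: float, tau: float) -> float:
    """p_k(t): negative binomial with shape 1/tau and success probability e^{-tau t}.

    Follows from differentiating the PGF (5) k times at s = 0. Evaluated on the
    log scale with lgamma so that large k does not overflow. Requires tau > 0.
    """
    if k == 0:
        return math.exp(-t)                           # p_0(t) = e^{-t}
    if t <= 0.0:
        return 0.0                                    # no contacts made yet
    r = 1.0 / tau
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             - t                                      # log e^{-t}
             + k * math.log(-math.expm1(-tau * t)))   # k log(1 - e^{-tau t})
    return math.exp(log_p)


def pgf(t: float, s: float, tau: float) -> float:
    """Closed-form probability generating function g(t, s) from (5)."""
    return (s - (s - 1.0) * math.exp(tau * t)) ** (-1.0 / tau)


def moments_closed_form(t: float, tau: float) -> tuple[float, float]:
    """m_1(t) and m_2(t) from (8)."""
    m1 = math.expm1(tau * t) / tau
    return m1, m1 + (tau + 1.0) * m1 ** 2


def moments_from_mgf(t: float, tau: float, h: float = 1e-4) -> tuple[float, float]:
    """First two derivatives of M(t, z) = g(t, e^z) at z = 0, as in (7).

    Uses central finite differences; M is also finite for small z > 0, which
    the symmetric stencil relies on.
    """
    m_plus, m_zero, m_minus = (pgf(t, math.exp(z), tau) for z in (h, 0.0, -h))
    first = (m_plus - m_minus) / (2.0 * h)
    second = (m_plus - 2.0 * m_zero + m_minus) / h ** 2
    return first, second
```

For example, with \(t = 1.5\) and \(\tau = 0.7\), summing `contact_pmf` over *k* from 0 to 399 gives total probability 1 and mean \((\mathrm{e}^{\tau t}-1)/\tau\) to within rounding error.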

Accounting for the randomness of the holding times \(T_{i}\), the *r*th moment of the degree distribution is then

$$ \overline{m}_{r} = \int_{t=0}^{\infty} \rho(t) m_{r}(t)\,\mathrm{d}t . $$

(9)

We will be interested in the empirical evidence for whether such moments converge or diverge, in light of the epidemiological relationship (1).
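As a simple illustration of (9), suppose (purely for this example) that holding times are exponential with rate \(\mu\), i.e. \(\rho(t) = \mu\mathrm{e}^{-\mu t}\). Then \(\overline{m}_{1} = 1/(\mu-\tau)\) for \(\tau<\mu\), and the integral diverges otherwise. The sketch below evaluates (9) for \(r=1\) with composite Simpson quadrature on a truncated range; the helper names and the truncation point `t_max` are our own choices.

```python
import math
from typing import Callable


# Hypothetical helpers; the exponential holding time is an illustrative assumption.
def simpson(f: Callable[[float], float], a: float, b: float, n: int = 20000) -> float:
    """Composite Simpson's rule on [a, b] with n (made even) sub-intervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def mean_degree_exponential(tau: float, mu: float, t_max: float = 60.0) -> float:
    """bar m_1 from (9) with rho(t) = mu e^{-mu t} and m_1(t) from (8), tau > 0.

    The integral is truncated at t_max; the result is only meaningful when
    tau < mu, since otherwise the integrand does not decay.
    """
    def integrand(t: float) -> float:
        return mu * math.exp(-mu * t) * math.expm1(tau * t) / tau
    return simpson(integrand, 0.0, t_max)
```

With \(\tau = 0.5\) and \(\mu = 2\) the result should be close to \(1/1.5\); with \(\tau \geq \mu\) the value grows without bound as `t_max` increases, which is the divergence discussed below.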

### 2.3 Phase-type holding times

The question then arises of which distribution to use for the holding times \(\{T_{i}\}\), i.e. the amount of time spent making new contacts on the day for which individuals provide data. In previous work [30] on a related model of contact formation, we considered log-normally distributed holding times \(T_{i}\). This provided a good fit to the data, but was computationally intensive and lacked a mechanistic interpretation. Here we therefore consider a class of holding-time distributions that is highly flexible but has analytic and numerical benefits: the distributions of *phase type* [35]. Phase-type distributions are dense in the space of probability distributions on the positive reals [36], meaning that they can approximate any such distribution arbitrarily closely. They also have a mechanistic interpretation and allow analytic manipulations that greatly reduce the numerical cost of likelihood evaluation.

The basic idea behind the model is shown in Figure 1. A set of phases is indexed by \(a,b=1,\ldots, m\); the probability of starting in phase *a* is \(\nu_{a}\) (so these parameters must sum to unity); an individual in phase *a* stops making new social contacts at rate \(\mu_{a}\); and the rate of moving from phase *a* to phase *b* is \(Q_{a,b}\). Phase-specific rates of making contacts could not realistically be distinguished from differences in the time spent in each phase, so they are not included as parameters. The phases have a mechanistic interpretation as the activities that individuals undertake on a given day.

In this model, the probability density function for the holding time is given by the general expression

$$ \rho(t) = \boldsymbol{\mu}^{\top} \mathrm{e}^{\mathbf{M} t} \boldsymbol{\nu} , $$

(10)

where:

$$ \begin{aligned} &\boldsymbol{\mu} = (\mu_{a});\quad\quad \boldsymbol{\nu} = (\nu_{a}) ; \\ & M_{a,a} = -M_{a} ;\quad\quad M_{a} =\mu_{a} + \sum_{b} Q_{a,b} ;\quad\quad M_{a,b\neq a} = Q_{b,a} . \end{aligned} $$

(11)
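Equations (10) and (11) can be evaluated without a matrix-exponential routine by integrating \(\mathrm{d}\mathbf{p}/\mathrm{d}t = \mathbf{M}\mathbf{p}\) from \(\mathbf{p}(0) = \boldsymbol{\nu}\) and taking \(\rho(t) = \boldsymbol{\mu}^{\top}\mathbf{p}(t)\). The Python sketch below does this with classical fourth-order Runge-Kutta, a swapped-in numerical technique chosen only to keep the example dependency-free; helper names are hypothetical and phases are indexed from 0. `Q[a][b]` is the rate of moving from phase *a* to phase *b*, so the total exit rate of phase *a* is \(\mu_{a} + \sum_{b} Q_{a,b}\).

```python
# Hypothetical helpers for the phase-type density (10)-(11); a sketch, not the authors' code.
def generator(mu: list[float], Q: list[list[float]]) -> list[list[float]]:
    """Matrix M: M[a][a] = -(mu_a + sum_b Q[a][b]), M[a][b] = Q[b][a] for b != a."""
    m = len(mu)
    M = [[0.0] * m for _ in range(m)]
    for a in range(m):
        M[a][a] = -(mu[a] + sum(Q[a][b] for b in range(m) if b != a))
        for b in range(m):
            if b != a:
                M[a][b] = Q[b][a]          # inflow into phase a from phase b
    return M


def holding_time_density(mu: list[float], nu: list[float], Q: list[list[float]],
                         t: float, steps: int = 2000) -> float:
    """rho(t) = mu^T exp(M t) nu, computing exp(M t) nu by RK4 on dp/dt = M p."""
    M = generator(mu, Q)
    m = len(mu)
    p = list(nu)                          # phase occupation probabilities at time 0
    h = t / steps

    def deriv(v: list[float]) -> list[float]:
        return [sum(M[a][b] * v[b] for b in range(m)) for a in range(m)]

    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv([p[i] + 0.5 * h * k1[i] for i in range(m)])
        k3 = deriv([p[i] + 0.5 * h * k2[i] for i in range(m)])
        k4 = deriv([p[i] + h * k3[i] for i in range(m)])
        p = [p[i] + h * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) / 6 for i in range(m)]
    return sum(mu[a] * p[a] for a in range(m))
```

For a single phase this reproduces the exponential density \(\mu\mathrm{e}^{-\mu t}\); for two phases in series (start in the first, move on at rate 3, stop from the second at rate 2) it gives the hypoexponential density \(6(\mathrm{e}^{-2t}-\mathrm{e}^{-3t})\).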

From the expressions (5), (7), (8) and (9) above, in particular through inspection of the form of the moment generating function, it is clear that the *r*th moment of the degree distribution will involve a term like

$$ \mathcal{I}_{r} = \int_{t=0}^{\infty} \mathrm{e}^{r\tau t} \rho(t) \,\mathrm{d}t = \boldsymbol{\mu}^{\top} \int_{t=0}^{\infty} \mathrm{e}^{(\mathbf{M} + r\tau\mathbf{I})t} \,\mathrm{d}t \boldsymbol {\nu} , $$

(12)

where **I** is the identity matrix. Let \(\mathbf{A} = \mathbf {M} + r\tau \mathbf{I}\). Because transitions occur only from phase *a* to phases *b* > *a* (as in (15)), this matrix is triangular, so its eigenvalues are its diagonal elements; in particular, the *a*th eigenvalue of **A** is \(\lambda_{a} = -M_{a} + r \tau\). Assuming **A** is diagonalisable, let **R** be a matrix whose *a*th column is the *a*th right eigenvector of **A**, and **L** a matrix whose *a*th row is the *a*th left eigenvector of **A**, normalised so that \(\mathbf{L}\mathbf{R} = \mathbf{I}\). Then

$$ \mathcal{I}_{r} = \boldsymbol{\mu}^{\top} \int _{t=0}^{\infty} \mathbf{L}^{-1} \mathbf{L} \mathrm{e}^{\mathbf{A}t} \mathbf{R} \mathbf{R}^{-1} \,\mathrm{d}t \boldsymbol{\nu} = \boldsymbol{\mu}^{\top} \mathbf{L}^{-1} \int _{t=0}^{\infty} \mathbf{D} \,\mathrm{d}t \mathbf{R}^{-1}\boldsymbol{\nu} , $$

(13)

where **D** is a diagonal matrix whose *a*th diagonal element is \(\mathrm{e}^{\lambda_{a} t}\). The integral \(\mathcal{I}_{r}\) therefore converges exactly when \(\lambda_{a}<0\) for all *a*, which implies that the condition for divergence of the *r*th moment is

$$ \overline{m}_{r} \text{ diverges}\quad \Leftrightarrow\quad \exists a \text{ such that } \tau \geq M_{a}/r . $$

(14)
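Criterion (14) is straightforward to apply once the exit rates \(M_{a}\) are known. The sketch below (hypothetical names, 0-based phases) computes \(M_{a} = \mu_{a} + \sum_{b} Q_{a,b}\) and the highest moment that stays finite. It treats the boundary case \(r\tau = M_{a}\) as divergent, because the corresponding eigenvalue in (13) is then zero and the integrand does not decay; like (14), it ignores the possibility that some phase's term in (13) has zero weight (for example, a phase unreachable from \(\boldsymbol{\nu}\)).

```python
# Hypothetical helpers applying criterion (14); a sketch, not the authors' code.
def exit_rates(mu: list[float], Q: list[list[float]]) -> list[float]:
    """M_a = mu_a + sum_b Q[a][b]: total rate of leaving phase a (stopping or moving on)."""
    return [mu[a] + sum(q for b, q in enumerate(Q[a]) if b != a) for a in range(len(mu))]


def moment_diverges(r: int, tau: float, mu: list[float], Q: list[list[float]]) -> bool:
    """True if r * tau >= M_a for some phase a, i.e. some eigenvalue in (13) is >= 0."""
    return any(r * tau >= Ma for Ma in exit_rates(mu, Q))


def highest_finite_moment(tau: float, mu: list[float], Q: list[list[float]],
                          r_max: int = 1000) -> int:
    """Largest r whose r-th moment is finite, capped at r_max (e.g. when tau = 0)."""
    r = 0
    while r < r_max and not moment_diverges(r + 1, tau, mu, Q):
        r += 1
    return r
```

For example, a single phase with \(\mu = 2\) and \(\tau = 0.6\) has finite moments up to the third, since \(3 \times 0.6 < 2 \leq 4 \times 0.6\).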

In general, however, combining (6) and (10) is not the most numerically efficient way to calculate the overall probability mass function of the final number of contacts \(K_{i}(T_{i})\), so a different approach is needed.

### 2.4 Numerically efficient model solution

The model as described above can be solved in a numerically efficient manner using the spectral methods for continuous-time Markov chains developed by Bailey [37]. We consider the limit as the population size \(N\rightarrow\infty\) and write down ordinary differential equations (ODEs) for the proportion of the population in phase *a* and with *k* social contacts at time *t*, \(p_{a,k}(t)\). These ODEs take the form

$$ \frac{\mathrm{d}}{\mathrm{d}t}p_{a,k} = - \biggl(f_{k} + \mu_{a} + \sum_{b>a}Q_{a,b} \biggr) p_{a,k} + f_{k-1} p_{a,k-1} + \sum _{b< a} Q_{b,a}p_{b,k} , $$

(15)

where \(f_{k}\) is the rate, given in (3), at which individuals with *k* social contacts make new contacts. We are then interested in \(d_{k}\), the probability that a randomly selected individual has made *k* social contacts by the end of the process. A series of manipulations directly analogous to those of Bailey [37] shows that

$$ d_{k} = \lim_{s\downarrow0} \sum _{a} \mu_{a} \int_{0}^{\infty} \mathrm{e}^{-st} {p}_{a,k}(t)\, \mathrm{d}t =: \sum _{a} \mu_{a} A_{a,k} . $$

(16)

Applying the Laplace transform to (15) subject to the initial condition \(p_{a,k}(0) = \nu_{a} \delta_{k,0}\) and taking the frequency-space limit \(s\downarrow0\) then leads to a set of linear equations for the \(A_{a,k}\), and hence for \(d_{k}\). These equations are triangular, so they can be solved directly by forward substitution without numerically costly matrix inversion:

$$ -\nu_{a} \delta_{k,0} = - \biggl(f_{k} +\mu_{a} + \sum_{b>a}Q_{a,b} \biggr) A_{a,k} + f_{k-1} A_{a,k-1} + \sum _{b< a} Q_{b,a}A_{b,k} . $$

(17)

These equations are the source of our model's numerical efficiency. We use the Laplace transform mainly for technical reasons: the same results could be obtained by integrating (15) directly over time, at the cost of not having all quantities rigorously defined.
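The forward substitution can be sketched as follows. Taking the Laplace transform of (15) with \(p_{a,k}(0) = \nu_{a}\delta_{k,0}\) and letting \(s\downarrow0\) gives, after rearrangement, \(A_{a,k} = \bigl(\nu_{a}\delta_{k,0} + f_{k-1}A_{a,k-1} + \sum_{b<a} Q_{b,a}A_{b,k}\bigr) / \bigl(f_{k} + \mu_{a} + \sum_{b>a} Q_{a,b}\bigr)\), so each \(A_{a,k}\) depends only on entries with smaller *k*, or the same *k* and smaller *a*. This Python function is an illustrative sketch rather than the authors' code; its name is hypothetical, phases are indexed from 0, and `Q[a][b]` is assumed nonzero only for *b* > *a*.

```python
# Hypothetical helper: forward substitution for d_k; a sketch, not the authors' code.
def degree_distribution(tau: float, nu: list[float], mu: list[float],
                        Q: list[list[float]], k_max: int) -> list[float]:
    """d_k for k = 0..k_max, with d_k = sum_a mu_a A[a][k] as in (16)."""
    m = len(nu)

    def f(k: int) -> float:
        return 1.0 + tau * k                      # contact-making rate (3)

    A = [[0.0] * (k_max + 1) for _ in range(m)]
    for k in range(k_max + 1):
        for a in range(m):                        # increasing a, so A[b][k] with b < a is known
            numer = nu[a] if k == 0 else f(k - 1) * A[a][k - 1]
            numer += sum(Q[b][a] * A[b][k] for b in range(a))
            denom = f(k) + mu[a] + sum(Q[a][b] for b in range(a + 1, m))
            A[a][k] = numer / denom
    return [sum(mu[a] * A[a][k] for a in range(m)) for k in range(k_max + 1)]
```

With one phase and \(\tau = 0\), the holding time is exponential and contacts arrive as a Poisson process, so \(d_{k}\) should be geometric, \(\mu/(1+\mu)^{k+1}\); that case makes a convenient sanity check.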

### 2.5 Model likelihood, fitting and selection

We assume a vector of data \(\mathbf{y}=(y_{k})\), where \(y_{k}\) is the number of individuals reporting *k* social contacts when surveyed. A model \(\mathcal{M}\) is specified by a number of phases *m* and the presence or absence of PA, so the general parameters are \(\boldsymbol{\theta} = (\tau,\nu_{a},\mu_{a},Q_{a,b})\), with *τ* present only under PA. The number *n* of individuals sampled from the British population of size *N* satisfies

$$ n = \sum_{k} y_{k} = 5388 \ll N , \qquad N \gtrsim 6 \times10^{7} , $$

(18)

and so it is appropriate to assume that each individual independently draws a number of contacts from the distribution with pmf \(d_{k}\) given in (16). Accounting for the censoring of zero contacts in the real data, we define the zero-truncated pmf

$$ \tilde{d}_{0} = 0 ,\quad\quad \tilde{d}_{k>0} = \frac{d_{k}}{1-d_{0}} , $$

(19)

meaning that the overall likelihood function is then given by

$$ L(\mathbf{y} | \boldsymbol{\theta}) = \frac{n!}{\prod_{l} y_{l} !} \prod _{k} \bigl(\tilde{d}_{k}(\boldsymbol{ \theta})\bigr)^{y_{k}} . $$

(20)

Note that the combinatorial factors do not depend on the parameters, and so need not be calculated during model fitting.
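Equations (19) and (20) translate directly into a log-likelihood. In the sketch below (hypothetical names), the data are a mapping from reported contact count \(k \geq 1\) to \(y_{k}\), the multinomial coefficient is dropped because it does not depend on \(\boldsymbol{\theta}\), and `d` is any pmf \(d_{k}\) computed up to at least the largest reported *k*.

```python
import math


# Hypothetical helpers for (19)-(20); a sketch, not the authors' code.
def zero_truncate(d: list[float]) -> list[float]:
    """Equation (19): set the k = 0 mass to zero and renormalise the rest."""
    norm = 1.0 - d[0]
    return [0.0] + [dk / norm for dk in d[1:]]


def log_likelihood(y: dict[int, int], d: list[float]) -> float:
    """Log of (20) without the multinomial coefficient, which does not depend on theta.

    y maps a reported contact count k >= 1 to the number of respondents y_k.
    """
    d_tilde = zero_truncate(d)
    return sum(n_k * math.log(d_tilde[k]) for k, n_k in y.items() if n_k > 0)
```

This is the quantity a numerical optimiser would maximise over \(\boldsymbol{\theta}\), with `d` recomputed for each candidate parameter vector.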

We use the likelihood function (20) within standard statistical methodology. Numerical maximum likelihood estimation was performed using simulated annealing run from multiple starting points, to increase confidence that the global optimum was found. Model selection was performed using AIC [38] and BIC [39], as well as likelihood ratio tests [40] on pairs of models where this test was informative. We used all three approaches because each involves a different trade-off between model fit and complexity, and to check that our conclusions about PA are not overly sensitive to the precise method used. Uncertainty in model parameters was quantified using confidence intervals obtained by bootstrapping the data, and uncertainty in model outputs such as the predicted degree distribution was quantified using a parametric bootstrap.
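The selection criteria named above have standard forms: for a model with *p* free parameters and maximised log-likelihood \(\log L\), AIC \(= 2p - 2\log L\) and BIC \(= p\log n - 2\log L\). The sketch below computes these, a likelihood ratio statistic for two models differing by one parameter (referred to a \(\chi^{2}_{1}\) distribution, which is an assumption to check for any particular pair of models), and one resampling step of a nonparametric bootstrap of the counts. Function names are hypothetical; in practice each bootstrap sample would be refitted by maximum likelihood.

```python
import math
import random


# Hypothetical helpers for model selection and bootstrapping; a sketch only.
def aic(log_lik: float, n_params: int) -> float:
    """Akaike information criterion: 2p - 2 log L (smaller is better)."""
    return 2.0 * n_params - 2.0 * log_lik


def bic(log_lik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: p log n - 2 log L (smaller is better)."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


def lrt_one_df(log_lik_null: float, log_lik_alt: float) -> tuple[float, float]:
    """Likelihood ratio statistic and p-value, assuming a chi-squared(1) reference.

    The chi-squared(1) survival function is erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (log_lik_alt - log_lik_null))
    return stat, math.erfc(math.sqrt(stat / 2.0))


def bootstrap_counts(y: dict[int, int], rng: random.Random) -> dict[int, int]:
    """Resample n respondents with replacement from the observed counts y_k."""
    ks = list(y)
    draws = rng.choices(ks, weights=[y[k] for k in ks], k=sum(y.values()))
    out: dict[int, int] = {}
    for k in draws:
        out[k] = out.get(k, 0) + 1
    return out
```

A likelihood ratio statistic of about 3.84 corresponds to a \(\chi^{2}_{1}\) p-value of 0.05 under this assumed reference distribution.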