2.1 Data collection
The present study is part of the wider Child and Youth Development Study project funded by UNICEF Malawi (Leal Neto et al. [30]). Data collection was conducted between 16th December 2019 and 10th January 2020 in Mdoliro village, Dowa district, in the Central Region of Malawi. Mdoliro is a small village with an estimated 2019 population of 147 distributed over 32 households (average household size is 4.5). The majority of the inhabitants are Christians, and the Chewas are the main ethnic group. Farming is the major source of income.
The data were obtained and processed using a proximity-sensing application previously used to measure individuals' social contacts in a variety of real-world settings, such as hospital wards (Isella et al. [21]; Voirin et al. [46]), schools (Stehlé et al. [45]), social events (Smieszek et al. [44]), and households (Ozella et al. [38]; Kiti et al. [28]); more recently, it has been used to detect proximity events between animals (e.g., Wilson-Aggarwal et al. [47]). This technology is based on wearable proximity sensors that exchange ultra-low-power radio packets; the use of the sensors to detect face-to-face interactions is described in detail in Cattuto et al. [6]. Sensors in close proximity exchange at most about one radio packet per second, and the exchange of low-power radio packets is used as a proxy for the spatial proximity of the individuals wearing the sensors (Cattuto et al. [6]). To estimate how close individuals are, the attenuation of the signal with distance is computed as the difference between the received and transmitted power. Proximity between individuals is asserted when the median attenuation over a given time interval exceeds a specified attenuation threshold (in dBm). In this study, we set the attenuation threshold at −75 dBm. This threshold was used in previous deployments (e.g., Ozella et al. [37]) and allows the detection of proximity events between devices located within 1–1.5 m of one another. This distance corresponds to a close-contact situation in which social interactions might occur and a communicable disease might be directly transmitted. A ‘contact event’ between two individuals was identified when the devices exchanged at least one radio packet during a time interval of 20 s. After a contact was established, it was considered ongoing as long as the devices continued to exchange at least one radio packet in every subsequent 20 s interval.
Conversely, a contact was considered broken if a 20 s interval elapsed with no exchange of radio packets (Stehlé et al. [45]; Kiti et al. [27]). Each device has a unique identification (ID) number that is used to link the information on the contacts established by the individual carrying the device. For the present study, the technology was operated in a distributed fashion: contact data were stored in the local memory of individual devices. After the devices were collected at the end of the study, the data were downloaded, and the (temporal) contact networks recorded by individual devices were combined to build a time-resolved proximity graph. In addition to contact information, each device periodically logs its orientation in space as measured by a tri-axial accelerometer.
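The contact-event rule described above can be illustrated with a minimal sketch. It assumes that the packets have already been filtered by the attenuation threshold, that timestamps are given in seconds, and that the 20 s intervals lie on a fixed grid starting at time zero; the function name `contact_events` and the input layout are hypothetical and do not reproduce the firmware or the processing pipeline used in the study.

```python
from typing import Iterable, List, Tuple

INTERVAL_S = 20  # interval length stated in the text


def contact_events(packet_times: Iterable[float],
                   interval_s: int = INTERVAL_S) -> List[Tuple[int, int]]:
    """Merge packet timestamps (seconds) of one device pair into contact events.

    Returns (start, end) pairs in seconds, aligned to interval boundaries.
    The fixed-grid alignment is an assumption of this sketch.
    """
    # Indices of the intervals in which at least one packet was exchanged.
    active = sorted({int(t // interval_s) for t in packet_times})
    events: List[Tuple[int, int]] = []
    for idx in active:
        start = idx * interval_s
        if events and events[-1][1] == start:
            # The previous active interval ends where this one starts:
            # the contact is still ongoing, so extend it.
            events[-1] = (events[-1][0], start + interval_s)
        else:
            # First interval, or at least one empty interval in between:
            # the previous contact is broken and a new one starts.
            events.append((start, start + interval_s))
    return events
```

For packets received at 3, 25, 41, 100, 119 and 130 s, the sketch returns two contact events, `(0, 60)` and `(100, 140)`, because the intervals between 60 s and 100 s contain no packet.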
The participants wore a sensor enclosed in a pouch and pinned to the front of a blouse/shirt in order to detect close-range proximity. The low-power radio frequency in use cannot propagate through the human body, so this position of the sensor favours capturing face-to-face interactions. In addition, metadata on individuals (gender, age, and household membership) were collected using the app Survey CTO. A household was defined as the group of people living in the same house and eating from the same kitchen (Hosegood and Timaeus [20]). Participants were grouped into three age-categories: <10 years old (children), 11–18 years old (adolescents), and >18 years old (adults). Health Surveillance Assistants (HSAs) and volunteers were trained in the use of the sensors and in how participants should wear them over the study period, and HSAs visited the village to check whether participants were wearing the sensors properly. The Ministry of Health (MOH) defines an HSA as a primary healthcare worker serving as a link between a health facility and the community (Chikaphupha et al. [9]). HSAs’ tasks include community health, family health, environmental health, prevention and control of communicable diseases, and community case management. Two HSAs were involved in the study: one lived in Mdoliro (i.e., an adult participant who also had the role of HSA), and the other was external to the village’s population and visited the village at least once a week.
2.2 Ethical aspects
Only participants who gave documented written consent were included in the research. In the case of children, consent was obtained from their guardians. In the case of adolescents, consent was obtained from both themselves and their guardians. The study was approved by the Ethical Committee at the University of Zurich (OEC IRB #2018-046) and the Ethical Committee at the College of Medicine in Malawi (P.10/19/2825).
2.3 Data analysis
The proximity data were extracted from the devices and cleaned by identifying anomalies in the recorded data that might point to sensors that had been tampered with or had suffered hardware/battery issues resulting in data loss or low-quality data. Participants were asked to remove the sensor overnight. Night contacts were excluded from the analysis by using the tri-axial accelerometer data to identify the time periods during which the sensor did not move. This also allowed us to identify other time periods during which the sensors were not worn by the participants. These data were also excluded from the analyses.
2.4 Network analysis and contact matrices
For each participant in the study, we computed the number of contact events and the duration of each contact. Time-aggregated, weighted contact networks were generated: nodes correspond to individuals, and an edge between two individuals indicates that at least one contact event involving those individuals was recorded during the whole experimental period. The weight \(w_{ij}\) of an edge between nodes i and j is defined as the cumulative duration of the contact events recorded between those individuals. Network edges are undirected and the weights on the edges are regarded as symmetric (\(w_{ij} = w_{ji}\)). The degree \(k_{i}\) of a node i in the above network corresponds to the number of distinct individuals with whom individual i has been in contact. Intra-household and inter-household contact matrices were generated based on the daily number and on the daily duration of contacts by age-category. We aimed to obtain the daily mean of the contact durations and the daily mean of contact events per capita for each pair of age-categories. To obtain the intra-household matrices, we divided the total contact durations and the total number of contacts by the days during which the family members wore sensors simultaneously (days of overlap), and by the mean number of persons belonging to the two age-categories (a and b):
$$\begin{aligned}& \text{daily mean contact duration}(a,b) = \frac{\text{total contact durations}(a,b) / \text{days of overlap}}{\bigl(\text{age-category}(a) + \text{age-category}(b)\bigr) / 2}, \\& \text{daily mean contact events}(a,b) = \frac{\text{total number of contact events}(a,b) / \text{days of overlap}}{\bigl(\text{age-category}(a) + \text{age-category}(b)\bigr) / 2}. \end{aligned}$$
To obtain the inter-household matrices, we divided the total contact durations and the total number of contacts by the days during which all the participants wore the sensors simultaneously. The observed values were compared to those obtained from a null model: we shuffled the node attributes of the intra- and inter-household edge lists and computed, for each of those realizations, the daily number and the daily duration of contacts by age-category.
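The per-capita normalization of the contact matrices can be sketched as follows. This is an illustrative reading rather than the code used in the study: the function name `daily_mean_matrix`, the contact-list layout and the use of a single `days_of_overlap` value for the whole input are assumptions, and the size of each age-category is taken to be the number of persons listed in the `age_category` mapping.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Contact = Tuple[str, str, float]  # (person_i, person_j, duration in seconds)


def daily_mean_matrix(contacts: Iterable[Contact],
                      age_category: Dict[str, str],
                      days_of_overlap: float) -> Dict[Tuple[str, str], float]:
    """Daily mean contact duration per capita for each pair of age-categories."""
    # Number of persons in each age-category (assumed to be the denominator term).
    size: Dict[str, int] = defaultdict(int)
    for cat in age_category.values():
        size[cat] += 1

    # Total contact duration for each unordered pair of age-categories.
    total: Dict[Tuple[str, str], float] = defaultdict(float)
    for i, j, duration in contacts:
        pair = tuple(sorted((age_category[i], age_category[j])))
        total[pair] += duration

    # Divide by the days of overlap and by the mean size of the two categories.
    return {
        (a, b): (t / days_of_overlap) / ((size[a] + size[b]) / 2)
        for (a, b), t in total.items()
    }
```

Passing a duration of 1 for every contact event turns the same function into the daily mean number of contact events. For the intra-household matrices, the sketch would be applied to each household with its own days of overlap.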
2.5 Community detection
Community detection seeks to describe the large-scale structure of a network by dividing its nodes into communities (or groups), based only on the pattern of links among those nodes (Contisciani et al. [10]). Nodes belonging to the same community are more densely connected to each other than to the rest of the network and probably share common properties. We used the Louvain algorithm (Blondel et al. [2]) to identify community structure in the aggregated networks. Modularity measures the strength of the division of a network into communities, and the Louvain method searches for the partition that maximizes the modularity score. The algorithm assesses how much more closely connected the nodes within a group are, compared to how connected they would be in a random network (Borgatti et al. [3]; Lu et al. [32]). We used the Normalized Mutual Information (NMI) score to test the relationship between the community membership of participants and their attributes (i.e., gender, age-category and household membership), that is, to evaluate whether the communities detected by the algorithm corresponded to participants’ attributes. The NMI score ranges between 0 (no mutual information) and 1 (perfect correlation). NMI scores closer to 1 imply a greater overlap between community membership and attributes.
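As an illustration of how the NMI score compares two labelings of the same participants, the following sketch computes it from scratch. The normalization by the arithmetic mean of the two entropies is an assumption, since the text does not state which variant was used, and the function names are hypothetical.

```python
from collections import Counter
from math import log
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """NMI with arithmetic-mean normalization: 2 I(X;Y) / (H(X) + H(Y))."""
    n = len(x)
    joint = Counter(zip(x, y))   # co-occurrence counts of (community, attribute)
    px, py = Counter(x), Counter(y)
    mutual_info = sum(
        (c / n) * log((c * n) / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 1.0  # both labelings consist of a single group
    return 2 * mutual_info / (hx + hy)
```

Here `x` would hold the community label of each participant and `y` one attribute (for example the household identifier), listed in the same participant order. Communities that coincide exactly with households give a score of 1, and communities that are statistically independent of the attribute give a score of 0.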
2.6 Assortativity
We studied the assortativity (i.e., the tendency of individuals to associate with individuals of similar characteristics) with respect to gender and age-category. Our aim was to understand whether individuals of the same gender and the same age-category are more likely to interact, and whether there are differences between intra-household and inter-household contacts. We computed the number of contacts and the total time in contact (weights) between individuals in the aggregated observed networks. We compared the values obtained from the observed networks with those obtained from a null model. We created an ensemble of realizations of the null model by shuffling the nodes’ attributes and computing the assortativity for each realization. Then, we compared the empirical results with the distribution of values obtained from the null model using the z-test.
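The comparison with the attribute-shuffling null model can be sketched as follows. The statistic used here, the fraction of total edge weight that links nodes sharing the same attribute, is only one possible choice and is an assumption of this sketch, as are the number of realizations (1000), the function names and the input layout.

```python
import random
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (node_i, node_j, weight)


def same_attribute_fraction(edges: List[Edge], attr: Dict[str, str]) -> float:
    """Fraction of total edge weight linking nodes with the same attribute."""
    total = sum(w for _, _, w in edges)
    same = sum(w for i, j, w in edges if attr[i] == attr[j])
    return same / total


def null_model_z(edges: List[Edge], attr: Dict[str, str],
                 n_realizations: int = 1000, seed: int = 0) -> float:
    """z-score of the observed statistic against attribute-shuffled realizations."""
    rng = random.Random(seed)
    observed = same_attribute_fraction(edges, attr)
    nodes = list(attr)
    values = list(attr.values())
    null = []
    for _ in range(n_realizations):
        rng.shuffle(values)  # reassign attributes, keep the network fixed
        null.append(same_attribute_fraction(edges, dict(zip(nodes, values))))
    sigma = pstdev(null)
    if sigma == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - mean(null)) / sigma
```

Setting all weights to 1 gives the version based on the number of contacts instead of the time in contact. A large positive z-score indicates more same-attribute contact than expected under random assignment of the attribute.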
2.7 Daily activity profiles
We studied the daily activity profile of contacts among individuals by extracting the probability of observing a contact as a function of the time of day. We computed these activity profiles for each household, split into intra-household and inter-household contacts. Additionally, we created two aggregated datasets that join all the timestamps of the observed contacts: the aggregated data for intra-household contacts and the aggregated data for inter-household contacts. We computed the Kolmogorov–Smirnov statistical distance \(d_{\mathrm{KS}}\) between the aggregated data of all the households and the data observed for each household (for the intra-household daily activity profile), and between the aggregated data of all individuals and the data observed for each individual (for the inter-household daily activity profile). The Kolmogorov–Smirnov distance between two probability distributions i and j depending on time τ is the maximum, over all times, of the absolute difference between the cumulative distribution functions \(C_{i}(\tau )\) and \(C_{j}(\tau )\):
$$ d_{\mathrm{KS}} ( i,j ) = \max_{\tau } \bigl\vert C_{i} ( \tau ) - C_{j} ( \tau ) \bigr\vert . $$
This distance is bounded between 0, when the two compared distributions are identical, and 1, when the overlap between both distributions is null. In this particular case, we studied the distributions arising from the empirically observed contacts. The times of the contacts of each household (for intra-household data) and of each individual (for inter-households data) i are described by the cumulative distribution function \(C_{i}\) given by
$$ C_{i} ( \tau ) = 1- \frac{ \sum_{t< \tau } N_{i} (t)}{\sum_{t} N_{i} (t)}, $$
where \(N_{i}(t)\) is the number of observed contacts at time t; the same function is computed for the aggregated data. We considered a time range of 24 h. Because this range is periodic, the choice of the origin of times may influence the measurement of \(d_{\mathrm{KS}}\). For example, if we place the origin of times at midnight and there are contacts between 23:00 and 01:00 in the aggregated dataset, but no contact in one specific household within this range, the maximum difference between the cumulative functions would be influenced by the origin of times, such that this difference would be lower if the origin of times fell within the observed range. To tackle this issue, we drew the origin of times as a uniform random number between 0 and 24 h. For each pair of distributions, we generated 100 different origins of times, computed the Kolmogorov–Smirnov distance for each origin, and took as \(d_{\mathrm{KS}}\) the minimum among all these samples.
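The procedure above can be summarized in a short sketch. It assumes that contact times are expressed in hours within the day (values in [0, 24)) and works directly on the timestamps rather than on binned counts; the function names and the fixed seed are hypothetical choices made for reproducibility of the illustration.

```python
import random
from bisect import bisect_left
from typing import Sequence

DAY_H = 24.0


def cumulative(sorted_times: Sequence[float], tau: float) -> float:
    """C(tau) = 1 - (number of contacts with t < tau) / (total number of contacts)."""
    return 1.0 - bisect_left(sorted_times, tau) / len(sorted_times)


def ks_distance(times_i: Sequence[float], times_j: Sequence[float]) -> float:
    """Maximum absolute difference between the two cumulative functions."""
    a, b = sorted(times_i), sorted(times_j)
    # Both functions are step functions that change only at observed times,
    # so it is enough to evaluate the difference at the pooled timestamps.
    return max(abs(cumulative(a, tau) - cumulative(b, tau)) for tau in a + b)


def min_ks_over_origins(times_i: Sequence[float], times_j: Sequence[float],
                        n_origins: int = 100, seed: int = 0) -> float:
    """Minimum KS distance over randomly drawn origins of the 24 h cycle."""
    rng = random.Random(seed)
    best = 1.0
    for _ in range(n_origins):
        origin = rng.uniform(0.0, DAY_H)
        # Re-express every timestamp relative to the new origin of times.
        shifted_i = [(t - origin) % DAY_H for t in times_i]
        shifted_j = [(t - origin) % DAY_H for t in times_j]
        best = min(best, ks_distance(shifted_i, shifted_j))
    return best
```

In this sketch `times_i` would hold the contact times of one household (or one individual) and `times_j` those of the corresponding aggregated dataset. Two samples with no overlap in time, such as contacts at 1 h and 2 h against contacts at 3 h and 4 h, give `ks_distance` equal to 1 for the midnight origin.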