Quantifying social contacts in a household setting of rural Kenya using wearable proximity sensors

Close proximity interactions between individuals influence how infections spread. Quantifying close contacts in developing world settings, where such data are sparse yet disease burden is high, can provide insights into the design of intervention strategies such as vaccination. Recent technological advances have enabled the collection of time-resolved face-to-face human contact data using radio frequency proximity sensors. The acceptability and practicalities of using proximity devices within the developing country setting have not been investigated. We present and analyse data arising from a prospective study of 5 households in rural Kenya, followed through 3 consecutive days. Pre-study focus group discussions were held with key community groups. All residents of selected households carried wearable proximity sensors to collect data on their close (<1.5 metres) interactions. Data collection for residents of three of the 5 households was contemporaneous. Contact matrices and temporal networks for 75 individuals are defined, and mixing patterns by age and time of day in household contacts are determined. Our study demonstrates the stability of numbers and durations of contacts across days. The contact durations followed a broad distribution consistent with data from other settings. Contacts within households occur mainly among children and between children and adults, and are characterised by regular daily peaks in the morning, midday and evening. Inter-household contacts occur mainly between adults and are more sporadic when measured over several days. Community feedback indicated privacy as a major concern, especially regarding perceptions of non-participants, and showed that community acceptability required thorough explanation of study tools and procedures. Our results show how, in a low resource setting, wearable proximity sensors can be used to objectively collect high-resolution temporal data without direct supervision.
The methodology appears acceptable in this population following adequate community engagement on study procedures. A target for future investigation is to determine the difference in contact networks within versus between households. We suggest that the results of this study may be used in the design of future studies using similar electronic devices targeting communities, including households and schools, in the developing world context.

Electronic Supplementary Material: The online version of this article (doi:10.1140/epjds/s13688-016-0084-2) contains supplementary material.


Introduction
Close social contacts drive the spread of respiratory infections that are transmitted by respiratory droplets or saliva []. Improved characterization of these social contacts should lead to a better understanding of the dynamics of infectious diseases with this mode of transmission within human communities, and such data are increasingly used within predictive transmission dynamic models [-]. However, the collection of close contact data poses many challenges [-]. The most important is defining the form of contact required to effect transmission [] and, in turn, the methodology that can be employed to collect unbiased data on such behaviour.
In this context, the standard definition of a close contact is co-location with an individual such that both can have a conversation without raising their voices, or a direct (physical) contact that entails skin-to-skin touch between the individuals []. Such behaviour has primarily been recorded using daily contact diaries (self- or third-party-completed [, , , , ]) or retrospective questionnaires on contacts made [, , , , ]. Moreover, the collection of contact data using these methods has focused primarily on populations in developed countries [, , , , , , ] rather than in developing countries [, , ]. The burden of respiratory infectious disease in low-income countries justifies increased attention to developing country communities in the future.
The contact diary has been the mainstay of studies recording contact data: respondents record, for example, whom they contact (age, gender) and how often, whether there was skin-to-skin contact, the location (home, work, school, other), and how long the encounter lasted. Several limitations on the accurate collection of representative data have been identified, such as recall bias [], low compliance and illiteracy [], and differences in definitions [, , ]. Completing diaries is time-consuming, may alter behaviour, and requires prior user training. Alternatives to diary questionnaires include Web-based interfaces [, ] and focus group discussions [], with the methods adapted to suit the context. Synthetic contact matrices can also be generated from co-location data using time-use [] or demographic data [, ]. All these methods are of limited value in defining networks of contacts.
More recently, a host of proximity-sensing technologies have paved the way for automated collection of social proximity data: Bluetooth-enabled smartphones [, ], radio beacons [], wearable radio frequency (RF) devices [], and more. In particular, proximity-sensing wearable devices (henceforth referred to as 'tags') are low cost, have simple operational constraints, and can be tuned to detect proximity interactions at less than  meter separation distance every few seconds []. These proximity events are deemed relevant to the direct (through physical contact) or indirect (via aerosols) spread of infections. When worn by participants, tags provide data on temporal dynamic interaction patterns in real-world environments, for example, schools [, ], hospitals [, ] and conferences []. Data from these studies highlight important network properties, such as the presence of superspreaders (nodes) who, because of the number and duration of their interactions, are more likely than others to spread infections. Studies in schools highlight age and school class assortativity as important for infection control [, , ], while hospital research identified nurses as having the most potential to transmit infections to patients [].
Investigation of contact patterns in high-contact settings such as schools and households is of paramount importance in epidemiology []. In particular, households are considered hubs of infection spread [] because of frequent, long-duration contacts with a high proportion of physical interactions, combined with a high degree of clustering between members of the same household []. Furthermore, the introduction of infection into households and onward dissemination from households depend on the connectivity of households with other groups (such as schools and workplaces, as well as other households) within a community. However, there is a distinct lack of information, in both developed and developing countries, on contact patterns at the household level and on the role of household network structure in shaping disease transmission. There is also little information from low resource settings on the acceptability of use, methodology of implementation, and performance of electronic tracking methods, both proximity sensors and GPS locators.
The present study offers a simultaneous assessment of intra- and, to a lesser extent, inter-household social contact patterns. We report the number and duration of contacts, the influence of age and day, and the temporal structure of the networks. We also report on the experience of undertaking the study in a low resource setting from the researcher, community and individual participant perspectives, including the challenges and limitations. Although the data set is limited in size, since this was principally a feasibility study, it represents a first step in developing the use of electronic proximity sensing and tracking in a rural developing country setting, which we hope will provide the basis for more detailed and expansive studies.

Study design, context and data collection
The study was conducted in the Matsangoni sub-location within the Kilifi Health and Demographic Surveillance Site (KHDSS), coastal Kenya []. Five households (Figure 1, panel A) were selected at random from a group of  households that had earlier participated in a study to investigate 'who acquires infection from whom' (WAIFW) []. A household was defined as all people who eat from the same kitchen. In this rural setting, a household encompasses several related families living in distinct houses within the same compound and reporting to one head. Participants were grouped into  age groups assumed to approximate key social or behavioral groups: < (infant), - (pre-school), - (primary school), - (secondary school), - (adults), and ≥ (elderly) years.
Prior to the study,  focus group discussions were conducted in the study area, focusing on four thematic areas: acceptability of the tags for data collection, participants' and non-participants' perceptions of tags, privacy concerns, and length of time to carry tags. The groups were composed of primary school students (class -, approximate age range - y), secondary school students (form -, age range - y) and kindergarten teachers (age range - y). The remaining groups consisted, separately, of male and female Kenya Medical Research Institute (KEMRI) Community Representatives (KCR, age range - y). KCRs are a network of community-elected individuals who relay feedback on research activities between KEMRI and the community [] and are recognized as a key informant group for research activities. Community sensitization commenced with seeking permission from the local administrative officers in Matsangoni. At each household, the head gave the initial approval for the research team to engage the rest of the members. For practical reasons, the study procedures were explained to all available members simultaneously in a manner reflecting the developmental age of the individuals, and follow-ups were made as appropriate for those who missed the joint sessions. Active engagement of children below  years old was enhanced by the use of information, education and content materials in the form of colouring books containing health messages. Teenage siblings and adults received more detailed study fliers and were given a toll-free study number to call in case of further questions.

Figure 1 Study design. Panel A shows the selection of households within the study area. Panel B shows a child wearing the tag with a lanyard around the neck. Panel C shows data collection over time across the households, highlighting E-F-L, in which data were collected concurrently.
Data were collected using the platform developed by the SocioPatterns collaboration project (a European consortium of institutions and investigators focused on social dynamics, http://www.sociopatterns.org). The sensors are wearable devices that exchange ultra-low power radio packets and can detect the close proximity of individuals wearing them []. The infrastructure of the SocioPatterns sensing platform has been described in several papers [-]. For this study, the tags were tuned to exchange data packets only when located within <1.5 metres of each other, suggesting a dyadic conversation or skin-to-skin touch such as a handshake. A close proximity 'contact' between two individuals with tags was recorded when at least one data packet was exchanged in a  second window. Once a contact was established, it was considered ongoing until no packets were exchanged for  consecutive seconds. Successive -second contact windows are aggregated to give the duration of one contact between two individuals. Participants were asked to wear the tags with a lanyard on the chest. Since the radio frequency signal used to sense proximity cannot propagate through a human body, this setup enabled the detection of face-to-face proximity relations. All data were collected and stored in the internal memory of the wearable devices and downloaded to a computer for post-processing and analysis.
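The windowing rule above can be sketched in Python. This is a minimal illustration, not the SocioPatterns firmware or analysis code: it assumes the detections for one pair of tags are already given as the sorted start times of fixed-length windows in which at least one packet was exchanged, and that a contact ends as soon as one window passes without packets. The helper `contact_events` and the window length `window_s` are hypothetical; the study's exact values are not restated here.

```python
def contact_events(windows, window_s):
    """Merge per-window detections for one pair of tags into contact events.

    windows  -- sorted start times (s) of windows in which the pair exchanged
                at least one packet
    window_s -- window length in seconds (placeholder value in the example)

    Returns a list of (start_time, duration_s) tuples.
    """
    events = []
    start = last = None
    for t in windows:
        if start is None:
            start = last = t                    # first detection opens an event
        elif t - last <= window_s:
            last = t                            # next window detected: contact ongoing
        else:
            # a window without packets ended the previous contact
            events.append((start, last + window_s - start))
            start = last = t
    if start is not None:
        events.append((start, last + window_s - start))
    return events


if __name__ == "__main__":
    print(contact_events([0, 20, 40, 100, 120], window_s=20))
```

With 20-second windows (an illustrative value only), detections at 0, 20, 40, 100 and 120 s yield one 60 s contact and one 40 s contact.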
Participants were approached for consent at their households by trained fieldworkers. Before a participant was given a tag (Figure 1, panel B), it was reset to clear its memory. Note that participants were not trained on how to perform this procedure. The tags were enclosed in a pouch with a lanyard and given to each participant to wear around the neck. Participants wore a tag for five days; however, each household received tags on different days, e.g., household L from April th at  pm to th, household F from th to th and household E from th to th. Residents of these three households (E, F, L) carried the tags during an overlapping time window (Figure 1, panel C). On the fifth day of data collection, fieldworkers went to each household to collect the tags.

Data analysis
The data collected by the tags provide a high-resolution measurement of the contact patterns between household members at a temporal scale of  seconds. First, we extracted and cleaned the data separately for each participant, identifying corrupted sensors (no data available) or anomalous signals (such as continuous bursts of data) in the contact measurements. To make the contact dataset comparable across households, we discarded data collected on the first and fifth days and considered only contacts collected over 3 consecutive days, from  am to  pm, for each household. Night contacts, collected from  pm to  am, were excluded from the analysis because most tags were not worn by the participants at night. Only individuals with a complete contact record for 3 consecutive days were considered in the analysis. Measurements obtained from sensors that were restarted or interrupted before the end of the experiment were not included. Table  reports a full description of the age and gender of all participants by household, indicating the individuals whose contact records were excluded from the data analysis.
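The day and time-of-day restriction can be sketched as a simple filter. The function `keep_analysis_window`, the tuple layout of a contact record, and the hour bounds used in the test (07:00 to 20:00) are hypothetical placeholders, since the exact hours are not given in this section; study day 1 is assumed to be the calendar date on which tags were handed out.

```python
from datetime import datetime


def keep_analysis_window(contacts, study_start, first_day, last_day,
                         start_hour, end_hour):
    """Keep contacts that start on study days first_day..last_day (day 1 is the
    calendar date of study_start) and within [start_hour, end_hour) local time.

    contacts: list of (i, j, start: datetime, duration_s) tuples.
    """
    kept = []
    for i, j, t, dur in contacts:
        day = (t.date() - study_start.date()).days + 1   # 1-based study day
        if first_day <= day <= last_day and start_hour <= t.hour < end_hour:
            kept.append((i, j, t, dur))
    return kept


if __name__ == "__main__":
    start = datetime(2000, 1, 1, 15, 0)   # illustrative date, not the study's
    rows = [("a", "b", datetime(2000, 1, 2, 8, 0), 60)]
    print(keep_analysis_window(rows, start, 2, 4, 7, 20))
```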
We first compute the number of contact events recorded by each individual and the statistical distribution of the duration of contact events, and then aggregate such statistics by age and household. Contact events can be of two types: I. contacts between members of the same household (available for all the households); II. contacts between members of different households (available only for households E, F, and L). Given the different nature and frequency of the two types of events, we disaggregated the dataset by type of contacts and analyzed the two sets separately.
Furthermore, we generate the aggregated contact networks on a daily scale and over the full experimental time period (3 days). Nodes of the networks are individuals, while an edge indicates the presence of at least one recorded contact event between the two individuals involved during the aggregation time window. Given a contact network, we define the following quantities:
- the degree $k_i$ of a node $i$ is the number of distinct individuals with whom individual $i$ has been in contact during the time window;
- the weight $n_{ij}$ of an edge between nodes $i$ and $j$ is the number of contact events recorded between these individuals during the time window;
- the weight $w_{ij}$ of an edge between nodes $i$ and $j$ is the cumulative duration of the $n_{ij}$ contacts recorded between the two individuals during the time window.
Network edges are undirected and edge weights are symmetric ($n_{ij} = n_{ji}$, $w_{ij} = w_{ji}$). We study the statistical distributions of the degrees and weights of the contact networks and extract age-stratified contact matrices from them. We characterize these distributions by their average and their squared coefficient of variation (CV²), defined as the square of the ratio between the standard deviation and the mean of the distribution (i.e., the variance divided by the squared mean). The squared coefficient of variation is used to distinguish between high-variance distributions (CV² > 1) and low-variance distributions (CV² < 1); an exponential distribution has CV² = 1.
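The three network quantities can be computed from a list of contact events with a few dictionaries. This is a sketch under assumed inputs (each event given as `(i, j, duration_s)` for one aggregation window); `aggregate_network` is a hypothetical helper name.

```python
from collections import defaultdict


def aggregate_network(events):
    """Aggregate contact events from one time window into a weighted network.

    events: iterable of (i, j, duration_s) tuples.
    Returns (k, n, w):
      k[i]      -- degree: number of distinct contacts of node i
      n[(i, j)] -- number of contact events on the undirected edge
      w[(i, j)] -- cumulative contact duration on the edge
    Edges are keyed by the sorted node pair, so n_ij = n_ji and w_ij = w_ji.
    """
    n = defaultdict(int)
    w = defaultdict(float)
    for i, j, dur in events:
        edge = tuple(sorted((i, j)))
        n[edge] += 1
        w[edge] += dur
    neighbours = defaultdict(set)
    for a, b in n:
        neighbours[a].add(b)
        neighbours[b].add(a)
    k = {node: len(nbrs) for node, nbrs in neighbours.items()}
    return k, dict(n), dict(w)


if __name__ == "__main__":
    k, n, w = aggregate_network([("a", "b", 20), ("b", "a", 40), ("a", "c", 20)])
    print(k, n, w)
```

Keying edges by the sorted pair is what enforces the symmetry stated above: events recorded as (b, a) and (a, b) accumulate on the same edge.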
We compare age-stratified contact matrices extracted from the empirical contact networks with synthetic matrices generated by an algorithmic approach []. To quantify differences between matrices, we compute the basic reproductive number $R_0$, defined as the dominant eigenvalue of the next generation matrix [], which is equivalent to the dominant eigenvalue of the contact matrix up to a constant; this constant cancels when two estimates are compared as a ratio. Under the null hypothesis of equal contact matrices, the ratio of the $R_0$ estimates is expected to equal 1. For each matrix comparison, we assess the statistical significance of any deviation from the null hypothesis by calculating % confidence intervals based on a nonparametric bootstrap with , samples of the empirical contact data. In practice, we resample the empirical data , times and compare each resampled empirical matrix of a household with the corresponding synthetic matrix (whose generation process cannot be randomized), obtaining one value of the $R_0$ ratio from each comparison.
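A minimal sketch of the $R_0$-ratio comparison follows, using only the Python standard library. Several choices here are assumptions rather than the authors' documented procedure: the dominant eigenvalue is found by power iteration on $M + I$ (which keeps the Perron root strictly dominant for non-negative matrices), empirical matrices are built as contact fractions so they are on the same scale as a synthetic fraction matrix, individual contact events are the resampling unit, and the interval is a simple percentile interval. The names are hypothetical, and `n_boot` and `level` are placeholders because the exact bootstrap settings are not restated here.

```python
import random


def spectral_radius(m, iters=10_000, tol=1e-12):
    """Dominant eigenvalue of a non-negative square matrix (power iteration).

    Iterating with (M + I) instead of M makes the Perron root strictly dominant,
    so the iteration does not oscillate for periodic matrices; 1 is subtracted
    at the end to undo the shift.
    """
    n = len(m)
    v = [1.0] * n
    prev = norm = 0.0
    for _ in range(iters):
        u = [sum(m[r][c] * v[c] for c in range(n)) + v[r] for r in range(n)]
        norm = max(u)
        v = [x / norm for x in u]
        if abs(norm - prev) < tol:
            break
        prev = norm
    return norm - 1.0


def fraction_matrix(pairs, n_groups):
    """Symmetric matrix of contact fractions from (group_i, group_j) events."""
    m = [[0.0] * n_groups for _ in range(n_groups)]
    for a, b in pairs:
        m[a][b] += 1.0
        m[b][a] += 1.0
    total = sum(map(sum, m))
    return [[x / total for x in row] for row in m]


def r0_ratio_ci(pairs, synthetic, n_groups, n_boot=1000, level=0.95, seed=1):
    """Point estimate and percentile bootstrap interval of the R0 ratio
    (empirical / synthetic); contact events are resampled with replacement."""
    rng = random.Random(seed)
    rho_syn = spectral_radius(synthetic)   # synthetic matrix is not resampled
    point = spectral_radius(fraction_matrix(pairs, n_groups)) / rho_syn
    ratios = sorted(
        spectral_radius(fraction_matrix(rng.choices(pairs, k=len(pairs)), n_groups))
        / rho_syn
        for _ in range(n_boot)
    )
    lo = ratios[int((1 - level) / 2 * n_boot)]
    hi = ratios[int((1 + level) / 2 * n_boot) - 1]
    return point, (lo, hi)
```

Because any proportionality constant between $R_0$ and the dominant eigenvalue cancels in the ratio, an interval that excludes 1 would be read as a significant difference between the two matrices.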
We analyse the temporal variability of the contact networks aggregated on a daily scale by measuring node loyalty [] and the similarity between the neighborhoods of a node on two different days []. The loyalty θ measures the fraction of preserved neighbors of a node between two network configurations at times $t_1$ and $t_2$. If the set of neighbors of node $i$ at time $t$ is denoted $\Gamma_i^t$, the loyalty $\theta_i^{t_1,t_2}$ is given by the Jaccard index []:

$$\theta_i^{t_1,t_2} = \frac{|\Gamma_i^{t_1} \cap \Gamma_i^{t_2}|}{|\Gamma_i^{t_1} \cup \Gamma_i^{t_2}|}.$$

The loyalty takes values between 0 and 1, with θ = 0 indicating that no neighbors are retained from time $t_1$ to time $t_2$, and θ = 1 indicating that the set of neighbors is exactly the same in the two configurations. The definition of loyalty does not take edge weights into account. To measure the similarity between the neighborhoods of nodes while also accounting for the time spent in contact, we consider the cosine similarity [], defined as:

$$\sigma_i^{t_1,t_2} = \frac{\sum_j w_{ij,1}\, w_{ij,2}}{\sqrt{\sum_j w_{ij,1}^2 \, \sum_j w_{ij,2}^2}},$$

where $w_{ij,1}$ and $w_{ij,2}$ are the weights on the edge $i \leftrightarrow j$ measured at times $t_1$ and $t_2$, respectively. The cosine similarity takes values between 0 and 1, where 1 corresponds to the case in which $i$ had contacts in the two time windows with exactly the same individuals, spending the same fraction of time in proximity with each of them. If the sets of neighbors are completely different in the two configurations, the cosine similarity is zero.
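Both similarity measures can be implemented directly from their definitions. In this sketch (hypothetical helper names), a node's neighborhood on a given day is a set for the loyalty and a dict mapping each neighbor $j$ to $w_{ij}$ for the cosine similarity; returning 0 when a measure is undefined (node isolated in both windows, or all weights zero) is an assumed convention, not one stated in the text.

```python
import math


def loyalty(nbrs_t1, nbrs_t2):
    """Jaccard index of a node's neighbour sets in two time windows."""
    union = nbrs_t1 | nbrs_t2
    if not union:
        return 0.0  # assumed convention: node isolated in both windows
    return len(nbrs_t1 & nbrs_t2) / len(union)


def cosine_similarity(w_t1, w_t2):
    """Cosine similarity of a node's edge weights in two time windows.

    w_t1, w_t2: dicts mapping neighbour j -> w_ij in each window; neighbours
    absent from a window have weight 0 there and add nothing to the dot product.
    """
    dot = sum(w * w_t2[j] for j, w in w_t1.items() if j in w_t2)
    norm = math.sqrt(sum(x * x for x in w_t1.values())
                     * sum(x * x for x in w_t2.values()))
    return dot / norm if norm else 0.0  # assumed convention for empty windows
```

Scaling all of a node's durations by the same factor leaves the cosine similarity unchanged, which is why it captures the *fraction* of time spent with each neighbor rather than absolute durations.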

Ethical review and consent
The Kenya Ethical Review Committee (KEMRI/RES///) and the Biomedical and Social Ethics Review Committee of the University of Warwick (--) approved the study. Written informed consent was sought from participants aged ≥ years and from parents or guardians for those aged < years. Infants (< months of age) did not participate in this study.

Focus group discussions
Focus group discussions were held with a total of  respondents ( adult males,  adult females,  KEMRI Community Representatives,  primary school students and  secondary school students). Of the non-students,  had no education,  primary school education,  secondary school education and  college education. Their main economic activities were small-scale business, farming and casual labour, or employment as a teacher () or in the hotel industry (). Participants agreed that the community would welcome the research; however, they made clear that a more thorough explanation of the study tools and procedures was necessary, for instance, clarification of whether the tags detected respiratory infections in addition to proximity between tags. Most participants thought that the devices were relatively small, making them comfortable to carry for a number of days. Some mentioned that a shorter period would make it more convenient for them to carry the devices, while others argued that a shorter duration might not give an accurate picture of their regular movement and contact patterns. Despite the small size of the tag, there were concerns about dangers posed by the lanyard around the neck, or about children getting hurt when the tag was pressed against the chest. Different ways of carrying the devices were proposed, such as in the shirt or trouser pocket for men, attached to blouses for women, or encased in a pouch hung around the neck. Such concealment would ensure that participants did not lose or tamper with the tags and would minimize questions from non-participants. From these discussions, it was agreed that the devices would be inserted in pouches and carried around the neck for a period of  days, and that all children would be involved, especially given the paucity of data for this age group and their importance in the spread of infections in schools and households.

Baseline characteristics
Out of  listed residents,  (% female) were enrolled and assigned tags. One infant was not enrolled because the tag was potentially dangerous due to its size and shape. Other individuals, especially adults aged > years, were not available at the households during the entire study period, mainly due to work and school. The median age was . years (IQR .-.) and household size ranged from  to  participants. Participants were regrouped into  age groups, instead of , by merging all participants aged < years. Data were collected over four days for each household within the period th April  to th May . No data were collected on May st, as it was a gazetted public holiday, nor on three other separate days, to allow recharging of the electronic devices and monitoring of study progress. During the data analysis, we observed a number of tags displaying anomalies such as spikes in their activity records, and these tags were excluded. We also excluded all data collected on the first day from all tags. Overall, the excluded tags correspond to % of the total and, eventually, only data from  tags were used in the analysis reported here (see Table ). We recorded , contact events in 3 days. A total of , (%) contacts were recorded between members of the same household, and  contact events between members of different households. Table  displays the summary statistics of the contacts recorded between members of the same household. We recorded an average of  contact events per person per day, with a standard deviation of  contact events (CV² = .), thus very close to the characteristics of an exponential distribution (CV² = 1). School children aged - years recorded the highest total number of contacts, and those aged - years the lowest. However, children aged - years had the highest average number of daily contacts across all ages.
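The CV² benchmark used above can be checked numerically. The sketch below (hypothetical helper `cv2`, population variance assumed) computes the variance divided by the squared mean and applies it to a simulated exponential sample, for which CV² should be close to 1. The simulated mean of 30 is arbitrary and unrelated to the study data.

```python
import random
from statistics import mean, pstdev


def cv2(values):
    """Squared coefficient of variation: variance / mean**2 (population variance)."""
    m = mean(values)
    return (pstdev(values) / m) ** 2


if __name__ == "__main__":
    rng = random.Random(42)
    # exponential values with an arbitrary mean of 30 (not study data)
    sample = [rng.expovariate(1 / 30) for _ in range(50_000)]
    print(round(cv2(sample), 3))   # expected to be close to 1
```

A distribution with CV² well above 1 (e.g. a heavy-tailed one) would indicate a few individuals with far more contacts than average; CV² near 1 is the exponential reference point.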
Disaggregating the contacts by gender, we found no statistically significant difference between the contact distributions of males and females using a two-sample Kolmogorov-Smirnov test (p-value = .); males had an average of  daily contacts per individual (CV² = .), and females an average of  daily contacts per individual (CV² = .). Figure  shows the probability distribution of person-to-person contact durations within households for all participants (panel A), further stratified by age (B), gender (C) and day of study (D). The average contact duration over all contact events is  seconds, with about % of contacts exceeding  minutes (the squared coefficient of variation of the full distribution is CV² = .). The distribution is similar across all  age groups, but most of the longest contact durations are measured among children aged < years, who are also the most represented age group in the dataset. Contacts between individuals aged - and > years do not have very long durations.
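The Kolmogorov-Smirnov statistic is the largest absolute difference between the two empirical cumulative distribution functions (ECDFs). A pure-Python sketch of the statistic (hypothetical `ks_statistic`) is given below; it does not compute a p-value, for which a library routine such as SciPy's `scipy.stats.ks_2samp` would normally be used.

```python
from bisect import bisect_right


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_x(v) - F_y(v)|.

    The supremum of the difference between two step-function ECDFs is attained
    at one of the observed values, so only those points are checked.
    """
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)   # ECDF of x at v
        fy = bisect_right(ys, v) / len(ys)   # ECDF of y at v
        d = max(d, abs(fx - fy))
    return d
```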

Contact matrices
We generated contact matrices based on the number and duration of contacts by age (Figure ) and stratified by household (see Figures S-S in Additional file ). Figure  shows the total number of contacts (panel A) and the cumulative durations (panel D) between age groups, computed over the study duration and considering only contacts within households. Panels B and E show the average number of contacts and cumulative durations per individual, taking into account the number of participants in each age group and thus yielding asymmetric matrices. Panels C and F show the daily average number of contacts and durations per participant (average entries divided by 3, the number of study days). Matrices of number and duration of contacts reveal different levels of assortativity for the different age groups. Age assortativity is observed in children aged < years, with children aged - years having more contacts with older participants aged - years. Disassortativity is instead observed for older age groups: adults aged - years spend most of their time with children aged -, which is readily explained by parenthood, and teenagers aged - spend most of their time with adults. The fewest interactions are observed between pre-school children and teenagers and between school children (-) and the elderly (≥).
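The three matrix variants (total, per capita, daily per capita) can be sketched as follows. The counting convention is an assumption: each contact event between a member of group a and a member of group b adds its value to both entries (a, b) and (b, a), so a within-group event adds twice to the diagonal, and dividing row a by the size of group a gives the average per participant in that group. Passing 1 as the value gives contact counts, and passing durations gives cumulative durations; `age_matrices` is a hypothetical name.

```python
def age_matrices(contacts, group_size, n_days):
    """Build total, per-capita and daily per-capita age contact matrices.

    contacts:   iterable of (group_a, group_b, value); use value=1 for contact
                counts or the contact duration for cumulative durations
    group_size: number of participants in each age group (all > 0)
    n_days:     number of study days used for the daily average
    """
    g = len(group_size)
    total = [[0.0] * g for _ in range(g)]
    for a, b, value in contacts:
        total[a][b] += value    # counted once from each participant's side,
        total[b][a] += value    # so a within-group event adds twice to total[a][a]
    per_capita = [[total[a][b] / group_size[a] for b in range(g)] for a in range(g)]
    daily = [[x / n_days for x in row] for row in per_capita]
    return total, per_capita, daily
```

The total matrix is symmetric by construction, while the per-capita matrix is generally asymmetric because rows are divided by different group sizes, matching the asymmetry described for panels B and E.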
To assess whether such contact matrices could be inferred from a simple random mixing assumption, we generated a contact matrix separately for each household using the method proposed by Fumanelli et al. [] (see Additional file  for details). We compute this synthetic contact matrix by assuming that the contact frequency between age groups is simply proportional to the number of individuals in the two groups in the household. In Figures S-S in Additional file , we show a side-by-side comparison of the contact matrices measured by proximity sensors and the synthetic matrices based on the age structure of each household. For the sake of comparison, the entries of the sensor-based matrices are normalized by the total number of contacts recorded in the household, thus yielding the fraction of contacts measured between age groups. Sensor-based contact matrices clearly display greater heterogeneity in the distribution of contacts between age groups, and their differences from the synthetic matrices become larger as household size decreases. For instance, contacts between children tend to be more frequent than expected under random mixing, as observed in household E (Figure S). Sensor-based matrices also show higher than expected contact frequencies between children and adults, as in household L (Figure S). On the other hand, mixing between adults tends to be overestimated by synthetic matrices, as in households E and F. While qualitative differences can be assessed by visual inspection, a more quantitative analysis is needed to identify epidemiologically relevant differences between matrices. To this aim, we compared contact matrices using the basic reproductive number $R_0$, following the approach of Hens et al. [].
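As a simplified illustration of "contact frequency proportional to the number of individuals in the two groups", the sketch below assigns each entry the share of ordered pairs of distinct household members that fall in that pair of age groups. This is an assumed random-mixing baseline, not a reimplementation of the Fumanelli et al. method, and `random_mixing_matrix` is a hypothetical helper.

```python
def random_mixing_matrix(group_size):
    """Contact fractions expected if every ordered pair of distinct household
    members is equally likely to be in contact (simplified random mixing).

    Entry (a, b) is N_a * N_b for a != b and N_a * (N_a - 1) on the diagonal,
    divided by the total number of ordered pairs N * (N - 1), so entries sum to 1.
    """
    n = sum(group_size)
    pairs = n * (n - 1)
    return [[(na * nb if a != b else na * (na - 1)) / pairs
             for b, nb in enumerate(group_size)]
            for a, na in enumerate(group_size)]
```

In small households the diagonal term N_a(N_a - 1) is zero for any age group with a single member, which is one reason why differences from sensor data can be largest in the smallest households.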
Comparing the contact patterns measured by sensors with the synthetic matrices, we found an $R_0$ ratio significantly different from 1 for all households (see Table ), and systematically larger than 1, indicating that a random mixing assumption may not be adequate to describe contact patterns within households.

Contact network
We also investigated the structure of the fully aggregated contact networks, considering both contacts within and between households. Figure  displays a pictorial representation of the contact network of the  households, where nodes are individuals and edges indicate the presence of at least one recorded contact during the study period. Nodes are color coded according to their household (panel A), age (panel B) and gender (panel C), and the size of each node $i$ is proportional to its degree $k_i$. The edge thickness is proportional to the weight $w_{ij}$, which corresponds to the total amount of time the two individuals spent in proximity. The contact network comprises  nodes and  edges. The network is not connected: it has three connected components, corresponding to households B and H, plus the aggregate of households E-F-L. The average degree of the network is ⟨k⟩ = ., and the average clustering coefficient is ., indicating a high level of clustering (a random network with the same number of nodes and edges would have a clustering coefficient of .). Panel B shows the high level of assortativity that characterizes the youngest age classes. The average degrees of children aged - and - are . and ., respectively, higher than the global average (see Table ). The degree distribution P(k) also has a rather low variance (CV² = .), as shown in Figure  (panel A). The degree distribution extends between k_min =  and k_max = , and it is peaked around its average value. The low variance of P(k) is in agreement with previous empirical studies of human contact networks [, , ]. On the other hand, the weight distribution is heterogeneous and decays approximately as a power law with an exponential cut-off, as shown in Figure , panel B.
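The clustering comparison can be sketched in a few lines. `average_clustering` (hypothetical) averages local clustering coefficients, with nodes of degree below 2 contributing 0 (the convention used, for example, by NetworkX). The random-network reference is taken to be the expected clustering of an Erdős-Rényi graph with the same number of nodes N and edges E, which equals the edge density p = 2E / (N(N-1)); whether the study used exactly this reference is an assumption.

```python
from itertools import combinations


def average_clustering(adj):
    """Mean local clustering coefficient of an undirected graph.

    adj: dict node -> set of neighbours. Nodes with degree < 2 contribute 0.
    """
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # number of links among the neighbours of this node
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)


def random_graph_clustering(n_nodes, n_edges):
    """Expected clustering of an Erdos-Renyi graph: edge density 2E / (N(N-1))."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))
```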
In the full contact network, there are  edges (% of the total) connecting  members of different households (E, F and L). Only three (of ) members of household L did not record contacts with members of other households (one male aged  years, one female aged  years and one female aged  years), whereas all members of households E and F had contacts with members of other households. As highlighted by the layout of Figure , panel A, members of household E had contacts with members of both households F and L, while there was only  contact between two individuals of the latter two households. Notably, such inter-household contacts were recorded during short time windows for all three households (see Figure S in Additional file ). More specifically, all contacts between members of households E and L were recorded during two three-hour intervals on two different days, and all contacts between members of households E and F were recorded between  am and  pm on one day, and between  am and  am on a following day. Interestingly, contact matrices extracted from inter-household contacts display some significant differences from those extracted from intra-household contacts, as shown in Figure . Although children aged less than  are present in the inter-household contact matrix, there is no significant assortativity among them, and most of the contacts and the total time spent in proximity are recorded between adults (aged -).

Figure caption (partial): [...] decline up to noon, peaking again between  pm and  pm. Their numbers decline again in the afternoon up to about  pm and then increase sharply during the night up to  pm. Panels B, C, D, E and F represent households B, E, F, H and L, respectively. All households except B display a similar regular contact pattern over the three days.
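The time-of-day profiles described in the figure caption can be obtained by binning contact start times by hour. A minimal sketch follows; `contacts_by_hour` is a hypothetical helper, and the dates in the example are illustrative only.

```python
from collections import Counter
from datetime import datetime


def contacts_by_hour(start_times):
    """Number of contact events starting in each hour of the day (index 0-23)."""
    counts = Counter(t.hour for t in start_times)
    return [counts.get(h, 0) for h in range(24)]


if __name__ == "__main__":
    starts = [datetime(2000, 1, 2, 7, 5), datetime(2000, 1, 2, 7, 50),
              datetime(2000, 1, 3, 13, 0)]
    print(contacts_by_hour(starts))
```

Pooling start times from all study days into one 24-hour profile, as done here, shows recurring daily peaks; computing one profile per day instead would reveal day-to-day differences such as those of household B.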

Longitudinal analysis and stability of daily contact patterns
We then looked at the daily contact networks, obtained by aggregating all the contact events measured between  am and  pm of each day, for each household. To assess the changes in the contacts of each household member, we computed the loyalties and the cosine similarities between the neighborhoods of each node in each pair of daily networks. The distributions of similarities and loyalties are shown in Figure , aggregated for all households and by each household. Median values of the distributions are quite close to  and the IQRs all lie above ., with the exception of the distribution of household B. More specifically, the median values of the cosine similarity all vary between . and . (panel A), and the median values of the loyalty vary between . and . (panel C), indicating a substantial stability of individual contact patterns across days.
To better understand how 'large' these values are, given the relatively small size of the networks under study, we compared the values of cosine similarity and loyalty with those measured on a set of null models, i.e., randomized versions of the contact networks. More specifically, we considered two types of null models: one in which the network topology is unchanged but the weights are reshuffled among the edges (Figure , panel B), and one in which the network edges are placed at random between the nodes of the network (Figure , panel D). We computed the cosine similarity distributions on , realizations of the first null model and found much smaller median values, between . and ., at the global level and for each household, than in the original network (IQRs ranging between . and .). Similarly, we computed the loyalty distributions on , realizations of the second null model and found smaller median values, between . and ., for all the contact networks under study.
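The two null models can be sketched as follows, assuming a daily network stored as a dict of edge weights keyed by sorted node pairs. `reshuffle_weights` keeps the edges and permutes the weights among them; `random_edges` places the same number of edges uniformly at random among all node pairs. Both names are hypothetical, and each realization would then be passed through the loyalty or cosine similarity computation before taking medians.

```python
import random
from itertools import combinations


def reshuffle_weights(weights, rng):
    """Null model 1: same edges, with weights w_ij randomly permuted among them."""
    edges = list(weights)
    values = list(weights.values())
    rng.shuffle(values)
    return dict(zip(edges, values))


def random_edges(nodes, n_edges, rng):
    """Null model 2: n_edges edges placed uniformly at random among all node pairs."""
    all_pairs = list(combinations(sorted(nodes), 2))
    return set(rng.sample(all_pairs, n_edges))
```

The first model preserves who meets whom and tests only whether the allocation of time across neighbors is stable; the second destroys the topology and tests whether neighbor sets themselves are more persistent than chance.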

Discussion
The use of wireless proximity sensors ('tags') to collect data on close proximity interactions relevant for infectious disease transmission has gained significant ground. Paper diary studies define a contact as a direct physical touch or a conversation between colocated individuals. By contrast, a contact event here is a continuous set of -second interactions between two tags without a  second break. Tags capture the dynamics of contacts by collecting high-resolution temporal data free of the recall bias of paper diaries. They are also relatively easy to deploy in various settings, including hard-to-reach populations such as children and the elderly in rural areas []. Data collection using tags has mainly been done in closed settings of developed countries such as schools [...]. The social and mobility structures of individuals in developing countries may differ significantly from those in these settings, and there is no recorded use of wearable proximity sensors in developing country contexts. The primary aim of this study was to estimate social contact patterns and networks in a low-resource developing country setting. This also provided an opportunity to assess the social factors and logistical challenges that influence the deployment of wearable sensors for contact detection in a rural developing country community. We report here the first study to use close proximity sensors within the household setting, undertaken in a rural low-income community in coastal Kenya.
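The contact-event definition can be made concrete with a short sketch of merging per-interval detections into events. This is a minimal illustration, not the study's processing pipeline. The interval length is elided in the text above, so it is left as a parameter. The record layout (t, i, j) and the function name are assumptions. Consecutive detections of the same pair are merged as long as no interval is missing between them.

```python
from collections import defaultdict


def contact_events(records, interval):
    """Merge per-interval contact records into contact events.

    records: iterable of (t, i, j), meaning tags i and j were detected in
    contact during the interval starting at time t (seconds).
    interval: length of one detection interval in seconds.
    Returns a list of (i, j, start, duration) tuples.
    """
    by_pair = defaultdict(list)
    for t, i, j in records:
        pair = (i, j) if i <= j else (j, i)  # contacts are undirected
        by_pair[pair].append(t)

    events = []
    for (i, j), times in by_pair.items():
        times = sorted(set(times))
        start = prev = times[0]
        for t in times[1:]:
            if t - prev > interval:  # a missing interval ends the current event
                events.append((i, j, start, prev - start + interval))
                start = t
            prev = t
        events.append((i, j, start, prev - start + interval))
    return events
```

For example, with `interval=20` (an illustrative value), detections of A and B at 0, 20 and 40 s form one 60 s event. A further detection at 100 s starts a new event.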
In the design of social network studies to inform mathematical models of infectious disease spread, it is important to note the contextual differences between developed and resource-poor settings. Previous studies in institutions (schools [, , ], hospitals [, , ], conferences [], museums []) using wearable sensors to detect close proximity interactions have mainly focused on computing statistical measures of the networks. Little attention has been paid to the social and logistical challenges of conducting such studies. This gap matters even more in environments such as schools and households, where mixing rates between participants and non-participants are high. Intensive community engagement involving local administrators, opinion leaders and entire households facilitated community entry and ensured that key messages were adapted so they could be easily understood. Focus group discussions at the onset of this study provided invaluable input on study procedures, such as how to make the devices acceptable to participants and how to minimize curiosity among other community members. Most residents, particularly women, commonly hang mobile phones on lanyards around their necks. We took advantage of this and asked participants to carry the devices in pouches around their necks so as not to attract attention to themselves. Potential problems included bias due to non-compliance, such as not wearing a tag or picking up a different tag in the morning, and behavior change. No exit interview was conducted to assess the effects of the recommendations from the focus group discussions. However, anecdotal evidence indicated that these biases did not significantly affect data collection, as also suggested in a similar study in a primary school []. This could be attributed to familiarity, as individuals become accustomed to wearing the device over time.
To avoid other unobserved effects such as the exchange of tags, future studies in high-density populations, where participants keep the tags over several days, could use simple measures to make each participant's tag uniquely identifiable, such as a different colour pouch for each participant.
We collected data continuously over a full  hours each day unlike other studies that did not collect data outside normal school or work hours [, ]. However, all nighttime ( pm- am) contacts were disregarded during analysis due to inconsistent spikes in data that suggested heightened interaction between participants (probably the result of removal and storage of devices together). In general, family members congregate in the morning, over lunch hour and again in the evening. This is suggestive of normal human social behavior in rural Kilifi, whereby members of a household congregate for breakfast, lunch and when everyone returns home from school, work or other engagements. Combining this with time-use data would have provided further insight into the fluctuations in contacts during the day, particularly by elucidating where individuals spend their time and suggesting other potential contacts they made with people outside the study. While it may be possible to issue tags to all participants at the same time in closed settings such as school or health institutions, studies involving several households pose several challenges. For example, it was not feasible to collect household data contemporaneously across all five households due to three main reasons: there were fewer tags compared to the total number of household members, the  households were not selected for their close proximity and it took approximately half a day per household to issue tags to all residents, and lastly the need to recharge devices after each use at the research office due to lack of electricity in the selected households.
Overall, at a coarse-grained level, the observed structure of the household contact matrices is consistent with the assumption of a strong age assortativity, as measured from self-reported contact diaries [] and often integrated into mathematical epidemic models []. Many contacts are observed among children, and between children aged < years and adults, while teenagers and elderly individuals tend to have lower contact frequencies.
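A household contact matrix by age can be built by summing contact durations between age groups. This is a minimal sketch, not the study's exact construction. It assumes events are (i, j, start, duration) tuples and that each participant is assigned an age-group label. The group labels and the per-capita normalisation (dividing each row by the number of individuals in that age group) are illustrative choices.

```python
def age_contact_matrix(events, age_group, groups):
    """Symmetric matrix of total contact duration between age groups.

    events: iterable of (i, j, start, duration); age_group: {node: group label};
    groups: ordered list of group labels. Entry [a][b] sums the durations of
    contacts between a member of group a and a member of group b.
    """
    index = {g: k for k, g in enumerate(groups)}
    m = [[0.0] * len(groups) for _ in groups]
    for i, j, _start, duration in events:
        a, b = index[age_group[i]], index[age_group[j]]
        m[a][b] += duration
        if a != b:
            m[b][a] += duration  # keep the matrix symmetric
    return m


def per_capita(m, age_group, groups):
    """Divide each row by the number of individuals in that row's age group."""
    sizes = [sum(1 for g in age_group.values() if g == grp) for grp in groups]
    return [[v / sizes[a] if sizes[a] else 0.0 for v in row] for a, row in enumerate(m)]
```

For a toy household with two children and one adult, `per_capita` gives the average contact time a member of each group spends with each other group.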
On the other hand, the contact matrices are not fully captured by a simple random mixing assumption, which is typical of large synthetic contact networks built from socio-demographic data []. More sophisticated modeling approaches, such as the latent variable models proposed in [], could be more suitable for fitting the observed contact patterns within households.
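One simple way to quantify the departure from random mixing is to compare observed contact time between age groups with the time expected if every pair of household members were equally likely to be in contact. This is a minimal sketch of that comparison. The homogeneous-mixing baseline is an assumption chosen for illustration, not the construction used in the synthetic networks cited above, and the event and label layout is hypothetical. Ratios above 1 indicate more contact than random mixing would predict.

```python
from itertools import combinations_with_replacement


def mixing_ratio(events, age_group, groups):
    """Observed / expected contact time per pair of age groups under random mixing.

    Under homogeneous random mixing every pair of individuals is equally likely
    to be in contact, so each block (a, b) should receive a share of the total
    contact time proportional to its number of possible pairs.
    """
    sizes = {g: sum(1 for x in age_group.values() if x == g) for g in groups}
    n = sum(sizes.values())
    total_pairs = n * (n - 1) / 2
    observed, total_time = {}, 0.0
    for i, j, _start, duration in events:
        key = tuple(sorted((age_group[i], age_group[j]), key=groups.index))
        observed[key] = observed.get(key, 0.0) + duration
        total_time += duration
    ratios = {}
    for a, b in combinations_with_replacement(groups, 2):
        pairs = sizes[a] * (sizes[a] - 1) / 2 if a == b else sizes[a] * sizes[b]
        expected = total_time * pairs / total_pairs
        ratios[(a, b)] = observed.get((a, b), 0.0) / expected if expected else float("nan")
    return ratios
```

Blocks with no possible pairs, such as a single adult with no other adult in the household, return NaN rather than a misleading ratio.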
Results further indicate a strong variability of contact durations in households, characterised by heavy-tailed distributions. They confirm that the 'universal' characteristics of contact patterns previously measured in schools [, ] and hospitals [] are also present in this setting. We did not find any significant difference by gender in the durations of contacts, with the distributions being essentially equal to the aggregated one. It is worth noting, however, that we did not investigate the presence of gender homophily. Such homophily has been observed in primary schools [], where gender assortativity of contacts increases the likelihood of infection transmission between schoolchildren of the same gender []. Distributions of contact durations did not vary significantly from day to day, suggesting the presence of robust and repetitive contact patterns on a daily temporal scale. It is important to note that this study was conducted over three weekdays only, and thus we might expect to observe differences in network structure in a longer study that included weekends. In line with the observations of previous SocioPatterns measurements [, , , ], the probability distribution of contact durations (Figure ) indicates many brief contacts and few long-lasting ones, whose probability is nevertheless not negligible. For communicable diseases whose transmissibility depends on the duration of contact, this may play a key role in determining the probability of transmission for short versus long contacts []. Recent analysis of time-use and contact data suggests there is a minimum 'suitable duration' of exposure for transmission to occur, dependent on the transmissibility of the infection [].
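Heavy-tailed duration distributions are usually inspected through the empirical complementary cumulative distribution function (CCDF), P(duration ≥ d), plotted on logarithmic axes. This is a minimal sketch of computing it from a list of contact durations. The input format is an assumption, and plotting is omitted.

```python
def ccdf(durations):
    """Empirical complementary CDF: P(duration >= d) for each distinct duration d.

    Returns a list of (d, fraction of contacts lasting at least d), sorted by d.
    """
    values = sorted(durations)
    n = len(values)
    result = []
    for k, d in enumerate(values):
        if k == 0 or d != values[k - 1]:
            result.append((d, (n - k) / n))  # contacts from index k onwards last >= d
    return result
```

A slowly decaying tail on a log-log plot of these pairs corresponds to the pattern described above: most contacts are brief, but long contacts occur with non-negligible probability.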
However, further investigation is required to understand the characteristics of contact events associated with transmission of common childhood respiratory infections such as viral pneumonia by combining a variety of methods, including longitudinal surveillance of contacts and microbiological data.
As reported in previous studies [, ], contact patterns may exhibit important heterogeneities at different time scales relevant for disease transmission. At the finest scale of minutes and hours, our data display strong fluctuations driven by the circadian activities of individuals, as one may easily expect. Whether such fluctuations should be expected also at the daily or longer time scale, remains an open question. To address this issue, we performed a longitudinal analysis of the contact networks extracted from the tags, with the main goal of measuring potential similarities between contacts measured from households on different days, keeping in mind the constraints imposed by the short study duration. These results indicate that the observed contact patterns of each household member were significantly similar from day to day; at the level both of contacted individuals (as measured by the loyalty) and of the durations of time spent in contact with different individuals (as measured by the cosine similarity). Overall, within-household contacts appear to be highly stable and repetitive across single days, thus suggesting that a short data collection period of a few days could be sufficient for an accurate description. Such information is relevant to understand how much a single experimental day can be considered representative of the typical contact patterns within a household and how much data gathering would be needed to obtain a comprehensive picture of the full contact network. On the other hand, contacts across households, mainly driven by adults who could thus act as bridges in transmission of communicable diseases from one household to the other, appear to be irregular and quite difficult to capture during a time window of a week or less. While between-household contacts could be a significant driver of infectious diseases, the sample size does not allow obtaining any definitive insights on the mixing behaviour of the population in general. 
Furthermore, our results support the role of children in transmitting respiratory infections within the household [, , ].
It is important to highlight some key limitations of the present study. It was not possible to collect data from all  households contemporaneously due to the limitations mentioned above. Moreover, it is not feasible to saturate an entire community with tags, which limits the ability to reconstruct full networks. However, in future household studies, recruitment could be conducted in household clusters, with tags issued to all participants on the same day to ensure contemporaneous data collection. A smaller and lighter wireless sensor, which should enable younger children to participate, is now available. The new sensor also has a longer battery life and more storage space, enabling data collection over longer periods (www.get.openbeacon.org). This is especially useful in hard-to-reach populations where households have no electricity connection. Data from a quarter of the tags were disregarded due to both tag malfunction and human influences such as storing tags together, e.g., during nighttime. While it may not be possible to detect tag malfunction during data collection, future studies can minimize human error through proper training on how to use and store the tags, especially when collecting data over longer periods. There were few individuals aged - years (%) in the households, resulting in few contacts being recorded with this age group. Residents of this age were away at boarding school or work; this could be addressed by longitudinal studies incorporating both term and holiday times as well as weekends. This is a highly mobile age group whose interactions may be important in the transmission of respiratory infections, and hence a potential target in intervention design. However, debate remains on how large and long a study should be to capture an accurate representation of the dynamics of social networks and contacts [, ].
Furthermore, the study population was composed of students, subsistence farmers, small business operators and fishermen, which is typical of the coastal setting in Kenya. As such, the results cannot be generalized to urban locations or to other inland locations that have different social, economic, cultural and demographic characteristics. Future studies should consider higher coverage, for example in rural and urban household clusters rather than entire communities, to generate more generalizable insights into the network characteristics of different regions. Lastly, collecting temporal contextual data such as time use and location could help explain the structure of daily contacts and why day-to-day differences exist in some cases or between households.
Our approach used electronic proximity sensors to collect social contact data from households in a rural setting. The results suggest important differences between within- and between-household contacts, with individuals of different ages driving the frequency and distribution of contacts in this setting. This should be a focus of future investigation. Despite involving only  households, insights from this pilot study can be valuable to the design of larger community-wide studies, for example in deciding how long to collect data or how many people to involve. Continued work of this nature is important for understanding the temporal dynamics of household contacts and networks, with a view to informing the design of intervention strategies.

Data accessibility
The data supporting this work is available in a fully anonymised format at: http://www.sociopatterns.org/datasets/.