In this section, we describe and discuss the main results of this paper. First, we investigate the correlation between the activity in individual hours and the employment and unemployment rates, and choose the two dimensions with which employment and unemployment levels have their maximum and minimum correlations. We then evaluate to what extent the linear model is a valid description of our data, first for these two most separating dimensions and then for all 24 dimensions of our dataset. Second, we discuss how the linear models in 2 and 24 dimensions separate two population groups with distinct activity patterns, and give a possible interpretation of these patterns. Third, we connect the two groups with real-world indicators such as the share of employed in a county, and discuss how plausibly the daily patterns of the two groups correspond to employment status.
We first evaluate, for each hour i, the population-weighted Pearson correlation between the activities \(y_{i}^{(k)}\) of the 1,884 counties (those from which we have an adequate number of messages) and the employment and unemployment levels. We estimate the errors of these correlations by bootstrapping our sample \(n=1\mbox{,}000\) times; the results with error bars are shown in Figure 1. While unemployment levels are defined in the traditional way of the Bureau of Labor Statistics, we define the share of employed slightly differently, normalizing the number of employed by the entire population of a county. This definition better matches the notion of the population share of “active” people with regular working hours.
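The correlation and its bootstrap error described above can be sketched as follows. This is a minimal illustration on synthetic data, not the code used for Figure 1: the function names, the synthetic arrays, and the choice of resampling whole counties together with their population weights are all assumptions.

```python
import numpy as np

def weighted_pearson(x, y, w):
    """Pearson correlation of x and y using non-negative weights w."""
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    w = w / w.sum()                      # normalize weights to sum to 1
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    var_x = np.sum(w * (x - mx) ** 2)
    var_y = np.sum(w * (y - my) ** 2)
    return cov / np.sqrt(var_x * var_y)

def bootstrap_error(x, y, w, n_boot=1000, seed=0):
    """Standard deviation of the weighted correlation over bootstrap samples."""
    rng = np.random.default_rng(seed)
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    n = len(x)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample counties with replacement
        estimates[b] = weighted_pearson(x[idx], y[idx], w[idx])
    return estimates.std(ddof=1)

# Synthetic stand-in data (hypothetical, for illustration only).
rng = np.random.default_rng(42)
n_counties = 500
population = rng.integers(10_000, 1_000_000, size=n_counties)
employment_share = rng.normal(0.45, 0.05, size=n_counties)
activity_1pm = (0.05 + 0.02 * (employment_share - 0.45)
                + rng.normal(0.0, 0.002, size=n_counties))

r = weighted_pearson(activity_1pm, employment_share, population)
err = bootstrap_error(activity_1pm, employment_share, population)
print(f"weighted correlation: {r:.2f} +/- {err:.2f}")
```

Repeating the two calls for each of the 24 hourly activity columns would give one correlation and one error bar per hour.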
The hours between 6am and 8pm show a significantly positive correlation with employment and a negative one with unemployment, while during the night, between 9pm and 5am, the correlation is reversed. With respect to employment, the correlation peaks at 1pm with \(0.43\pm 0.02\) and reaches its lowest value at 12am with \(-0.39\pm0.03\). The maximum and minimum of the correlation with unemployment fall at nearly the same hours, 12am and 12pm, but with opposite signs (\(0.30\pm0.02\) for 12am and \(-0.38\pm0.02\) for 12pm). The signs of the correlations and the hours of their extreme values indicate that increased daytime activity is associated with higher employment levels, and that higher-than-average nighttime activity corresponds to higher unemployment.
To check the linearity of the model described in Section 2, we first choose the coordinate system spanned by the two hours that have the extreme correlation values with employment levels. Figure 2 shows the 12am and 1pm activities of the filtered counties, with the dashed line corresponding to the direction of the first eigenvector of the covariance matrix, here calculated from these two dimensions only. If we normalize the eigenvalues by their sum, the first eigenvalue of the covariance matrix accounts for a 0.99 share of the total variance in the data; thus, linearity is a good assumption in this two-dimensional subspace of the full 24-hour activity space.
We continue by assessing the validity of the linear model in all 24 dimensions, as presented in Eq. (1). In Figure 3(a) we plot the eigenvalues of the covariance matrix C, again normalized by the sum of all eigenvalues. Only the first four eigenvalues correspond to a variance significantly greater than 0. The first principal component stands out with a share of 0.52, whereas the other three significant components carry shares of 0.25, 0.13 and 0.04 of the variance. Thus, our dataset is mostly linear even in the 24-dimensional space, and the representation with Eq. (1) remains plausible.
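The normalized eigenvalue spectrum discussed above can be computed along the following lines. This is a hedged sketch on synthetic data: the function name and the generated matrix `Y` (rows as counties, columns as hours) are hypothetical, and the covariance here is unweighted, which may differ from the definition of C in Section 2.

```python
import numpy as np

def explained_variance_shares(Y):
    """Eigen-decomposition of the covariance of Y (rows: counties, cols: hours).

    Returns the eigenvalues normalized to sum to 1, in descending order,
    and the matching eigenvectors as columns.
    """
    C = np.cov(Y, rowvar=False)            # 24 x 24 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs

# Synthetic stand-in: one dominant linear direction plus isotropic noise.
rng = np.random.default_rng(0)
n_counties, n_hours = 1000, 24
hours = np.arange(n_hours)
m_true = np.cos(2 * np.pi * (hours - 13) / 24)
m_true /= np.linalg.norm(m_true)           # unit vector peaking at 1pm
alpha = rng.normal(0.0, 1.0, size=n_counties)
Y = (1.0 / 24) + alpha[:, None] * m_true + rng.normal(0.0, 0.1, size=(n_counties, n_hours))

shares, vectors = explained_variance_shares(Y)
print("largest variance shares:", np.round(shares[:4], 3))
print("alignment with true direction:", abs(vectors[:, 0] @ m_true))
```

A single dominant share indicates that most of the variation lies along one line in the 24-dimensional space, which is the sense in which the data are called linear here.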
In the 2-dimensional case, the dashed line of Figure 2 marks the direction of the first principal vector. The difference between the two vectors A (red) and B (blue), which represent the two universal patterns (see Eqs. (4)-(5) in Section 2.4), is parallel to this component, which we denote by m. Figure 2 shows that pattern A is marked by increased activity at 1pm and decreased activity at 12am, while pattern B is characterized by exactly the inverse relationship.
The principal component corresponding to the largest eigenvalue in the 24-dimensional case is shown in Figure 3. As the coordinates represent the hours, Figure 3 shows that m is positive from 5am until 8pm, and negative otherwise. Thus, the positive elements of m select mainly those hours during which people are awake, and the negative elements correspond to the sleeping hours.
We then plot the elements of the 24-dimensional A and B from Eqs. (4)-(5) in Figure 4. If we interpret these patterns as the average tweeting patterns of two different population groups, each \(\alpha^{(k)}\) is proportional to the share of people in a county who belong to one of the groups. Our hypothesis is that the group more active during the daytime corresponds to people who regularly go to work, school, etc. on weekdays, so that their day is regulated by the earlier wake-up and bedtime indicated in pattern A. Pattern B, on the other hand, could correspond to a group for which this regulating factor does not exist due to retirement, unemployment or any other reason, which would allow these people to be more active during nighttime and to wake up later.
To test our hypothesis, we correlate the \(\alpha^{(k)}\) values with labor force and unemployment estimates from the Local Area Unemployment Statistics (see Section 2.2) of the investigated counties. In the 2-dimensional case, these combined values of \(\alpha^{(k)}\) do not correlate with employment (\(0.38\pm0.03\)) or unemployment (\(-0.32\pm0.02\)) better than the single-dimension activity measures of Figure 1. However, by using all dimensions, we find correlations of \(0.46\pm0.02\) and \(-0.34\pm 0.02\) for employment (see the scatterplot in Figure 5) and unemployment, respectively. For employment, this improves on the single-dimension correlations, while for unemployment it does not. A possible interpretation is that a stricter daily rhythm is imposed upon those who are employed, so their activity curves form a stronger overall pattern than those of the unemployed. Nevertheless, the result shows that a high \(\alpha^{(k)}\) is significantly associated with higher employment and lower unemployment rates, and that the overall shape of the activity timeline can give us more information than a single feature of the day. The similarity of the regional distributions of \(\alpha^{(k)}\), unemployment and employment rates is visualized on the three maps of Figure 6.
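As a rough illustration of this step, the sketch below projects each county's centered 24-hour activity vector onto the first principal direction and correlates the resulting score with a synthetic employment share. The score is only a stand-in for \(\alpha^{(k)}\): the exact definition from Eqs. (4)-(5) is not reproduced here, the data are synthetic, the correlation is unweighted, and all names are hypothetical.

```python
import numpy as np

def first_component_scores(Y, reference_hour=13):
    """Project centered rows of Y onto the first principal direction.

    The sign of an eigenvector is arbitrary, so it is fixed here by
    requiring a positive entry at `reference_hour` (1pm by default),
    which makes larger scores mean more daytime activity.
    """
    Yc = Y - Y.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Yc, rowvar=False))
    m = eigvecs[:, -1]                 # eigh sorts ascending: last is largest
    if m[reference_hour] < 0:
        m = -m
    return Yc @ m, m

# Synthetic stand-in data driven by one latent "daytime" factor.
rng = np.random.default_rng(1)
n_counties, n_hours = 800, 24
hours = np.arange(n_hours)
pattern = np.cos(2 * np.pi * (hours - 13) / 24)
pattern /= np.linalg.norm(pattern)
latent = rng.normal(0.0, 1.0, size=n_counties)
Y = (1.0 / 24) + 0.01 * latent[:, None] * pattern \
    + rng.normal(0.0, 0.002, size=(n_counties, n_hours))
employment_share = 0.45 + 0.03 * latent + rng.normal(0.0, 0.03, size=n_counties)

scores, m = first_component_scores(Y)
corr = np.corrcoef(scores, employment_share)[0, 1]
print(f"correlation of projection score with employment share: {corr:.2f}")
```

Fixing the sign of m matters for interpretation: without it, the same data could yield a correlation of either sign depending on the arbitrary orientation returned by the eigen-solver.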
Our results are in line with previous research carried out for Spain in [31], where the share of Twitter activity during a window of the morning hours (8-11am), the afternoon hours (3-5pm) and the night hours (0-3am) correlated significantly with unemployment rates among 25- to 44-year-old inhabitants of Spanish administrative areas. High morning and low night activity indicated lower unemployment rates, which agrees with our correlations. In Spain, high afternoon activity also correlated positively with unemployment levels, but we do not observe this phenomenon in the US. Due to the bias of Twitter users towards younger age groups [39], our calculated county activity patterns are not representative of the whole population. We believe that our model could be improved by incorporating labor force data broken down by age group.
The fact that the correlation with unemployment is significantly lower than the correlation with the labor force share of the population may arise because the share of employed should overlap more with the population exhibiting the “working” pattern A, whereas officially registered unemployed people are not distinguishable in this context from those who are on maternity leave, retired, etc. We also believe that there are other inherent reasons, for example the more flexible working hours in the creative industry, that limit the power of such a simple model to explain the employment patterns of a geographical area.