Table 1 Summary of our Twitter dataset.

From: Whom should we sense in “social sensing” - analyzing which users work best for social media now-casting

| Topic | Flu activity | Unemployment | 1st-person flu | 1st-person unemployment |
|---|---|---|---|---|
| #Tweets | 153,848 | 145,780 | 79,223 | 83,015 |
| #Authors | 142,458 | 139,300 | 75,000 | 72,375 |
| State | 56,967 (40%) | 48,670 (35%) | 24,287 (32%) | 22,987 (32%) |
| Gender | 56,903 (40%) | 49,359 (36%) | 28,187 (38%) | 23,555 (33%) |
| Male | 24,896 (18%) | 29,018 (20%) | 10,664 (14%) | 11,911 (17%) |
| Female | 32,009 (23%) | 22,146 (15%) | 17,524 (23%) | 11,644 (16%) |
| Age | 42,916 (30%) | 42,049 (30%) | 23,418 (31%) | 22,584 (31%) |
| Age < 24 | 19,682 (14%) | 20,452 (15%) | 12,745 (17%) | 12,419 (17%) |
| Age ≥ 24 | 23,234 (16.3%) | 21,597 (16%) | 10,673 (14%) | 10,165 (14%) |

  1. The keywords used to extract unemployment tweets were “got fired”, “lost ** job”, “get a job” and “unemployment”. For flu tweets, only the keyword “flu” was used. Both keyword sets were used in previous work. The value in parentheses is the fraction of all authors for that topic with the corresponding inferred demographic attribute.
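
The keyword-based extraction described in the note can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `keyword_to_regex` and `matches_topic` are hypothetical, whole-word case-insensitive matching is an assumption, and the reading of “**” as a wildcard for one or two intervening words (as in “lost my job”) is also an assumption, since the note does not define it.

```python
import re

# Keyword lists copied from the table note.
FLU_KEYWORDS = ["flu"]
UNEMPLOYMENT_KEYWORDS = ["got fired", "lost ** job", "get a job", "unemployment"]


def keyword_to_regex(keyword):
    """Compile one keyword into a case-insensitive, whole-word pattern.

    Assumption: the token "**" stands for one or two intervening words.
    """
    parts = []
    for token in keyword.split():
        if token == "**":
            parts.append(r"\w+(?:\s+\w+)?")  # one word, optionally a second
        else:
            parts.append(re.escape(token))
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def matches_topic(text, keywords):
    """Return True if the text contains at least one of the keywords."""
    return any(keyword_to_regex(k).search(text) for k in keywords)
```

For example, `matches_topic("I lost my job today", UNEMPLOYMENT_KEYWORDS)` is true, while `matches_topic("influenza season", FLU_KEYWORDS)` is false because “flu” is matched only as a whole word under the assumption above.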