Figure 4 | EPJ Data Science

Figure 4

From: Tampering with Twitter’s Sample API

Estimating suspicious values. (A) Binomial distribution (\(n=1000\), \(p=0.01\)) used to estimate the expected proportion of a user's Tweets that appear in the 1% Sample API: the probability of exactly a given percentage of Tweets being sampled (PDF, black line) and the accumulated probabilities (CDF, red line), e.g. 3 out of 1000 users reach at least 2% coverage. (B) Expected probability that an account has \(\geq2\%\) coverage on x days. The suspicious threshold p (dashed lines) is set to \(1/N\), where \(N=328M\) is the overall number of active Twitter accounts. (C) Expected entropy scores for the millisecond distribution of accounts in the 10% Sample. Values below 0.985 can be considered very rare events, i.e. suspicious. (D) Expected entropy density for the 1% Sample and the 10% Sample, illustrating that suspicious entropy scores (i.e. outliers) are easier to identify in the 10% Sample.
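The numbers in panels (A) and (B) can be approximated with a short calculation. The following is a minimal sketch, not the authors' code. It assumes that \(n=1000\) is the number of Tweets posted by one account, that each Tweet is sampled independently with probability \(p=0.01\), and that days are independent of each other, so that the chance of reaching \(\geq2\%\) coverage on each of x days is the single-day tail probability raised to the power x. The function names `binom_pmf` and `binom_tail` are illustrative only.

```python
from math import ceil, comb, log

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_tail(k_min: int, n: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return 1.0 - sum(binom_pmf(k, n, p) for k in range(k_min))

# Panel (A): an account with n Tweets, each sampled with probability p.
n, p = 1000, 0.01
# >= 2% coverage means at least 20 of the 1000 Tweets appear in the sample.
tail = binom_tail(20, n, p)

# Panel (B): threshold 1/N with N = 328M active accounts (from the caption).
N = 328_000_000
threshold = 1 / N
# Assumption: days are independent, so P(>= 2% coverage on x days) = tail**x.
# Smallest x for which that probability drops below the threshold:
days_needed = ceil(log(threshold) / log(tail))

print(f"P(coverage >= 2%) on one day: {tail:.4f}")
print(f"Days until probability < 1/N:  {days_needed}")
```

The single-day tail probability comes out at roughly 3 in 1000, consistent with the example in the caption. Under the independence assumption, only a handful of such days is needed before the probability falls below \(1/N\).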
