Sweet tweets! Evaluating a new approach for probability-based sampling of Twitter

Buskirk, Trent D.; Blakely, Brian P.; Eck, Adam; McGrath, Richard; Singh, Ravinder; Yu, Youzhi

doi:10.1140/epjds/s13688-022-00321-1

EPJ Data Science

Table 2 Description of the six different methods we used in our experiment. The first three methods are possible settings for the TSAPI and the last three are new variants we are introducing and comparing in our experiment.


Tweet access method	Description
1. Popular	One of three methods available for the result_type parameter in the TSAPI that returns the most popular results, as determined by Twitter, in the query.
2. Mixed	The current default method for the result_type parameter of the TSAPI: returns both “popular” and “recent” Tweets as part of the query. Popular Tweets are determined by Twitter.
3. Recent	Another option for the result_type parameter of the TSAPI, selectable by the user, in which the most recent Tweets are returned. If there are more than 100 such Tweets, additional queries can be submitted in sequence to obtain collections of Tweets that run chronologically from 11:59:59.999 pm of a given day back to midnight at the beginning of that day, as described in https://developer.twitter.com/en/docs/twitter-api/v1/Tweets/timelines/guides/working-with-timelines.
4. Uniform	A series of evenly spaced time points from a given day is determined, and a single query is submitted for each of the selected time points using the TSAPI with the result_type parameter set to “recent”. For this method, we randomly select a starting time point within a sampling interval determined by the number of queries desired and then determine subsequent, evenly spaced points. The identified time points are then converted to Tweet IDs and used as the max_id parameters in the TSAPI.
5. VBEST-SYS	A systematic random sample (without replacement, circular) is taken of a desired size from the universe of Tweet PSUs identified from the VBEST algorithm. The right-most endpoint of each of the Tweet PSU intervals is then used in a TSAPI query with result_type set to “recent”. One query is submitted per selected Tweet PSU.
6. VBEST-SRS	A simple random sample (without replacement) is taken of a desired size from the sampling frame of Tweet PSUs constructed from the VBEST algorithm. The right-most endpoint of each of the Tweet PSU intervals is then used in a TSAPI query with result_type set to “recent”. One query is submitted per selected Tweet PSU.
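
The sample-selection step shared by methods 5 and 6 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame of Tweet PSUs is a hypothetical list of (left_id, right_id) intervals standing in for VBEST output, which is not reproduced here, and the circular systematic scheme shown (a random start anywhere in the frame, a fixed skip of floor(N/n), wrapping around the end of the frame) is one common textbook variant assumed for illustration. All function names are hypothetical.

```python
import random


def circular_systematic_sample(frame, n, rng):
    """Circular systematic sample of n units, without replacement.

    Assumed variant: random start in [0, N), skip k = N // n, indices taken
    modulo N. Because (n - 1) * k < N, no unit is selected twice.
    """
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the frame size")
    k = N // n
    start = rng.randrange(N)
    return [frame[(start + j * k) % N] for j in range(n)]


def simple_random_sample(frame, n, rng):
    """Simple random sample of n units, without replacement."""
    return rng.sample(frame, n)


def right_endpoints(psus):
    """Right-most endpoint of each selected PSU interval.

    Per the table, each endpoint drives one query with result_type="recent".
    """
    return [right for (_left, right) in psus]


# Hypothetical frame of 10 PSUs, each an interval of Tweet IDs.
frame = [(i * 100, i * 100 + 99) for i in range(10)]
rng = random.Random(2022)

sys_sample = circular_systematic_sample(frame, 3, rng)
srs_sample = simple_random_sample(frame, 3, rng)
sys_endpoints = right_endpoints(sys_sample)
srs_endpoints = right_endpoints(srs_sample)
```

Both samplers return PSU intervals, so the two methods differ only in how the PSUs are chosen; the query step (one query per selected PSU) is the same.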

Note: For more information on Twitter Search API search options please refer to Twitter documentation available at: https://developer.twitter.com/en/docs/twitter-api/v1/Tweets/search/api-reference/get-search-Tweets.
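
The time-point selection in method 4 (Uniform) can be sketched roughly as below. The table does not say how time points were converted to Tweet IDs, so the conversion here is an assumption: it relies on the publicly documented Snowflake ID layout, in which the bits above the lowest 22 hold milliseconds since the Twitter epoch (1288834974657 ms after the Unix epoch). Function names are hypothetical, and no API call is made.

```python
import random
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TWITTER_EPOCH_MS = 1288834974657  # Snowflake epoch, ms since the Unix epoch
ONE_DAY = timedelta(days=1)


def uniform_time_points(day_start, n_queries, rng):
    """Evenly spaced time points within one day, with a random start.

    The sampling interval is the day length divided by the number of
    queries; the first point is drawn at random inside the first interval.
    """
    step = ONE_DAY / n_queries
    offset = timedelta(seconds=rng.random() * step.total_seconds())
    return [day_start + offset + j * step for j in range(n_queries)]


def time_to_tweet_id(t):
    """Smallest Snowflake-style ID for the millisecond containing t.

    Assumption: the low 22 bits (worker and sequence numbers) are zeroed,
    so Tweets posted earlier than t have IDs below the returned value.
    """
    ms = (t - UNIX_EPOCH) // timedelta(milliseconds=1)
    return (ms - TWITTER_EPOCH_MS) << 22


day = datetime(2021, 3, 1, tzinfo=timezone.utc)  # illustrative date
points = uniform_time_points(day, 4, random.Random(7))
max_ids = [time_to_tweet_id(p) for p in points]  # candidate max_id values
```

Each value in `max_ids` would then be supplied as the max_id parameter of one query with result_type set to “recent”, as the table describes.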
