Collect tweet data from twitter search — Collect.search.twitter • vosonSML

This function collects tweet data based on search terms and structures the data into a dataframe with the class names "datasource" and "twitter".

The Twitter Standard search API sets a rate limit of 180 requests every 15 minutes. A maximum of 100 tweets can be collected per search request, so at most 18,000 tweets can be collected per 15-minute window. More tweets can be collected by setting retryOnRateLimit = TRUE, which causes the collection to pause when the rate limit is reached and resume when it resets (in approximately 15 minutes). Alternatively, the Twitter API parameter since_id can be used in a later session to resume a search collection from the last tweet previously collected, as tweet status IDs are sequential. The Standard API only returns tweets from the last 7 days.
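The rate-limit arithmetic above can be sketched in a few lines of R. This is only a rough planning aid: the helper estimate_wait_minutes is hypothetical (it is not part of vosonSML or rtweet), and it assumes every request returns a full 100 tweets and that the collection starts with an unused rate-limit window.

```r
# Figures quoted above for the Standard search API.
requests_per_window <- 180   # requests allowed per rate-limit window
tweets_per_request  <- 100   # maximum tweets returned per request
window_minutes      <- 15    # length of one rate-limit window

# Maximum tweets collectable in one window: 180 * 100 = 18000.
tweets_per_window <- requests_per_window * tweets_per_request

# Hypothetical helper: rough lower bound on the time spent waiting for
# rate-limit resets. A pause only happens between windows, so a collection
# that needs n windows implies n - 1 pauses.
estimate_wait_minutes <- function(num_tweets) {
  windows_needed <- ceiling(num_tweets / tweets_per_window)
  max(windows_needed - 1, 0) * window_minutes
}

estimate_wait_minutes(18000)  # fits in a single window, so no pause
estimate_wait_minutes(50000)  # needs three windows, so two pauses
```

In practice a search often returns fewer than 100 tweets per request, so the real waiting time can be longer than this estimate.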

All of the search query operators available through the Twitter API can be used in the searchTerm parameter. For example, to search for tweets containing the term "love" or "hate", use the "OR" operator: searchTerm = "love OR hate". For more information refer to the Twitter API documentation for query operators: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/guides/standard-operators.
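When the list of terms is held in a vector, the query string can be assembled rather than typed by hand. The sketch below uses a hypothetical helper, any_of, which is not part of vosonSML. It joins terms with the OR operator and wraps multi-word terms in double quotes, the exact-phrase operator of the standard search syntax.

```r
# Hypothetical helper: combine several terms into a single OR query.
any_of <- function(terms) {
  # Multi-word terms are quoted so they are matched as exact phrases.
  has_space <- grepl(" ", terms, fixed = TRUE)
  terms[has_space] <- sprintf('"%s"', terms[has_space])
  paste(terms, collapse = " OR ")
}

search_term <- any_of(c("love", "hate"))
search_term
```

The resulting string could then be supplied as the searchTerm argument of Collect.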

# S3 method for search.twitter
Collect(
  credential,
  endpoint,
  searchTerm = "",
  searchType = "recent",
  numTweets = 100,
  includeRetweets = TRUE,
  retryOnRateLimit = TRUE,
  writeToFile = FALSE,
  verbose = FALSE,
  ...
)

Arguments

credential

A credential object generated from Authenticate with class name "twitter".

endpoint

API endpoint.

searchTerm

Character string. Specifies a Twitter search term. For example, "Australian politics" or the hashtag "#auspol".

searchType

Character string. Specifies the type of search results to return: "recent", "mixed" or "popular". Default is "recent".

numTweets

Numeric. Specifies how many tweets to collect. Default is 100.

includeRetweets

Logical. Specifies whether retweets should be included in the search results. Default is TRUE.

retryOnRateLimit

Logical. Specifies whether the collection should wait when the API rate limit is reached and resume when it resets. Default is TRUE.

writeToFile

Logical. Write collected data to file. Default is FALSE.

verbose

Logical. Output additional information. Default is FALSE.

...

Arguments passed on to rtweet::search_tweets

geocode: Geographical limiter of the template "latitude,longitude,radius" e.g., geocode = "37.78,-122.40,1mi".
since_id: Supply a vector of ids or a data frame of previous results to find tweets newer than since_id.
max_id: Supply a vector of ids or a data frame of previous results to find tweets older than max_id.
parse: If TRUE, the default, returns a tidy data frame. Use FALSE to return the "raw" list corresponding to the JSON returned from the Twitter API.
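
Two of the arguments listed above take values with a specific shape, which the following sketch illustrates. Both helpers are hypothetical and not part of vosonSML or rtweet. make_geocode builds the "latitude,longitude,radius" string, assuming two decimal places of precision and a radius unit of "mi" or "km". newest_id picks the largest tweet ID for use with since_id, assuming the IDs are stored as character strings of digits (they are too large to be held exactly as R doubles).

```r
# Hypothetical helper: build a geocode string "latitude,longitude,radius".
make_geocode <- function(latitude, longitude, radius, unit = c("mi", "km")) {
  unit <- match.arg(unit)
  stopifnot(latitude >= -90, latitude <= 90,
            longitude >= -180, longitude <= 180,
            radius > 0)
  sprintf("%.2f,%.2f,%s%s", latitude, longitude, format(radius), unit)
}

# Hypothetical helper: newest (largest) tweet ID among IDs held as strings.
# A longer digit string is always the larger number; among strings of equal
# length, string ordering matches numeric ordering.
newest_id <- function(ids) {
  longest <- ids[nchar(ids) == max(nchar(ids))]
  max(longest)
}

make_geocode(37.78, -122.40, 1, "mi")
newest_id(c("999", "1000", "1002"))
```

The first value could be supplied as geocode, and the second as since_id in a later session.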

Value

A tibble object with class names "datasource" and "twitter".

Note

Additional parameters passed to this function in the ellipsis ... will also be passed to the Twitter search API request. Most parameters have been covered but a complete list can be found here: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets. A useful additional parameter is language, which allows the user to restrict the tweets returned to a particular language using an ISO 639-1 code. For example, to restrict a search to tweets in English the value language = "en" can be passed to this function.

Examples

if (FALSE) {
# search and collect 100 recent tweets for the hashtag #auspol
# (twitterAuth is a credential object previously created with Authenticate)
myTwitterData <- twitterAuth |>
  Collect(searchTerm = "#auspol", searchType = "recent", numTweets = 100, verbose = TRUE,
          includeRetweets = FALSE, retryOnRateLimit = TRUE, writeToFile = TRUE)
}