This function collects tweet data based on search terms and structures the data into a dataframe with
the class names "datasource" and "twitter".
The Twitter Standard search API sets a rate limit of 180 requests every 15 minutes. A maximum of 100 tweets can be
collected per search request, so at most 18,000 tweets can be collected per 15-minute window. More tweets
can be collected by setting retryOnRateLimit = TRUE, which causes the collection to pause when the
rate limit is reached and resume when the rate limit resets (in approximately 15 minutes). Alternatively, the
Twitter API parameter since_id can be used in a later session to resume a search collection from the
last tweet previously collected, as tweet status IDs increase over time. The Standard API only returns tweets from the
last 7 days.
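The rate-limit arithmetic above can be sketched in a few lines of R. This is only a rough estimate under the limits stated here (180 requests per 15 minutes, 100 tweets per request); estimate_windows is a hypothetical helper, not part of this package, and it ignores any requests already used in the current window.

```r
# Limits of the Standard search API as stated in this documentation
requests_per_window <- 180
tweets_per_request <- 100
window_minutes <- 15

# Maximum tweets per 15-minute window
tweets_per_window <- requests_per_window * tweets_per_request

# Hypothetical helper: estimate requests, rate-limit windows and pauses
# needed to collect numTweets tweets
estimate_windows <- function(numTweets) {
  requests <- ceiling(numTweets / tweets_per_request)
  windows <- ceiling(requests / requests_per_window)
  pauses <- max(windows - 1, 0)
  list(
    requests = requests,
    windows = windows,
    pauses = pauses,
    approx_wait_minutes = pauses * window_minutes
  )
}

estimate_windows(50000)
```

Under these assumptions, a request for 50,000 tweets needs 500 search requests spread over 3 windows, so a collection with retryOnRateLimit = TRUE would be expected to pause about twice.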
All of the search query operators available through the Twitter API can be used in the searchTerm field. For
example, to search for tweets containing the term "love" or "hate", the "OR" operator can be
used in the term field: searchTerm = "love OR hate". For more information refer to the Twitter API
documentation for query operators:
https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/guides/standard-operators.
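As a small illustration of the "OR" operator, a searchTerm string could be assembled from a vector of terms. The helper build_or_query below is hypothetical and not part of this package; it also assumes that multi-word terms should be wrapped in double quotes so they are treated as exact phrases.

```r
# Hypothetical helper: join search terms with the OR operator
build_or_query <- function(terms) {
  # wrap terms containing whitespace in double quotes (exact phrase match)
  has_space <- grepl("\\s", terms)
  terms[has_space] <- paste0('"', terms[has_space], '"')
  paste(terms, collapse = " OR ")
}

# Produces the string used in the example above
searchTerm <- build_or_query(c("love", "hate"))

# A phrase combined with a hashtag
phraseTerm <- build_or_query(c("Australian politics", "#auspol"))
```

The resulting string can then be supplied as the searchTerm argument.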
# S3 method for search.twitter
Collect(
  credential,
  endpoint,
  searchTerm = "",
  searchType = "recent",
  numTweets = 100,
  includeRetweets = TRUE,
  retryOnRateLimit = TRUE,
  writeToFile = FALSE,
  verbose = FALSE,
  ...
)
credential
A credential object generated from Authenticate with class name "twitter".
endpoint
API endpoint.
searchTerm
Character string. Specifies a Twitter search term. For example, "Australian politics" or the
hashtag "#auspol".
searchType
Character string. Returns filtered tweets as per search type recent, mixed or popular.
Default type is recent.
numTweets
Numeric. Specifies how many tweets are to be collected. Default is 100.
includeRetweets
Logical. Specifies whether retweets are included in the search results. Default is TRUE.
retryOnRateLimit
Logical. When the API rate limit is reached, should the collection wait and resume when it resets? Default is TRUE.
writeToFile
Logical. Write collected data to file. Default is FALSE.
verbose
Logical. Output additional information. Default is FALSE.
...
Arguments passed on to rtweet::search_tweets
geocode
Geographical limiter of the template "latitude,longitude,radius", e.g., geocode = "37.78,-122.40,1mi".
since_id
Supply a vector of ids or a data frame of previous results to find tweets newer than since_id.
max_id
Supply a vector of ids or a data frame of previous results to find tweets older than max_id.
parse
If TRUE, the default, returns a tidy data frame. Use FALSE to return the "raw" list corresponding
to the JSON returned from the Twitter API.
Returns a tibble object with class names "datasource" and "twitter".
Additional parameters passed to this function in the ellipsis ... will also be passed to the Twitter
search API request. Most parameters have been covered, but a complete list can be found here:
https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets. A useful
additional parameter is language, which allows the user to restrict the tweets returned to a particular language
using an ISO 639-1 code. For example, to restrict a search to tweets in English, the value language = "en"
can be passed to this function.
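Resuming a collection with since_id, as mentioned earlier, needs the newest status id from the previous session. The sketch below is one possible way to find it; latest_status_id is a hypothetical helper, not part of this package or rtweet, and it assumes the ids are kept as character strings, since they are too large to compare exactly as ordinary numerics.

```r
# Hypothetical helper: return the newest status id from a vector of ids
latest_status_id <- function(ids) {
  ids <- as.character(ids)
  ids <- ids[!is.na(ids) & nzchar(ids)]
  if (length(ids) == 0) return(NA_character_)
  # compare as digit strings: ids with more digits are larger, and ids of
  # equal length are ordered with a locale-independent radix sort
  longest <- ids[nchar(ids) == max(nchar(ids))]
  sort(longest, method = "radix", decreasing = TRUE)[1]
}

# Illustrative ids only, not real tweets
previous_ids <- c("999999999999999999", "1580000000000000001", "1580000000000000002")
newest_id <- latest_status_id(previous_ids)

# Assumed usage in a later session (not run, requires a valid credential):
# twitterAuth |>
#   Collect(searchTerm = "#auspol", since_id = newest_id)
```

The name of the column holding the status ids in previously collected data depends on the rtweet version in use, so it is left out of the sketch.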
if (FALSE) {
  # search and collect 100 recent tweets for the hashtag #auspol
  myTwitterData <- twitterAuth |>
    Collect(searchTerm = "#auspol", searchType = "recent", numTweets = 100, verbose = TRUE,
            includeRetweets = FALSE, retryOnRateLimit = TRUE, writeToFile = TRUE)
}