Collect comment data from Reddit threads: Collect.thread.reddit (vosonSML)

Collects comments made by users on one or more specified Reddit threads and structures the data into a dataframe (tibble) with the class names "datasource" and "reddit".

Usage

# S3 method for class 'thread.reddit'
Collect(
  credential,
  endpoint,
  threadUrls,
  sort = NA,
  waitTime = c(6, 8),
  ua = NULL,
  ...,
  writeToFile = FALSE,
  verbose = TRUE
)

collect_reddit_threads(
  threadUrls,
  sort = "best",
  waitTime = c(6, 8),
  ua = vsml_ua(),
  writeToFile = FALSE,
  verbose = TRUE,
  ...
)

Arguments

credential: A credential object generated from Authenticate with class name "reddit".
endpoint: API endpoint.
threadUrls: Character vector. Reddit thread URLs to collect data from.
sort: Character vector. Reddit comment sort order. Options are "best", "top", "new", "controversial", "old", and "qa". Default is NA for the Collect method and "best" for collect_reddit_threads.
waitTime: Numeric vector. Range of time in seconds from which a random wait is chosen between URL collection requests. Minimum is 3 seconds. Default is c(6, 8), a wait of between 6 and 8 seconds.
ua: Character string. Overrides the User-Agent string used in Reddit thread requests. Default is NULL for the Collect method and vsml_ua() for collect_reddit_threads.
...: Additional parameters passed to the function. Not used in this method.
writeToFile: Logical. Write collected data to file. Default is FALSE.
verbose: Logical. Output additional information about the data collection. Default is TRUE.
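The waitTime range can be pictured as a random draw between its two bounds before each thread request. The helper below is a hypothetical sketch and is not part of vosonSML: it assumes a uniform draw, and it assumes that values under the documented 3-second minimum are raised to 3 seconds, which may differ from how the package actually handles them (it may reject such values instead).

```r
# Hypothetical helper, NOT part of vosonSML: draws one wait time (seconds)
# uniformly from the range given by waitTime. Assumption: values below the
# documented 3-second minimum are clamped up to 3 rather than rejected.
sample_wait_time <- function(waitTime = c(6, 8), min_wait = 3) {
  stopifnot(is.numeric(waitTime), length(waitTime) >= 1)
  lo <- max(min(waitTime), min_wait)  # lower bound, never below the minimum
  hi <- max(max(waitTime), lo)        # upper bound, never below the lower bound
  runif(1, min = lo, max = hi)        # returns lo when lo == hi
}

set.seed(1)
w <- sample_wait_time(c(6, 8))  # a value between 6 and 8
```

A collector built this way would then pause with Sys.sleep(w) before requesting the next thread URL.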

Value

A tibble object with class names "datasource" and "reddit".

Note

The Reddit web endpoint used for collection has a maximum limit of 500 comments per thread URL.

Examples

if (FALSE) { # \dontrun{
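library(vosonSML)

# reddit thread url(s) to collect comments from
threadUrls <- c("https://www.reddit.com/r/xxxxxx/comments/xxxxxx/x_xxxx_xxxxxxxxx/")

# create a credential object for reddit
redditAuth <- Authenticate("reddit")
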
redditData <- redditAuth |>
  Collect(threadUrls = threadUrls, writeToFile = TRUE)
} # }
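The collect_reddit_threads function listed under Usage accepts the same collection arguments without a credential object. The sketch below collects from two threads with a non-default sort order and a shorter wait range that still respects the 3-second minimum; the thread URLs are hypothetical placeholders, and the collection call is wrapped in if (FALSE) because it needs network access.

```r
# hypothetical placeholder thread urls, replace with real reddit thread urls
threadUrls <- c(
  "https://www.reddit.com/r/xxxxxx/comments/xxxxxx/x_xxxx_xxxxxxxxx/",
  "https://www.reddit.com/r/yyyyyy/comments/yyyyyy/y_yyyy_yyyyyyyyy/"
)

if (FALSE) { # requires network access and the vosonSML package
  library(vosonSML)

  # sort comments by newest and wait 3 to 5 seconds between thread requests
  redditData <- collect_reddit_threads(
    threadUrls,
    sort = "new",
    waitTime = c(3, 5),
    writeToFile = FALSE,
    verbose = TRUE
  )
}
```

Each thread is still subject to the 500-comment limit of the web endpoint, so very large threads will be returned only partially.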