Collects hyperlinks from web pages and structures the data into a dataframe with the class names "datasource" and "web".
# S3 method for web
Collect(credential, pages = NULL, writeToFile = FALSE, verbose = FALSE, ...)
collect_web_hyperlinks(pages = NULL, writeToFile = FALSE, verbose = FALSE, ...)
credential
A credential object generated from Authenticate with class name "web".
pages
Dataframe. Dataframe of web pages to crawl. The dataframe must have the columns page (character), type (character) and max_depth (integer). Each row is a seed web page to crawl, with the page value being the page URL. The type value sets the type of crawl and must be one of "int", "ext" or "all", directing the crawler to follow only internal links, follow only external links (those on a different domain from the seed page) or follow all links. The max_depth value determines how many levels of hyperlinks to follow from the seed page.
writeToFile
Logical. Write collected data to file. Default is FALSE.
verbose
Logical. Output additional information. Default is FALSE.
...
Additional parameters passed to the function. Not used in this method.
A tibble object with class names "datasource" and "web".
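The pages argument described above can be sketched as follows. This is a minimal illustration, not a definitive usage: the seed URLs are placeholders chosen for the example, and the variable names pages and web_data are assumptions. The collection call is left commented out because it needs the package installed and network access; it assumes that Authenticate("web") returns the credential object with class name "web" described above.

```r
# Seed pages to crawl. The URLs are hypothetical placeholders.
pages <- data.frame(
  page = c("https://example.com", "https://example.org"),
  type = c("int", "all"),       # "int", "ext" or "all"
  max_depth = c(1L, 2L),        # integer levels of hyperlinks to follow
  stringsAsFactors = FALSE      # keep page and type as character columns
)

# Check the column types and crawl types described for the pages argument.
stopifnot(
  is.character(pages$page),
  is.character(pages$type),
  is.integer(pages$max_depth),
  all(pages$type %in% c("int", "ext", "all"))
)

# Assumed usage, not run here (requires vosonSML and network access):
# library(vosonSML)
# web_data <- Collect(Authenticate("web"), pages = pages, verbose = TRUE)
```

Building the dataframe with explicit integer literals (1L, 2L) keeps max_depth as an integer column, which matches the column types listed for pages.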