Collects hyperlinks from web pages and structures the data into a dataframe with the class names "datasource" and "web".
Usage
# S3 method for class 'web'
Collect(credential, pages = NULL, ..., writeToFile = FALSE, verbose = TRUE)
collect_web_hyperlinks(pages = NULL, writeToFile = FALSE, verbose = TRUE, ...)
Arguments
- credential
A credential object generated from Authenticate with class name "web".
- pages
Dataframe. Dataframe of web pages to crawl. The dataframe must have the columns page (character), type (character) and max_depth (integer). Each row is a seed web page to crawl, with the page value being the page URL. The type value is the type of crawl: either "int", "ext" or "all", directing the crawler to follow only internal links, only external links (on a different domain to the seed page), or all links. The max_depth value determines how many levels of hyperlinks to follow from the seed site.
- ...
Additional parameters passed to the function. Not used in this method.
- writeToFile
Logical. Write collected data to file. Default is FALSE.
- verbose
Logical. Output additional information. Default is TRUE.