Collects hyperlinks from web pages and structures the data into a dataframe with the class names
"datasource" and "web".
Usage
# S3 method for class 'web'
Collect(credential, pages = NULL, ..., writeToFile = FALSE, verbose = TRUE)
collect_web_hyperlinks(pages = NULL, writeToFile = FALSE, verbose = TRUE, ...)
Arguments
- credential
A credential object generated from Authenticate with class name "web".
- pages
Dataframe. Web pages to crawl. The dataframe must have the columns page (character), type (character) and max_depth (integer). Each row is a seed web page to crawl, with the page value being the page URL. The type value is the type of crawl, one of "int", "ext" or "all", directing the crawler to follow only internal links, only external links (on a different domain to the seed page) or all links, respectively. The max_depth value determines how many levels of hyperlinks to follow from the seed page.
- ...
Additional parameters passed to function. Not used in this method.
- writeToFile
Logical. Write collected data to file. Default is FALSE.
- verbose
Logical. Output additional information. Default is TRUE.
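
The pages argument described above can be sketched as a small base R data frame. This is a minimal illustration, not an official example: the URLs are placeholders, check_pages is a hypothetical helper that is not part of the package, and the commented-out Collect call assumes that Authenticate("web") returns the credential object the usage signature expects.

```r
# Seed pages to crawl: one row per seed URL.
# The URLs are placeholders chosen for illustration only.
pages <- data.frame(
  page = c("https://example.org", "https://example.com"),
  type = c("int", "all"),      # "int", "ext" or "all"
  max_depth = c(1L, 2L),       # integer, hence the L suffix
  stringsAsFactors = FALSE     # keep page and type as character
)

# Hypothetical helper (not part of the package): checks the documented
# column names, column types and allowed crawl types before collecting.
check_pages <- function(pages) {
  stopifnot(
    is.data.frame(pages),
    all(c("page", "type", "max_depth") %in% names(pages)),
    is.character(pages$page),
    is.character(pages$type),
    is.integer(pages$max_depth),
    all(pages$type %in% c("int", "ext", "all"))
  )
  invisible(pages)
}

check_pages(pages)

# With the package attached, a collection call would then look roughly
# like this (assumed from the usage signature; not run here):
# web_data <- Collect(Authenticate("web"), pages = pages, verbose = TRUE)
```

Note that max_depth is written with the L suffix so that the column is stored as integer rather than double, matching the column type stated in the argument description.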