Changelog
Source: NEWS.md
vosonSML 0.35.1
Bug Fixes
- Changed the Reddit user-agent parameters to default to NULL. If NULL, requests use the HTTPUserAgent option value, which is set by default to the internal package user-agent by the function vsml_ua() during collection.
- Duplicate comments are now automatically removed from YouTube Collect data based on CommentID. The removed records are stored in an attribute of the object named duplicated and can be accessed using attributes(youtube_data)$duplicated.
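
The duplicate handling described above can be illustrated with base R alone. This is a minimal sketch, not the package's implementation: the youtube_data data frame and its Comment column are made-up stand-ins for real Collect output, and only the CommentID column and the duplicated attribute name come from the note above.

```r
# Hypothetical stand-in for collected YouTube comments; real Collect output
# has many more columns.
youtube_data <- data.frame(
  CommentID = c("a1", "b2", "a1", "c3"),
  Comment = c("first", "second", "first", "third"),
  stringsAsFactors = FALSE
)

# Flag repeated CommentID values (the first occurrence is kept).
is_dup <- duplicated(youtube_data$CommentID)

# Split the data, then keep the removed rows in an attribute named "duplicated".
removed <- youtube_data[is_dup, , drop = FALSE]
youtube_data <- youtube_data[!is_dup, , drop = FALSE]
attr(youtube_data, "duplicated") <- removed

# Access the removed records in the way the changelog entry describes.
dups <- attributes(youtube_data)$duplicated
```

Storing the removed rows as an attribute keeps the main data frame clean while leaving the dropped records available for inspection.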
vosonSML 0.35.0
vosonSML 0.34.1
Minor Changes
- Added a parameter named subtype to the Mastodon network Create() function for creating variations of the activity and actor networks. For the activity network a subtype = tag parameter can be used to create a tag network of post tags that are colocated. For the actor network a subtype = server parameter can be used to create a server network, which is an actor network reduced to server associations.
vosonSML 0.34.0
Major Changes
- Added Mastodon authentication, collection and network creation. There are two options for Mastodon collection: a hashtag search for global or local server timeline posts that is optionally authenticated, Collect.search.mastodon(), and a public thread collection function using input URLs that is similar to Reddit thread collection and requires no authentication, Collect.thread.mastodon(). To access these methods via Collect, an endpoint = "search" or endpoint = "thread" parameter should be passed to the function.
- The Mastodon authentication and collection uses the rtoot package, and a function called ImportRtoot has been created for importing rtoot data into vosonSML. Imported data can be passed as input to the Create network functions.
vosonSML 0.33.2
Bug Fixes
- Fixed a bug in the regex for Reddit URL parsing in which thread IDs were limited to 6 characters.
- Fixed verbose output for 2mode networks to use the method specified by the option.
- Fixed an issue with adding text to Twitter networks caused by missing columns in the data.
- Added twitter tokenization functions that were recently removed from the tidytext and tokenizers packages due to a change in the ICU library unicode standard and the stringi package (tokenizers issue #82). This affects only the generation of semantic and 2mode twitter networks and the fix maintains their functionality until an alternative tweet tokenization method is implemented. Unfortunately these two twitter network types are not supported on systems using ICU library versions >= 72.0 at this time.
- Fixed an intermittent column mismatch error in Twitter caused by an unexpected type when merging data.
- Fixed the "number of tweet observations does not match number of users" error reported with rtweet v1.1.
- Fixed the number of tweets requested count in the verbose message for Twitter timeline collection.
- Fixed a bug in Reddit thread collection where URLs missing trailing slashes would trigger loop protection errors.
- Changed the default sort parameter value for Reddit thread collection to be NA. Default sort order on Reddit is not a fixed value.
Major Changes
- Added sort parameter to Reddit collection. As this collection method is limited, it may be useful to request comments in sort order using the Reddit sort options top, new, controversial, old, qa and best.
- Added a Collect.listing function for subreddits on Reddit. This is not a search, however it allows complete metadata for a specified number of subreddit threads to be collected in sorted order. The sort options are hot, top, new and rising. There is a further time parameter period that can be set to hour, day, week, month, year or all if sort = top, meaning for example, results sorted by top threads over the last week.
Minor Changes
- Added simple log file output for Collect and Merge functions when writeToFile = TRUE. The log file is written in the same location as the data file with the .txt extension appended.
- Changed the data output path option, set for example with options(voson.data = "my-data"), to now attempt to create the directory if it does not exist.
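
The directory handling for the voson.data option can be sketched in base R as follows. This is only an illustration of the described behaviour under stated assumptions: resolve_data_dir is a hypothetical helper, not a vosonSML function, and falling back to the working directory on failure is an assumption based on the default behaviour noted elsewhere in this changelog.

```r
# Hypothetical helper sketching the described behaviour; not package code.
resolve_data_dir <- function() {
  path <- getOption("voson.data")
  # No option set: use the current working directory.
  if (is.null(path)) return(getwd())
  # Attempt to create the directory if it does not exist yet.
  if (!dir.exists(path)) {
    ok <- dir.create(path, recursive = TRUE, showWarnings = FALSE)
    # Assumed fallback when the directory cannot be created.
    if (!ok) return(getwd())
  }
  path
}

# Set the option to a directory under tempdir(), resolve it, then restore.
old <- options(voson.data = file.path(tempdir(), "my-data"))
out_dir <- resolve_data_dir()
options(old)
```

Note that base R sets options with options() and reads them with getOption().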
vosonSML 0.32.8
Bug Fixes
- Fixed two issues that arose from the introduction of tibbles and verbose messaging in Collect.reddit().
- Fixed an error caused by unescaped regex parameters in hyperlinks processed by Collect.web() (#49).
vosonSML 0.32.7
CRAN release: 2022-08-16
Major Changes
- Re-wrote and modified vosonSML Twitter functions to support major changes made in rtweet release version 1.0.2.
- Added an endpoint parameter to the Twitter Collect function. It is set to search by default, which is the usual collect behaviour, but can also now be set to timeline to collect user timelines instead. See Collect.timeline.twitter() for parameters.
- Changed output message system. vosonSML functions are now silent by default. Using the verbose parameter will again print function output.
- Changed output messages to use the message() function instead of the cat() function by default. Setting the global option with options(voson.msg = FALSE) will again redirect output to cat(). The option can be removed by assigning a value of NULL.
- Added the voson.data option allowing a directory path to be set for writeToFile output files. Files are output to the current working directory by default, however a new directory can now be set with options(voson.data = "my-data") for example. The directory path can be relative or a full path, but must be created beforehand or already exist. If the path is invalid or does not exist it will continue with the default behaviour. This option can be removed by assigning a value of NULL. This will not affect other file write operations performed by the user.
- The Twitter AddText() and AddUserData() functions now work with most Twitter network types.
- AddText() now adds columns for embedded tweet text and has a hashtags parameter to add a list of tweet hashtags as a network attribute.
- AddUserData() now adds an additional dataframe for missing_users. It lists the ids and screen names of users that did not have metadata embedded in the collected data. Using the lookupUsers parameter will retrieve the metadata using the twitter API. Additionally, passing the refresh = TRUE parameter will now retrieve and update the metadata for all users in the network.
- Twitter data collection now returns a named list of two dataframes containing tweets and users.
- Removed the ImportData function and replaced it with ImportRtweet() for rtweet version 1.0 format data.
- Added Merge() and MergeFiles() functions to support the merging of collected data from separate operations. These functions support input of multiple Collect objects or .RDS files, automatically detect the datasource type and support the writeToFile parameter for file output of merged data.
Minor Changes
- Re-wrote YouTube id extraction from url function to be more robust and added support for YouTube shorts urls.
- Removed stand-alone GetYoutubeVideoIDs function. The YouTube collect function parameter videoIDs will now accept video ids or video urls.
- Added wrappers and aliases for some functions. Twitter auth objects can now be created with simplified auth_twitter_app(), auth_twitter_dev() and auth_twitter_user() functions for each token type. The collect_reddit_threads() and collect_web_hyperlinks() functions skip the unnecessary Authenticate step for Reddit and web data collection.
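
As a rough illustration of accepting either video ids or video urls, including shorts urls, the sketch below pulls an id out of a few common url shapes. It is not the package's extraction function: extract_video_id and its pattern are hypothetical, and the assumption that ids are 11 characters drawn from letters, digits, underscore and hyphen is mine.

```r
# Hypothetical helper; the pattern and 11-character id length are assumptions.
extract_video_id <- function(url) {
  pattern <- "(?:v=|youtu\\.be/|/shorts/)([A-Za-z0-9_-]{11})"
  m <- regmatches(url, regexec(pattern, url, perl = TRUE))
  # Element 2 of each match is the captured id; NA when nothing matched.
  vapply(m, function(x) if (length(x) >= 2) x[2] else NA_character_,
         character(1))
}

urls <- c(
  "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "https://youtu.be/dQw4w9WgXcQ",
  "https://www.youtube.com/shorts/dQw4w9WgXcQ",
  "https://example.com/not-a-video"
)
ids <- extract_video_id(urls)
```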
vosonSML 0.31.1
Bug Fixes
- Tweets were incorrectly ordered by status ID to summarise the collected tweet range. The Min ID and Max ID are not necessarily the earliest and latest tweet in the tweets collected and therefore not ideal for delimiting subsequent collections. Instead the two Earliest Obs and two Latest Obs tweets as returned by the Twitter API are now reported.
Major Changes
- Added endpoint parameter to Collect, allowing search or timeline to be specified for a twitter data collection. If it is not specified the default is a twitter search.
- The timeline collection accepts a users vector of user names or IDs or a mixture of both, and will return up to 3,200 of each user's most recent tweets.
- Minimum required version of R has changed from 3.6 to 4.1.
vosonSML 0.30.5
Major Changes
- Re-implemented Create.actor.twitter and Create.activity.twitter to use dplyr and data.table techniques consistent with other package network creation functions. Both functions are significantly faster for large collection dataframes.
Minor Changes
- Create.actor.twitter includes two new parameters for mentions: inclMentions, which will process and include mentions edges in the network, and inclRtMentions, which will process and include mentions found in retweets. The inclMentions parameter is set to TRUE by default and inclRtMentions is set to FALSE. The inclRtMentions parameter is a subset of mentions, therefore for it to be set to TRUE, inclMentions must also be TRUE.
- Re-implemented and simplified the Create.activity.twitter network creation. Added author_id and author_screen_name to nodes to assist with labels or re-creating tweet URLs from data.
- Added rmEdgeTypes parameter to Create.activity.twitter and Create.actor.twitter. These accept a list of edge types that can be filtered out of the network during network creation.
- Removed label attributes from igraph graphs generated by the Graph function.
- Tidied up and renamed many of the utils functions. Removed unused functions.
- Added last observation tweet to minimum and maximum status ID values reported for twitter collections. Usually the last observation and Min ID will be the same, but sometimes the Min ID is outside of the expected collection range. The last observation is a more reliable tweet to use as the starting point for subsequent search collections.
- Cleaned up package imports, suggests and added some interactive package checks to reduce the number of required imports.
vosonSML 0.30.0
Major Changes
- Added a web crawler Collect method with hyperlink network creation. The Create function with activity type parameter creates a network where nodes are web pages and edges the hyperlinks linking them (extracted from a href HTML tags). The actor network has page or site domains as the nodes and again the hyperlinks from linking pages between domains.
vosonSML 0.29.14
vosonSML 0.29.12
Bug Fixes
- Fixed an issue with custom classes assigned to dataframes causing a vctrs error when using dplyr functions. The classes are no longer needed post-method routing so they are simply removed.
- Replaced an instance of the deprecated dplyr::funs function that was generating a warning.
vosonSML 0.29.10
CRAN release: 2020-04-25
Minor Changes
- Reimplemented the Create.semantic.twitter and Create.twomode.twitter functions using the tidytext package. They now better support tokenization of tweet text and allow a range of stopword lists and sources to be used from the stopwords package. The semantic network function requires the tidytext and tidyr packages to be installed before use.
- New parameters have been added to Create.semantic.twitter:
  - Numbers and urls can be removed or included from the term list using removeNumbers and removeUrls, default value is TRUE.
  - The assoc parameter has been added to choose which node associations or ties to include in the network. The default value is "limited" and includes only ties between most frequently occurring hashtags and terms in tweets. A value of "full" will also include ties between most frequently occurring hashtags and hashtags, and terms with terms, creating a more densely connected network.
  - Parameters to specify stopwords language, e.g. stopwordsLang = "en", and source, e.g. stopwordsSrc = "smart", have been added. These correspond to the language and source parameters of the tidytext::get_stopwords function. The stopwords default value is TRUE.
- The network produced by the Create.twomode.twitter function is weighted by default, but weighting can be disabled by setting the new weighted parameter to FALSE.
- Renamed the replies_from_text parameter to repliesFromText and at_replies_only to atRepliesOnly in the AddText.actor.youtube function for consistency.
- Improved the usage examples in the README file.
- Removed tm package dependency.
vosonSML 0.29.9
Minor Changes
- Updated Introduction to vosonSML vignette Merging Collected Data examples.
- Added new hex sticker to package documentation.
Bug Fixes
- Fixed a logic problem in Collect.youtube that was causing no video comments to be collected if there were no reply comments for any of the video's first maxComments number of top-level comments. For example, if maxComments is set to 100 and the first 100 comments made to a video had no replies then no results would be returned.
vosonSML 0.29.8
Bug Fixes
- A recent intermittent problem with the Twitter API caused an issue with the rtweet::rate_limit function that resulted in an error when using the rtweet retryonratelimit search parameter. The rate_limit function was being called by vosonSML to check the twitter rate limit regardless of whether the search parameter was set or not, and so was failing Collect with an error. A fix was made so that vosonSML checks if rtweet::rate_limit succeeds, and if not automatically sets retryonratelimit to FALSE so that a twitter Collect can still be performed without error should this problem occur again.
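
The fallback described above follows a common tryCatch pattern, sketched here without rtweet so that it runs offline. The check_rate_limit and resolve_retry functions are hypothetical stand-ins rather than package code; the only behaviour taken from the entry is that a failed rate limit check results in retryonratelimit being treated as FALSE.

```r
# Hypothetical stand-in for a rate limit lookup that fails intermittently.
check_rate_limit <- function() stop("rate limit endpoint unavailable")

# Returns the value to use for a retry-on-rate-limit setting.
resolve_retry <- function(requested, check = check_rate_limit) {
  if (!requested) return(FALSE)
  tryCatch({
    check()
    TRUE
  }, error = function(e) {
    # The check failed, so fall back to collecting without retries.
    message("rate limit check failed, continuing without retry")
    FALSE
  })
}

retry_when_failing <- resolve_retry(TRUE)
retry_when_ok <- resolve_retry(TRUE, check = function() invisible(NULL))
```

Catching the error and downgrading the setting lets the collection proceed instead of aborting on a check that is unrelated to the request itself.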
vosonSML 0.29.5
Minor Changes
- Reddit JSON is now retrieved using jsonlite::fromJSON.
- Reddit 'Continue' threads are now followed with additional thread requests. Many more comments are now collected for threads with large diameters or breadth. Continue threads also have a Reddit limit of 500 comments per thread request.
- Reddit comment IDs and timestamps are now extracted.
- Moved the tictoc package from dependency imports to suggested packages.
- Added some checks for whether the rtweet package is installed.
- Removed the RedditExtractoR package from imports.
- HTML decoded tweet text during network creation to replace '&amp;', '&lt;' and '&gt;' HTML codes.
- Added node type attribute to twomode networks.
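
The HTML decoding of tweet text mentioned in the list above can be sketched with base R string replacement. This is an assumption-laden illustration, not the package's code: decode_html_basic is a hypothetical helper that handles only the three entities named in the entry.

```r
# Hypothetical helper; handles only the three entities named above.
decode_html_basic <- function(x) {
  x <- gsub("&lt;", "<", x, fixed = TRUE)
  x <- gsub("&gt;", ">", x, fixed = TRUE)
  # Decode the ampersand last so "&amp;lt;" becomes "&lt;", not "<".
  gsub("&amp;", "&", x, fixed = TRUE)
}

decoded <- decode_html_basic("R &amp; D: a &lt; b &gt; c")
```

Decoding the ampersand last avoids double-decoding text that was escaped twice.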
vosonSML 0.29.3
Bug Fixes
- Added a fix for the reddit gsub locale error (https://github.com/vosonlab/vosonSML/issues/21).
- Changed bimodal network hashtags to lowercase as filter terms when entered are converted to lowercase.
- Fixed errors thrown when removing terms from bimodal and semantic networks.
- Removed a duplicate GetVideoData() function call in AddVideoData.
- Fixed data type errors in AddText functions related to strict typing by the dplyr::if_else function.
vosonSML 0.29.2
Minor Changes
- A feature was added to the youtube actor AddText function to redirect edges towards actors based on the presence of a screen name or @screen name that may be found at the beginning of a reply comment. Typically reply comments are directed towards a top-level comment, this instead captures when reply comments are directed to other commenters in the thread.
vosonSML 0.29.1
Minor Changes
- Changed youtube actor network identifiers to be their unique Channel ID instead of their screen names.
- Created the AddVideoData function to add collected video data to the youtube actor network. The main purpose of this function is to replace video identifiers with the Channel ID of the video publisher (actor) instead. To get the Channel ID of video publishers an additional API lookup for the videos in the network is required. Additional columns such as video Title, Description and Published time are also added to the network $edges dataframe as well as returned in their own dataframe called $videos.
vosonSML 0.29.0
Major Changes
- Created the AddText function to add collected text data to networks. This feature applies to activity and actor networks and will typically add a node attribute to activity networks and an edge attribute to actor networks. For example, this function will add the column vosonTxt_tweets containing tweet text to $nodes if passed an activity network, and to $edges if passed an actor network.
- Generation of igraph graph objects and subsequent writing to file has been removed from the Create function and placed in a new function Graph. This change abstracts the graph creation and makes it optional, but also allows supplemental network steps such as AddText to be performed prior to creating the final igraph object.
Minor Changes
- Removed writeToFile parameter from Create functions and added it to Graph.
- Removed weightEdges, textData and cleanText parameters from Create.actor.reddit. cleanText is now a parameter of AddText.activity.reddit and AddText.actor.reddit.
- Replaced AddTwitterUserData with AddUserData function that works similarly to AddText. This function currently only applies to twitter actor networks and will add, or download and add if missing, user profile information to actors as node attributes.
vosonSML 0.27.2
CRAN release: 2019-07-18
Minor Changes
- Added twitter interactive web authorization of an app as provided by rtweet::create_token. This method is used when only the twitter app name and consumer keys are passed to Authenticate.twitter as parameters, e.g. Authenticate("twitter", appName = "An App", apiKey = "xxxxxxxxxxxx", apiSecret = "xxxxxxxxxxxx"). A browser tab will open asking the user to authorize the app to their twitter account to complete authentication. This is using twitter's Application-user authentication: OAuth 1a (access token for user context) method.
- It is suspected that Reddit is rate-limiting some generic R UA strings. So a User-Agent string is now set for underlying R Collect functions (e.g. file) via the HTTPUserAgent option. It is temporarily set to package name and current version number for Collect, e.g. vosonSML v.0.27.2 (R Package).
- Removed hex sticker (and favicons for pkgdown site).
vosonSML 0.27.1
Bug Fixes
- Fixed a bug in Create.semantic.twitter in which a sum operation calculating edge weights would set NA values for all edges due to NA values present in the hashtag fields. This occurs when there are tweets with no hashtags in the twitter collection and is now checked.
- Some UTF encoding issues in Create.semantic.twitter were also fixed.
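
The edge weight problem described above comes down to how sum() treats NA in R. The sketch below uses a made-up edges data frame, not the package's data structures, to show a single NA propagating into an aggregated weight and one possible guard; the actual check used in the package may differ.

```r
# Hypothetical edge list; NA marks a count from a tweet without hashtags.
edges <- data.frame(
  from = c("#rstats", "#rstats", "#data"),
  to = c("network", "network", "graph"),
  n = c(2, NA, 1)
)

# A plain sum propagates the NA into the aggregated weight.
naive <- aggregate(n ~ from + to, data = edges, FUN = sum,
                   na.action = na.pass)

# Dropping NA values inside the sum keeps every weight numeric.
fixed <- aggregate(n ~ from + to, data = edges,
                   FUN = function(v) sum(v, na.rm = TRUE),
                   na.action = na.pass)
```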
vosonSML 0.27.0
Bug Fixes
- Fixed a bug in Collect.twitter in which any additional twitter API parameters, e.g. lang or until, were not being passed properly to rtweet::search_tweets. This resulted in the additional parameters being ignored.
Major Changes
- Removed the SaveCredential and LoadCredential functions, as well as the useCachedToken parameter for Authenticate.twitter. These were simply calling the saveRDS and readRDS functions and not performing any additional processing. Using saveRDS and readRDS directly to save and load an Authenticate credential object to file is simpler.
- Changed the way that the cleanText parameter works in Create.actor.reddit so that it is more permissive. Addresses encoding issues with apostrophes and pound symbols and removes unicode characters not permitted by the XML 1.0 standard as used in graphml files. This is best effort and does not resolve all reddit text encoding issues.
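
Saving and loading a credential object directly with saveRDS and readRDS, as suggested in the list above, looks roughly like this. The cred list and its fields are placeholders for illustration only; a real object would be whatever Authenticate returns.

```r
# Placeholder credential; the field names here are hypothetical.
cred <- list(socialmedia = "twitter", token = "xxxxxxxxxxxx")

# Write the object to a file, then read it back in a later session.
cred_file <- file.path(tempdir(), "twitter_auth.rds")
saveRDS(cred, cred_file)
cred_loaded <- readRDS(cred_file)
```

Because the file contains secrets, it should be kept out of version control and shared locations.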
Minor Changes
- Added Collect.twitter summary information that includes the earliest (min) and latest (max) tweet status_id collected with timestamp. The status_id values can be used to frame subsequent collections as since_id or max_id parameter values. If the until date parameter was used the timestamp can also be used as a quick confirmation.
- Added elapsed time output to the Collect method.
vosonSML 0.26.3
CRAN release: 2019-02-22
vosonSML 0.26.2
Bug Fixes
- Fixed a bug in Create.actor.twitter and Create.bimodal.twitter in which the vertices dataframe provided to the graph_from_data_frame function contained duplicate names, raising an error.
vosonSML 0.25.0
Major Changes
- Replaced the twitteR twitter collection implementation with the rtweet package.
- A user's twitter authentication token can now be cached in the .twitter_oauth_token file and used for subsequent twitter API requests without re-authentication. A new authentication token can be cached by deleting this file and re-using the parameter useCachedToken = TRUE.