Bug Fixes

  • Fixed a reddit data collection issue for threads that are specified using shorter URL’s without the title part and that contain continue thread links. These links were resolving to the main thread resulting in duplication of comments and thread structures.

Minor Changes

  • Added a parameter to Mastodon network Create() function named subtype for creating variations to the activity and actor networks. For the activity network a subtype = tag parameter can be used to create a tag network of post tags that are colocated. For the actor network a subtype = server parameter can be used to create a server network, which is an actor network reduced to server associations.

Major Changes

  • Added Mastodon authentication, collection and network creation. There are two options for Mastodon collection, a hashtag search for global or local server timeline posts that is optionally authenticated: Collect.search.mastodon(), and a public thread collection function using input URL’s that is similar to Reddit thread collection that requires no authentication: Collect.thread.mastodon(). To access these methods via Collect an endpoint = "search" or endpoint = "thread" parameter should be passed to the functions.
  • The Mastodon authentication and collection uses the rtoot package and a function has been created for importing rtoot data into vosonSML called ImportRtoot. Imported data can be passed as input to the Create network functions.

Minor Changes

  • Changed default Reddit request wait time range from 3 to 5 seconds, to 6 to 8 seconds to avoid a proposed platform rate limit of 10 requests per minute. This value can still be manually set using the waitTime = c(min, max) wait time range parameter.

Bug Fixes

  • Fixed a bug in the regex for Reddit URL parsing in which thread ID’s were limited to 6 characters.
  • Fixed verbose output for 2mode networks to use option specified method.
  • Fixed an issue with adding text to Twitter networks caused by missing columns in the data.
  • Added twitter tokenization functions that were recently removed from the tidytext and tokenizers packages due to a change in the ICU library unicode standard and the stringi package (tokenizers issue #82). This affects only the generation of semantic and 2mode twitter networks and the fix maintains their functionality until an alternative tweet tokenization method is implemented. Unfortunately these two twitter network types are not supported on systems using ICU library versions >= 72.0 at this time.
  • Fixed an intermitant column mismatch error in Twitter caused by unexpected type when data merging.
  • Fixed the number of tweet observations does not match number of users error reported with rtweet v1.1.
  • Fixed number of tweets requested count in verbose message for Twitter timeline collection.
  • Fixed a bug in Reddit thread collection where URL’s missing trailing slashes would trigger loop protection errors.
  • Changed the default sort parameter value for Reddit threaad collection to be NA. Default sort order on Reddit is not a fixed value.

Major Changes

  • Added sort parameter to Reddit collection. As this collection method is limited, it may be useful to request comments in sort order using the Reddit sort options top, new, controversial, old, qa and best.
  • Added a Collect.listing function for subreddits on Reddit. This is not a search, however it allows complete metadata for a specified number of subreddit threads to be collected in sorted order. The sort options are hot, top, new and rising. There is a further time parameter period that can be set to hour, day, week, month, year or all if sort = top, meaning for example, results sorted by top threads over the last week.

Minor Changes

  • Added simple log file output for Collect and Merge functions when writeToFile = TRUE. The log file is written in the same location as the data file with the .txt extension appended.
  • Changed data output path option option(voson.data = "my-data") to now attempt to create the directory if it does not exist.

Bug Fixes

  • Fixed two issues that arose from the introduction of tibbles and verbose messaging in Collect.reddit().
  • Fixed an error caused by unescaped regex parameters in hyperlinks processed by Collect.web() (#49).

Major Changes

  • Re-wrote and modified vosonSML Twitter functions to support major changes made in rtweet release version 1.0.2.
  • Added an endpoint parameter to the Twitter Collect function. It is set to search by default, which is the usual collect behaviour, but can also now be set to timeline to collect user timelines instead. See Collect.timeline.twitter() for parameters.
  • Changed output message system. vosonSML functions are now silent by default. Using the verbose parameter will again print function output.
  • Changed output messages to use the message() function instead of the cat() function by default. Setting the global option option(voson.msg = FALSE) will again redirect output to cat(). The option can be removed by assigning a value of NULL.
  • Added the voson.data option allowing a directory path to be set for writeToFile output files. Files are output to the current working directory by default, however a new directory can now be set with option(voson.data = "my-data") for example. The directory path can be relative or a full path, but must be created beforehand or already exist. If the path is invalid or does not exist it will continue with the default behaviour. This option can be removed by assigning a value of NULL. This will not effect other file write operations performed by the user.
  • The Twitter AddText() and AddUserData() functions now work with most Twitter network types.
  • AddText() now adds columns for embedded tweet text and has a hashtags parameter to add a list of tweet hashtags as a network attribute.
  • AddUserData() now adds an additional dataframe for missing_users. It lists the ids and screen names of users that did not have metadata embedded in the collected data. Using the lookupUsers parameter will retrieve the metadata using the twitter API. Additonally passing the refresh = TRUE parameter will now retrieve and update the metadata for all users in the network.
  • Twitter data collection now returns a named list of two dataframes containing tweets and users.
  • Removed the ImportData function and replaced it with ImportRtweet() for rtweet version 1.0 format data.
  • Added Merge() and MergeFiles() functions to support the merging of collected data from separate operations. These functions support input of multiple Collect objects or .RDS files, automatically detect the datasource type and support the writeToFile parameter for file output of merged data.

Minor Changes

  • Re-wrote YouTube id extraction from url function to be more robust and added support for YouTube shorts urls.
  • Removed stand-alone GetYoutubeVideoIDs function. The YouTube collect function parameter videoIDs will now accept video ids or video urls.
  • Added wrappers and aliases for some functions. Twitter auth objects can now be created with simplified auth_twitter_app(), auth_twitter_dev() and auth_twitter_user() functions for each token type. The collect_reddit_threads() and collect_web_hyperlinks() functions skip the unecessary Authenticate step for Reddit and web data collection.

Bug Fixes

  • Incorrectly ordered tweets by status ID to summarise collected tweet range. The Min ID and Max ID are not necessarily the earliest and latest tweet in the tweets collected and therefore not ideal for delimiting subsequent collections. Instead the two Earliest Obs and two Latest Obs tweets as returned by the Twitter API are now reported.

Major Changes

  • Added enpoint parameter to Collect, allowing search or timeline to be specified for a twitter data collection. If it is not specified the default is a twitter search.
  • The timeline collection accepts a users vector of user names or ID’s or a mixture of both, and will return up to 3,200 of each users most recent tweets.
  • Minimum required version of R has changed from 3.6 to 4.1.

Minor Changes

  • Updated standard package documentation, added citation, code of conduct and README.Rmd.
  • Replaced magrittr pipes with native pipe operators.

Minor Changes

  • Updated standard package documentation, added citation and README.Rmd.

Major Changes

  • Re-implemented Create.actor.twitter and Create.activity.twitter to use dplyr and data.table techniques consistent with other package network creation functions. Both functions are significantly faster for large collection dataframes.

Minor Changes

  • Create.actor.twitter includes two new parameters for mentions, inclMentions that will process and include mentions edges in the network and inclRtMentions that will process and include mentions found in retweets. The inclMentions parameter is set to TRUE by default and inclRtMentions set to FALSE. The inclRtMentions parameter is a subset of mentions, therefore for it to be set to TRUE, inclMentions must also be TRUE.
  • Re-implemented and simplified the Create.activity.twitter network creation. Added author_id and author_screen_name to nodes to assist with labels or re-creating tweet URLs from data.
  • Added rmEdgeTypes parameter to Create.activity.twitter and Create.actor.twitter. These accept a list of edge types that can be filtered out of the network during network creation.
  • Removed label attributes from igraph graphs generated by the Graph function.
  • Tidied up and renamed many of the utils functions. Removed unused functions.
  • Added last observation tweet to minimum and maximum status ID values reported for twitter collections. Usually the last observation and Min ID will be the same, but sometimes the Min ID is outside of the expected collection range. The last observation is a more reliable tweet to use as the starting point for subsequent search collections.
  • Cleaned up package imports, suggests and added some interactive package checks to reduce the number of required imports.

Major Changes

  • Added a web crawler Collect method with hyperlink network creation. The Create function with activity type parameter creates a network where nodes are web pages and edges the hyperlinks linking them (extracted from a href HTML tags). The actor network has page or site domains as the nodes and again the hyperlinks from linking pages between domains.

Minor Changes

  • Prepending instead of appeneding S3 class names to Collect dataframes to avoid dplyr issues.
  • Removed retryOnRateLimit set to FALSE if rate limit cannot be determined.
  • ImportData will now accept a file path or a dataframe.

Bug Fixes

  • S3 class names were being added to Collect dataframes after writeToFile. Should no longer be required to manually add class names or use ImportData to load RDS files to use previously saved data with Create functions.

Minor Changes

  • Minor documentation updates to Create.semantic.twitter, Create.twomode.twitter and the Intro-to-vosonSML vignette:
    • Specified the tidyr, tidytext and stopwords package requirements in descriptions and examples
    • Updated references to twomode networks as 2-mode where possible

Bug Fixes

  • Fixed an issue with custom classes assigned to dataframes causing an vctrs error when using dplyr functions. The classes are no longer needed post-method routing so they are simply removed.
  • Replaced an instance of the deprecated dplyr::funs function that was generating a warning.

Minor Changes

  • Minor documentation updates.

Bug Fixes

  • Fixed a reddit collect bind_rows error on joining dataframes with different types for the structure column. Column type was being set to integer instead of character in cases when every thread comment have no replies or depth (except the OP).

Minor Changes

  • Reimplemented the Create.semantic.twitter and Create.twomode.twitter functions using the tidytext package. They now better support tokenization of tweet text and allows a range of stopword lists and sources to be used from the stopwords package. The semantic network function requires the tidytext and tidyr packages to be installed before use.
  • New parameters have been added to Create.semantic.twitter:
    • Numbers and urls can be removed or included from the term list using removeNumbers and removeUrls, default value is TRUE.
    • The assoc parameter has been added to choose which node associations or ties to include in the network. The default value is "limited" and includes only ties between most frequently occurring hashtags and terms in tweets. A value of full will also include ties between most frequently occurring hashtags and hashtags, and terms with terms creating a more densely connected network.
    • Parameters to specify stopwords language e.g stopwordsLang = "en" and source e.g stopwordsSrc = "smart" have been added. These correspond to the language and source parameters of the tidytext::get_stopwords function. The stopwords default value is TRUE.
  • The network produced by the Create.twomode.twitter function is weighted by default but can be disabled by setting the new weighted parameter to FALSE.
  • Renamed the replies_from_text parameter to repliesFromText and at_replies_only to atRepliesOnly in the AddText.actor.youtube function for consistency.
  • Improved the usage examples in the README file.
  • Removed tm package dependency.

Minor Changes

  • Updated Introduction to vosonSML vignette Merging Collected Data examples.
  • Added new hex sticker to package documentation.

Bug Fixes

  • Fixed a logic problem in Collect.youtube that was causing no video comments to be collected if there were no reply comments for any of the videos first maxComments number of top level comments. For example, if maxComments is set to 100 and the first 100 comments made to a video had no replies then no results would be returned.

Bug Fixes

  • A recent intermittent problem with the Twitter API caused an issue with the rtweet::rate_limit function that resulted in an error when using the rtweet retryonratelimit search parameter. The rate_limit function was being called by vosonSML to check the twitter rate limit regardless of whether the search parameter was set or not, and so was failing Collect with an error. A fix was made so that vosonSML checks if rtweet::rate_limit succeeds, and if not automatically sets retryonratelimit to FALSE so that a twitter Collect can still be performed without error should this problem occur again.

Minor Changes

  • Added some links to the pkgdown site navbar.

Minor Changes

  • Added some guidance for merging collected data to the Introduction to vosonSML vignette.

Minor Changes

  • Added Introduction to vosonSML vignette to the package.
  • Minor changes and input checks added to ImportData.
  • Added some unit testing for Authenticate and ImportData.

Minor Changes

  • Reddit JSON is now retrieved using jsonlite::fromJSON.
  • Reddit ‘Continue’ threads are now followed with additional thread requests. Many more comments are now collected for threads with large diameters or breadth. Continue threads also have a Reddit limit of 500 comments per thread request.
  • Reddit comment ID’s and timestamps are now extracted.
  • Removed the tictoc package from dependency imports to suggested packages.
  • Added some checks for whether the rtweet package is installed.
  • Removed the RedditExtractoR package from imports.
  • HTML decoded tweet text during network creation to replace ‘&’, ‘<’, and ‘>’ HTML codes.
  • Added node type attribute to twomode networks.

Minor Changes

  • Renamed bimodal networks to twomode.

Minor Changes

  • Added output messages from supplemental functions such as AddText() and Graph(). Also improved consistency of output messages from Collect and Create functions.

Bug Fixes

  • Added a fix reddit gsub locale error https://github.com/vosonlab/vosonSML/issues/21.
  • Changed bimodal network hashtags to lowercase as filter terms when entered are converted to lowercase.
  • Fixed errors thrown when removing terms from bimodal and semantic networks.
  • Removed a duplicate GetVideoData() function call in AddVideoData.
  • Fixed data type errors in AddText functions related to strict typing by dplyr::if_else function.

Minor Changes

  • A feature was added to the youtube actor AddText function to redirect edges towards actors based on the presence of a screen name or @screen name that may be found at the beginning of a reply comment. Typically reply comments are directed towards a top-level comment, this instead captures when reply comments are directed to other commenters in the thread.

Minor Changes

  • Changed youtube actor network identifiers to be their unique Channel ID instead of their screen names.
  • Created the AddVideoData function to add collected video data to the youtube actor network. The main purpose of this function is to replace video identifiers with the Channel ID of the video publisher (actor) instead. To get the Channel ID of video publishers an additional API lookup for the videos in the network is required. Additional columns such as video Title, Description and Published time are also added to the network $edges dataframe as well as returned in their own dataframe called $videos.

Major Changes

  • Created the AddText function to add collected text data to networks. This feature applies to activity and actor networks and will typically add a node attribute to activity networks and an edge attribute to actor networks. For example, this function will add the column vosonTxt_tweets containing tweet text to $nodes if passed an activity network, and to $edges if passed an actor network.
  • Generation of igraph graph objects and subsequent writing to file has been removed from the Create function and placed in a new function Graph. This change abstracts the graph creation and makes it optional, but also allows supplemental network steps such as AddText to be performed prior to creating the final igraph object.

Minor Changes

  • Removed writeToFile parameter from Create functions and added it to Graph.
  • Removed weightEdges, textData and cleanText parameters from Create.actor.reddit. cleanText is now a parameter of AddText.activity.reddit and AddText.actor.reddit.
  • Replaced AddTwitterUserData with AddUserData function that works similarly to AddText. This function currently only applies to twitter actor networks and will add, or download add if missing, user profile information to actors as node attributes.

Minor Changes

  • Added activity network type for reddit. In the reddit activity network nodes are the thread posts and comments, edges represent where comments are directed in the threads.
  • Added github dev version badge to README.

Major Changes

  • Added new activity network type for twitter and youtube Create function. In this network nodes are the items collected such as tweets returned from a twitter search and comments posted to youtube videos. Edges represent the platform relationship between the tweets or comments.

Minor Changes

  • Added a new twitter actor network edge type self-loop. This aims to facilitate the later addition of tweet text to the network graph for user tweets that have no ties to other users.

Minor Changes

  • Added twitter interactive web authorization of an app as provided by rtweet::create_token. Method is used when only twitter app name and consumer keys are passed to Authenticate.twitter as parameters. e.g Authenticate("twitter", appName = "An App", apiKey = "xxxxxxxxxxxx", apiSecret = "xxxxxxxxxxxx"). A browser tab will open asking the user to authorize the app to their twitter account to complete authentication. This is using twitters Application-user authentication: OAuth 1a (access token for user context) method.
  • It is suspected that Reddit is rate-limiting some generic R UA strings. So a User-Agent string is now set for underlaying R Collect functions (e.g file) via the HTTPUserAgent option. It is temporarily set to package name and current version number for Collect e.g vosonSML v.0.27.2 (R Package).
  • Removed hex sticker (and favicons for pkgdown site).

Bug Fixes

  • Fixed a bug in Create.semantic.twitter in which a sum operation calculating edge weights would set NA values for all edges due to NA values present in the hashtag fields. This occurs when there are tweets with no hashtags in the twitter collection and is now checked.
  • Some UTF encoding issues in Create.semantic.twitter were also fixed.

Minor Changes

  • Added ‘#’ to hashtags and ‘@’ to mentions in twitter semantic network to differentiate between hashtags, mentions and common terms.

Bug Fixes

  • Fixed a bug in Collect.twitter in which any additional twitter API parameters e.g lang or until were not being passed properly to rtweet::search_tweets. This resulted in the additional parameters being ignored.

Major Changes

  • Removed the SaveCredential and LoadCredential functions, as well as the useCachedToken parameter for Authenticate.twitter. These were simply calling the saveRDS and readRDS functions and not performing any additional processing. Using saveRDS and readRDS directly to save and load an Authenticate credential object to file is simpler.
  • Changed the way that the cleanText parameter works in Create.actor.reddit so that it is more permissive. Addresses encoding issues with apostrophes and pound symbols and removes unicode characters not permitted by the XML 1.0 standard as used in graphml files. This is best effort and does not resolve all reddit text encoding issues.

Minor Changes

  • Added Collect.twitter summary information that includes the earliest (min) and latest (max) tweet status_id collected with timestamp. The status_id values can be used to frame subsequent collections as since_id or max_id parameter values. If the until date parameter was used the timestamp can also be used as a quick confirmation.
  • Added elapsed time output to the Collect method.

Bug Fixes

  • Fixed bugs in Create.actor.reddit that were incorrectly creating edges between top-level commentors and thread authors from different threads. These bugs were only observable in when collecting multiple reddit threads.

Minor Changes

  • Improved output for reddit collection. Removed the progress bar and added a table of results summarising the number of comments collected for each thread.
  • Added to twitter collection output the users twitter API reset time.

Bug Fixes

  • Fixed a bug in Create.actor.twitter and Create.bimodal.twitter in which the vertices dataframe provided to the graph_from_data_frame function as a contained duplicate names raising an error.

Major Changes

  • Revised and updated roxygen documentation and examples for all package functions.
  • Updated all Authenticate, Collect and Create S3 methods to implement function routing based on object class names.

Minor Changes

  • Created a pkgdown web site for github hosted package documentation.
  • Created a new hex sticker logo.

Major Changes

  • Replaced the twitteR twitter collection implementation with the rtweet package.
  • A users twitter authentication token can now be cached in the .twitter_oauth_token file and used for subsequent twitter API requests without re-authentication. A new authentication token can be cached by deleting this file and using the re-using the parameter useCachedToken = TRUE.