# vosonSML NEWS.md
- Added a parameter named `subtype` to the Mastodon network `Create()` function for creating variations of the activity and actor networks. For the activity network, `subtype = "tag"` can be used to create a tag network of post tags that are co-located. For the actor network, `subtype = "server"` can be used to create a server network, which is an actor network reduced to server associations.
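A sketch of how the `subtype` parameter described above might be used, following the package's usual `Authenticate |> Collect |> Create` pipeline. The collection parameters shown (`hashtag`, `numPosts`) are illustrative, not confirmed names:

```r
library(vosonSML)

# collect mastodon posts (collection parameters are illustrative)
mast_data <- Authenticate("mastodon") |>
  Collect(endpoint = "search", hashtag = "rstats", numPosts = 100)

# activity network variation: a network of co-located post tags
tag_net <- mast_data |> Create("activity", subtype = "tag")

# actor network variation: actors reduced to server associations
server_net <- mast_data |> Create("actor", subtype = "server")
```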
- Added Mastodon authentication, collection and network creation. There are two options for Mastodon collection: a hashtag search of global or local server timeline posts that is optionally authenticated, `Collect.search.mastodon()`, and a public thread collection function using input URLs, similar to Reddit thread collection, that requires no authentication, `Collect.thread.mastodon()`. To access these methods via `Collect`, an `endpoint = "search"` or `endpoint = "thread"` parameter should be passed to the function.
- Mastodon authentication and collection use the `rtoot` package, and a function named `ImportRtoot` has been created for importing `rtoot` data into `vosonSML`. Imported data can be passed as input to the `Create` network functions.
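For data collected directly with `rtoot`, the import path described above might look like the following sketch. The `rtoot` query shown is illustrative:

```r
library(rtoot)
library(vosonSML)

# collect posts with rtoot directly (query shown is illustrative)
posts <- get_timeline_hashtag(hashtag = "rstats", limit = 100)

# import the rtoot data into vosonSML and create a network
net <- ImportRtoot(posts) |> Create("actor")
```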
- Fixed Reddit URL parsing in which thread IDs were limited to 6 characters.
- Changed `2mode` networks to use the option-specified method.
- Fixed errors in Twitter networks caused by missing columns in the data.
- Fixed an issue with the `tidytext` and `tokenizers` packages due to a change in the ICU library unicode standard and the `stringi` package (tokenizers issue #82). This affects only the generation of `semantic` and `2mode` twitter networks, and the fix maintains their functionality until an alternative tweet tokenization method is implemented. Unfortunately these two twitter network types are not supported on systems using ICU library versions >= 72.0 at this time.
- Fixed a Twitter error caused by an unexpected type when merging data.
- Updated to support `rtweet` v1.1.
- Fixed Twitter timeline collection.
- Fixed Reddit thread collection where URLs missing trailing slashes would trigger loop-protection errors.
- Changed the `sort` parameter value for Reddit thread collection to default to `NA`, as the default sort order on Reddit is not a fixed value.
- Added a `sort` parameter to Reddit collection. As this collection method is limited, it may be useful to request comments in sort order using the Reddit sort options `top`, `new`, `controversial`, `old`, `qa` and `best`.
- Added a `Collect.listing` function for subreddits on Reddit. This is not a search; however, it allows complete metadata for a specified number of subreddit threads to be collected in sorted order. The sort options are `hot`, `top`, `new` and `rising`. There is a further time parameter, `period`, that can be set to `hour`, `day`, `week`, `month`, `year` or `all` when `sort = "top"`, meaning, for example, results sorted by top threads over the last week.
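A sketch of a listing collection as described above, assuming the `endpoint` convention used elsewhere in the package; the `subreddits` and `max` parameter names are illustrative:

```r
library(vosonSML)

# collect thread listing metadata for a subreddit, sorted by top
# threads over the last week (parameter names are illustrative)
listing <- Authenticate("reddit") |>
  Collect(endpoint = "listing", subreddits = "rstats",
          sort = "top", period = "week", max = 25)
```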
- Added logging of output for the `Collect` and `Merge` functions when `writeToFile = TRUE`. The log file is written to the same location as the data file, with the `.txt` extension appended.
- Changed `options(voson.data = "my-data")` to now attempt to create the directory if it does not exist.
- Fixed a bug in `Collect.reddit()`.
- Fixed a bug in `Collect.web()` (#49).
- Updated `vosonSML` Twitter functions to support major changes made in `rtweet` release version 1.0.2.
- Added an `endpoint` parameter to the Twitter `Collect` function. It is set to `search` by default, which is the usual collect behaviour, but can also now be set to `timeline` to collect user timelines instead. See `Collect.timeline.twitter()` for parameters.
- `vosonSML` functions are now silent by default. Using the `verbose` parameter will again print function output.
- Console output now uses the `message()` function instead of the `cat()` function by default. Setting the global option `options(voson.msg = FALSE)` will again redirect output to `cat()`. The option can be removed by assigning a value of `NULL`.
- Added a `voson.data` option allowing a directory path to be set for `writeToFile` output files. Files are output to the current working directory by default; however, a new directory can now be set with `options(voson.data = "my-data")`, for example. The directory path can be relative or a full path, but must be created beforehand or already exist. If the path is invalid or does not exist, the default behaviour is used. This option can be removed by assigning a value of `NULL`, and it will not affect other file write operations performed by the user.
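The `voson.data` option described above is set and cleared like any other R option; a minimal sketch:

```r
# write collection files to a "my-data" directory (relative to the
# working directory) instead of the working directory itself
options(voson.data = "my-data")

# ... run Collect(..., writeToFile = TRUE) here ...

# remove the option to restore the default behaviour
options(voson.data = NULL)
```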
- The Twitter `AddText()` and `AddUserData()` functions now work with most Twitter network types.
- `AddText()` now adds columns for embedded tweet text and has a `hashtags` parameter to add a list of tweet hashtags as a network attribute.
- `AddUserData()` now adds an additional dataframe for `missing_users`. It lists the IDs and screen names of users that did not have metadata embedded in the collected data. Using the `lookupUsers` parameter will retrieve the metadata using the twitter API. Additionally, passing the `refresh = TRUE` parameter will now retrieve and update the metadata for all users in the network.
- Collected Twitter data is now structured as `tweets` and `users` dataframes.
- Removed the `ImportData` function and replaced it with `ImportRtweet()` for `rtweet` version 1.0 format data.
- Added `Merge()` and `MergeFiles()` functions to support the merging of collected data from separate operations. These functions accept multiple Collect objects or `.RDS` files as input, automatically detect the datasource type, and support the `writeToFile` parameter for file output of the merged data.
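A sketch of merging collections as described above, assuming `data1` and `data2` are Collect objects from separate operations; the `MergeFiles` arguments shown are illustrative:

```r
library(vosonSML)

# merge two Collect objects and write the merged data to file
merged <- Merge(data1, data2, writeToFile = TRUE)

# or merge previously saved collections from .RDS files
# (arguments shown are illustrative)
merged <- MergeFiles("reddit-data-01.rds", "reddit-data-02.rds")
```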
- Changed the YouTube video ID extraction from URL function to be more robust, and added support for YouTube Shorts URLs.
- Removed the `GetYoutubeVideoIDs` function. The YouTube collect function parameter `videoIDs` will now accept video IDs or video URLs.
- Added `auth_twitter_app()`, `auth_twitter_dev()` and `auth_twitter_user()` functions for each token type. The `collect_reddit_threads()` and `collect_web_hyperlinks()` functions skip the unnecessary `Authenticate` step for Reddit and web data collection.
- Changed the use of status ID to summarise the collected tweet range. The `Min ID` and `Max ID` are not necessarily the earliest and latest tweets in the collection, and are therefore not ideal for delimiting subsequent collections. Instead, the two `Earliest Obs` and two `Latest Obs` tweets as returned by the Twitter API are now reported.
- Added an `endpoint` parameter to `Collect`, allowing `search` or `timeline` to be specified for a twitter data collection. If it is not specified, the default is a twitter `search`.
- A `timeline` collection accepts a `users` vector of user names or IDs, or a mixture of both, and will return up to 3,200 of each user's most recent tweets.
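A sketch of a timeline collection as described above; the token setup and the `users` values are illustrative:

```r
library(vosonSML)

# collect up to 3,200 of the most recent tweets for each user;
# names and IDs can be mixed (values shown are illustrative)
timelines <- Authenticate("twitter", bearerToken = "xxxx") |>
  Collect(endpoint = "timeline",
          users = c("vosonlab", "20670967"))
```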
- Rewrote `Create.actor.twitter` and `Create.activity.twitter` to use `dplyr` and `data.table` techniques consistent with the other package network creation functions. Both functions are significantly faster for large collection dataframes.
- `Create.actor.twitter` includes two new parameters for mentions: `inclMentions`, which will process and include mention edges in the network, and `inclRtMentions`, which will process and include mentions found in retweets. The `inclMentions` parameter is set to `TRUE` by default and `inclRtMentions` to `FALSE`. As `inclRtMentions` covers a subset of mentions, for it to be set to `TRUE`, `inclMentions` must also be `TRUE`.
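How the mentions parameters described above might combine, using `data` to stand for a twitter Collect object:

```r
library(vosonSML)

# default: mention edges included, mentions within retweets excluded
actor_net <- data |> Create("actor")

# also include mentions found in retweets; inclMentions must remain
# TRUE for inclRtMentions = TRUE to be valid
actor_full <- data |>
  Create("actor", inclMentions = TRUE, inclRtMentions = TRUE)
```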
- Improved `Create.activity.twitter` network creation: added `author_id` and `author_screen_name` to nodes to assist with labels or re-creating tweet URLs from the data.
- Added an `rmEdgeTypes` parameter to `Create.activity.twitter` and `Create.actor.twitter`. It accepts a list of edge types to filter out of the network during network creation.
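A sketch of edge-type filtering as described above, using `data` for a twitter Collect object; the edge type names shown are illustrative:

```r
library(vosonSML)

# create an actor network without retweet or reply edges
# (edge type names are illustrative)
net <- data |> Create("actor", rmEdgeTypes = c("retweet", "reply"))
```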
- Fixed an issue in the `Graph` function.
- Often the `Min ID` will be the same, but sometimes the `Min ID` is outside of the expected collection range. The last observation is a more reliable tweet to use as the starting point for subsequent search collections.
- Added a web `Collect` method with hyperlink network creation. The `Create` function with the `activity` type parameter creates a network where the nodes are web pages and the edges are the hyperlinks linking them (extracted from `a href` HTML tags). The `actor` network has page or site domains as the nodes, and again the hyperlinks linking pages between domains as the edges.
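A sketch of hyperlink collection and network creation as described above, assuming a seed-page dataframe in the shape the collector expects; the column names `page`, `type` and `max_depth` are illustrative:

```r
library(vosonSML)

# seed pages to crawl (column names are illustrative)
pages <- data.frame(page = "http://vosonlab.net",
                    type = "ext", max_depth = 2)

web_data <- Authenticate("web") |> Collect(pages = pages)

# nodes are web pages, edges are the hyperlinks linking them
activity_net <- web_data |> Create("activity")

# nodes are page/site domains linked by hyperlinks between domains
actor_net <- web_data |> Create("actor")
```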
- Fixed a `vctrs` error when using `dplyr` functions. The custom classes are no longer needed after method routing, so they are simply removed.
- Replaced a call to the deprecated `dplyr::funs` function that was generating a warning.
- Re-implemented the `Create.semantic.twitter` and `Create.twomode.twitter` functions using the `tidytext` package. They now better support tokenization of tweet text and allow a range of stopword lists and sources to be used from the `stopwords` package. The semantic network function requires the `tidytext` and `tidyr` packages to be installed before use.
- New parameters for `Create.semantic.twitter`:
  - Added `removeNumbers` and `removeUrls` parameters; the default value for each is `TRUE`.
  - An `assoc` parameter has been added to choose which node associations, or ties, to include in the network. The default value is `"limited"` and includes only ties between the most frequently occurring hashtags and terms in tweets. A value of `"full"` will also include ties between hashtags and hashtags, and between terms and terms, creating a more densely connected network.
  - Parameters for the stopwords language, e.g. `stopwordsLang = "en"`, and source, e.g. `stopwordsSrc = "smart"`, have been added. These correspond to the `language` and `source` parameters of the `tidytext::get_stopwords` function. The `stopwords` parameter default value is `TRUE`.
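Combining the semantic network parameters described above, using `data` for a twitter Collect object:

```r
library(vosonSML)

# semantic network with full hashtag-hashtag and term-term ties,
# english "smart" stopword list, numbers and urls removed
semantic_net <- data |>
  Create("semantic", assoc = "full",
         stopwords = TRUE, stopwordsLang = "en",
         stopwordsSrc = "smart",
         removeNumbers = TRUE, removeUrls = TRUE)
```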
- The network produced by the `Create.twomode.twitter` function is weighted by default; weighting can be disabled by setting the new `weighted` parameter to `FALSE`.
- Renamed the `replies_from_text` parameter to `repliesFromText` and `at_replies_only` to `atRepliesOnly` in the `AddText.actor.youtube` function for consistency.
- Removed the `tm` package dependency.
- Updated the *Introduction to vosonSML* vignette with *Merging Collected Data* examples.
- Fixed a bug in `Collect.youtube` that caused no video comments to be collected if there were no reply comments within any of a video's first `maxComments` top-level comments. For example, if `maxComments` was set to 100 and the first 100 comments made on a video had no replies, then no results would be returned.
- Fixed an issue with the `rtweet::rate_limit` function that resulted in an error when using the rtweet `retryonratelimit` search parameter. The `rate_limit` function was being called by `vosonSML` to check the twitter rate limit regardless of whether the search parameter was set or not, and so was failing `Collect` with an error. A fix was made so that `vosonSML` checks if `rtweet::rate_limit` succeeds, and if not, automatically sets `retryonratelimit` to `FALSE` so that a twitter `Collect` can still be performed without error should this problem occur again.
- Changed to use `jsonlite::fromJSON`.
- Moved the `tictoc` package from dependency imports to suggested packages.
- Added a check that the `rtweet` package is installed.
- Removed the `RedditExtractoR` package from imports.
- Renamed `bimodal` networks to `twomode` networks.
- Fixed a reddit `gsub` locale error (https://github.com/vosonlab/vosonSML/issues/21).
- Changed `bimodal` network hashtags to lowercase, as filter terms are converted to lowercase when entered.
- Fixed `bimodal` and `semantic` networks.
- Fixed a `GetVideoData()` function call in `AddVideoData`.
- Fixed issues in `AddText` functions related to strict typing by the `dplyr::if_else` function.
- Added a feature to the `AddText` function to redirect edges towards actors based on the presence of a screen name or `@screen name` that may be found at the beginning of a reply comment. Typically reply comments are directed towards a top-level comment; this instead captures when reply comments are directed to other commenters in the thread.
- Changed the youtube `actor` network identifiers to be the actor's unique `Channel ID` instead of their screen name.
- Added an `AddVideoData` function to add collected video data to the youtube `actor` network. The main purpose of this function is to replace video identifiers with the `Channel ID` of the video publisher (actor). To get the `Channel ID` of video publishers, an additional API lookup for the videos in the network is required. Additional columns such as the video `Title`, `Description` and `Published` time are also added to the network `$edges` dataframe, as well as returned in their own dataframe named `$videos`.
- Added an `AddText` function to add collected text data to networks. This feature applies to `activity` and `actor` networks, and will typically add a node attribute to activity networks and an edge attribute to actor networks. For example, this function will add a `vosonTxt_tweets` column containing tweet text to `$nodes` if passed an activity network, and to `$edges` if passed an actor network.
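A sketch of how the supplemental functions described above fit a youtube pipeline; the API key, video ID, and argument shapes are illustrative:

```r
library(vosonSML)

yt_auth <- Authenticate("youtube", apiKey = "xxxx")

yt_data <- yt_auth |> Collect(videoIDs = c("xxxxxxxxxxx"))

# actor network with comment text added as an edge attribute, then
# video metadata merged in (requires an additional API lookup)
actor_net <- yt_data |>
  Create("actor") |>
  AddText(yt_data) |>
  AddVideoData(yt_auth)
```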
- Creation of `igraph` graph objects and subsequent writing to file has been removed from the `Create` function and placed in a new `Graph` function. This change abstracts the graph creation and makes it optional, but also allows supplemental network steps, such as `AddText`, to be performed prior to creating the final igraph object.
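With graph creation split out as described above, a typical flow becomes the following sketch, using `data` for a Collect object:

```r
library(vosonSML)

# Create returns a network (nodes/edges dataframes); optional steps
# such as AddText are applied before building the igraph object
g <- data |>
  Create("activity") |>
  AddText(data) |>
  Graph(writeToFile = TRUE)
```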
- Removed the `writeToFile` parameter from the `Create` functions and added it to `Graph`.
- Removed the `weightEdges`, `textData` and `cleanText` parameters from `Create.actor.reddit`. `cleanText` is now a parameter of `AddText.activity.reddit` and `AddText.actor.reddit`.
- Replaced `AddTwitterUserData` with an `AddUserData` function that works similarly to `AddText`. This function currently only applies to twitter actor networks and will add, or download and add if missing, user profile information to actors as node attributes.
- Added a twitter interactive web authorization method using `rtweet::create_token`. This method is used when only a twitter app name and consumer keys are passed to `Authenticate.twitter` as parameters, e.g. `Authenticate("twitter", appName = "An App", apiKey = "xxxxxxxxxxxx", apiSecret = "xxxxxxxxxxxx")`. A browser tab will open asking the user to authorize the app to their twitter account to complete authentication. This uses twitter's Application-user authentication: `OAuth 1a (access token for user context)` method.
- … file) via the `HTTPUserAgent` option. It is temporarily set to the package name and current version number for `Collect`, e.g. `vosonSML v.0.27.2 (R Package)`.
- Fixed a bug in `Create.semantic.twitter` in which a sum operation calculating edge weights would set `NA` values for all edges due to `NA` values present in the hashtag fields. This occurs when there are tweets with no hashtags in the twitter collection, and is now checked.
- Other issues in `Create.semantic.twitter` were also fixed.
- Fixed a bug in `Collect.twitter` in which additional twitter API parameters, e.g. `lang` or `until`, were not being passed properly to `rtweet::search_tweets`, resulting in the additional parameters being ignored.
- Removed the `SaveCredential` and `LoadCredential` functions, as well as the `useCachedToken` parameter for `Authenticate.twitter`. These were simply calling the `saveRDS` and `readRDS` functions without performing any additional processing. Using `saveRDS` and `readRDS` directly to save and load an `Authenticate` credential object to file is simpler.
- Changed how the `cleanText` parameter works in `Create.actor.reddit` so that it is more permissive. This addresses encoding issues with apostrophes and pound symbols, and removes unicode characters not permitted by the XML 1.0 standard as used in `graphml` files. It is best effort and does not resolve all reddit text encoding issues.
- Added `Collect.twitter` summary information that includes the earliest (min) and latest (max) tweet `status_id` collected, with timestamps. The `status_id` values can be used to frame subsequent collections as `since_id` or `max_id` parameter values. If the `until` date parameter was used, the timestamp can also serve as a quick confirmation.
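A sketch of using the reported values to frame a follow-up collection as described above; the token setup, search term, and ID value are illustrative, with `since_id` passed through to the twitter search API:

```r
library(vosonSML)

auth <- Authenticate("twitter", bearerToken = "xxxx")

# collect only tweets newer than the max status_id reported by a
# previous collection (values shown are illustrative)
new_data <- auth |>
  Collect(searchTerm = "#rstats", since_id = "1234567890123456789")
```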
- Fixed a bug in the `Collect` method.
- Fixed a bug in `Create.actor.twitter` and `Create.bimodal.twitter` in which the vertices dataframe provided to the `graph_from_data_frame` function contained duplicate names, raising an error.
- Replaced the `twitteR` twitter collection implementation with the `rtweet` package.
- A twitter authentication token can now be cached in the `.twitter_oauth_token` file and used for subsequent twitter API requests without re-authentication. A new authentication token can be created and cached by deleting this file and re-using the parameter `useCachedToken = TRUE`.