The Twitter features of vosonSML use the rtweet package, please ensure version 1.1 or later is installed.

packageVersion("rtweet")
## [1] '1.1.0'

Importing tweets from rtweet

Because vosonSML uses the rtweet package to access and collect tweets data, rtweet data is also able to be easily imported from dataframe or file and then transformed into a Collect object for use in creating networks.

tw_auth_rt <- readRDS("~/.rtweet_oauth1a")

tweets <- rtweet::search_tweets("#auspol", n = 20, token = tw_auth_rt)
data_tw <- ImportRtweet(tweets)

names(data_tw)
## [1] "tweets" "users"
class(data_tw)
## [1] "datasource" "twitter"    "list"

Authenticate with the Twitter API

Authenticate is used to create an object that contains a Twitter token for accessing the Twitter API. This can and should be re-used by saving it once to file after calling Authenticate and then by loading it again during future sessions.

library(vosonSML)

# twitter authentication creates an access token as part of the auth object
auth_tw_bearer <- Authenticate("twitter", bearerToken = "xxxxxxxxxxxx")

# save the object to file after authenticate
saveRDS(auth_tw_bearer, file = "~/.auth_tw_bearer")
# load a previously saved auth object for use in collect
auth_tw_bearer <- readRDS("~/.auth_tw_bearer")

Collect tweets using a search or from user timelines

Collect can be used to perform a twitter search with a search term or collect tweets from timelines using user names. The following example collects 100 recent tweets for the hashtag #auspol and creates a dataframe with the collected tweet data.

# set output data directory
options(voson.data = "./vsml-data")

# collect 100 recent tweets for the hashtag #auspol
collect_tw <- auth_tw_bearer |>
  Collect(searchTerm = "#auspol",
          searchType = "recent",
          numTweets = 100,
          includeRetweets = TRUE,
          writeToFile = TRUE,
          verbose = TRUE)
## Collecting tweets for search query...
## Search term: #auspol
## Requested 100 tweets of 44900 in this search rate limit.
## Rate limit reset: 2023-04-02 07:43:58
## 
## tweet        | status_id           | created            
## --------------------------------------------------------
## Latest Obs   | 1642429383460925441 | 2023-04-02 07:30:54
## Earliest Obs | 1642428234058039297 | 2023-04-02 07:26:20
## Collected 100 tweets.
## RDS file written: ./vsml-data/2023-04-02_073109-TwitterData.rds
## Done.

The next example collects the 100 most recent tweets from the @vosonlab and @ANU_SOC user timelines. Note that this method requires the endpoint = "timeline" parameter.

# collect 100 timeline tweets for each specified user
collect_tw_tl <- auth_tw_bearer |>
  Collect(endpoint = "timeline",
          users = c("vosonlab", "ANU_SOCY"),
          numTweets = 100,
          writeToFile = TRUE,
          verbose = TRUE)
## Collecting timeline tweets for users...
## Requested 200 tweets of 149800 in this search rate limit.
## Rate limit reset: 2023-04-02 07:44:01
## 
## tweet        | status_id           | created            
## --------------------------------------------------------
## Latest Obs   | 1641303253823680512 | 2023-03-30 04:56:04
## Earliest Obs | 1437995199636983808 | 2021-09-15 04:22:26
## Collected 200 tweets.
## RDS file written: ./vsml-data/2023-04-02_073113-TwitterData.rds
## Done.

The output for these methods also lists the earliest and most recent tweet as well as the number of tweets collected.

Create twitter activity, actor, semantic and 2-mode network graphs

The twitter Create function accepts the data from Collect and a type parameter of activity, actor, semantic or twomode that specifies the type of network to create from the collected data. Create produces two dataframes, one for network nodes and one for node relations or edges in the network. These can then undergo further processing as per the supplemental functions section or be passed to the Graph function that creates an igraph object.

Activity network

Nodes are tweets and edges are the relationship to other tweets such as reply, retweet or quote tweets.

net_activity <- collect_tw |> Create("activity", verbose = TRUE)
## Generating twitter activity network...
## -------------------------
## collected tweets | 100
## tweet            | 2 
## retweet          | 92
## reply            | 4 
## quote            | 2 
## nodes            | 167
## edges            | 100
## -------------------------
## Done.
g_activity <- net_activity |> Graph(writeToFile = TRUE, verbose = TRUE)
## Creating igraph network graph...
## GRAPHML file written: ./vsml-data/2023-04-02_173114-TwitterActivity.graphml
## Done.

g_activity
## IGRAPH 5943f8d DN-- 167 100 -- 
## + attr: type (g/c), name (v/c), author_id (v/c), author_screen_name
## | (v/c), created_at (v/c), user_id (e/c), screen_name (e/c), created_at
## | (e/c), edge_type (e/c)
## + edges from 5943f8d (vertex names):
##  [1] 1642429383460925441->1642343694660677632
##  [2] 1642429382257147904->1641392034991726592
## + ... omitted several edges

Actor network

Nodes are twitter users and edges are the relationship to other users in the network such as reply, mention, retweet and quote tweets. Mentions can be excluded by setting the parameter inclMentions to FALSE.

net_actor <- collect_tw |>
  Create("actor", inclMentions = TRUE, verbose = TRUE)
## Generating twitter actor network...
## -------------------------
## collected tweets | 100
## tweet            | 2 
## retweet          | 92
## reply mention    | 1 
## reply            | 4 
## quote mention    | 3 
## quote            | 2 
## nodes            | 125
## edges            | 104
## -------------------------
## Done.
g_actor <- net_actor |> Graph(writeToFile = TRUE, verbose = TRUE)
## Creating igraph network graph...
## GRAPHML file written: ./vsml-data/2023-04-02_173115-TwitterActor.graphml
## Done.

g_actor
## IGRAPH 59a5dc7 DN-- 125 104 -- 
## + attr: type (g/c), name (v/c), screen_name (v/c), status_id (e/c),
## | created_at (e/c), edge_type (e/c)
## + edges from 59a5dc7 (vertex names):
##  [1] 2849412290         ->2882834947
##  [2] 141563430          ->4168028659
## + ... omitted several edges

Semantic network

Nodes are concepts represented as common words and hashtags. Edges represent the occurence of a particular word and a particular hashtag in the same tweet. The semantic network is undirected.

# install additional required packages
# install.packages("stopwords")

# create a semantic network excluding the hashtag #auspol, include only the
# top 10% most frequent words and 20% most frequent hashtags as nodes
net_semantic <- collect_tw |>
  Create(
    "semantic",
    removeTermsOrHashtags = c("#auspol"),
    termFreq = 10,
    hashtagFreq = 20,
    verbose = TRUE
  )
## Generating twitter semantic network...
## Removing terms and hashtags: #auspol
## -------------------------
## retweets                 | 92
## tokens                   | 231
## removed specified        | 8 
## removed users            | 7 
## hashtag count            | 9 
## hashtags unique          | 9 
## term count               | 89
## terms unique             | 85
## top 20% hashtags n (>=1) | 9 
## top 10% terms n (>=1)    | 85
## nodes                    | 41
## edges                    | 91
## -------------------------
## Done.
g_semantic <- net_semantic |> Graph(writeToFile = TRUE, verbose = TRUE)
## Creating igraph network graph...
## GRAPHML file written: ./vsml-data/2023-04-02_173115-TwitterSemantic.graphml
## Done.

g_semantic
## IGRAPH 59de049 UN-B 41 91 -- 
## + attr: type (g/c), name (v/c), type (v/c), n (v/n), from.type (e/c),
## | to.type (e/c), status_id (e/c)
## + edges from 59de049 (vertex names):
##  [1] immoral     --#assisteddy... immoral     --#vad
##  [3] immoral     --#endoflif...   immoral     --#humanrights
## + ... omitted several edges

2-mode network

Nodes are twitter users or hashtags. Edges represent the use of a hashtag or the reference to another user in a tweet. The weighted parameter will add a simple frequency weight column for edges.

net_2mode <- collect_tw |>
  Create("twomode", 
         removeTermsOrHashtags = c("#auspol"),
         weighted = TRUE,
         verbose = TRUE)
## Generating twitter 2-mode network...
## Removing terms and hashtags: #auspol
## -------------------------
## collected tweets  | 100
## removed specified | 8 
## users             | 7 
## hashtags          | 9 
## nodes             | 22
## edges             | 16
## -------------------------
## Done.
g_2mode <- net_2mode |> Graph(writeToFile = TRUE, verbose = TRUE)
## Creating igraph network graph...
## GRAPHML file written: ./vsml-data/2023-04-02_173115-Twitter2mode.graphml
## Done.

mask(g_2mode)
## IGRAPH 59f5de1 DNWB 22 16 -- 
## + attr: type (g/c), name (v/c), type (v/c), user_id (v/c), screen_name
## | (v/c), status_id (e/c), created_at (e/c), is_retweet (e/l), is_quote
## | (e/l), is_reply (e/l), weight (e/n)
## + edges from 59f5de1 (vertex names):
##  [1] @uxxxxxxxxxlg  ->#assisteddy...   @uxxxxxxxxxlg  ->#vad            
##  [3] @uxxxxxxxxxlg  ->#endoflif...     @uxxxxxxxxxlg  ->#humanrights    
##  [5] @axxxxdo       ->#turnbull        @axxxxdo       ->#murdoch        
##  [7] @axxxxdo       ->#media           @cxxxxxxxxbc   ->@cxxxxxxxxxnl
## + ... omitted several edges