We use vosonSML to collect Twitter data and then use Botometer to test the bot status of a subset of Twitter users.
As social media platforms like Twitter become important spaces for information diffusion, discussion and opinion formation, serious concerns have been raised about the role of malicious socialbots in interfering, manipulating and influencing communication and public opinion Badawy, Ferrara, and Lerman (2018). Their detection and the understanding of consequent dynamics of behaviour are relevant to researchers and it is central to the research collaboration the VOSON Lab is involved in (see Cimiano et al. 2020).
In this post we will use vosonSML
to collect data via the Twitter API and construct a network represented as an igraph
graph object in R. Then, we will identify a subset of users and test their bot status using Botometer
1 (in python) and include the bot scores as node attributes in the network graph in R.
vosonSML
in RThe vosonSML vignette (Ackland, Gertzel, and Borquez 2020) provides comprehensive instructions on how to use vosonSML
. In this post, we are going to focus on the essential steps for Twitter collection and network generation via vosonSML
. The first step involves loading the vosonSML
package into the R session, and use the Web Auth
approach to create a Twitter API access token:
library(magrittr)
library(vosonSML)
twitterAuth <-
Authenticate(
"twitter",
appName = "An App",
apiKey = "xxxxxxxxxxxx",
apiSecret = "xxxxxxxxxxxx")
#Optionally, save the access token to disk:
saveRDS(twitterAuth, file = "twitter_auth.rds")
#The following loads into the current session a previously-created access token:
twitterAuth <- readRDS("twitter_auth.rds")
Then, we collect 100 tweets that contain the Australian politics hashtag #auspol. Data is saved in .rds
dataframe format.
twitterData <- twitterAuth %>%
Collect(
searchTerm = "#auspol",
numTweets = 100,
includeRetweets = FALSE,
retryOnRateLimit = TRUE,
writeToFile = TRUE)
To read the Twitter dataframe from disk, the ImportData()
function modifies the class values for the object before it is used with vosonSML
:
twitterData <- ImportData("2021-09-30_182359-TwitterData.rds", "twitter")
And now we use the vosonSML
functions Create("actor")
to create an Actor network and Graph()
to create an igraph
graph object g
.
The Create("actor")
function generates a named list containing two dataframes named nodes and edges. In this Actor network nodes are users who have either tweeted using the search term #auspol, or else are mentioned or replied to in tweets featuring the search terms. Edges represent interactions between Twitter users, and an edge attribute indicates whether the interaction is a mention, reply, retweet, quoted retweet or self-loop.
The output in the console loos like this:
> actorNetwork <- twitterData %>% Create("actor", vertbose=TRUE)
Generating twitter actor network...-------------------------
| 100
collected tweets | 119
tweet mention | 62
tweet | 27
reply mention | 21
reply | 5
quote mention | 17
quote | 226
nodes | 251
edges -------------------------
Done.`> g <- actorNetwork %>% Graph()
Creating igraph network graph...Done.
The graph object g
prints as follows:
> g
-- 226 251 --
GRAPH 527682d DN+ attr: type (g/c), name (v/c), screen_name (v/c), status_id (e/c), created_at (e/c),
| edge_type (e/c)
+ edges from 527682d (vertex names):
1] 1362279599191691264->1116612139 43447495 ->43447495
[3] 518488471 ->518488471 353381552 ->353381552
[5] 576131356 ->576131356 28305154 ->3288075858
[7] 1310795233651601408->3079563404 3842652433 ->3842652433
[9] 3219321554 ->3219321554 1327357902424666112->29387813
[11] 1327357902424666112->1548253015 1327357902424666112->1327357902424666112
[13] 37891446 ->37891446 1296548400 ->1296548400
[+ ... omitted several edges
We now use igraph
to manipulate the network. The simplify(g)
function removes multiple edges and loops from the network. Then, we proceed to identify a subset of Twitter users (5), based on indegree, which we will later use in our bot status analysis.
Given this network was created using tweets that contain the #auspol hashtag, it is not surprising that the top 5 Twitter users based on indegree are four politicians and a political commentator:
1] "ScottMorrisonMP" "DanielAndrewsMP" "JoshFrydenberg" "GladysB" "bruce_haigh"``` [
Since we are going to access the Botometer
API via Python, first we need to print the Twitter handles we want to check (5) with Botometer
to a .csv
file.
write.csv(data.frame(user=V(g)$screen_name[order(degree(g, mode="in"), decreasing=TRUE)][1:5]), "top-5_auspol.csv", row.names=FALSE)
Botometer
in pythonWe are now going use the Botometer API to find the bot scores for the 5 Twitter accounts. We will use the python client Botometer-python provided by the Botometer
team.
To use the Botometer
API you need to be able to authenticate using Twitter developer app API keys (same keys you use for Dev Auth
approach to authenticating for Twitter collection via vosonSML
). You also need a free RapidAPI (previously Mashape) account with the Botometer Pro API enabled (the Basic plan is free).
Below is a test of the Botometer
python client v4 , using code from the Indiana University Network Science Institute GitHub page.
To access Botometer
, enter the following code in a python shell or script. The second step involves checking the bot status of a single Twitter account:
import botometer
= "xx"
rapidapi_key = {
twitter_app_auth 'consumer_key': "xx",
'consumer_secret': "xx",
'access_token': "xx",
'access_token_secret': "xx"
}
= botometer.Botometer(wait_on_ratelimit=True,
bom =rapidapi_key,
rapidapi_key**twitter_app_auth)
# Check a single account by screen name
= bom.check_account('@clayadavis')
result print(result)
The result of our test prints as follows:
{"cap": {
"english": 0.4197222421546159,
"universal": 0.6608500314332488
},"display_scores": {
"english": {
"astroturf": 0.2,
"fake_follower": 1.2,
"financial": 0.0,
"other": 0.3,
"overall": 0.4,
"self_declared": 0.2,
"spammer": 0.0
},"universal": {
"astroturf": 0.2,
"fake_follower": 0.9,
"financial": 0.0,
"other": 0.3,
"overall": 0.8,
"self_declared": 0.0,
"spammer": 0.1
}
},"raw_scores": {
"english": {
"astroturf": 0.04,
"fake_follower": 0.23,
"financial": 0.0,
"other": 0.06,
"overall": 0.08,
"self_declared": 0.05,
"spammer": 0.01
},"universal": {
"astroturf": 0.04,
"fake_follower": 0.18,
"financial": 0.0,
"other": 0.06,
"overall": 0.17,
"self_declared": 0.0,
"spammer": 0.02
}
},"user": {
"majority_lang": "en",
"user_data": {
"id_str": "11330",
"screen_name": "test_screen_name"
}
} }
The descriptions of elements in the response e.g. users
, raw scores
, etc., are specified in the GitHub page.
This step involves reading in the .csv file with the 5 Twitter handles into python and run them through the botometer
API:
import pandas as pd
= pd.read_csv("top-5_auspol.csv")
users print(users)
"user\n 0 ScottMorrisonMP\n 1 DanielAndrewsMP\n 2 JoshFrydenberg\n 3 GladysB\n 4 bruce_haigh\n") cat(
Now, we collect the botscores and print the Complete Automation Probability (CAP) to the csv file.
= {} #use this to save all botometer results to file
results_dict = [] #use this for writing botometer CAP to csv
cap for i in users.user:
#print(i)
= bom.check_account('@'+i)
result #print(result)
'cap']['english']])
cap.append([i, result[= result
results_dict[i]
#write CAP score to csv
= pd.DataFrame(cap, columns=["user", "cap"])
df #print(df)
"top-5_auspol_cap.csv")
df.to_csv(
#write results dictionary to file
import json
open("top-5_auspol_botometer_results.txt",'w'))
json.dump(results_dict, #can be read back in with
#d2 = json.load(open("top-5_auspol_botometer_results.txt"))
The Botometer
API provides scores as Complete Automation Probability (CAP), defined as the probability, according to Botometer
models, that an account with a certain score or greater is a bot. More information on how to interpret the scores is available here.
Finally, we can read the botometer
scores back into R and include them as a node attribute in the graph.
df2 <- read.csv("top-5_auspol_cap.csv")
df2
The dataframe looks like this:
> df2 <- read.csv("top-5_auspol_cap.csv")
> df2
X user cap1 0 ScottMorrisonMP 0.7384783
2 1 DanielAndrewsMP 0.7966467
3 2 JoshFrydenberg 0.4756770
4 3 GladysB 0.7966369
5 4 bruce_haigh 0.7874002
To add the bot scores as node attributes, we create a new node attribute “cap” and copy in the scores from the csv file.
The console output presenting nodes with the node attribute we just created is as follows:
1] "bruce_haigh" "DanielAndrewsMP" "JoshFrydenberg" "ScottMorrisonMP" "GladysB" [
To inspect the values, we run the following code:
1] 0.7874002 0.7966467 0.4756770 0.7384783 0.7966369 [
Botometer
is a joint project of the Observatory on Social Media (OSoMe) and the Network Science Institute (IUNI) at Indiana University, USA↩︎
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Ackland & Borquez (2021, Oct. 8). VOSON Lab Code Blog: Testing the bot status of users in Twitter networks collected via vosonSML. Retrieved from https://vosonlab.github.io/posts/2021-10-01-testing-bot-status-of-twitter-users/
BibTeX citation
@misc{ackland2021testing, author = {Ackland, Robert and Borquez, Francisca}, title = {VOSON Lab Code Blog: Testing the bot status of users in Twitter networks collected via vosonSML}, url = {https://vosonlab.github.io/posts/2021-10-01-testing-bot-status-of-twitter-users/}, year = {2021} }