Testing the bot status of users in Twitter networks collected via vosonSML

rstats python SNA vosonSML networks Botometer bot detection

We use vosonSML to collect Twitter data and then use Botometer to test the bot status of a subset of Twitter users.

Robert Ackland (VOSON Lab, School of Sociology, Australian National University, http://vosonlab.net/), Francisca Borquez
10-08-2021

Introduction

As social media platforms like Twitter become important spaces for information diffusion, discussion and opinion formation, serious concerns have been raised about the role of malicious socialbots in interfering with, manipulating and influencing communication and public opinion (Badawy, Ferrara, and Lerman 2018). Detecting socialbots and understanding the resulting dynamics of behaviour are relevant to researchers, and central to a research collaboration the VOSON Lab is involved in (see Cimiano et al. 2020).

In this post we will use vosonSML to collect data via the Twitter API and construct a network represented as an igraph graph object in R. We will then identify a subset of users, test their bot status using Botometer [1] (in Python), and include the bot scores as node attributes in the network graph in R.

Collecting the Twitter network using vosonSML in R

The vosonSML vignette (Ackland, Gertzel, and Borquez 2020) provides comprehensive instructions on how to use vosonSML. In this post, we focus on the essential steps for Twitter collection and network generation via vosonSML. The first step involves loading the vosonSML package into the R session and using the Web Auth approach to create a Twitter API access token:

library(magrittr)
library(vosonSML)

twitterAuth <-
   Authenticate(
      "twitter",
      appName = "An App",
      apiKey = "xxxxxxxxxxxx",
      apiSecret = "xxxxxxxxxxxx")

#Optionally, save the access token to disk:
saveRDS(twitterAuth, file = "twitter_auth.rds")

#The following loads into the current session a previously-created access token:
twitterAuth <- readRDS("twitter_auth.rds")

Then, we collect 100 tweets that contain the Australian politics hashtag #auspol. Because writeToFile = TRUE, the collected dataframe is also saved to disk as an .rds file.

twitterData <- twitterAuth %>%
   Collect(
      searchTerm = "#auspol",
      numTweets = 100,
      includeRetweets = FALSE,
      retryOnRateLimit = TRUE,
      writeToFile = TRUE)

To read the Twitter dataframe back from disk we use ImportData(), which sets the class values of the object so that it can be used with vosonSML:

twitterData <- ImportData("2021-09-30_182359-TwitterData.rds", "twitter")

And now we use the vosonSML functions Create("actor") to create an Actor network and Graph() to create an igraph graph object g.

The Create("actor") function generates a named list containing two dataframes named nodes and edges. In this Actor network nodes are users who have either tweeted using the search term #auspol, or else are mentioned or replied to in tweets featuring the search terms. Edges represent interactions between Twitter users, and an edge attribute indicates whether the interaction is a mention, reply, retweet, quoted retweet or self-loop.

actorNetwork <- twitterData %>% Create("actor", verbose=TRUE)
g <- actorNetwork %>% Graph()
g

The output in the console looks like this:

> actorNetwork <- twitterData %>% Create("actor", verbose=TRUE)
Generating twitter actor network...
-------------------------
collected tweets | 100
tweet mention    | 119
tweet            | 62
reply mention    | 27
reply            | 21
quote mention    | 5 
quote            | 17
nodes            | 226
edges            | 251
-------------------------
Done.
> g <- actorNetwork %>% Graph()
Creating igraph network graph...Done.

The graph object g prints as follows:

> g
GRAPH 527682d DN-- 226 251 -- 
+ attr: type (g/c), name (v/c), screen_name (v/c), status_id (e/c), created_at (e/c),
| edge_type (e/c)
+ edges from 527682d (vertex names):
 [1] 1362279599191691264->1116612139          43447495           ->43447495           
 [3] 518488471          ->518488471           353381552          ->353381552          
 [5] 576131356          ->576131356           28305154           ->3288075858         
 [7] 1310795233651601408->3079563404          3842652433         ->3842652433         
 [9] 3219321554         ->3219321554          1327357902424666112->29387813           
[11] 1327357902424666112->1548253015          1327357902424666112->1327357902424666112
[13] 37891446           ->37891446            1296548400         ->1296548400         
+ ... omitted several edges

We now use igraph to manipulate the network. The simplify(g) function removes multiple edges and loops from the network. We then identify the five Twitter users with the highest indegree, which we will later use in our bot status analysis.

library(igraph)

#remove multiple and loop edges
g <- simplify(g)

V(g)$screen_name[order(degree(g, mode="in"), decreasing=TRUE)][1:5]

Given this network was created using tweets that contain the #auspol hashtag, it is not surprising that the top 5 Twitter users based on indegree are four politicians and a political commentator:

[1] "ScottMorrisonMP" "DanielAndrewsMP" "JoshFrydenberg"  "GladysB"         "bruce_haigh"
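To make the ranking step concrete, here is a minimal Python sketch of the same idea: drop self-loops and duplicate edges, then count incoming edges per node. The edge list and the helper name top_by_indegree are hypothetical and only illustrate the logic; the post itself does this in R with igraph, and tie-breaking may differ from order() in R.

```python
from collections import Counter

# Hypothetical toy edge list (source, target); not the collected #auspol data.
edges = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("carol", "bob"),   # duplicate edge, e.g. two mentions of the same user
    ("dave", "dave"),   # self-loop
    ("alice", "carol"),
    ("dave", "bob"),
]


def top_by_indegree(edges, n=5):
    """Return the n nodes with the highest in-degree after removing
    self-loops and duplicate edges (the clean-up that simplify() performs)."""
    simple_edges = {(s, t) for s, t in edges if s != t}
    indegree = Counter(target for _, target in simple_edges)
    return indegree.most_common(n)


top2 = top_by_indegree(edges, n=2)
print(top2)  # [('bob', 3), ('carol', 1)]
```

Nodes with no incoming edges do not appear in the counter at all, which is fine when only the top-ranked users are needed.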

Since we are going to access the Botometer API via Python, we first need to write the five Twitter handles we want to check with Botometer to a .csv file.

top5 <- V(g)$screen_name[order(degree(g, mode="in"), decreasing=TRUE)][1:5]
write.csv(data.frame(user=top5), "top-5_auspol.csv", row.names=FALSE)

Finding the bot scores using Botometer in Python

Getting started with Botometer

We are now going to use the Botometer API to find the bot scores for the five Twitter accounts. We will use the Python client Botometer-python provided by the Botometer team.

To use the Botometer API you need to be able to authenticate using Twitter developer app API keys (the same keys you use for the Dev Auth approach to authenticating for Twitter collection via vosonSML). You also need a free RapidAPI (previously Mashape) account with the Botometer Pro API enabled (the Basic plan is free).

Below is a test of the Botometer Python client v4, using code from the Indiana University Network Science Institute GitHub page.

To access Botometer, enter the following code in a Python shell or script. The first step creates the Botometer client; the second checks the bot status of a single Twitter account:

import botometer

rapidapi_key = "xx"
twitter_app_auth = {
                    'consumer_key': "xx",
                    'consumer_secret': "xx",
                    'access_token': "xx",
                    'access_token_secret': "xx"
                   }

bom = botometer.Botometer(wait_on_ratelimit=True,
                          rapidapi_key=rapidapi_key,
                          **twitter_app_auth)

# Check a single account by screen name
result = bom.check_account('@clayadavis')
print(result)

The result of our test prints as follows:

{
    "cap": {
        "english": 0.4197222421546159,
        "universal": 0.6608500314332488
    },
    "display_scores": {
        "english": {
            "astroturf": 0.2,
            "fake_follower": 1.2,
            "financial": 0.0,
            "other": 0.3,
            "overall": 0.4,
            "self_declared": 0.2,
            "spammer": 0.0
        },
        "universal": {
            "astroturf": 0.2,
            "fake_follower": 0.9,
            "financial": 0.0,
            "other": 0.3,
            "overall": 0.8,
            "self_declared": 0.0,
            "spammer": 0.1
        }
    },
    "raw_scores": {
        "english": {
            "astroturf": 0.04,
            "fake_follower": 0.23,
            "financial": 0.0,
            "other": 0.06,
            "overall": 0.08,
            "self_declared": 0.05,
            "spammer": 0.01
        },
        "universal": {
            "astroturf": 0.04,
            "fake_follower": 0.18,
            "financial": 0.0,
            "other": 0.06,
            "overall": 0.17,
            "self_declared": 0.0,
            "spammer": 0.02
        }
    },
    "user": {
        "majority_lang": "en",
        "user_data": {
            "id_str": "11330",
            "screen_name": "test_screen_name"
        }
    }
}

Descriptions of the elements in the response (e.g. user, raw scores) are given on the GitHub page.
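Since the response is a nested dictionary, it can help to flatten the fields of interest into one row per account. The sketch below does this for an abridged copy of the sample response shown above; the helper name summarise_result and the choice of fields are illustrative assumptions, not part of the Botometer client.

```python
# Abridged copy of the sample response printed above.
sample_result = {
    "cap": {"english": 0.4197222421546159, "universal": 0.6608500314332488},
    "display_scores": {
        "english": {"overall": 0.4, "fake_follower": 1.2},
        "universal": {"overall": 0.8, "fake_follower": 0.9},
    },
    "user": {
        "majority_lang": "en",
        "user_data": {"id_str": "11330", "screen_name": "test_screen_name"},
    },
}


def summarise_result(result, model="english"):
    """Flatten one Botometer-style response into a single row (hypothetical helper)."""
    user = result["user"]
    return {
        "screen_name": user["user_data"]["screen_name"],
        "majority_lang": user["majority_lang"],
        "cap": result["cap"][model],
        "overall_display": result["display_scores"][model]["overall"],
    }


row = summarise_result(sample_result)
print(row)
```

Passing model="universal" selects the universal scores instead of the English ones.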

Analysing bot status with Botometer in Python

This step involves reading the .csv file with the five Twitter handles into Python and running them through the Botometer API:

import pandas as pd
users = pd.read_csv("top-5_auspol.csv")
print(users)
#prints:
#              user
#0  ScottMorrisonMP
#1  DanielAndrewsMP
#2   JoshFrydenberg
#3          GladysB
#4      bruce_haigh

Now we collect the bot scores and write the Complete Automation Probability (CAP) for each account to a .csv file.

results_dict = {}     #use this to save all botometer results to file
cap = []              #use this for writing botometer CAP to csv
for i in users.user:
   #print(i)
   result = bom.check_account('@'+i)
   #print(result)
   cap.append([i, result['cap']['english']])
   results_dict[i] = result

#write CAP score to csv
df = pd.DataFrame(cap, columns=["user", "cap"])
#print(df)
df.to_csv("top-5_auspol_cap.csv")

#write results dictionary to file (the with-block closes the file afterwards)
import json
with open("top-5_auspol_botometer_results.txt", "w") as f:
   json.dump(results_dict, f)
#can be read back in with
#with open("top-5_auspol_botometer_results.txt") as f:
#   d2 = json.load(f)
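The loop above stops at the first account that cannot be checked. A hedged variant is sketched below: it records failures instead of aborting. The function collect_cap and the stand-in fake_check_account are hypothetical; which exceptions the real client raises is not covered in this post, so catching Exception broadly is an assumption made for illustration.

```python
def collect_cap(check_account, handles, model="english"):
    """Query each handle and return (rows, errors).

    check_account is any callable that takes '@handle' and returns a
    Botometer-style dictionary, e.g. bom.check_account from the code above.
    """
    rows, errors = [], {}
    for handle in handles:
        try:
            result = check_account("@" + handle)
            rows.append([handle, result["cap"][model]])
        except Exception as exc:  # assumption: any failure is recorded, not re-raised
            errors[handle] = str(exc)
    return rows, errors


# Stand-in for the real client so the sketch runs without API keys.
def fake_check_account(handle):
    if handle == "@missing_user":
        raise ValueError("account not found")
    return {"cap": {"english": 0.5, "universal": 0.6}}


rows, errors = collect_cap(fake_check_account, ["some_user", "missing_user"])
print(rows)    # [['some_user', 0.5]]
print(errors)  # {'missing_user': 'account not found'}
```

With the real client, the call would be collect_cap(bom.check_account, users.user), and rows can be passed to pd.DataFrame as before.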

The Botometer API provides scores as Complete Automation Probability (CAP), defined as the probability, according to Botometer models, that an account with a given score or greater is a bot. The code above stores the English CAP (result['cap']['english']); the response also contains a universal CAP. More information on how to interpret the scores is available on the Botometer website.
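One simple way to work with CAP values is to screen them against a cut-off. The sketch below assumes a hypothetical threshold of 0.9 and made-up scores; neither the threshold nor the helper flag_accounts comes from Botometer, and an appropriate cut-off depends on the research question.

```python
# Hypothetical CAP values for illustration only; not scores for real accounts.
cap_scores = {"account_a": 0.97, "account_b": 0.42, "account_c": 0.91}


def flag_accounts(scores, threshold):
    """Return handles whose CAP is at or above the threshold, highest first."""
    flagged = [(user, cap) for user, cap in scores.items() if cap >= threshold]
    flagged.sort(key=lambda pair: pair[1], reverse=True)
    return [user for user, _ in flagged]


flagged = flag_accounts(cap_scores, threshold=0.9)
print(flagged)  # ['account_a', 'account_c']
```

A CAP above the cut-off is a prompt for closer inspection of the account rather than proof of automation.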

Bot scores as node attributes in the graph in R

Finally, we can read the Botometer scores back into R and include them as a node attribute in the graph.

df2 <- read.csv("top-5_auspol_cap.csv")
df2

The dataframe looks like this:

> df2 <- read.csv("top-5_auspol_cap.csv")
> df2
  X            user       cap
1 0 ScottMorrisonMP 0.7384783
2 1 DanielAndrewsMP 0.7966467
3 2  JoshFrydenberg 0.4756770
4 3         GladysB 0.7966369
5 4     bruce_haigh 0.7874002

To add the bot scores as node attributes, we create a new node attribute “cap” and copy in the scores from the csv file.

V(g)$cap <- NA
V(g)$cap[match(df2$user,V(g)$screen_name)] <- df2$cap
V(g)$screen_name[!is.na(V(g)$cap)]

The console output lists the nodes that now carry the cap attribute:

[1] "bruce_haigh"     "DanielAndrewsMP" "JoshFrydenberg"  "ScottMorrisonMP" "GladysB" 
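The match() assignment behaves like a lookup that leaves unscored nodes empty. The following Python sketch mirrors that behaviour with a hypothetical node list and score table; it is an analogy for readers more familiar with Python, not a replacement for the R code.

```python
# Hypothetical node names and scores; only some nodes have a score.
screen_names = ["user_a", "user_b", "user_c", "user_d"]
scores = {"user_c": 0.12, "user_a": 0.80}

# One entry per node, in node order; None plays the role of NA in R.
cap = [scores.get(name) for name in screen_names]
print(cap)  # [0.8, None, 0.12, None]

# Nodes that received a score, in node order (not in score-table order).
scored = [name for name, value in zip(screen_names, cap) if value is not None]
print(scored)  # ['user_a', 'user_c']
```

This is why the screen names printed above follow the order of the nodes in the graph rather than the order of the rows in the .csv file.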

To inspect the values, we run the following code:

V(g)$cap[!is.na(V(g)$cap)]
[1] 0.7874002 0.7966467 0.4756770 0.7384783 0.7966369

  1. Botometer is a joint project of the Observatory on Social Media (OSoMe) and the Network Science Institute (IUNI) at Indiana University, USA.

References

Ackland, R., B. Gertzel, and F. Borquez. 2020. Introduction to vosonSML. Canberra, Australia: VOSON Lab, Australian National University. https://cran.r-project.org/web/packages/vosonSML/vignettes/Intro-to-vosonSML.html.
Badawy, A., E. Ferrara, and K. Lerman. 2018. “Analyzing the Digital Traces of Political Manipulation: The 2016 Russian Interference Twitter Campaign.” In 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).
Cimiano, P., F. Muhle, O. Putz, B. Schiffhauer, E. Esposito, R. Ackland, U. Seelmeyer, and T. Veale. 2020. “Bots Building Bridges (3b): Theoretical, Empirical, and Technological Foundations for Systems That Monitor and Support Political Deliberation Online.” Volkswagen Foundation Grant in AI and the Society of the Future stream (2021-2025).
Rizoiu, M-A., T. Graham, R. Zhang, Y. Zhang, R. Ackland, and L. Xie. 2018. “#DebateNight: The Role and Influence of Socialbots on Twitter During the 1st U.S. Presidential Debate.” In International AAAI Conference on Web and Social Media.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Ackland & Borquez (2021, Oct. 8). VOSON Lab Code Blog: Testing the bot status of users in Twitter networks collected via vosonSML. Retrieved from https://vosonlab.github.io/posts/2021-10-01-testing-bot-status-of-twitter-users/

BibTeX citation

@misc{ackland2021testing,
  author = {Ackland, Robert and Borquez, Francisca},
  title = {VOSON Lab Code Blog: Testing the bot status of users in Twitter networks collected via vosonSML},
  url = {https://vosonlab.github.io/posts/2021-10-01-testing-bot-status-of-twitter-users/},
  year = {2021}
}