VOSON Lab Code Blog: Exploring signed networks in Mastodon

Francisca Borquez; Bryan Gertzel; Robert Ackland

1. Introduction – Exploring Mastodon networks with R tools

Our previous post (Gertzel, 2023) introduced Mastodon – a decentralised microblogging platform – as well as code and methodological steps to collect Mastodon data using the rtoot R package (Schoch & Chan, 2023) and construct networks for analysis.

The present blog post uses rtoot to construct a signed network between Mastodon instances where a ‘friend’ tie indicates that an instance has nominated another instance as a peer while a ‘foe’ tie is where an instance blocks another instance. When an instance \(i\) blocks another instance \(j\) it means that content and users on instance \(j\) are not visible to users on instance \(i\).

While the present post uses rtoot directly, we have recently implemented Mastodon collection in vosonSMl and vosonDash, using the rtoot R package. Steps for collecting and generating different types of networks, following a similar framework as with the other data sources (Twitter, YouTube, Reddit, WWW) are available on the GitHub page.

This blog post summarises research that was presented at the Australian Social Network Analysis Conference ASNAC, in November 2023. Research design, methodology and analysis by Francisca Borquez V.; data collection and analysis code by Bryan Gertzel and Rob Ackland.

2. Exploring signed networks in Mastodon

Mastodon is a decentralised microblogging platform that uses an open-source network protocol called ActivityPub to communicate between server instances and users. The instances collectively form a network of interoperable servers known as the federated universe or ‘Fediverse’. Users join autonomous local communities (with their own rules, administration and moderation), typically established around a community or area of interest and intended to group similar users based on e.g. geographic location, language, views, interests, etc.

The Mastodon federated governance of instances results in local rules and moderation, wherein users and other instances can be ‘friended’ but also silenced or suspended. As instances administrators apply moderating rules (e.g. by nominating foes), they protect their own users to be exposed to potentially harmful timelines, content and users. The following example provides an approach to empirically observe online signed networks, i.e. networks containing both positive (friendly) and negative (antagonistic) ties, using Mastodon instance log data.

2.1 Identifying seed server instances

Mastodon provides an instance search engine, which is based on a database that gets crawled and updated on a daily basis. Users can define search parameters, for example by setting languages, descriptions, number of users, etc.

In this example, we are using 5 server instances relating to technology as starting points (seeds), we identified through the instance search engine. The seeds were qualitatively assessed and included in the sample if they:

were self-identified and ‘actually’ related to tech
were actively moderated and have terms of use in place
presented a list of moderated servers or instance blocks (foes)
had at least 1,000 users, who are involved and active

Instance	Number of users
social.veraciousnetwork.com	2,087
defcon.social	1,122
vmst.io	1,942
social.linux.pizza	1,675
gamestoot.de	1,798

Table 1: seed server instances related to tech and number of active users.

Data were collected in November 2023. We used the R package rtoot (Schoch & Chan, 2023) to programmatically access the list of ‘friends’ – federated servers, with the get_instance_peers function – and ‘foes’ – moderated servers, with the get_instance_blocks function. The following code conducts the data collection and saves the collected raw data as an RDS file, for later use.

# Code authored by Bryan Gertzel, VOSON Lab
# options
options(scipen = 999)
options(encoding = "UTF-8")

library(tidyverse)
library(rtoot)

#--------------------
#Collect the raw data
#--------------------

# read from csv. 'About' column contains server instances URLs. Trailing gets removed with second line.
seeds <- read_csv(file = "seeds.csv") |> pull("About") |> str_remove_all("https://|/about|/explore")

# get instance peer list (friends)
get_peers <- function(x) {
  tryCatch({
    peers <- get_instance_peers(x, anonymous = TRUE)
    tibble(peers.domain = peers) |> mutate(instance = x) |> relocate(instance)
  },
  error = function(e) {
    message(paste(x, "-", "get_peers", e))
    NULL
  })
}

# get instance blocks list
get_blocks <- function(x) {
  tryCatch({
    blocks <- get_instance_blocks(x, anonymous = TRUE)
    blocks |> rename_with(~ paste0("blocks.", .x)) |> mutate(instance = x) |> relocate(instance)
  },
  error = function(e) {
    message(paste(x, "-", "get_blocks", e))
    NULL
  })
}

# get friends and foes for seed instances in a dataframe
data <- map_dfr(seeds, function(x) bind_rows(get_peers(x), get_blocks(x)))

nrow(data)
#[1] 223032

#Save raw data
saveRDS(data, "data_23Nov.rds")

2.2 Constructing the full network and categorising nodes

The resulting dataframe contains 223,032 rows, including both peers and foes nominations. We then constructed a directed network, where nodes are server instances, and there are two types of tie: ‘friend’ i.e. where instance \(i\) regards instance \(j\) as a ‘peer’, and ‘foe’ i.e. where instance \(i\) blocks instance \(j\). Type of tie – friend or foe – was included in the network as edge attribute.

# Code authored by Bryan Gertzel and Rob Ackland, VOSON Lab
#Construct the networks
library(dplyr)
library(igraph)

data <- readRDS(paste0(data_dir, "data_23Nov.rds"))

nrow(data)

[1] 223032

# server relations (edge list)
relations <- data |>
  select(from = instance, to = peers.domain) |>
  filter(!is.na(to)) |>
  mutate(type = "friend") |>
  bind_rows(
    data |>
      select(from = instance, to = blocks.domain) |>
      filter(!is.na(to)) |>
      mutate(type = "foe")
  ) |>
  distinct()

g <- graph_from_data_frame(relations)

vcount(g)

[1] 124258

ecount(g)

[1] 223032

# Option for signed networks -- weighted ties 
#E(g)$weight <- ifelse(E(g)$type=="friend", 1, -1)

#identifying seed servers
V(g)$seed <- "no"
V(g)$seed[which(degree(g, mode="out")>0)] <- "yes"      

table(E(g)$type)


   foe friend 
   643 222389

The full network contains 124,258 nodes (server instances) and 223,032 ties, of which 222,389 are positive (‘friend’ or peer nominations) and 643 are negative (‘foe’ nominations).

Then, the nodes were classified as ‘friends’ , ‘foes’, ‘mixed’ and ‘neither’, according to the type of ties they receive:

Type	Classification	Number of nodes	Colour
Friend	if receive >=2 friend nominations and <2 foe nominations	28,036	Green
Foe	if receive <2 friend nominations and >=2 foe nominations	41	Red
Mixed	if receive >=2 friend nominations and >=2 foe nominations	52	Orange
Neither	Otherwise	96,129	White

Table 2: node classification according to type of tie.

The following code classifies the nodes and saves the igraph graph as a graphml file, for further analysis.

# Code authored by Rob Ackland, VOSON Lab

#-----------------------------
#Classify nodes: 
#if receive >=2 friend nominations and <2 foe nominations then "friend"
#if receive <2  friend nominations and >=2 foe nominations then "foe"
#if receive >=2 friend nominations and >=2 foe nominations then "mixed"
#otherwise: "neither"
#Note: this code takes several minutes to run

e_ind <- incident_edges(g, V(g), mode="in")

#e_ind[1]

f1 <- function(t){
 
    x1 <- table(e_ind[[t]]$type)
    isFriend <- 0
    if ("friend" %in% names(x1))
        if (x1[which(names(x1)=="friend")]>=2)
            isFriend <- 1
    isFoe <- 0
    if ("foe" %in% names(x1))
        if (x1[which(names(x1)=="foe")]>=2)
            isFoe <- 1
    
    if (t%%100==0)
        cat("finished:", t, "\n")

    type <- ifelse(isFriend & !isFoe, "friend",
                   ifelse(!isFriend & isFoe, "foe", 
                          ifelse(isFriend & isFoe, "mixed", "neither")))
}

#This takes several minutes
#L <- lapply(1000:2000, f1)                #testing
L <- lapply(1:length(e_ind), f1)
df <- do.call("rbind", L)

V(g)$type <- df

table(V(g)$type)
#foe  friend   mixed neither 
#41   28036      52   96129

#save graphml
write.graph(g, "g.graphml", format="graphml")

2.3 Constructing a subnetwork of negative ties only

Then, we constructed a subnetwork that includes ‘foe’ ties only. The nodes in this subnetwork encompass the seed instances which have sent negative ties and those instances that have received at least one ‘foe’ nomination. The resulting network has 497 nodes and 643 ‘foe’ edges. Isolates were removed. Nodes were colour coded based on the categorisation.

# Code authored by Rob Ackland, VOSON Lab

#Read full graph
g <- read.graph(paste0(data_dir,"g.graphml"),format="graphml")

#Construct network: (1) only foe ties and (2) remove isolates
#So this network only contains seeds and nodes that have received at least one foe nomination

g5 <- delete.edges(g, which(E(g)$type=="friend"))
g5 <- induced.subgraph(g5, which(degree(g5)>0))

#Colour nodes according to their status as seed/friend/foe/mixed
#So we find that there are nodes in this network that are coloured green: they are 
#classifiedas "friend" (more than 2 friendship nominations) but at least one seed has classified them as foe

V(g5)$color <- "white"
V(g5)$color[which(V(g5)$type=="friend")] <- "green"       #seed servers
V(g5)$color[which(V(g5)$type=="foe")] <- "red"       #seed servers
V(g5)$color[which(V(g5)$type=="mixed")] <- "orange"       #seed servers
V(g5)$color[which(V(g5)$seed=="yes")] <- "blue"       #seed servers

#png("foe_ties_only.png", width=600, height=600)
#plot(g5, vertex.size=3, vertex.label="", edge.arrow.size=0.3)
#dev.off()

write.graph(g5, "g5.graphml", format="graphml")    

table(V(g5)$type)


    foe  friend   mixed neither 
     41     237      52     167

#Let's just check that a node classified as mixed really is mixed
mixed <- which(V(g5)$type=="mixed")
#Example:
V(g5)$name[mixed[1]]

[1] "iddqd.social"

#Let's see the inbound ties to this node
e_ind <- incident_edges(g, V(g), mode="in")

e_ind[[which(V(g)$name=="iddqd.social")]]

+ 6/223032 edges from eebb30b (vertex names):
[1] furry.engineer             ->iddqd.social
[2] social.veraciousnetwork.com->iddqd.social
[3] vmst.io                    ->iddqd.social
[4] defcon.social              ->iddqd.social
[5] social.linux.pizza         ->iddqd.social
[6] gametoots.de               ->iddqd.social

#Let's see the type of tie
e_ind[[which(V(g)$name=="iddqd.social")]]$type

[1] "friend" "foe"    "foe"    "foe"    "foe"    "friend"

#So everything looks correct: this node is "mixed" as there are two friendship nominations
#and 4 foe nominations

The graphml file was then read into Gephi and the following visualisation was produced.

Figure 1: Mastodon network of server instances. Seed instances are coloured blue. Nodes (other than seeds) represent Mastodon instances which have received at least one foe nomination by a seed and are classified by colour: green – ‘friend’; orange – ‘mixed’: red – ‘foe’, and white – ‘neither’. Node size by indegree. Isolates were removed from the network. Edge reflect incident node colour. Network visualisation produced with Gephi.

3. Findings

Consistent with the literature (Everett & Borgatti, 2014; Stadtfeld et al., 2020), the network structure presents group boundaries around the seed servers, given that instance moderators are consciously separating themselves from those instances that have been identified as unequivocal ‘foes’(red) or ‘mixed’ (orange).

In their exploration of online signed networks, (Leskovec et al., 2010) sustained that positive ties are expected to produce clusters while negative ties tend to span positive clusters, which can be observed in this example. In this network, green nodes (friends) – those servers that have received only 1 negative nomination and 2 or more positive nominations by other seed instance - tend to cluster around the seed servers, while unequivocal ‘foe’ servers (red) plus ‘mixed’ servers (orange)- which have received at least 2 negative and positive nominations - are localised in a central area of the network, spanning the clusters.

4. Next Steps

The next steps of this research will involve the use of signnet (Schoch, 2023), an R package that provides methods to analyse signed networks, with special focus on structural balance using triads, and signed blockmodeling.

Another approach will involve to qualitatively assess seed servers’ moderating rules, based on what constitutes an ‘acceptable’ reason to block an instance. Moderating rules will be then categorised in scales, from ‘strict’ to ‘permissible’, and such values can be included in the network as node attributes. Similarly, we will assess the capacity of the instance administrator(s) to ‘defend their space’ as a network strategy to define boundaries, i.e. by identifying other variables that could explain clustering, for example, frequency of activity (active versus passive moderation), date of creation (established communities versus new entrants), etc.

Everett, M. G. & Borgatti, S. P. (2014). Networks containing negative ties. Social Networks, 38, 111–120. https://doi.org/https://doi.org/10.1016/j.socnet.2014.03.005

Gertzel, B. (2023). VOSON lab code blog: Mastodon conversation networks. https://vosonlab.github.io/posts/2023-07-27-mastodon-conversation-networks/

Leskovec, J., Huttenlocher, D. & Kleinberg, J. (2010). Signed networks in social media. https://arxiv.org/abs/1003.2424

Schoch, D. (2023). Signnet: An r package for analyzing signed networks. Journal of Open Source Software, 8(81), 4987. https://doi.org/10.21105/joss.04987

Schoch, D. & Chan, C. (2023). Software presentation: Rtoot: Collecting and analyzing mastodon data. Mobile Media & Communication, 20501579231176678. https://doi.org/10.1177/20501579231176678

Stadtfeld, C., Takács, K. & Vörös, A. (2020). The emergence and stability of groups in social networks. Social Networks, 60, 129–145. https://doi.org/https://doi.org/10.1016/j.socnet.2019.10.008

Exploring signed networks in Mastodon