VOSON Lab Code Blog: Analysing online networks with VOSONDash

Francisca Borquez

This post introduces VOSONDash network analysis tools, which include network visualisation, network metrics, and text analysis. Users can analyse different networks including those collected with VOSONDash (Twitter, YouTube and Reddit), or import graphml files collected elsewhere.

Analysing online networks with VOSONDash is the first of a series of posts where we will cover VOSONDash features. Data collection with VOSONDash is covered in the following posts:

Twitter – Collecting Twitter data with VOSONDash
Reddit – Exploring issues in Reddit using VOSON Dash
YouTube – Collecting YouTube comments with VOSONDash

About VOSONDash

VOSONDash is an output of computational social methods research, designed to be a “Swiss Army knife” for studying online networks. The R/Shiny dashboard tool enables online data collection, and network and text analysis (including visualisation) within the same environment. VOSONDash builds on a number of R packages, in particular vosonSML for data collection and network generation, and igraph for network analysis. The package provides a graphical user interface which does not require users to have R programming skills and it is available on CRAN and GitHub. Bryan Gertzel is the lead developer and maintainer of VOSONDash.

Starting VOSONDash

The GitHub page provides instructions to install VOSONDash via R or Rstudio. Once the package is installed, run VOSONDash from the RStudio console entering the following code; VOSONDash will open in a web browser.

library(VOSONDash)
runVOSONDash()

Network data

To ease replication, in this example we will use the EnviroActivistsWebsite_2006 demo dataset which is provided in the package. The dataset is a hyperlink network collected with VOSON in 2006, as part of a research piece (Ackland and O’Neil 2011). The network has 161 nodes (websites representing environmental organisations) and 1,444 edges representing hyperlinks between these organisations. In this dataset, text data is stored as node attribute and categorical values are assigned depending on type of environmental organisations (Bios, Globals, and Toxics).

Network analysis using VOSONDash

There are three main approaches to analysing online networks with VOSONDash: Network graph, Network metrics (SNA), and Text analysis. More information on features can be accessed in the VOSONDash Userguide (Borquez et al. 2020).

Network graph

In Network graph provides two options to explore networks: network visualisation via igraph and visNetwork; and tabulations for nodes and edges. The Network graph pane provides the following options for manipulating the network:

Labels – to display or not labels.
Graph Filters – to display or not multiple edges, loops, and isolates.
Layout – to select graph layout and spread.
Node Size – to select node size by metric e.g. indegree and define size (multiplier).
Categorical filter – option available when data contains pre-set categorical values. New collections do not have that option.
Component filter – to display weak or strong components and define component range.
Neighbourhood select – to create subnetworks. It uses ego network terminology of order, where order 1 include ties between the alters.

Figure 1: VOSONDash network visualisation

Network metrics

Via the Network metrics pane, we can observe basic SNA metrics, including network level and node level metrics (e.g. centralisation). Network metrics reflect the applied filters for the visualisation; in this example we removed isolates (3 nodes), so network size is 158 and the Component distribution is 1 (one connected component). Degree distribution is only available for undirected networks; Indegree distribution and Outdegree distribution charts are available for directed networks, like this example. Accordingly, in this network, there are 15 nodes receiving one hyperlink, and three nodes receiving 35 hyperlinks. While 19 nodes link out to only one other site, there are two organisations in this network that link out to 50 sites.

Assortativity metrics (Homogeneity and Homophily indexes, including mixing matrix and population share) are presented for networks with categorical node attributes. In this example, we have selected the categorical attribute Type. The mixing matrix table presents links across the three types of organisations Globals, Bios and Toxics. The Bios and Globals sub-movements show a strong tendency towards linking to their own type. Population shares, Homogeneity indexes and Homophily indexes are presented by type. Controlling for group size, Globals are the group more biased towards its own type, where 53% of their ties to other Global organisations can be explained by homophily.

Text analysis

For a network with text data stored as either node or edge attribute, it is possible to conduct basic text analysis with VOSONDash. Text corpus can be pre-processed using Filters to:

remove common English words that are not relevant to the analysis, such as “and,” “the,” and “but” using Remove Standard Stopwords,
create own stopword list in User-Defined Stopwords,
reduce words to their stems with Apply word stemming,
remove URLs, numbers, or punctuation, based on user’s specifications.
Word lenght, if need to specify number of characters.
Advanced options provide HTML Decode and iconv UTF8, specially useful for social media as text often contains encoded characters.
For Twitter networks, two other options become available: Remove Twitter hashtags and Remove Twitter Usernames.

There are three methods available to visualise text:

Word frequency bar charts, where further parameters can be applied such as to define the number of results displayed, and frequency to define Minimum frequency, for the text to appear.
Word clouds where users can adjust Minimum frequency (how many times a word needs to have been used in order for it to feature in the visualisation); Maximum words to control for the number of words appearing in the graph; percentage of vertical words can be set for legibility; and random colours can be assigned to the visualisation. Comparison clouds are only available for datasets with categorical data, like this example where colour represents the node attribute type (Bios, Globals or Toxics).
The Sentiment analysis function uses the Syuzhet package and classifies words based on the NRC Emotion Lexicon, which is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive).

We hope this guide is useful and easy to follow.

Ackland, R., and M. O’Neil. 2011. “Online Collective Identity: TheCase of the Environmental Movement.” Social Networks 33: 177–90. https://doi.org/https://doi.org/10.1016/j.socnet.2011.03.001.

Borquez, F., B. Gertzel, X. Cai, and R. Ackland. 2020. VOSON Dashboard Userguide. Canberra, Australia: VOSON Lab, Australian National University. https://vosonlab.github.io/VOSONDashDocs/.

Analysing online networks with VOSONDash