Social Media Data Collection and Network Analysis with Netlytic and R

Anatoliy Gruzd

Ryerson University, Canada

gruzd@ryerson.ca | Twitter: @gruzd

Learning Objectives

By the end of this tutorial, you will learn how to:

collect social media data using Netlytic;
discover and visualize online communication networks using Netlytic and R;
create videos showing changes in online networks over time using R.


Preparation Steps

Create an online account with Netlytic (https://netlytic.org)
Install R (https://cran.r-project.org/)

In R, install the igraph package.
This tutorial was tested with R 3.2 and igraph 1.0.1.

Install RStudio (https://www.rstudio.com/products/rstudio/download/)
Install FFmpeg, software for video creation (http://ffmpeg.zeranoe.com/builds/)

For Mac Users:

Install or update Xcode (6.4) from the App Store
Follow the installation instructions starting from the “Install Homebrew” section at http://www.renevolution.com/ffmpeg/2013/03/16/how-to-install-ffmpeg-on-mac-os-x.html

Install VLC, a video player (https://www.videolan.org/vlc/index.html#download)
Create a new folder on your Desktop called “HKnet” where you will keep all of the files related to this workshop.

Within this folder, create the following subfolders: “img” and “net”
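If you prefer to create the folders from R instead of by hand, here is a minimal sketch. The helper names default_root and make_project are hypothetical, and the default location (your user folder plus Desktop/HKnet) is an assumption; adjust the path if your Desktop lives elsewhere.

```r
# Hypothetical helper: default project location. On Windows, R's "~" usually
# points to Documents, so the user profile folder is used there instead.
default_root <- function() {
  home <- if (.Platform$OS.type == "windows") Sys.getenv("USERPROFILE") else path.expand("~")
  file.path(home, "Desktop", "HKnet")
}

# Hypothetical helper: create the project folder with its "img" and "net" subfolders
make_project <- function(root = default_root()) {
  for (sub in c("img", "net")) {
    # recursive = TRUE also creates `root`; existing folders are left untouched
    dir.create(file.path(root, sub), recursive = TRUE, showWarnings = FALSE)
  }
  invisible(normalizePath(root))
}

# Usage (uncomment to run):
# project <- make_project()
# setwd(project)  # same effect as "Session" -> "Set Working Directory" in RStudio
```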

PART 1: Network Discovery in Netlytic

Data source: Twitter

Step 1.1: Connect your Twitter account to Netlytic

Note: Alternatively, you can download the “Hong Kong” dataset used in this tutorial as a CSV file from https://netlytic.org/home/wp-content/uploads/2015/12/HongKong.csv

Step 1.2: Go to https://twitter.com/search-advanced and create a test search query. For today’s tutorial, type “#HongKong” into the “Any of these words” field.

Optional: For a more inclusive search query, you may use the following:

#HongKong OR "Hong Kong" OR 香港 OR geocode:22.281216,114.158869,50km

Note: For more information on how to search by a specific location, see the following post: http://thoughtfaucet.com/search-twitter-by-location/examples/

Step 1.3: Once you are satisfied with the search results, go back to Netlytic, select the Twitter option under the “New Dataset” menu, copy and paste the search query from Step 1.2 (“#HongKong”) into the second text box (Twitter Search Terms), give the dataset any name (for example, “#HongKong Tutorial”), and click the “Import” button.

Note: If you downloaded the “#HongKong” dataset from the link provided in Step 1.1, then use the “Text file” tab instead of “Twitter” to import your data into Netlytic.

The Import command will retrieve the 1000 most recent tweets that match your search query. In our case, that is any Twitter message that mentions the “#HongKong” hashtag.

The next screen will confirm the number of messages that Twitter returned based on your chosen search query.

Step 1.4: Click “Next Step” to preview your dataset. This step is designed to confirm that your dataset was imported properly.

Note: Here you can select which fields to preview by clicking on the “Row Label Fields” drop-down menu.

Step 1.5: Go to the “4. Network Analysis” menu, find the “Name Network” section, and click the Analyze button, which shows the number of “Remaining Posts”.

Note: If you uploaded your dataset to Netlytic rather than importing it directly from Twitter, select “Twitter” as the dataset type from the drop-down menu before clicking on the Analyze button.

Step 1.6: Once the network is built, click on the “Visualize” button. The pop-up window will display the discovered network that represents “who mentions/replies/retweets whom”.

Note: Try changing the Layout, Node Size and Colors options in the left-side menu. Discuss how these changes help or hinder the interpretation of the network.

Step 1.7: Review some of the most connected members in the network, as indicated by the larger node size, and then read some of the messages exchanged among them and other Twitter users to understand the formation of connections in this network.

To access individual tweets, click on the node/person in question and then click on any of the connecting nodes/names listed in the left pane.

Step 1.8: Using the Notes feature (described below), annotate two different clusters/areas in the network that are indicated by different colors.

To annotate information about various clusters or individuals in the network visualization, use the yellow “Sticky Notes” feature. To activate this feature, click the yellow box containing a plus sign in the bottom right-hand corner of the network visualization window.

Note: You can add “sticky notes” at different zoom levels in the network visualization. To navigate between zoom levels, use the “sticky notes” bookmarks that appear in the lower right-hand corner of the network visualization screen, along with the zoom level associated with each set of notes (e.g., 0%, 25%, 50%, etc.).

Hint: To capture a snapshot of your network (and any sticky notes about it), click on the “Save Image” button in the left pane. You can only save and publicly share up to three snapshots at a time in the system. To keep additional snapshots, save them to your computer first and then delete them from the system to make room for new ones. For example, you might want additional snapshots to show or document something interesting about the interactions within a particular cluster of users in the network.

PART 2: Visualization of Dynamic Networks in R

Step 2.1: In Netlytic, under “Network analysis” -> “Name networks” click “Export” -> click “Edgelist” (the last icon). This will prompt you to save a *.CSV network file to your computer. Please save this CSV file to the project folder “HKnet” in the “net” subfolder.

Alternatively, you can download the network file from the following link: https://netlytic.org/home/wp-content/uploads/2015/12/net_HongKong.csv

Step 2.2: Open RStudio and install the R package igraph via the main menu: “Tools” -> “Install Packages...” -> type “igraph”.
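The same check can be done from the R console. This sketch uses only base R (requireNamespace and packageVersion) and deliberately does not install anything itself; the remark about deprecated function names is a hedge for igraph versions newer than the 1.0.1 this tutorial was tested with.

```r
# Console alternative to "Tools" -> "Install Packages...": report whether igraph
# is installed and, if not, print the command that installs it from CRAN.
has_igraph <- requireNamespace("igraph", quietly = TRUE)

if (has_igraph) {
  # Tested version: 1.0.1. Newer versions may warn that some older function
  # names used in this tutorial (e.g. graph.edgelist) are deprecated.
  print(packageVersion("igraph"))
} else {
  message('igraph is not installed; run install.packages("igraph") to install it')
}
```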

Step 2.3: In RStudio, create a new (blank) script via the “File” menu->”New File”->”R Script”, and save it to the project folder “HKnet” via the “File” menu->”Save”. Let’s call it “myFirstNetViz.R”.


Step 2.4: In RStudio, copy and paste the following script into the editor window of the newly saved script. This script will (1) open the CSV network file that you exported from Netlytic earlier and (2) create and display a basic network visualization.

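library(igraph)

# Read the edge list exported from Netlytic (one row per edge)
edges <- read.table("net/your_filename.csv", header=TRUE, sep=",")

# Build a directed graph: column 2 is the source and column 3 the target of each edge
g <- graph.edgelist(as.matrix(edges[,c(2,3)]), directed=TRUE)

# Compute node positions with the graphopt force-directed layout
layout.old <- layout.graphopt(g)

# Draw the network with a widescreen (9/16) aspect ratio
plot(g, layout=layout.old,
     vertex.frame.color=V(g)$color,
     edge.width=1.5,
     asp=9/16)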

Note: Replace “your_filename” with the actual network file name that you saved from Netlytic.
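The script relies only on column positions in the exported file: columns 2 and 3 become the source and target of each edge, and Step 2.7 later reads column 4 as the timestamp. The miniature CSV below is made up for illustration (its header names are assumptions, not necessarily Netlytic's); it shows the two-column matrix that graph.edgelist() receives.

```r
# Made-up miniature of an exported edge list; only the column positions matter
csv_text <- "id,from,to,time
1,alice,bob,1.5
2,bob,carol,3.0
3,carol,alice,4.5
4,alice,carol,6.0"

edges <- read.table(text = csv_text, header = TRUE, sep = ",",
                    stringsAsFactors = FALSE)

# Columns 2 and 3 -> two-column character matrix, one row per directed edge
el <- as.matrix(edges[, c(2, 3)])

# Column 4 -> per-edge timestamps (used in Step 2.7)
times <- edges[, 4]

print(el)
print(times)
```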

Before executing this script, set the working directory to the project folder “HKnet” via the menu “Session” -> “Set Working Directory” -> “Choose Directory”.

To execute the script, either click “Run” for every line or highlight the whole script and click “Run”.


Step 2.5: Next, we will update the previous script so that node size reflects the number of connections (degree centrality, on a logarithmic scale) and labels are removed from nodes with 10 or fewer connections (an arbitrary threshold). We will also add a title to the graph and set the size of the edge arrows. Below is the updated script; the new arguments start at vertex.size:

library(igraph)

edges <- read.table("net/your_filename.csv", header=TRUE, sep=",")
g <- graph.edgelist(as.matrix(edges[,c(2,3)]), directed=TRUE)
layout.old <- layout.graphopt(g)

plot(g, layout=layout.old,
     vertex.frame.color=V(g)$color,
     edge.width=1.5,
     asp=9/16,
     # New: node size grows logarithmically with the number of connections
     vertex.size=1 + 1.5*log(graph.strength(g)),
     # New: label only nodes with more than 10 connections
     vertex.label=ifelse(degree(g)>10, V(g)$name, NA),
     vertex.label.color="black",
     vertex.label.font=1,
     vertex.label.cex=2,
     # New: smaller arrowheads and a title for the plot
     edge.arrow.size=0.1,
     main="Dynamic Network Visualization")
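To see what the two new node-level rules compute, here is a base-R sketch on a small made-up edge list. Without an edge weight attribute, graph.strength() gives the same count as degree() (edges in either direction), so table() over both endpoint columns reproduces it; the label threshold is lowered from 10 to 2 only because the toy graph is tiny.

```r
# Made-up directed edge list (sender -> receiver); not from the HongKong data
from <- c("alice", "alice", "bob",   "carol", "alice", "dave")
to   <- c("bob",   "carol", "carol", "alice", "dave",  "alice")

# Connections per node counted over both columns (in-degree + out-degree),
# which is what degree(g) and, without weights, graph.strength(g) return
tab <- table(c(from, to))
deg <- setNames(as.vector(tab), names(tab))

# Same size formula as vertex.size in Step 2.5
size <- 1 + 1.5 * log(deg)

# Same labelling rule, with the threshold lowered from 10 to 2 for this toy graph
label <- ifelse(deg > 2, names(deg), NA)

print(data.frame(degree = deg, size = round(size, 2), label = label))
```

Because log(1) is 0, a node with a single connection gets size 1, and log(0) is -Inf in R, so the formula is only meaningful for nodes with at least one edge.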


Step 2.6: Next we will add two more lines to the above code to save the network image as a PNG file instead of displaying it on the screen. After running the following script, please open the “img” folder to confirm that “test.png” is there.

library(igraph)

edges <- read.table("net/your_filename.csv", header=TRUE, sep=",")
g <- graph.edgelist(as.matrix(edges[,c(2,3)]), directed=TRUE)
layout.old <- layout.graphopt(g)

# New: send the plot to a 1600x900 PNG file instead of the screen
png(filename="img/test.png", width=1600, height=900, bg="#F1F1F5")

plot(g, layout=layout.old,
     vertex.frame.color=V(g)$color,
     edge.width=1.5,
     asp=9/16,
     vertex.size=1 + 1.5*log(graph.strength(g)),
     vertex.label=ifelse(degree(g)>10, V(g)$name, NA),
     vertex.label.color="black",
     vertex.label.font=1,
     vertex.label.cex=2,
     edge.arrow.size=0.5,
     main="Dynamic Network Visualization")

# New: close the PNG device, which writes img/test.png to disk
dev.off()
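Step 2.6 relies on pairing png() with dev.off(). If plot() fails halfway, the PNG device stays open and later plots are drawn into the file instead of on the screen. A small wrapper using on.exit() avoids that. The function name save_png is hypothetical, and the stand-in plot(1:10) replaces the igraph plot only for illustration.

```r
# Hypothetical helper wrapping the png()/dev.off() pair from Step 2.6.
# on.exit() closes the PNG device even if the plotting code throws an error.
save_png <- function(file, plot_fun, width = 1600, height = 900, bg = "#F1F1F5") {
  png(filename = file, width = width, height = height, bg = bg)
  on.exit(dev.off(), add = TRUE)
  plot_fun()
  invisible(file)
}

# Usage with a stand-in plot; in the tutorial, plot_fun would call plot(g, ...)
if (capabilities("png")) {
  out <- save_png(file.path(tempdir(), "test.png"), function() plot(1:10))
  print(file.exists(out))
}
```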

Step 2.7: The following script (adapted from Moro’s code) extends the previous one. Each edge in the network file has a timestamp showing when the edge was created. The script uses this information to generate multiple snapshots of the network, each representing the network at a different point in time.

Let’s review this script together line by line:

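library(igraph)

edges <- read.table("net/your_filename.csv", header=TRUE, sep=",")
g <- graph.edgelist(as.matrix(edges[,c(2,3)]), directed=TRUE)

# Store each edge's timestamp (column 4) as an edge attribute
E(g)$time <- edges[,4]

# Remove self-loops but keep repeated edges between the same pair of nodes
g <- simplify(g, remove.multiple = FALSE, remove.loops = TRUE)

# Initial layout: only edges created before the first time step get weight 1
step <- 3
E(g)$weight <- ifelse(E(g)$time < step, 1, 0)
layout.old <- layout.graphopt(g, niter=100, spring.length=E(g)$weight)

# Each plot() call below writes a new numbered file: img/net001.png, img/net002.png, ...
png(filename="img/net%03d.png", width=1600, height=900, bg="#F1F1F5")

total_time <- max(E(g)$time)   # timestamp of the most recent edge
delta <- 0.5                   # time increment between consecutive snapshots

for(step in seq(3, total_time, delta)){
  # Edges created before the current step are shown in gray with weight 1;
  # later edges get weight 0 and a fully transparent color
  E(g)$weight <- ifelse(E(g)$time < step, 1, 0)
  E(g)$color <- ifelse(E(g)$time < step, "gray", rgb(0,0,0,0))

  # Hide nodes without any visible edges (graph.strength uses the weight attribute)
  V(g)$color <- ifelse(graph.strength(g)==0, rgb(0,0,0,0), "#3476A8")

  # Start from the previous positions and cap node movement for a smooth animation
  layout.new <- layout.graphopt(g, niter=10, start=layout.old,
                                spring.length=E(g)$weight, max.sa.movement=1)

  plot(g, layout=layout.new,
       vertex.frame.color=V(g)$color,
       edge.width=1.5,
       asp=9/16,
       vertex.size=1 + 1.5*log(graph.strength(g)),
       vertex.label=ifelse(degree(g)>10, V(g)$name, NA),
       vertex.label.color="black",
       vertex.label.font=1,
       vertex.label.cex=2,
       edge.arrow.size=0.5,
       main="Dynamic Network Visualization")

  # Carry the positions over to the next snapshot
  layout.old <- layout.new
}

# Close the PNG device so the last image is written
dev.off()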

After running this script, check that the “img” folder contains multiple PNG files.
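To know how many files to expect, the sketch below mimics the loop's rule with made-up timestamps: one snapshot per value of seq(3, total_time, delta), with an edge drawn only if its time is strictly less than the current step. It also lists the names that png("img/net%03d.png") produces (numbered from 001, one file per plot() call). The helper check_frames is hypothetical.

```r
# Made-up timestamps standing in for column 4 of the edge list
time  <- c(1.0, 3.2, 4.0, 5.5, 7.0)
delta <- 0.5
steps <- seq(3, max(time), delta)   # same grid as the loop: one frame per step

# Number of edges drawn at each step (time < step -> weight 1, gray)
visible <- sapply(steps, function(step) sum(time < step))
print(data.frame(step = steps, visible_edges = visible))

# File names the PNG device should produce, one per snapshot
expected <- sprintf("net%03d.png", seq_along(steps))
print(expected)

# Hypothetical helper: TRUE if `dir` holds exactly net001.png ... net<n>.png
check_frames <- function(dir, n) {
  found <- list.files(dir, pattern = "^net[0-9]{3}\\.png$")
  identical(sort(found), sprintf("net%03d.png", seq_len(n)))
}

# Usage from the project folder after Step 2.7:
# check_frames("img", length(seq(3, max(E(g)$time), 0.5)))
```

With the strict time < step comparison, the edge stamped exactly at the maximum time (7.0 here) is never drawn, so visible_edges tops out at 4 of 5. If you want the latest edges to appear, one option (not part of the original script) is to loop over seq(3, total_time + delta, delta). Also note that seq() stops with an error if total_time is smaller than the starting value 3.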

Step 2.8: The final step is to run the FFmpeg program from the command line to generate a video by “stitching” all of the PNG network images together.

First, open a command window for FFmpeg:

In Windows, find and run (double-click) the “ff-prompt.bat” file.
On a Mac, open the Terminal by searching for “terminal” in Spotlight.

In the command prompt or Terminal window that opens, enter the following commands.

Mac users: once the Terminal is open, you can type ffmpeg and press Enter to check that FFmpeg is installed, then enter the commands listed below.

Command #1: Go to the project folder:

cd path_to_the_project_folder

For example, on Windows:

cd C:\Users\agruz_000\Desktop\HKnet

On a Mac, if the folder is on your Desktop:

cd ~/Desktop/HKnet

Command #2: Run the video creation program:

ffmpeg -r 10 -i img/net%03d.png -b:v 20M dynamic_network.mp4
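The same command can also be launched from R with system2(). This is a sketch that assumes ffmpeg is on your PATH and that R's working directory is the project folder (the equivalent of Command #1); the helper names ffmpeg_args and make_video are hypothetical. The flags are exactly those of Command #2: -r 10 reads the PNG frames at 10 frames per second, -i gives the numbered input pattern, and -b:v 20M sets the video bitrate.

```r
# Hypothetical helper: build the argument vector of Command #2
ffmpeg_args <- function(fps = 10, pattern = "img/net%03d.png",
                        bitrate = "20M", out = "dynamic_network.mp4") {
  c("-r", fps,        # input frame rate: snapshots shown per second of video
    "-i", pattern,    # numbered PNG files written in Step 2.7
    "-b:v", bitrate,  # target video bitrate
    out)              # output video file
}

# Hypothetical helper: run ffmpeg if it can be found on the PATH
make_video <- function(...) {
  if (!nzchar(Sys.which("ffmpeg"))) stop("ffmpeg not found on the PATH")
  system2("ffmpeg", args = ffmpeg_args(...))
}

# Show the command line that make_video() would run
print(paste("ffmpeg", paste(ffmpeg_args(), collapse = " ")))
```

At 10 frames per second, 150 snapshots make a 15-second video. If the MP4 plays in VLC but not in some other players, adding -pix_fmt yuv420p before the output file name is a commonly used fix.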

Step 2.9: Go to the project folder and open dynamic_network.mp4 using the VLC video player.

Here are some examples of resulting network animations:

https://youtu.be/qPxIjxIZzDs

https://youtu.be/iWYqqlzh5wc