CSL 709 - Network Science

(Fall 2015)

IIT Ropar

Prof. S. R. S. Iyengar

Class timings

Wednesday (2:25PM - 3:15PM)

Thursday (3:20PM - 4:10PM)

Friday (4:15PM - 5:05PM)

Lab timings

Friday 10:50AM - 12:35PM

Venue: Main Building, L4

Details of the Classes Conducted

Details of the Labs Conducted

Lab 1

31st July, 2015        10:50AM - 12:35PM

Introduction to python packages of networkx.

1. compl_discon(G): This function returns 0 if the graph is connected and it returns the diameter of the complement of G, if the graph is disconnected.                        Note: Complement of a disconnected graph is always a connected graph.

1. random_graph(n,p): Returns G(n,p), where G(n,p) is called an Erdos-Renyi   random graph with n nodes, where each edge is included in the graph with probability p independent from every other edge.

The above programs must not use built in functions available in networkx.

Lab 2

7th August, 2015        10:50AM - 12:35PM

Of the various concepts of Network Science discussed in class, you are required to pick a topic of your choice and implement it on python and state your observations.

Lab 3

14th August, 2015        10:50AM - 12:35PM

1. Write a function Grid(n) to generate a grid network of size n x n. Find the betweenness centrality and the closeness centrality of nodes in the network. Report whether the betweenness centrality and the closeness centrality of the nodes match.

1. Take a real world network from SNAP and mark the importance of nodes in the network by taking a random walk on the network and report your observations.

Lab 4

21st August, 2015        10:50AM - 12:35PM

1. Generate the graph of the students who participated pagerank activity.
1. Generate the ranking of the students based on the Google page ranking algorithm with the help of the adjacency matrix A  of the underlying graph. This is done by multiplying the matrix [A+I] k with the vector [1n]
2. Generate the ranking by taking a random walk on the graph as discussed in class.

Compare the ranking of nodes in the graph based on the two techniques.

Lab 5 : Take home Test - 1

1. Let the betweenness centrality ranking of a node ‘v’ be given by CB(v) and the closeness centrality ranking by CC(v).  Write a function that takes input  as a graph ‘G’ and find the number of unordered node pairs (u,v) such that                                    [  ( CC(u)-CC(v) ) (CB(u)-CB(v) )  ] < 0

count_inv_pairs() : function to count the node pairs that have inverted rankings based on the two centrality measures considered.

Input : file.txt that contains the edge list of a graph G

Output : N is the number of pairs that respect the condition specified

1. Let S be the set of all those node pairs as mentioned above. That is,

S = { (u,v) | u,v ∊ G(V) and [  ( CC(u)-CC(v) ) (CB(u)-CB(v) )  ] < 0 }

Construct a graph on 100 nodes G so as to maximize |S|.

genG() : function that generates a graph on 100 nodes which maximize S.

Output : file.txt  that contains the edge list of the graph generated.

Please include the output file also in the folder along with the code and a readme file that gives the details of the strategy used to generate the underlying graph.

Lab 6

28th August, 2015        10:50AM - 12:35PM

1. Threshold function” for a G(n,p) graph is that value of p (seen as a function of n: p(n)) below which a particular graph property ceases to exist while, above the value ‘p’ the property always exists.

Write a function that determines the threshold in G(n,p) for a giant component to exist. [Please note: you are to determine the threshold ‘p’ as a function of n]

Output: a plot that shows the threshold ‘p’ for increasing values of ‘n’.

Lab 7

4th September, 2015   10:50AM - 12:35PM

Apply the Girvan Newman algorithm on the network to identify the communities in the given network.

Display the original network as well as the network with communities identified.

Also, return a list of lists such that each inner list contains the edge list of a particular community.

Lab 8:  Lab Test - 1 (in house)

11th September, 2015   10:50AM - 12:35PM

Q1) Implement the below algorithm:

• In the first round, all nodes with one or less edges are removed from the graph, and all their edges are deleted. all the deleted nodes are considered to be put into bucket 1.
• In the second round, all nodes with two edges or less are removed from the graph and their edges deleted. The nodes removed in this round would now belong to bucket 2.
• Similarly, at the i-th iteration all nodes with less than 'i' neighbours are removed from the graph and put into i-th bucket. Note that removal of a node with k or fewer neighbours is done recursively. If removal (with the k links) exposes a new site with now less than k neighbours it is removed in the current iteration as well.

Input: a given graph G

Output: (k,L)

A list of lists L where the 1st inner list has nodes belonging to bucket 1, and so on. k is the total number of buckets.

Include a read-me file that has a complete analysis of the algorithm. Please ensure to mention what this algorithm is achieving, where can it be used and any other inferences that can be made.

Q2) As observed in one of the previous lab assignments, with increase in the edge probability 'p' of an ER-Graph, there is a special 'p=p0' at which a giant component emerges in the graph G(n,p). Similarly, there is a special 'p=p1' at which triangles appear in G(n,p). Find this threshold function empirically for the existence of triangles.

Output: plot the relation of p1 and n.

Include a read-me file where you mention the exact function for the threshold.

Lab 9

23rd September, 2015   10:50AM - 12:35PM

Propose a model for the spread of any contagion. Include a readme file that contains the following:

1. Description of the model
3. A few observations regarding the model considered.

Input: G(V,E) given in the form of an adjacency list

Output: Show the stepwise process of spread of contagion by displaying infected nodes with a different colour across different time steps.

Lab 10: Take home Test - 2

3 October 2015

(a)Write a function genScore() that takes as input - k, G; where k is the number of iterations that need to be performed and G is a graph that is given in the form of edge list given in file “input.txt”. The function must output the hub score (H) and the authority score (A) for all nodes after performing k iterations as a list. You can output the result in the terminal itself.

(b)Given a directed graph G(V,E) we can compute the hub score(H) and the authority score(A) for each node. Generate a graph G(V,E) having |V|=30 nodes such that:

For all nodes i ∊ V, Res= Σi |Hi - Ai| is the maximum.

Output: Res

Lab 11 :  Lab Test - 2 (in house)

10 October 2015, 2:00 PM - 04:00 PM

Given a set of authorities and hubs, we saw how to generate the hub score and the authority score using the principle of iterated improvements. Generate a graph G(V,E) such that the overall ranking of a hub based on the hub score improves in the kth iteration, where k is a parameter that is to be taken as the input.

Input: k ­ iteration at which the ranking must improve

Output: The constructed graph G(V,E). The list of hubs and the list of authorities.

The hub and authority scores for all nodes up to the kth iteration. The improvement in the ranking (i.e difference between the previous and current ranking)

Lab 12 :

15th October 2015, 03:20 PM - 04:10 PM

Creating entries in the QWiki to see if Q/A is essential for knowledge building repositories and observing the network dynamics of the same.

Lab 13 :

16 October 2015, 10:50AM - 12:35PM

Community based random walk model to detect if two nodes belong to the same community on not based on the intersection of the number of random walkers that have commonly visited the nodes under consideration.

Lab 14 : Lab Test - 3 (in house)

9 November 2015 , 08:40 PM - 11:15 PM

Ideal Teleportations

Q1. Generate an mXn grid network. Add k more edges to the network such that the average distance in the graph is minimized. No inbuilt function must be used.

Input:

m,n: dimensions of the grid

k: the number of extra edges allowed to be added

Output:

Avg Grid : average distance between nodes in the graph

Avg Tlps : average distance of nodes after adding extra k edges

Q2. Given an S(n,k) scale free network using preferential attachment model. Add k' more edges to be network such that the average distance in the graph is minimized. No inbuilt function must be used.

Input:

n,k: n number of nodes, k edges per incoming node

k': the number of extra edges allowed to be added

Output:

Avg Grid : average distance between nodes in the graph

Avg Tlps : average distance of nodes after adding extra k edges

Lab 15 : Take home 3

1. Create a Synthetic Scale Free Network

A synthetic scale free network with n nodes and k links per node, is gener-

ated the following way:

1. Create a new graph with k vertices {v1, v2, . . . , vk} with all the nodes adjacent to each other (a.k.a a k-clique)

2. for i = k + 1 to n do the following:

{

A new vertex v_i is introduced in the graph which picks k existing vertices randomly (with distribution proportionate to the degree of the vertices) and creates a link to each of these k vertices.

}

Write a python function scale free(n,k), which creates a graph G as per the algorithm given above, e.g., G =scale free(1000, 10) should return a

graph G with n = 1000 and k = 10.

Note: Do not use the built-in function to generate a scale-free network. You must write a code based on the above algorithm.

2. Missing Edges

The Big Question: Assume we remove a few edges from such a graph, is

it possible for us to find out the missing edges?

Given that you know the number of edges that are removed from the

Graph (let us call this ∆), can you guess the missing edges?

Write a python function reconstruct graph(G,∆) which takes as input, the graph G with ∆ number of missing links and reconstructs the original graph.

Course Overview

Description

This is a dual level course for 3rd/4th year B.Tech CSE students and is an advanced course assuming Data structures and Discrete Mathematics as pre-requisites. You may consider talking to the following students who have credited this course, to get more details:

1. Yayati Gupta (PhD student, CSL 709 2013 spring semester)

2. Varsha Bhat (PhD student, CSL 709 2013 spring semester)

3. Jaspal Singh (B.Tech student, CSL 709 2015 Spring Semester)

4. Uzair Khan (B.Tech student, CSL 709 2015 Summer semester)

5. Shubham Kumar (B.tech student, CSL 709 2015 Summer semester)

I strongly recommend that you watch the video in the below link for a great introduction to the subject:

The syllabus will comprise of the following:

• Random Graph Theory and its applications
• Scale free networks
• Small world networks
• Search on networks
• Centrality measures
• Clustering coefficient and other network parameters
• Community detection techniques
• Introduction to Social networks

The course will be divided into 4 quarters:

 Quarter I (The Motivation) The first quarter will comprise of reading of a couple of popular science books in this subject followed by a few talks. We will cover the basics of Network science. Quarter II (The Nitty-Gritties) The second quarter will comprise of lectures on a few important topics in the subject. Quarter III (The State of the Art) The third quarter will involve reading and understanding a bunch (around 10) of research papers. Quarter IV (Specialization) The fourth quarter will involve discussion/brainstorming of research ideas on the papers read in the previous quarter followed by activity reports.

Textbook

There isn't a prescribed text book for the course, we will be referring to quite a few books:

1. Networks: An Introduction by Mark Newman

2. Networks, Crowds and Markets by J. Kleinberg

3. Network Science Book by Albert-laszlo Barabasi

4. Understanding Social Networks: Theories, Concepts, and Findings by Charles Kadushin

5. Social and Economic Networks by Matthew Jackson 6. Connected by Nicholas A. Christakis

7. Linked by Albert-laszlo Barabasi 8. Six Degrees: The Science of a Connected Age by Duncan J. Watts

Course Details

Teaching Assistant

This course has one formal and several informal (volunteered) teaching assistants and student mentors :

Varsha Bhat (varsha.bhat@iitrpr.ac.in)

Office hours: (to be announced soon), Research Scholar's lab, Room #120

Informal teaching assistants and student mentors:

Akrati Saxena (akrati.saxena@iitrpr.ac.in)

Yayati Gupta (yayati.gupta@iitrpr.ac.in)

Anamika Chhabra (anamika.chhabra@gmail.com)

E-mail Policy

I request that you talk to me and the instructor during the office hours or immediately after the class. The mailing list for the course is csl709@iitrpr.ac.in, you must ensure that you are subscribed to this mailing list. In case of any questions, you are requested to post them on the group and not email us personally, (unless there is something way too personal). Under no circumstances should you call/message the TA/instructor over phone. Please avoid contacting us over phone/email, requesting for the postponing of the class or extension of deadline or any similar requests. In case there is any request, you may please post it on the mailing list : csl709@iitrpr.ac.in

The course comprises of 3 lecture hours and 2 hours of lab. Grading will be based on :

programming assignments, quizzes, activities, minor, major, attendance.

The grading will be relative, based on the following criteria:

 [85,100] A [80,85) A- [70,80) B [65-70) B- [55-65) C [50-55) C- [35-50) D [0-35) F

Distribution of Marks

 Programming exam 25% Quizzes 15% Activity 20% Attendance 05% Minor 10% Major 25%

Programming Lab

There will be regular lab sessions where exercise problems will be handled. There will be six lab tests that will be graded. Three will be take home and the remaining three will be in house. The best of two from both the sets will be considered for grading. The considered four tests will be evaluated for 25M.

Quiz

We will have regular quiz sessions to test your understanding of the concepts taught in the previous classes. If you are attentive in the class, make notes and read them once before the class, you will do very well and will most certainly secure full marks. The number of quizzes isn't pre-decided, your marks will be normalized and will comprise of 15% of the course weightage. Each quiz will have a boolean marking scheme where you are awarded full marks if you have made a sincere attempt and obtained majority of the answers correct.

Activity

Activity will comprise of the following parts:

• Research idea presentation on a selected paper which is discussed in the third quarter [10M]
• Submission of a project, approved by professor, on any Network Science topic of your interest [10M]

Attendance

Attendance for the course is compulsory. If you secure less than 75% attendance, you will be given an F grade, which is as per the institute rule. Attendance carries 0-5 marks, details of which is given in the table below:

 [95,100] 5 [90,95) 4 [85,90) 3 [80,85) 2 [75,80) 1 [0,75) F

I assume you all have undergone a course in real analysis and know what one means by a closed[] and open interval() :-)

Minor and Major

There will be two exams in the course, minor and major. The syllabus for the minor exam will comprise of the topics discussed in the first two quarters. The major exam would comprise of all the topics covered throughout the semester. However, the emphasis would be on topics covered in the last two quarters.