CSL 709 - Network Science - Homepage

CSL 709 - Network Science

(Fall 2015)

IIT Ropar

Prof. S. R. S. Iyengar

Class timings

Wednesday (2:25PM - 3:15PM)

Thursday (3:20PM - 4:10PM)

Friday (4:15PM - 5:05PM)

Lab timings

Friday 10:50AM - 12:35PM

Venue: Main Building, L4

Mailing list:Click here

Details of the Classes Conducted

Date and time | Topics covered | Cumulative hours | |

1 | 29th July 02:25 PM -03:15 PM | Course structure and Overview | 1 |

2 | 30th July 03:20 PM - 04:10 PM | Introduction | 2 |

3 | 31st July 04:15 PM - 05:05 PM | Discussion of the first three chapters of Linked | 3 |

4 | 5th August 02:25 PM - 03:15 PM | Small world phenomenon, hubs and connectors | 4 |

5 | 6th August 03:20 PM - 04:10 PM | Granovetter experiment and Strength of weak ties | 5 |

6 | 7th August 04:15 PM - 05:05 PM | Degree and clustering coefficient of nodes in real world networks. | 6 |

7 | 12th August 02:25 PM -03:15 PM | Quiz 1 | 7 |

8 | 13th August 03:20 PM - 04:10 PM | Brief introduction to the centrality measures of nodes in a network | 8 |

9 | 14th August 04:15 PM - 05:05 PM 06:00 PM - 07:00 PM | Centrality measures continued, Katz centrality Page ranking activity | 9 10 |

10 | 19th August 02:25 PM - 03:15 PM | Google page ranking algorithm along with the mathematics behind it. | 11 |

11 | 20th August 03:20 PM - 04:10 PM | Google page ranking continued - the random walk ranking | 12 |

12 | 21st August 04:15 PM - 05:05 PM | Dunbar’s number, strong and weak ties, triadic closure property, local bridges, relation of local bridges and weak ties; | 13 |

13 | 26th August 02:25 PM - 03:15 PM | Quiz 2 | 14 |

14 | 27th August 03:20 PM - 04:10 PM | Quick recap of chapter 3, empirical results to prove Granovetter’s predictions, strength of ties on Facebook and Twitter, embeddedness. | 15 |

15 | 2nd September 02:25 PM - 03:15 PM | Structural holes, Girvan–Newman algorithm for community detection. References to take a look at: - Brandes, Ulrik. "A faster algorithm for betweenness centrality*." Journal of Mathematical Sociology 25.2 (2001): 163-177.
- Girvan, Michelle, and Mark EJ Newman. "Community structure in social and biological networks." Proceedings of the national academy of sciences 99.12 (2002): 7821-7826.
| 16 |

16 | 3rd September 10:00 AM - 12:00 noon | Social contagion; benefits of their study; Homophily; Selection vs Social influence; affiliation networks. | 17-18 |

17 | 9th September 02:25 PM - 03:15 PM | Focal closure; Membership closure; Analysis and quantification of above using various datasets; | 19 |

18 | 10th September 03:20 PM - 04:10 PM | Fat men hypothesis and cascading behaviour, spatial segregation model (schelling model) and the interpretation | 20 |

19 | 11th September 04:15PM - 05:05PM | Chapter 13- internet of things, WWW, Bow-tie structure, Web 2.0 | 21 |

20 | 23rd September 04:15PM - 5:05PM | Discussed the various techniques to maximize the node pairs with inverted centralities. | 22 |

21 | 29th September 10:50AM - 12:35PM | Hubs and authorities; Bonacich centrality; principle of iterative improvement | 24 |

22 | 30th September 02:25 PM - 03:15 PM | Quiz; Human network navigation and the results discussed | 25 |

23 | 1st October 03:20 PM - 04:10 PM | Discussion on phylogenetics network and the study on co-authorship network of researchers in the same field | 26 |

24 | 8th October 03:20 PM - 04:10 PM | Continued discussion on the Word-Morph game | 27 |

25 | 14th October 02:25 PM - 03:15 PM | Quiz 4 , Discussion on the emergence of scaling in Random graphs - Coin tossing example and reason for exponential decrease about the mean; Expected degree distribution of the Web Graph and its comparison with reality- polynomial decay | 28 |

26 | 15th October 03:20PM - 04:10 PM | Lab Activity - building the dataset for looking at the induced network on the users of Wikipedia | 29 |

21st October 02:25 PM - 03:15 PM | Small world phenomenon - Milgram’s experiment and its implications; Approach of decentralized search - the greedy algorithm and its implication on grid network; myopic search; design of teleportation points | 30 | |

27 | 28th October 02:25 PM - 03:15 PM | Discussion and idea presentation on “Understanding human navigation using network analysis” | 31 |

28 | 29th October 03:20PM - 04:10 PM | Discussion and idea presentation on “Understanding Spreading Patterns on Social Networks Based on Network Topology” | 32 |

29 | 30th October 10:50AM - 12:35PM | Continued discussion on “Understanding Spreading Patterns on Social Networks Based on Network Topology” Discussion and idea presentation on “Deanonymizing Social Networks” Discussion and presentation on “Shifting Behaviour of Users: Towards understanding the Fundamental Law of Social Networks” | 35 |

30 | 4th November 02:25 PM - 03:15 PM | Discussion on Pseudo-Cores: The Terminus of an Intelligent Viral Meme's Trajectory/ Understanding Spreading Patterns on Social Networks Based on Network Topology) | 36 |

31 | 5th November 03:20PM - 04:10 PM | Discussion on - ”On estimating Average Degree” | 37 |

32 | 6th November 09:00AM - 01:00PM 04:15PM -05:15PM | Continued discussion on - ”On estimating Average Degree” Discussion on “Skillset distribution for Accelerated knowledge building in crowdsourced environments” Project Presentations by all students | 42 |

33 | 18th November |

Details of the Labs Conducted

Lab 1

31st July, 2015 10:50AM - 12:35PM

Introduction to python packages of networkx.

- compl_discon(G): This function returns 0 if the graph is connected and it returns the diameter of the complement of G, if the graph is disconnected. Note: Complement of a disconnected graph is always a connected graph.

- random_graph(n,p): Returns G(n,p), where G(n,p) is called an Erdos-Renyi random graph with n nodes, where each edge is included in the graph with probability p independent from every other edge.

The above programs must not use built in functions available in networkx.

Lab 2

7th August, 2015 10:50AM - 12:35PM

Of the various concepts of Network Science discussed in class, you are required to pick a topic of your choice and implement it on python and state your observations.

Lab 3

14th August, 2015 10:50AM - 12:35PM

- Write a function Grid(n) to generate a grid network of size n x n. Find the betweenness centrality and the closeness centrality of nodes in the network. Report whether the betweenness centrality and the closeness centrality of the nodes match.

- Take a real world network from SNAP and mark the importance of nodes in the network by taking a random walk on the network and report your observations.

Lab 4

21st August, 2015 10:50AM - 12:35PM

- Generate the graph of the students who participated pagerank activity.

- Generate the ranking of the students based on the Google page ranking algorithm with the help of the adjacency matrix A of the underlying graph. This is done by multiplying the matrix [A+I] k with the vector [1n]
- Generate the ranking by taking a random walk on the graph as discussed in class.

Compare the ranking of nodes in the graph based on the two techniques.

Lab 5 : Take home Test - 1

- Let the betweenness centrality ranking of a node ‘v’ be given by CB(v) and the closeness centrality ranking by CC(v). Write a function that takes input as a graph ‘G’ and find the number of unordered node pairs (u,v) such that [ ( CC(u)-CC(v) ) (CB(u)-CB(v) ) ] < 0

count_inv_pairs() : function to count the node pairs that have inverted rankings based on the two centrality measures considered.

Input : file.txt that contains the edge list of a graph G

Output : N is the number of pairs that respect the condition specified

- Let S be the set of all those node pairs as mentioned above. That is,

S = { (u,v) | u,v ∊ G(V) and [ ( CC(u)-CC(v) ) (CB(u)-CB(v) ) ] < 0 }

Construct a graph on 100 nodes G so as to maximize |S|.

genG() : function that generates a graph on 100 nodes which maximize S.

Output : file.txt that contains the edge list of the graph generated.

Please include the output file also in the folder along with the code and a readme file that gives the details of the strategy used to generate the underlying graph.

Lab 6

28th August, 2015 10:50AM - 12:35PM

- “Threshold function” for a G(n,p) graph is that value of p (seen as a function of n: p(n)) below which a particular graph property ceases to exist while, above the value ‘p’ the property always exists.

Write a function that determines the threshold in G(n,p) for a giant component to exist. [Please note: you are to determine the threshold ‘p’ as a function of n]

Output: a plot that shows the threshold ‘p’ for increasing values of ‘n’.

Lab 7

4th September, 2015 10:50AM - 12:35PM

- Download any network that is available on SNAP.

Apply the Girvan Newman algorithm on the network to identify the communities in the given network.

Display the original network as well as the network with communities identified.

Also, return a list of lists such that each inner list contains the edge list of a particular community.

Lab 8: Lab Test - 1 (in house)

11th September, 2015 10:50AM - 12:35PM

Q1) Implement the below algorithm:

- In the first round, all nodes with one or less edges are removed from the graph, and all their edges are deleted. all the deleted nodes are considered to be put into bucket 1.
- In the second round, all nodes with two edges or less are removed from the graph and their edges deleted. The nodes removed in this round would now belong to bucket 2.
- Similarly, at the i-th iteration all nodes with less than 'i' neighbours are removed from the graph and put into i-th bucket. Note that removal of a node with k or fewer neighbours is done recursively. If removal (with the k links) exposes a new site with now less than k neighbours it is removed in the current iteration as well.

Input: a given graph G

Output: (k,L)

A list of lists L where the 1st inner list has nodes belonging to bucket 1, and so on. k is the total number of buckets.

Include a read-me file that has a complete analysis of the algorithm. Please ensure to mention what this algorithm is achieving, where can it be used and any other inferences that can be made.

Q2) As observed in one of the previous lab assignments, with increase in the edge probability 'p' of an ER-Graph, there is a special 'p=p0' at which a giant component emerges in the graph G(n,p). Similarly, there is a special 'p=p1' at which triangles appear in G(n,p). Find this threshold function empirically for the existence of triangles.

Output: plot the relation of p1 and n.

Include a read-me file where you mention the exact function for the threshold.

Lab 9

23rd September, 2015 10:50AM - 12:35PM

Propose a model for the spread of any contagion. Include a readme file that contains the following:

- Description of the model
- Assumptions regarding the spread
- A few observations regarding the model considered.

Input: G(V,E) given in the form of an adjacency list

Output: Show the stepwise process of spread of contagion by displaying infected nodes with a different colour across different time steps.

Lab 10: Take home Test - 2

3 October 2015

(a)Write a function genScore() that takes as input - k, G; where k is the number of iterations that need to be performed and G is a graph that is given in the form of edge list given in file “input.txt”. The function must output the hub score (H) and the authority score (A) for all nodes after performing k iterations as a list. You can output the result in the terminal itself.

(b)Given a directed graph G(V,E) we can compute the hub score(H) and the authority score(A) for each node. Generate a graph G(V,E) having |V|=30 nodes such that:

For all nodes i ∊ V, Res= Σi |Hi - Ai| is the maximum.

Any assumptions made must be listed in the readme-file.

Output: Res

Lab 11 : Lab Test - 2 (in house)

10 October 2015, 2:00 PM - 04:00 PM

Given a set of authorities and hubs, we saw how to generate the hub score and the authority score using the principle of iterated improvements. Generate a graph G(V,E) such that the overall ranking of a hub based on the hub score improves in the kth iteration, where k is a parameter that is to be taken as the input.

Input: k iteration at which the ranking must improve

Output: The constructed graph G(V,E). The list of hubs and the list of authorities.

The hub and authority scores for all nodes up to the kth iteration. The improvement in the ranking (i.e difference between the previous and current ranking)

Lab 12 :

15th October 2015, 03:20 PM - 04:10 PM

Creating entries in the QWiki to see if Q/A is essential for knowledge building repositories and observing the network dynamics of the same.

Lab 13 :

16 October 2015, 10:50AM - 12:35PM

Community based random walk model to detect if two nodes belong to the same community on not based on the intersection of the number of random walkers that have commonly visited the nodes under consideration.

Lab 14 : Lab Test - 3 (in house)

9 November 2015 , 08:40 PM - 11:15 PM

Ideal Teleportations

Q1. Generate an mXn grid network. Add k more edges to the network such that the average distance in the graph is minimized. No inbuilt function must be used.

Input:

m,n: dimensions of the grid

k: the number of extra edges allowed to be added

Output:

Avg Grid : average distance between nodes in the graph

Avg Tlps : average distance of nodes after adding extra k edges

Q2. Given an S(n,k) scale free network using preferential attachment model. Add k' more edges to be network such that the average distance in the graph is minimized. No inbuilt function must be used.

Input:

n,k: n number of nodes, k edges per incoming node

k': the number of extra edges allowed to be added

Output:

Avg Grid : average distance between nodes in the graph

Avg Tlps : average distance of nodes after adding extra k edges

Lab 15 : Take home 3

1. Create a Synthetic Scale Free Network

A synthetic scale free network with n nodes and k links per node, is gener-

ated the following way:

1. Create a new graph with k vertices {v1, v2, . . . , vk} with all the nodes adjacent to each other (a.k.a a k-clique)

2. for i = k + 1 to n do the following:

{

A new vertex v_i is introduced in the graph which picks k existing vertices randomly (with distribution proportionate to the degree of the vertices) and creates a link to each of these k vertices.

}

Write a python function scale free(n,k), which creates a graph G as per the algorithm given above, e.g., G =scale free(1000, 10) should return a

graph G with n = 1000 and k = 10.

Note: Do not use the built-in function to generate a scale-free network. You must write a code based on the above algorithm.

2. Missing Edges

The Big Question: Assume we remove a few edges from such a graph, is

it possible for us to find out the missing edges?

Given that you know the number of edges that are removed from the

Graph (let us call this ∆), can you guess the missing edges?

Write a python function reconstruct graph(G,∆) which takes as input, the graph G with ∆ number of missing links and reconstructs the original graph.

Course Overview

Description

This is a dual level course for 3rd/4th year B.Tech CSE students and is an advanced course assuming Data structures and Discrete Mathematics as pre-requisites. You may consider talking to the following students who have credited this course, to get more details:

1. Yayati Gupta (PhD student, CSL 709 2013 spring semester)

2. Varsha Bhat (PhD student, CSL 709 2013 spring semester)

3. Jaspal Singh (B.Tech student, CSL 709 2015 Spring Semester)

4. Uzair Khan (B.Tech student, CSL 709 2015 Summer semester)

5. Shubham Kumar (B.tech student, CSL 709 2015 Summer semester)

I strongly recommend that you watch the video in the below link for a great introduction to the subject:

https://www.youtube.com/watch?v=2rzxAyY7D7k

The syllabus will comprise of the following:

- Random Graph Theory and its applications
- Scale free networks
- Small world networks
- Search on networks
- Centrality measures
- Clustering coefficient and other network parameters
- Spreading models and Virality
- Community detection techniques
- Introduction to Social networks

The course will be divided into 4 quarters:

Quarter I (The Motivation) | The first quarter will comprise of reading of a couple of popular science books in this subject followed by a few talks. We will cover the basics of Network science. |

Quarter II (The Nitty-Gritties) | The second quarter will comprise of lectures on a few important topics in the subject. |

Quarter III (The State of the Art) | The third quarter will involve reading and understanding a bunch (around 10) of research papers. |

Quarter IV (Specialization) | The fourth quarter will involve discussion/brainstorming of research ideas on the papers read in the previous quarter followed by activity reports. |

Textbook

There isn't a prescribed text book for the course, we will be referring to quite a few books:

1. Networks: An Introduction by Mark Newman

2. Networks, Crowds and Markets by J. Kleinberg

3. Network Science Book by Albert-laszlo Barabasi

4. Understanding Social Networks: Theories, Concepts, and Findings by Charles Kadushin

5. Social and Economic Networks by Matthew Jackson 6. Connected by Nicholas A. Christakis

7. Linked by Albert-laszlo Barabasi 8. Six Degrees: The Science of a Connected Age by Duncan J. Watts

Course Details

Teaching Assistant

This course has one formal and several informal (volunteered) teaching assistants and student mentors :

Varsha Bhat (varsha.bhat@iitrpr.ac.in)

Office hours: (to be announced soon), Research Scholar's lab, Room #120

Informal teaching assistants and student mentors:

Akrati Saxena (akrati.saxena@iitrpr.ac.in)

Yayati Gupta (yayati.gupta@iitrpr.ac.in)

Anamika Chhabra (anamika.chhabra@gmail.com)

E-mail Policy

I request that you talk to me and the instructor during the office hours or immediately after the class. The mailing list for the course is csl709@iitrpr.ac.in, you must ensure that you are subscribed to this mailing list. In case of any questions, you are requested to post them on the group and not email us personally, (unless there is something way too personal). Under no circumstances should you call/message the TA/instructor over phone. Please avoid contacting us over phone/email, requesting for the postponing of the class or extension of deadline or any similar requests. In case there is any request, you may please post it on the mailing list : csl709@iitrpr.ac.in

Grading

The course comprises of 3 lecture hours and 2 hours of lab. Grading will be based on :

programming assignments, quizzes, activities, minor, major, attendance.

Grading Criteria

The grading will be relative, based on the following criteria:

[85,100] | A |

[80,85) | A- |

[70,80) | B |

[65-70) | B- |

[55-65) | C |

[50-55) | C- |

[35-50) | D |

[0-35) | F |

Distribution of Marks

Programming exam | 25% |

Quizzes | 15% |

Activity | 20% |

Attendance | 05% |

Minor | 10% |

Major | 25% |

Programming Lab

There will be regular lab sessions where exercise problems will be handled. There will be six lab tests that will be graded. Three will be take home and the remaining three will be in house. The best of two from both the sets will be considered for grading. The considered four tests will be evaluated for 25M.

Quiz

We will have regular quiz sessions to test your understanding of the concepts taught in the previous classes. If you are attentive in the class, make notes and read them once before the class, you will do very well and will most certainly secure full marks. The number of quizzes isn't pre-decided, your marks will be normalized and will comprise of 15% of the course weightage. Each quiz will have a boolean marking scheme where you are awarded full marks if you have made a sincere attempt and obtained majority of the answers correct.

Activity

Activity will comprise of the following parts:

- Research idea presentation on a selected paper which is discussed in the third quarter [10M]
- Submission of a project, approved by professor, on any Network Science topic of your interest [10M]

Attendance

Attendance for the course is compulsory. If you secure less than 75% attendance, you will be given an F grade, which is as per the institute rule. Attendance carries 0-5 marks, details of which is given in the table below:

[95,100] | 5 |

[90,95) | 4 |

[85,90) | 3 |

[80,85) | 2 |

[75,80) | 1 |

[0,75) | F |

I assume you all have undergone a course in real analysis and know what one means by a closed[] and open interval() :-)

Minor and Major

There will be two exams in the course, minor and major. The syllabus for the minor exam will comprise of the topics discussed in the first two quarters. The major exam would comprise of all the topics covered throughout the semester. However, the emphasis would be on topics covered in the last two quarters.

Academic Integrity

We are overly and overtly serious when it comes to academic integrity. We have 0 tolerance towards students who cheat. Action against those who cheat will involve not less than assigning them an F grade followed by immediate expulsion from the course. Over and above this, the case will be reported to the disciplinary committee which may decide to suspend the student for a semester or two. To repeat, you must work alone on homework problems and not discuss them with anyone. The only materials you can refer when working on your problem sets are your course notes and the assigned textbook. You may not refer to online sources or talk to others. When you are in doubt, ask the instructor or TA what is allowed and what isn't. Just in case you have overseen/suspect someone copying in the quiz/exam/problem sets, dutifully report it to us, we will scrutinize their scripts and will initiate action against them if they are found guilty. We assure that your identity will be kept anonymous. We request that you report immediately (even if you are partially suspicious of any such malpractice amongst your classmates.)

Anonymous Feedback/Suggestion

Click here to submit an anonymous feedback/suggestion to the instructor/TA. Please be constructive in your criticism. Do it with the objective of helping us give our best to you. If you are highlighting on any course related problem/issue, be it teaching, evaluation, grading, etc., try proposing how would you want us to handle it otherwise.