1 of 131

KliqueFinder: Identifying Clusters in Network Data


2 of 131


KliqueFinder (install latest version in C:\kliqfind)

3 of 131

Overview


4 of 131

Visualization of Close Collegial Ties and Best Practices in the Regional Ravine Network (time 1)

  • Black Lines represent close colleagues at time 1
  • Colors represent clusters identified by KliqueFinder using the close colleague data at time 1.
  • Evidence of clusters: p < .01
  • Size of dot represents extent of use of climate change in ravine management

BEST PRACTICES

  • Yellow ring indicates member of the Regional Advisory Group (RAG) -- Intervention

Structural hole

5 of 131

Blue lines represent close-colleague ties newly formed at time 2. Black lines represent close colleague nominations present at time 1. Nodes with a yellow ring represent Regional Advisory Group members targeted for the intervention.

Visualization of Close Collegial Ties and Best Practices in the Regional Ravine Network (time 2)

6 of 131

Measure of use of Climate Change in Practice

7 of 131

Questions about Networks

8 of 131

Why Look for Clusters and Graphical Representations

  • Understand pattern of interactions in the whole social space
    • How are ties or relations organized?
  • Locate actors in clusters/subgroups which shape identity within the larger social system
    • What are the foci of subgroups, how are actors influenced by subgroup members?
  • Locate resources and potential flows through the whole network
    • Which actors and subgroups provide resources, lead innovations?


9 of 131

Goal: Identify Patterns in the Network

  • Rearrange rows and columns of social network matrix to reveal clustering
  • Plot actors and ties in two dimensions to reveal clustering


10 of 131

Theory for defining cluster membership

  • cohesion (clusters are called subgroups): an actor should be in a cluster if the actor has demonstrated a preference for engaging in ties with members of the cluster.
    • Result: ties are concentrated within subgroups
  • structural equivalence (blocks): an actor should be in a cluster if the actor engages in a similar pattern of ties as members of that cluster.
    • Result: blocks represent positions, but ties not necessarily concentrated within blocks.


11 of 131

Crystallized Sociogram: Friendships Among the French Financial Elite

Lines indicate friendships: solid within subgroups, dotted between subgroups.

Numbers represent actors.

Rgt, Cen, Soc, Non = political parties; B = Banker; T = Treasury; E = École Nationale d'Administration

Frank, K.A. & Yasumoto, J. (1998). "Linking Action to Social Structure within a System: Social Capital Within and Between Subgroups." American Journal of Sociology, Volume 104, No 3, pages 642-686


12 of 131

Crystallized Sociogram: Clusters in Foodwebs

Krause, A., Frank, K.A., Mason, D.M., Ulanowicz, R.E. and Taylor, W.M. (2003). "Compartments exposed in food-web structure." Nature 426:282-285


13 of 131

Data Input


Old format: 10 spaces for each entry. New format: flexible columns, same results.

File name must be fewer than 20 characters. Best if the file name is six characters followed by .list (xxxxxx.list), for example stanne.list

IDs should be 6 digits or fewer

Actor 1 interacts with actor 2 at a level of 3

Extent of relation can be binary or weighted

Prepping data in Excel

Prepping data in UCINET

Converting data using SAS

14 of 131

Data

Edgelist

Actor 1 interacts with actor 2 at a level of 3

Extent of relation can be binary or weighted

First two rows do not appear in the data; I put them there to show the format: 10 spaces for each entry

New version of KliqueFinder is more flexible: about 10 column widths

Best if the file name is six characters followed by .list (xxxxxx.list), for example stanne.list

IDs should be 6 digits or fewer

Prepping data in Excel

Prepping data in UCINET

Converting data using SAS

15 of 131

Steps for Finding Clusters

1) Determine criterion for defining clusters

2) Maximize criterion

3) Examine evidence of clusters

4) Evaluate performance of the algorithm

5) Interpret clusters
    • commonality of attributes
    • focal experiences
    • subsequent behavior

16 of 131

Step 1) Criteria for Determining Group Membership

Structural Equivalence:
  • Factor analyze sociomatrix (Katz & Kahn)
  • Iteratively rearrange and revalue rows and columns (CONCOR -- White et al., 1976)

Cohesion:
  • Utilize fixed criteria (e.g., must be connected to at least k others in the cluster, or must be a minimal path length from k others, etc.)
  • Use flexible criterion -- preference relative to cluster sizes and number of ties:

17 of 131

Model Based Cohesion

W_ii' = 1 if there is a tie between actors i and i', 0 otherwise

samecluster_ii' = 1 if actors i and i' are members of the same cluster, 0 otherwise

Then θ1, the coefficient on samecluster_ii', represents cluster salience

So: maximize θ1 (odds ratio)
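The definitions above suggest a logistic model of the following form. This is a sketch under the assumption of a standard logit link with an intercept θ0 (θ0 is not named on the slide). Under this 0/1 coding θ1 would be the log of the odds ratio, whereas the sampling distribution output later labels θ1 as log odds/2, so the software's scaling may differ.

```latex
\log\left(\frac{P[W_{ii'} = 1]}{P[W_{ii'} = 0]}\right)
  = \theta_0 + \theta_1 \,\mathrm{samecluster}_{ii'}
```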

18 of 131

Odds Ratio for Association Between Common Cluster Membership and Relation Between Actors

                          Tie or relation occurring
Cluster membership        No        Yes
Different cluster         A         B
Same cluster              C         D

Odds ratio = (AD)/(BC) = (Absence of relation outside of cluster * Presence of relation within cluster) / (Presence of relation outside of cluster * Absence of relation within cluster)
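The cell counts and the odds ratio can be illustrated with a short Python sketch. The function names and the toy network are hypothetical and only follow the A, B, C, D definitions above; this is not KliqueFinder's own code.

```python
def count_cells(ties, cluster):
    """Sort every ordered pair of distinct actors into cells A, B, C, D.

    ties:    set of (sender, receiver) pairs where a relation is present
    cluster: dict mapping each actor to a cluster label
    """
    cells = {"A": 0, "B": 0, "C": 0, "D": 0}
    for i in cluster:
        for j in cluster:
            if i == j:
                continue
            same = cluster[i] == cluster[j]
            tie = (i, j) in ties
            if same:
                cells["D" if tie else "C"] += 1   # within cluster
            else:
                cells["B" if tie else "A"] += 1   # outside cluster
    return cells


def odds_ratio(cells):
    """(A * D) / (B * C); undefined when B or C is zero."""
    return (cells["A"] * cells["D"]) / (cells["B"] * cells["C"])


# Hypothetical toy network: two groups of three actors and one bridging tie
ties = {(1, 2), (2, 1), (1, 3), (3, 1), (2, 3),
        (4, 5), (5, 4), (5, 6), (6, 5), (4, 6),
        (3, 4)}
cluster = {1: "X", 2: "X", 3: "X", 4: "Y", 5: "Y", 6: "Y"}

cells = count_cells(ties, cluster)
print(cells, odds_ratio(cells))
```

Ties concentrated within clusters push D up and B down, which raises the odds ratio.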

19 of 131

Step 2: Maximizing Criterion

1) Find a cluster seed (3 actors who interact with each other, and with similar others)

2) Add to the cluster to maximize θ1 until you cannot do any more

3) Start new cluster with new seed

4) Shuffle between existing clusters

5) Make new clusters as necessary, dissolve existing ones as necessary.


20 of 131

KliqueFinder Algorithm: Phase I

Initialize: assign each actor to own cluster

1) Find cluster seed of 2 or 3 (adding clusters). For finding best cluster seed:
    • can only choose from unaffiliated actors
    • each actor can only be a seed once

2) Identify single move that most increases objective function θ1

3) Does move increase function?
    • Yes: reassign actor that makes best move, then return to 2. If assignment moves actor out of a cluster of 3, reassign remaining 2 to next best clusters (removing clusters)
    • No: return to 1 and start a new cluster with a new seed

21 of 131

KliqueFinder Algorithm: Phases II and III

  • Phase II: If best move does not increase objective function and there are fewer than 3 actors available for clusters then
    • Attach all isolated (or unaffiliated) actors to best existing clusters, even if this reduces objective function
  • Phase III: shuffle actors between existing clusters without seeding new ones or disbanding existing ones
    • Number of clusters is fixed
    • This is simple hill climbing and can be cast as an EM algorithm
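The Phase III shuffle can be sketched as plain hill climbing in Python. This is a minimal illustration, not KliqueFinder's implementation: it assumes binary ties, takes θ1 as half the log odds ratio (following the "θ1=Log odds/2" label in the sampling distribution output), and adds 0.5 to each cell to avoid division by zero. The function names and the toy network are hypothetical.

```python
import math


def theta1(ties, cluster):
    """Half the log odds ratio of a tie given common cluster membership."""
    a = b = c = d = 0.5          # 0.5 added to each cell (assumption)
    for i in cluster:
        for j in cluster:
            if i == j:
                continue
            tie = (i, j) in ties
            if cluster[i] == cluster[j]:
                if tie:
                    d += 1
                else:
                    c += 1
            else:
                if tie:
                    b += 1
                else:
                    a += 1
    return 0.5 * math.log((a * d) / (b * c))


def shuffle_phase(ties, cluster):
    """Apply the single best reassignment until no move increases theta1."""
    cluster = dict(cluster)
    labels = sorted(set(cluster.values()))
    best = theta1(ties, cluster)
    while True:
        best_move = None
        for actor in cluster:
            original = cluster[actor]
            # number of clusters is fixed: never empty a cluster
            if sum(1 for v in cluster.values() if v == original) == 1:
                continue
            for label in labels:
                if label == original:
                    continue
                cluster[actor] = label
                value = theta1(ties, cluster)
                if value > best + 1e-12:
                    best, best_move = value, (actor, label)
            cluster[actor] = original
        if best_move is None:
            return cluster, best
        cluster[best_move[0]] = best_move[1]


# Hypothetical toy network: two mutual triads plus one bridging tie
ties = {(1, 2), (2, 1), (1, 3), (3, 1), (2, 3), (3, 2),
        (4, 5), (5, 4), (4, 6), (6, 4), (5, 6), (6, 5),
        (3, 4)}
start = {1: "X", 2: "X", 3: "Y", 4: "Y", 5: "Y", 6: "Y"}   # actor 3 misplaced
final, value = shuffle_phase(ties, start)
print(final, round(value, 3))
```

Because each accepted move strictly increases θ1 and the number of possible assignments is finite, the loop stops at a local maximum.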

22 of 131

KliqueFinder in R


23 of 131

KliqueFinder in R


24 of 131

# https://github.com/jtbates/kliqfindr
# https://github.com/r-lib/devtools
# (If you cannot install the package, see the web pages above for more information.)

# Step 1: install the supporting packages, then stop
install.packages("devtools")
install.packages("igraph")
install.packages("igraphdata")
install.packages("RColorBrewer")

# Step 2: install kliqfindr from GitHub, then stop
devtools::install_github("jtbates/kliqfindr")

# Step 3: load the package
library(kliqfindr)

# Set the working directory (replace this path with your own folder)
getwd()
setwd("C:/Users/kenfrank/OneDrive - Michigan State University/H Drive/my web page")
getwd()

# Add your list file to your working directory, then run KliqueFinder on it.
# Note: you do not need to read in the list file as a data frame.
test <- winkliq_run("sem020.list")
test

# See the result after running kliqfindr
head(test$place)

# Columns "actor" and "subgroup" show which actor is assigned to which subgroup
groupbykf <- test$place[, c("actor", "subgroup")]
groupbykf <- groupbykf[order(groupbykf$actor), ]
groupbykf
# groupbykf now holds the KliqueFinder cluster assignments, sorted by node id

# Now use igraph to plot this network
library(igraph)

# Read in the data from the previous example
ties <- read.csv("sem020.csv", header = TRUE, as.is = TRUE)
nodes <- read.csv("sem020_node.csv", header = TRUE, as.is = TRUE)

# Examine the data
head(ties)
head(nodes)

# Check whether there are duplicate rows in the tie and node data files
nrow(ties); nrow(unique(ties[, c("sender", "receiver")]))
nrow(nodes); length(unique(nodes$node))

# Turn the data into an igraph object
network <- graph_from_data_frame(d = ties, vertices = nodes, directed = TRUE)

# Color vertices by the subgroup identified by kliqfindr.
# Match on vertex name so the assignment does not depend on row order.
V(network)$community <- groupbykf$subgroup[match(V(network)$name, as.character(groupbykf$actor))]
plot(network, edge.arrow.size = .2, vertex.color = V(network)$community,
     vertex.label = V(network)$fname, vertex.size = 5 * V(network)$attr1,
     layout = layout_with_fr)

25 of 131

Running KliqueFinder

  • Click on the “Browse…” button to specify the directory where the data file is located.


26 of 131

KliqueFinder

  • Choose Basic setup and then click Run setup file button.


27 of 131

KliqueFinder

  • Click on the Browse button to choose a data file.


28 of 131

Run Analysis

Data file


29 of 131

New Version of Data Input more Flexible

Old format: 10 spaces for each entry. New format: flexible columns, same results.

File name must be fewer than 20 characters

IDs should be 6 digits or fewer

Actor 1 interacts with actor 2 at a level of 3

Extent of relation can be binary or weighted

Prepping data in Excel

Prepping data in UCINET

Converting data using SAS

30 of 131

View Clusters Output


31 of 131

Blocked Network Data

N  Group And Actor Id
24          |AAAA|BBBBBB|CCCCCCCC|DDDDDD|
            |    |      |        |      |
            | 2 1|221  1|  11   2|111122|
Group     ID|7445|612214|98133560|796037|
------------+----+------+--------+------+
  1  A     7|A213|......|........|...1..|
  1  A    24|4A3.|......|.4......|......|
  1  A     4|33A.|......|........|......|
  1  A    15|433A|......|........|......|
------------+----+------+--------+------+
  2  B    26|.2..|B443..|........|......|
  2  B    21|.1..|4B....|...4....|....2.|
  2  B    12|....|4.B...|........|......|
  2  B     2|....|33.B..|........|...1..|
  2  B     1|..3.|3..3B.|........|.3..2.|
  2  B    14|....|....1B|........|......|
------------+----+------+--------+------+
  3  C     9|....|......|C...3.33|.3....|
  3  C     8|.4..|..4...|.C.4..4.|4.....|
  3  C    11|....|......|33C.4.3.|..4...|
  3  C    13|.4..|.4....|444C....|......|
  3  C     3|3...|.4....|4.44C...|......|
  3  C     5|.1..|.....4|3.2.3C..|......|
  3  C     6|....|......|444..4C4|......|
  3  C    20|....|......|3..3.44C|......|
------------+----+------+--------+------+
  4  D    17|.1..|......|.1......|D.1...|
  4  D    19|....|......|4.3.....|3D4...|
  4  D    16|....|......|4..4...4|44D...|
  4  D    10|..3.|...1..|........|...D3.|
  4  D    23|....|.3....|........|.343D.|
  4  D    27|.1..|.1....|........|.3..3D|

θ1 = 1.1738

32 of 131

Step 3) Examine evidence of clusters

1) randomly redistribute relations

2) apply algorithm

3) record value of odds ratio and θ1

4) repeat 1000 times to generate distribution

5) use mean of distribution as baseline for comparison
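The five steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not KliqueFinder's procedure: ties are binary, "randomly redistribute" is taken to mean that each sender keeps its number of ties but sends them to random receivers, and `statistic` stands for "apply the algorithm and record θ1". The stand-in statistic used in the example is hypothetical.

```python
import random


def redistribute_ties(ties, actors, rng):
    """Give each sender the same number of ties, aimed at random receivers."""
    out_degree = {}
    for sender, _receiver in ties:
        out_degree[sender] = out_degree.get(sender, 0) + 1
    shuffled = set()
    for sender, k in out_degree.items():
        others = [a for a in actors if a != sender]   # no self ties
        for receiver in rng.sample(others, k):
            shuffled.add((sender, receiver))
    return shuffled


def baseline_distribution(ties, actors, statistic, reps=1000, seed=0):
    """Steps 1-4: redistribute, apply `statistic`, record, repeat."""
    rng = random.Random(seed)
    return [statistic(redistribute_ties(ties, actors, rng))
            for _ in range(reps)]


def count_mutual(network):
    """Stand-in statistic (NOT theta1): number of reciprocated ordered ties."""
    return sum(1 for (i, j) in network if (j, i) in network)


actors = [1, 2, 3, 4, 5, 6]
ties = {(1, 2), (2, 1), (1, 3), (3, 1), (2, 3),
        (4, 5), (5, 4), (5, 6), (6, 5), (4, 6),
        (3, 4)}

dist = baseline_distribution(ties, actors, count_mutual, reps=200, seed=1)
baseline = sum(dist) / len(dist)      # step 5: mean as baseline
print(baseline, count_mutual(ties))
```

In a real run, `statistic` would apply the clustering algorithm to each simulated network and return the resulting θ1, so that the observed θ1 can be compared with the mean of the simulated values.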


33 of 131

Randomly Redistributing Ties


34 of 131

Apply Algorithm to Random Data

θ1 = .81822

35 of 131

Monte Carlo Sampling Distribution

video: (1:06:35-1:18:50)

Output in sampdist.dat (columns: θ1 = log odds/2; odds ratio)

Set up sampling. Remember to do “new data” setup when done, to prepare for the next analysis

Indicate simulate data

Data can include weights

36 of 131

Code for Reading in Sample Distribution Data

SPSS

GET DATA
  /TYPE=TXT
  /FILE="C:\KLIQFIND\sampdist.dat"
  /FIXCASE=1
  /ARRANGEMENT=FIXED
  /FIRSTCASE=1
  /IMPORTCASE=ALL
  /VARIABLES=
  /1 theta1 0-29 F30.10
  oddsratio 30-59 F30.10
  samplesize 60-89 F30.10.
CACHE.
EXECUTE.
DATASET NAME DataSet9 WINDOW=FRONT.
DATASET ACTIVATE DataSet9.
GRAPH
  /HISTOGRAM=theta1.

SAS

title "Sampling distribution for theta1";
data one;
  infile "sampdist.dat" missover;
  input theta1 odds1;
run;
proc univariate plot;
  var theta1;
run;

Stata

* This command imports the data file:
import delimited C:\KLIQFIND\sampdist.dat, delimiter(" ", asstring)
* These commands perform data management:
drop v1
rename v2 theta1
rename v3 oddsratio
rename v4 samplesize
* This command plots a histogram for theta1:
hist theta1, freq

37 of 131

Comparison of Sampling Distributions


38 of 131

Distribution of θ1base From Application of the Algorithm to Data Simulated Without Regard for Cluster or Subgroup Membership

Observed value:

1.1738


39 of 131

Sampling Distribution Parameters

Edit simulation parameters.

First element is number of replications


Must keep # of reps in first 5 columns

40 of 131

Approximate p-value Based on Previous Simulations

PREDICTED THETA (1 base) BASED ON SIMULATIONS.

VALUE BASED ON UNWEIGHTED DATA.

0.76985

ESTIMATE OF THETA (1 subgroup processes)

0.40397 (total - predicted = evidence of groups): 1.1738 - .76985 ≈ .404

THE TOTAL THETA1 IS:

1.1738

APPROXIMATE TEST OF CONCENTRATION OF TIES

WITHIN SUBGROUPS BASED ON

SIZE OF THETA1 subgroup processes:

THETA1    |        |
SUBGROUP  | APPROX | APPROX
PROCESSES | LRT    | P-VALUE
0.40        34.82    0.00

Reject null hypothesis of no clusters:

H0: θ1 subgroup processes = 0

41 of 131

Step 4) Evaluating the Performance of the Algorithm : Did the Algorithm Recover the Correct Clusters?

  • Many algorithms search for optimal clusters. KliqueFinder does not, but how different are the clusters it finds from the optimal or known clusters?


42 of 131

Output for Recovery of clusters

PREDICTED ACCURACY: LOG ODDS OF COMMON SUBGROUP MEMBERSHIP, + OR - .5734 (FOR A 95% CI)

1.4989

The Log odds applies to the following table:

                      OBSERVED SUBGROUP
                    DIFFERENT     SAME
                   ___________________
                   |        |        |
KNOWN    DIFFERENT |   A    |   B    |
SUBGROUP           |--------|--------|
         SAME      |   C    |   D    |
                   |        |        |
                   -------------------

THE LOGODDS TRANSLATES TO AN ODDS RATIO OF

4.4766

WHICH INDICATES THE INCREASE IN THE ODDS THAT KLIQUEFINDER WILL ASSIGN TWO ACTORS TO THE SAME SUBGROUP IF THEY ARE TRULY IN THE SAME SUBGROUP.

Specific accuracy for a given data set is not known; results are predicted from thousands of simulations (see next slide)
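The translation from log odds to an odds ratio is an exponentiation, which a few lines of Python can check. The interval endpoints below are obtained by exponentiating the log-odds interval, a standard step that the output itself does not print; the variable names are illustrative.

```python
import math

log_odds = 1.4989      # predicted accuracy reported in the output above
half_width = 0.5734    # the "+ OR -" value for the 95% CI

odds_ratio = math.exp(log_odds)
ci_low = math.exp(log_odds - half_width)
ci_high = math.exp(log_odds + half_width)

print(f"odds ratio {odds_ratio:.4f}, 95% CI ({ci_low:.2f}, {ci_high:.2f})")
```

The result agrees with the reported 4.4766 up to rounding of the printed log odds.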

43 of 131

Odds of Recovery (Toy Example)

Simulated data with known clusters (each row lists the actor's ties to the other five actors):

Actor   Ties
1       1 1 0 1 0
2       1 0 0 0 0
3       1 1 0 0 1
4       0 1 1 1 1
5       0 0 0 0 1
6       1 0 0 1 1

Observed clusters identified by KliqueFinder (same sociomatrix)

                      OBSERVED CLUSTER
                    DIFFERENT     SAME
                   ___________________
                   |        |        |
KNOWN    DIFFERENT | A (6)  | B (3)  |
CLUSTER            |--------|--------|
         SAME      | C (2)  | D (4)  |
                   |        |        |
                   -------------------

Misassignment of actor 4 contributes 3 to cell B and 2 to cell C

Cell D: 4 pairs correctly assigned to same cluster: (1,2; 1,3; 2,3; 5,6)

Cell A: 6 pairs correctly assigned to different clusters: (1,5; 2,5; 3,5; 1,6; 2,6; 3,6)

Odds of recovery = (AD)/(BC) = (6 x 4)/(3 x 2) = 4.00
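The pair counting in the toy example can be reproduced with a short Python sketch. The known and observed assignments below are inferred from the pair lists on the slide (actor 4 belongs with 5 and 6 but is placed with 1, 2, and 3); the function name is hypothetical.

```python
from itertools import combinations


def recovery_table(known, observed):
    """Classify each unordered pair of actors into cells A, B, C, D."""
    cells = {"A": 0, "B": 0, "C": 0, "D": 0}
    for i, j in combinations(sorted(known), 2):
        same_known = known[i] == known[j]
        same_observed = observed[i] == observed[j]
        if same_known and same_observed:
            cells["D"] += 1      # correctly placed together
        elif same_known:
            cells["C"] += 1      # truly together, split by the algorithm
        elif same_observed:
            cells["B"] += 1      # truly apart, joined by the algorithm
        else:
            cells["A"] += 1      # correctly kept apart
    return cells


known = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}
observed = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}   # actor 4 misassigned

cells = recovery_table(known, observed)
odds = (cells["A"] * cells["D"]) / (cells["B"] * cells["C"])
print(cells, odds)
```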

44 of 131

Make Sociogram in Netdraw

video: (1:01:00-1:06:22)

45 of 131

Sometimes Netdraw can’t find file: retrieve manually

46 of 131

Modifying Image in Netdraw


47 of 131


48 of 131

Removing Subgroup Nodes

Remove group nodes

49 of 131

Blocked network data: same matrix as under “Blocked Network Data” above

Density (e.g., group A to group C) = 4/(4x8) = 1/8

KliqueFinder uses Density = 4/(4x5) = .20 because the maximum number of nominations is 5

Data used for multidimensional scaling within subgroups: Distance = maximum value / cell entry. E.g., the maximum value is 4, so a tie of 2 gives 4/2 = 2, a distance of 2

DIRECT ASSOCIATIONS (in xxxxxx.clusters)

GROUP       1      2      3      4
LABEL       A      B      C      D
N           4      6      8      6
GROUP
1        2.42   0.00   0.20   0.05
2        0.25   1.07   0.13   0.27
3        0.38   0.40   2.40   0.28
4        0.21   0.17   0.67   1.17

Distance in multidimensional scaling between subgroups = maximum value / density
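The density and distance rules on this slide can be written out in a few lines of Python. The cap on possible ties per sender (the smaller of the number of available receivers and the maximum number of nominations) is an inference from the 4/(4x5) example, not documented behavior; the function names are hypothetical.

```python
def tie_distance(value, max_value):
    """Within-subgroup MDS distance: maximum value / cell entry."""
    return max_value / value


def block_density(block_sum, n_senders, n_receivers, max_nominations=None):
    """Sum of tie values in a block divided by the number of possible ties.

    With max_nominations set, each sender is assumed to have at most that
    many possible ties into the block (inferred from the slide's example).
    """
    per_sender = n_receivers
    if max_nominations is not None:
        per_sender = min(n_receivers, max_nominations)
    return block_sum / (n_senders * per_sender)


def block_distance(density, max_value):
    """Between-subgroup MDS distance: maximum value / density."""
    return max_value / density


print(tie_distance(2, 4))             # tie of 2 when the maximum is 4
print(block_density(4, 4, 8))         # group A to group C, plain density
print(block_density(4, 4, 8, 5))      # same block, capped at 5 nominations
print(block_density(29, 4, 3, 5))     # within group A: tie values sum to 29
```

The within-group call uses 3 receivers because each of the 4 actors in group A can nominate only the other 3.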

50 of 131

cohesion

Structural similarity

video: (1:19:15-1:23:40)

51 of 131

Choosing lines: Groups


52 of 131

Confidentiality/Ethical issues in Collecting Network Data

video: (1:23:41-1:28)

  • Need names on survey

  • Data can be confidential but not anonymous (especially for longitudinal)

  • Breiger, R.L. “Ethical Dilemmas in Social Network Research: Introduction to Special Issue.” Social Networks 27 / 2 (2005): 89-93. Read it online: http://www.u.arizona.edu/~breiger/2005BreigerIntroEthics.pdf
    • (All issues of Social Networks are available via ScienceDirect)

  • Who benefits from network analysis? Who bears the cost?

    • Kadushin, Charles. “Who benefits from network analysis: ethics of social network research.” Social Networks 27 / 2 (2005): 139-153.

  • Issues to raise when dealing with Human Subjects Board:

    • Klovdahl, Alden S. “Social network research and human subjects protection: Towards more effective infectious disease control.” Social Networks 27 / 2 (2005): 119-137.
  • Hint on Human Subjects boards: they like precedents. Once you have one network study accepted, refer to it when submitting others!

53 of 131

The SRI/KliqueFinder Solution to confidentiality: aggregate to subgroups

1) Provide information about who is in which cluster as well as information regarding the resources embedded in each cluster. Resources could be information, expertise, material resources, etc.

Benefit: reveals location of resources relative to social structure

Protection: does not reveal specific responses because all information is at the cluster level.

2) Provide a sociogram unique to each respondent, indicating where that person is located (“you are here”). The figure does not include the lines from a sociogram, so respondents cannot infer others’ responses.

Benefit: Respondents then use this as a guide to individual behavior for identifying further resources or information.

Protection: Specific responses of others are not revealed, so confidentiality is preserved.

54 of 131


55 of 131

Choosing Lines: Actor Level Within


56 of 131

Choosing Lines: Actor Level

Remove group nodes

57 of 131

Choosing Lines: Actor Level Between


58 of 131

Choosing Lines: Group Level


59 of 131

Modifying the Image: Adding Node Data or Relations

video: (1:49:35-2:07:48)

60 of 131

Files for KliqueFinder

Input data
  • xxxxxx.list: network data
  • xxxxxx.ilabel: node data
  • xxxxxx.xnet: alternative network data

Parameters
  • Kliqfind.par
  • Printo
  • Simulate.par

KliqueFinder (Smacof1b MDS)

Output
  • xxxxxx.clusters: diagnostics and matrix formatted data
  • xxxxxx.vna: for Netdraw
  • xxxxxx.place: data containing actor IDs and cluster placement

61 of 131

Modifying node data by Editing [datafile].vna

File is read by Netdraw. Copy relevant data into Excel, edit, and replace

*node data

id type group gender

"0A " 2 1 0

"0B " 2 2 0

"0C " 2 3 0

"0D " 2 4 0

1 1 2 1

2 1 2 2

*Node properties

ID x y color shape size shortlabel active

"0A " -2.01889 -15.04530 16777215 1 30 A TRUE

"0B " -9.41864 15.75047 16777215 1 85 B TRUE

"0C " 2.06574 2.09162 16777215 1 52 C TRUE

"0D " 8.54812 10.10988 16777215 1 79 D TRUE

1 -10.52314 14.16442 16711680 1 10 1 TRUE

2 -8.29999 13.27802 16711680 1 10 2 TRUE

*Tie data

from to any strength actor group between within technology

1 2 1 3 1 0 0

1 4 1 3 1 0 1

1 19 1 3 1 0 1

1 23 1 2 1 0 1

1 26 1 3 1 0 0

2 26 1 3 1 0 0

2 10 1 1 1 0 1

*Tie properties

FROM TO color size headcolor headsize active

"0A " "0B " 12632256 1 12632256 0 TRUE

"0A " "0C " 12632256 9 12632256 0 TRUE

1 2 0 3 0 8 TRUE

1 4 12632256 3 0 8 TRUE

Add new node variable here (e.g. gender)

then add data
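As an alternative to editing by hand, the starred sections of a .vna file can be read into Python. This sketch assumes only the layout visible in the excerpt above (a line starting with "*" opens a section, the next line holds column names, and quoted IDs may contain spaces); `parse_vna` and the sample text are hypothetical.

```python
import shlex


def parse_vna(text):
    """Split VNA text into sections; each section is a list of row dicts."""
    sections = {}
    name, header = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            # a starred line opens a new section, e.g. "*node data"
            name, header = line[1:].strip().lower(), None
            sections[name] = []
        elif name is None:
            continue                      # ignore text before any section
        elif header is None:
            header = shlex.split(line)    # first line of a section: columns
        else:
            values = shlex.split(line)    # keeps quoted IDs such as "0A "
            sections[name].append(dict(zip(header, values)))
    return sections


sample = '''*node data
id type group gender
"0A " 2 1 0
1 1 2 1
*Tie data
from to strength
1 2 3
'''

sections = parse_vna(sample)
print(sections["node data"][0])
```

Note that `zip` silently drops values beyond the number of column names, so rows and headers should be checked for equal length before editing real files.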


62 of 131

Adding Node Attributes with Extra File

KliqueFinder will put attributes into vna file

File=xxxxxx.ilabel where xxxxxx is the first 6 characters of your data file

xxxxxx.ilabel

xxxxxx.list

Cut and paste into

stanne.ilabel

stanne.list

1 Jacob 1 3 5

2 Stan 1 2 5

3 Linton 1 2 5

4 Charles 1 3 3

5 Mark 1 3 3

6 Tom 2 3 3

7 Ronald 2 3 5

8 Nan 2 1 3

9 Elizabeth 2 1 4

10 Barry 2 2 3

11 Martin 2 3 1

12 Steve 2 3 1

13 PeterC 2 1 5

14 Patrick 1 1 1

15 Katy 1 1 3

16 Kathleen 3 3 3

17 Ove 2 2 2

18 JamesC 5 5 5

19 Robert 4 4 4

20 JamesM 1 2 3 4

21 Noah 4 3 2 1

22 Marijtje 1 2 1 2

23 Ronald 2 1 2 1

24 Harrison 3 1 3 1

25 Duncan 4 1 4 1

10 columns for ID; Skip a space; Name; Node attribute 1-5

NA1 = gender

NA2=grade level

NA3=knowledge level

63 of 131

Select all

Open notepad and paste

64 of 131


65 of 131


66 of 131


67 of 131

Interactive: adding node data

or


68 of 131


69 of 131

Include Node Data in Image


70 of 131

• Each number is a teacher

• G_ indicates grade in which teacher teaches

• Lines connecting two numbers indicate teachers who are close colleagues

Solid lines within subgroups, dashed between

• Circles indicate cohesive subgroups


71 of 131

Ripple Plot

  • Overlay talk about technology on social geography of crystallized sociogram
  • Lines indicate talk about technology
  • Size of dot indicates teacher’s use of technology at time 1
  • Ripples indicate increase in use from time 1 to time 2


72 of 131


73 of 131

Crystallized Sociogram

• Each number is a teacher

• Shape indicates grade in which teacher teaches

• Lines connecting two numbers indicate teachers who are close colleagues

Solid lines within subgroups, dashed between

• Circles indicate cohesive subgroups (labeled A through E)

Distance between A and B reflects history of school: SES integration

74 of 131

Ripple Plot


  • Overlay talk about technology on social geography of crystallized sociogram
  • Lines indicate talk about technology
  • Size of node indicates teacher’s use of technology at time 1
  • Ripples indicate increase in use from time 1 to time 2


75 of 131

Files for Making the Ripple Plot

Download to c:\kliqfind

Data files for ripple plot (download to c:\kliqfind and then run KliqueFinder on ripple.list)

76 of 131

KliqueFinder for Ripple Plot

Input data
  • ripple.list: close colleagues
  • ripple.ilabel: node data
  • ripple.xnet: help with tech

Parameters
  • Kliqfind.par
  • Printo
  • Simulate.par

KliqueFinder

Output
  • ripple.clusters: diagnostics and matrix formatted data
  • ripple.vna: for Netdraw
  • ripple.place: data containing actor IDs and subgroup placement

77 of 131

KliqueFinder Run: Ripple Plot


78 of 131


Turn these off

expand

79 of 131

Shape nodes by grade level


80 of 131


81 of 131

Change lines


82 of 131


83 of 131

Size nodes by initial expertise


84 of 131


Add ripples by hand ☹

85 of 131

Modifying Links by Editing [datafile].vna

File is read by Netdraw. Copy relevant data into Excel, edit, and replace

*node data

id type group gender

"0A " 2 1 0

"0B " 2 2 0

"0C " 2 3 0

"0D " 2 4 0

1 1 2 1

2 1 2 2

*Node properties

ID x y color shape size shortlabel active

"0A " -2.01889 -15.04530 16777215 1 30 A TRUE

"0B " -9.41864 15.75047 16777215 1 85 B TRUE

"0C " 2.06574 2.09162 16777215 1 52 C TRUE

"0D " 8.54812 10.10988 16777215 1 79 D TRUE

1 -10.52314 14.16442 16711680 1 10 1 TRUE

2 -8.29999 13.27802 16711680 1 10 2 TRUE

*Tie data

from to any strength actor group between within technology

1 2 1 3 1 0 0

1 4 1 3 1 0 1

1 19 1 3 1 0 1

1 23 1 2 1 0 1

1 26 1 3 1 0 0

2 26 1 3 1 0 0

2 10 1 1 1 0 1

*Tie properties

FROM TO color size headcolor headsize active

"0A " "0B " 12632256 1 12632256 0 TRUE

"0A " "0C " 12632256 9 12632256 0 TRUE

1 2 0 3 0 8 TRUE

1 4 12632256 3 0 8 TRUE

Add new node variable here (e.g. gender)

then add data

Add new relation here (e.g. technology)

then add data


86 of 131

Modifying Links with Extra File

KliqueFinder will put attributes into vna file

File=xxxxxx.xnet where xxxxxx is the first 6 characters of your data file

xxxxxx.xnet accompanies xxxxxx.list (e.g., stanne.xnet accompanies stanne.list)

File containing extra network. Columns: nominator, nominee, strength of tie

1 2 4

19 15 3

22 26 1

87 of 131


88 of 131

Modifying Links: Interactive – Finicky


89 of 131

Interactive Modifying Links


90 of 131

Two mode

*Field, S., *Frank, K.A., Schiller, K., Riegle-Crumb, C., and Muller, C. 2006. “Identifying Social Contexts in Affiliation Networks: Preserving the Duality of People and Events.” Social Networks 28:97-123. (* co-first authors)

Data source

video: (1:39:25-1:49:35)

91 of 131

Copy homact.list from c:\kliqfind\setups to c:\kliqfind

Example two-mode data

92 of 131

Two-mode Data

Actor 1 participates in event 19 at a level of 1

Extent of relation can be binary or weighted

Edgelist

First two rows do not appear in the data; I put them there to show the format: 10 spaces for each entry

New version of KliqueFinder is more flexible: about 10 column widths

IDs should be 6 digits or fewer

Prepping data in Excel

Prepping data in UCINET

Converting data using SAS

93 of 131

Two mode: Clusters output

94 of 131

Blocked Two-Mode Network Data

95 of 131

Two-mode Crystallized Sociogram


96 of 131

Two mode: actors and events

97 of 131

One-Mode Projection vs Two Mode data

98 of 131

Centralization & Centrality in KliqueFinder

  • KliqueFinder produces a measure of Warp.
  • Starts with distances defined by
    • Maximum value in network / observed value
      • E.g. maximum is 4 and a particular tie is 1, then distance is 4/1=4.
    • These are the distances used in the MDS to produce the sociograms (see “running KliqueFinder ppt”)
  • Obtains eigenvalues
    • Within each cluster based on raw data within cluster
    • Between clusters based on 1/density of ties between clusters
      • Density = average value in a given block
  • Warp = sum of positive eigenvalues / sum of all eigenvalues
    • Note it does not use the square root of the eigenvalues (variances are more additive)
  • Output into xxxxxx.bcord (9th element) and into netdraw as node attribute for groups, called “centrality”
  • Centrality for individuals is distance to the center of their cluster (radius).


99 of 131

Running on a Large Data File (more than 1000 actors)

If you start the program and it just sits there, it is looking for the best seed for the first cluster. The seed is 3 actors, but the program looks at all combinations of 3 that share common ties in the network. This is intensive, and unnecessary for large data (the first cluster does not matter so much). To shortcut: change the value from 1 to 2, then save and run.

100 of 131

Software Challenge

video: (2:07:57-2:08:15)

  • Analyze nonpr1.list
    • Evidence of clusters?
    • Performance of algorithm?
  • Replace lines with nonpr2
  • Describe the KliqueFinder algorithm


101 of 131

KliqueFinder Applications: Adding Individual Attributes in SAS

run KliqueFinder

data file collt1.list

make graph

use ID from other file? Yes:

sas file name: c:\kliqfind\indiv

[be sure to include full path]

id variable: nominator

string variable: gradelev

Save

In sas, run socgramz in the working directory


102 of 131

KliqueFinder Applications: Adding Individual Attributes

  • Select “Yes” for “User ID (character) from other SAS file?”


103 of 131

KliqueFinder Applications: Adding Individual Attributes

  • Type the following information in the corresponding boxes

  • Then Click “Save”


104 of 131

Choosing an ID Variable


105 of 131

With ID based on Grade


106 of 131

KliqueFinder Applications: Replacing Lines

run KliqueFinder

data file collt1.list

make graph

save

retrieve socgramz.sas in the working directory

replace all occurrences of collt1.list with collt2.list

run


107 of 131

Opening socgramz.sas


108 of 131

Changing lines


109 of 131

Change lines to different source


110 of 131

New Lines based on Collt2


111 of 131

Batch KliqueFinder


112 of 131

Basics

  • Program runs KliqueFinder on multiple files
  • Input
    • List of filenames
    • Files containing data
    • BACK UP YOUR DATA FIRST!
  • Output
    • Clustering output (.place, .clusters, .vna) for each list file


113 of 131

Files

File containing names of data files: testb.txt

Data file: stanne.list

Data file: ffe.list


BACK UP YOUR DATA FIRST!

114 of 131

KliqueFinder

  • Browse to directory you want to work in
  • Choose Basic setup and then click Run setup file button.


115 of 131

Running Batch Mode


File with names of data files

Click here to run as batch

BACK UP DATA FILES BEFORE RUNNING!

116 of 131

Prepping data in Excel

video: (1:28-1:39)

Name your file xxxxxx.list, e.g., test01.list

Right click

Choose “Formatted text (space delimited)”

117 of 131

Prepping Data in UCINET

Navigate to where you want to save: c:\kliqfind

Navigate to UCINET data


118 of 131

Must remove “!” from the file. There may be several. The “!” marks are there because of multiple data sets.

119 of 131

Converting data using SAS

video: (2:10:43-2:19)

data one;

infile "badform.list";

input chooser chosen wt;

data two; set one;

file "ready1.list";

if wt ne . then put (chooser chosen wt) (10.);

run;
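For readers without SAS, the same conversion can be sketched in Python. This mirrors the `put (chooser chosen wt) (10.)` step by right-aligning each value in 10 columns and skipping rows with a missing weight. The function name is hypothetical, integer values are assumed, and the 6-digit ID check simply enforces the guideline stated earlier.

```python
def format_edgelist(edges, width=10):
    """Return fixed-width edgelist text, one 'chooser chosen weight' row per line."""
    lines = []
    for chooser, chosen, weight in edges:
        if weight is None:
            continue                      # mirror "if wt ne ." in the SAS step
        for actor in (chooser, chosen):
            if len(str(actor)) > 6:
                raise ValueError(f"ID {actor} is longer than 6 digits")
        lines.append(f"{chooser:>{width}d}{chosen:>{width}d}{weight:>{width}d}")
    return "".join(line + "\n" for line in lines)


# Hypothetical input: actor 1 interacts with actor 2 at a level of 3, etc.
edges = [(1, 2, 3), (19, 15, 3), (22, 26, None)]
text = format_edgelist(edges)
print(text, end="")

# To save under a name such as ready1.list:
# with open("ready1.list", "w") as handle:
#     handle.write(text)
```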


120 of 131

A Priori Clusters

A line containing 99999 in the data file indicates the a priori cluster in which an actor is placed.

For example, actor 1 is in a priori cluster 3.

Run repeat2 setup, and then proceed as usual.

Remember to do “new data” setup when done.

KliqueFinder will make pictures based on a priori clusters

121 of 131

Comparison of A Priori Clusters and Identified Solution

Data with a priori cluster assignments

Run as new data

Run as usual, then look at cluster output

SIMILARITY BETWEEN THE START AND END GROUPS:
ACTUAL   POSS   STANDARDIZED
52.      88.    9.55565

STANDARDIZED is a QAP standardized measure; compare with the normal distribution

122 of 131

Data Containing Cluster Assignments

1.0 1.0 2.0 1.0 3.0

2.0 2.0 2.0 1.0 3.0

3.0 4.0 1.0 1.0 3.0

4.0 19.0 4.0 1.0 3.0

5.0 23.0 4.0 1.0 3.0

6.0 26.0 2.0 1.0 3.0

17.0 6.0 3.0 1.0 3.0

18.0 8.0 3.0 1.0 3.0

19.0 20.0 3.0 1.0 3.0

20.0 15.0 1.0 1.0 3.0

21.0 12.0 2.0 1.0 3.0

22.0 17.0 4.0 1.0 3.0

23.0 16.0 4.0 1.0 3.0

24.0 27.0 4.0 1.0 3.0

-27.0 28.0 4.0 1.0 3.0

File called stanne.place [datafile.place]

Columns: internal ID, user ID, cluster. Ignore the last two columns, which are for simulation only.

If the first number (internal ID) is negative, this indicates a tagalong: an actor connected to only one other. In this case, the line should be read as tagee, tagger, and group. So, in the last line, actor 28 is connected to only one other actor (27) and is therefore assigned to actor 27’s cluster, which is cluster 4.

There may be slightly different numeric formats depending on the version of KliqueFinder.
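The layout of the .place file can be read with a few lines of code. This is a minimal sketch, assuming whitespace-separated columns and following the slide's reading of a negative first number (its absolute value is the actor the tagalong is attached to); `parse_place` is a hypothetical name, and the two rows are copied from the data shown above.

```python
def parse_place(lines):
    """Return (clusters, tagalongs) from rows of a .place file.

    clusters maps user ID -> cluster; tagalongs maps a tagalong's
    user ID -> the actor it is attached to.
    """
    clusters = {}
    tagalongs = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or short lines
        first, user_id, cluster = (int(float(p)) for p in parts[:3])
        clusters[user_id] = cluster
        if first < 0:  # negative first number marks a tagalong
            tagalongs[user_id] = -first
    return clusters, tagalongs


rows = ["24.0 27.0 4.0 1.0 3.0", "-27.0 28.0 4.0 1.0 3.0"]
clusters, tagalongs = parse_place(rows)
print(clusters, tagalongs)
```

Converting through `float` first tolerates values written as "4.0" or "4.", which helps with the format differences between KliqueFinder versions.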

123 of 131

Including Cluster Membership in Influence Model

SPSS

DATA LIST FREE / intid nominee cluster simx extra.

BEGIN DATA

1.0 1.0 1.0 1.0 3.0

2.0 2.0 1.0 1.0 3.0

3.0 3.0 1.0 1.0 3.0

4.0 4.0 2.0 1.0 3.0

5.0 5.0 2.0 1.0 3.0

6.0 6.0 2.0 1.0 3.0

END DATA.

DATASET NAME clusters WINDOW=FRONT.

SORT CASES BY nominee(A).

EXECUTE.

MATCH FILES /FILE=yvar1

/FILE='indeg'

/FILE=clusters

/BY nominee.

EXECUTE.

SAS

data clusters;
  * groups from KliqueFinder;
  input intid nominator cluster simx extra;
cards;
1.0 1.0 1.0 1.0 3.0
2.0 2.0 1.0 1.0 3.0
3.0 3.0 1.0 1.0 3.0
4.0 4.0 2.0 1.0 3.0
5.0 5.0 2.0 1.0 3.0
6.0 6.0 2.0 1.0 3.0
;
run;

* Sort the cluster assignments by the merge key;
proc sort data=clusters;
  by nominator;
run;

* Merge cluster membership with the influence model variables;
data withinfl;
  merge yvar2 yvar1 infl expanse clusters attract(rename=(nominee=nominator));
  by nominator;
  drop nominee _type_ _freq_;
run;

advanced:

run influence model for technology

Identify clusters from talkt2

Include cluster membership in the influence model

124 of 131

Simulating data in KF

125 of 131

Simulating data in KF

2 3 3 6 6 20 20 5 5 1 1 4161 0 50 50 20 20 0 0 0 1 1 1 1 1 50 50 200 200 3 3 6 6 1 1 0 0

12345123451234512345123451234512345123451234512345123451234512345123451234512345123451234512345123451234512345123451234512345123451234512345123451234512345123451234512345123451234512345

numsim,numgrou1,numgrou2,pergrou1,pergrou2,numact1,numact2,maxconn1,maxconn2,maxwt1,maxwt2,sseed,usemarg,indept1,indept2,outdept1,outdept2,regsim,bydense,simrand,obasep,basep2,orangep,rangep2,compsim,OMAXPWW,MAXPWW,OMAXPWB,MAXPWB,OBASEWP,BASEWP2,OWRANGEP,WRANGEP2,xobasep,xbasep2,xorangep,xrangep2

numsim=the number of simulated data sets to create

numgrou1=beginning number of a priori groups to which actors are assigned

numgrou2=ending number of a priori groups to which actors are assigned

pergrou1=beginning number of actors per group (must be ≤ numact1/numgrou1)

pergrou2=ending number of actors per group (must be ≤ numact2/numgrou2)

numact1=beginning number of actors

numact2=ending number of actors

maxconn1=beginning: maximum number of connections an actor can initiate

maxconn2=ending: maximum number of connections an actor can initiate

maxwt1=beginning: maximum weight which can be assigned to a connection

maxwt2=ending: maximum weight which can be assigned to a connection

sseed=seed for generating the first set of random data (subsequent seeds are derived from this seed); hardcoded at the moment: 416151632

usemarg=1 if you want to use original marginals to simulate new data

indept1=beginning rate of within group exchanges

indept2=ending rate of within group exchanges

outdept1=beginning rate of outside group exchanges

outdept2=ending rate of outside group exchanges

regsim=1 to use regular simulation, =0 to use rates

bydense=1 to use density, =0 to use proportion of exchanges which fall within groups

simrand=1 if parameters are random, =0 if fixed

compsim=way of initiating subgroups in regular application of KF

Density of within group ties

Density of between group ties
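To see which value belongs to which parameter, the value line can be paired with the name list. The sketch below is an illustration only: splitting on whitespace is an assumption (the ruler line suggests the real file uses fixed 5-character fields), and `read_params` and `group_sizes_ok` are hypothetical helper names. The names and values are copied from the slide.

```python
NAMES = (
    "numsim,numgrou1,numgrou2,pergrou1,pergrou2,numact1,numact2,"
    "maxconn1,maxconn2,maxwt1,maxwt2,sseed,usemarg,indept1,indept2,"
    "outdept1,outdept2,regsim,bydense,simrand,obasep,basep2,orangep,"
    "rangep2,compsim,OMAXPWW,MAXPWW,OMAXPWB,MAXPWB,OBASEWP,BASEWP2,"
    "OWRANGEP,WRANGEP2,xobasep,xbasep2,xorangep,xrangep2"
).split(",")

VALUES = (
    "2 3 3 6 6 20 20 5 5 1 1 4161 0 50 50 20 20 0 0 0 1 1 1 1 1 "
    "50 50 200 200 3 3 6 6 1 1 0 0"
)


def read_params(value_line, names=NAMES):
    """Pair each whitespace-separated value with its parameter name."""
    values = [int(v) for v in value_line.split()]
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(values)}")
    return dict(zip(names, values))


def group_sizes_ok(p):
    """Check that pergrou <= numact / numgrou at both the start and the end."""
    return (p["pergrou1"] * p["numgrou1"] <= p["numact1"]
            and p["pergrou2"] * p["numgrou2"] <= p["numact2"])


params = read_params(VALUES)
print(params["numsim"], params["numact1"], group_sizes_ok(params))
```

Checking the group-size constraint before running a simulation catches settings where the a priori groups would need more actors than the network contains.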

126 of 131

Creating data for KF

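* Actor names, one line per actor: last name, first name;
* (the name lines go after the datalines statement);
data one;
  infile datalines missover;
  input lname $ fname $;
datalines;
run;

* Drop blank lines and number the actors;
data two;
  set one;
  if lname ne "";
  id=_n_;
proc sort;
  by id;

* Write the label file: id in columns 1-10, first name from column 12;
data eight; set two;
  file "c:\kliqfind\cep016.ilabel";
  put id 1-10 @12 fname ;
run;

* Ties: chooser id, chosen id, weight;
data three;
  infile datalines missover;
  input id chosen wt;
  *generate data using:
   simulate for making group data;
datalines;
run;
proc sort;
  by id;
run;

* Write the edge list in three 10-column fields;
data eightb; set three;
  file "c:\kliqfind\cep016.list";
  if id ne .;
  put (id chosen wt) (10.);
run;

* Attach the chooser's name, then re-key each tie on the chosen actor;
data four;
  attrib choname length=$15;
  merge two three;
  by id;
  chooseer=id;
  choname=fname;
  id=chosen;
  drop fname lname;
run;
proc sort;
  by id;

* Attach the chosen actor's name and keep only rows with a weight;
data five;
  attrib fname length=$15.;
  merge four two;
  by id;
  chooser=id;
  if wt ne .;
proc sort data=five out=six;
  by choname;
run;

* Print each chooser followed by the names of those chosen;
data seven; set six;
  by choname;
  if choname ne "";
  if first.choname then put choname ": " @;
  put fname ", " @;
  if last.choname then put;
run;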

127 of 131

Adding Patches

Patch for two-mode

Patch for one-mode

128 of 131

Alternative community detection algorithms

129 of 131

Scenarios for the Network analyst

For each of the scenarios below:

  • identify the theoretical processes at work
  • write down what model or tool you would employ to evaluate the theory
  • describe what data you would collect to apply the model or tool
  • describe what estimation procedure/tool you would use

Sally is concerned that her daughter is experimenting with alcohol and thinks it is because her daughter’s friends are experimenting. Sally wonders generally if adolescents tend to drink more if their friends drink alcohol.

Michael wants to understand the social structure of his synagogue (church). He has an idea that there are certain sets of people who interact with each other, and, if he could understand what those sets of people are, he might better be able to tailor programs of the synagogue to be more effective.

How could Michael use the information above to track the diffusion of new beliefs or behaviors in his synagogue?

Pennie wants to know under what conditions one social service agency would allocate resources to another. Is it because they have a history of doing so, they share clients, they deal with similar issues, etc.?

What clustering among social service agencies might emerge as a result of the processes above?

130 of 131

Core periphery alternative structure

131 of 131

Reflection

  • What part is most confusing to you?
    • Why?
    • More than one interpretation?
  • Talk with one other person and share