KliqueFinder: Identifying Clusters in Network Data
Kenneth A. Frank
Michigan State University
Based on:
KliqueFinder (install latest version in C:\kliqfind)
Overview
Visualization of Close Collegial Ties and Best Practices in the Regional Ravine Network (time 1)
BEST PRACTICES
Structural hole
Blue lines represent close colleague ties newly formed at time 2. Black lines represent close colleague nominations present at time 1. Nodes with a yellow ring represent Regional Advisory Group members targeted for the intervention.
Visualization of Close Collegial Ties and Best Practices in the Regional Ravine Network (time 2)
Measure of use of Climate Change in Practice
Questions about Networks
Why Look for Clusters and Graphical Representations
Goal: Identify Patterns in the Network
Theory for defining cluster membership
Crystallized Sociogram: Friendships Among the French Financial Elite
Lines indicate friendships:
solid within subgroups,
dotted between subgroups.
numbers represent actors
Rgt,Cen,Soc,Non = political parties;
B=Banker, T=Treasury; E=École Nationale d’Administration
Frank, K.A. & Yasumoto, J. (1998). "Linking Action to Social Structure within a System: Social Capital Within and Between Subgroups." American Journal of Sociology, Volume 104, No 3, pages 642-686
Crystallized Sociogram: Clusters in Foodwebs
Krause, A., Frank, K.A., Mason, D.M., Ulanowicz, R.E. and Taylor, W.M. (2003). "Compartments exposed in food-web structure." Nature 426:282-285
Data Input
Old format: 10 spaces for each entry
New format: flexible columns (about 10 column widths), same results
File name must be fewer than 20 characters. Best if the file name is six characters followed by .list (xxxxxx.list), for example stanne.list
IDs should be 6 digits or fewer
Data: edgelist
Actor 1 interacts with actor 2 at a level of 3
Extent of relation can be binary or weighted
The first two rows do not appear in the data; they are there to show the format: 10 spaces for each entry
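The fixed-width layout described above can be sketched in a few lines of Python. This is a minimal illustration, not part of KliqueFinder: the helper names `format_edgelist` and `check_file_name` are hypothetical, and right-aligning each value in its 10-column field is an assumption based on the SAS `put ... (10.)` statement used later in these slides.

```python
def format_edgelist(ties, width=10):
    """Return fixed-width lines: sender, receiver, weight (one field each)."""
    lines = []
    for sender, receiver, weight in ties:
        for actor_id in (sender, receiver):
            # The slides advise IDs of 6 digits or fewer.
            if len(str(actor_id)) > 6:
                raise ValueError(f"ID {actor_id} is longer than 6 digits")
        # Assumption: each value is right-aligned in its own field.
        lines.append(f"{sender:>{width}}{receiver:>{width}}{weight:>{width}}")
    return lines


def check_file_name(name):
    """Apply the naming advice from the slides (hypothetical helper)."""
    return name.endswith(".list") and len(name) < 20


ties = [(1, 2, 3), (1, 4, 3), (2, 26, 3)]  # actor 1 -> actor 2 at level 3, ...
lines = format_edgelist(ties)
print("\n".join(lines))
print(check_file_name("stanne.list"))
```

Writing `lines` to a file such as stanne.list in the working directory would give an input file in the old fixed-width style; the newer flexible-column reader should accept it as well.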
Steps for Finding Clusters
1) Determine criterion for defining clusters
2) Maximize criterion
3) Examine evidence of clusters
4) Evaluate performance of the algorithm
5) Interpret clusters
commonality of attributes
focal experiences
subsequent behavior
Step 1) Criteria for Determining Group Membership
Structural Equivalence:
Factor analyze sociomatrix (Katz & Kahn)
iteratively rearrange and revalue rows and columns (CONCOR -- White et al., 1976)
Cohesion
utilize fixed criteria (e.g., must be connected to at least k others in clusters, or must be minimal path length from k others, etc).
use flexible criterion -- preference relative to cluster sizes and number of ties:
Model Based Cohesion
W_ii′ = 1 if there is a tie between actors i and i′, 0 otherwise
samecluster_ii′ = 1 if actors i and i′ are members of the same cluster, 0 otherwise
Then θ1 represents cluster salience
So: maximize θ1 (odds ratio)
Odds Ratio for Association Between Common Cluster Membership and Relation Between Actors
                        Tie or relation occurring
Cluster membership        No        Yes
Different cluster         A         B
Same cluster              C         D
Odds ratio = (AD)/(BC) = (absence of relation outside of cluster × presence of relation within cluster) / (presence of relation outside of cluster × absence of relation within cluster)
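The odds ratio above can be computed directly from a sociomatrix and a cluster assignment. The sketch below is illustrative only: the function names and the six-actor matrix are made up for the example, ties are treated as present whenever the cell is nonzero, and KliqueFinder's own handling of weights is not reproduced.

```python
def odds_ratio_cells(matrix, cluster):
    """Count cells A, B, C, D over all ordered pairs of distinct actors."""
    a = b = c = d = 0
    n = len(matrix)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            tie = matrix[i][j] != 0
            same = cluster[i] == cluster[j]
            if not same and not tie:
                a += 1          # no tie, different clusters
            elif not same and tie:
                b += 1          # tie, different clusters
            elif same and not tie:
                c += 1          # no tie, same cluster
            else:
                d += 1          # tie, same cluster
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """(AD)/(BC); infinite when B or C is empty."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


# Made-up example: two clusters of three actors, one tie between clusters.
matrix = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
]
cluster = [0, 0, 0, 1, 1, 1]
cells = odds_ratio_cells(matrix, cluster)
print(cells, odds_ratio(*cells))
```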
Step 2: Maximizing Criterion
1) Find a cluster seed (3 actors who interact with each other, and with similar others)
2) Add to the cluster to maximize θ1 until you cannot do any more
3) Start new cluster with new seed
4) Shuffle between existing clusters
5) Make new clusters as necessary, dissolve existing ones as necessary.
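The "add to the cluster until you cannot do any more" idea in the steps above can be sketched as a simple hill climb. This is a simplified illustration rather than the KliqueFinder algorithm: it starts from a given assignment, only moves actors between existing clusters, does not search for seeds or create and dissolve clusters, and adds 0.5 to each cell of the 2x2 table (an assumption made here to avoid division by zero).

```python
import math


def log_odds_ratio(matrix, cluster):
    """Log of (AD)/(BC), with 0.5 added to every cell (assumption)."""
    cells = {(same, tie): 0.5 for same in (False, True) for tie in (False, True)}
    n = len(matrix)
    for i in range(n):
        for j in range(n):
            if i != j:
                cells[(cluster[i] == cluster[j], matrix[i][j] != 0)] += 1
    return math.log(cells[(False, False)] * cells[(True, True)]
                    / (cells[(False, True)] * cells[(True, False)]))


def greedy_moves(matrix, cluster):
    """Repeatedly make the single reassignment that most increases the score."""
    cluster = list(cluster)
    best = log_odds_ratio(matrix, cluster)
    while True:
        best_move = None
        for actor in range(len(cluster)):
            for label in set(cluster):
                if label == cluster[actor]:
                    continue
                trial = cluster.copy()
                trial[actor] = label
                score = log_odds_ratio(matrix, trial)
                if score > best + 1e-12:
                    best, best_move = score, (actor, label)
        if best_move is None:      # no move increases the function: stop
            return cluster, best
        cluster[best_move[0]] = best_move[1]


# Made-up data: actor 3 starts in the wrong cluster.
matrix = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
]
final, score = greedy_moves(matrix, [0, 0, 0, 0, 1, 1])
print(final)
```

In this toy case the only improving move is to reassign actor 3, after which no further move raises the score.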
KliqueFinder Algorithm: Phase I
Find cluster seed of 2 or 3
Identify single move that most increases objective function θ1
Does move increase function?
Yes: reassign actor that makes best move
No
If assignment moves actor out of a cluster of 3, reassign remaining 2 to next best clusters
For finding best cluster seed:
1) can only choose from unaffiliated actors
2) Each actor can only be a seed once
Initialize: assign each actor to own cluster
Adding clusters
removing clusters
KliqueFinder Algorithm: Phases II and III
KliqueFinder in R
# https://github.com/jtbates/kliqfindr
# https://github.com/r-lib/devtools
### (if you cannot install the package, see the webpages above for more information)
# run the install.packages() lines below, then stop
install.packages("devtools")
install.packages("igraph")
install.packages("igraphdata")
install.packages("RColorBrewer")
# run the install_github() line, then stop
devtools::install_github("jtbates/kliqfindr")
# load the package
library(kliqfindr)
# set working directory
getwd()
setwd("C:/Users/kenfrank/OneDrive - Michigan State University/H Drive/my web page")
getwd()
# add your list file to your working directory
test <- winkliq_run('sem020.list')
## Note here: you do not need to read in the list file as a dataframe
test
## see the result after running Kliqfindr
head(test$place)
## see the column named as "actor" and "subgroup" - which actor is assigned to which subgroup
groupbykf <- test$place[,c("actor", "subgroup")]
groupbykf
groupbykf <- groupbykf[order(groupbykf$actor),]
groupbykf
## groupbykf contains cluster information based on kliqfinder algorithm
## groupbykf is sorted by node id
# now we want to use igraph to plot this network
library(igraph)
# read in the data from the previous example
ties <- read.csv("sem020.csv", header=TRUE, as.is=TRUE)
nodes <- read.csv("sem020_node.csv", header=TRUE, as.is=TRUE)
# examine the data
head(ties)
head(nodes)
# check whether there are duplicate rows in the tie and node data files
nrow(ties); nrow(unique(ties[,c("sender","receiver")]))
nrow(nodes); length(unique(nodes$node))
# turn the data into an igraph object
network <- graph_from_data_frame(d=ties, vertices=nodes, directed=TRUE)
# color nodes by the subgroup identified by kliqfindr;
# match on node id so the assignment does not depend on row order
V(network)$community <- groupbykf$subgroup[match(V(network)$name, groupbykf$actor)]
plot(network, edge.arrow.size=.2, vertex.color=V(network)$community,
     vertex.label=V(network)$fname, vertex.size=5*V(network)$attr1, layout=layout_with_fr)
Running KliqueFinder
Run Analysis
Data file
View Clusters Output
Blocked Network Data
N           Group And Actor Id
24          |AAAA|BBBBBB|CCCCCCCC|DDDDDD|
            |    |      |        |      |
            | 2 1|221  1|  11   2|111122|
Group     ID|7445|612214|98133560|796037|
------------+----+------+--------+------+
1 A        7|A213|......|........|...1..|
1 A       24|4A3.|......|.4......|......|
1 A        4|33A.|......|........|......|
1 A       15|433A|......|........|......|
------------+----+------+--------+------+
2 B       26|.2..|B443..|........|......|
2 B       21|.1..|4B....|...4....|....2.|
2 B       12|....|4.B...|........|......|
2 B        2|....|33.B..|........|...1..|
2 B        1|..3.|3..3B.|........|.3..2.|
2 B       14|....|....1B|........|......|
------------+----+------+--------+------+
3 C        9|....|......|C...3.33|.3....|
3 C        8|.4..|..4...|.C.4..4.|4.....|
3 C       11|....|......|33C.4.3.|..4...|
3 C       13|.4..|.4....|444C....|......|
3 C        3|3...|.4....|4.44C...|......|
3 C        5|.1..|.....4|3.2.3C..|......|
3 C        6|....|......|444..4C4|......|
3 C       20|....|......|3..3.44C|......|
------------+----+------+--------+------+
4 D       17|.1..|......|.1......|D.1...|
4 D       19|....|......|4.3.....|3D4...|
4 D       16|....|......|4..4...4|44D...|
4 D       10|..3.|...1..|........|...D3.|
4 D       23|....|.3....|........|.343D.|
4 D       27|.1..|.1....|........|.3..3D|
θ1 =1.1738
Step 3) Examine evidence of clusters
1) randomly redistribute relations
2) apply algorithm
3) record value of odds ratio and θ1
4) repeat 1000 times to generate distribution
5) use mean of distribution as baseline for comparison
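The baseline procedure in the list above can be sketched as follows. This is an illustration under stated assumptions: each actor's outgoing ties are shuffled among the other actors (so the number and strength of ties each actor sends is preserved, which is an assumption about how "randomly redistribute" is meant), and `fit` is a placeholder for "apply the clustering algorithm and return θ1". The stand-in `count_mutual` used in the demonstration is not a clustering algorithm.

```python
import random
import statistics


def redistribute_ties(matrix, rng):
    """Shuffle each actor's outgoing ties among the other actors."""
    n = len(matrix)
    shuffled = []
    for i, row in enumerate(matrix):
        values = [row[j] for j in range(n) if j != i]
        rng.shuffle(values)
        values.insert(i, 0)          # keep the diagonal empty
        shuffled.append(values)
    return shuffled


def baseline(matrix, fit, reps=1000, seed=1):
    """Apply `fit` to `reps` randomized networks; return the mean and all scores."""
    rng = random.Random(seed)
    scores = [fit(redistribute_ties(matrix, rng)) for _ in range(reps)]
    return statistics.mean(scores), scores


def count_mutual(matrix):
    """Stand-in statistic for the demo: number of reciprocated ties."""
    n = len(matrix)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if matrix[i][j] and matrix[j][i])


matrix = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 0],
]
mean_score, scores = baseline(matrix, count_mutual, reps=200)
print(mean_score, count_mutual(matrix))
```

The observed statistic is then compared with the mean of the simulated distribution, as in step 5.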
Randomly Redistributing Ties
Apply Algorithm to Random Data
θ1=.81822
Monte Carlo Sampling Distribution (video: 1:06:35-1:18:50)
Output in sampdist.dat
Columns: θ1 = log odds/2; odds ratio
Set up sampling. Remember to do the “new data” setup when done, to prepare for the next analysis
Indicate simulate data
Data can include weights
Code for Reading in Sample Distribution Data
SPSS
GET DATA
/TYPE=TXT
/FILE="C:\KLIQFIND\sampdist.dat"
/FIXCASE=1
/ARRANGEMENT=FIXED
/FIRSTCASE=1
/IMPORTCASE=ALL
/VARIABLES=
/1 theta1 0-29 F30.10
oddsratio 30-59 F30.10
samplesize 60-89 F30.10.
CACHE.
EXECUTE.
DATASET NAME DataSet9 WINDOW=FRONT.
DATASET ACTIVATE DataSet9.
GRAPH
/HISTOGRAM=theta1.
SAS
title "Sampling distribution for theta1";
data one;
infile "sampdist.dat" missover;
input theta1 odds1;
run;
proc univariate plot;
var theta1;
run;
Stata
*This command imports the data file
import delimited C:\KLIQFIND\sampdist.dat, delimiter(" ", asstring)
*These commands perform data management:
drop v1
rename v2 theta1
rename v3 oddsratio
rename v4 samplesize
*This command plots histogram for theta1:
hist theta1,freq
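For readers without SPSS, SAS or Stata, the same file can be read with a short Python sketch. It assumes the fixed layout given in the SPSS syntax above (three fields of 30 columns: theta1, odds ratio, sample size); the function names are hypothetical and the demonstration values are illustrative, not KliqueFinder output.

```python
def parse_sampdist(lines):
    """Parse fixed-width records: theta1, odds ratio, sample size (30 columns each)."""
    records = []
    for line in lines:
        if not line.strip():
            continue
        fields = [line[k:k + 30] for k in (0, 30, 60)]
        records.append(tuple(float(f) for f in fields))
    return records


def compare(observed_theta1, records):
    """Return the simulated mean of theta1 and the share of draws >= observed."""
    thetas = [r[0] for r in records]
    mean = sum(thetas) / len(thetas)
    share = sum(t >= observed_theta1 for t in thetas) / len(thetas)
    return mean, share


# Illustrative records in the same layout (not real output).
demo = [f"{t:30.10f}{o:30.10f}{n:30.10f}"
        for t, o, n in [(0.70, 4.06, 24), (0.81822, 5.14, 24),
                        (0.76, 4.57, 24), (0.85, 5.47, 24)]]
records = parse_sampdist(demo)
mean, share = compare(1.1738, records)
print(mean, share)
```

To use the real file, pass an open file handle instead of `demo`, for example `parse_sampdist(open(r"C:\KLIQFIND\sampdist.dat"))`.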
Comparison of Sampling Distributions
Distribution of θ1base From Application of the Algorithm to Data Simulated Without Regard for Cluster or Subgroup Membership
Observed value:
1.1738
Sampling Distribution Parameters
Edit simulation parameters.
First element is number of replications
Must keep # of reps in first 5 columns
Approximate p-value Based on Previous Simulations
PREDICTED THETA (1 base) BASED ON SIMULATIONS.
VALUE BASED ON UNWEIGHTED DATA.
0.76985
ESTIMATE OF THETA (1 subgroup processes)
0.40397 (total - predicted = evidence of groups): 1.1738 - 0.76985 ≈ 0.40397
THE TOTAL THETA1 IS:
1.1738
APPROXIMATE TEST OF CONCENTRATION OF TIES
WITHIN SUBGROUPS BASED ON
SIZE OF THETA1 subgroup processes:
THETA1    |        |
SUBGROUP  | APPROX | APPROX
PROCESSES | LRT    | P-VALUE
0.40        34.82    0.00
Reject null hypothesis of no clusters:
H0: θ1 subgroup processes = 0
Step 4) Evaluating the Performance of the Algorithm : Did the Algorithm Recover the Correct Clusters?
Output for Recovery of clusters
PREDICTED ACCURACY: LOG ODDS OF COMMON SUBGROUP
MEMBERSHIP, + OR - .5734 (FOR A 95% CI)
1.4989
The Log odds applies to the following table:
                OBSERVED SUBGROUP
              DIFFERENT   SAME
             ___________________
            |         |         |
DIFFERENT   |    A    |    B    |
KNOWN       |         |         |
SUBGROUP    |---------|---------|
            |         |         |
SAME        |    C    |    D    |
            |         |         |
             -------------------
THE LOGODDS TRANSLATES TO AN ODDS RATIO OF
4.4766
WHICH INDICATES THE INCREASE IN THE ODDS
THAT KLIQUEFINDER WILL ASSIGN TWO ACTORS TO
THE SAME SUBGROUP IF THEY ARE TRULY IN THE
IN THE SAME SUBGROUP.
Specific accuracy for a given data set not known, results predicted from thousands of simulations – see next slide
Odds of Recovery (Toy Example)
Simulated data with known clusters (actors 1-3 and actors 4-6):
  | 1 | 2 | 3 | 4 | 5 | 6 |
1 |   | 1 | 1 | 0 | 1 | 0 |
2 | 1 |   | 0 | 0 | 0 | 0 |
3 | 1 | 1 |   | 0 | 0 | 1 |
4 | 0 | 1 | 1 |   | 1 | 1 |
5 | 0 | 0 | 0 | 0 |   | 1 |
6 | 1 | 0 | 0 | 1 | 1 |   |
Observed clusters identified by KliqueFinder on the same data (actor 4 is placed with actors 1-3):
                OBSERVED CLUSTER
              DIFFERENT   SAME
             ___________________
            |         |         |
DIFFERENT   |  A (6)  |  B (3)  |
KNOWN       |         |         |
CLUSTER     |---------|---------|
            |         |         |
SAME        |  C (2)  |  D (4)  |
            |         |         |
             -------------------
Misassignment of actor 4 contributes 3 to cell B and 2 to cell C
Cell D: 4 pairs correctly assigned to same cluster: (1,2; 1,3; 2,3; 5,6)
Cell A: 6 pairs correctly assigned to different clusters: (1,5; 2,5; 3,5; 1,6; 2,6; 3,6)
Odds of recovery = (AD)/(BC) = 6x4/(3x2) = 4.00
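The toy calculation above can be checked with a short script. The cluster memberships are inferred from the pairs listed for cells A to D (known clusters {1,2,3} and {4,5,6}; observed clusters {1,2,3,4} and {5,6}); the function name is hypothetical.

```python
from itertools import combinations


def recovery_table(known, observed):
    """Cross-classify every pair of actors by known and observed co-membership."""
    a = b = c = d = 0
    for i, j in combinations(sorted(known), 2):
        same_known = known[i] == known[j]
        same_observed = observed[i] == observed[j]
        if same_known and same_observed:
            d += 1      # correctly placed together
        elif same_known:
            c += 1      # truly together, observed apart
        elif same_observed:
            b += 1      # truly apart, observed together
        else:
            a += 1      # correctly kept apart
    return a, b, c, d


known = {1: "X", 2: "X", 3: "X", 4: "Y", 5: "Y", 6: "Y"}
observed = {1: "X", 2: "X", 3: "X", 4: "X", 5: "Y", 6: "Y"}
a, b, c, d = recovery_table(known, observed)
odds = (a * d) / (b * c)
print(a, b, c, d, odds)  # 6 3 2 4 4.0
```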
Make Sociogram in Netdraw (video: 1:01:00-1:06:22)
Sometimes Netdraw can’t find the file: retrieve it manually
Modifying Image in Netdraw
Removing Subgroup Nodes
Remove group nodes
N           Group And Actor Id
24          |AAAA|BBBBBB|CCCCCCCC|DDDDDD|
            |    |      |        |      |
            | 2 1|221  1|  11   2|111122|
Group     ID|7445|612214|98133560|796037|
------------+----+------+--------+------+
1 A        7|A213|......|........|...1..|
1 A       24|4A3.|......|.4......|......|
1 A        4|33A.|......|........|......|
1 A       15|433A|......|........|......|
------------+----+------+--------+------+
2 B       26|.2..|B443..|........|......|
2 B       21|.1..|4B....|...4....|....2.|
2 B       12|....|4.B...|........|......|
2 B        2|....|33.B..|........|...1..|
2 B        1|..3.|3..3B.|........|.3..2.|
2 B       14|....|....1B|........|......|
------------+----+------+--------+------+
3 C        9|....|......|C...3.33|.3....|
3 C        8|.4..|..4...|.C.4..4.|4.....|
3 C       11|....|......|33C.4.3.|..4...|
3 C       13|.4..|.4....|444C....|......|
3 C        3|3...|.4....|4.44C...|......|
3 C        5|.1..|.....4|3.2.3C..|......|
3 C        6|....|......|444..4C4|......|
3 C       20|....|......|3..3.44C|......|
------------+----+------+--------+------+
4 D       17|.1..|......|.1......|D.1...|
4 D       19|....|......|4.3.....|3D4...|
4 D       16|....|......|4..4...4|44D...|
4 D       10|..3.|...1..|........|...D3.|
4 D       23|....|.3....|........|.343D.|
4 D       27|.1..|.1....|........|.3..3D|
Density = 4/(4x8) = 1/8
KliqueFinder uses density = 4/(4x5) = .20 because the maximum number of nominations is 5
Data used for multidimensional scaling within subgroups:
distance = maximum value / cell entry
e.g., the maximum value is 4, so a tie of 2 gives 4/2 = 2, a distance of 2
DIRECT ASSOCIATIONS
GROUP       1      2      3      4
LABEL       A      B      C      D
N           4      6      8      6
GROUP
1        2.42   0.00   0.20   0.05
2        0.25   1.07   0.13   0.27
3        0.38   0.40   2.40   0.28
4        0.21   0.17   0.67   1.17
In xxxxxx.clusters
Distance in multidimensional scaling between subgroups = maximum value / density
cohesion
Structural similarity
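The DIRECT ASSOCIATIONS table can be reproduced from the blocked matrix with a short script. The rule used below is inferred from the slide's worked example (4/(4x5) = .20), not taken from KliqueFinder's source: each block's tie strengths are summed and divided by the number of senders times the number of possible receivers, with receivers capped at the maximum of 5 nominations and reduced by one within a group. The function name is hypothetical.

```python
SIZES = {"A": 4, "B": 6, "C": 8, "D": 6}
MAX_NOMINATIONS = 5  # from the slide: "maximum number of nominations is 5"

ROWS = [  # (sender's group, ties to receivers in blocks A|B|C|D)
    ("A", "A213|......|........|...1.."), ("A", "4A3.|......|.4......|......"),
    ("A", "33A.|......|........|......"), ("A", "433A|......|........|......"),
    ("B", ".2..|B443..|........|......"), ("B", ".1..|4B....|...4....|....2."),
    ("B", "....|4.B...|........|......"), ("B", "....|33.B..|........|...1.."),
    ("B", "..3.|3..3B.|........|.3..2."), ("B", "....|....1B|........|......"),
    ("C", "....|......|C...3.33|.3...."), ("C", ".4..|..4...|.C.4..4.|4....."),
    ("C", "....|......|33C.4.3.|..4..."), ("C", ".4..|.4....|444C....|......"),
    ("C", "3...|.4....|4.44C...|......"), ("C", ".1..|.....4|3.2.3C..|......"),
    ("C", "....|......|444..4C4|......"), ("C", "....|......|3..3.44C|......"),
    ("D", ".1..|......|.1......|D.1..."), ("D", "....|......|4.3.....|3D4..."),
    ("D", "....|......|4..4...4|44D..."), ("D", "..3.|...1..|........|...D3."),
    ("D", "....|.3....|........|.343D."), ("D", ".1..|.1....|........|.3..3D"),
]


def direct_associations(rows, sizes, max_nominations):
    """Sum tie strengths per block and scale by senders x possible receivers."""
    groups = list(sizes)
    totals = {(s, r): 0 for s in groups for r in groups}
    for sender_group, cells in rows:
        for receiver_group, block in zip(groups, cells.split("|")):
            # '.' (no tie) and the diagonal letter are skipped.
            totals[(sender_group, receiver_group)] += sum(
                int(ch) for ch in block if ch.isdigit())
    table = {}
    for (s, r), total in totals.items():
        receivers = sizes[r] - 1 if s == r else sizes[r]
        table[(s, r)] = total / (sizes[s] * min(receivers, max_nominations))
    return table


table = direct_associations(ROWS, SIZES, MAX_NOMINATIONS)
for s in SIZES:
    print(s, " ".join(f"{table[(s, r)]:.2f}" for r in SIZES))
```

With this rule every cell agrees with the table above to two decimals, for example 29/(4x3) = 2.42 within group A and 4/(4x5) = 0.20 from group A to group C.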
(video: 1:19:15-1:23:40)
Choosing lines: Groups
Confidentiality/Ethical Issues in Collecting Network Data (video: 1:23:41-1:28)
The SRI/KliqueFinder Solution to confidentiality: aggregate to subgroups
1) Provide information about who is in which cluster as well as information regarding the resources embedded in each cluster. Resources could be information, expertise, material resources, etc.
Benefit: reveals location of resources relative to social structure
Protection: does not reveal specific responses because all information is at the cluster level.
2) Provide each respondent with a unique sociogram indicating where that person is located (“you are here”). The figure does not include the lines from the sociogram, so respondents cannot infer others’ responses.
Benefit: Respondents then use this as a guide to individual behavior for identifying further resources or information.
Protection: Specific responses of others not revealed, so confidentiality preserved.
Choosing Lines: Actor Level Within
Choosing Lines: Actor Level
Remove group nodes
Choosing Lines: Actor Level Between
Choosing Lines: Group Level
Modifying the Image: Adding Node Data or Relations (video: 1:49:35-2:07:48)
http://www.analytictech.com/ucinet/download.htm
http://faculty.ucr.edu/~hanneman/nettext/C4_netdraw.html#data
Files for KliqueFinder
Input data:
  xxxxxx.list: network data
  xxxxxx.ilabel: node data
  xxxxxx.xnet: alternative network data
Parameters: Kliqfind.par, Printo, Simulate.par
Output:
  xxxxxx.clusters: diagnostics and matrix formatted data
  xxxxxx.vna: for Netdraw
  xxxxxx.place: data containing actor IDs and cluster placement
(Smacof1b MDS)
Modifying node data by Editing [datafile].vna
File is read by Netdraw. Copy relevant data into Excel, edit, and replace
*node data
id type group gender
"0A " 2 1 0
"0B " 2 2 0
"0C " 2 3 0
"0D " 2 4 0
1 1 2 1
2 1 2 2
*Node properties
ID x y color shape size shortlabel active
"0A " -2.01889 -15.04530 16777215 1 30 A TRUE
"0B " -9.41864 15.75047 16777215 1 85 B TRUE
"0C " 2.06574 2.09162 16777215 1 52 C TRUE
"0D " 8.54812 10.10988 16777215 1 79 D TRUE
1 -10.52314 14.16442 16711680 1 10 1 TRUE
2 -8.29999 13.27802 16711680 1 10 2 TRUE
*Tie data
from to any strength actor group between within technology
1 2 1 3 1 0 0
1 4 1 3 1 0 1
1 19 1 3 1 0 1
1 23 1 2 1 0 1
1 26 1 3 1 0 0
2 26 1 3 1 0 0
2 10 1 1 1 0 1
*Tie properties
FROM TO color size headcolor headsize active
"0A " "0B " 12632256 1 12632256 0 TRUE
"0A " "0C " 12632256 9 12632256 0 TRUE
1 2 0 3 0 8 TRUE
1 4 12632256 3 0 8 TRUE
Add new node variable here (e.g. gender)
then add data
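Instead of editing the file in Excel, the same change can be scripted. The sketch below appends one column to the `*node data` section of a VNA file held as a list of lines. The function name is hypothetical, the default value of 0 for nodes without data mirrors the group nodes ("0A ", "0B ", ...) in the slide, and other sections are passed through unchanged.

```python
import shlex


def add_node_attribute(vna_lines, name, values, default=0):
    """Append a column called `name` to the *node data section."""
    out, state = [], "outside"
    for line in vna_lines:
        stripped = line.strip()
        if stripped.startswith("*"):
            # A new section starts; only *node data is modified.
            state = "header" if stripped.lower() == "*node data" else "outside"
            out.append(line)
        elif state == "header":
            out.append(f"{line} {name}")       # extend the variable names
            state = "rows"
        elif state == "rows" and stripped:
            node_id = shlex.split(stripped)[0].strip()  # handles quoted IDs
            out.append(f"{line} {values.get(node_id, default)}")
        else:
            out.append(line)
    return out


vna = [
    "*node data",
    "id type group",
    '"0A " 2 1',
    "1 1 2",
    "2 1 2",
    "*Node properties",
    "ID x y color shape size shortlabel active",
]
updated = add_node_attribute(vna, "gender", {"1": 1, "2": 2})
print("\n".join(updated))
```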
Adding Node Attributes with Extra File
KliqueFinder will put attributes into vna file
File = xxxxxx.ilabel, where xxxxxx is the first 6 characters of your data file (for example, stanne.ilabel for stanne.list)
Cut and paste into stanne.ilabel:
1 Jacob 1 3 5
2 Stan 1 2 5
3 Linton 1 2 5
4 Charles 1 3 3
5 Mark 1 3 3
6 Tom 2 3 3
7 Ronald 2 3 5
8 Nan 2 1 3
9 Elizabeth 2 1 4
10 Barry 2 2 3
11 Martin 2 3 1
12 Steve 2 3 1
13 PeterC 2 1 5
14 Patrick 1 1 1
15 Katy 1 1 3
16 Kathleen 3 3 3
17 Ove 2 2 2
18 JamesC 5 5 5
19 Robert 4 4 4
20 JamesM 1 2 3 4
21 Noah 4 3 2 1
22 Marijtje 1 2 1 2
23 Ronald 2 1 2 1
24 Harrison 3 1 3 1
25 Duncan 4 1 4 1
10 columns for ID; Skip a space; Name; Node attribute 1-5
NA1 = gender
NA2=grade level
NA3=knowledge level
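A line in this layout can be generated as sketched below. The helper name is hypothetical. Right-aligning the ID in columns 1-10 follows the SAS statement `put id 1-10 @12 fname` used later in these slides; the exact column positions expected for the attributes are not stated in the slides, so single spaces are an assumption.

```python
def ilabel_line(actor_id, name, attributes):
    """ID in 10 columns, a space, the name, then up to 5 node attributes."""
    if " " in name:
        raise ValueError("this sketch assumes names without spaces")
    if len(attributes) > 5:
        raise ValueError("the slide describes node attributes 1-5")
    attrs = " ".join(str(a) for a in attributes)
    return f"{actor_id:>10} {name} {attrs}"


rows = [(1, "Jacob", [1, 3, 5]), (2, "Stan", [1, 2, 5])]
lines = [ilabel_line(*row) for row in rows]
print("\n".join(lines))
```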
Select all
Open notepad and paste
Interactive: adding node data
Include Node Data in Image
• Each number is a teacher
• G_ indicates grade in which teacher teaches
• Lines connecting two numbers indicate teachers who are close colleagues
Solid lines within subgroups, dashed between
• Circles indicate cohesive subgroups
Ripple Plot
Frank, K. A. and Zhao, Y. (2005). "Subgroups as a Meso-Level Entity in the Social Organization of Schools." Chapter 10, pages 279-318. Book honoring Charles Bidwell's retirement, edited by Larry Hedges and Barbara Schneider. New York: Sage publications.
Crystallized Sociogram
• Each number is a teacher
• Shape indicates grade in which teacher teaches
• Lines connecting two numbers indicate teachers who are close colleagues
Solid lines within subgroups, dashed between
• Circles indicate cohesive subgroups
Distance between A and B reflects history of school: SES integration
Files for Making the Ripple Plot
Data files for ripple plot: download to c:\kliqfind, then run KliqueFinder on ripple.list
KliqueFinder for Ripple Plot
Input data:
  ripple.list: network data (close colleagues)
  ripple.ilabel: node data
  ripple.xnet: alternative network data (help with tech)
Parameters: Kliqfind.par, Printo, Simulate.par
Output:
  ripple.clusters: diagnostics and matrix formatted data
  ripple.vna: for Netdraw
  ripple.place: data containing actor IDs and subgroup placement
KliqueFinder Run: Ripple Plot
Turn these off
expand
Shape nodes by grade level
Change lines
Size nodes by initial expertise
Add ripples by hand ☹
Modifying Links by Editing [datafile].vna
File is read by Netdraw. Copy relevant data into Excel, edit, and replace
*node data
id type group gender
"0A " 2 1 0
"0B " 2 2 0
"0C " 2 3 0
"0D " 2 4 0
1 1 2 1
2 1 2 2
*Node properties
ID x y color shape size shortlabel active
"0A " -2.01889 -15.04530 16777215 1 30 A TRUE
"0B " -9.41864 15.75047 16777215 1 85 B TRUE
"0C " 2.06574 2.09162 16777215 1 52 C TRUE
"0D " 8.54812 10.10988 16777215 1 79 D TRUE
1 -10.52314 14.16442 16711680 1 10 1 TRUE
2 -8.29999 13.27802 16711680 1 10 2 TRUE
*Tie data
from to any strength actor group between within technology
1 2 1 3 1 0 0
1 4 1 3 1 0 1
1 19 1 3 1 0 1
1 23 1 2 1 0 1
1 26 1 3 1 0 0
2 26 1 3 1 0 0
2 10 1 1 1 0 1
*Tie properties
FROM TO color size headcolor headsize active
"0A " "0B " 12632256 1 12632256 0 TRUE
"0A " "0C " 12632256 9 12632256 0 TRUE
1 2 0 3 0 8 TRUE
1 4 12632256 3 0 8 TRUE
Add new node variable here (e.g. gender)
then add data
Add new relation here (e.g. technology)
then add data
Modifying Links with Extra File
KliqueFinder will put attributes into vna file
File = xxxxxx.xnet, where xxxxxx is the first 6 characters of your data file (for example, stanne.xnet for stanne.list)
File containing extra network (nominator, nominee, strength of tie):
1 2 4
19 15 3
22 26 1
Modifying Links: Interactive – Finicky
Interactive Modifying Links
Two mode
*Field, S., *Frank, K.A., Schiller, K., Riegle-Crumb, C., and Muller, C. 2006. “Identifying Social Contexts in Affiliation Networks: Preserving the Duality of People and Events.” Social Networks 28:97-123. (* co-first authors.)
Data source (video: 1:39:25-1:49:35)
Copy homact.list from c:\kliqfind\setups to c:\kliqfind
Example two-mode data
Two-mode Data
Actor 1 participates in event 19 at a level of 1
Extent of relation can be binary or weighted
Edgelist
The first two rows do not appear in the data; they are there to show the format: 10 spaces for each entry
New version of KliqueFinder is more flexible: about 10 column widths
IDs should be 6 digits or fewer
Two mode: Clusters output
Blocked Two-Mode Network Data
Two-mode Crystallized Sociogram
Two mode: actors and events
One-Mode Projection vs Two Mode data
Centralization & Centrality in KliqueFinder
Running on a Large Data File (more than 1000 actors)
If you start the program and it just sits there, it is looking for the best seed for the first cluster.
A seed is 3 actors, but the program looks at all combinations of 3 that share common ties in the network.
This is intensive, and unnecessary for large data (the first cluster does not matter so much).
To shortcut: change the value from 1 to 2, then save and run.
Software Challenge (video: 2:07:57-2:08:15)
KliqueFinder Applications: Adding Individual Attributes in SAS:
run KliqueFinder
data file collt1.list
make graph
use ID from other file? Yes:
sas file name: c:\kliqfind\indiv
[be sure to include full path]
id variable: nominator
string variable: gradelev
Save
In sas, run socgramz in the working directory
KliqueFinder Applications: Adding Individual Attributes
Choosing an ID Variable
With ID based on Grade
KliqueFinder Applications: Replacing Lines
run KliqueFinder
data file collt1.list
make graph
save
retrieve socgramz.sas in the working directory
replace all occurrences of collt1.list with collt2.list
run
Opening socgramz.sas
Changing lines
Change lines to different source
New Lines based on Collt2
Batch KliqueFinder
Basics
Files
File containing names of data files: testb.txt
Data file: stanne.list
Data file: ffe.list
BACK UP YOUR DATA FIRST!
KliqueFinder
Running Batch Mode
File with names of data files
Click here to run as batch
BACK UP DATA FILES BEFORE RUNNING!
Prepping data in Excel (video: 1:28-1:39)
Name your file xxxxxx.list, e.g., test01.list
Right click
Choose Formatted text (space delimited)
Prepping Data in UCINET
Navigate to where you want to save: c:\kliqfind
Navigate to UCINET data
Must remove “!” from the file. There may be several; the “!” marks are there because of multiple data sets.
Converting data using SAS (video: 2:10:43-2:19)
data one;
infile "badform.list";
input chooser chosen wt;
data two; set one;
file "ready1.list";
if wt ne . then put (chooser chosen wt) (10.);
run;
A Priori Clusters
A line with 99999 in the data file indicates in which a priori
cluster an actor is placed.
For example, actor 1 is in a priori cluster 3.
Run repeat2 setup, and then proceed as usual.
Remember to do “new data” setup when done.
KliqueFinder will make pictures
based on a priori clusters
Comparison of A Priori Clusters and Identified Solution
Data with a priori cluster assignments
Run as new data
Run as usual, then look at cluster output
SIMILARITY BETWEEN THE START AND END GROUPS:
ACTUAL   POSS   STANDARDIZED
   52.    88.        9.55565
QAP standardized measure; compare with normal distribution
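Comparing the standardized measure with the normal distribution amounts to looking up an upper-tail probability. A minimal sketch using only the standard library follows; treating the test as one-sided is an assumption made here, and the function name is hypothetical.

```python
import math


def upper_tail_p(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


z = 9.55565  # standardized similarity from the output above
p = upper_tail_p(z)
print(p)
```

A value this far in the tail indicates that the a priori clusters and the identified solution agree far more than expected by chance.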
Data Containing Cluster Assignments
1.0 1.0 2.0 1.0 3.0
2.0 2.0 2.0 1.0 3.0
3.0 4.0 1.0 1.0 3.0
4.0 19.0 4.0 1.0 3.0
5.0 23.0 4.0 1.0 3.0
6.0 26.0 2.0 1.0 3.0
17.0 6.0 3.0 1.0 3.0
18.0 8.0 3.0 1.0 3.0
19.0 20.0 3.0 1.0 3.0
20.0 15.0 1.0 1.0 3.0
21.0 12.0 2.0 1.0 3.0
22.0 17.0 4.0 1.0 3.0
23.0 16.0 4.0 1.0 3.0
24.0 27.0 4.0 1.0 3.0
-27.0 28.0 4.0 1.0 3.0
File called stanne.place [datafile.place]
Internal ID User ID Cluster ignore: for simulation only
If first number (internal ID) is negative, this indicates a tagalong –
an actor connected to only one other.
In this case, the last line should be read as the tagee, tagger, and group.
So, actor 28 is connected to only one other actor (27) and is
therefore assigned to actor 27’s cluster, which is cluster 4.
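The reading rule for the .place file, including tagalongs, can be sketched as follows. The function name is hypothetical, and only the first three columns are used, as the remaining columns are described as being for simulation only.

```python
def parse_place(lines):
    """Return ({user ID: cluster}, {tagalong ID: actor it is attached to})."""
    cluster_of, tagalongs = {}, {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        first, second, cluster = (int(float(x)) for x in fields[:3])
        if first < 0:
            # Negative first number: read the line as tagee, tagger, group.
            tagalongs[second] = -first
        cluster_of[second] = cluster
    return cluster_of, tagalongs


place = [
    "23.0 16.0 4.0 1.0 3.0",
    "24.0 27.0 4.0 1.0 3.0",
    "-27.0 28.0 4.0 1.0 3.0",
]
cluster_of, tagalongs = parse_place(place)
print(cluster_of, tagalongs)
```

Here actor 28 is recorded as a tagalong of actor 27 and inherits cluster 4.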
There may be slightly different numeric formats depending on the version of KliqueFinder
Including Cluster Membership in Influence Model
SPSS
DATA LIST / intid 1-10 nominee 11-20 cluster 21-30 simx 31-40 extra 41-50.
BEGIN DATA
1.0 1.0 1.0 1.0 3.0
2.0 2.0 1.0 1.0 3.0
3.0 3.0 1.0 1.0 3.0
4.0 4.0 2.0 1.0 3.0
5.0 5.0 2.0 1.0 3.0
6.0 6.0 2.0 1.0 3.0
END DATA.
DATASET NAME clusters WINDOW=FRONT.
SORT CASES BY nominee(A).
EXECUTE.
MATCH FILES /FILE=yvar1
/FILE='indeg'
/FILE=clusters
/BY nominee.
EXECUTE.
SAS
data clusters;
*groups from KliqueFinder;
input intid nominator cluster simx extra;
cards;
1.0 1.0 1.0 1.0 3.0
2.0 2.0 1.0 1.0 3.0
3.0 3.0 1.0 1.0 3.0
4.0 4.0 2.0 1.0 3.0
5.0 5.0 2.0 1.0 3.0
6.0 6.0 2.0 1.0 3.0
;
run;
proc sort data=clusters;
by nominator;
run;
data withinfl;
merge yvar2 yvar1 infl expanse clusters attract(rename=(nominee=nominator));
by nominator;
drop nominee _type_ _freq_;
run;
Advanced:
run influence model for technology
identify clusters from talkt2
include cluster membership in the influence model
Simulating data in KF
2 3 3 6 6 20 20 5 5 1 1 4161 0 50 50 20 20 0 0 0 1 1 1 1 1 50 50 200 200 3 3 6 6 1 1 0 0
12345123451234512345123451234512345123451234512345123451234512345123451234512345123451234512345123451234512345123451234512345123451234512345123451234512345123451234512345123451234512345
numsim,numgrou1,numgrou2,pergrou1,pergrou2,numact1,numact2,maxconn1,maxconn2,maxwt1,maxwt2,sseed,usemarg,indept1,indept2,outdept1,outdept2,regsim,bydense,simrand,obasep,basep2,orangep,rangep2,compsim,OMAXPWW,MAXPWW,OMAXPWB,MAXPWB,OBASEWP,BASEWP2,OWRANGEP,WRANGEP2,xobasep,xbasep2,xorangep,xrangep2
numsim=the number of simulated data sets to create
numgrou1=beginning number of a priori groups to which actors are assigned
numgrou2=ending number of a priori groups to which actors are assigned
pergrou1=beginning number of actors per group (must be < or = to numact1/numgrou1)
pergrou2=ending number of actors per group (must be < or = to numact2/numgrou2)
numact1=beginning number of actors
numact2=ending number of actors
maxconn1=beginning: maximum number of connections an actor can initiate
maxconn2=ending: maximum number of connections an actor can initiate
maxwt1=beginning: maximum weight which can be assigned to a connection
maxwt2=ending: maximum weight which can be assigned to a connection
sseed=seed for generating first set of random data (subsequent seeds derived from this seed) hardcoded at the moment: 416151632
usemarg=1 if you want to use original marginals to simulate new data
indept1=beginning rate of within group exchanges
indept2=ending rate of within group exchanges
outdept1=beginning rate of outside group exchanges
outdept2=ending rate of outside group exchanges
regsim=1 if using regular simulation, =0 if using rates
bydense=1 if using density, =0 if using proportion of exchanges which fall within groups
simrand=1 if parameters are random, =0 if fixed
compsim=way of initiating subgroups in regular application of KF
Density of within group ties
Density of between group ties
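Pairing the parameter line with its names makes the settings easier to check before a run. The sketch below splits the line on whitespace, which is an assumption: the ruler line under the parameters suggests the file itself uses fixed 5-column fields. The helper names are hypothetical.

```python
NAMES = (
    "numsim,numgrou1,numgrou2,pergrou1,pergrou2,numact1,numact2,maxconn1,"
    "maxconn2,maxwt1,maxwt2,sseed,usemarg,indept1,indept2,outdept1,outdept2,"
    "regsim,bydense,simrand,obasep,basep2,orangep,rangep2,compsim,OMAXPWW,"
    "MAXPWW,OMAXPWB,MAXPWB,OBASEWP,BASEWP2,OWRANGEP,WRANGEP2,xobasep,xbasep2,"
    "xorangep,xrangep2"
).split(",")

LINE = ("2 3 3 6 6 20 20 5 5 1 1 4161 0 50 50 20 20 0 0 0 1 1 1 1 1 "
        "50 50 200 200 3 3 6 6 1 1 0 0")


def parse_parameters(names, line):
    """Pair each value on the parameter line with its name."""
    values = [int(v) for v in line.split()]
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} values, found {len(values)}")
    return dict(zip(names, values))


def group_sizes_ok(p):
    """Check the stated constraint: actors per group <= actors / groups."""
    return (p["pergrou1"] <= p["numact1"] / p["numgrou1"]
            and p["pergrou2"] <= p["numact2"] / p["numgrou2"])


params = parse_parameters(NAMES, LINE)
print(params["numsim"], params["numact1"], group_sizes_ok(params))
```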
Creating data for KF
data one;
infile datalines missover;
input lname $ fname $;
datalines;
run;
data two;
set one;
if lname ne "";
id=_n_;
proc sort;
by id;
data eight; set two;
file "c:\kliqfind\cep016.ilabel";
put id 1-10 @12 fname ;
run;
data three;
infile datalines missover;
input id chosen wt;
*generate data using:
simulate for making group data;
datalines;
run;
proc sort;
by id;
run;
data eightb; set three;
file "c:\kliqfind\cep016.list";
if id ne .;
put (id chosen wt) (10.);
run;
data four;
attrib choname length=$15;
merge two three;
by id;
chooseer=id;
choname=fname;
id=chosen;
drop fname lname;
run;
proc sort;
by id;
data five;
attrib fname length=$15.;
merge four two;
by id;
chooser=id;
if wt ne .;
proc sort data=five out=six;
by choname;
run;
data seven; set six;
by choname;
if choname ne "";
if first.choname then put choname ": " @;
put fname ", " @;
if last.choname then put;
run;
Adding Patches
Patch for two-mode
Patch for one-mode
Alternative community detection algorithms
Scenarios for the Network analyst
For each of the scenarios below,
identify the theoretical processes at work
write down what model or tool you would employ to evaluate the theory.
describe what data you would collect to apply the model or tool to
describe what estimation procedure/tool you would use.
Sally is concerned that her daughter is experimenting with alcohol and thinks it is because her daughter’s friends are experimenting. Sally wonders generally if adolescents tend to drink more if their friends drink alcohol.
Michael wants to understand the social structure of his synagogue (church). He has an idea that there are certain sets of people who interact with each other, and, if he could understand what those sets of people are, he might better be able to tailor programs of the synagogue to be more effective.
How could Michael use the information above to track the diffusion of new beliefs or behaviors in his synagogue?
Pennie wants to know under what conditions one social service agency would allocate resources to another. Is it because they have a history of doing so, they share clients, they deal with similar issues, etc.
What clustering among social service agencies might emerge as a result of the processes above?
Core periphery alternative structure
Reflection