1 of 44

ERGM et al

MGT 780 SPRING 2022

STEVE BORGATTI

(C) 2022 STEPHEN P BORGATTI

1

25-Apr-22

26 APR

2 of 44

Agenda

  • Problem statement
  • ERGM
  • Relation to QAP
  • Overview of other stochastic models


3 of 44

Problem statement

  • Model structure of networks

  • Presumably related to micro mechanisms governing tie formation


4 of 44

Tie dependencies

  • Some of these mechanisms can be expressed as tie dependencies
    • Presence/absence of one tie affects probability of another tie
  • Reciprocity effect means that when i🡪j, there is an increased chance that i🡨j
    • In short, prob(xji | xij) > prob(xji)
    • There are more mutual dyads than expected by chance
  • Transitivity effect means that the presence of i🡪j and j🡪k increases the chance of i🡪k
    • More triangles than expected
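
A minimal sketch of the reciprocity comparison in base R. The 4-node directed network is made up for illustration, and the variable names are hypothetical.

```r
# Hypothetical directed network: rows send ties, columns receive them
x <- matrix(c(0, 1, 1, 0,
              1, 0, 0, 1,
              0, 0, 0, 1,
              0, 1, 1, 0), nrow = 4, byrow = TRUE)

off <- row(x) != col(x)                   # ignore the diagonal

# prob(xji): chance that any ordered pair has a tie
p_tie <- mean(x[off])

# prob(xji | xij): chance of a return tie, given that i -> j exists
p_return <- mean(t(x)[off & x == 1])

c(p_tie = p_tie, p_return = p_return)
```

In this toy network the conditional probability is larger than the unconditional one, which is the pattern a reciprocity effect describes.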


5 of 44

Node attributes & dyadic covariates

  • Node attributes can also be incorporated
  • Probability of a tie involving node j is higher to the extent that node j is wealthy
  • Probability of a tie between two nodes is higher if they have the same gender
  • Probability of a tie between two nodes is higher if they are physically closer together


6 of 44

Model assumptions

  • We assume the observed network is the result of a set of social processes that favor the formation of certain ties and hinder the formation of others.
    • The process maximized an unseen objective function by forming / not forming certain ties
  • The objective function consists of a set of parameters that indicate strength and direction of certain tendencies, e.g.,
    • A mild tendency toward mutual (reciprocated) dyads
    • A strong tendency toward closed triangles


7 of 44

Evolutionary process

  • Suppose the objective function has
    • A strong preference for closed triangles
    • A general dislike for forming ties (keep density low)
  • Consider a possible tie between Holly & John vs Holly & Bill
    • Holly-John creates no (closed) triangles
    • Holly-Bill creates 3 triangles
    • The selection process will prefer the Holly-Bill tie because it maximizes the objective function
  • If the objective function has a negative parameter for triangles, the Holly-John tie becomes the more probable tie
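
A rough sketch of this comparison in base R. The network is hypothetical (the nodes n1 to n3 and all ties are made up so that Holly-Bill closes 3 triangles and Holly-John closes none), and the parameter values are assumptions chosen only to show the sign flip.

```r
# Hypothetical undirected network; ties are made up for illustration
nodes <- c("holly", "john", "bill", "n1", "n2", "n3")
x <- matrix(0, 6, 6, dimnames = list(nodes, nodes))
for (k in c("n1", "n2", "n3")) {
  x["holly", k] <- x[k, "holly"] <- 1
  x["bill", k] <- x[k, "bill"] <- 1
}

# Adding tie i-j closes one triangle per neighbor that i and j share
delta_triangles <- function(x, i, j) sum(x[i, ] * x[j, ])

# Change in the objective function from adding tie i-j
gain <- function(x, i, j, b_edges, b_triangle) {
  b_edges * 1 + b_triangle * delta_triangles(x, i, j)
}

# Positive triangle parameter (assumed values): Holly-Bill is preferred
gain_pos <- c(bill = gain(x, "holly", "bill", -1, 0.5),
              john = gain(x, "holly", "john", -1, 0.5))

# Negative triangle parameter: Holly-John is preferred
gain_neg <- c(bill = gain(x, "holly", "bill", -1, -0.5),
              john = gain(x, "holly", "john", -1, -0.5))

names(which.max(gain_pos))
names(which.max(gain_neg))
```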


8 of 44

Formal model


9 of 44

What is ERGM?

  •  
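
For reference, a sketch of the usual exponential-family form of the model, written with the b coefficients used on later slides. The symbols g_k (graph statistics such as counts of edges or triangles) and κ (the normalizing constant) are notation assumed here.

```latex
P(Y = y) = \frac{1}{\kappa} \exp\Big( \sum_{k} b_k \, g_k(y) \Big),
\qquad
\kappa = \sum_{y'} \exp\Big( \sum_{k} b_k \, g_k(y') \Big)
```

The sum in κ runs over every possible network y' on the same nodes, which is why it cannot be computed directly for networks of realistic size.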


10 of 44

Converting problem to a logistic regression

  •  


Thanks to Filip Agneessens

 

Can cross out the Ks

11 of 44

  •  


Change in graph statistics that result from tie present vs absent

12 of 44

Odds <- change statistics

  • Take the natural log of this equation to remove the exponential, giving the logit

  • Which results in a “simple” logistic regression equation
    • (or would be if the cases were independent, i.e. not “conditional on the rest of the graph, except for the parts included in the model”)
    • where Δ is the difference between graph statistics (“change statistics”) caused by adding a single tie
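
The steps from probability to odds to logit can be sketched as follows. The notation is assumed for illustration: y⁺ and y⁻ are the same network with the tie i-j present and absent, g_k are the graph statistics, and κ is the normalizing constant, which cancels in the ratio.

```latex
\frac{P(X_{ij} = 1 \mid \text{rest})}{P(X_{ij} = 0 \mid \text{rest})}
  = \frac{\exp\big(\sum_k b_k\, g_k(y^{+})\big) / \kappa}
         {\exp\big(\sum_k b_k\, g_k(y^{-})\big) / \kappa}
  = \exp\Big(\sum_k b_k\, \Delta_k\Big),
\qquad
\Delta_k = g_k(y^{+}) - g_k(y^{-})

\operatorname{logit} P(X_{ij} = 1 \mid \text{rest}) = \sum_k b_k\, \Delta_k
```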


The “logit” of a probability is the log of the associated odds

13 of 44

Logits to probabilities

  • The logit of a probability is the natural log of the associated odds
  • The odds of an event = the probability of it happening divided by probability it doesn’t happen
    • Odds = p/(1 – p)
  • Can convert odds to probabilities:
    • Multiply both sides by 1 - p
      • odds*1 - odds*p = p
    • Add odds*p to both sides
      • odds = p + odds*p = p(1 + odds)
    • Divide both sides by 1 + odds
      • odds/(1 + odds) = p
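
The conversions above can be sketched as small R helpers. The function names are hypothetical; base R's qlogis() and plogis() do the same logit and inverse-logit conversions.

```r
# Probability <-> odds <-> logit, following the algebra on this slide
odds_from_prob <- function(p) p / (1 - p)
prob_from_odds <- function(odds) odds / (1 + odds)
logit <- function(p) log(odds_from_prob(p))
inv_logit <- function(l) prob_from_odds(exp(l))

p <- 0.2
odds <- odds_from_prob(p)      # 0.2 / 0.8
back <- prob_from_odds(odds)   # recovers the original probability

# The hand-written helpers agree with base R
all.equal(logit(p), qlogis(p))
all.equal(inv_logit(-1.609), plogis(-1.609))
```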


So, a model written in terms of odds can ultimately be converted back to probabilities

14 of 44

Interpreting parameters

  • Simplest possible ERGM:
    • Logit(xij = 1) = b0*(Δ edges)
  • Edges is number of edges in network
  • Δ edges is change in number of edges when tie is present vs absent
    • Obviously, Δ edges is always 1
  • So, model says log odds of a tie = b0*1
    • Odds = exp(b0)
    • Probability of a tie = exp(b0)/(1 + exp(b0)) = density of the network


15 of 44

Simplest ERGM

  • b0 = -1.609, so the log odds of a tie = -1.609*1
  • Odds of a tie = exp(-1.609) = 0.200088
  • Prob of a tie is odds/(1+odds) = .2/(1+.2) = 0.1667
    • which is the density of padgm
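
A quick check of the link between b0 and density, without fitting a model. The counts below (16 nodes, 20 undirected ties) are an assumption about the padgm marriage network; they are chosen because they reproduce the 0.1667 density on this slide.

```r
# Assumed size of the network: 16 nodes, 20 undirected ties
n_nodes <- 16
n_ties <- 20

density <- n_ties / choose(n_nodes, 2)   # ties / possible pairs
b0 <- qlogis(density)                    # log odds implied by the density

round(c(density = density, b0 = b0), 4)

# Going the other way: the reported estimate implies the density
plogis(-1.609)
```

For an edges-only model the estimate carries no information beyond the density, which is why b0 is usually treated as an intercept.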


        Estimate   Std. Error   MCMC %   p-value
edges   -1.609     0.245        NA       <1e-04 ***

Using padgm data

Logit(Xij =1) = b0(Δedges)

16 of 44

Adding closure parameter

  • Logit(Xij =1) = b0(Δedges) + b1(Δtriangles)
  • b0 is like an intercept
    • Not normally interpreted
    • ~ density of network when no other parameters present
    • Typically negative
  • Parameter b1 is the increase in the log odds of a tie brought about by an increase of 1 triangle in the network due to adding a tie


17 of 44

Padgm results

  • For a tie that will create
    • 0 triangles, such as a tie between a and b
      • the conditional log-odds is: -1.76, and probability of a tie is 0.146
    • 1 triangle, such as a tie between b and d
      • the logit is: -1.76 + 0.091 = -1.67, and probability of a tie is 0.158
    • 2 triangles, such as a tie between b and c
      • the logit is -1.76 +0.091*2= -1.58 (prob of tie is 0.17)
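
The three cases above can be reproduced in a few lines of R, using the full-precision estimates from the output on this slide (edges -1.76367, triangle 0.09115). The variable names are hypothetical.

```r
# Estimates from the fitted edges + triangle model on this slide
b_edges <- -1.76367
b_triangle <- 0.09115

new_triangles <- 0:2                       # triangles the tie would create
logit <- b_edges * 1 + b_triangle * new_triangles
prob <- plogis(logit)                      # logit -> probability

round(data.frame(new_triangles, logit, prob), 3)
```

Each extra triangle raises the probability only slightly, consistent with the small, non-significant triangle estimate.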


[Figure: example network with nodes a, b, c, d]

Logit(Xij =1) = b0(Δedges) + b1(Δtriangles)

The conditional log-odds of two actors having a tie is:

-1.764*(change in the number of ties) + 0.091*(change in number of triangles)

 

 

           Estimate   Std. Error   MCMC %   z value   Pr(>|z|)
edges      -1.76367   0.33136      0        -5.323    <0.0001 ***
triangle    0.09115   0.15760      0         0.578    0.563   NS

---

Note: your results will vary

18 of 44

Degree variance (centralization)

  • 2-stars
    • Nodes with high degree create many 2-stars

  • A network with low degree variance will have negative 2-star parameter
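
A small sketch of why 2-stars track degree variance: a node of degree d sits at the center of choose(d, 2) 2-stars. The two degree sequences below are hypothetical and have the same total degree.

```r
# Number of 2-stars implied by a degree sequence
two_stars <- function(deg) sum(choose(deg, 2))

even <- c(2, 2, 2, 2, 2, 2)   # every node has the same degree
skew <- c(5, 2, 2, 1, 1, 1)   # one hub, same total degree

c(var_even = var(even), stars_even = two_stars(even),
  var_skew = var(skew), stars_skew = two_stars(skew))
```

With the number of ties held fixed, the centralized sequence produces more 2-stars, so a negative 2-star parameter pushes toward even degrees.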


19 of 44


Parameter (name of ergm term): Interpretation

  • Arc (edges): A baseline propensity for tie formation
  • Reciprocity (mutual): Often positive, suggesting that reciprocated ties are very likely to be observed in positive affect networks
  • Simple connectivity (twopath): Measures the extent to which actors who send ties also receive them. It controls for the correlation between in- and out-degree. It is often negative.
  • Popularity spread (gwidegree): Indicates a network with high in-degree nodes (i.e., centralized); gwidegree is interpreted in the opposite direction
  • Activity spread (gwodegree): Indicates a network with high out-degree nodes (i.e., centralized); gwodegree is interpreted in the opposite direction
  • Triangulation (gwesp): A positive effect indicates there is a high degree of closure, or multiple clusters of triangles in the data
  • Cyclic closure (ctriple): A negative effect indicates tendencies against cyclic triads (sometimes this is interpreted as a tendency against generalized exchange or generalized reciprocity)
  • Multiple connectivity (gwdsp): 2-paths in the network. A negative estimate in conjunction with positive triangulation indicates that 2-paths tend to be closed (i.e., triangles)

20 of 44

Directed effects


21 of 44

Exogenous effects

  • Here, exogenous means ‘not calculated from the network itself’
  • Node attributes
    • Wealthier nodes might have more ties
    • Women may have more ties
  • Similarities
    • Nodes with similar wealth (smaller diff) more likely to have tie
    • If nodes are the same gender, they may be more likely to have a tie


22 of 44

Adding a node characteristic (exogenous var)

  • We have the wealth of each node, and believe wealthier nodes are more likely to connect
  • Logit(Xij =1) = b0(Δedges) + b1(wealth of pair)


                 Estimate   Std. Error   MCMC %   p-value
edges            -2.59493   0.53606      NA       <1e-04 ***
nodecov.wealth    0.01055   0.00467      NA       0.026  *

23 of 44

Adding a node characteristic (exogenous var)

  • Log odds of a tie is -2.59*{change in the number of ties} + 0.01*{wealth of node i} + 0.01*{wealth of node j}
  • For a tie between two nodes with minimum wealth:
    • the conditional log-odds is -2.59 + 0.01*(3+3) = -2.53; prob of a tie is 0.07
  • For a tie between two nodes with maximum wealth:
    • -2.59 + 0.01*(146+146) = 0.33; prob of a tie is 0.58
  • For a tie between the node with maximum wealth and the node with minimum wealth:
    • -2.59 + 0.01*(146+3) = -1.10; prob of a tie is 0.25
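
The same arithmetic as a short R sketch. It uses the rounded coefficients from this slide (-2.59 and 0.01) so that the results match; with the full-precision estimates the probabilities come out somewhat different. The function name tie_prob is hypothetical.

```r
# Rounded estimates, as used in the worked examples on this slide
b_edges <- -2.59
b_wealth <- 0.01

# nodecov adds the wealth of both endpoints to the log odds
tie_prob <- function(wealth_i, wealth_j) {
  plogis(b_edges + b_wealth * (wealth_i + wealth_j))
}

w_min <- 3
w_max <- 146

probs <- c(min_min = tie_prob(w_min, w_min),
           max_max = tie_prob(w_max, w_max),
           max_min = tie_prob(w_max, w_min))
round(probs, 2)
```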


24 of 44

ERGM Goodness of fit

  •  


25 of 44

ERGM Goodness of fit – cont.

  • We use two sets of statistics
  • First, all statistics actually used in the model
    • # of triangles, # of reciprocated ties, # of cycles, # of 2-paths etc
    • The simulated networks should reproduce these statistics very well
  • Second, additional statistics not explicitly included in the model
    • Number of nodes with degree 1, 2, 3 … etc
    • Number of pairs of nodes with distance 1, 2, 3 … etc
    • Simulated networks should reproduce these fairly well
  • Note that with only endogenous parameters, ERGMs do not predict specific ties at all well
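
A base-R sketch of the simulation logic, under strong simplifying assumptions: the "observed" network is hypothetical (three 4-node cliques), and the fitted model is edges-only, which amounts to drawing each tie independently with probability equal to the density. Triangles are a statistic not included in that model.

```r
set.seed(1)

# Triangles in an undirected network: trace(A^3) / 6
count_triangles <- function(x) sum(diag(x %*% x %*% x)) / 6

# Draw a network in which every pair is tied with probability p
simulate_bernoulli <- function(n, p) {
  x <- matrix(0, n, n)
  x[upper.tri(x)] <- rbinom(n * (n - 1) / 2, 1, p)
  x + t(x)
}

# Hypothetical observed network: three separate 4-node cliques
n <- 12
obs <- matrix(0, n, n)
for (g in list(1:4, 5:8, 9:12)) obs[g, g] <- 1
diag(obs) <- 0

p_hat <- sum(obs) / (n * (n - 1))        # density = fitted tie probability
obs_tri <- count_triangles(obs)

# Triangle counts in 1000 networks simulated from the edges-only model
sims <- replicate(1000, count_triangles(simulate_bernoulli(n, p_hat)))

c(observed = obs_tri, simulated_mean = mean(sims),
  share_at_least_observed = mean(sims >= obs_tri))
```

The simulated networks match the density but contain far fewer triangles than the observed one, which is the kind of misfit this check is meant to reveal.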


26 of 44

ERGM in R


27 of 44


Download this script:

https://tinyurl.com/ergmscript

28 of 44

Padgm example

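  • > library(xUCINET)  # assumed source of Padgett_FlorentineFamilies and xDensity
  • > library(ergm)     # ergm() and gof(); also attaches network for as.network()
  • > padgm = Padgett_FlorentineFamilies$Marriage  # marriage adjacency matrix
  • > netpadgm = as.network(padgm)                 # convert to a network object
  • > res = ergm(formula = netpadgm ~ edges)       # edges-only model
  • > summary(res)
  • > odds = exp(-1.609)                           # odds of a tie from the edges estimate
  • > odds/(1 + odds)                              # probability of a tie
  • > xDensity(padgm)                              # should match the probability above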


29 of 44

Padgm triangles

  • > res = ergm(formula = netpadgm ~ edges + triangle)
  • > summary(res)


30 of 44

Goodness of fit

  • res = ergm(formula = netpadgm ~ edges + triangle)
  • summary(res)
  • gof(res)


31 of 44

Reflections on ERGM


32 of 44

General comments

  • Much more complicated than I indicated today
  • It’s not really a dyadic model
    • Meant to characterize the network, not predict specific ties
    • It becomes more dyadic, though, when node covariates are used
  • Model estimation sometimes (often?) fails
    • It’s not that some parameter turns out not to predict; the model can’t be run at all

33 of 44

Endogenous effects & exogenous covariates

  • Endogenous variables are all variables that can be calculated from the network itself
  • Exogenous variables are things that require outside information
    • Node attributes, such as gender
    • Other tie types, such as friendship ties when modeling advice ties
  • Endogenous does not mean that the effects reflect some process that is inherent in networks
    • It’s people (organizations, whatever) forming the ties
  • Beware of claims that these effects are “self-organizing processes”
    • This is a misconstrual of the idea that larger patterns in a network may emerge from local tendencies such as transitivity


34 of 44

Emergence & self-organization

  • Note clumpiness of graph
  • Networks with many transitive triples tend to be clumpy
  • So clumpiness can be seen as an emergent or self-organizing property
  • But it is not transitivity that is self-organizing


35 of 44

Parameters and social processes

  • We assume that the observed network is the result of a long-term sequence of tie changes governed by social processes or mechanisms
  • We use the ERGM parameters to infer the social processes
  • There isn’t a 1-to-1 correspondence between social processes and ERGM parameters


36 of 44

The triangle parameter

  • Positive significant triangle parameter indicates a tendency toward transitivity (i🡪j, j🡪k, and i🡪k)
  • Transitivity is not a social process. It is the outcome of a social process
    • And not just one


37 of 44

Triangle parameter – cont.

  • Mechanism 1 - Balance theory (Heider; Festinger)
    • If A likes B, and B likes C, it induces cognitive dissonance in A if A doesn’t also appreciate C
      • May also cause conflict
    • So A will tend to like C, or drop B. Either way, we should see many triples in which A--B, B--C, and A--C
  • Mechanism 2 – Opportunity
    • If A makes friends with B, there is a good chance that B will introduce A to B’s friends, closing the gap


38 of 44

Other models


39 of 44

How does ergm relate to qap?

  • Qap is a dyadic model that predicts presence/absence/strength of ties between each pair of nodes as a function of other dyadic variables
    • Friends 🡨 distance + samegender
  • Ergm is fundamentally a whole network model that models the overall pattern as the result of a combination of micro-processes
    • Not ideal for predicting specific ties
  • They intersect if you use ergm with a dyadic covariate
    • P(Y = y) = exp(b0*Edges + b1*Triangles + b2*nodematch(gender) + b3*dyadcov(distance)) / K, where K is the normalizing constant
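
For comparison, a minimal sketch of the QAP side in base R: a dyadic correlation whose significance is assessed by permuting node labels. All data here are simulated, and qap_cor is a hypothetical helper, not a function from any package.

```r
set.seed(7)
n <- 10
gender <- rep(c("f", "m"), each = 5)             # hypothetical attribute
same_gender <- outer(gender, gender, "==") * 1   # dyadic predictor

# Hypothetical friendship ties, more likely within gender
friends <- matrix(rbinom(n * n, 1, ifelse(same_gender == 1, 0.6, 0.1)), n, n)
diag(friends) <- 0

qap_cor <- function(y, x, reps = 2000) {
  off <- row(y) != col(y)                        # skip the diagonal
  observed <- cor(y[off], x[off])
  permuted <- replicate(reps, {
    p <- sample(nrow(y))                         # relabel the nodes
    cor(y[p, p][off], x[off])                    # rows and columns move together
  })
  list(observed = observed, p_value = mean(permuted >= observed))
}

res_qap <- qap_cor(friends, same_gender)
res_qap
```

Permuting rows and columns together keeps each node's pattern of ties intact, which is how QAP handles dependence among dyads without modeling it.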


40 of 44

When using a dyadic covariate in ergm …

  • When using a dyadic covariate in ergm, then:

QAP

  • Conceptually simple
  • Easy to implement
  • Easily customizable – any standard statistic can be recast in QAP terms
  • Parameters identical to standard regression
  • Handles all sources of interdependence, even unknown ones
  • Difficult/impossible to quantify these sources of interdependence
  • Dependent variable can be anything -- e.g., correlation coefficients
  • Goodness of fit measured at the tie level -- how well did we predict each tie?

ERGM

  • Statistically complex
  • Difficult to estimate – often fails to converge
  • Effects to be tested must be pre-programmed by an expert
  • Parameters can be hard to interpret
  • An unknown source of interdependence can invalidate the model
  • Provides rich description of sources of inter-dependence, interpreted as tendencies toward certain micro-configurations
  • Dependent variable must be a network and, in practice, binary
  • Goodness of fit measured at the level of whole network statistics like degree dist.

41 of 44

Longitudinal extensions to ERGM

  • TERGM (temporal ergm)
    • Essentially an ERGM in which the network at time t-1 is included as a dyadic covariate
    • Assumes tie changes are independent of each other
  • STERGM (separable tergm)
    • Separately models tie formation and tie dissolution
  • LERGM (longitudinal ergm)
    • Similar to SAOM (covered next)
    • Continuous time model in which change is made one dyad at a time


42 of 44

SAOMs

  • Stochastic actor-oriented models
    • Implemented as the RSiena package. So, often called Siena models
  • Designed to model process of change in ties
    • For directed networks, it is assumed that each actor only has control over their own outgoing ties
    • For undirected networks, actor proposes a tie to an alter, and the alter may or may not agree
    • The ties have inertia: they are states rather than events
  • Models have similar parameters to ERGMs
    • Endogenous effects - # of 2-paths, triangles, etc
    • Exogenous effects – node attributes, other relations, etc.


43 of 44

SAOM process

  • At each time point, a randomly chosen node is given the opportunity to make a tie, keep a tie, or dissolve a tie with some other node
  • The actor makes choices that maximize an objective function based on the state of the network that would be true after the change
  • Actors act without coordination
    • But they change each other’s environment, so they are interdependent
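
A toy sketch of one such step in base R, not the RSiena implementation. The objective function and its parameter values are hypothetical, and the actor's choice is drawn with probability proportional to exp(objective) rather than by strict maximization, which is an assumption of this sketch.

```r
set.seed(42)

# Hypothetical objective for actor i: outgoing ties are costly,
# reciprocated ties are rewarded
objective <- function(x, i, b_out = -1, b_recip = 2) {
  b_out * sum(x[i, ]) + b_recip * sum(x[i, ] * x[, i])
}

ministep <- function(x) {
  n <- nrow(x)
  i <- sample(n, 1)                        # randomly chosen actor
  # Option j == i means "change nothing"; any other j toggles tie i -> j
  scores <- sapply(seq_len(n), function(j) {
    y <- x
    if (j != i) y[i, j] <- 1 - y[i, j]
    objective(y, i)                        # evaluated on the network after the change
  })
  probs <- exp(scores - max(scores))       # higher objective, higher chance
  j <- sample(n, 1, prob = probs)
  if (j != i) x[i, j] <- 1 - x[i, j]
  x
}

x <- matrix(0, 5, 5)                       # start from an empty directed network
for (step in 1:200) x <- ministep(x)
x_next <- ministep(x)
x
```

Each actor changes only its own outgoing ties, yet every change alters what the other actors see at their next opportunity, which is the interdependence noted above.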


44 of 44


Models compared: MR-QAP, ERGM, LERGM, SAOM, REM

Summary description

  • MR-QAP: Multiple regression in which cases are dyads and dependent variable is presence/absence or strength of tie. Significance assessed via permutation test
  • ERGM, LERGM: Odds of a tie modeled as a function of the contribution of that tie to graph statistics reflecting prevalence of micro-configurations
  • SAOM: Actors make choices to add/drop ties to maximize utility function

What is modeled

  • MR-QAP: Presence/absence or strength of ties
  • ERGM: Probability of observed network given set of network statistics (e.g., counts of micro-configurations); presence/absence of ties as equilibrium outcome of tie-wise change process
  • LERGM: Presence/absence of ties
  • SAOM: Presence/absence of ties; (optional) change in node-level attributes (e.g., behaviors)
  • REM: Occurrence of relational events

Kinds of effects

  • MR-QAP: Any variables that can be expressed in dyadic form, including attribute-based variables, both exogenous and endogenous (e.g., whether a tie completes a transitive triple)
  • ERGM, LERGM, SAOM: 1. Exogenous covariates (e.g., similarity of interests); 2. Endogenous variables (e.g., # of transitive triples that would be added if tie were added)

Meaning of parameters

  • MR-QAP: Positive parameter for X indicates that larger values of X are associated with greater probability or strength of tie
  • ERGM, LERGM, SAOM: 1. Exogenous variables: change in odds of tie given unit increase in X; 2. Endogenous variables: positive parameter for X (e.g., transitivity) indicates that odds of a tie is higher to the extent its presence contributes to change in X (e.g., change in # of transitive triples)

Approach to change

  • MR-QAP: Implicit. Model gives effect of X on Y. Hence, we predict change in Y given a change in X. Variants can model Y(t)-Y(t-1) or control for Y(t-1) when modeling Y(t)
  • ERGM: Implicit. Model gives effect of X on Y. Hence, we predict change in Y given a change in X. TERGM variant can control for Y at t-1
  • LERGM, SAOM: Explicit. Continuous time framework updates dyadic dependencies as each dyad changes. Ordering of tie changes makes a difference

Approach to non-independence of observations in dyadic data

  • MR-QAP: Control for sources of dependence via permutation method
  • ERGM, LERGM: Explicitly model sources of dependence between dyads
  • SAOM: Explicitly model sources of dependence between dyads; dependencies can be asymmetric (dependency of i-->j on u-->v does not imply the reverse)

Goodness of fit

  • MR-QAP: Dyadic. Difference, across all dyads, between observed Yij and predicted Yij
  • ERGM, LERGM, SAOM: Whole network. Do networks simulated from the model have the same overall network statistics as the observed network?

Temporal variants

  • MR-QAP: Can include lagged versions of both Xs and Y as controls; can explicitly model Y(t) - Y(t-1)
  • ERGM: Include lagged versions of both Xs and Y as controls (aka TERGM); model tie formation and dissolution independently (aka STERGM)
  • LERGM, SAOM, REM: Not applicable