Part 4: Network Inference

# Part 4: Network Inference
## #aectRTD workshop
### K. Bret Staudt Willet | Florida State University
### March 4, 2022

---

---

# <svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#fff;overflow:visible;position:relative;"><path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"/></svg> Important Links

## Homebase

- **Workshop website:** https://bretsw.github.io/aect22-workshop
- **Workshop code repository:** https://github.com/bretsw/aect22-workshop
- **tidytags R package:** https://github.com/ropensci/tidytags

## Agenda

- **Part 1: Introduction to Networks**
  - Slides: [Part 1 - Networks](1-networks.html)
- **Part 2: Introduction to R**
  - Slides: [Part 2 - R](2-intro-R.html)
- **Part 3: Network Description**
  - Slides: [Part 3 - Description](3-description.html)
- **Part 4: Network Inference**
  - Slides: [Part 4 - Inference](4-inference.html)

## Help

- Ask questions in the Zoom chat!
- Or, reach out directly:
  - Email: [bret.staudtwillet@fsu.edu](mailto:bret.staudtwillet@fsu.edu)
  - Twitter: [@bretsw](https://twitter.com/bretsw)

---

# <svg aria-hidden="true" role="img" viewBox="0 0 448 512" style="height:1em;width:0.88em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#fff;overflow:visible;position:relative;"><path d="M212.686 315.314L120 408l32.922 31.029c15.12 15.12 4.412 40.971-16.97 40.971h-112C10.697 480 0 469.255 0 456V344c0-21.382 25.803-32.09 40.922-16.971L72 360l92.686-92.686c6.248-6.248 16.379-6.248 22.627 0l25.373 25.373c6.249 6.248 6.249 16.378 0 22.627zm22.628-118.628L328 104l-32.922-31.029C279.958 57.851 290.666 32 312.048 32h112C437.303 32 448 42.745 448 56v112c0 21.382-25.803 32.09-40.922 16.971L376 152l-92.686 92.686c-6.248 6.248-16.379 6.248-22.627 0l-25.373-25.373c-6.249-6.248-6.249-16.378 0-22.627z"/></svg> <br><br> **Part 4:** <br> Network Inference

---

- <svg aria-hidden="true" role="img" viewBox="0 0 640 512" style="height:1em;width:1.25em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#782F40;overflow:visible;position:relative;"><path d="M425.7 256c-16.9 0-32.8-9-41.4-23.4L320 126l-64.2 106.6c-8.7 14.5-24.6 23.5-41.5 23.5-4.5 0-9-.6-13.3-1.9L64 215v178c0 14.7 10 27.5 24.2 31l216.2 54.1c10.2 2.5 20.9 2.5 31 0L551.8 424c14.2-3.6 24.2-16.4 24.2-31V215l-137 39.1c-4.3 1.3-8.8 1.9-13.3 1.9zm212.6-112.2L586.8 41c-3.1-6.2-9.8-9.8-16.7-8.9L320 64l91.7 152.1c3.8 6.3 11.4 9.3 18.5 7.3l197.9-56.5c9.9-2.9 14.7-13.9 10.2-23.1zM53.2 41L1.7 143.8c-4.6 9.2.3 20.2 10.1 23l197.9 56.5c7.1 2 14.7-1 18.5-7.3L320 64 69.8 32.1c-6.9-.8-13.5 2.7-16.6 8.9z"/></svg> [**igraph**](https://CRAN.R-project.org/package=igraph)

---

---

**Article:** [A social network perspective on peer supported learning in MOOCs for educators](http://www.irrodl.org/index.php/irrodl/article/view/1852) (Kellogg, Booth, & Oliver, 2014)

---

What do you think this code will do?

```r
graph2_connected <-
  graph2 %>%
  delete_vertices(which((vertex_attr(., 'in_degree') ==  0)))
clusters0 <- 
  graph2_connected %>% igraph::cluster_spinglass()
```

---

Let's find out!

```r
graph2_connected <-
  graph2 %>%
  delete_vertices(which((vertex_attr(., 'in_degree') ==  0)))
clusters0 <- 
  graph2_connected %>% igraph::cluster_spinglass()
```

This code searches for **clusters**, or communities, within the network. A **community** is a set of nodes with many edges inside the community and few edges to the rest of the network (i.e., between the community itself and nodes outside it).

Specifically, this code uses the **spinglass clustering algorithm** to map community detection onto finding the ground state of an infinite range spin glass (i.e., fancy physics). In other words, the spinglass algorithm partitions the nodes into communities by optimizing an energy function.
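To make the idea of a community concrete, here is a minimal sketch on a toy network rather than the workshop data. The toy graph (two 5-node cliques joined by a single bridge edge) and the object names are assumptions made for illustration only.

```r
# Toy network (assumption): two fully connected groups of 5 nodes each
clique_a <- igraph::make_full_graph(5)
clique_b <- igraph::make_full_graph(5)

# Join the two groups with one bridge edge between node 5 and node 6
toy <- igraph::add_edges(igraph::disjoint_union(clique_a, clique_b), c(5, 6))

# Spinglass is stochastic, so fix the seed to make this run repeatable
set.seed(1)
toy_clusters <- igraph::cluster_spinglass(toy)

igraph::membership(toy_clusters)  # community label for each node
igraph::sizes(toy_clusters)       # number of nodes in each community
igraph::modularity(toy_clusters)  # quality score of the partition found
```

With many edges inside each clique and only one edge between them, this is the kind of structure the algorithm is designed to pick up.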

---

Let's find out!

```r
graph2_connected <-
  graph2 %>%
  delete_vertices(which((vertex_attr(., 'in_degree') ==  0)))
clusters0 <- 
  graph2_connected %>% igraph::cluster_spinglass()
```

One of the important outcomes of this method is the **modularity** value `$M$`. Modularity measures how good the division is, or how separated the different vertex types are from each other.

The spinglass algorithm looks for the modularity of the optimal partition. For a given network, the partition with maximum modularity corresponds to the optimal community structure (i.e., a higher `$M$` is better).

Note also that `$M$` = 0 when all nodes belong to one group, and `$M$` < 0 when each node belongs to a separate community.
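To see where these values come from, here is a small sketch that computes modularity by hand in base R, using the standard definition for an undirected, unweighted network. The helper `modularity_by_hand()` and the six-node toy network (two triangles joined by one edge) are hypothetical and not part of the workshop code.

```r
# Modularity of a partition, from an adjacency matrix (undirected, unweighted)
modularity_by_hand <- function(adj, membership) {
  m <- sum(adj) / 2                                  # number of edges
  k <- rowSums(adj)                                  # degree of each node
  same_group <- outer(membership, membership, "==")  # TRUE if i and j share a group
  sum((adj - outer(k, k) / (2 * m)) * same_group) / (2 * m)
}

# Toy network: triangle 1-2-3, triangle 4-5-6, and a bridge edge 3-4
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
adj <- matrix(0, nrow = 6, ncol = 6)
adj[edges] <- 1
adj[edges[, 2:1]] <- 1  # mirror the edges so the matrix is symmetric

modularity_by_hand(adj, c(1, 1, 1, 2, 2, 2))  # two triangles: about 0.357
modularity_by_hand(adj, rep(1, 6))            # everyone in one group: 0
modularity_by_hand(adj, 1:6)                  # every node alone: about -0.173
```

The partition that matches the two triangles scores highest, which is the sense in which a higher value indicates a better division.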

<hr>

Our initial use of the spinglass algorithm found **9 clusters** and `$M$` = **0.314**.

---

It is important to note that a different result is returned each time the spinglass clustering algorithm is run.

For this reason, we needed to run a number of simulations to see how many clusters the spinglass algorithm "typically" finds.

What do you think this code will do?

```r
cluster_matrix <- matrix(NA, nrow=1, ncol=1000)
for (i in 1:1000) {
        print(i)
        set.seed(i)
        csg = graph2_connected %>% igraph::cluster_spinglass()
        cluster_matrix[1,i] <- max(csg$membership)
}
```

---

Let's see!

<table class="table table-striped table-bordered" style="margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> Statistic </th>
   <th style="text-align:left;"> Value </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> number of tests: </td>
   <td style="text-align:left;"> 1000.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> mean: </td>
   <td style="text-align:left;"> 9.44 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> sd: </td>
   <td style="text-align:left;"> 1.02 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> min: </td>
   <td style="text-align:left;"> 6.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> max: </td>
   <td style="text-align:left;"> 14.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> median: </td>
   <td style="text-align:left;"> 9.00 </td>
  </tr>
</tbody>
</table>
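A summary like this could be produced along the following lines. This is only a sketch: the eight values below are a made-up stand-in for the 1 x 1000 `cluster_matrix` filled by the loop, and `cluster_summary` is a hypothetical name.

```r
# Hypothetical stand-in for the matrix filled by the loop (values are made up)
cluster_matrix <- matrix(c(9, 10, 9, 8, 11, 9, 10, 9), nrow = 1)

# Flatten the single-row matrix into a plain vector of cluster counts
counts <- as.vector(cluster_matrix)

# Collect the same statistics that appear in the table
cluster_summary <- c(
  `number of tests` = length(counts),
  mean = mean(counts),
  sd = sd(counts),
  min = min(counts),
  max = max(counts),
  median = median(counts)
)

round(cluster_summary, 2)
```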

---

What do you think this code will do?

```r
seeds <- which(as.vector(cluster_matrix) == median(cluster_matrix))
cluster_seed <- seeds %>% sample(1)
```

---

Let's see!

<table class="table table-striped table-bordered" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;">   </th>
   <th style="text-align:left;"> Score </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Number of nodes: </td>
   <td style="text-align:left;"> 442.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Number of edges: </td>
   <td style="text-align:left;"> 1978.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Modularity: </td>
   <td style="text-align:left;"> 0.31 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Number of clusters: </td>
   <td style="text-align:left;"> 9.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Size of cluster 1: </td>
   <td style="text-align:left;"> 34.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Size of cluster 2: </td>
   <td style="text-align:left;"> 58.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Size of cluster 3: </td>
   <td style="text-align:left;"> 59.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Size of cluster 4: </td>
   <td style="text-align:left;"> 63.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Size of cluster 5: </td>
   <td style="text-align:left;"> 37.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Size of cluster 6: </td>
   <td style="text-align:left;"> 108.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Size of cluster 7: </td>
   <td style="text-align:left;"> 32.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Size of cluster 8: </td>
   <td style="text-align:left;"> 4.00 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Size of cluster 9: </td>
   <td style="text-align:left;"> 47.00 </td>
  </tr>
</tbody>
</table>
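The slides do not show how the chosen seed is turned into the final clustering, so the following is one plausible reading rather than the workshop's actual code: set the seed, rerun spinglass, and read the table values off the result. The karate club network and the seed value `42` are stand-ins chosen for illustration.

```r
# Stand-in network (assumption): Zachary's karate club, not the workshop data
toy <- igraph::make_graph("Zachary")
cluster_seed <- 42  # hypothetical; in the slides this is drawn from `seeds`

# Setting the same seed before each run should give the same partition
set.seed(cluster_seed)
clusters <- igraph::cluster_spinglass(toy)
set.seed(cluster_seed)
clusters_again <- igraph::cluster_spinglass(toy)
same_partition <- identical(clusters$membership, clusters_again$membership)

# The quantities reported in the table
cluster_report <- c(
  nodes = igraph::vcount(toy),
  edges = igraph::ecount(toy),
  modularity = igraph::modularity(clusters),
  clusters = length(clusters)
)
round(cluster_report, 2)
igraph::sizes(clusters)  # size of each cluster
```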

---

We also want to see if these clusters appear merely by random chance, or if the interaction patterns are likely to be nonrandom.

Testing statistical significance for spinglass clustering is a bit different from the familiar tests that return `$p$`-values.

The idea behind this test of significance is that, if the observed network does in fact have statistically significant clustering, a random network with the same size and degree distribution as our studied network should have a lower modularity score.

The testing strategy is to generate 100 randomized instances of our network with the same size and degree distribution using the `sample_degseq()` function.

---

What do you think this code will do?

```r
degrees <- 
  graph2_connected %>% 
  igraph::as.undirected() %>% 
  igraph::degree(mode='all')

random_modularities <- 
  replicate(100, 
            igraph::sample_degseq(degrees, method="vl"), 
            simplify=FALSE) %>%
  lapply(igraph::cluster_spinglass) %>%
  sapply(igraph::modularity)
```

---

Let's see!

```r
degrees <- 
  graph2_connected %>% 
  igraph::as.undirected() %>% 
  igraph::degree(mode='all')
```

A '0' result from this procedure indicates that none of the randomized networks has community structure with a modularity score higher than the one obtained from the original, observed network. Hence a '0' result means that our network has significant community structure; any non-zero result means that the detected spinglass clusters are not statistically significant.

Our testing strategy returned a result of **0**.
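The comparison step itself is not shown on the slides. Here is a minimal sketch of how the count could be computed, assuming the result is simply the number of randomized networks whose modularity exceeds the observed one. It uses the karate club network as a stand-in and only 20 randomized networks (instead of 100) to keep it quick; `n_higher` is a hypothetical name.

```r
# Stand-in for the observed network (assumption): Zachary's karate club
observed <- igraph::make_graph("Zachary")

set.seed(2022)
observed_modularity <- igraph::modularity(igraph::cluster_spinglass(observed))

# Randomized networks with the same degree sequence as the observed network;
# method = "vl" returns connected graphs, which spinglass requires
degrees <- igraph::degree(observed, mode = "all")
random_graphs <- replicate(20,
                           igraph::sample_degseq(degrees, method = "vl"),
                           simplify = FALSE)

# Modularity of the spinglass partition for each randomized network
random_modularities <- sapply(random_graphs, function(g) {
  igraph::modularity(igraph::cluster_spinglass(g))
})

# How many randomized networks beat the observed modularity?
n_higher <- sum(random_modularities > observed_modularity)
n_higher
```

A count of 0 would be read the same way as on the slide: none of the randomized networks is more modular than the observed one.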

---

What do you think this code will do?

```r
cluster_membership <-
  clusters$membership %>%
  as.character()

graph2_clustered <- 
  graph2_connected %>%
  igraph::set_vertex_attr(name = 'popularity', 
                  value = degree(graph2_connected, mode = 'in')) %>%
  igraph::set_vertex_attr(name = 'grp', 
                  value = cluster_membership) %>%
  set_edge_attr(name='grp_weight', 
                value=ifelse(igraph::crossing(clusters, graph2_connected), 1, 15))
```

---

```r
sociogram2_clustered <-
  graph2_clustered %>%
  ggraph(layout = 'fr') +
  geom_edge_arc(alpha = .1, 
                width = .5, 
                strength = .5,
                color = 'steelblue'
  ) +
  geom_node_point(aes(size = popularity,
                      fill = grp),
                  alpha = .5,
                  color = 'black',
                  shape = 21
  ) +
  scale_fill_brewer(palette = 'Set1', guide = 'none') +
  scale_size(range = c(1,15), guide = 'none') +
  theme_wsj() +
  theme(axis.line=element_blank(),
        axis.text.x=element_blank(), axis.text.y=element_blank(),
        axis.ticks.x =element_blank(), axis.ticks.y =element_blank(),
        axis.title.x=element_blank(), axis.title.y=element_blank(),
        panel.background=element_blank(), panel.border=element_blank(),
        panel.grid.major=element_blank(), panel.grid.minor=element_blank())
```

---

---

---

### **Influence** and **Selection**

---

### **Influence** and **Selection**

Unfortunately, we don't have time to get into the important, but quite advanced, topics of SNA inference of **influence** and **selection**. I hope you'll keep exploring these areas, and I highly recommend three sources of information, in increasing order of difficulty:

1. [Chapter 20.3 Appendix C](https://datascienceineducation.com/c20.html#c20c) - "Social Network Influence and Selection Models" in the wonderful guide, [*Data Science in Education Using R*](https://datascienceineducation.com).

1. The article ["Idle chatter or compelling conversation? The potential of the social media-based #NGSSchat network for supporting science education reform efforts"](https://doi.org/10.1002/tea.21660) in *Journal of Research in Science Teaching*, written by my colleague [Josh Rosenberg](https://joshuamrosenberg.com/).

1. A trove of SNA resources on the website of [Ken Frank](https://sites.google.com/msu.edu/kenfrank/social-network-resources), Professor at Michigan State University.

---

# <svg aria-hidden="true" role="img" viewBox="0 0 640 512" style="height:1em;width:1.25em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#fff;overflow:visible;position:relative;"><path d="M278.9 511.5l-61-17.7c-6.4-1.8-10-8.5-8.2-14.9L346.2 8.7c1.8-6.4 8.5-10 14.9-8.2l61 17.7c6.4 1.8 10 8.5 8.2 14.9L293.8 503.3c-1.9 6.4-8.5 10.1-14.9 8.2zm-114-112.2l43.5-46.4c4.6-4.9 4.3-12.7-.8-17.2L117 256l90.6-79.7c5.1-4.5 5.5-12.3.8-17.2l-43.5-46.4c-4.5-4.8-12.1-5.1-17-.5L3.8 247.2c-5.1 4.7-5.1 12.8 0 17.5l144.1 135.1c4.9 4.6 12.5 4.4 17-.5zm327.2.6l144.1-135.1c5.1-4.7 5.1-12.8 0-17.5L492.1 112.1c-4.8-4.5-12.4-4.3-17 .5L431.6 159c-4.6 4.9-4.3 12.7.8 17.2L523 256l-90.6 79.7c-5.1 4.5-5.5 12.3-.8 17.2l43.5 46.4c4.5 4.9 12.1 5.1 17 .6z"/></svg> <br><br> Try it out!

Hop over to [**Workspace 4**](workspace4.Rmd)

---

# <svg aria-hidden="true" role="img" viewBox="0 0 576 512" style="height:1em;width:1.12em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#fff;overflow:visible;position:relative;"><path d="M532 386.2c27.5-27.1 44-61.1 44-98.2 0-80-76.5-146.1-176.2-157.9C368.3 72.5 294.3 32 208 32 93.1 32 0 103.6 0 192c0 37 16.5 71 44 98.2-15.3 30.7-37.3 54.5-37.7 54.9-6.3 6.7-8.1 16.5-4.4 25 3.6 8.5 12 14 21.2 14 53.5 0 96.7-20.2 125.2-38.8 9.2 2.1 18.7 3.7 28.4 4.9C208.1 407.6 281.8 448 368 448c20.8 0 40.8-2.4 59.8-6.8C456.3 459.7 499.4 480 553 480c9.2 0 17.5-5.5 21.2-14 3.6-8.5 1.9-18.3-4.4-25-.4-.3-22.5-24.1-37.8-54.8zm-392.8-92.3L122.1 305c-14.1 9.1-28.5 16.3-43.1 21.4 2.7-4.7 5.4-9.7 8-14.8l15.5-31.1L77.7 256C64.2 242.6 48 220.7 48 192c0-60.7 73.3-112 160-112s160 51.3 160 112-73.3 112-160 112c-16.5 0-33-1.9-49-5.6l-19.8-4.5zM498.3 352l-24.7 24.4 15.5 31.1c2.6 5.1 5.3 10.1 8 14.8-14.6-5.1-29-12.3-43.1-21.4l-17.1-11.1-19.9 4.6c-16 3.7-32.5 5.6-49 5.6-54 0-102.2-20.1-131.3-49.7C338 339.5 416 272.9 416 192c0-3.4-.4-6.7-.7-10C479.7 196.5 528 238.8 528 288c0 28.7-16.2 50.6-29.7 64z"/></svg> <br><br> Quick Check In

**(Five minutes in groups, five minutes together)**

- What challenges did you encounter?
- What successes did you have?
- What questions remain?

---

# <svg aria-hidden="true" role="img" viewBox="0 0 384 512" style="height:1em;width:0.75em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#fff;overflow:visible;position:relative;"><path d="M172.268 501.67C26.97 291.031 0 269.413 0 192 0 85.961 85.961 0 192 0s192 85.961 192 192c0 77.413-26.97 99.031-172.268 309.67-9.535 13.774-29.93 13.773-39.464 0zM192 272c44.183 0 80-35.817 80-80s-35.817-80-80-80-80 35.817-80 80 35.817 80 80 80z"/></svg> <br><br> Recap

---

# <svg aria-hidden="true" role="img" viewBox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#fff;overflow:visible;position:relative;"><path d="M80 368H16a16 16 0 0 0-16 16v64a16 16 0 0 0 16 16h64a16 16 0 0 0 16-16v-64a16 16 0 0 0-16-16zm0-320H16A16 16 0 0 0 0 64v64a16 16 0 0 0 16 16h64a16 16 0 0 0 16-16V64a16 16 0 0 0-16-16zm0 160H16a16 16 0 0 0-16 16v64a16 16 0 0 0 16 16h64a16 16 0 0 0 16-16v-64a16 16 0 0 0-16-16zm416 176H176a16 16 0 0 0-16 16v32a16 16 0 0 0 16 16h320a16 16 0 0 0 16-16v-32a16 16 0 0 0-16-16zm0-320H176a16 16 0 0 0-16 16v32a16 16 0 0 0 16 16h320a16 16 0 0 0 16-16V80a16 16 0 0 0-16-16zm0 160H176a16 16 0 0 0-16 16v32a16 16 0 0 0 16 16h320a16 16 0 0 0 16-16v-32a16 16 0 0 0-16-16z"/></svg> <br><br> Appendix: <br> Helpful Resources <br> and Troubleshooting

---

# Resources

**Beginners:**
- [RStudio Beginners' Guide](https://education.rstudio.com/learn/beginner/)
- Book: [*Data Science in Education Using R*](https://datascienceineducation.com)
  - See [Chapter 12](https://datascienceineducation.com/c12.html) - Walkthrough 6: Exploring Relationships Using Social Network Analysis With Social Media Data
  - [Physical copy of DSIEUR](https://www.routledge.com/Data-Science-in-Education-Using-R/Estrellado-Freer-Mostipak-Rosenberg-Velasquez/p/book/9780367422257)
  - [Even more resources from DSIEUR](https://datascienceineducation.com/c18.html)

**Intermediates:**
- [RStudio Intermediates' Guide](https://education.rstudio.com/learn/intermediate/)
- [{tidytags} package notes](https://docs.ropensci.org/tidytags/index.html)
- Book: [*R for Data Science*](http://r4ds.had.co.nz/)

**Experts:**
- [RStudio Experts' Guide](https://education.rstudio.com/learn/expert/)
- Book: [*Learning Statistics with R*](https://learningstatisticswithr.com/)
- [*Data Science in Education Using R*](https://datascienceineducation.com)
  - See [Chapter 20.3 Appendix C](https://datascienceineducation.com/c20.html#c20c) - Social Network Influence and Selection Models
- SNA resources: [Dr. Ken Frank's website](https://sites.google.com/msu.edu/kenfrank/social-network-resources)

---

# Troubleshooting

- Try to find out what the specific problem is
  - Identify what is *not* causing the problem
- "Unplug and plug it back in" - restart R; close and reopen R
- Seek out workshops and other learning opportunities
- Reach out to others! Sharing what is causing an issue can often help to clarify the problem
  - [RStudio Community forum](https://community.rstudio.com/) (highly recommended!)
  - Twitter hashtag: [#RStats](https://twitter.com/search?q=%23RStats&src=typeahead_click&f=live)
  - [Contact Bret!](http://bretsw.com)
- General strategies on learning more: [Chapter 17 of *Data Science in Education Using R*](https://datascienceineducation.com/c17.html)

---

# <svg aria-hidden="true" role="img" viewBox="0 0 640 512" style="height:1em;width:1.25em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:#fff;overflow:visible;position:relative;"><path d="M192 160h32V32h-32c-35.35 0-64 28.65-64 64s28.65 64 64 64zM0 416c0 35.35 28.65 64 64 64h32V352H64c-35.35 0-64 28.65-64 64zm337.46-128c-34.91 0-76.16 13.12-104.73 32-24.79 16.38-44.52 32-104.73 32v128l57.53 15.97c26.21 7.28 53.01 13.12 80.31 15.05 32.69 2.31 65.6.67 97.58-6.2C472.9 481.3 512 429.22 512 384c0-64-84.18-96-174.54-96zM491.42 7.19C459.44.32 426.53-1.33 393.84.99c-27.3 1.93-54.1 7.77-80.31 15.04L256 32v128c60.2 0 79.94 15.62 104.73 32 28.57 18.88 69.82 32 104.73 32C555.82 224 640 192 640 128c0-45.22-39.1-97.3-148.58-120.81z"/></svg> <br><br> *Next up* <br> Choose Your Own Adventure!