class: left, bottom, title-slide # Part 4: Network Inference ## #aectRTD workshop ### K. Bret Staudt Willet | Florida State University ### March 4, 2022 --- class: inverse, center, middle #
<br><br> Workshop Information --- #
Important Links ## Homebase - **Workshop website:** https://bretsw.github.io/aect22-workshop - **Workshop code repository:** https://github.com/bretsw/aect22-workshop - **tidytags R package:** https://github.com/ropensci/tidytags ## Agenda - **Part 1: Introduction to Networks** - Slides: [Part 1 - Networks](1-networks.html) - **Part 2: Introduction to R** - Slides: [Part 2 - R](2-intro-R.html) - **Part 3: Network Description** - Slides: [Part 3 - Description](3-description.html) - **Part 4: Network Inference** - Slides: [Part 4 - Inference](4-inference.html) ## Help - Ask questions in the Zoom chat! - Or, reach out directly: - Email: [bret.staudtwillet@fsu.edu](mailto:bret.staudtwillet@fsu.edu) - Twitter: [@bretsw](https://twitter.com/bretsw) --- class: inverse, center, middle #
<br><br> **Part 4:** <br> Network Inference --- #
Useful R packages <img src="img/tools.jpg" width="600px" style="display: block; margin: auto;" /> -- -
[**igraph**](https://CRAN.R-project.org/package=igraph) -- -
[**ergm**](https://CRAN.R-project.org/package=ergm) -- -
[**brms**](https://CRAN.R-project.org/package=brms) --- class: inverse, center, middle #
<br><br> Inference 1: <br> **Clusters** --- #
Inference 1: Clusters <img src="img/article-cover.png" width="700px" style="display: block; margin: auto;" /> **Article:** [A social network perspective on peer supported learning in MOOCs for educators](http://www.irrodl.org/index.php/irrodl/article/view/1852) (Kellogg, Booth, & Oliver, 2014) --- #
Inference 1: Clusters

What do you think this code will do?

```r
# Remove nodes whose 'in_degree' attribute is 0
graph2_connected <- graph2 %>%
  delete_vertices(which(vertex_attr(., 'in_degree') == 0))

# Detect communities with the spinglass algorithm
clusters0 <- graph2_connected %>%
  igraph::cluster_spinglass()
```

---

# Inference 1: Clusters

Let's find out!

```r
# Remove nodes whose 'in_degree' attribute is 0
graph2_connected <- graph2 %>%
  delete_vertices(which(vertex_attr(., 'in_degree') == 0))

# Detect communities with the spinglass algorithm
clusters0 <- graph2_connected %>%
  igraph::cluster_spinglass()
```

--

This code searches for **clusters**, or communities within the network. A **community** is a set of nodes with many edges inside the community and few edges between the community and the rest of the network.

--

Specifically, this code uses the **spinglass clustering algorithm**, which maps community detection onto finding the ground state of an infinite-range spin glass (i.e., fancy physics). In other words, the spinglass algorithm partitions the nodes into communities by optimizing an energy function.

---

# Inference 1: Clusters

Let's find out!

```r
# Remove nodes whose 'in_degree' attribute is 0
graph2_connected <- graph2 %>%
  delete_vertices(which(vertex_attr(., 'in_degree') == 0))

# Detect communities with the spinglass algorithm
clusters0 <- graph2_connected %>%
  igraph::cluster_spinglass()
```

One of the important outcomes of this method is the **modularity** value `\(M\)`. Modularity measures how good the division is, that is, how well separated the different groups of nodes are from each other.

--

The spinglass algorithm looks for the modularity of the optimal partition. For a given network, the partition with maximum modularity corresponds to the optimal community structure (i.e., a higher `\(M\)` is better).

--

Note also that `\(M\)` = 0 when all nodes belong to one group, and `\(M\)` < 0 when each node belongs to its own community.

--

<hr>

Our initial use of the spinglass algorithm found **9 clusters** and `\(M\)` = **0.314**.

---

#
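Inference 1: Clusters

The two reference points for `\(M\)` can be checked directly. This is a minimal sketch rather than the workshop's own code: `toy_graph` is a stand-in (Zachary's karate club network, which ships with igraph) because `graph2` is not rebuilt here, and all object names are made up for illustration.

```r
library(igraph)

# Stand-in network (an assumption): Zachary's karate club, bundled with igraph
toy_graph <- make_graph("Zachary")

# Every node placed in a single community: modularity is 0
m_one_group <- modularity(toy_graph, rep(1, vcount(toy_graph)))

# Every node placed in its own community: modularity is negative
m_singletons <- modularity(toy_graph, seq_len(vcount(toy_graph)))

# A spinglass partition, for comparison with the two reference points
set.seed(1)
toy_clusters <- cluster_spinglass(toy_graph)
m_spinglass <- modularity(toy_clusters)

round(c(one_group = m_one_group,
        singletons = m_singletons,
        spinglass = m_spinglass), 3)
```

A partition found by the algorithm should land above both reference points, which is what "a higher `\(M\)` is better" means in practice.

---

#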
Inference 1: Clusters

It is important to note that a different result is returned each time the spinglass clustering algorithm is run.

--

For this reason, we needed to run a number of simulations to see how many clusters the spinglass algorithm "typically" finds.

--

What do you think this code will do?

```r
# One column per simulation run
cluster_matrix <- matrix(NA, nrow = 1, ncol = 1000)

for (i in 1:1000) {
  print(i)       # progress indicator
  set.seed(i)    # a different, reproducible seed for each run
  csg <- graph2_connected %>%
    igraph::cluster_spinglass()
  # Record the number of clusters found in this run
  cluster_matrix[1, i] <- max(csg$membership)
}
```

---

# Inference 1: Clusters

Let's see!

```r
# One column per simulation run
cluster_matrix <- matrix(NA, nrow = 1, ncol = 1000)

for (i in 1:1000) {
  print(i)       # progress indicator
  set.seed(i)    # a different, reproducible seed for each run
  csg <- graph2_connected %>%
    igraph::cluster_spinglass()
  # Record the number of clusters found in this run
  cluster_matrix[1, i] <- max(csg$membership)
}
```

<table class="table table-striped table-bordered" style="margin-left: auto; margin-right: auto;">
 <thead>
  <tr> <th style="text-align:left;"> Statistic </th> <th style="text-align:left;"> Value </th> </tr>
 </thead>
 <tbody>
  <tr> <td style="text-align:left;"> number of tests: </td> <td style="text-align:left;"> 1000 </td> </tr>
  <tr> <td style="text-align:left;"> mean: </td> <td style="text-align:left;"> 9.44 </td> </tr>
  <tr> <td style="text-align:left;"> sd: </td> <td style="text-align:left;"> 1.02 </td> </tr>
  <tr> <td style="text-align:left;"> min: </td> <td style="text-align:left;"> 6 </td> </tr>
  <tr> <td style="text-align:left;"> max: </td> <td style="text-align:left;"> 14 </td> </tr>
  <tr> <td style="text-align:left;"> median: </td> <td style="text-align:left;"> 9 </td> </tr>
 </tbody>
</table>

---

# Inference 1: Clusters

What do you think this code will do?

```r
# Seeds (run numbers) that produced the median number of clusters
seeds <- which(as.vector(cluster_matrix) == median(cluster_matrix))

# Pick one of those seeds at random
cluster_seed <- seeds %>%
  sample(1)
```

---

#
Inference 1: Clusters

Let's see!

<table class="table table-striped table-bordered" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr> <th style="text-align:left;"> Measure </th> <th style="text-align:left;"> Value </th> </tr>
 </thead>
 <tbody>
  <tr> <td style="text-align:left;"> Number of nodes: </td> <td style="text-align:left;"> 442 </td> </tr>
  <tr> <td style="text-align:left;"> Number of edges: </td> <td style="text-align:left;"> 1978 </td> </tr>
  <tr> <td style="text-align:left;"> Modularity: </td> <td style="text-align:left;"> 0.31 </td> </tr>
  <tr> <td style="text-align:left;"> Number of clusters: </td> <td style="text-align:left;"> 9 </td> </tr>
  <tr> <td style="text-align:left;"> Size of cluster 1: </td> <td style="text-align:left;"> 34 </td> </tr>
  <tr> <td style="text-align:left;"> Size of cluster 2: </td> <td style="text-align:left;"> 58 </td> </tr>
  <tr> <td style="text-align:left;"> Size of cluster 3: </td> <td style="text-align:left;"> 59 </td> </tr>
  <tr> <td style="text-align:left;"> Size of cluster 4: </td> <td style="text-align:left;"> 63 </td> </tr>
  <tr> <td style="text-align:left;"> Size of cluster 5: </td> <td style="text-align:left;"> 37 </td> </tr>
  <tr> <td style="text-align:left;"> Size of cluster 6: </td> <td style="text-align:left;"> 108 </td> </tr>
  <tr> <td style="text-align:left;"> Size of cluster 7: </td> <td style="text-align:left;"> 32 </td> </tr>
  <tr> <td style="text-align:left;"> Size of cluster 8: </td> <td style="text-align:left;"> 4 </td> </tr>
  <tr> <td style="text-align:left;"> Size of cluster 9: </td> <td style="text-align:left;"> 47 </td> </tr>
 </tbody>
</table>

---

#
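Inference 1: Clusters

A table like this one presumably comes from re-running the algorithm with the chosen seed. The sketch below shows one way that could look, under stated assumptions: `toy_graph` (Zachary's karate club) stands in for `graph2_connected`, only 21 runs are used instead of 1000, and every object name other than `seeds` and `cluster_seed` is hypothetical.

```r
library(igraph)

toy_graph <- make_graph("Zachary")  # stand-in for graph2_connected (assumption)
n_runs <- 21                        # odd, so the median is always an observed count

# Number of clusters found under each seed
n_clusters <- sapply(seq_len(n_runs), function(i) {
  set.seed(i)
  max(cluster_spinglass(toy_graph)$membership)
})

# Summary of the simulation runs
c(mean = mean(n_clusters), sd = sd(n_clusters),
  min = min(n_clusters), max = max(n_clusters), median = median(n_clusters))

# Seeds whose run produced the median number of clusters
seeds <- which(n_clusters == median(n_clusters))

# Indexing with sample.int() sidesteps the sample() behavior where a single
# number x is treated as the range 1:x
set.seed(2022)
cluster_seed <- seeds[sample.int(length(seeds), 1)]

# Re-run with the chosen seed to recover that exact partition
set.seed(cluster_seed)
clusters <- cluster_spinglass(toy_graph)

length(clusters)      # number of clusters
sizes(clusters)       # nodes per cluster
modularity(clusters)  # modularity of this partition
```

Because the seed is fixed before the final run, the partition is reproducible, so the reported cluster count matches the median from the simulations.

---

#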
Inference 1: Clusters

We also want to see if these clusters appear merely by random chance, or if the interaction patterns are likely to be nonrandom.

--

Testing statistical significance for spinglass clustering is a bit different from the familiar tests that return `\(p\)`-values.

--

The idea behind this test of significance is that a random network with the same size and degree distribution as our observed network should have a lower modularity score, if the observed network does in fact have statistically significant clustering.

--

The testing strategy is to generate 100 randomized instances of our network with the same size and degree distribution using the `sample_degseq()` function.

---

# Inference 1: Clusters

What do you think this code will do?

```r
# Degree of every node, ignoring edge direction
degrees <- graph2_connected %>%
  igraph::as.undirected() %>%
  igraph::degree(mode = 'all')

# Modularity of 100 random networks with the same degree sequence
random_modularities <-
  replicate(100, igraph::sample_degseq(degrees, method = "vl"), simplify = FALSE) %>%
  lapply(igraph::cluster_spinglass) %>%
  sapply(igraph::modularity)
```

---

# Inference 1: Clusters

Let's see!

```r
# Degree of every node, ignoring edge direction
degrees <- graph2_connected %>%
  igraph::as.undirected() %>%
  igraph::degree(mode = 'all')

# Modularity of 100 random networks with the same degree sequence
random_modularities <-
  replicate(100, igraph::sample_degseq(degrees, method = "vl"), simplify = FALSE) %>%
  lapply(igraph::cluster_spinglass) %>%
  sapply(igraph::modularity)
```

--

The result of this procedure is the number of randomized networks whose modularity score is higher than the one obtained from the original, observed network. A result of '0' indicates that no randomized network has community structure with a higher modularity score. Hence a '0' result means that our network has significant community structure; any non-zero result means that the detected spinglass clusters are not statistically significant.

--

Our testing strategy returned a result of **0**.

---

#
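Inference 1: Clusters

The comparison step that turns `random_modularities` into a single number could look like the sketch below. It is an illustration under assumptions, not the workshop's code: `toy_graph` (Zachary's karate club) stands in for `graph2_connected`, 20 replicates replace the 100 used above to keep it quick, and `observed_modularity`, `random_graphs`, and `n_exceeding` are hypothetical names.

```r
library(igraph)

toy_graph <- make_graph("Zachary")  # stand-in for graph2_connected (assumption)

# Modularity of the observed (here: toy) network
set.seed(1)
observed_modularity <- modularity(cluster_spinglass(toy_graph))

# Randomized networks with the same degree sequence as the observed one
degrees <- degree(toy_graph, mode = "all")
set.seed(2)
random_graphs <- replicate(20,
                           sample_degseq(degrees, method = "vl"),
                           simplify = FALSE)

# Spinglass modularity of each randomized network
random_modularities <- sapply(random_graphs, function(g) {
  modularity(cluster_spinglass(g))
})

# Test result: how many randomized networks beat the observed modularity
n_exceeding <- sum(random_modularities > observed_modularity)
n_exceeding
```

With this setup, `n_exceeding` plays the role of the reported result: 0 means no randomized network reached the observed modularity.

---

#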
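Inference 1: Clusters

What do you think this code will do?

```r
# `clusters` is assumed to hold the spinglass result chosen earlier
cluster_membership <- clusters$membership %>%
  as.character()

graph2_clustered <- graph2_connected %>%
  # Node attribute: number of incoming ties
  igraph::set_vertex_attr(name = 'popularity',
                          value = degree(graph2_connected, mode = 'in')) %>%
  # Node attribute: cluster membership
  igraph::set_vertex_attr(name = 'grp', value = cluster_membership) %>%
  # Edge attribute: 1 for edges that cross clusters, 15 for edges within one
  set_edge_attr(name = 'grp_weight',
                value = ifelse(igraph::crossing(clusters, graph2_connected), 1, 15))
```

---

#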
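Inference 1: Clusters

The least familiar piece of that code is probably `crossing()`. Here is a small sketch of what it returns, using a stand-in network (`toy_graph`, Zachary's karate club) rather than `graph2_connected`; `toy_clusters` and `is_crossing` are hypothetical names.

```r
library(igraph)

toy_graph <- make_graph("Zachary")  # stand-in for graph2_connected (assumption)
set.seed(1)
toy_clusters <- cluster_spinglass(toy_graph)

# One logical value per edge: TRUE when the edge links two different clusters
is_crossing <- crossing(toy_clusters, toy_graph)

# Light weight (1) for between-cluster edges, heavy weight (15) within clusters
toy_graph <- set_edge_attr(toy_graph,
                           name = "grp_weight",
                           value = ifelse(is_crossing, 1, 15))

# How many edges stay inside a cluster (FALSE) versus cross clusters (TRUE)
table(is_crossing)
```

Weighting within-cluster edges more heavily is presumably meant for a layout that takes edge weights into account, so that nodes in the same cluster are drawn closer together.

---

#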
Inference 1: Clusters

```r
# Assumes ggraph, ggplot2, and ggthemes (for theme_wsj()) are already loaded
sociogram2_clustered <- graph2_clustered %>%
  ggraph(layout = 'fr') +
  geom_edge_arc(alpha = .1, width = .5, strength = .5, color = 'steelblue') +
  # Node size shows popularity; node fill shows cluster membership
  geom_node_point(aes(size = popularity, fill = grp),
                  alpha = .5, color = 'black', shape = 21) +
  scale_fill_brewer(palette = 'Set1', guide = 'none') +
  scale_size(range = c(1, 15), guide = 'none') +
  theme_wsj() +
  # Strip axes, grid lines, and panel decorations
  theme(axis.line = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.x = element_blank(),
        axis.ticks.y = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        panel.background = element_blank(),
        panel.border = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())
```

---

#
Inference 1: Clusters <img src="img/sociogram2-clustered.png" width="100%" style="display: block; margin: auto;" /> --- class: inverse, center, middle #
<br><br> Inference 2: <br> **Influence** and **Selection** --- #
Inference 2 ### **Influence** and **Selection** <img src="img/dsieur-cover-routledge.jpg" width="280px" style="display: block; margin: auto;" /> --- #
Inference 2

### **Influence** and **Selection**

Unfortunately, we don't have time to get into the important, but quite advanced, topics of SNA inference of **influence** and **selection**. I hope you'll keep exploring these areas, and I highly recommend three sources of information, in increasing order of difficulty:

--

1. [Chapter 20.3 Appendix C](https://datascienceineducation.com/c20.html#c20c) - "Social Network Influence and Selection Models" in the wonderful guide, [*Data Science in Education Using R*](https://datascienceineducation.com).

--

1. The article ["Idle chatter or compelling conversation? The potential of the social media-based #NGSSchat network for supporting science education reform efforts"](https://doi.org/10.1002/tea.21660) in *Journal of Research in Science Teaching*, written by my colleague [Josh Rosenberg](https://joshuamrosenberg.com/).

--

1. A trove of SNA resources on the website of [Ken Frank](https://sites.google.com/msu.edu/kenfrank/social-network-resources), Professor at Michigan State University.

---

class: inverse, center, middle

#
<br><br> Try it out! Hop over to [**Workspace 4**](workspace4.Rmd) --- class: inverse, center, middle #
<br><br> Quick Check In **(Five minutes in groups, five minutes together)** - What challenges did you encounter? - What successes did you have? - What questions remain? --- class: inverse, center, middle #
<br><br> Recap - **Part 1: Introduction to Networks** - Slides: [Part 1 - Networks](1-networks.html) - **Part 2: Introduction to R** - Slides: [Part 2 - R](2-intro-R.html) - **Part 3: Network Description** - Slides: [Part 3 - Description](3-description.html) - **Part 4: Network Inference** - Slides: [Part 4 - Inference](4-inference.html) --- class: inverse, center, middle #
<br><br> Appendix: <br> Helpful Resources <br> and Troubleshooting --- # Resources **Beginners:** - [RStudio Beginners' Guide](https://education.rstudio.com/learn/beginner/) - Book: [*Data Science in Education Using R*](https://datascienceineducation.com) - See [Chapter 12](https://datascienceineducation.com/c12.html) - Walkthrough 6: Exploring Relationships Using Social Network Analysis With Social Media Data - [Physical copy of DSIEUR](https://www.routledge.com/Data-Science-in-Education-Using-R/Estrellado-Freer-Mostipak-Rosenberg-Velasquez/p/book/9780367422257) - [Even more resources from DSIEUR](https://datascienceineducation.com/c18.html) **Intermediates:** - [RStudio Intermediates' Guide](https://education.rstudio.com/learn/intermediate/) - [{tidytags} package notes](https://docs.ropensci.org/tidytags/index.html) - Book: [*R for Data Science*](http://r4ds.had.co.nz/) **Experts:** - [RStudio Experts' Guide](https://education.rstudio.com/learn/expert/) - Book: [*Learning Statistics with R*](https://learningstatisticswithr.com/) - [*Data Science in Education Using R*](https://datascienceineducation.com) - See [Chapter 20.3 Appendix C](https://datascienceineducation.com/c20.html#c20c) - Social Network Influence and Selection Models - SNA resources: [Dr. Ken Frank's website](https://sites.google.com/msu.edu/kenfrank/social-network-resources) --- # Troubleshooting - Try to find out what the specific problem is - Identify what is *not* causing the problem - "Unplug and plug it back in" - restart R; close and reopen R - Seek out workshops and other learning opportunities - Reach out to others! Sharing what is causing an issue can often help to clarify the problem - [RStudio Community forum](https://community.rstudio.com/) (highly recommended!) 
- Twitter hashtag: [#RStats](https://twitter.com/search?q=%23RStats&src=typeahead_click&f=live) - [Contact Bret!](http://bretsw.com) - General strategies on learning more: [Chapter 17 of *Data Science in Education Using R*](https://datascienceineducation.com/c17.html) --- class: inverse, center, middle #
<br><br> *Next up* <br> Choose Your Own Adventure!