class: center, middle, inverse, title-slide # Online Data and Open Source Tools ## Analyzing Educational Internet Data Using R ### Bret Staudt Willet, Spencer Greenhalgh, & Joshua Rosenberg ### October 23, 2019 --- class: inverse, center, middle **Access our slide deck here:** https://bretsw.github.io/aect19-workshop/ **Chat with Josh here:** https://tennessee.zoom.us/my/jmrosenberg **Follow us on Twitter:** [@bretsw](https://twitter.com/bretsw) [@spgreenhalgh](https://twitter.com/spgreenhalgh) [@jrosenberg6432](https://twitter.com/jrosenberg6432) --- # Who We Are * **Bret Staudt Willet** * PhD Candidate, Ed Psych & Ed Tech, Michigan State University * *Primary area of interest:* social media and K-20 educators' professional learning * **Dr. Spencer Greenhalgh** * Assistant Professor, Information Communication Technology, University of Kentucky * *Primary area of interest:* implications of digital contexts for teaching, learning, and other meaningful practices * **Dr. Joshua Rosenberg** * Assistant Professor, STEM Education, University of Tennessee, Knoxville * *Primary area of interest:* data science in education + data science education <img src="img/koehler-diaspora-circa-2017.jpg" width="360px" style="display: block; margin: auto;" /> --- # How We Teach <img src="img/tech_support_cheat_sheet.png" width="480px" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Agenda --- # Morning Agenda 1. Instructor introductions 1. Pre-survey 1. Attendee introductions 1. Why learn R? 1. Download the repository 1. Getting started with R Studio 1. Demo Doc Part 1: Processing data 1. Loading data 1. Demo Doc Part 2: Visualizing data 1. Demo Doc Part 3: Modeling data 1. Questions 1. Lunch break --- # Afternoon Agenda 1. Considerations for using social media data in LD&T research 1. Conducting ethical research 1. Framing the research 1. Organizing the research process 1. Collecting data 1. Analyzing data 1. Disseminating research 1. Learning and doing more with R 1. Self-guided exploration (Option A: Text analysis; Option B: Social network analysis) 1. Questions 1. Post-survey + (optional) upload your .Rmd file for the day --- class: inverse, center, top # Pre-survey http://bit.ly/r-at-aect-pre <img src="img/survey.jpg" width="720px" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Why learn R? --- # Why learn R? * It is capable of carrying out basic and complex statistical analyses * It is able to work with data small (*n* = 30!) 
and large (*n* = 100,000+) efficiently * It is a programming language and so is quite flexible * There is a great, inclusive community of users and developers (and teachers) * It is cross-platform, open-source, and freely available --- class: inverse, center, middle # Download the repository --- # Download the repository You can access all of the workshop materials at: https://github.com/bretsw/aect19-workshop/ --- # Download the repository <img src="img/download_repo_screenshot.png" width="720px" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Getting started with R Studio --- # Installations To download R: - Visit this page to download R: https://cran.r-project.org/ - Find your operating system (Mac, Windows, or Linux) - Find the 'latest release' on the page for your operating system, then download and install the application To download R Studio: - Visit this page to download R Studio: https://www.rstudio.com/products/rstudio/download/ - Find your operating system (Mac, Windows, or Linux) - Find the 'latest release' on the page for your operating system, then download and install the application ## Check that it worked Open R Studio. Find the console window and type in `2 + 2`. If the answer you can guess is returned (hint: it's 4!), then R Studio *and* R both work. --- # Working With Projects Before proceeding, we're going to take a few steps to set ourselves up to make the analysis easier; namely, through the use of Projects, an R Studio-specific organizational tool. We already have a project set up. Navigate to the downloaded repo files and open `aect-workshop-2019.Rproj`. (To create a project in R Studio, you can navigate to "File" and then "New Project". Then, click "New Directory".) Names should be short and easy to remember but also informative. --- # Looking at RStudio <img src="img/RStudio.png" width="720px" style="display: block; margin: auto;" /> --- # Packages "Packages" are shareable collections of R code that provide functions (i.e., commands to perform specific tasks), data, and documentation. Packages increase the functionality of R by improving and expanding on base R (basic R functions). ### Installing and Loading Packages To download a package, you must call `install.packages()`: ```r install.packages("dplyr") ``` You can also navigate to the Packages pane and then click "Install", which works the same as the line of code above. In other words, you can install a package either with code or through the R Studio interface. Usually, writing code is a bit quicker, but using the interface can be very useful and complementary to using code. --- # Using packages *After* the package is installed, it must be loaded into your R Studio session using `library()`: ```r library(dplyr) ``` We only have to install a package once, but to use it, we have to load it each time we start a new R session. > A package is like a book, a library is like a library; you use library() to check a package out of the library > (Hadley Wickham, Chief Scientist, R Studio) --- # Configuring R Studio By default, R Studio saves all of the objects in your environment. In general, this is not ideal, because it means that you may have taken steps interactively that are not documented in your code. <img src="img/save-workspace-reminder.jpg" width="425px" style="display: block; margin: auto;" /> --- # The tidyverse The tidyverse is a set of packages for data manipulation, exploration, and visualization using the design philosophy of 'tidy' data.
Tidy data has a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table. The packages contained in the Tidyverse provide useful functions that augment base R functionality. You can install and load the complete tidyverse with: ```r install.packages("tidyverse") ``` ```r library(tidyverse) ``` --- class: inverse, center, middle # Demo doc (part 0): Try it out! *Note. Check out [R Studio Cloud](https://rstudio.cloud/) if you're having any installation trouble!* *You can find the demo doc in RStudio or in your downloaded repository files—it is titled `demo-doc.Rmd`* *The doc is also [here](https://github.com/bretsw/aect19-workshop/blob/master/demo-doc.Rmd) in the repository* --- # Looking at Demo Doc <img src="img/RStudio.png" width="720px" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Demo doc (part 1): Try it out! *Note. Check out [R Studio Cloud](https://rstudio.cloud/) if you're having any installation trouble!* *You can find the demo doc in RStudio or in your downloaded repository files—it is titled `demo-doc.Rmd`* *The doc is also [here](https://github.com/bretsw/aect19-workshop/blob/master/demo-doc.Rmd) in the repository* --- class: inverse, center, middle # Processing Data --- # The pipe: %>% First, let's load the data we downloaded in the last step. ```r student_responses <- read_csv("data/student-responses-data.csv") ``` We're also going to introduce a powerful, unusual *operator* in R, the pipe. The pipe is this symbol: `%>%`. It lets you *compose* functions. It does this by passing the output of one function to the next. `select()` reduces the number of *columns* of a dataset. ```r student_mot_vars <- # save object student_mot_vars by... student_responses %>% # using dataframe student_responses select(SCIEEFF, JOYSCIE, INTBRSCI, EPIST, INSTSCIE) # and selecting only these five variables student_mot_vars ``` Also, check out the helper functions: `contains()`, `starts_with()`, and `ends_with()`, i.e.: ```r student_responses %>% select(contains("sci")) ``` --- # Saving the results Note that we saved the output from the `select()` function to `student_mot_vars` but we could also save it back to `student_responses`, which would simply overwrite the original data frame (the following code is not run here): ```r student_responses <- # save object student_responses by... student_responses %>% # using dataframe student_responses select(SCIEEFF, JOYSCIE, INTBRSCI, EPIST, INSTSCIE) # and selecting only these five variables ``` --- # Renaming We can also rename the variables at the same time we select them. Let's save the results to `student_mot_vars`. ```r student_mot_vars <- student_responses %>% select(student_efficacy = SCIEEFF, student_joy = JOYSCIE, student_broad_interest = INTBRSCI, student_epistemic_beliefs = EPIST, student_instrumental_motivation = INSTSCIE ) student_mot_vars %>% head(3) ``` ``` ## # A tibble: 3 x 5 ## student_efficacy student_joy student_broad_i… student_epistem… ## <dbl> <dbl> <dbl> <dbl> ## 1 3.28 2.16 1.73 2.16 ## 2 -0.668 0.509 0.491 -0.501 ## 3 -1.62 0.0992 -0.638 -0.193 ## # … with 1 more variable: student_instrumental_motivation <dbl> ``` --- # Creating a new variable `mutate()` takes instructions to create a new variable. What goes *before* the equals sign is the name of the new variable. What goes after are the instructions.
```r student_responses %>% select(student_efficacy = SCIEEFF, student_joy = JOYSCIE, student_broad_interest = INTBRSCI, student_epistemic_beliefs = EPIST, student_instrumental_motivation = INSTSCIE) %>% mutate(student_joy_interest = (student_joy + student_broad_interest) / 2) %>% head(3) ``` ``` ## # A tibble: 3 x 6 ## student_efficacy student_joy student_broad_i… student_epistem… ## <dbl> <dbl> <dbl> <dbl> ## 1 3.28 2.16 1.73 2.16 ## 2 -0.668 0.509 0.491 -0.501 ## 3 -1.62 0.0992 -0.638 -0.193 ## # … with 2 more variables: student_instrumental_motivation <dbl>, ## # student_joy_interest <dbl> ``` --- # Filtering the data set `filter()` takes *logical statements* (statements that can evaluate to true or false) to select a number of *rows* from a dataset. ```r student_responses %>% select(student_efficacy = SCIEEFF, student_joy = JOYSCIE, student_broad_interest = INTBRSCI, student_epistemic_beliefs = EPIST, student_instrumental_motivation = INSTSCIE) %>% mutate(student_joy_interest = (student_joy + student_broad_interest) / 2) %>% filter(student_joy_interest > mean(student_joy_interest, na.rm = TRUE)) %>% head(3) ``` ``` ## # A tibble: 3 x 6 ## student_efficacy student_joy student_broad_i… student_epistem… ## <dbl> <dbl> <dbl> <dbl> ## 1 3.28 2.16 1.73 2.16 ## 2 -0.668 0.509 0.491 -0.501 ## 3 -0.0378 1.67 0.332 2.16 ## # … with 2 more variables: student_instrumental_motivation <dbl>, ## # student_joy_interest <dbl> ``` --- # Grouping and summarizing It is often useful to aggregate a data set. The `group_by()` and `summarize()` functions are especially helpful for this purpose. Let's find the mean and standard deviation (and counts) of student broad interest by teacher. ```r student_responses %>% select(student_broad_interest = INTBRSCI, teacher_id = SCHID) %>% group_by(teacher_id) %>% summarize(student_broad_interest_mean = mean(student_broad_interest, na.rm = TRUE), student_broad_interest_sd = sd(student_broad_interest, na.rm = TRUE), n = n()) %>% head(3) ``` ``` ## # A tibble: 3 x 4 ## teacher_id student_broad_interest_mean student_broad_interest_sd n ## <chr> <dbl> <dbl> <int> ## 1 00001 -0.157 0.846 36 ## 2 00003 -0.136 1.08 31 ## 3 00004 -0.0430 1.26 39 ``` --- # Arranging Finally, let's arrange the table by the number of students in each class. `arrange()` sorts in order from the smallest to largest values. `desc()` tells `arrange()` to sort in descending order. ```r student_responses %>% select(student_broad_interest = INTBRSCI, teacher_id = SCHID) %>% group_by(teacher_id) %>% summarize(student_broad_interest_mean = mean(student_broad_interest, na.rm = TRUE), student_broad_interest_sd = sd(student_broad_interest, na.rm = TRUE), n = n()) %>% arrange(desc(n)) %>% head(3) ``` ``` ## # A tibble: 3 x 4 ## teacher_id student_broad_interest_mean student_broad_interest_sd n ## <chr> <dbl> <dbl> <int> ## 1 00107 0.204 0.786 42 ## 2 00117 0.204 0.893 42 ## 3 00131 0.0349 0.870 41 ``` --- # Counting Sometimes, we simply want to count the number of observations associated with each group. ```r student_responses %>% select(teacher_id = SCHID) %>% count(teacher_id) ``` We could then use this as the basis of *other* summary statistics. ```r student_responses %>% select(teacher_id = SCHID) %>% count(teacher_id) %>% summarize(n_mean = mean(n), n_sd = sd(n)) ``` ``` ## # A tibble: 1 x 2 ## n_mean n_sd ## <dbl> <dbl> ## 1 32.3 8.61 ``` On average, there are around 32 (*SD* = 8.61) students in each teacher's class.
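--- # Putting it all together These verbs can be chained with the pipe into a single, readable pipeline. Below is a minimal recap sketch (not run here) that reuses the same PISA variables introduced above; each step hands its result to the next, which is exactly what `%>%` is for.

```r
library(tidyverse)

student_responses <- read_csv("data/student-responses-data.csv")

student_responses %>% 
  select(student_joy = JOYSCIE,             # keep and rename just the columns we need
         student_broad_interest = INTBRSCI,
         teacher_id = SCHID) %>% 
  mutate(student_joy_interest = (student_joy + student_broad_interest) / 2) %>% 
  filter(!is.na(student_joy_interest)) %>%  # drop rows missing the new variable
  group_by(teacher_id) %>%                  # aggregate by teacher
  summarize(joy_interest_mean = mean(student_joy_interest),
            n = n()) %>% 
  arrange(desc(n))                          # largest classes first
```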
--- class: inverse, center, middle # Loading data --- # Loading a CSV File Okay, we're ready to go. The easiest way to read a CSV file is with the function `read_csv()` from the package `readr`, which is contained within the Tidyverse. Again, we'll load (or have loaded, already) the tidyverse library. ```r library(tidyverse) # so tidyverse packages can be used for analysis ``` Just as a recap, here is a line that we ran earlier. ```r student_responses <- read_csv("data/student-responses-data.csv") ``` Since we loaded the data, we now want to look at it. We can pass its name to the function `glimpse()` to print some information on the dataset (this code is not run here). ```r glimpse(student_responses) ``` You can also print the name of the object. --- # Loading Excel and SAV files ## Loading Excel files ```r install.packages("readxl") ``` ```r library(readxl) ``` ```r my_data <- read_excel("path/to/file.xlsx") ``` ## Loading SAV files ```r install.packages("haven") ``` ```r library(haven) my_data <- read_sav("path/to/file.sav") ``` --- # Saving (Writing) Files Using our data frame `student_responses`, we can save it as a CSV (for example) with the following function. The first argument, `student_responses`, is the name of the object that you want to save. The second argument, `student-responses.csv`, is what you want to call the saved dataset. ```r write_csv(student_responses, "student-responses.csv") ``` That will save a CSV file entitled `student-responses.csv` in the working directory. If you want to save it to another directory, simply add the file path to the file, i.e., `path/to/student-responses.csv`. To save a file for SPSS, load the haven package and use `write_sav()`. There is not a function to save an Excel file, but you can save as a CSV and directly load it in Excel. --- class: inverse, center, middle # Demo doc (part 2): Try it out! *Note. Check out [R Studio Cloud](https://rstudio.cloud/) if you're having any installation trouble!* *You can find the demo doc in RStudio or in your downloaded repository files—it is titled `demo-doc.Rmd`* *The doc is also [here](https://github.com/bretsw/aect19-workshop/blob/master/demo-doc.Rmd) in the repository* --- class: inverse, center, middle # Visualizing data --- # The grammar of graphics *ggplot2* is a tidyverse package for creating visualizations (or figures). Let's work with a smaller data set, for now. ```r student_mot_vars <- # save object student_mot_vars by...
student_responses %>% # using dataframe student_responses select(student_efficacy = SCIEEFF, # selecting variable SCIEEFF and renaming to student_efficiency student_joy = JOYSCIE, # selecting variable JOYSCIE and renaming to student_joy student_broad_interest = INTBRSCI, # selecting variable INTBRSCI and renaming to student_broad_interest student_epistemic_beliefs = EPIST, # selecting variable EPIST and renaming to student_epistemic_beliefs student_instrumental_motivation = INSTSCIE, # selecting variable INSTSCIE and renaming to student_instrumental_motivation teacher_id = SCHID ) ``` --- # Scatter plot Notice the three parts of all `ggplot2` graphs: a) the data (`student_mot_vars`), b) an `aes()`thetic mapping, and c) a `geom`: <img src="index_files/figure-html/unnamed-chunk-33-1.png" width="450px" style="display: block; margin: auto;" /> --- # Scatter plot with a regression line ```r ggplot(student_mot_vars, aes(x = student_efficacy, y = student_broad_interest)) + geom_point() + geom_smooth(method = "lm") ``` <img src="index_files/figure-html/unnamed-chunk-34-1.png" width="350px" style="display: block; margin: auto;" /> --- # Cleaning up ```r ggplot(student_mot_vars, aes(x = student_efficacy, y = student_broad_interest)) + geom_point() + geom_smooth(method = "lm") + theme_bw() + xlab("Student Efficacy") + ylab("Student Broad Interest") ``` <img src="index_files/figure-html/unnamed-chunk-35-1.png" width="350px" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Demo doc (part 3): Try it out! * Note. Check out [R Studio Cloud](https://rstudio.cloud/) if you're having any installation trouble!* *You can find the demo doc in RStudio or in your downloaded repository files—it is titled `demo-doc.Rmd`* *The doc is also [here](https://github.com/bretsw/aect19-workshop/blob/master/demo-doc.Rmd) in the repository* --- class: inverse, center, middle # Modeling data --- # Linear models The `lm()` function is a very helpful general purpose function for linear models. We'll use the `student_mot_vars` data frame we created earlier. ```r lm(student_epistemic_beliefs ~ student_broad_interest, data = student_mot_vars) ``` ``` ## ## Call: ## lm(formula = student_epistemic_beliefs ~ student_broad_interest, ## data = student_mot_vars) ## ## Coefficients: ## (Intercept) student_broad_interest ## 0.2407 0.2401 ``` --- # Linear models Let's save the results back to an object, `m1`. ```r m1 <- lm(student_epistemic_beliefs ~ student_broad_interest, data = student_mot_vars) ``` --- # Linear models We can then run a summary on the output. ```r summary(m1) ``` ``` ## ## Call: ## lm(formula = student_epistemic_beliefs ~ student_broad_interest, ## data = student_mot_vars) ## ## Residuals: ## Min 1Q Median 3Q Max ## -3.6197 -0.5614 -0.1755 0.5917 2.5144 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 0.24066 0.01390 17.31 <2e-16 *** ## student_broad_interest 0.24015 0.01423 16.88 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 1.013 on 5323 degrees of freedom ## (387 observations deleted due to missingness) ## Multiple R-squared: 0.0508, Adjusted R-squared: 0.05062 ## F-statistic: 284.9 on 1 and 5323 DF, p-value: < 2.2e-16 ``` --- # Linear models You can then build up a more complex model, i.e.: ```r m2 <- lm(student_epistemic_beliefs ~ student_broad_interest + student_joy + student_broad_interest:student_joy, data = student_mot_vars) ``` You can then run `summary()` to view the results. 
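--- # Linear models Once a model object like `m2` is saved, a few base R helpers (beyond `summary()`) pull out specific pieces of the results. A minimal sketch, assuming `m2` has been fit as on the previous slide:

```r
summary(m2)   # full results: coefficients, standard errors, R-squared
coef(m2)      # just the estimated coefficients, as a named vector
confint(m2)   # 95% confidence intervals for those coefficients
```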
--- # Multi-level models ```r library(lme4) ``` ``` ## Warning: package 'lme4' was built under R version 3.5.2 ``` ``` ## Loading required package: Matrix ``` ``` ## ## Attaching package: 'Matrix' ``` ``` ## The following object is masked from 'package:tidyr': ## ## expand ``` ```r m2_mlm <- lmer(student_epistemic_beliefs ~ student_broad_interest + student_joy + student_broad_interest:student_joy + (1|teacher_id), data = student_mot_vars) m2_mlm ``` ``` ## Linear mixed model fit by REML ['lmerMod'] ## Formula: ## student_epistemic_beliefs ~ student_broad_interest + student_joy + ## student_broad_interest:student_joy + (1 | teacher_id) ## Data: student_mot_vars ## REML criterion at convergence: 14853.02 ## Random effects: ## Groups Name Std.Dev. ## teacher_id (Intercept) 0.1839 ## Residual 0.9648 ## Number of obs: 5315, groups: teacher_id, 176 ## Fixed Effects: ## (Intercept) student_broad_interest ## 0.16608 0.07491 ## student_joy student_broad_interest:student_joy ## 0.27409 0.01991 ``` --- class: inverse, center, middle # Afternoon --- # Afternoon Agenda 1. Considerations for using social media data in LD&T research 1. Conducting ethical research 1. Framing the research 1. Organizing the research process 1. Collecting data 1. Analyzing data 1. Disseminating research 1. Learning and doing more with R 1. Self-guided exploration (Option A: Text analysis; Option B: Social network analysis) 1. Questions 1. Post-survey + (optional) upload your .Rmd file for the day --- class: inverse, center, middle # Considerations for Using Social Media Data in Learning Design & Technology Research <img src="img/book.jpg" width="600px" style="display: block; margin: auto;" /> --- # Chapter overview Our purpose in this chapter is to suggest considerations that LDT professionals should make as they use social media data in their research. --- # Important notes -- * Our purpose is *not* to dismiss traditional research methods -- * There is considerable room for debate and disagreement around the proper way to conduct social media research -- * Our focus is not providing answers but raising questions -- * The issues and processes we raise are not linear; this is not a simple recipe to follow -- * We avoid the traditional distinction between *quantitative* and *qualitative* research, and instead suggest various other distinctions as we go --- class: inverse, center, middle # What's your project? <img src="img/project-legos.jpg" width="600px" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Conducting ethical research --- # Conducting ethical research * Often, IRBs do not flag social media work as human subjects research; however, we would suggest that it is a human phenomenon -- * Not a checklist, but a thorough consideration of research context and ethical principles is needed -- * Public vs. private -- * Harms and benefits -- * Vulnerability -- * Anonymity -- * Consent -- * Legal considerations --- class: inverse, center, middle # What ethical considerations should you examine in your project? <img src="img/project-legos.jpg" width="600px" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Framing the research --- # Framing the research -- * Paradigms and assumptions -- * Research design, methods, and modes of inquiry -- * Conceptual frameworks -- * Community, network, or space? -- * Phenomena and units of analysis --- class: inverse, center, middle # How will you frame your project? 
<img src="img/project-legos.jpg" width="600px" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Organizing the research process --- # Organizing the research process -- * Software -- * Storing data -- * Workflows -- * Documentation --- class: inverse, center, middle # What do you need to do to organize your project? <img src="img/project-legos.jpg" width="600px" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Collecting data --- # Collecting data -- * Obtrusive or unobtrusive -- * Unobtrusive: *digital traces* -- * Process -- * Application programming interfaces (APIs) * TAGS + rtweet * Web scraping * Accessing archives -- * Quantity: big vs. small --- class: inverse, center, middle # How will you collect data for your project? <img src="img/project-legos.jpg" width="600px" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Analyzing data --- # Analyzing data -- * Spam -- * Machine vs. human analysis -- * Networks --- class: inverse, center, middle # How will you analyze data for your project? <img src="img/project-legos.jpg" width="600px" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Disseminating research --- # Disseminating research -- * Sharing data -- * Sharing code -- * TAGS + rtweet = rTAGS? -- * Github (http://github.com) * Open Science Framework (http://osf.io) -- * Publishing and publicizing research --- class: inverse, center, middle # How will you disseminate your project? <img src="img/project-legos.jpg" width="600px" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Conclusion --- # Conclusion -- * The ready availability of social media data provides many opportunities... -- * ...but also considerable challenges for LDT researchers. -- * So, get familiar with the broad considerations, -- * and don't skip asking the important questions. -- * Start and end with research ethics, and revisit throughout. --- class: inverse, center, middle # Self-guided exploration --- # How We Teach <img src="img/tech_support_cheat_sheet.png" width="480px" style="display: block; margin: auto;" /> --- # Introduction to text analysis -- There are all kinds of things you might wonder about the text of a document. For instance: -- * How many words are in a document? -- * Which words are most used most often in a text? -- * How many times is a specific word used? --- # Introduction to social network analysis -- <img src="img/csed-graph-forced-atlas8.png" width="480px" style="display: block; margin: auto;" /> *A network is a set of objects, and network analysis is interested in the relationships between them.* --- # Introduction to social network analysis <img src="img/csed-graph-forced-atlas8.png" width="480px" style="display: block; margin: auto;" /> *So, what are the objects? 
How are they related?* --- # Self-guided exploration -- ### Option A: Text analysis (Difficulty: Moderate) *The doc is [here](https://github.com/bretsw/aect19-workshop/blob/master/explore-on-your-own-1.Rmd) in the repository* *You can also find it in the `explore-on-your-own-1.Rmd` file* -- ### Option B: Social network analysis (Difficulty: Hard) *The doc is [here](https://github.com/bretsw/aect19-workshop/blob/master/explore-on-your-own-2.Rmd) in the repository* *You can also find it in the `explore-on-your-own-2.Rmd` file* --- # Acknowledgments Thank you to [Emily Bovee](https://github.com/emilybovee) for co-developing the workshops this workshop is adapted from: https://github.com/jrosen48/MTSU-workshop and https://github.com/jrosen48/MSU-workshop-2019 Parts of the content for this workshop are also adapted from: - The [*Data Science in Education Using R* book](https://github.com/data-edu/data-science-in-education) by Emily A. Bovee, Ryan A. Estrellado, Jesse Mostipak, Joshua M. Rosenberg, and Isabella C. Velásquez, to be published by Routledge in 2020 - An American Educational Research Association 2019 annual meeting [workshop on *Reproducible and transparent research with R*](https://github.com/ResearchTransparency/rr_aera19) by [Daniel Anderson](https://github.com/datalorax) and [Joshua Rosenberg](https://github.com/jrosen48/) Finally, parts of the content for this workshop are inspired by content associated with the [Data Science Specialization for UO COE](https://github.com/uo-datasci-specialization) (led by [Daniel Anderson](https://github.com/datalorax)) --- class: inverse, center, middle # Questions? Bret Staudt Willet: [staudtwi@msu.edu](mailto:staudtwi@msu.edu) Spencer Greenhalgh: [spencer.greenhalgh@uky.edu](mailto:spencer.greenhalgh@uky.edu) Joshua Rosenberg: [jmrosenberg@utk.edu](mailto:jmrosenberg@utk.edu) The GitHub repository for this workshop is: https://github.com/bretsw/aect19-workshop --- class: inverse, center, top # Post-survey http://bit.ly/r-at-eact-post (optional: upload your .Rmd file of work for the day) <img src="img/survey.jpg" width="720px" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Questions? --- class: inverse, center, middle # Learning and doing more with R --- # Resources - [*R for Data Science*](https://r4ds.had.co.nz/) by Wickham and Grolemund (2017) - [Big magic with R: Creative learning beyond fear](https://speakerdeck.com/apreshill/big-magic-with-r-creative-learning-beyond-fear) by Hill (2017) - [Data science in education](https://github.com/data-edu/data-science-in-education) - [R Studio Community](https://community.rstudio.com/) - [Data Science in Education Using R](https://github.com/data-edu/data-science-in-education) --- # Courses - [\#r4ds](https://medium.com/@kierisi/r4ds-the-next-iteration-d51e0a1b0b82) (see a talk at rstudio::conf() [here](https://resources.rstudio.com/rstudio-conf-2019/r4ds-online-learning-community-improvements-to-self-taught-data-science-and-the-critical-need-for-diversity-equity-and-inclusion-in-data-science-education) by Maegan (2019)) - [Data science for social scientists](http://datascience.tntlab.org/) by Landers (2019) - [University of Oregon Data Science Specialization for the College of Education](https://github.com/uo-datasci-specialization) by Anderson (2019) --- # Useful packages With one exception, you can install these via `install.packages("<name-of-package>")`, e.g., `install.packages("tidylog")`.
- tidyverse: tools for data manipulation and processing - tidylog: companion to the tidyverse; gives you feedback - haven: import and export files for other statistical software - apaTables: create APA-style tables in MS Word format - devtools: to download packages only available on GitHub - psych: general psychological science-related analysis tools - skimr: quickly calculate summary statistics for a data frame --- # Useful packages With one exception, you can install these via `install.packages("<name-of-package>")`, e.g., `install.packages("tidylog")`. - lavaan: structural equation modeling - tidyLPA: latent profile analysis - lme4: multi-level modeling (HLM) - sjstats: convenient statistical functions - poLCA: latent class analysis - rmarkdown: reports (PDF, Word, HTML, etc.) - papaja: APA-style paper generation (the one exception: must install from [GitHub](https://github.com/crsh/papaja); see the sketch in the Reference Material slides) - blogdown: create a website/blog - xaringan: create presentations - MplusAutomation: run Mplus via R --- # People (and a few hashtags) to follow (on Twitter) - [Jesse Mostipak](https://twitter.com/kierisi) - [Alison Hill](https://twitter.com/apreshill) - [Hadley Wickham](https://twitter.com/hadleywickham) - [Daniel Anderson](https://twitter.com/datalorax_) - [Mara Averick](https://twitter.com/dataandme) - [Thomas Mock](https://twitter.com/thomas_mock) - [Angela Li](https://twitter.com/CivicAngela) - [R4DS community](https://twitter.com/R4DScommunity) - [#tidytuesday](https://twitter.com/hashtag/TidyTuesday?src=hash) - [#rstats](https://twitter.com/hashtag/rstats?src=hash) - [David Robinson](https://twitter.com/drob) - [Julia Silge](https://twitter.com/juliasilge) - [Gabriela de Queiroz](https://twitter.com/gdequeiroz) - [Emily Bovee](https://twitter.com/ebovee09) - [Joshua Rosenberg](https://twitter.com/jrosenberg6432) - [Spencer Greenhalgh](https://twitter.com/spgreenhalgh) - [Bret Staudt Willet](https://twitter.com/bretsw) --- class: inverse, center, top # Reference Material The following slides include material that we cut for time or simplicity, but which may be helpful for you later.
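--- # Installing a package from GitHub As noted on the Useful packages slides, papaja is the one exception: it is not on CRAN, so `install.packages()` cannot find it. Here is a minimal sketch using the devtools package (the `crsh/papaja` repository address comes from the GitHub link on that slide):

```r
install.packages("devtools")    # devtools itself is on CRAN

library(devtools)
install_github("crsh/papaja")   # "user/repository", as it appears on GitHub

library(papaja)                 # then load it like any other package
```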
--- class: inverse, center, middle # Computer science foundations --- # Data structures - Data Frame ```r data.frame(var1 = c("a", "b", "c"), var2 = c(4, 2, 5)) ``` ``` ## var1 var2 ## 1 a 4 ## 2 b 2 ## 3 c 5 ``` - Tibble ```r library(tidyverse) tibble(var1 = c("a", "b", "c"), var2 = c(4, 2, 5)) ``` ``` ## # A tibble: 3 x 2 ## var1 var2 ## <chr> <dbl> ## 1 a 4 ## 2 b 2 ## 3 c 5 ``` --- # Data structures ```r vec1 = c("a", "b", "c") vec1 ``` ``` ## [1] "a" "b" "c" ``` ```r vec2 = c(4, 2, 5) vec2 ``` ``` ## [1] 4 2 5 ``` ```r tibble(var1 = vec1, var2 = vec2) ``` ``` ## # A tibble: 3 x 2 ## var1 var2 ## <chr> <dbl> ## 1 a 4 ## 2 b 2 ## 3 c 5 ``` --- # Conditional logic - `ifelse()` ```r vec1 <- runif(10) # generates random number between 0 and 1 d1 <- tibble(var1 = vec1) d1 %>% mutate(var2 = ifelse(var1 >= .5, "greater", "lesser")) ``` ``` ## # A tibble: 10 x 2 ## var1 var2 ## <dbl> <chr> ## 1 0.102 lesser ## 2 0.657 greater ## 3 0.848 greater ## 4 0.944 greater ## 5 0.225 lesser ## 6 0.0332 lesser ## 7 0.392 lesser ## 8 0.813 greater ## 9 0.732 greater ## 10 0.884 greater ``` --- # Conditional logic - `dplyr::case_when()` ```r d1 %>% mutate(var2 = case_when( var1 >= .75 ~ "a lot", var1 >= .5 & var1 < .75 ~ "greater", var1 >= .25 & var1 < .5 ~ "lesser", var1 < .25 ~ "very little" )) ``` ``` ## # A tibble: 10 x 2 ## var1 var2 ## <dbl> <chr> ## 1 0.102 very little ## 2 0.657 greater ## 3 0.848 a lot ## 4 0.944 a lot ## 5 0.225 very little ## 6 0.0332 very little ## 7 0.392 lesser ## 8 0.813 a lot ## 9 0.732 greater ## 10 0.884 a lot ``` --- # Objects, functions, directories, and data structures Here's a short URL: `https://goo.gl/bUeMhV` This is what it resolves to (it's a CSV file): `https://raw.githubusercontent.com/data-edu/data-science-in-education/master/data/pisaUSA15/stu-quest.csv` This chunk of code will save that file to a data directory in our working environment. ```r student_responses_url <- "https://goo.gl/bUeMhV" student_responses_file_name <- paste0(getwd(), "/data/student-responses-data.csv") download.file( url = student_responses_url, destfile = student_responses_file_name) ``` --- # Step 1 1. The *character string* `"https://goo.gl/wPmujv"` is being saved to an *object* called `student_responses_url`. ```r student_responses_url <- "https://goo.gl/bUeMhV" ``` --- # Step 2 2. We concatenate the working directory file path to the desired file name for the CSV using a *function* called `str_c`. This is stored in another *object* called `student_reponses_file_name`. This creates a file name with a *file path* in your working directory and it saves the file in the folder that you are working in. ```r student_responses_file_name <- str_c(getwd(), "./data/student-responses-data.csv") ``` --- # Step 3 3. The `student_responses_url` *object* is passed to the `url` argument of the *function* called `download.file()` along with `student_responses_file_name`, which is passed to the `destfile` argument. In short, the `download.file()` function needs to know - where the file is coming from (which you tell it through the `url`) argument and - where the file will be saved (which you tell it through the `destfile` argument). ```r download.file( url = student_responses_url, destfile = student_responses_file_name) ``` --- # Vignettes Vignettes are long-form documentation (and tutorials) that can provide a helpful introduction to a package. Run `vignette()` in order to view *all* of the vignettes available for a package: ```r vignette(package = "dplyr") ``` Then, you can load a specific vignette. 
```r vignette("dplyr", package = "dplyr") ``` These are also available through CRAN (i.e., https://cran.r-project.org/web/packages/dplyr/index.html) --- # Using a specific function If you know the specific function that you want to look up, you can run this in the R Studio console: ```r ?dplyr::filter ``` Once you know what you want to do with the function, you can run it in your code: ```r dat <- # example data frame tibble(letter = c("A", "A", "A", "B", "B"), number = c(1L, 2L, 3L, 4L, 5L)) filter(dat, letter == "A") ``` ``` ## # A tibble: 3 x 2 ## letter number ## <chr> <int> ## 1 A 1 ## 2 A 2 ## 3 A 3 ``` ---