R in 3 Months (Spring 2025)

Advanced Summarizing

This lesson is called Advanced Summarizing, part of the R in 3 Months (Spring 2025) course.

View code shown in video

# Load Packages -----------------------------------------------------------

library(tidyverse)
library(fs)
library(readxl)
library(janitor)

# Create Directories ------------------------------------------------------

dir_create("data-raw")

# Download Data -----------------------------------------------------------

# https://www.oregon.gov/ode/educator-resources/assessment/Pages/Assessment-Group-Reports.aspx

# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/pagr_schools_math_tot_raceethnicity_2122.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx")
# 
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/Documents/TestResults2122/TestResults2019/pagr_schools_math_tot_raceethnicity_1819.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_tot_raceethnicity_1819.xlsx")
# 
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2018/pagr_schools_math_raceethnicity_1718.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_raceethnicity_1718.xlsx")
# 
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_math_raceethnicity_1617.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_raceethnicity_1617.xlsx")
# 
# download.file("https://www.oregon.gov/ode/educator-resources/assessment/TestResults2016/pagr_schools_math_raceethnicity_1516.xlsx",
#               mode = "wb",
#               destfile = "data-raw/pagr_schools_math_raceethnicity_1516.xlsx")


# Import Data -------------------------------------------------------------

math_scores_2021_2022 <-
  read_excel(path = "data-raw/pagr_schools_math_tot_raceethnicity_2122.xlsx") |> 
  clean_names()


# Tidy and Clean Data -----------------------------------------------------

third_grade_math_proficiency_2021_2022 <-
  math_scores_2021_2022 |> 
  filter(student_group == "Total Population (All Students)") |> 
  filter(grade_level == "Grade 3") |> 
  select(academic_year, school_id, contains("number_level")) |> 
  pivot_longer(cols = starts_with("number_level"),
               names_to = "proficiency_level",
               values_to = "number_of_students") |> 
  mutate(proficiency_level = case_when(
    proficiency_level == "number_level_4" ~ "4",
    proficiency_level == "number_level_3" ~ "3",
    proficiency_level == "number_level_2" ~ "2",
    proficiency_level == "number_level_1" ~ "1"
  )) |> 
  mutate(number_of_students = parse_number(number_of_students))

third_grade_math_proficiency_2021_2022 |> 
  group_by(school_id) |> 
  mutate(pct = number_of_students / sum(number_of_students, na.rm = TRUE)) |> 
  ungroup()
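
To see what the group_by() and mutate() step does without downloading the Oregon data, here is a minimal sketch on made-up numbers. The toy_proficiency data frame and its values are hypothetical, chosen only so the arithmetic is easy to check by hand; the pipeline itself mirrors the one above.

```r
library(dplyr)

# Hypothetical toy data: two schools, student counts by proficiency level
toy_proficiency <- tibble(
  school_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  proficiency_level = rep(c("1", "2", "3", "4"), times = 2),
  number_of_students = c(10, 20, 30, 40, 5, 5, NA, 10)
)

toy_pct <- toy_proficiency |>
  # Every calculation after this line happens separately for each school
  group_by(school_id) |>
  # mutate() keeps all rows; sum() here is the school's total, not the overall total
  mutate(pct = number_of_students / sum(number_of_students, na.rm = TRUE)) |>
  # Remove the grouping so later steps work on the whole data frame again
  ungroup()

toy_pct
```

School 1 has 100 students in total, so its pct values are 0.1, 0.2, 0.3, and 0.4. School 2 has one missing count: na.rm = TRUE lets the school total (20) be calculated anyway, but the pct for the missing row itself stays NA.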

Your Turn

Solution

# Load Packages -----------------------------------------------------------

library(tidyverse)
library(fs)
library(readxl)
library(janitor)

# Create Directories ------------------------------------------------------

dir_create("data-raw")

# Download Data -----------------------------------------------------------

# https://www.oregon.gov/ode/reports-and-data/students/Pages/Student-Enrollment-Reports.aspx

# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20222023.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20222023.xlsx")
# 
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20212022.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20212022.xlsx")
# 
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20202021.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20202021.xlsx")
# 
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20192020.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20192020.xlsx")
# 
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20182019.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20182019.xlsx")

# Import Data -------------------------------------------------------------

enrollment_2022_2023 <- read_excel(path = "data-raw/fallmembershipreport_20222023.xlsx",
                                   sheet = "School 2022-23") |> 
  clean_names()

# Tidy and Clean Data -----------------------------------------------------

enrollment_by_race_ethnicity_2022_2023 <-
  enrollment_2022_2023 |> 
  select(district_institution_id, x2022_23_american_indian_alaska_native:x2022_23_percent_multi_racial) |>  
  select(-contains("percent")) |> 
  pivot_longer(cols = -district_institution_id,
               names_to = "race_ethnicity",
               values_to = "number_of_students") |> 
  mutate(race_ethnicity = str_remove(race_ethnicity, pattern = "x2022_23_")) |> 
  mutate(race_ethnicity = case_when(
    race_ethnicity == "american_indian_alaska_native" ~ "American Indian Alaska Native",
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "black_african_american" ~ "Black/African American",
    race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
    race_ethnicity == "multi_racial" ~ "Multiracial",
    race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian Pacific Islander",
    race_ethnicity == "white" ~ "White"
  )) |> 
  mutate(number_of_students = parse_number(number_of_students))

enrollment_by_race_ethnicity_2022_2023 |> 
  group_by(district_institution_id, race_ethnicity) |>
  summarize(number_of_students = sum(number_of_students, na.rm = TRUE)) |> 
  ungroup() |> 
  group_by(district_institution_id) |>
  mutate(pct = number_of_students / sum(number_of_students, na.rm = TRUE)) |> 
  ungroup()

Create a new variable called pct that shows each race/ethnicity group's share of all students in each district (as a proportion between 0 and 1). This will require two steps:
1. Use group_by() and summarize() to calculate the number of students in each race/ethnicity group in each district.
2. Use group_by() and mutate() to divide each group's count by the total number of students in its district.
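
The two steps can be sketched on a small made-up data frame before trying them on the enrollment data. Everything in toy_enrollment (the column names district_id and school_id, and all the counts) is hypothetical and only illustrates the pattern.

```r
library(dplyr)

# Hypothetical toy data: two districts, each with two schools
toy_enrollment <- tibble(
  district_id = c("A", "A", "A", "A", "B", "B", "B", "B"),
  school_id = c(1, 1, 2, 2, 3, 3, 4, 4),
  race_ethnicity = rep(c("Asian", "White"), times = 4),
  number_of_students = c(10, 30, 20, 40, 5, 15, NA, 20)
)

# Step 1: collapse schools into one row per district and race/ethnicity group
district_totals <- toy_enrollment |>
  group_by(district_id, race_ethnicity) |>
  summarize(number_of_students = sum(number_of_students, na.rm = TRUE)) |>
  ungroup()

# Step 2: divide each group's count by its district's total
district_pct <- district_totals |>
  group_by(district_id) |>
  mutate(pct = number_of_students / sum(number_of_students, na.rm = TRUE)) |>
  ungroup()

district_pct
```

summarize() shrinks the data to one row per group, while mutate() keeps every row and adds a column, which is why the counting step uses the first and the pct step uses the second.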

Don’t forget to ungroup() at the end of each step. Use this code to get started:

# Load Packages -----------------------------------------------------------

library(tidyverse)
library(fs)
library(readxl)
library(janitor)

# Create Directories ------------------------------------------------------

dir_create("data-raw")

# Download Data -----------------------------------------------------------

# https://www.oregon.gov/ode/reports-and-data/students/Pages/Student-Enrollment-Reports.aspx

# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20222023.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20222023.xlsx")
# 
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20212022.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20212022.xlsx")
# 
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20202021.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20202021.xlsx")
# 
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20192020.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20192020.xlsx")
# 
# download.file("https://www.oregon.gov/ode/reports-and-data/students/Documents/fallmembershipreport_20182019.xlsx",
#               mode = "wb",
#               destfile = "data-raw/fallmembershipreport_20182019.xlsx")

# Import Data -------------------------------------------------------------

enrollment_2022_2023 <- read_excel(path = "data-raw/fallmembershipreport_20222023.xlsx",
                                   sheet = "School 2022-23") |> 
  clean_names()

# Tidy and Clean Data -----------------------------------------------------

enrollment_by_race_ethnicity_2022_2023 <-
  enrollment_2022_2023 |> 
  select(district_institution_id, school_institution_id,
         x2022_23_american_indian_alaska_native:x2022_23_multi_racial) |> 
  select(-contains("percent")) |> 
  pivot_longer(cols = -c(district_institution_id, school_institution_id),
               names_to = "race_ethnicity",
               values_to = "number_of_students") |> 
  mutate(race_ethnicity = str_remove(race_ethnicity, pattern = "x2022_23_")) |> 
  mutate(race_ethnicity = case_when(
    race_ethnicity == "american_indian_alaska_native" ~ "American Indian Alaska Native",
    race_ethnicity == "asian" ~ "Asian",
    race_ethnicity == "black_african_american" ~ "Black/African American",
    race_ethnicity == "hispanic_latino" ~ "Hispanic/Latino",
    race_ethnicity == "multi_racial" ~ "Multiracial",
    race_ethnicity == "native_hawaiian_pacific_islander" ~ "Native Hawaiian Pacific Islander",
    race_ethnicity == "white" ~ "White"
  )) |> 
  mutate(number_of_students = parse_number(number_of_students))

Learn More

Daniel Carter has a nice walkthrough of using group_by() and mutate().

If you forget to ungroup() every once in a while, you’re joining an illustrious group.

ugh foiled by a missing ungroup() once again #rstats
— Andrew Heiss (🐘 @andrew@fediscience.org) (@andrewheiss) November 25, 2019

When in doubt, try ungroup() #rstats
— Ben Casselman (@bencasselman) October 4, 2019

To my #rstats friends: Practice safe stats. Remember to dplyr::ungroup() after you're done with your within-group operations. pic.twitter.com/r4JblvgSjd
— Hlynur Hallgríms (@hlynur) July 19, 2018

Need a cheery reminder to use ungroup()? Here you go!

Don't forget to bring dplyr::ungroup() to the party 🎁🥳 #rstats

Thanks to @apreshill for inspiring this one! pic.twitter.com/gsf66KXJ2d
— Allison Horst (@allison_horst) November 21, 2019
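
Why do those reminders matter? Here is a minimal sketch, on hypothetical toy data, of how a forgotten ungroup() can quietly change a later result. The names toy_counts, still_grouped, grouped_total, and overall_total are made up for this illustration.

```r
library(dplyr)

# Hypothetical toy data: two schools, two proficiency levels each
toy_counts <- tibble(
  school_id = c(1, 1, 2, 2),
  proficiency_level = c("1", "2", "1", "2"),
  number_of_students = c(10, 30, 20, 40)
)

# group_by() + mutate() with no ungroup(): the grouping is still attached
still_grouped <- toy_counts |>
  group_by(school_id) |>
  mutate(pct = number_of_students / sum(number_of_students))

# Intended as one overall total, but the lingering grouping gives one row per school
grouped_total <- still_grouped |>
  summarize(total_students = sum(number_of_students))

# After ungroup(), summarize() returns a single row for the whole data frame
overall_total <- still_grouped |>
  ungroup() |>
  summarize(total_students = sum(number_of_students))

grouped_total
overall_total
```

Neither version throws an error, which is what makes a missing ungroup() easy to overlook.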
