Use Color to Highlight Findings
This lesson is called Use Color to Highlight Findings, part of the Going Deeper with R course.
Code shown in video
# Load Packages -----------------------------------------------------------

library(tidyverse)
library(fs)

# Create Directory --------------------------------------------------------

dir_create("data")

# Download Data -----------------------------------------------------------

download.file(
  "https://github.com/rfortherestofus/going-deeper-positron/raw/main/data/third_grade_math_proficiency.rds",
  mode = "wb",
  destfile = "data/third_grade_math_proficiency.rds"
)

# Import Data -------------------------------------------------------------

third_grade_math_proficiency <-
  read_rds("data/third_grade_math_proficiency.rds") |>
  select(
    academic_year,
    school,
    school_id,
    district,
    proficiency_level,
    number_of_students
  ) |>
  # Levels 3 and above count as proficient; everything else (including NA) does not
  mutate(
    is_proficient = case_when(
      proficiency_level >= 3 ~ TRUE,
      .default = FALSE
    )
  ) |>
  # Count students per school and year, split by proficiency status
  group_by(academic_year, school, district, school_id, is_proficient) |>
  summarize(number_of_students = sum(number_of_students, na.rm = TRUE)) |>
  ungroup() |>
  # Convert counts to a share of all students in each school and year
  group_by(academic_year, school, district, school_id) |>
  mutate(
    percent_proficient = number_of_students /
      sum(number_of_students, na.rm = TRUE)
  ) |>
  ungroup() |>
  filter(is_proficient == TRUE) |>
  select(academic_year, school, district, percent_proficient) |>
  rename(year = academic_year)

# Plot --------------------------------------------------------------------

# Find the Portland school with the largest year-over-year gain
top_growth_school <-
  third_grade_math_proficiency |>
  filter(district == "Portland SD 1J") |>
  group_by(school) |>
  mutate(
    growth_from_previous_year = percent_proficient - lag(percent_proficient)
  ) |>
  ungroup() |>
  slice_max(
    order_by = growth_from_previous_year,
    n = 1,
    # Return exactly one school even if two tie for the largest gain
    with_ties = FALSE
  ) |>
  pull(school)

third_grade_math_proficiency |>
  filter(district == "Portland SD 1J") |>
  # Flag the school to highlight
  mutate(
    highlight_school = case_when(
      school == top_growth_school ~ "Y",
      .default = "N"
    )
  ) |>
  # Make the highlighted school the last factor level so its line is drawn on top
  mutate(
    school = fct_relevel(
      school,
      top_growth_school,
      after = Inf
    )
  ) |>
  ggplot(
    aes(
      x = year,
      y = percent_proficient,
      color = highlight_school,
      group = school
    )
  ) +
  geom_line() +
  scale_color_manual(
    values = c(
      "Y" = "orange",
      "N" = "gray80"
    )
  )
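To see how the growth calculation picks a single school, here is a minimal sketch on made-up data. The tibble `toy`, its school names, and its values are hypothetical and are not part of the course dataset. The `arrange()` call is an added safeguard so that `lag()` compares each year with the one before it.

```r
library(dplyr)

# Hypothetical toy data: three made-up schools, two years each
toy <- tibble(
  school = rep(c("A", "B", "C"), each = 2),
  year = rep(c("2020-2021", "2021-2022"), times = 3),
  percent_proficient = c(0.40, 0.45, 0.30, 0.50, 0.60, 0.55)
)

top_school <- toy |>
  # Sort so that lag() looks at the previous year within each school
  arrange(school, year) |>
  group_by(school) |>
  # The first year of each school has no previous year, so growth is NA there
  mutate(growth = percent_proficient - lag(percent_proficient)) |>
  ungroup() |>
  # Keep the single row with the largest growth
  slice_max(order_by = growth, n = 1, with_ties = FALSE) |>
  pull(school)

top_school
```

In this toy data, school B has the largest gain (from 0.30 to 0.50), so `top_school` is `"B"`.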
Your Turn
Highlight the district in your line chart that had the largest increase in its Hispanic/Latino population between 2021-2022 and 2022-2023.
Learn More
The Datawrapper blog has an amazing blog post by Lisa Charlotte Muth on using color effectively in data viz.
The issue I ran into, where individual lines weren't visible because there were too many of them, is called overplotting. Claus Wilke discusses overplotting in Chapter 18 of his book Fundamentals of Data Visualization (the chapter is about overplotting with points, but the concepts are the same).
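One common way to reduce overplotting with lines is to make the background lines thin and semi-transparent, then draw the highlighted line in its own layer on top. The sketch below is an alternative to the approach in the video, not the method shown there. It uses randomly generated, hypothetical data and assumes ggplot2 3.4 or later for the `linewidth` argument.

```r
library(ggplot2)

set.seed(1)

# Hypothetical data: 60 made-up schools observed over four years
toy <- expand.grid(school = paste("School", 1:60), year = 2019:2022)
toy$percent_proficient <- runif(nrow(toy), min = 0.2, max = 0.8)
toy$highlight <- ifelse(toy$school == "School 1", "Y", "N")

p <- ggplot(toy, aes(x = year, y = percent_proficient, group = school)) +
  # Background schools: thin, light, and semi-transparent
  geom_line(
    data = subset(toy, highlight == "N"),
    color = "gray80",
    alpha = 0.5,
    linewidth = 0.3
  ) +
  # Highlighted school: added last, so it is drawn on top
  geom_line(
    data = subset(toy, highlight == "Y"),
    color = "orange",
    linewidth = 1
  )

p
```

Because layers are drawn in the order they are added, the orange line sits above the gray ones without reordering factor levels.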