Fundamentals of R

filter()

View code shown in video
# Load Packages -----------------------------------------------------------

library(tidyverse)

# Import Data -------------------------------------------------------------

penguins <-
  read_csv("penguins.csv")

# filter() ----------------------------------------------------------------

# We use filter() to choose a subset of observations.

# We use == to select all observations that meet the criteria.

penguins |>
  filter(species == "Adelie") 

# We use != to select all observations that don't meet the criteria.

penguins |>
  filter(species != "Adelie") 

# We can also use filter_out() to drop the observations that meet the criteria.

penguins |>
  filter_out(species == "Adelie") 

# We can combine comparisons and logical operators.

penguins |>
  filter(species == "Adelie" | species == "Chinstrap") 

# We can use %in% to collapse multiple comparisons into one.

penguins |>
  filter(species %in% c("Adelie", "Chinstrap"))

# We can also use when_any() to do the same thing

penguins |>
  filter(when_any(species == "Adelie", species == "Chinstrap"))
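
# A minimal sketch (not from the video) to check that the | version and the
# %in% version above keep the same rows. `toy` is a made-up three-row data
# frame, used here only so the result is easy to verify by eye.

library(dplyr) # already attached by library(tidyverse); repeated so this runs on its own

toy <- tibble(
  species = c("Adelie", "Chinstrap", "Gentoo"),
  island = c("Torgersen", "Dream", "Biscoe")
)

with_or <- toy |>
  filter(species == "Adelie" | species == "Chinstrap")

with_in <- toy |>
  filter(species %in% c("Adelie", "Chinstrap"))

# Both versions should keep the Adelie and Chinstrap rows and drop Gentoo.
identical(with_or, with_in)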

# We can chain together multiple filter functions.
# Doing it this way, we don't have to create complex logic in one line.

# Complicated version

penguins |>
  filter(
    species %in% c("Adelie", "Chinstrap") & island == "Torgersen"
  ) 

# Simpler version

penguins |>
  filter(species %in% c("Adelie", "Chinstrap")) |>
  filter(island == "Torgersen")

# when_all() version

penguins |>
  filter(
    when_all(
      species %in% c("Adelie", "Chinstrap"),
      island == "Torgersen"
    )
  )
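
# A quick check, as a sketch, that the complicated and simpler versions above
# return the same rows. `toy` is a made-up data frame, not the penguins data.

library(dplyr) # already attached by library(tidyverse); repeated so this runs on its own

toy <- tibble(
  species = c("Adelie", "Adelie", "Chinstrap", "Gentoo"),
  island = c("Torgersen", "Dream", "Dream", "Biscoe")
)

one_step <- toy |>
  filter(species %in% c("Adelie", "Chinstrap") & island == "Torgersen")

two_steps <- toy |>
  filter(species %in% c("Adelie", "Chinstrap")) |>
  filter(island == "Torgersen")

# Separating conditions with a comma inside one filter() also means "and".
with_comma <- toy |>
  filter(species %in% c("Adelie", "Chinstrap"), island == "Torgersen")

# All three should keep only the Adelie penguin on Torgersen.
identical(one_step, two_steps)
identical(one_step, with_comma)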

# We can use <, >, <=, and >= for numeric data.

penguins |>
  filter(body_mass_g > 4000)
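
# A small sketch of the difference between > and >=, using made-up body
# masses rather than the penguins data. The row with a missing value is
# dropped either way: NA > 4000 is NA, and filter() only keeps rows where
# the condition is TRUE.

library(dplyr) # already attached by library(tidyverse); repeated so this runs on its own

toy <- tibble(body_mass_g = c(3500, 4000, 4500, NA))

heavier <- toy |>
  filter(body_mass_g > 4000) # keeps only 4500

at_least <- toy |>
  filter(body_mass_g >= 4000) # keeps 4000 and 4500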

# We can drop NAs with !is.na().

penguins |>
  filter(!is.na(sex))

# But the double negative is confusing.
# We can also drop NAs with drop_na().

penguins |>
  drop_na(sex)
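
# A sketch comparing the two approaches above on a made-up data frame
# (`toy` is not the penguins data). It also shows what drop_na() does
# when no column is named.

library(dplyr) # already attached by library(tidyverse); repeated so this runs on its own
library(tidyr) # drop_na() comes from tidyr, which tidyverse also attaches

toy <- tibble(
  sex = c("female", NA, "male"),
  body_mass_g = c(3800, 4100, NA)
)

with_is_na <- toy |>
  filter(!is.na(sex))

with_drop_na <- toy |>
  drop_na(sex)

# Both keep the two rows where sex is not missing.
nrow(with_is_na)
nrow(with_drop_na)

# With no columns named, drop_na() drops rows with an NA in any column,
# so only the first row survives here.
complete_rows <- toy |>
  drop_na()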

Your Turn

# Load Packages -----------------------------------------------------------

# Load the tidyverse package

library(tidyverse)

# Import Data -------------------------------------------------------------

penguins <- read_csv("penguins.csv")

# filter() ----------------------------------------------------------------

# Use filter() to only keep female penguins

# YOUR CODE HERE

# Use filter() to only keep penguins NOT on Torgersen island

# YOUR CODE HERE

# Use filter() to only keep penguins on Torgersen island or Biscoe island
# Use the logical OR operator (|) to do this

# YOUR CODE HERE

# Rewrite your filter() code above to keep the penguins from Torgersen island or Biscoe island
# This time, though, use the %in% operator

# YOUR CODE HERE

# Use a comparison operator to keep penguins with flipper lengths greater than or equal to 193 millimeters

# YOUR CODE HERE

# Drop any rows that have missing data in the flipper_length_mm variable

# Do this first with !is.na()

# YOUR CODE HERE

# Do this a second time with drop_na()

# YOUR CODE HERE

Learn More

To learn more about the filter() function, check out Chapter 3 of R for Data Science.
