Resources
This curated collection points you to packages and learning materials to support your R journey.
Easily clean up messy databases with fuzzy matching in R
This article introduces data journalists to fuzzy matching in R as a way to clean up databases containing inconsistently entered text. It describes the core problem: the same information is often recorded in several different ways, and a computer does not naturally treat those variants as identical. 'Fuzzy' matching addresses this by identifying similarities in letter patterns so that text can be grouped together more accurately. The tutorial loads the tidyverse and stringdist packages to demonstrate the process. Practical examples using 2025 IRE conference schedule data show how to extract session names that may contain entry mistakes and use fuzzy matching to consolidate them into accurate categories.
Easily download files from the Open Science Framework with Papercheck
The 20% Statistician is a blog about statistics, research methods, and open science. It aims to help researchers understand crucial statistical concepts, on the premise that grasping 20% of statistics can improve 80% of inferences. A recent post addresses the difficulty of downloading files from the Open Science Framework (OSF). The authors, DeBruine and Lakens, introduce 'Papercheck', an R package whose 'osf_file_download' function simplifies this process. Papercheck recreates the OSF folder structure in a local directory, making project files easy to access for review or reuse.
Efficiency and Consistency: Automate Subset Graphics with ggplot2 and purrr - Cédric Scherer
In this blog post, Cédric Scherer discusses automated plot generation with ggplot2. The post explains how to use a functional programming approach to create a set of exploratory or explanatory charts for different variables or categories of a dataset. It provides examples and tips for working with variables and for exploring data sets visually.
Efficient R Programming
Efficient R Programming is a book of tips and techniques for writing efficient, optimized R code.
Engineering Production-Grade Shiny Apps
This book is a guide to building robust Shiny applications that are ready for production use. It covers topics such as project management, technical optimization, and team collaboration. The target audience includes developers who have basic knowledge of Shiny and want to build production-grade applications.
Exploratory Data Analysis in R
This article walks through Exploratory Data Analysis (EDA) in R. It presents EDA as a crucial part of data science, particularly for understanding data and identifying biases. It introduces several R packages that facilitate EDA, including overviewR, which focuses on time series data analysis but is applicable to other data types. The article compares the key features of each package and illustrates the use of the {palmerpenguins} dataset. It also covers package installation, data loading, and functions such as str() and summary(), giving readers an introduction to effective data analysis in R.
Exploring {ggplot2}’s Geoms and Stats
This article explores geoms and stats within the {ggplot2} package's Layered Grammar of Graphics. Plots are constructed by adding layers, each of which combines a geom and a stat: the geom dictates the visual representation, while the stat preprocesses the data. For instance, geom_histogram() applies a binning stat. The article includes R code for listing the geoms and stats in {ggplot2}, relating them to each other, and generating a plot to visualize the combinations. It also demonstrates how to extract the data after transformation.
Exploring Complex Survey Data Analysis Using R
This book is a comprehensive guide to analyzing complex survey data using R. It opens with an introduction to survey analysis in R, prerequisites, and the datasets used, followed by detailed sections on survey design, data collection, and post-survey processing, including data cleaning, weighting, and documentation. It then turns to practical topics: getting started with R packages, performing descriptive analyses and statistical tests, building models, and communicating results effectively. It also emphasizes reproducible research with project-based workflows and version control, and caters to both beginners and advanced users.
File Management With The {fs} Package
Albert Rapp's 'File Management With The {fs} Package' tutorial guides data scientists through various file system operations using the {fs} package in R. It demonstrates convenience functions that simplify tasks like path assembly, file extension modification, and directory information retrieval. Through examples, Rapp elucidates how to assemble paths regardless of trailing slashes, change file extensions for data transformation, and get directory details. The post includes code snippets and offers a video version for those who prefer learning through visual aids. Practical tips on iterating over file paths and creating organized output directories are also shared to enhance workflow efficiency.
Filenames to variables
This article by Luis D. Verde Arregoitia describes a technique for incorporating information from the filenames of multiple CSV files into a data frame during import. It addresses the scenario where government agencies split related data across multiple files, often with key variables indicated only in each file's name. The tutorial uses R to group a dataset by several variables, export each group to its own CSV file (omitting the grouping variables but naming the file after them), and then re-import the files while adding the filename-derived information back into the data frame.
Fix labels and understand scale functions in ggplot - YouTube
This YouTube video provides an explanation of how to fix labels and understand scale functions in ggplot.
Flowcharts made easy with the package {flowchart}
The {flowchart} package in R facilitates the creation of flowcharts and is particularly useful in health research for showing participant flow in studies. It integrates with the tidyverse workflow, offering customizable functions that work with pipe operators. Unlike other packages, it adapts flowcharts automatically to the data, enhancing reproducibility. The post explains installation, initialization, and drawing using the SAFO clinical trial dataset. Thanks to the package's tidyverse-centric design, complex flowcharts can be produced without setting parameters manually.