Resources

Dengue Data Hub

The Dengue Data Hub is a centralized platform that provides access to dengue-related data for 225 countries. It was launched and is managed by Dr. Thiyanga S. Talagala, and it is funded by the R Consortium, which is based in the USA. The hub is designed to support research on dengue fever patterns and trends. Users can explore and download datasets to study the disease's impact and prevalence, which may in turn help inform the development of preventive measures. Researchers looking for dengue data can contact Dr. Talagala for more information.

Don't use Quarto documents to clean or analyze data

This piece argues against using Quarto, R Markdown, or Jupyter documents for data cleaning and analysis: these tools are meant for communication, not exploration. Diego Catalan Molina advises that data should already be clean before it is loaded into a document, and that the document itself should serve as a vehicle for telling a story. He suggests building an engaging outline around why the findings matter and using these documents only to share results, rather than every plot or table produced during the exploratory phase of an analysis.

Easily clean up messy databases with fuzzy matching in R

This article introduces data journalists to fuzzy matching in R as a way to clean up databases whose text fields were entered inconsistently. It outlines the core problem: the same information is often recorded in slightly different ways, and a computer does not treat those variants as identical. The tutorial explains 'fuzzy' matching, which groups text by the similarity of its letter patterns rather than by exact equality. It loads the tidyverse and stringdist packages to demonstrate the process, and its worked examples use the 2025 IRE conference schedule data to extract session names that may contain entry mistakes and consolidate them into consistent categories.
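
As a rough illustration of the idea (not the article's code), the sketch below groups near-duplicate labels with base R's adist(), which computes generalized Levenshtein (edit) distances, in place of the stringdist package the tutorial uses. The labels, the one-edit threshold, and the "most frequent spelling wins" rule are assumptions chosen for the example.

```r
# Hypothetical labels with entry mistakes (not the IRE schedule data)
labels <- c("Hands-on", "Hands on", "hands-on", "Panel", "Panal",
            "panel", "Panel", "Demo", "Dem0")

# Normalize case and surrounding whitespace before comparing
clean <- tolower(trimws(labels))
uniq <- unique(clean)

# Pairwise edit distances between the distinct spellings
d <- adist(uniq)
dimnames(d) <- list(uniq, uniq)

# Single-linkage clustering: spellings linked by a chain of
# 1-edit differences end up in the same group
groups <- cutree(hclust(as.dist(d), method = "single"), h = 1.5)

# In each group, treat the most frequent spelling as the canonical one
counts <- table(clean)
canonical <- vapply(split(uniq, groups),
                    function(v) v[which.max(counts[v])],
                    character(1))

# Map every original entry to its group's canonical spelling
lookup <- setNames(canonical[as.character(groups)], uniq)
fixed <- unname(lookup[clean])
print(fixed)
```

Single linkage chains similar values together, so on real data a loose threshold can merge categories that are genuinely distinct; inspecting the groups before applying the lookup is a sensible precaution.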

Enhance Quarto Project Workflows and Standards • froggeR

froggeR is an R package designed to enhance Quarto project workflows for R users. Its functions automate project setup tasks, enforce consistent documentation, and let users focus on analysis rather than configuration. Features include creating Quarto projects from custom templates, managing YAML headers, protecting git repositories with a comprehensive .gitignore, styling documents with SCSS templates, and generating structured project documentation. It is particularly useful for R users who manage multiple Quarto projects, and it encourages collaboration and standardized project structures.

Exploratory Data Analysis in R

This article walks through Exploratory Data Analysis (EDA) in R, presenting it as a crucial part of data science for understanding data and identifying biases. It introduces several R packages that facilitate EDA, including overviewR, which focuses on time series data but also applies to other data types, and compares the key features of each package. The examples use the {palmerpenguins} dataset and cover package installation, data loading, and functions such as str() and summary(), giving readers an introduction to effective data analysis in R.
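
A minimal first pass with the base functions the article mentions might look like the following. It uses the airquality data frame that ships with R as a stand-in for {palmerpenguins}, and adds a per-column count of missing values, which is one simple way to spot gaps that could bias later analysis.

```r
# airquality ships with base R (datasets package); it stands in here
# for the {palmerpenguins} data used in the article
data(airquality)

# Column types plus a preview of the first values in each column
str(airquality)

# Per-column summaries: quartiles, mean, and NA counts for numeric columns
summary(airquality)

# Missing values per column, a quick check for gaps in the data
missing <- colSums(is.na(airquality))
print(missing)

# Share of rows with no missing values at all
mean(complete.cases(airquality))
```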

Exploring Complex Survey Data Analysis Using R

This book is a comprehensive guide to analyzing complex survey data in R. It opens with an introduction to survey analysis in R, its prerequisites, and the datasets used, followed by detailed sections on survey design, data collection, and post-survey processing, including data cleaning, weighting, and documentation. Later chapters cover getting started with the relevant R packages, descriptive analyses and statistical tests, modeling, and communicating results effectively. It also emphasizes reproducible research through project-based workflows and version control, and it caters to both beginners and advanced users.

Extract Data from Professional Volleyball Leagues in North America with {rvolleydata}

The R package {rvolleydata} gives anyone analyzing professional volleyball a simple interface for collecting structured data from North American leagues: League One Volleyball Pro (LOVB), Athletes Unlimited Pro Volleyball (AUPVB), and Major League Volleyball (MLV). The stable release can be installed from CRAN and the development version from GitHub. The package vignette provides detailed usage guidance for obtaining clean, tidy league data for analysis.

FakeDataR

FakeDataR is an R package for generating synthetic copies of real datasets locally, so that data can be shared securely without exposing sensitive information. The copies preserve the original structure, including schema, column types, factor levels, numeric ranges, and missingness, and the package offers heuristics for masking sensitive fields. It integrates directly into R-based LLM workflows and reproducible examples, and it includes bundled exports for easy sharing and compatibility with database schemas. Use FakeDataR when you need synthetic data with the same shape as the original while avoiding privacy breaches; it is not intended to provide formal privacy guarantees or to serve as a statistical benchmark.
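
To make the idea of a shape-preserving synthetic copy concrete, here is a simplified base-R sketch. It is not FakeDataR's API: fake_copy() is a hypothetical helper that resamples each column independently from its observed values or range, keeps column types and factor levels, reproduces each column's number of missing values, and replaces any column listed in mask with placeholders.

```r
# Hypothetical helper, NOT FakeDataR's API: build a synthetic copy that keeps
# column names, types, factor levels, observed ranges and NA counts.
# Note: set.seed() here changes the global random number stream.
fake_copy <- function(df, mask = character(0), seed = 1) {
  set.seed(seed)
  n <- nrow(df)
  out <- lapply(setNames(names(df), names(df)), function(col) {
    x <- df[[col]]
    obs <- x[!is.na(x)]
    if (length(obs) == 0) return(x)  # all-NA column: nothing to imitate
    new <- if (col %in% mask) {
      # Sensitive field: replace values with opaque placeholders
      sprintf("%s_%03d", col, seq_len(n))
    } else if (is.factor(x)) {
      # Resample observed values; subsetting a factor keeps all its levels
      obs[sample.int(length(obs), n, replace = TRUE)]
    } else if (is.integer(x)) {
      # Integers drawn uniformly within the observed range
      min(obs) + sample.int(max(obs) - min(obs) + 1L, n, replace = TRUE) - 1L
    } else if (is.numeric(x)) {
      # Doubles drawn uniformly within the observed range
      runif(n, min(obs), max(obs))
    } else {
      # Other types (character, logical, Date): resample observed values
      obs[sample.int(length(obs), n, replace = TRUE)]
    }
    # Reproduce the column's number of missing values at random positions
    new[sample.int(n, sum(is.na(x)))] <- NA
    new
  })
  list2DF(out)
}

fake <- fake_copy(airquality)
str(fake)
```

Because each column is simulated on its own, relationships between columns are not preserved, which fits the stated use of such data for shape-consistent examples rather than statistical benchmarks.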

File Management With The {fs} Package

Albert Rapp's tutorial 'File Management With The {fs} Package' guides data scientists through common file system operations using the {fs} package in R. It demonstrates convenience functions that simplify tasks such as assembling paths, modifying file extensions, and retrieving directory information. Through examples, Rapp shows how to assemble paths correctly regardless of trailing slashes, change file extensions when writing transformed data, and get directory details. The post includes code snippets and a video version for those who prefer visual learning, and it shares practical tips on iterating over file paths and creating organized output directories.
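
A short sketch of the kinds of calls the tutorial covers, assuming the {fs} package is installed. path(), path_ext_set(), path_file(), path_temp(), dir_create(), and dir_info() are {fs} functions; the file and directory names are made up for illustration.

```r
library(fs)

# path() returns tidy paths, so the trailing slash in "data/" does not
# produce a doubled separator
raw_file <- path("data/", "raw", "sales.csv")
print(raw_file)

# Change the extension, e.g. for a processed copy of the same data
rds_file <- path_ext_set(raw_file, "rds")

# The functions are vectorized, which helps when iterating over many files
inputs  <- path("data", "raw", c("jan.csv", "feb.csv"))
outputs <- path("output", path_file(path_ext_set(inputs, "rds")))

# Create an output directory (here under the session temp dir) if missing
out_dir <- path_temp("output")
dir_create(out_dir)

# Directory details: one row per entry, with type, size and timestamps
dir_info(path_temp())
```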

Focus and feedback in the tidyverse

In this Open Source Stories conversation, Tracy Teal interviews Hadley Wickham about the history of the tidyverse and its influences. Hadley reflects on his parents' influence and on his role as Posit's Chief Scientist in making data science more accessible. Themes include early exposure to computing, relational databases, tidy data principles, and the balance between helping users and imposing solutions on them. Personal anecdotes show how Hadley's upbringing shaped the development of tidyverse tools designed to simplify and tidy data in R.

Four ways to write assertion checks in R

This personal essay explains why assertion checks matter in R, especially when the structure of incoming data can change over time. The author describes moving from a confident young analyst to a more cautious one who relies on defensive programming techniques. Using an example function called 'identifier', the article shows how assertions written with 'stopifnot()' catch unexpected and incorrect inputs, and it argues for rigorously validating assumptions so that code does not fail silently.
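
A minimal sketch of the stopifnot() style of assertion described above. make_identifier() is a hypothetical stand-in for the article's 'identifier' function, and the column names site and year are assumptions; naming each condition (supported since R 4.0.0) turns the name into the error message.

```r
# Hypothetical stand-in for the article's 'identifier' function:
# builds a "site_year" key after checking its assumptions about the input
make_identifier <- function(df) {
  # Fail loudly, with a readable message, if the input is not as expected
  stopifnot(
    "input must be a data frame" = is.data.frame(df),
    "columns 'site' and 'year' are required" = all(c("site", "year") %in% names(df)),
    "'year' must be numeric" = is.numeric(df$year),
    "'site' must not contain NA" = !anyNA(df$site)
  )
  ids <- paste(df$site, df$year, sep = "_")
  # Check the output too: duplicated keys would silently break later joins
  stopifnot("identifiers must be unique" = !anyDuplicated(ids))
  ids
}

make_identifier(data.frame(site = c("A", "B"), year = c(2020, 2021)))
```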

Full-Stack Survey Research with SurveyMonkey • svmkR

svmkR is an R package that provides a comprehensive toolkit for managing SurveyMonkey surveys from within the R programming environment. Users can create, upload, download, and analyze surveys directly from R, calculate margins of error, apply statistical survey weights through raking, and generate SurveyMonkey-style banner presentations for polls. Intended as a full-stack survey research solution, the package is installed from GitHub, where its source is also hosted. It was developed by a team of contributors and builds on the earlier surveymonkey package, extending and refactoring it.
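
Raking and margins of error are general survey techniques, so they can be sketched in base R without svmkR. The helpers below, rake_weights() and weighted_moe(), are hypothetical and are not the svmkR API: the first applies iterative proportional fitting so that weighted category shares match target margins, and the second computes a 95% margin of error for a weighted proportion, inflated by Kish's approximate design effect n * sum(w^2) / sum(w)^2. The respondents and targets are made up.

```r
# Hypothetical helper (not svmkR's API): rake weights to target margins
# by iterative proportional fitting
rake_weights <- function(df, targets, max_iter = 100, tol = 1e-8) {
  w <- rep(1, nrow(df))
  for (iter in seq_len(max_iter)) {
    w_old <- w
    for (var in names(targets)) {
      # Current weighted share of each category of this variable
      current <- vapply(split(w, df[[var]]), sum, numeric(1)) / sum(w)
      # Multiply each weight by (target share / current share) for its category
      ratio <- targets[[var]][names(current)] / current
      w <- unname(w * ratio[as.character(df[[var]])])
    }
    if (max(abs(w - w_old)) < tol) break
  }
  w / mean(w)  # rescale so the weights average 1
}

# Hypothetical helper: 95% margin of error for a weighted proportion,
# inflated by Kish's design effect n * sum(w^2) / sum(w)^2
weighted_moe <- function(y, w, z = qnorm(0.975)) {
  n <- length(w)
  p <- sum(w * y) / sum(w)
  deff <- n * sum(w^2) / sum(w)^2
  z * sqrt(deff * p * (1 - p) / n)
}

# Made-up respondents and population targets, for illustration only
resp <- data.frame(
  gender = c("F", "F", "M", "M", "M", "M"),
  age    = c("18-44", "45+", "18-44", "18-44", "45+", "45+"),
  yes    = c(1, 0, 1, 1, 0, 1)
)
targets <- list(
  gender = c(F = 0.5, M = 0.5),
  age    = c("18-44" = 0.6, "45+" = 0.4)
)

w <- rake_weights(resp, targets)
weighted_moe(resp$yes, w)
```

With equal weights the design effect is 1 and the formula reduces to the familiar simple-random-sampling margin of error; unequal weights widen it.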