Resources

Data Wrangling with dplyr and tidyr

data.table

data.table provides a high-performance version of base R’s data.frame with syntax and feature enhancements for ease of use, convenience and programming speed.
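As a rough illustration of that syntax, the sketch below uses data.table's general dt[i, j, by] form. The built-in mtcars dataset and the chosen columns are assumptions made for the example, not taken from the resource.

```r
library(data.table)

# Convert a built-in data.frame to a data.table (illustrative data only)
dt <- as.data.table(mtcars)

# dt[i, j, by]: filter rows (i), compute columns (j), grouped by (by)
mpg_by_gear <- dt[cyl == 4, .(mean_mpg = mean(mpg), n = .N), by = gear]

print(mpg_by_gear)
```

Filtering, aggregation, and grouping all happen inside a single pair of brackets, which is the main syntactic difference from a base data.frame.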

Deep dive into dplyr

A "Dive into dplyr" tutorial hosted on Kaggle.

dplyr

dplyr is a package in R that provides a grammar of data manipulation. It offers a consistent set of verbs to solve common data manipulation challenges, such as adding new variables, selecting variables, filtering cases, summarizing data, and arranging rows. It also provides support for working with different computational backends, including arrow, dtplyr, dbplyr, duckplyr, duckdb, and sparklyr. The package can be installed as part of the tidyverse or separately.
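A minimal sketch of the verbs named above, chained on the built-in mtcars dataset. The dataset, the derived wt_kg column, and the grouping are assumptions chosen for illustration only.

```r
library(dplyr)

cars_summary <- mtcars %>%
  filter(cyl %in% c(4, 6)) %>%          # filter cases
  select(mpg, cyl, wt) %>%              # select variables
  mutate(wt_kg = wt * 453.592) %>%      # add a variable (wt is in 1000 lb)
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), n = n()) %>%  # summarize each group
  arrange(desc(mean_mpg))               # arrange rows

print(cars_summary)
```

Each verb takes a data frame and returns a data frame, which is what lets the steps be piped together in reading order.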

dplyr 1.2.0

dplyr 1.2.0 is a major release of the popular R data manipulation package. It introduces filter_out() as a complement to filter(), along with the when_any() and when_all() helpers, and adds recode_values(), replace_values(), and replace_when() for recoding and replacing data. These changes were shaped by the tidyverse community's tidyups proposal process. The announcement includes code examples, stresses writing filtering code that is understandable and maintainable, and encourages users to install the update from CRAN.

Duplicating Quarto elements with code templates to reduce copy and paste errors

This blog post from the Water Data For The Nation Blog demonstrates how to use Quarto code templates to create reproducible Quarto documents, such as reports and slideshows, with fewer errors. Using custom templates allows for the easy replication of code chunks, such as those producing statistical summaries or visualizations for different datasets. The example used is USGS streamgage data, with a focus on automating the creation of slideshows in Quarto's markdown format. Advanced topics like adding columns, tables, and speaker notes to PowerPoint slides via Quarto are also covered. Methods for iterating over data in a more efficient and less error-prone way than traditional copy and paste techniques are highlighted.

Generating quarto syntax within R – Notes from a data witch

This blog post introduces 'quartose', an R package designed to integrate with Quarto for literate programming. The author, located in Sydney, discusses the nuances of names and their personal connection to this topic before exploring a data analysis task using the 'babynames' package. The analysis involves mapping names to data frames and visualizing name popularity over time. The post concludes with a demonstration of 'quarto_tabset()' that allows inserting plots or data frames into a document as a tabbed interface, enhancing the presentation of data in a readable and interactive format.

Getting more out of dplyr

Video presentation by Suzan Baert on getting more out of dplyr at SatRday 2018 Amsterdam.

haven

Import and export SPSS, Stata, and SAS files using the haven package.
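A small round-trip sketch of what import and export look like with haven, using the Stata format. The temporary file path and the mtcars dataset are assumptions made for the example.

```r
library(haven)

# Write a built-in data frame to a temporary Stata file (illustrative data only)
path <- tempfile(fileext = ".dta")
write_dta(mtcars, path)

# Read it back as a tibble; row names are not preserved
cars_back <- read_dta(path)

print(dim(cars_back))
```

SPSS files follow the same pattern with read_sav() and write_sav(), and SAS data files can be read with read_sas().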

Homelessness and Rents in Canada

This resource is a comprehensive R code walkthrough for analyzing homelessness and rent data in Canada. It uses several R packages, including the tidyverse for data wrangling, cancensus for accessing census data, and patchwork for composing visualizations. Key steps include importing, cleaning, and transforming data with functions such as mutate, filter, and summarize. Rent quantiles are calculated and adjusted for CPI to assess real rents over time. The analysis highlights metros such as Vancouver and Toronto, using colors to represent different years, and treats homelessness and rents as linked socio-economic issues.

How (and Why) I came to Use R for Data Analysis and Evaluation

Alberto Espinoza recounts his ten years of using R for data analysis and evaluation, beginning with his first encounter with it during a graduate assistantship. With no prior knowledge of R, he was tasked with assisting in and leading statistics labs that used it. Despite early challenges and a steep learning curve, he came to see R as more powerful than software like SPSS or Excel. He went on to use R for graduate projects, market research, data preparation for Tableau, and SurveyMonkey analysis. Espinoza outlines R's advantages (reproducibility, efficiency, clarity, and an extensive package ecosystem) and underlines its role in his professional growth.

How to Split Data into Equal Sized Groups in R: A Comprehensive Guide for Beginners

This guide instructs beginners on splitting data into equal-sized groups in R, essential for cross-validation and balanced datasets. Using the split() function, cut_number() from ggplot2, and group_split() from dplyr, it provides syntax and examples for various data types. Detailed explanations and practical examples aid in mastering data manipulation for model evaluation and analysis.
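A minimal base R sketch of the idea, assuming a toy data frame and k = 3 groups (both are made up for illustration and do not come from the guide). It shows two ways to build the grouping vector that split() needs: by row order and by value rank.

```r
set.seed(42)

# Toy data: 12 rows, to be split into k = 3 groups of 4 (illustrative only)
df <- data.frame(id = 1:12, value = rnorm(12))
k <- 3

# Option 1: equal-sized groups by row order
group_id <- rep(seq_len(k), each = ceiling(nrow(df) / k), length.out = nrow(df))
groups <- split(df, group_id)

# Option 2: equal-sized groups by value, using ranks to form the bins
value_bin <- ceiling(rank(df$value, ties.method = "first") * k / nrow(df))
value_groups <- split(df, value_bin)

print(sapply(groups, nrow))
print(sapply(value_groups, nrow))
```

The cut_number() and group_split() functions covered in the guide package up similar logic: cut_number() bins a numeric vector into intervals with roughly equal counts, and group_split() splits a data frame by a grouping column.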
