## Dplyr Make New Variables

Compute and append one or more new columns.

``dplyr::mutate(iris, sepal = Sepal.Length + Sepal. Width)``

Apply window function to each column.

``dplyr::mutate_each(iris, funs(min_rank))``

Compute one or more new columns. Drop original columns.

``dplyr::transmute(iris, sepal = Sepal.Length + Sepal. Width)``

Copy with values shifed by 1.

``dplyr::lead``

Copy with values lagged by 1.

``dplyr::lag``

Ranks with no gaps.

``dplyr::dense_rank``

Ranks. Ties get min rank.

``dplyr::min_rank``

Ranks rescaled to [0, 1].

``dplyr::percent_rank``

Ranks. Ties got to first value.

``dplyr::row_number``

Bin vector into n buckets.

``dplyr::ntile``

Are values between a and b?

``dplyr::between``

Cumulative distribution.

``dplyr::cume_dist``

Cumulative all

``dplyr::cumall``

Cumulative any

``dplyr::cumany``

Cumulative mean

``dplyr::cummean``

Cumulative sum

``cumsum``

Cumulative max

``cummax``

Cumulative min

``cummin``

Cumulative prod

``cumprod``

Element-wise max

``pmax``

Element-wise min

``pmin``

## Dplyr Combine Data Sets

Join matching rows from b to a.

``dplyr::lef_join(a, b, by = "x1")``

Join matching rows from a to b.

``dplyr::right_join(a, b, by = "x1")``

Join data. Retain only rows in both sets.

``dplyr::inner_join(a, b, by = "x1")``

Join data. Retain all values, all rows.

``dplyr::full_join(a, b, by = "x1")``

All rows in a that have a match in b.

``dplyr::semi_join(a, b, by = "x1")``

All rows in a that do not have a match in b.

``dplyr::anti_join(a, b, by = "x1")``

Rows that appear in both y and z.

``dplyr::intersect(y, z)``

Rows that appear in either or both y and z.

``dplyr::union(y, z)``

Rows that appear in y but not z.

``dplyr::setdiff(y, z)``

Append z to y as new rows.

``dplyr::bind_rows(y, z)``

Append z to y as new columns. Caution: matches rows by position.

``dplyr::bind_cols(y, z)``

## Dplyr Group Data Cheat Sheet

Group data into rows with the same value of Species.

``dplyr::group_by(iris, Species)``

Remove grouping information from a data frame.

``dplyr::ungroup(iris)``

Compute separate summary rows for each group.

``iris %>% group_by(Species) %>% summarise(…)``

Compute new variables by group

``iris %>% group_by(Species) %>% mutate(…)``

## Dplyr Summarise Data Cheat Sheet

Summarise data into a single row of values.

``dplyr::summarise(iris, avg = mean(Sepal.Length))``

Apply the summary function to each column.

``dplyr::summarise_each(iris, funs(mean))``

Count the number of rows with each unique value of a variable (with or without weights).

``dplyr::count(iris, Species, wt = Sepal.Length)``

First value of a vector.

``dplyr::first``

Last value of a vector.

``dplyr::last``

Nth value of a vector.

``dplyr::nth``

of values in a vector.

``dplyr::n``

of distinct values in a vector.

``dplyr::n_distinct``

Minimum value in a vector.

``min``

Maximum value in a vector.

``max``

Mean value of a vector.

``mean``

Median value of a vector.

``median``

The variance of a vector.

``var``

The standard deviation of a vector.

``sd``

## Dplyr Subset Variables Cheat Sheet

Select columns by name or helper function.

``dplyr::select(iris, Sepal.Width, Petal.Length, Species)``

Select columns whose name contains a character string.

``select(iris, contains("."))``

Select columns whose name ends with a character string.

``select(iris, ends_with("Length"))``

Select every column.

``select(iris, everything())``

Select columns whose name matches a regular expression.

``select(iris, matches(".t."))``

Select columns named x1, x2, x3, x4, x5.

``select(iris, num_range("x", 1:5))``

Select columns whose names are in a group of names.

``select(iris, one_of(c("Species", "Genus")))``

Select columns whose name starts with a character string.

``select(iris, starts_with("Sepal"))``

Select all columns between Sepal. Length and Petal.Width (inclusive).

``select(iris, Sepal.Length:Petal.Width)``

Select all columns except Species.

``select(iris, -Species)``

## Dplyr Subset Observations Cheat Sheet

Extract rows that meet logical criteria.

``dplyr::filter(iris, Sepal.Length > 7)``

Remove duplicate rows.

``dplyr::distinct(iris)``

Randomly select a fraction of rows.

``dplyr::sample_frac(iris, 0.5, replace = TRUE)``

Randomly select n rows.

``dplyr::sample_n(iris, 10, replace = TRUE)``

Select rows by position.

``dplyr::slice(iris, 10:15)``

Select and order top n entries (by the group if grouped data).

``dplyr::top_n(storms, 2, date)``

## Dplyr Reshaping Data Cheat Sheet

Gather columns into rows.

``tidyr::gather(cases, "year", "n", 2:4)``

``tidyr::spread(pollution, size, amount)``

Combine vectors into data frame (optimized).

``dplyr::data_frame(a = 1:3, b = 4:6)``

Order rows by values of a column (low to high).

``dplyr::arrange(mtcars, mpg)``

Order rows by values of a column (high to low).

``dplyr::arrange(mtcars, desc(mpg)) ``

Rename the columns of a data frame.

``dplyr::rename(tb, y = year)``

Separate one column into several.

``tidyr::separate(storms, date, c("y", "m", "d"))``

Unite several columns into one.

``tidyr::unite(data, col, ..., sep)``

## Dplyr Syntax Cheat Sheet

Converts data to tbl class. tbl’s are easier to examine than
data frames. R displays only the data that fits onscreen:

``dplyr::tbl_df(iris)``

Information-dense summary of tbl data.

``dplyr::glimpse(iris)``

View data set in spreadsheet-like display (note capital V).

``utils::View(iris)``

Passes object on lef hand side as first argument (or .
argument) of function on righthand side.

``dplyr::%>%``