Package 'tidytable' reference manual

Title:	Tidy Interface to 'data.table'
Description:	A tidy interface to 'data.table', giving users the speed of 'data.table' while using tidyverse-like syntax.
Authors:	Mark Fairbanks [aut, cre], Abdessabour Moutik [ctb], Matt Carlson [ctb], Ivan Leung [ctb], Ross Kennedy [ctb], Robert On [ctb], Alexander Sevostianov [ctb], Koen ter Berg [ctb]
Maintainer:	Mark Fairbanks <[email protected]>
License:	MIT + file LICENSE
Version:	0.11.0.9
Built:	2024-07-02 03:31:45 UTC
Source:	https://github.com/markfairbanks/tidytable

Fast `%in%` and `%notin%` operators

Description

Check whether values in a vector are in or not in another vector.

Built using data.table::'%chin%' and vctrs::vec_in() for performance.

Usage

x %in% y

x %notin% y

Arguments

`x`	A vector of values to check if they exist in y
`y`	A vector of values to check if x values exist in

Details

Falls back to base::'%in%' when x and y don't share a common type. This preserves the behaviour of base::'%in%' (e.g. "1" %in% c(1, 2) is TRUE), but the speedup provided by vctrs::vec_in() is lost in that case.
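
As a minimal sketch of the behaviour described here (the variable names are illustrative only), the following compares inputs that share a common type with inputs that do not:

```r
library(tidytable)

# Integer and double share a common type
shared <- 1:4 %in% c(2, 4)

# Character vs. double: no common type, so base::`%in%` semantics apply
mixed <- "1" %in% c(1, 2)

# %notin% is the negation of %in%
negated <- 1:4 %notin% c(2, 4)

shared   # FALSE TRUE FALSE TRUE
mixed    # TRUE
negated  # TRUE FALSE TRUE FALSE
```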

Examples

df <- tidytable(x = 1:4, y = 1:4)

df %>%
  filter(x %in% c(2, 4))

df %>%
  filter(x %notin% c(2, 4))

Apply a function across a selection of columns

Description

Apply a function across a selection of columns. For use in arrange(), mutate(), and summarize().

Usage

across(.cols = everything(), .fns = NULL, ..., .names = NULL)

Arguments

`.cols`	vector `c()` of unquoted column names. `tidyselect` compatible.
`.fns`	Function to apply. Can be a purrr-style lambda. Can also be a list of functions.
`...`	Other arguments for the passed function
`.names`	A glue specification that helps with renaming output columns. `{.col}` stands for the selected column, and `{.fn}` stands for the name of the function being applied. The default (`NULL`) is equivalent to `"{.col}"` for a single function case and `"{.col}_{.fn}"` when a list is used for `.fns`.
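
The `.names` glue specification can be illustrated with a short sketch. The function labels `avg` and `total` and the column names below are made up for the example; the output column order is not asserted here, so the names are sorted before printing.

```r
library(tidytable)

df <- tidytable(x = c(1, 2, 3), y = c(4, 5, 6))

# Two named functions, so `{.fn}` expands to "avg" and "total"
out <- df %>%
  summarize(across(c(x, y), list(avg = mean, total = sum),
                   .names = "{.fn}_{.col}"))

sort(names(out))  # "avg_x" "avg_y" "total_x" "total_y"
```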

Examples

df <- data.table(
  x = rep(1, 3),
  y = rep(2, 3),
  z = c("a", "a", "b")
)

df %>%
  mutate(across(c(x, y), ~ .x * 2))

df %>%
  summarize(across(where(is.numeric), ~ mean(.x)),
            .by = z)

df %>%
  arrange(across(c(y, z)))

Add a count column to the data frame

Description

Add a count column to the data frame.

df %>% add_count(a, b) is equivalent to using df %>% mutate(n = n(), .by = c(a, b))
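
The equivalence stated above can be checked with a small sketch (the data and variable names are illustrative):

```r
library(tidytable)

df <- tidytable(a = c("a", "a", "b"), b = 1:3)

with_add_count <- df %>% add_count(a)
with_mutate <- df %>% mutate(n = n(), .by = a)

with_add_count$n  # 2 2 1
with_mutate$n     # 2 2 1
```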

Usage

add_count(.df, ..., wt = NULL, sort = FALSE, name = NULL)

add_tally(.df, wt = NULL, sort = FALSE, name = NULL)

Arguments

`.df`	A data.frame or data.table
`...`	Columns to group by. `tidyselect` compatible.
`wt`	Frequency weights. Can be `NULL` or a variable: If `NULL` (the default), counts the number of rows in each group. If a variable, computes `sum(wt)` for each group.
`sort`	If `TRUE`, will show the largest groups at the top.
`name`	The name of the new column in the output. If omitted, it will default to `n`.

Examples

df <- data.table(
  a = c("a", "a", "b"),
  b = 1:3
)

df %>%
  add_count(a)

Arrange/reorder rows

Description

Order rows in ascending or descending order.

Usage

arrange(.df, ...)

Arguments

`.df`	A data.frame or data.table
`...`	Variables to arrange by

Examples

df <- data.table(
  a = 1:3,
  b = 4:6,
  c = c("a", "a", "b")
)

df %>%
  arrange(c, -a)

df %>%
  arrange(c, desc(a))

Coerce an object to a data.table/tidytable

Description

A tidytable object is simply a data.table with nice printing features.

Note that all tidytable functions automatically convert data.frames & data.tables to tidytables in the background. As such this function will rarely need to be used by the user.

Usage

as_tidytable(x, ..., .name_repair = "unique", .keep_rownames = FALSE)

Arguments

`x`	An R object
`...`	Additional arguments to be passed to or from other methods.
`.name_repair`	Treatment of duplicate names. See `?vctrs::vec_as_names` for options/details.
`.keep_rownames`	Default is `FALSE`. If `TRUE`, adds the input object's names as a separate column named `"rn"`. `.keep_rownames = "id"` names the column "id" instead.
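
A brief sketch of `.keep_rownames`, assuming a data.frame with explicit row names (the row names and variable names are illustrative):

```r
library(tidytable)

df <- data.frame(x = 1:2, row.names = c("first", "second"))

# TRUE stores the row names in a column named "rn"
default_name <- as_tidytable(df, .keep_rownames = TRUE)

# A string stores them in a column with that name instead
custom_name <- as_tidytable(df, .keep_rownames = "id")

names(default_name)
names(custom_name)
```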

Examples

df <- data.frame(x = -2:2, y = c(rep("a", 3), rep("b", 2)))

df %>%
  as_tidytable()

Do the values from x fall between the left and right bounds?

Description

between() uses data.table::between() in the background.

Usage

between(x, left, right)

Arguments

`x`	A numeric vector
`left`, `right`	Boundary values

Examples

df <- data.table(
  x = 1:5,
  y = 1:5
)

# Typically used in a filter()
df %>%
  filter(between(x, 2, 4))

df %>%
  filter(x %>% between(2, 4))

# Can also use the %between% operator
df %>%
  filter(x %between% c(2, 4))

Bind data.tables by row and column

Description

Bind multiple data.tables into one row-wise or col-wise.

Usage

bind_cols(..., .name_repair = "unique")

bind_rows(..., .id = NULL)

Arguments

`...`	data.tables or data.frames to bind
`.name_repair`	Treatment of duplicate names. See `?vctrs::vec_as_names` for options/details.
`.id`	If TRUE, an integer column is made as a group id

Examples

# Binding data together by row
df1 <- data.table(x = 1:3, y = 10:12)
df2 <- data.table(x = 4:6, y = 13:15)

df1 %>%
  bind_rows(df2)

# Can pass a list of data.tables
df_list <- list(df1, df2)

bind_rows(df_list)

# Binding data together by column
df1 <- data.table(a = 1:3, b = 4:6)
df2 <- data.table(c = 7:9)

df1 %>%
  bind_cols(df2)

# Can pass a list of data frames
bind_cols(list(df1, df2))

Combine values from multiple columns

Description

c_across() works inside of mutate_rowwise(). It uses tidyselect so you can easily select multiple variables.

Usage

c_across(cols = everything())

Arguments

`cols`	Columns to transform.

Examples

df <- data.table(x = runif(6), y = runif(6), z = runif(6))

df %>%
  mutate_rowwise(row_mean = mean(c_across(x:z)))

`data.table::fcase()` with vectorized default

Description

This function allows you to use multiple if/else statements in one call.

It is called like data.table::fcase(), but allows the user to use a vector as the default argument.

Usage

case(..., default = NA, ptype = NULL, size = NULL)

Arguments

`...`	Sequence of condition/value designations
`default`	Default value. Set to NA by default.
`ptype`	Optional ptype to specify the output type.
`size`	Optional size to specify the output size.
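
To illustrate the vectorized default mentioned in the Description, here is a small sketch in which unmatched elements keep their original value. It assumes `default` may be a vector of the same length as the conditions; the variable names are illustrative.

```r
library(tidytable)

x <- 1:10

# Elements matched by no condition fall through to the
# corresponding element of `default`
case_x <- case(x < 5, 1,
               x < 7, 2,
               default = x)

case_x  # 1 1 1 1 2 2 7 8 9 10
```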

Examples

df <- tidytable(x = 1:10)

df %>%
  mutate(case_x = case(x < 5, 1,
                       x < 7, 2,
                       default = 3))

Vectorized `switch()`

Description

Allows the user to succinctly create a new vector based on the values of a single vector.

Usage

case_match(.x, ..., .default = NA, .ptype = NULL)

Arguments

`.x`	A vector
`...`	A sequence of two-sided formulas. The left hand side gives the old values, the right hand side gives the new value.
`.default`	The default value if all conditions evaluate to `FALSE`.
`.ptype`	Optional ptype to specify the output type.

Examples

df <- tidytable(x = c("a", "b", "c", "d"))

df %>%
  mutate(
    case_x = case_match(x,
                        c("a", "b") ~ "new_1",
                        "c" ~ "new_2",
                        .default = x)
  )

Case when

Description

This function allows you to use multiple if/else statements in one call.

It is called like dplyr::case_when(), but utilizes data.table::fifelse() in the background for improved performance.

Usage

case_when(..., .default = NA, .ptype = NULL, .size = NULL)

Arguments

`...`	A sequence of two-sided formulas. The left hand side gives the conditions, the right hand side gives the values.
`.default`	The default value if all conditions evaluate to `FALSE`.
`.ptype`	Optional ptype to specify the output type.
`.size`	Optional size to specify the output size.
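
As a sketch of the `.default` argument, the following shows it producing the same result as a trailing `TRUE ~ value` branch (variable names are illustrative):

```r
library(tidytable)

x <- 1:10

with_default <- case_when(x < 5 ~ 1,
                          x < 7 ~ 2,
                          .default = 3)

with_true <- case_when(x < 5 ~ 1,
                       x < 7 ~ 2,
                       TRUE ~ 3)

with_default  # 1 1 1 1 2 2 3 3 3 3
```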

Examples

df <- tidytable(x = 1:10)

df %>%
  mutate(case_x = case_when(x < 5 ~ 1,
                            x < 7 ~ 2,
                            TRUE ~ 3))

Coalesce missing values

Description

Fill in missing values in a vector by pulling successively from other vectors.

Usage

coalesce(..., .ptype = NULL, .size = NULL)

Arguments

`...`	Input vectors. Supports dynamic dots.
`.ptype`	Optional ptype to override output type
`.size`	Optional size to override output size

Examples

# Use a single value to replace all missing values
x <- c(1:3, NA, NA)
coalesce(x, 0)

# Or match together a complete vector from missing pieces
y <- c(1, 2, NA, NA, 5)
z <- c(NA, NA, 3, 4, 5)
coalesce(y, z)

# Supply lists with dynamic dots
vecs <- list(
  c(1, 2, NA, NA, 5),
  c(NA, NA, 3, 4, 5)
)
coalesce(!!!vecs)

Complete a data.table with missing combinations of data

Description

Turns implicit missing values into explicit missing values.

Usage

complete(.df, ..., fill = list(), .by = NULL)

Arguments

`.df`	A data.frame or data.table
`...`	Columns to expand
`fill`	A named list of values to fill NAs with.
`.by`	Columns to group by

Examples

df <- data.table(x = 1:2, y = 1:2, z = 3:4)

df %>%
  complete(x, y)

df %>%
  complete(x, y, fill = list(z = 10))

Generate a unique id for consecutive values

Description

Generate a unique id for runs of consecutive values
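
A short sketch of what "runs" means here: a value that reappears after a different value starts a new run and therefore gets a new id (the input vector is illustrative).

```r
library(tidytable)

x <- c("a", "a", "b", "b", "a")

# The final "a" is not adjacent to the first two, so it gets its own id
ids <- consecutive_id(x)

ids  # 1 1 2 2 3
```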

Usage

consecutive_id(...)

Arguments

`...`	Vectors of values

Examples

x <- c(1, 1, 2, 2, 1, 1)
consecutive_id(x)

Context functions

Description

These functions give information about the "current" group.

cur_data() gives the current data for the current group
cur_column() gives the name of the current column (for use in across() only)
cur_group_id() gives a group identification number
cur_group_rows() gives the row indices for each group

Can be used inside summarize(), mutate(), & filter()

Usage

cur_column()

cur_data()

cur_group_id()

cur_group_rows()

Examples

df <- data.table(
  x = 1:5,
  y = c("a", "a", "a", "b", "b")
)

df %>%
  mutate(
    across(c(x, y), ~ paste(cur_column(), .x))
  )

df %>%
  summarize(data = list(cur_data()),
            .by = y)

df %>%
  mutate(group_id = cur_group_id(),
         .by = y)

df %>%
  mutate(group_rows = cur_group_rows(),
         .by = y)

Count observations by group

Description

Returns row counts of the dataset.

tally() returns counts by group on a grouped tidytable.

count() returns counts by group on a grouped tidytable, or column names can be specified to return counts by group.

Usage

count(.df, ..., wt = NULL, sort = FALSE, name = NULL)

tally(.df, wt = NULL, sort = FALSE, name = NULL)

Arguments

`.df`	A data.frame or data.table
`...`	Columns to group by in `count()`. `tidyselect` compatible.
`wt`	Frequency weights. `tidyselect` compatible. Can be `NULL` or a variable: If `NULL` (the default), counts the number of rows in each group. If a variable, computes `sum(wt)` for each group.
`sort`	If `TRUE`, will show the largest groups at the top.
`name`	The name of the new column in the output. If omitted, it will default to `n`.

Examples

df <- data.table(
  x = c("a", "a", "b"),
  y = c("a", "a", "b"),
  z = 1:3
)

df %>%
  count()

df %>%
  count(x)

df %>%
  count(where(is.character))

df %>%
  count(x, wt = z, name = "x_sum")

df %>%
  count(x, sort = TRUE)

df %>%
  tally()

df %>%
  group_by(x) %>%
  tally()

Cross join

Description

Cross join each row of x to every row in y.

Usage

cross_join(x, y, ..., suffix = c(".x", ".y"))

Arguments

`x`	A data.frame or data.table
`y`	A data.frame or data.table
`...`	Other parameters passed on to methods
`suffix`	Suffixes appended to duplicated column names to disambiguate them.

Examples

df1 <- tidytable(x = 1:3)
df2 <- tidytable(y = 4:6)

cross_join(df1, df2)

Create a data.table from all unique combinations of inputs

Description

crossing() is similar to expand_grid() but de-duplicates and sorts its inputs.

Usage

crossing(..., .name_repair = "check_unique")

Arguments

`...`	Variables to get unique combinations of
`.name_repair`	Treatment of problematic names. See `?vctrs::vec_as_names` for options/details
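
The difference from expand_grid() noted in the Description is easiest to see with an input that contains duplicates. A minimal sketch, with illustrative inputs:

```r
library(tidytable)

x <- c(2, 1, 1)
y <- 1:2

grid <- expand_grid(x = x, y = y)   # keeps the duplicated 1
crossed <- crossing(x = x, y = y)   # de-duplicates and sorts x first

nrow(grid)     # 6 (3 values of x times 2 values of y)
nrow(crossed)  # 4 (2 unique values of x times 2 values of y)
```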

Examples

x <- 1:2
y <- 1:2

crossing(x, y)

crossing(stuff = x, y)

Descending order

Description

Arrange in descending order. Can be used inside of arrange()

Usage

desc(x)

Arguments

`x`	Variable to arrange in descending order

Examples

df <- data.table(
  a = 1:3,
  b = 4:6,
  c = c("a", "a", "b")
)

df %>%
  arrange(c, desc(a))

Select distinct/unique rows

Description

Retain only unique/distinct rows from an input df.

Usage

distinct(.df, ..., .keep_all = FALSE)

Arguments

`.df`	A data.frame or data.table
`...`	Columns to select before determining uniqueness. If omitted, will use all columns. `tidyselect` compatible.
`.keep_all`	Only relevant if columns are provided to the `...` arg. If `TRUE`, keeps all columns, but only the first row for each distinct combination of the columns provided to `...`.

Examples

df <- tidytable(
  x = 1:3,
  y = 4:6,
  z = c("a", "a", "b")
)

df %>%
  distinct()

df %>%
  distinct(z)

Drop rows containing missing values

Description

Drop rows containing missing values

Usage

drop_na(.df, ...)

Arguments

`.df`	A data.frame or data.table
`...`	Optional: A selection of columns. If empty, all variables are selected. `tidyselect` compatible.

Examples

df <- data.table(
  x = c(1, 2, NA),
  y = c("a", NA, "b")
)

df %>%
  drop_na()

df %>%
  drop_na(x)

df %>%
  drop_na(where(is.numeric))

Pipeable data.table call

Description

Pipeable data.table call.

This function does not use data.table's modify-by-reference.

Has experimental support for tidy evaluation for custom functions.
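
A minimal sketch of the no-modify-by-reference behaviour described above: using `:=` inside dt() returns a new object and leaves the input untouched (the names are illustrative).

```r
library(tidytable)

df <- tidytable(x = 1:3)

doubled <- df %>%
  dt(, double_x := x * 2)

names(df)       # "x" (the original is unchanged)
names(doubled)  # "x" "double_x"
```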

Usage

dt(.df, i, j, ...)

Arguments

`.df`	A data.frame or data.table
`i`	i position of a data.table call. See `?data.table::data.table`
`j`	j position of a data.table call. See `?data.table::data.table`
`...`	Other arguments passed to data.table call. See `?data.table::data.table`

Examples

df <- tidytable(
  x = 1:3,
  y = 4:6,
  z = c("a", "a", "b")
)

df %>%
  dt(, double_x := x * 2) %>%
  dt(order(-double_x))

# Experimental support for tidy evaluation for custom functions
add_one <- function(data, col) {
  data %>%
    dt(, new_col := {{ col }} + 1)
}

df %>%
  add_one(x)

Convert a vector to a data.table/tidytable

Description

Converts named and unnamed vectors to a data.table/tidytable.

Usage

enframe(x, name = "name", value = "value")

Arguments

`x`	A vector
`name`	Name of the column that stores the names. If `name = NULL`, a one-column tidytable will be returned.
`value`	Name of the column that stores the values.

Examples

vec <- 1:3
names(vec) <- letters[1:3]

enframe(vec)

Expand a data.table to use all combinations of values

Description

Generates all combinations of variables found in a dataset.

expand() is useful in conjunction with joins:

use with right_join() to convert implicit missing values to explicit missing values
use with anti_join() to find out which combinations are missing

nesting() is a helper that only finds combinations already present in the dataset.
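
The anti_join() use case can be sketched as follows. The data is illustrative, and `by` is spelled out only for clarity:

```r
library(tidytable)

df <- tidytable(x = c(1, 1, 2), y = c(1, 2, 1))

# All combinations of x and y, minus the ones already present in df
missing_combos <- df %>%
  expand(x, y) %>%
  anti_join(df, by = c("x", "y"))

missing_combos  # a single row: x = 2, y = 2
```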

Usage

expand(.df, ..., .name_repair = "check_unique", .by = NULL)

nesting(..., .name_repair = "check_unique")

Arguments

`.df`	A data.frame or data.table
`...`	Columns to get combinations of
`.name_repair`	Treatment of duplicate names. See `?vctrs::vec_as_names` for options/details
`.by`	Columns to group by

Examples

df <- tidytable(x = c(1, 1, 2), y = c(1, 1, 2))

df %>%
  expand(x, y)

df %>%
  expand(nesting(x, y))

Create a data.table from all combinations of inputs

Description

Create a data.table from all combinations of inputs

Usage

expand_grid(..., .name_repair = "check_unique")

Arguments

`...`	Variables to get combinations of
`.name_repair`	Treatment of problematic names. See `?vctrs::vec_as_names` for options/details

Examples

x <- 1:2
y <- 1:2

expand_grid(x, y)

expand_grid(stuff = x, y)

Extract a character column into multiple columns using regex

Description

Superseded

extract() has been superseded by separate_wider_regex().

Given a regular expression with capturing groups, extract() turns each group into a new column. If the groups don't match, or the input is NA, the output will be NA. Passing the same name more than once in the into argument merges those groups together, while passing NA in the into argument drops that group from the resulting tidytable.

Usage

extract(
  .df,
  col,
  into,
  regex = "([[:alnum:]]+)",
  remove = TRUE,
  convert = FALSE,
  ...
)

Arguments

`.df`	A data.table or data.frame
`col`	Column to extract from
`into`	New column names to split into. A character vector.
`regex`	A regular expression to extract the desired values. There should be one group (defined by `()`) for each element of `into`
`remove`	If TRUE, remove the input column from the output data.table
`convert`	If TRUE, runs `type.convert()` on the resulting column. Useful if the resulting column should be type integer/double.
`...`	Additional arguments passed on to methods.

Examples

df <- data.table(x = c(NA, "a-b-1", "a-d-3", "b-c-2", "d-e-7"))
df %>% extract(x, "A")
df %>% extract(x, c("A", "B"), "([[:alnum:]]+)-([[:alnum:]]+)")

# If no match, NA:
df %>% extract(x, c("A", "B"), "([a-d]+)-([a-d]+)")
# drop columns by passing NA
df %>% extract(x, c("A", NA, "B"), "([a-d]+)-([a-d]+)-(\\d+)")
# merge groups by passing same name
df %>% extract(x, c("A", "B", "A"), "([a-d]+)-([a-d]+)-(\\d+)")

Fill in missing values with previous or next value

Description

Fills missing values in the selected columns using the next or previous entry. Can be done by group.

Supports tidyselect

Usage

fill(.df, ..., .direction = c("down", "up", "downup", "updown"), .by = NULL)

Arguments

`.df`	A data.frame or data.table
`...`	A selection of columns. `tidyselect` compatible.
`.direction`	Direction in which to fill missing values. Currently "down" (the default), "up", "downup" (first down then up), or "updown" (first up and then down)
`.by`	Columns to group by when filling should be done by group
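
The four `.direction` options can be compared side by side on a single column. A minimal sketch with illustrative data:

```r
library(tidytable)

df <- tidytable(a = c(NA, 1, NA, 2, NA))

down <- df %>% fill(a)                           # NA  1  1  2  2
up <- df %>% fill(a, .direction = "up")          #  1  1  2  2 NA
downup <- df %>% fill(a, .direction = "downup")  #  1  1  1  2  2
updown <- df %>% fill(a, .direction = "updown")  #  1  1  2  2  2
```

Note that "down" leaves a leading NA and "up" leaves a trailing NA, since there is no earlier or later value to pull from.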

Examples

df <- data.table(
  a = c(1, NA, 3, 4, 5),
  b = c(NA, 2, NA, NA, 5),
  groups = c("a", "a", "a", "b", "b")
)

df %>%
  fill(a, b)

df %>%
  fill(a, b, .by = groups)

df %>%
  fill(a, b, .direction = "downup", .by = groups)

Filter rows on one or more conditions

Description

Filters a dataset to choose rows where conditions are true.

Usage

filter(.df, ..., .by = NULL)

Arguments

`.df`	A data.frame or data.table
`...`	Conditions to filter by
`.by`	Columns to group by if filtering with a summary function

Examples

df <- tidytable(
  a = 1:3,
  b = 4:6,
  c = c("a", "a", "b")
)

df %>%
  filter(a >= 2, b >= 4)

df %>%
  filter(b <= mean(b), .by = c)

Extract the first, last, or nth value from a vector

Description

Extract the first, last, or nth value from a vector.

Note: These are simple wrappers around vctrs::vec_slice().

Usage

first(x, default = NULL, na_rm = FALSE)

last(x, default = NULL, na_rm = FALSE)

nth(x, n, default = NULL, na_rm = FALSE)

Arguments

`x`	A vector
`default`	The default value if the value doesn't exist.
`na_rm`	If `TRUE` ignores missing values.
`n`	For `nth()`, a number specifying the position to grab.

Examples

vec <- letters

first(vec)
last(vec)
nth(vec, 4)

Read/write files

Description

fread() is a simple wrapper around data.table::fread() that returns a tidytable instead of a data.table.

Usage

fread(...)

Arguments

`...`	Arguments passed on to data.table::fread

Examples

fake_csv <- "A,B
             1,2
             3,4"

fread(fake_csv)

Convert character and factor columns to dummy variables

Description

Convert character and factor columns to dummy variables

Usage

get_dummies(
  .df,
  cols = where(~is.character(.x) | is.factor(.x)),
  prefix = TRUE,
  prefix_sep = "_",
  drop_first = FALSE,
  dummify_na = TRUE
)

Arguments

`.df`	A data.frame or data.table
`cols`	A single column or a vector of unquoted columns to dummify. Defaults to all character & factor columns using `where(~is.character(.x) | is.factor(.x))`. `tidyselect` compatible.
`prefix`	TRUE/FALSE - If TRUE, a prefix will be added to new column names
`prefix_sep`	Separator for new column names
`drop_first`	TRUE/FALSE - If TRUE, the first dummy column will be dropped
`dummify_na`	TRUE/FALSE - If TRUE, NAs will also get dummy columns

Examples

df <- tidytable(
  chr = c("a", "b", NA),
  fct = as.factor(c("a", NA, "c")),
  num = 1:3
)

# Automatically does all character/factor columns
df %>%
  get_dummies()

df %>%
  get_dummies(cols = chr)

df %>%
  get_dummies(cols = c(chr, fct), drop_first = TRUE)

df %>%
  get_dummies(prefix_sep = ".", dummify_na = FALSE)

Grouping

Description

group_by() adds a grouping structure to a tidytable. Can use tidyselect syntax.
ungroup() removes grouping.

Usage

group_by(.df, ..., .add = FALSE)

ungroup(.df, ...)

Arguments

`.df`	A data.frame or data.table
`...`	Columns to group by
`.add`	Should grouping cols specified be added to the current grouping

Examples

df <- data.table(
  a = 1:3,
  b = 4:6,
  c = c("a", "a", "b"),
  d = c("a", "a", "b")
)

df %>%
  group_by(c, d) %>%
  summarize(mean_a = mean(a)) %>%
  ungroup()

# Can also use tidyselect
df %>%
  group_by(where(is.character)) %>%
  summarize(mean_a = mean(a)) %>%
  ungroup()

Selection helper for grouping columns

Description

Selection helper for grouping columns

Usage

group_cols()

Examples

df <- tidytable(
  x = c("a", "b", "c"),
  y = 1:3,
  z = 1:3
)

df %>%
  group_by(x) %>%
  select(group_cols(), y)

Split data frame by groups

Description

Split data frame by groups. Returns a list.

Usage

group_split(.df, ..., .keep = TRUE, .named = FALSE)

Arguments

`.df`	A data.frame or data.table
`...`	Columns to group and split by. `tidyselect` compatible.
`.keep`	Should the grouping columns be kept
`.named`	experimental: Should the list be named with labels that identify the group

Examples

df <- tidytable(
  a = 1:3,
  b = 1:3,
  c = c("a", "a", "b"),
  d = c("a", "a", "b")
)

df %>%
  group_split(c, d)

df %>%
  group_split(c, d, .keep = FALSE)

df %>%
  group_split(c, d, .named = TRUE)

Get the grouping variables

Description

Get the grouping variables

Usage

group_vars(x)

Arguments

`x`	A grouped tidytable

Examples

df <- data.table(
  a = 1:3,
  b = 4:6,
  c = c("a", "a", "b"),
  d = c("a", "a", "b")
)

df %>%
  group_by(c, d) %>%
  group_vars()

Create conditions on a selection of columns

Description

Helpers to apply a filter across a selection of columns.

Usage

if_all(.cols = everything(), .fns = NULL, ...)

if_any(.cols = everything(), .fns = NULL, ...)

Arguments

`.cols`	Selection of columns
`.fns`	Function to create filter conditions
`...`	Other arguments passed to the function

Examples

iris %>%
  filter(if_any(ends_with("Width"), ~ .x > 4))

iris %>%
  filter(if_all(ends_with("Width"), ~ .x > 2))

Fast if_else

Description

Fast version of base::ifelse().

Usage

if_else(condition, true, false, missing = NA, ..., ptype = NULL, size = NULL)

Arguments

`condition`	Conditions to test on
`true`	Values to return if conditions evaluate to `TRUE`
`false`	Values to return if conditions evaluate to `FALSE`
`missing`	Value to return if an element of `condition` is `NA`
`...`	These dots are for future extensions and must be empty.
`ptype`	Optional ptype to override output type
`size`	Optional size to override output size
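
A short sketch of the `missing` argument, using an illustrative condition vector that contains an `NA`:

```r
library(tidytable)

condition <- c(TRUE, NA, FALSE)

# Without `missing`, the NA element of `condition` would yield NA
labels <- if_else(condition, "yes", "no", missing = "unknown")

labels  # "yes" "unknown" "no"
```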

Examples

x <- 1:5
if_else(x < 3, 1, 0)

# Can also be used inside of mutate()
df <- data.table(x = x)

df %>%
  mutate(new_col = if_else(x < 3, 1, 0))

Run invisible garbage collection

Description

Run garbage collection without the gc() output. Can also be run in the middle of a long pipe chain. Useful for large datasets or when using parallel processing.

Usage

inv_gc(x)

Arguments

`x`	Optional. If missing runs `gc()` silently. Else returns the same object unaltered.

Examples

# Can be run with no input
inv_gc()

df <- tidytable(col1 = 1, col2 = 2)

# Or can be used in the middle of a pipe chain (object is unaltered)
df %>%
  filter(col1 < 2, col2 < 4) %>%
  inv_gc() %>%
  select(col1)

Check if the tidytable is grouped

Description

Check if the tidytable is grouped

Usage

is_grouped_df(x)

Arguments

`x`	An object

Examples

df <- data.table(
  a = 1:3,
  b = c("a", "a", "b")
)

df %>%
  group_by(b) %>%
  is_grouped_df()

Test if the object is a tidytable

Description

This function returns TRUE for tidytables or subclasses of tidytables, and FALSE for all other objects.

Usage

is_tidytable(x)

Arguments

`x`	An object

Examples

df <- data.frame(x = 1:3, y = 1:3)

is_tidytable(df)

df <- tidytable(x = 1:3, y = 1:3)

is_tidytable(df)

Get lagging or leading values

Description

Find the "previous" or "next" values in a vector. Useful for comparing values behind or ahead of the current values.

Usage

lag(x, n = 1L, default = NA)

lead(x, n = 1L, default = NA)

Arguments

`x`	a vector of values
`n`	a positive integer of length 1, giving the number of positions to lead or lag by
`default`	value used for non-existent rows. Defaults to NA.

Examples

x <- 1:5

lag(x, 1)
lead(x, 1)

# Also works inside of `mutate()`
df <- tidytable(x = 1:5)

df %>%
  mutate(lag_x = lag(x))
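
The `n` and `default` arguments can be combined to shift by more than one position and to fill the resulting gaps with something other than `NA`. The sketch below assumes tidytable is installed; the `tidytable::` prefix is used only to avoid any ambiguity with `stats::lag()`, and the fill value `0L` is an arbitrary choice for illustration.

```r
library(tidytable)

x <- 1:5

# Shift two positions back; the first two slots have no source value
lagged <- tidytable::lag(x, n = 2, default = 0L)

# Shift two positions forward; the last two slots have no source value
led <- tidytable::lead(x, n = 2, default = 0L)

lagged
led
```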

Join two data.tables together

Description

Join two data.tables together

Usage

left_join(x, y, by = NULL, suffix = c(".x", ".y"), ..., keep = FALSE)

right_join(x, y, by = NULL, suffix = c(".x", ".y"), ..., keep = FALSE)

inner_join(x, y, by = NULL, suffix = c(".x", ".y"), ..., keep = FALSE)

full_join(x, y, by = NULL, suffix = c(".x", ".y"), ..., keep = FALSE)

anti_join(x, y, by = NULL)

semi_join(x, y, by = NULL)

Arguments

`x`	A data.frame or data.table
`y`	A data.frame or data.table
`by`	A character vector of variables to join by. If NULL, the default, the join will do a natural join, using all variables with common names across the two tables.
`suffix`	Suffixes appended to duplicated column names, for example when using `full_join()`
`...`	Other parameters passed on to methods
`keep`	Should the join keys from both `x` and `y` be preserved in the output?

Examples

df1 <- data.table(x = c("a", "a", "b", "c"), y = 1:4)
df2 <- data.table(x = c("a", "b"), z = 5:6)

df1 %>% left_join(df2)
df1 %>% inner_join(df2)
df1 %>% right_join(df2)
df1 %>% full_join(df2)
df1 %>% anti_join(df2)
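
To make the difference between the join types concrete, the sketch below reuses the same two tables, passes `by` explicitly, and compares row counts. It also includes `semi_join()`, which is listed under Usage. The variable names (`left`, `inner`, `semi`, `anti`) are only for illustration.

```r
library(tidytable)

df1 <- tidytable(x = c("a", "a", "b", "c"), y = 1:4)
df2 <- tidytable(x = c("a", "b"), z = 5:6)

# Mutating joins add columns from df2
left <- left_join(df1, df2, by = "x")    # all rows of df1; z is NA where x == "c"
inner <- inner_join(df1, df2, by = "x")  # only rows with a match in df2

# Filtering joins keep or drop rows of df1 without adding columns
semi <- semi_join(df1, df2, by = "x")    # rows of df1 with a match in df2
anti <- anti_join(df1, df2, by = "x")    # rows of df1 without a match in df2

c(left = nrow(left), inner = nrow(inner), semi = nrow(semi), anti = nrow(anti))
```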

Apply a function to each element of a vector or list

Description

The map functions transform their input by applying a function to each element and returning a list/vector/data.table.

map() returns a list
`_lgl`, `_int`, `_dbl`, `_chr`, `_df` variants return their specified type
`_dfr` & `_dfc` return all data frame results combined using row or column binding

Usage

map(.x, .f, ...)

map_lgl(.x, .f, ...)

map_int(.x, .f, ...)

map_dbl(.x, .f, ...)

map_chr(.x, .f, ...)

map_dfc(.x, .f, ...)

map_dfr(.x, .f, ..., .id = NULL)

map_df(.x, .f, ..., .id = NULL)

walk(.x, .f, ...)

map_vec(.x, .f, ..., .ptype = NULL)

map2(.x, .y, .f, ...)

map2_lgl(.x, .y, .f, ...)

map2_int(.x, .y, .f, ...)

map2_dbl(.x, .y, .f, ...)

map2_chr(.x, .y, .f, ...)

map2_dfc(.x, .y, .f, ...)

map2_dfr(.x, .y, .f, ..., .id = NULL)

map2_df(.x, .y, .f, ..., .id = NULL)

map2_vec(.x, .y, .f, ..., .ptype = NULL)

pmap(.l, .f, ...)

pmap_lgl(.l, .f, ...)

pmap_int(.l, .f, ...)

pmap_dbl(.l, .f, ...)

pmap_chr(.l, .f, ...)

pmap_dfc(.l, .f, ...)

pmap_dfr(.l, .f, ..., .id = NULL)

pmap_df(.l, .f, ..., .id = NULL)

pmap_vec(.l, .f, ..., .ptype = NULL)

Arguments

`.x`	A list or vector
`.f`	A function
`...`	Other arguments to pass to a function
`.id`	Whether `map_dfr()` should add an id column to the finished dataset
`.ptype`	ptype for resulting vector in `map_vec()`
`.y`	A list or vector
`.l`	A list to use in `pmap`

Examples

map(c(1,2,3), ~ .x + 1)

map_dbl(c(1,2,3), ~ .x + 1)

map_chr(c(1,2,3), as.character)
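
The examples above cover only the single-input variants. As a rough sketch of the two-input and list-input forms, assuming tidytable is attached (the inputs and the names `sums`, `products`, and `squares` are made up for illustration):

```r
library(tidytable)

# map2_*: iterate over two vectors in parallel
sums <- map2_dbl(c(1, 2, 3), c(10, 20, 30), ~ .x + .y)

# pmap_*: iterate over any number of vectors supplied as a list
products <- pmap_dbl(list(c(1, 2, 3), c(4, 5, 6)), function(a, b) a * b)

# map_dfr: build one small table per element and row-bind the results
squares <- map_dfr(c(1, 2), ~ tidytable(id = .x, sq = .x^2))

sums
products
squares
```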

Add/modify/delete columns

Description

With mutate() you can do 3 things:

Add new columns
Modify existing columns
Delete columns

Usage

mutate(
  .df,
  ...,
  .by = NULL,
  .keep = c("all", "used", "unused", "none"),
  .before = NULL,
  .after = NULL
)

Arguments

`.df`	A data.frame or data.table
`...`	Columns to add/modify
`.by`	Columns to group by
`.keep`	experimental: Controls which columns from `.df` are retained in the output. `"all"`, the default, retains all variables. `"used"` keeps any variables used to make new variables; it's useful for checking your work as it displays inputs and outputs side-by-side. `"unused"` keeps only existing variables not used to make new variables. `"none"` keeps only grouping keys (like `transmute()`).
`.before`, `.after`	Optionally indicate where new columns should be placed. Defaults to the right side of the data frame.

Examples

df <- data.table(
  a = 1:3,
  b = 4:6,
  c = c("a", "a", "b")
)

df %>%
  mutate(double_a = a * 2,
         a_plus_b = a + b)

df %>%
  mutate(double_a = a * 2,
         avg_a = mean(a),
         .by = c)

df %>%
  mutate(double_a = a * 2, .keep = "used")

df %>%
  mutate(double_a = a * 2, .after = a)
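
The description mentions deleting columns, which the examples above do not show. The sketch below assumes, as in dplyr, that assigning `NULL` drops a column. It also shows a grouped calculation with `.by`. The column name `a_share` is made up for illustration.

```r
library(tidytable)

df <- tidytable(
  a = 1:3,
  b = 4:6,
  c = c("a", "a", "b")
)

# Delete a column by assigning NULL
dropped <- df %>%
  mutate(b = NULL)

# Grouped calculation: each row's share of its group's total
shares <- df %>%
  mutate(a_share = a / sum(a), .by = c)

names(dropped)
shares
```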

Add/modify columns by row

Description

Allows you to mutate "by row". This is most useful when a vectorized function doesn't exist.

Usage

mutate_rowwise(
  .df,
  ...,
  .keep = c("all", "used", "unused", "none"),
  .before = NULL,
  .after = NULL
)

Arguments

`.df`	A data.table or data.frame
`...`	Columns to add/modify
`.keep`	experimental: Controls which columns from `.df` are retained in the output. `"all"`, the default, retains all variables. `"used"` keeps any variables used to make new variables; it's useful for checking your work as it displays inputs and outputs side-by-side. `"unused"` keeps only existing variables not used to make new variables. `"none"` keeps only grouping keys (like `transmute()`).
`.before`, `.after`	Optionally indicate where new columns should be placed. Defaults to the right side of the data frame.

Examples

df <- data.table(x = 1:3, y = 1:3 * 2, z = 1:3 * 3)

# Compute the mean of x, y, z in each row
df %>%
  mutate_rowwise(row_mean = mean(c(x, y, z)))

# Use c_across() to more easily select many variables
df %>%
  mutate_rowwise(row_mean = mean(c_across(x:z)))
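
To illustrate why row-wise evaluation matters, the sketch below contrasts `mutate()` with `mutate_rowwise()` using `max()`, which collapses all of its inputs into a single value. The data and the column name `biggest` are made up for illustration.

```r
library(tidytable)

df <- tidytable(x = c(1, 5, 3), y = c(4, 2, 6))

# max() sees both whole columns, so every row gets the overall maximum
plain <- df %>%
  mutate(biggest = max(x, y))

# Evaluated one row at a time, max() compares only that row's x and y
by_row <- df %>%
  mutate_rowwise(biggest = max(x, y))

plain$biggest
by_row$biggest
```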

Number of observations in each group

Description

Helper function that can be used to find counts by group.

Can be used inside summarize(), mutate(), & filter()

Usage

n()

Examples

df <- data.table(
  x = 1:3,
  y = 4:6,
  z = c("a","a","b")
)

df %>%
  summarize(count = n(), .by = z)
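
Since `n()` is also described as usable in `mutate()` and `filter()`, here is a short sketch of both, assuming `filter()` accepts `.by` in the same way as `mutate()`. The data and the column name `group_size` are made up for illustration.

```r
library(tidytable)

df <- tidytable(
  x = 1:4,
  z = c("a", "a", "b", "a")
)

# Attach each group's size to every row
sized <- df %>%
  mutate(group_size = n(), .by = z)

# Keep only rows belonging to groups with more than one row
kept <- df %>%
  filter(n() > 1, .by = z)

sized
kept
```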

Count the number of unique values in a vector

Description

This is a faster version of length(unique(x)) that calls data.table::uniqueN().

Usage

n_distinct(..., na.rm = FALSE)

Arguments

`...`	vectors of values
`na.rm`	If `TRUE` missing values don't count

Examples

x <- sample(1:10, 1e5, replace = TRUE)
n_distinct(x)
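
The effect of `na.rm` is easier to see on a short vector. In this small sketch (the input is made up for illustration), `NA` counts as its own distinct value unless `na.rm = TRUE`.

```r
library(tidytable)

x <- c(1, 2, 2, NA)

# NA is counted as a distinct value by default
with_na <- n_distinct(x)

# With na.rm = TRUE missing values are ignored
without_na <- n_distinct(x, na.rm = TRUE)

c(with_na = with_na, without_na = without_na)
```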

Convert values to `NA`

Description

Convert values to NA.

Usage

na_if(x, y)

Arguments

`x`	A vector
`y`	Value to replace with `NA`

Examples

vec <- 1:3
na_if(vec, 3)
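
A common use is turning a placeholder such as an empty string into a proper missing value. A minimal sketch with made-up input:

```r
library(tidytable)

# Hypothetical character vector where "" stands in for "no value"
codes <- c("a", "", "b")

cleaned <- na_if(codes, "")
cleaned
```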

Nest columns into a list-column

Description

Nest columns into a list-column

Usage

nest(.df, ..., .by = NULL, .key = NULL, .names_sep = NULL)

Arguments

`.df`	A data.table or data.frame
`...`	Columns to be nested.
`.by`	Columns to nest by
`.key`	New column name if `.by` is used
`.names_sep`	If NULL, the names will be left alone. If a string, the names of the columns will be created by pasting together the inner column names and the outer column names.

Examples

df <- data.table(
  a = 1:3,
  b = 1:3,
  c = c("a", "a", "b"),
  d = c("a", "a", "b")
)

df %>%
  nest(data = c(a, b))

df %>%
  nest(data = where(is.numeric))

df %>%
  nest(.by = c(c, d))

Nest data.tables

Description

Nest data.tables by group.

Note: nest_by() does not return a rowwise tidytable.

Usage

nest_by(.df, ..., .key = "data", .keep = FALSE)

Arguments

`.df`	A data.frame or data.table
`...`	Columns to group by. If empty nests the entire data.table. `tidyselect` compatible.
`.key`	Name of the new column created by nesting.
`.keep`	Should the grouping columns be kept in the list column.

Examples

df <- data.table(
  a = 1:5,
  b = 6:10,
  c = c(rep("a", 3), rep("b", 2)),
  d = c(rep("a", 3), rep("b", 2))
)

df %>%
  nest_by()

df %>%
  nest_by(c, d)

df %>%
  nest_by(where(is.character))

df %>%
  nest_by(c, d, .keep = TRUE)
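
To see what `.keep` changes, it can help to inspect the nested tables themselves. The sketch below uses a reduced version of the table above and looks at the column names inside the list-column. The variable names `nested` and `nested_keep` are only for illustration.

```r
library(tidytable)

df <- tidytable(
  a = 1:5,
  b = 6:10,
  c = c(rep("a", 3), rep("b", 2))
)

nested <- df %>%
  nest_by(c)

nested_keep <- df %>%
  nest_by(c, .keep = TRUE)

# One row per group: the grouping column plus the list-column "data"
names(nested)

# By default the grouping column is left out of each nested table
names(nested$data[[1]])

# With .keep = TRUE the grouping column is retained inside each nested table
names(nested_keep$data[[1]])
```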

Nest join

Description

Join the data from y as a list column onto x.

Usage

nest_join(x, y, by = NULL, keep = FALSE, name = NULL, ...)

Arguments

`x`	A data.frame or data.table
`y`	A data.frame or data.table
`by`	A character vector of variables to join by. If NULL, the default, the join will do a natural join, using all variables with common names across the two tables.
`keep`	Should the join keys from both `x` and `y` be preserved in the output?
`name`	The name of the list-column created by the join. If `NULL` the name of `y` is used.
`...`	Other parameters passed on to methods

Examples

df1 <- tidytable(x = 1:3)
df2 <- tidytable(x = c(2, 3, 3), y = c("a", "b", "c"))

out <- nest_join(df1, df2)
out
out$df2

Create a tidytable from a list

Description

Create a tidytable from a list

Usage

new_tidytable(x = list())

Arguments

`x`	A named list of equal-length vectors. The lengths are not checked; it is the responsibility of the caller to make sure they are equal.

Examples

l <- list(x = 1:3, y = c("a", "a", "b"))

new_tidytable(l)

Selection version of `across()`

Description

Select a subset of columns from within functions like mutate(), summarize(), or filter().

Usage

pick(...)

Arguments

`...`	Columns to select. `tidyselect` compatible.

Examples

df <- tidytable(
  x = 1:3,
  y = 4:6,
  z = c("a", "a", "b")
)

df %>%
  mutate(row_sum = rowSums(pick(x, y)))

Pivot data from wide to long

Description

pivot_longer() "lengthens" the data, increasing the number of rows and decreasing the number of columns.

Usage

pivot_longer(
  .df,
  cols = everything(),
  names_to = "name",
  values_to = "value",
  names_prefix = NULL,
  names_sep = NULL,
  names_pattern = NULL,
  names_ptypes = NULL,
  names_transform = NULL,
  names_repair = "check_unique",
  values_drop_na = FALSE,
  values_ptypes = NULL,
  values_transform = NULL,
  fast_pivot = FALSE,
  ...
)

Arguments

`.df`	A data.table or data.frame
`cols`	Columns to pivot. `tidyselect` compatible.
`names_to`	Name of the new "names" column. Must be a string.
`values_to`	Name of the new "values" column. Must be a string.
`names_prefix`	Remove matching text from the start of selected columns using regex.
`names_sep`	If `names_to` contains multiple values, `names_sep` takes the same specification as `separate()`.
`names_pattern`	If `names_to` contains multiple values, `names_pattern` takes the same specification as `extract()`, a regular expression containing matching groups.
`names_ptypes`, `values_ptypes`	A list of column name-prototype pairs. See `help("theory-faq-coercion", package = "vctrs")` for more info on vctrs coercion.
`names_transform`, `values_transform`	A list of column name-function pairs. Use these arguments if you need to change the types of specific columns.
`names_repair`	Treatment of duplicate names. See `?vctrs::vec_as_names` for options/details.
`values_drop_na`	If TRUE, rows will be dropped that contain NAs.
`fast_pivot`	experimental: Fast pivoting. If `TRUE`, the `names_to` column will be returned as a `factor`, otherwise it will be a `character` column. Defaults to `FALSE` to match tidyverse semantics.
`...`	Additional arguments passed on to methods.

Examples

df <- data.table(
  x = 1:3,
  y = 4:6,
  z = c("a", "b", "c")
)

df %>%
  pivot_longer(cols = c(x, y))

df %>%
  pivot_longer(cols = -z, names_to = "stuff", values_to = "things")
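
When column names encode more than one piece of information, `names_to` can take several names and `names_sep` says where to split. The sketch below uses a made-up table whose column names combine a variable and a year. The names `var` and `year` are assumptions chosen for illustration.

```r
library(tidytable)

# Hypothetical wide data: column names encode a variable and a year
df <- tidytable(
  id = 1:2,
  x_2020 = c(1, 2),
  x_2021 = c(3, 4)
)

# Split "x_2020" into var = "x" and year = "2020"
long <- df %>%
  pivot_longer(
    cols = -id,
    names_to = c("var", "year"),
    names_sep = "_"
  )

long
```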

Pivot data from long to wide

Description

"Widens" data, increasing the number of columns and decreasing the number of rows.

Usage

pivot_wider(
  .df,
  names_from = name,
  values_from = value,
  id_cols = NULL,
  names_sep = "_",
  names_prefix = "",
  names_glue = NULL,
  names_sort = FALSE,
  names_repair = "unique",
  values_fill = NULL,
  values_fn = NULL,
  unused_fn = NULL
)

Arguments

`.df`	A data.frame or data.table
`names_from`	Column (or columns) to get the names of the output columns from. `tidyselect` compatible.
`values_from`	Column (or columns) to get the cell values from. `tidyselect` compatible.
`id_cols`	A set of columns that uniquely identifies each observation. Defaults to all columns in the data table except for the columns specified in `names_from` and `values_from`. Typically used when you have additional variables that are directly related. `tidyselect` compatible.
`names_sep`	the separator between the names of the columns
`names_prefix`	prefix to add to the names of the new columns
`names_glue`	Instead of using `names_sep` and `names_prefix`, you can supply a glue specification that uses the `names_from` columns (and special `.value`) to create custom column names
`names_sort`	Should the resulting new columns be sorted.
`names_repair`	Treatment of duplicate names. See `?vctrs::vec_as_names` for options/details.
`values_fill`	If values are missing, what value should be filled in
`values_fn`	Should the data be aggregated before casting? If the formula doesn't identify a single observation for each cell, then aggregation defaults to length with a message.
`unused_fn`	Aggregation function to be applied to unused columns. Default is to ignore unused columns.

Examples

df <- tidytable(
  id = 1,
  names = c("a", "b", "c"),
  vals = 1:3
)

df %>%
  pivot_wider(names_from = names, values_from = vals)

df %>%
  pivot_wider(
    names_from = names, values_from = vals, names_prefix = "new_"
  )
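
When some id/name combinations are absent from the long data, the widened table has gaps. A minimal sketch of `values_fill`, using made-up data in which id 2 has no `"b"` row:

```r
library(tidytable)

# Hypothetical long data: id 2 has no "b" observation
df <- tidytable(
  id = c(1, 1, 2),
  name = c("a", "b", "a"),
  value = c(10, 20, 30)
)

# Fill the missing id/name combination with 0 instead of NA
wide <- df %>%
  pivot_wider(names_from = name, values_from = value, values_fill = 0)

wide
```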

Pull out a single variable

Description

Pull a single variable from a data.table as a vector.

Usage

pull(.df, var = -1, name = NULL)

Arguments

`.df`	A data.frame or data.table
`var`	The column to pull from the data.table as: a variable name, a positive integer giving the column position, or a negative integer giving the column position counting from the right
`name`	Optional - specifies the column to be used as names for the vector.

Examples

df <- data.table(
  x = 1:3,
  y = 1:3
)

# Grab column by name
df %>%
  pull(y)

# Grab column by position
df %>%
  pull(1)

# Defaults to last column
df %>%
  pull()
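
The `name` argument is not shown above. As a small sketch with made-up data (the column names `score` and `id` are illustrative), it takes the names for the returned vector from another column:

```r
library(tidytable)

df <- tidytable(
  score = c(10, 20, 30),
  id = c("a", "b", "c")
)

# Pull `score` and use `id` for the names of the result
named_scores <- df %>%
  pull(score, name = id)

named_scores
```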

Reframe a data frame

Description

Reframe a data frame. Note this is a simple alias for summarize() that always returns an ungrouped tidytable.

Usage

reframe(.df, ..., .by = NULL)

Arguments

`.df`	A data.frame or data.table
`...`	Aggregations to perform
`.by`	Columns to group by

Examples

mtcars %>%
  reframe(qs = quantile(disp, c(0.25, 0.75)),
          prob = c(0.25, 0.75),
          .by = cyl)

Relocate a column to a new position

Description

Move a column or columns to a new position

Usage

relocate(.df, ..., .before = NULL, .after = NULL)

Arguments

`.df`	A data.frame or data.table
`...`	A selection of columns to move. `tidyselect` compatible.
`.before`	Column to move selection before
`.after`	Column to move selection after

Examples

df <- data.table(
  a = 1:3,
  b = 1:3,
  c = c("a", "a", "b"),
  d = c("a", "a", "b")
)

df %>%
  relocate(c, .before = b)

df %>%
  relocate(a, b, .after = c)

df %>%
  relocate(where(is.numeric), .after = c)

Rename variables by name

Description

Rename variables in a data.table.

Usage

rename(.df, ...)

Arguments

`.df`	A data.frame or data.table
`...`	`new_name = old_name` pairs to rename columns

Examples

df <- data.table(x = 1:3, y = 4:6)

df %>%
  rename(new_x = x,
         new_y = y)

Rename multiple columns

Description

Rename multiple columns with the same transformation

Usage

rename_with(.df, .fn = NULL, .cols = everything(), ...)

Arguments

`.df`	A data.table or data.frame
`.fn`	Function to transform the names with.
`.cols`	Columns to rename. Defaults to all columns. `tidyselect` compatible.
`...`	Other parameters to pass to the function

Examples

df <- data.table(
  x = 1,
  y = 2,
  double_x = 2,
  double_y = 4
)

df %>%
  rename_with(toupper)

df %>%
  rename_with(~ toupper(.x))

df %>%
  rename_with(~ toupper(.x), .cols = c(x, double_x))

Replace missing values

Description

Replace NAs with specified values

Usage

replace_na(.x, replace)

Arguments

`.x`	A data.frame/data.table or a vector
`replace`	If `.x` is a data frame, a `list()` of replacement values for specified columns. If `.x` is a vector, a single replacement value.

Examples

df <- data.table(
  x = c(1, 2, NA),
  y = c(NA, 1, 2)
)

# Using replace_na() inside mutate()
df %>%
  mutate(x = replace_na(x, 5))

# Using replace_na() on a data frame
df %>%
  replace_na(list(x = 5, y = 0))

Ranking functions

Description

Ranking functions:

row_number(): Gives the row number if empty. Equivalent to frank(ties.method = "first") if provided a vector.
min_rank(): Equivalent to frank(ties.method = "min")
dense_rank(): Equivalent to frank(ties.method = "dense")
percent_rank(): Ranks by percentage from 0 to 1
cume_dist(): Cumulative distribution

Usage

row_number(x)

min_rank(x)

dense_rank(x)

percent_rank(x)

cume_dist(x)

Arguments

`x`	A vector to rank

Examples

df <- data.table(x = rep(1, 3), y = c("a", "a", "b"))

df %>%
  mutate(row = row_number())
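
The ranking functions differ mainly in how they treat ties. The sketch below applies each one to the same short vector, made up for illustration, which contains one tie. The values noted for `percent_rank()` and `cume_dist()` assume the usual dplyr-style definitions.

```r
library(tidytable)

x <- c(10, 20, 20, 30)

rn <- row_number(x)    # ties broken by position: 1 2 3 4
mr <- min_rank(x)      # ties share the lowest rank, leaving a gap: 1 2 2 4
dr <- dense_rank(x)    # ties share a rank, no gap: 1 2 2 3
pr <- percent_rank(x)  # min_rank rescaled to [0, 1]: 0, 1/3, 1/3, 1
cd <- cume_dist(x)     # share of values <= each value: 0.25 0.75 0.75 1

rbind(rn, mr, dr, pr, cd)
```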

Convert to a rowwise tidytable

Description

Convert to a rowwise tidytable.

Usage

rowwise(.df)

Arguments

`.df`	A data.frame or data.table

Examples

df <- tidytable(x = 1:3, y = 1:3 * 2, z = 1:3 * 3)

# Compute the mean of x, y, z in each row
df %>%
  rowwise() %>%
  mutate(row_mean = mean(c(x, y, z)))

# Use c_across() to more easily select many variables
df %>%
  rowwise() %>%
  mutate(row_mean = mean(c_across(x:z))) %>%
  ungroup()

Select or drop columns

Description

Select or drop columns from a data.table

Usage

select(.df, ...)

Arguments

`.df`	A data.frame or data.table
`...`	Columns to select or drop. Use named arguments, e.g. new_name = old_name, to rename selected variables. `tidyselect` compatible.

Examples

df <- data.table(
  x1 = 1:3,
  x2 = 1:3,
  y = c("a", "b", "c"),
  z = c("a", "b", "c")
)

df %>%
  select(x1, y)

df %>%
  select(x1:y)

df %>%
  select(-y, -z)

df %>%
  select(starts_with("x"), z)

df %>%
  select(where(is.character), x1)

df %>%
  select(new = x1, y)

Separate a character column into multiple columns

Description

Superseded

separate() has been superseded by separate_wider_delim().

Separates a single column into multiple columns using a user supplied separator or regex.

If a separator is not supplied one will be automatically detected.

Note: Using automatic detection or regex will be slower than simple separators such as "," or ".".

Usage

separate(
  .df,
  col,
  into,
  sep = "[^[:alnum:]]+",
  remove = TRUE,
  convert = FALSE,
  ...
)

Arguments

`.df`	A data frame
`col`	The column to split into multiple columns
`into`	New column names to split into. A character vector. Use `NA` to omit the variable in the output.
`sep`	Separator to split on. Can be specified or detected automatically
`remove`	If TRUE, remove the input column from the output data.table
`convert`	TRUE calls `type.convert()` with `as.is = TRUE` on new columns
`...`	Arguments passed on to methods

Examples

df <- data.table(x = c("a", "a.b", "a.b", NA))

# "sep" can be automatically detected (slower)
df %>%
  separate(x, into = c("c1", "c2"))

# Faster if "sep" is provided
df %>%
  separate(x, into = c("c1", "c2"), sep = ".")

Split a string into rows

Description

If a column contains observations with multiple delimited values, separate them each into their own row.

Usage

separate_longer_delim(.df, cols, delim, ...)

Arguments

`.df`	A data.frame or data.table
`cols`	Columns to separate
`delim`	Separator delimiting collapsed values
`...`	These dots are for future extensions and must be empty.

Examples

df <- data.table(
  x = 1:3,
  y = c("a", "d,e,f", "g,h"),
  z = c("1", "2,3,4", "5,6")
)

df %>%
  separate_longer_delim(c(y, z), ",")

Separate a collapsed column into multiple rows

Description

Superseded

separate_rows() has been superseded by separate_longer_delim().

If a column contains observations with multiple delimited values, separate them each into their own row.

Usage

separate_rows(.df, ..., sep = "[^[:alnum:].]+", convert = FALSE)

Arguments

`.df`	A data.frame or data.table
`...`	Columns to separate across multiple rows. `tidyselect` compatible
`sep`	Separator delimiting collapsed values
`convert`	If TRUE, runs `type.convert()` on the resulting column. Useful if the resulting column should be type integer/double.

Examples

df <- data.table(
  x = 1:3,
  y = c("a", "d,e,f", "g,h"),
  z = c("1", "2,3,4", "5,6")
)

separate_rows(df, y, z)

separate_rows(df, y, z, convert = TRUE)

Separate a character column into multiple columns

Description

Separates a single column into multiple columns

Usage

separate_wider_delim(
  .df,
  cols,
  delim,
  ...,
  names = NULL,
  names_sep = NULL,
  names_repair = "check_unique",
  too_few = c("align_start", "error"),
  too_many = c("drop", "error"),
  cols_remove = TRUE
)

Arguments

`.df`	A data frame
`cols`	Columns to separate
`delim`	Delimiter to separate on
`...`	These dots are for future extensions and must be empty.
`names`	New column names to separate into
`names_sep`	Names separator
`names_repair`	Treatment of duplicate names. See `?vctrs::vec_as_names` for options/details.
`too_few`	What to do when too few column names are supplied
`too_many`	What to do when too many column names are supplied
`cols_remove`	Should old columns be removed

Examples

df <- tidytable(x = c("a", "a_b", "a_b", NA))

df %>%
  separate_wider_delim(x, delim = "_", names = c("left", "right"))

df %>%
  separate_wider_delim(x, delim = "_", names_sep = "")

Separate a character column into multiple columns using regex patterns

Description

Separate a character column into multiple columns using regex patterns

Usage

separate_wider_regex(
  .df,
  cols,
  patterns,
  ...,
  names_sep = NULL,
  names_repair = "check_unique",
  too_few = "error",
  cols_remove = TRUE
)

Arguments

`.df`	A data frame
`cols`	Columns to separate
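`patterns`	A character vector of regular expressions. Named patterns become new columns; unnamed patterns match text that is not kept in the output.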
`...`	These dots are for future extensions and must be empty.
`names_sep`	Names separator
`names_repair`	Treatment of duplicate names. See `?vctrs::vec_as_names` for options/details.
`too_few`	What to do when too few column names are supplied
`cols_remove`	Should old columns be removed

Examples

df <- tidytable(id = 1:3, x = c("m-123", "f-455", "f-123"))

df %>%
  separate_wider_regex(x, c(gender = ".", ".", unit = "\\d+"))

Choose rows in a data.table

Description

Choose rows in a data.table. Grouped data.tables grab rows within each group.

Usage

slice_head(.df, n = 5, ..., .by = NULL, by = NULL)

slice_tail(.df, n = 5, ..., .by = NULL, by = NULL)

slice_max(.df, order_by, n = 1, ..., with_ties = TRUE, .by = NULL, by = NULL)

slice_min(.df, order_by, n = 1, ..., with_ties = TRUE, .by = NULL, by = NULL)

slice(.df, ..., .by = NULL)

slice_sample(
  .df,
  n,
  prop,
  weight_by = NULL,
  replace = FALSE,
  .by = NULL,
  by = NULL
)

Arguments

`.df`	A data.frame or data.table
`n`	Number of rows to grab
`...`	Integer row values
`.by`, `by`	Columns to group by
`order_by`	Variable to arrange by
`with_ties`	Should ties be kept together? The default, `TRUE`, may return more than `n` rows if values are tied. Use `FALSE` to ignore ties and return exactly `n` rows.
`prop`	The proportion of rows to select
`weight_by`	Sampling weights
`replace`	Should sampling be performed with (`TRUE`) or without (`FALSE`, default) replacement

Examples

df <- data.table(
  x = 1:4,
  y = 5:8,
  z = c("a", "a", "a", "b")
)

df %>%
  slice(1:3)

df %>%
  slice(1, 3)

df %>%
  slice(1:2, .by = z)

df %>%
  slice_head(1, .by = z)

df %>%
  slice_tail(1, .by = z)

df %>%
  slice_max(order_by = x, .by = z)

df %>%
  slice_min(order_by = y, .by = z)
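
The examples above do not cover `with_ties` or `slice_sample()`. The sketch below is one way to see their effect. The `scores` table and its column names are hypothetical, and the sampled rows depend on the seed.

```r
library(tidytable)

# Hypothetical data: group "a" has two rows tied for the maximum of v
scores <- tidytable(
  g = c("a", "a", "a", "b"),
  v = c(3, 3, 1, 2)
)

# Default with_ties = TRUE: both tied rows in group "a" are returned
ties_kept <- scores %>%
  slice_max(order_by = v, n = 1, .by = g)

# with_ties = FALSE: exactly one row per group
ties_dropped <- scores %>%
  slice_max(order_by = v, n = 1, with_ties = FALSE, .by = g)

# Randomly sample one row per group
set.seed(1)
sampled <- scores %>%
  slice_sample(n = 1, .by = g)
```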

Aggregate data using summary statistics

Description

Aggregate data using summary statistics such as mean or median. Can be calculated by group.

Usage

summarize(
  .df,
  ...,
  .by = NULL,
  .sort = TRUE,
  .groups = "drop_last",
  .unpack = FALSE
)

summarise(
  .df,
  ...,
  .by = NULL,
  .sort = TRUE,
  .groups = "drop_last",
  .unpack = FALSE
)

Arguments

`.df`	A data.frame or data.table
`...`	Aggregations to perform
`.by`	Columns to group by. A single column can be passed with `.by = d`. Multiple columns can be passed with `.by = c(c, d)`. `tidyselect` can be used: a single predicate, `.by = where(is.character)`; multiple predicates, `.by = c(where(is.character), where(is.factor))`; or a combination of predicates and column names, `.by = c(where(is.character), b)`.
`.sort`	experimental: Default `TRUE`. If FALSE the original order of the grouping variables will be preserved.
`.groups`	Grouping structure of the result. `"drop_last"`: drop the last level of grouping. `"drop"`: drop all groups. `"keep"`: keep all groups.
`.unpack`	experimental: Default `FALSE`. Should unnamed data frame inputs be unpacked. The user must opt in to this option as it can lead to a reduction in performance.

Examples

df <- data.table(
  a = 1:3,
  b = 4:6,
  c = c("a", "a", "b"),
  d = c("a", "a", "b")
)

df %>%
  summarize(avg_a = mean(a),
            max_b = max(b),
            .by = c)

df %>%
  summarize(avg_a = mean(a),
            .by = c(c, d))
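
The `.by` and `.sort` arguments described above can be sketched as follows. The `sales` table and its column names are hypothetical, and `.sort` is documented as experimental, so its behaviour may change.

```r
library(tidytable)

# Hypothetical data: group "b" appears before group "a"
sales <- tidytable(
  a = 1:4,
  grp = c("b", "b", "a", "a")
)

# Group by every character column using a tidyselect predicate
by_pred <- sales %>%
  summarize(avg_a = mean(a), .by = where(is.character))

# .sort = FALSE asks for the original order of the grouping variable
# to be preserved rather than sorted
unsorted <- sales %>%
  summarize(avg_a = mean(a), .by = grp, .sort = FALSE)
```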

Build a data.table/tidytable

Description

Constructs a data.table, but one with nice printing features.

Usage

tidytable(..., .name_repair = "unique")

Arguments

`...`	A set of name-value pairs
`.name_repair`	Treatment of duplicate names. See `?vctrs::vec_as_names` for options/details.

Examples

tidytable(x = 1:3, y = c("a", "a", "b"))

Select top (or bottom) n rows (by value)

Description

Select the top or bottom entries in each group, ordered by wt.

Usage

top_n(.df, n = 5, wt = NULL, .by = NULL)

Arguments

`.df`	A data.frame or data.table
`n`	Number of rows to return
`wt`	Optional. The variable to use for ordering. If NULL uses the last column in the data.table.
`.by`	Columns to group by

Examples

df <- data.table(
  x = 1:5,
  y = 6:10,
  z = c(rep("a", 3), rep("b", 2))
)

df %>%
  top_n(2, wt = y)

df %>%
  top_n(2, wt = y, .by = z)

Add new variables and drop all others

Description

Unlike mutate(), transmute() keeps only the variables that you create.

Usage

transmute(.df, ..., .by = NULL)

Arguments

`.df`	A data.frame or data.table
`...`	Columns to create/modify
`.by`	Columns to group by

Examples

df <- data.table(
  a = 1:3,
  b = 4:6,
  c = c("a", "a", "b")
)

df %>%
  transmute(double_a = a * 2)

Rowwise tidytable creation

Description

Create a tidytable using a rowwise setup.

Usage

tribble(...)

Arguments

`...`	Column names as formulas, values below. See example.

Examples

tribble(
  ~ x, ~ y,
  "a", 1,
  "b", 2,
  "c", 3
)

Uncount a data.table

Description

Uncount a data.table: each row is repeated according to the value in its `weights` column.

Usage

uncount(.df, weights, .remove = TRUE, .id = NULL)

Arguments

`.df`	A data.frame or data.table
`weights`	A column containing the weights to uncount by
`.remove`	If TRUE removes the selected `weights` column
`.id`	A string name for a new column containing a unique identifier for the newly uncounted rows.

Examples

df <- data.table(x = c("a", "b"), n = c(1, 2))

uncount(df, n)

uncount(df, n, .id = "id")
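
To make the row expansion concrete, the sketch below computes the repeated row indices in base R and compares them with the result of `uncount()`. The `idx` helper is illustrative only and is not a claim about how `uncount()` is implemented.

```r
library(tidytable)

df <- tidytable(x = c("a", "b"), n = c(1, 2))

# Base R view of the same idea: row i is repeated n[i] times
idx <- rep(seq_len(nrow(df)), times = df$n)
idx

# uncount() expands the rows and, with .remove = TRUE, drops the n column
expanded <- uncount(df, n)
expanded$x
```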

Unite multiple columns by pasting strings together

Description

Convenience function to paste together multiple columns into one.

Usage

unite(.df, col = ".united", ..., sep = "_", remove = TRUE, na.rm = FALSE)

Arguments

`.df`	A data.frame or data.table
`col`	Name of the new column, as a string.
`...`	Selection of columns. If empty all variables are selected. `tidyselect` compatible.
`sep`	Separator to use between values
`remove`	If TRUE, removes input columns from the data.table.
`na.rm`	If TRUE, NA values will not be part of the concatenation

Examples

df <- tidytable(
    a = c("a", "a", "a"),
    b = c("b", "b", "b"),
    c = c("c", "c", NA)
)

df %>%
  unite("new_col", b, c)

df %>%
  unite("new_col", where(is.character))

df %>%
  unite("new_col", b, c, remove = FALSE)

df %>%
  unite("new_col", b, c, na.rm = TRUE)

df %>%
  unite()

Unnest list-columns

Description

Unnest list-columns.

Usage

unnest(
  .df,
  ...,
  keep_empty = FALSE,
  .drop = TRUE,
  names_sep = NULL,
  names_repair = "unique"
)

Arguments

`.df`	A data.table
`...`	Columns to unnest. If empty, unnests all list columns. `tidyselect` compatible.
`keep_empty`	Return `NA` for any `NULL` elements of the list column
`.drop`	Should list columns that were not unnested be dropped
`names_sep`	If NULL, the default, the inner column names will become the new outer column names. If a string, the name of the outer column will be appended to the beginning of the inner column names, with `names_sep` used as a separator.
`names_repair`	Treatment of duplicate names. See `?vctrs::vec_as_names` for options/details.

Examples

df1 <- tidytable(x = 1:3, y = 1:3)
df2 <- tidytable(x = 1:2, y = 1:2)
nested_df <-
  data.table(
    a = c("a", "b"),
    frame_list = list(df1, df2),
    vec_list = list(4:6, 7:8)
  )

nested_df %>%
  unnest(frame_list)

nested_df %>%
  unnest(frame_list, names_sep = "_")

nested_df %>%
  unnest(frame_list, vec_list)
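
The `keep_empty` argument is not exercised above. A minimal sketch, assuming a hypothetical `events` table whose list column contains a `NULL` element:

```r
library(tidytable)

# Hypothetical data: the list element for id 2 is NULL
events <- tidytable(
  id = 1:3,
  vals = list(1:2, NULL, 3L)
)

# Default keep_empty = FALSE: the row for id 2 has nothing to unnest
dropped <- events %>%
  unnest(vals)

# keep_empty = TRUE: the NULL element is returned as NA, so id 2 is kept
kept <- events %>%
  unnest(vals, keep_empty = TRUE)
```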

Unnest a list-column of vectors into regular columns

Description

Turns each element of a list-column into a row.

Usage

unnest_longer(
  .df,
  col,
  values_to = NULL,
  indices_to = NULL,
  indices_include = NULL,
  keep_empty = FALSE,
  names_repair = "check_unique",
  simplify = NULL,
  ptype = NULL,
  transform = NULL
)

Arguments

`.df`	A data.table or data.frame
`col`	Column to unnest
`values_to`	Name of column to store values
`indices_to`	Name of column to store indices
`indices_include`	Should an index column be included? Defaults to `TRUE` when `col` has inner names.
`keep_empty`	Return `NA` for any `NULL` elements of the list column
`names_repair`	Treatment of duplicate names. See `?vctrs::vec_as_names` for options/details.
`simplify`	Currently not supported. Errors if not `NULL`.
`ptype`	Optionally a named list of ptypes declaring the desired output type of each component.
`transform`	Optionally a named list of transformation functions applied to each component.

Examples

df <- tidytable(
  x = 1:3,
  y = list(0, 1:3, 4:5)
)

df %>% unnest_longer(y)

Unnest a list-column of vectors into a wide data frame

Description

Unnest a list-column of vectors into a wide data frame

Usage

unnest_wider(
  .df,
  col,
  names_sep = NULL,
  simplify = NULL,
  names_repair = "check_unique",
  ptype = NULL,
  transform = NULL
)

Arguments

`.df`	A data.table or data.frame
`col`	Column to unnest
`names_sep`	If `NULL`, the default, the names will be left as they are. If a string, the inner and outer names will be pasted together with `names_sep` as the separator.
`simplify`	Currently not supported. Errors if not `NULL`.
`names_repair`	Treatment of duplicate names. See `?vctrs::vec_as_names` for options/details.
`ptype`	Optionally a named list of ptypes declaring the desired output type of each component.
`transform`	Optionally a named list of transformation functions applied to each component.

Examples

df <- tidytable(
  x = 1:3,
  y = list(0, 1:3, 4:5)
)

# Automatically creates names
df %>% unnest_wider(y)

# But you can provide names_sep for increased naming control
df %>% unnest_wider(y, names_sep = "_")
