Speedy functions (mainly from collapse and data.table) to manipulate data frames

The Tidyverse defines a coherent set of tools to manipulate data frames. These tools use non-standard evaluation and sometimes require extra care. The functions, like mutate() or summarise(), are defined in the {dplyr} and {tidyr} packages. The {collapse} package proposes a set of functions with a similar interface, but with different and much faster code. For instance, fselect() is similar to select(), and fsummarise() is similar to summarise(). Not all functions are implemented, arguments and argument names differ, and the behavior may be very different: frename() uses old_name = new_name, while rename() uses new_name = old_name!

The speedy functions are all prefixed with an "s", like smutate(). They build on the work initiated in {collapse} to propose a series of functions paired with the tidy ones. So, smutate() and mutate() are "speedy" and "tidy" counterparts, and they are used in a very similar, if not identical, way. The "s" prefix is there to draw attention to their particularities. Their classes are function and speedy_fn. Avoid mixing tidy, speedy and non-tidy/speedy functions in the same pipeline.

This is a global page that presents all the speedy functions in one place. It is not meant to be a detailed help page for each individual "s" function. Please refer to the help page of the corresponding non-"s" paired function for more details. You can use the {svMisc} .?smutate syntax to go to the help page of the non-"s" function, with a message.
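The difference in argument order between rename() and frename() mentioned above is easy to get wrong, so here is a minimal sketch of it. It assumes that {dplyr} and {collapse} are installed; the data frame and its column names are made up for the illustration.

```r
# A small, hypothetical data frame used only for this illustration
df <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.7))

# Tidy version: dplyr::rename() expects new_name = old_name
tidy_res <- dplyr::rename(df, measurement = value)

# {collapse} version: frename() expects old_name = new_name
fast_res <- collapse::frename(df, value = measurement)

# Both calls rename the column "value" to "measurement"
names(tidy_res)
names(fast_res)
```

Swapping the two sides in either call refers to a column that does not exist, which is one reason to stay within a single family of functions in a given pipeline.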

Usage

list_speedy_functions()

sgroup_by(.data, ...)

sungroup(.data, ...)

srename(.data, ...)

srename_with(.data, .fn, .cols = everything(), ...)

sfilter(.data, ...)

sfilter_ungroup(.data, ...)

sselect(.data, ...)

smutate(.data, ..., .keep = "all")

smutate_ungroup(.data, ..., .keep = "all")

stransmute(.data, ...)

stransmute_ungroup(.data, ...)

ssummarise(.data, ...)

sfull_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)

sleft_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)

sright_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)

sinner_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)

sbind_rows(..., .id = NULL)

scount(
  x,
  ...,
  wt = NULL,
  sort = FALSE,
  name = NULL,
  .drop = dplyr::group_by_drop_default(x),
  sort_cat = TRUE,
  decreasing = FALSE
)

stally(
  x,
  wt = NULL,
  sort = FALSE,
  name = NULL,
  sort_cat = TRUE,
  decreasing = FALSE
)

sadd_count(
  x,
  ...,
  wt = NULL,
  sort = FALSE,
  name = NULL,
  .drop = NULL,
  sort_cat = TRUE,
  decreasing = FALSE
)

sadd_tally(
  x,
  wt = NULL,
  sort = FALSE,
  name = NULL,
  sort_cat = TRUE,
  decreasing = FALSE
)

sbind_cols(
  ...,
  .name_repair = c("unique", "universal", "check_unique", "minimal")
)

sarrange(.data, ..., .by_group = FALSE)

spull(.data, var = -1, name = NULL, ...)

sdistinct(.data, ..., .keep_all = FALSE)

sdrop_na(data, ...)

sreplace_na(data, replace, ...)

spivot_longer(data, cols, names_to = "name", values_to = "value", ...)

spivot_wider(data, names_from = name, values_from = value, ...)

suncount(data, weights, .remove = TRUE, .id = NULL)

sunite(data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE)

sseparate(
  data,
  col,
  into,
  sep = "[^[:alnum:]]+",
  remove = TRUE,
  convert = FALSE,
  ...
)

sseparate_rows(data, ..., sep = "[^[:alnum:].]+", convert = FALSE)

sfill(data, ..., .direction = c("down", "up", "downup", "updown"))

sextract(
  data,
  col,
  into,
  regex = "([[:alnum:]]+)",
  remove = TRUE,
  convert = FALSE,
  ...
)

Arguments

.data: A data frame (data.frame, data.table or tibble's tbl_df)
...: Arguments that depend on the context of the function and that are, most of the time, not evaluated in a standard way (cf. the tidyverse approach).
.fn: A function to use.
.cols: The columns to which the transformation is applied. For the moment, only all existing columns is implemented, which means .cols = everything().
.keep: Which columns to keep. The default is "all", possible values are "used", "unused", or "none" (see mutate()).
x: A data frame (data.frame, data.table or tibble's tbl_df).
y: A second data frame.
by: A list of names of the columns to use for joining the two data frames.
suffix: The suffix to the column names to use to differentiate the columns that come from the first or the second data frame. By default it is c(".x", ".y").
copy: This argument is there for compatibility with the matching tidy ("t") functions, but it is not used here.
.id: The name of the column for the origin id, either names if all other arguments are named, or numbers.
wt: Frequency weights. Can be NULL or a variable. Uses data masking.
sort: If TRUE, the largest group is shown on top.
name: The name of the new column in the output (n by default, and no existing column must have this name, or an error is generated).
.drop: Are levels with no observations dropped (TRUE by default)?
sort_cat: Are levels sorted (TRUE by default)?
decreasing: Is sorting done in decreasing order (FALSE by default)?
.name_repair: How should the name be "repaired" to avoid duplicate column names? See dplyr::bind_cols() for more details.
.by_group: Logical. If TRUE, rows are first arranged by the grouping variables, if any. FALSE by default.
var: A variable specified as a name, a positive or a negative integer (counting from the end). The default is -1 and returns the last variable.
.keep_all: If TRUE keep all variables in .data.
data: A data frame, or for replace_na() a vector or a data frame.
replace: If data is a vector, a unique value to replace NAs, otherwise, a list of values, one per column of the data frame.
cols: A selection of the columns using tidy-select syntax, see tidyr::pivot_longer().
names_to: A character vector with the name or names of the columns for the names.
values_to: A string with the name of the column that receives the values.
names_from: The column or columns containing the names (use tidy selection and do not quote the names).
values_from: Idem for the column or columns that contain the values.
weights: A vector of weights to use to "uncount" data.
.remove: If TRUE, and weights is the name of a column, that column is removed from data.
col: The name, quoted or not, of the new column with the united variables.
sep: Separator to use between values for united or separated columns.
remove: If TRUE the initial columns that are separated are also removed from data.
na.rm: If TRUE, NAs are eliminated before uniting the values.
into: Names of the new columns that receive the separated variables. Use NA for items to drop.
convert: If TRUE, resulting values are converted into numeric, integer or logical.
.direction: Direction in which to fill missing data: "down" (by default), "up", or "downup" (first down, then up), "updown" (the opposite).
regex: A regular expression used to extract the desired values (use one group with ( and ) for each element of into).
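Since this page defers to the help pages of the tidy counterparts, the meaning of a few of the arguments above can be explored with those functions. The sketch below assumes that {dplyr} and {tidyr} are installed and uses a made-up data frame. It only shows the tidy behavior; that each speedy function reproduces it in every detail is an assumption to check against the corresponding help page.

```r
# Hypothetical data used only for this illustration
df <- data.frame(g = c("a", "a", "b"), x = c(1, NA, 3), y = c(10, 20, 30))

# var = -1 (the default) selects the last column, here y
last_col <- dplyr::pull(df, var = -1)

# .keep = "used" keeps only the columns used to compute the new one,
# plus the new column itself (here x, y and z; g is dropped)
used <- dplyr::mutate(df, z = x + y, .keep = "used")

# .direction = "down" fills each NA with the last non-missing value above it
filled <- tidyr::fill(df, x, .direction = "down")

last_col
names(used)
filled$x
```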

Value

See the corresponding non-"s" function for the full help page, including a description of the return value.

Note

The ssummarise() function does not support n(), unlike dplyr::summarise(). You can use fn() instead, but then you must give a variable name as argument. The fn() alternative can also be used in summarise() to keep the same syntax in both.

From {dplyr}, the slice() and slice_xxx() functions are not added yet because they are not available for {dbplyr}. Also, anti_join(), semi_join() and nest_join() are not implemented yet.

From {tidyr}, expand(), chop(), unchop(), nest(), unnest(), unnest_longer(), unnest_wider(), hoist(), pack() and unpack() are not implemented yet.

Examples

# TODO...