
Speedy functions (mainly from collapse and data.table) to manipulate data frames
Source: R/speedy_functions.R
The Tidyverse defines a coherent set of tools to manipulate data frames that use non-standard evaluation and sometimes require extra care. These functions, like mutate() or summarise(), are defined in the {dplyr} and {tidyr} packages. The {collapse} package proposes a couple of functions with a similar interface, but with different and much faster code. For instance, fselect() is similar to select(), and fsummarise() is similar to summarise(). Not all functions are implemented, arguments and argument names differ, and the behavior may be very different, like frename(), which uses old_name = new_name, while rename() uses new_name = old_name!
All the speedy functions are prefixed with an "s", like smutate(), and build on the work initiated in {collapse} to propose a series of functions paired with the tidy ones. So, smutate() and mutate() are "speedy" and "tidy" counterparts, and they are used in a very similar, if not identical, way. The "s" prefix is there to draw attention to their particularities. Their classes are function and speedy_fn. Avoid mixing tidy, speedy and non-tidy/speedy functions in the same pipeline.
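A minimal sketch of the naming convention, assuming the speedy functions are exported by the {svBase} package (SciViews) and that {collapse} is installed. srename() is expected to take new_name = old_name like dplyr::rename(), whereas collapse::frename() takes old_name = new_name; the data frame df and its columns are made up for illustration.

```r
library(svBase)  # assumption: the package that exports the speedy functions

# A small made-up data frame
df <- data.frame(x = 1:3, y = c(2, 4, 6))

# Speedy function, tidy-style syntax: new_name = old_name
df_s <- srename(df, ratio = y)

# {collapse} function, reversed syntax: old_name = new_name
df_c <- collapse::frename(df, y = ratio)

# smutate() is called exactly like dplyr::mutate()
df_m <- smutate(df, z = x + y)
```

Both renaming calls should give the same column names; only the direction of the name = value pairs differs.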
This is a global page to present all the speedy functions in one place. It is not meant to be a clear and detailed help page for each individual "s" function. Please refer to the help page of the corresponding non-"s" paired function for more details! You can use the .?smutate syntax from {svMisc} to go to the help page of the non-"s" function, with a message.
Usage
list_speedy_functions()
sgroup_by(.data, ...)
sungroup(.data, ...)
srename(.data, ...)
srename_with(.data, .fn, .cols = everything(), ...)
sfilter(.data, ...)
sfilter_ungroup(.data, ...)
sselect(.data, ...)
smutate(.data, ..., .keep = "all")
smutate_ungroup(.data, ..., .keep = "all")
stransmute(.data, ...)
stransmute_ungroup(.data, ...)
ssummarise(.data, ...)
sfull_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)
sleft_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)
sright_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)
sinner_join(x, y, by = NULL, suffix = c(".x", ".y"), copy = FALSE, ...)
sbind_rows(..., .id = NULL)
scount(
x,
...,
wt = NULL,
sort = FALSE,
name = NULL,
.drop = dplyr::group_by_drop_default(x),
sort_cat = TRUE,
decreasing = FALSE
)
stally(
x,
wt = NULL,
sort = FALSE,
name = NULL,
sort_cat = TRUE,
decreasing = FALSE
)
sadd_count(
x,
...,
wt = NULL,
sort = FALSE,
name = NULL,
.drop = NULL,
sort_cat = TRUE,
decreasing = FALSE
)
sadd_tally(
x,
wt = NULL,
sort = FALSE,
name = NULL,
sort_cat = TRUE,
decreasing = FALSE
)
sbind_cols(
...,
.name_repair = c("unique", "universal", "check_unique", "minimal")
)
sarrange(.data, ..., .by_group = FALSE)
spull(.data, var = -1, name = NULL, ...)
sdistinct(.data, ..., .keep_all = FALSE)
sdrop_na(data, ...)
sreplace_na(data, replace, ...)
spivot_longer(data, cols, names_to = "name", values_to = "value", ...)
spivot_wider(data, names_from = name, values_from = value, ...)
suncount(data, weights, .remove = TRUE, .id = NULL)
sunite(data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE)
sseparate(
data,
col,
into,
sep = "[^[:alnum:]]+",
remove = TRUE,
convert = FALSE,
...
)
sseparate_rows(data, ..., sep = "[^[:alnum:].]+", convert = FALSE)
sfill(data, ..., .direction = c("down", "up", "downup", "updown"))
sextract(
data,
col,
into,
regex = "([[:alnum:]]+)",
remove = TRUE,
convert = FALSE,
...
)
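As a sketch of how these signatures combine, the pipeline below filters, groups and summarises a made-up data frame with sfilter(), sgroup_by() and ssummarise(). It assumes {svBase} provides these functions and R >= 4.1 for the native |> pipe; the calls mirror their {dplyr} counterparts, only the "s" prefix changes.

```r
library(svBase)  # assumption: the package that exports the speedy functions

# Made-up data: a grouping variable g and a numeric value v
df <- data.frame(
  g = c("a", "a", "b", "b", "b"),
  v = c(1, 3, 2, 4, 6)
)

res <- df |>
  sfilter(v > 1) |>   # keep rows where v > 1 (drops the first "a" row)
  sgroup_by(g) |>     # group by g
  ssummarise(         # one row per group
    mean_v = mean(v),
    max_v  = max(v)
  )
```

Filtering before grouping keeps the example independent of how grouped filters are handled.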
Arguments
- .data
A data frame (data.frame, data.table or tibble's tbl_df).
- ...
Arguments that depend on the context of the function; most of the time, they are not evaluated in a standard way (cf. the tidyverse approach).
- .fn
A function to use.
- .cols
The columns to which the transformation is applied. For the moment, only all existing columns, that is, .cols = everything(), is implemented.
- .keep
Which columns to keep. The default is "all"; possible values are "used", "unused", or "none" (see mutate()).
- x
A data frame (data.frame, data.table or tibble's tbl_df).
- y
A second data frame.
- by
A list of names of the columns to use for joining the two data frames.
- suffix
The suffixes appended to the column names to differentiate the columns that come from the first or the second data frame. By default, it is c(".x", ".y").
- copy
This argument is there for compatibility with the "t" matching functions, but it is not used here.
- .id
The name of the column for the origin id, either names if all other arguments are named, or numbers.
- wt
Frequency weights. Can be NULL or a variable. Uses data masking.
- sort
If TRUE, the largest groups are shown on top.
- name
The name of the new column in the output (n by default; no existing column may have this name, or an error is generated).
- .drop
Are levels with no observations dropped (TRUE by default)?
- sort_cat
Are levels sorted (TRUE by default)?
- decreasing
Is sorting done in decreasing order (FALSE by default)?
- .name_repair
How should the names be "repaired" to avoid duplicate column names? See dplyr::bind_cols() for more details.
- .by_group
Logical. If TRUE, rows are first arranged by the grouping variables, if any. FALSE by default.
- var
A variable specified as a name, or as a positive or a negative integer (counting from the end). The default is -1, which returns the last variable.
- .keep_all
If TRUE, keep all variables in .data.
- data
A data frame, or for replace_na(), a vector or a data frame.
- replace
If data is a vector, a unique value to replace NAs; otherwise, a list of values, one per column of the data frame.
- cols
A selection of the columns using tidy-select syntax, see tidyr::pivot_longer().
.- names_to
A character vector with the name or names of the columns for the names.
- values_to
A string with the name of the column that receives the values.
- names_from
The column or columns containing the names (use tidy selection and do not quote the names).
- values_from
Idem for the column or columns that contain the values.
- weights
A vector of weights to use to "uncount" data.
- .remove
If TRUE, and weights is the name of a column, that column is removed from data.
- col
For sunite(), the name (quoted or not) of the new column with the united variables; for sseparate() and sextract(), the column to split.
- sep
Separator to use between values for united or separated columns.
- remove
If TRUE, the initial columns that are united or separated are also removed from data.
- na.rm
If TRUE, NAs are eliminated before uniting the values.
- into
Names of the new columns to put separated variables into. Use NA for items to drop.
- convert
If TRUE, resulting values are converted into numeric, integer or logical.
- .direction
Direction in which to fill missing data: "down" (by default), "up", "downup" (first down, then up), or "updown" (the opposite).
- regex
A regular expression used to extract the desired values (use one group with ( and ) for each element of into).
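To illustrate the by and suffix arguments, here is a minimal sketch with sleft_join() on two made-up tables sharing an id key and a val column. It assumes {svBase} provides sleft_join() and that, as with dplyr::left_join(), every row of x is kept and non-key columns present in both tables receive the suffixes.

```r
library(svBase)  # assumption: the package that exports the speedy functions

# Two made-up tables sharing the key "id" and a non-key column "val"
x <- data.frame(id = c(1, 2, 3), val = c(10, 20, 30))
y <- data.frame(id = c(1, 3), val = c(100, 300))

# Keep every row of x; the duplicated "val" column gets the suffixes
res <- sleft_join(x, y, by = "id", suffix = c("_x", "_y"))
```

The row with id 2 has no match in y, so its val_y should be NA.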
Value
See the corresponding "non-s" function for the full help page, with indication of the return values.
Note
The ssummarise() function does not support n() as dplyr::summarise() does. You can use fn() instead, but then you must give a variable name as argument. The fn() alternative can also be used in summarise() for a homogeneous syntax between the two.
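A minimal sketch of the fn() workaround, assuming {svBase} exports fn() together with the other speedy functions. Because fn() needs a variable, the count is taken on column v; if fn() counts only non-missing values (as collapse::fnobs() does, which is an assumption here), pick a column without NAs when you want a plain row count.

```r
library(svBase)  # assumption: the package that exports fn() and the s-functions

# Made-up data: two rows in group "a", three in group "b"
df <- data.frame(g = c("a", "a", "b", "b", "b"), v = c(1, 3, 2, 4, 6))

# n() is not supported in ssummarise(); fn(<variable>) replaces it
res <- ssummarise(sgroup_by(df, g), n = fn(v))
```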
From {dplyr}, the slice() and slice_xxx() functions are not added yet because they are not available for {dbplyr}. Also, anti_join(), semi_join() and nest_join() are not implemented yet.
From {tidyr}, expand(), chop(), unchop(), nest(), unnest(), unnest_longer(), unnest_wider(), hoist(), pack() and unpack() are not implemented yet.