The {svAssert} package provides tools for defensive programming in R.
It implements fast yet versatile assertions, partly based on
{checkmate}. When an assertion fails, they issue meaningful, richly
formatted error messages using rlang::abort() and cli::cli_abort().
cli::cli_abort() is called by an
enhanced stop_() function that uses the base R
mechanism for translating messages into various natural languages.
{svAssert} can also translate messages from other
packages that do not implement translation themselves.
Suppose you want to pass, among other arguments, a
numeric vector x to a function that computes
a logarithm somewhere in its body. Here is the only relevant part of your
function:
my_calc <- function(x, other_args, ...) {
  # Some code ...
  y <- x # suppose some calculation on x here
  # Some more code
  (ylog <- log(y))
  # Even more code ...
}

Now, you test it.
my_calc(1:10)
#> [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
#> [8] 2.0794415 2.1972246 2.3025851

OK, it works… but what about wrong inputs?
my_calc("text")
#> Error in `log()`:
#> ! non-numeric argument to mathematical function
my_calc(NULL)
#> Error in `log()`:
#> ! non-numeric argument to mathematical function

This one is most probably incorrect:
my_calc(FALSE)
#> [1] -Inf

… and here you get a result, but with a warning and several
NaN values (let’s say this is not
acceptable for your application and leads to a crash later on):
my_calc(-5:5)
#> Warning in log(y): NaNs produced
#> [1] NaN NaN NaN NaN NaN -Inf 0.0000000
#> [8] 0.6931472 1.0986123 1.3862944 1.6094379

Note that the errors and warnings refer to
log(), not my_calc(). They also refer to a
y argument that does not appear in the call to
my_calc(). To decipher the error message, you must
delve into the code of my_calc()… not nice! Moreover, your
function does not catch obvious errors, like passing a
logical value or a negative number.
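To see programmatically where such an error originates, one can capture the condition object and inspect the call it recorded. The sketch below uses base R only and redefines a stripped-down my_calc() (same body as above, without the placeholder comments); the variable name err is purely illustrative.

```r
# Stripped-down version of my_calc() from the text
my_calc <- function(x, other_args, ...) {
  y <- x
  (ylog <- log(y))
}

# Capture the error object instead of letting it stop execution
err <- tryCatch(my_calc("text"), error = function(e) e)

# The message mentions neither 'x' nor my_calc()
conditionMessage(err)

# The recorded call is the internal log(y), not the user's my_calc("text")
conditionCall(err)
```

The call stored in the condition is what the user sees after "Error in", which is why the message points at code the user never wrote.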
Defensive programming aims to guard against this by
catching problematic cases and issuing meaningful error messages for the
end user. It should not be necessary to understand the
internal workings of my_calc() to understand what the
problem is. In other words, you should catch the error as early as
possible (at the very beginning of your my_calc()
function) and issue an understandable error for the user.
if (cond) stop()
You can use an if (cond) stop(...) construct to check
conditions and, when one is not met, stop execution with a better
error message, as in my_calc2() (the placeholder comments are
dropped here, and note that we also check for missing data).
my_calc2 <- function(x, other_args, ...) {
  # Assertions on 'x'
  if (!is.numeric(x) || anyNA(x) || any(x < 0)) {
    stop("Argument 'x' must be a non negative numeric vector.")
  }
  y <- x
  (ylog <- log(y))
}
my_calc2("text")
#> Error in `my_calc2()`:
#> ! Argument 'x' must be a non negative numeric vector.
my_calc2(FALSE)
#> Error in `my_calc2()`:
#> ! Argument 'x' must be a non negative numeric vector.
my_calc2(-5:5)
#> Error in `my_calc2()`:
#> ! Argument 'x' must be a non negative numeric vector.
my_calc2(c(1, NA, 3))
#> Error in `my_calc2()`:
#> ! Argument 'x' must be a non negative numeric vector.

stopifnot()
This is much better, but we only get a generic error message. Some
more details on why the assertion failed would be welcome.
stopifnot() both simplifies the code and provides some more
details on the reason for the failure:
my_calc3 <- function(x, other_args, ...) {
  stopifnot(is.numeric(x), !anyNA(x), all(x >= 0))
  y <- x
  (ylog <- log(y))
}
my_calc3("text")
#> Error in `my_calc3()`:
#> ! is.numeric(x) is not TRUE
my_calc3(FALSE)
#> Error in `my_calc3()`:
#> ! is.numeric(x) is not TRUE
my_calc3(-5:5)
#> Error in `my_calc3()`:
#> ! all(x >= 0) is not TRUE
my_calc3(c(1, NA, 3))
#> Error in `my_calc3()`:
#> ! !anyNA(x) is not TRUE

This is even better. However, it is a pity that
stopifnot() only allows fixed custom error messages (given as
argument names, since R 4.0.0). Also,
a little more information would be welcome. For instance, it could be
useful to indicate the class of x when it fails
is.numeric(x), or perhaps to point to the first element
that is NA in the vector. This is where {svAssert} comes
into play.
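For comparison, the kind of contextual detail described above can be written by hand in base R, at the cost of noticeably more code. This is only a sketch: my_calc3b() is a hypothetical name, and the message wording is illustrative, not what {svAssert} produces.

```r
# Hypothetical hand-written variant with contextual error messages
my_calc3b <- function(x, other_args, ...) {
  if (!is.numeric(x)) {
    # Report the actual class of the offending argument
    stop("Argument 'x' must be numeric, not ", class(x)[1], ".")
  }
  if (anyNA(x)) {
    # Point to the first missing value
    stop("Argument 'x' contains missing values (first one at element ",
      which(is.na(x))[1], ").")
  }
  if (any(x < 0)) {
    # Point to the first negative value
    stop("Argument 'x' must be >= 0 (first violation at element ",
      which(x < 0)[1], ").")
  }
  log(x)
}

# Collect the messages instead of stopping
msg_chr <- tryCatch(my_calc3b("text"), error = conditionMessage)
msg_na  <- tryCatch(my_calc3b(c(1, NA, 3)), error = conditionMessage)
msg_neg <- tryCatch(my_calc3b(-5:5), error = conditionMessage)
```

Each check needs its own message-building code, which is exactly the repetitive part that a dedicated assertion package can take over.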
Here is how you could do the job with {svAssert} [1].
my_calc4 <- function(x, other_args, ...) {
  is_numeric(x, lower = 0, any.missing = FALSE) || stop_is_numeric(x)
  y <- x
  (ylog <- log(y))
}
my_calc4("text")
#> Error in `my_calc4()`:
#> ! Can't use argument `x` (a vector of type <character> and 1 element).
#> ℹ Must be of type 'numeric', not 'character'
my_calc4(FALSE)
#> Error in `my_calc4()`:
#> ! Can't use argument `x` (a vector of type <logical> and 1 element).
#> ℹ Must be of type 'numeric', not 'logical'
my_calc4(-5:5)
#> Error in `my_calc4()`:
#> ! Can't use argument `x` (a vector of type <integer> and 11 elements).
#> ℹ Element 1 is not >= 0
my_calc4(c(1, NA, 3))
#> Error in `my_calc4()`:
#> ! Can't use argument `x` (a vector of type <double> and 3 elements).
#> ℹ Contains missing values (element 2)

Now we get a little more information in the error messages.
{svAssert} assertions are composed of two parts: an
is_xxx() function that runs the tests as fast as possible, and a
stop_is_xxx() function that builds and throws the error message.
This design has two advantages over stopifnot() or full-fledged
assertions like assert_xxx() in {checkmate}.
First, it decouples the test from the error message, for maximum
flexibility. The is_xxx() functions return only
TRUE or FALSE, so they are also usable in any
if (cond) ... else ... construct in a different context
(control flow). You can also use whatever code you like to throw the
error if the one provided by stop_is_xxx() does not fit
your needs. Note, however, that the stop_is_xxx() functions
have additional arguments, like msg=, that allow quite
extensive adaptation of the error message.
Second, with two paired functions specialized in their respective tasks,
many basic tests can be done as quickly as possible. To assert only that
x is numeric, for instance, nothing beats
is.numeric(x) || stop(...) in terms of execution speed
when the assertion is successful. This is because
is.numeric() is a primitive in R and runs significantly
faster than any regular function call, and || (also a
primitive) never evaluates stop(...) when x
is numeric.
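Both points can be illustrated with base R alone. In this sketch, is.numeric() stands in for a fast is_xxx() test and a plain stop() for its stop_is_xxx() partner; describe() and check_x() are hypothetical helpers, not part of {svAssert}.

```r
# The same predicate can drive ordinary control flow...
describe <- function(x) {
  if (is.numeric(x)) "numeric input" else "something else"
}

# ...and an assertion: the right-hand side of || is evaluated
# only when the test on the left returns FALSE
check_x <- function(x) {
  is.numeric(x) || stop("Argument 'x' must be numeric, not ", class(x)[1], ".")
  invisible(x)
}

describe(1:3)
describe("text")

ok  <- check_x(c(1.5, 2))  # passes silently, stop() is never evaluated
msg <- tryCatch(check_x("text"), error = conditionMessage)
```

Because the error-building code runs only on failure, it can afford to be slow and detailed without penalizing the successful path.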
All-in-one assertions, like assert_xxx() in {checkmate},
allow you to write shorter code but are less flexible. You have little
freedom to customize the error message [2]. {checkmate}’s
assert_numeric() does the job, and is based on the same C
code as is_numeric() for the tests.
assert_numeric <- checkmate::assert_numeric
my_calc5 <- function(x, other_args, ...) {
  assert_numeric(x, lower = 0, any.missing = FALSE)
  y <- x
  (ylog <- log(y))
}
my_calc5("text")
#> Error in `my_calc5()`:
#> ! Assertion on 'x' failed: Must be of type 'numeric', not 'character'.
my_calc5(FALSE)
#> Error in `my_calc5()`:
#> ! Assertion on 'x' failed: Must be of type 'numeric', not 'logical'.
my_calc5(-5:5)
#> Error in `my_calc5()`:
#> ! Assertion on 'x' failed: Element 1 is not >= 0.
my_calc5(c(1, NA, 3))
#> Error in `my_calc5()`:
#> ! Assertion on 'x' failed: Contains missing values (element 2).

stopifnot_()
{svAssert} also provides stopifnot_(), a drop-in
replacement for base stopifnot() that can use the
stop_is_xxx() functions for more meaningful error messages.
Here is how you could use it in my_calc6():
assert_numeric <- checkmate::assert_numeric
my_calc6 <- function(x, other_args, ...) {
  stopifnot_(is.numeric(x), !anyNA(x), all(x >= 0))
  y <- x
  (ylog <- log(y))
}
my_calc6("text")
#> Error in `stopifnot_()`:
#> `is.numeric(x)` is not "TRUE"
my_calc6(FALSE)
#> Error in `stopifnot_()`:
#> `is.numeric(x)` is not "TRUE"
my_calc6(-5:5)
#> Error in `stopifnot_()`:
#> ! Invalid `mod` parameter in `stop_less_than()`: "!any".
#> ℹ Allowed values are '', 'any', and 'all'.
#> ℹ This is an internal error, please report it to the package authors.
my_calc6(c(1, NA, 3))
#> Error in `stopifnot_()`:
#> `!anyNA(x)` is not "TRUE"

It is important that assertions do not degrade the
performance (speed and memory consumption) of the function too much when the
inputs are correct. Let’s compare our different versions of
my_calc() when assertions pass:
x <- runif(10, min = 1, max = 100)
bench::mark(
  reference = my_calc(x),
  if_stop = my_calc2(x),
  stopifnot = my_calc3(x),
  svAssert = my_calc4(x),
  checkmate = my_calc5(x),
  stopifnot_ = my_calc6(x)
)[, c("expression", "min", "median", "itr/sec", "mem_alloc", "gc/sec")]
#> # A tibble: 6 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 reference 601.05ns 650.99ns 1404630. 0B 0
#> 2 if_stop 1.19µs 1.36µs 670597. 0B 67.1
#> 3 stopifnot 3.88µs 4.42µs 217281. 0B 21.7
#> 4 svAssert 2.87µs 3.21µs 299584. 0B 30.0
#> 5 checkmate 3.99µs 4.52µs 209887. 0B 21.0
#> 6 stopifnot_ 3.47µs 4.02µs 238356. 0B 23.8

The smallest overhead is obtained with if (...) stop(...),
provided you only use quick primitive functions like
is.numeric() / anyNA(), or simple comparisons
like x < 0, to test small objects.
The second best is {svAssert}, but {checkmate}’s
assert_numeric() and
stopifnot()/stopifnot_() are not far behind and,
honestly, quite good [3]. With such a small x, there is
no memory impact.
Here are the same tests with a much larger vector:
x <- runif(1e5, min = 1, max = 100)
bench::mark(iterations = 100,
  reference = my_calc(x),
  if_stop = my_calc2(x),
  stopifnot = my_calc3(x),
  svAssert = my_calc4(x),
  checkmate = my_calc5(x),
  stopifnot_ = my_calc6(x)
)[, c("expression", "min", "median", "itr/sec", "mem_alloc", "gc/sec")]
#> # A tibble: 6 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 reference 514µs 817.53µs 1250. 781.3KB 12.6
#> 2 if_stop 763µs 1.22ms 884. 1.15MB 27.3
#> 3 stopifnot 778µs 1.28ms 833. 1.15MB 25.8
#> 4 svAssert 612µs 915.21µs 1133. 781.3KB 11.4
#> 5 checkmate 612µs 917.17µs 1145. 781.3KB 23.4
#> 6 stopifnot_ 759µs 1.21ms 880. 1.15MB 27.2

The results are quite different. Now, both {svAssert} and {checkmate}
are clearly better, in terms of both speed and memory use (the latter
still negligible). if (...) stop(...) and
stopifnot()/stopifnot_() take significantly
more time and have to allocate memory for the result of the
x < 0/x >= 0 tests, a logical vector as long as x. This trend
becomes more pronounced with even larger x.
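The extra allocation can be checked directly: a comparison such as x >= 0 materializes a full-length logical vector before all() or any() reduces it to a single value. The sketch below uses base R only. all_non_negative() is a hypothetical helper showing one way to avoid the temporary vector; it is an assumption made for illustration, not the way {svAssert} or {checkmate} perform the test.

```r
x <- runif(1e5, min = 1, max = 100)

# The comparison creates a logical vector as long as 'x'
# (R stores logicals on 4 bytes per element)
tmp <- x >= 0
length(tmp)
print(object.size(tmp), units = "Kb")

# Hypothetical alternative: min() scans 'x' without building a temporary
# vector. all(x >= 0) may return NA when 'x' contains missing values;
# this helper returns FALSE in that case.
all_non_negative <- function(x) {
  length(x) == 0L || (!anyNA(x) && min(x) >= 0)
}

all_non_negative(x)
all_non_negative(c(2, -1, 3))
```

The size of the temporary vector is of the same order as the additional memory reported in the mem_alloc column for the versions that use x < 0 or x >= 0.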
All in all, {svAssert} offers assertions that are both quick and
memory-efficient. Only for functions that have to be
extremely fast on very small objects could
if (...) stop(...) be considered a preferable
alternative.
TODO: implement a simpler is_num() function that could
be competitive with if (...) stop( ...).
The {svAssert} package provides a mechanism to translate error messages after they are thrown, for functions and packages that do not natively implement these translations. {checkmate} seems to be one example where the authors are reluctant to add translations; see issue #234. Let’s switch to French in R.
Sys.setLanguage("fr")

Here are a few error messages we get with {checkmate}’s
assert_numeric(), unfortunately still in English:
my_calc5(FALSE)
#> Error in `my_calc5()`:
#> ! Assertion on 'x' failed: Must be of type 'numeric', not 'logical'.
my_calc5(-5:5)
#> Error in `my_calc5()`:
#> ! Assertion on 'x' failed: Element 1 is not >= 0.

Although {svAssert}’s is_numeric() is internally based on
{checkmate} code, and thus receives the same untranslated messages,
stop_is_numeric() manages to translate them into French,
including messages that contain contextual parts (the
{rlang} part of the message, "Error in", is not translated
yet):
my_calc4(FALSE)
#> Error in `my_calc4()`:
#> ! Argument `x` inapproprié (a vector of type <logical> and 1 element}).
#> ℹ Doit être de type <numeric>, et non <logical>.
my_calc4(-5:5)
#> Error in `my_calc4()`:
#> ! Argument `x` inapproprié (a vector of type <integer> and 11
#> elements}).
#> ℹ L'élément 1 n'est pas >= 0
Also note that the translated messages adopt the nicer
rlang::abort() layout (a list with two bullets here).
[1] The cond || stop(...) pattern is usually
considered bad style in R, as it is less explicit than
if (!cond) stop(...). Here we consider it the
distinctive mark of a “test || stop_test” pair that forms an
assertion. Of course, you are free to use if
instead if you are not convinced.
[2] {checkmate} also provides check_xxx() and
test_xxx() functions. You can use test_xxx()
in place of is_xxx(), but then you lose the contextual
information. The check_xxx() functions return either
TRUE in case of success, or a string with the contextual
error message in case of failure, but you have to use a more complex
construct to handle it, something like:
if (!isTRUE(msg <- check_numeric(...))) stop(msg).
[3] There are many other implementations of assertions on CRAN that we do not review here. Some of them have a huge impact on performance! Always balance performance against features when you decide how to write your assertions. You probably do not want to end up with a function that is significantly slower, uses more memory, or both, than one containing just the code that performs the actual computation.