The {svAssert} package provides tools for defensive programming in R. It implements fast but versatile assertions, partly based on {checkmate}. When an assertion fails, they issue meaningful, richly formatted error messages using rlang::abort() and cli::cli_abort(). cli::cli_abort() is called by an enhanced stop_() function that uses the base R mechanism to translate messages into various natural languages. Furthermore, {svAssert} can also translate messages from other packages that do not implement translation.

Quick but versatile assertions

Let’s pretend you would like to pass, among other arguments, a numeric vector x to a function that computes a logarithm somewhere in its body. Here is the only relevant part of your function:

my_calc <- function(x, other_args, ...) {
  
  # Some code ...
  
  y <- x # suppose some calculation on x here
  
  # Some more code
  
  (ylog <- log(y))
  
  # Even more code ...

}

Now, you test it.

my_calc(1:10)
#>  [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
#>  [8] 2.0794415 2.1972246 2.3025851

OK, it works… but what about wrong inputs?

my_calc("text")
#> Error in `log()`:
#> ! non-numeric argument to mathematical function
my_calc(NULL)
#> Error in `log()`:
#> ! non-numeric argument to mathematical function

This one is most probably incorrect:

my_calc(FALSE)
#> [1] -Inf

… and here, you get a result, but with a warning and several NaN values (let’s say this is not acceptable for your application and leads to a crash later on):

my_calc(-5:5)
#> Warning in log(y): NaNs produced
#>  [1]       NaN       NaN       NaN       NaN       NaN      -Inf 0.0000000
#>  [8] 0.6931472 1.0986123 1.3862944 1.6094379

Note that the errors and warnings refer to log(), not my_calc(). The warning also mentions a y argument that does not appear in the call to my_calc(). To decipher the message, you must delve into the code of my_calc()… not nice! Moreover, your function does not catch obvious errors, like passing a logical value or a negative number.
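The mismatch between the user's call and the call reported in the error can be inspected directly by capturing the condition. The following is a minimal base R sketch; my_calc_demo() is a hypothetical stand-in with the same structure as my_calc(), used only for illustration.

```r
# Hypothetical stand-in for my_calc(): the input is renamed before log()
my_calc_demo <- function(x, other_args, ...) {
  y <- x
  log(y)
}

# Capture the error object instead of letting it stop the script
err <- tryCatch(my_calc_demo("text"), error = function(e) e)

# The call recorded in the condition is the internal one, not the user's
deparse(conditionCall(err))
#> [1] "log(y)"
```

The recorded call only mentions log() and y, which the user never typed.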

Defensive programming aims to safeguard against this by catching problematic cases and issuing a meaningful error message for the end user, who should not need to understand the internal workings of my_calc() to understand what the problem is. In other words, you should catch the error as soon as possible (at the very beginning of your my_calc() function), and issue an understandable error for the user.

Base R solution with if (cond) stop()

You can use an if (cond) stop(...) construct to check conditions and, if one is not met, stop execution with a better error message, as in my_calc2() (the placeholder comments are removed, and note that we also check for missing data).

my_calc2 <- function(x, other_args, ...) {
  # Assertions on 'x'
  if (!is.numeric(x) || anyNA(x) || any(x < 0)) {
    stop("Argument 'x' must be a non negative numeric vector.")
  }
  
  y <- x
  (ylog <- log(y))
}
my_calc2("text")
#> Error in `my_calc2()`:
#> ! Argument 'x' must be a non negative numeric vector.
my_calc2(FALSE)
#> Error in `my_calc2()`:
#> ! Argument 'x' must be a non negative numeric vector.
my_calc2(-5:5)
#> Error in `my_calc2()`:
#> ! Argument 'x' must be a non negative numeric vector.
my_calc2(c(1, NA, 3))
#> Error in `my_calc2()`:
#> ! Argument 'x' must be a non negative numeric vector.

Base R with stopifnot()

This is much better, but we get only a generic error message. Some more details on why the assertion failed would be welcome. stopifnot() both simplifies the code and provides some more details on the reason for the failure:

my_calc3 <- function(x, other_args, ...) {
  stopifnot(is.numeric(x), !anyNA(x), all(x >= 0))
  
  y <- x
  (ylog <- log(y))
}
my_calc3("text")
#> Error in `my_calc3()`:
#> ! is.numeric(x) is not TRUE
my_calc3(FALSE)
#> Error in `my_calc3()`:
#> ! is.numeric(x) is not TRUE
my_calc3(-5:5)
#> Error in `my_calc3()`:
#> ! all(x >= 0) is not TRUE
my_calc3(c(1, NA, 3))
#> Error in `my_calc3()`:
#> ! !anyNA(x) is not TRUE

This is even better. However, stopifnot() only accepts fixed custom messages (given as argument names, since R 4.0.0); it cannot adapt them to the offending value. A little more information would be welcome: for instance, the class of x when is.numeric(x) fails, or the position of the first NA in the vector. This is where {svAssert} comes into play.
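For reference, here is a minimal sketch of what base stopifnot() can do on its own with named arguments (R >= 4.0.0). my_calc3b() is a hypothetical variant of my_calc3() introduced only for this illustration. The messages are fixed strings, so they cannot report the actual class of x or the position of the offending element.

```r
# Hypothetical variant of my_calc3(): argument names become error messages
my_calc3b <- function(x, other_args, ...) {
  stopifnot(
    "`x` must be numeric"                 = is.numeric(x),
    "`x` must not contain missing values" = !anyNA(x),
    "`x` must be non-negative"            = all(x >= 0)
  )
  log(x)
}

# Capture the message only, to show what the user would read
tryCatch(my_calc3b("text"), error = function(e) conditionMessage(e))
#> [1] "`x` must be numeric"
```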

{svAssert} assertions

Here is how you could do the job with {svAssert}¹.

my_calc4 <- function(x, other_args, ...) {
  is_numeric(x, lower = 0, any.missing = FALSE) || stop_is_numeric(x)
  
  y <- x
  (ylog <- log(y))
}
my_calc4("text")
#> Error in `my_calc4()`:
#> ! Can't use argument `x` (a vector of type <character> and 1 element).
#>  Must be of type 'numeric', not 'character'
my_calc4(FALSE)
#> Error in `my_calc4()`:
#> ! Can't use argument `x` (a vector of type <logical> and 1 element).
#>  Must be of type 'numeric', not 'logical'
my_calc4(-5:5)
#> Error in `my_calc4()`:
#> ! Can't use argument `x` (a vector of type <integer> and 11 elements).
#>  Element 1 is not >= 0
my_calc4(c(1, NA, 3))
#> Error in `my_calc4()`:
#> ! Can't use argument `x` (a vector of type <double> and 3 elements).
#>  Contains missing values (element 2)

Now we get a little more information in the error messages. {svAssert} assertions are composed of two parts: an is_xxx() function that performs the tests as fast as possible, and a stop_is_xxx() function that computes and throws the error message. This design has two advantages over stopifnot(), or over fully fledged assertions like assert_xxx() in {checkmate}:

  1. It decouples the test from the error message, for maximum flexibility. The is_xxx() functions return only TRUE or FALSE, so they are also usable for control flow in any if (cond) ... else ... construct. You can also use whatever code you like to throw the error if the provided stop_is_xxx() does not fit your needs. Note, however, that the stop_is_xxx() functions have additional arguments, like msg=, that allow quite extensive adaptation of the error message.

  2. With two paired functions specialized in their respective tasks, many basic tests can be done as quickly as possible. To assert only that x is numeric, for instance, nothing beats is.numeric(x) || stop(...) in terms of execution speed when the assertion succeeds. This is because is.numeric() is a primitive in R and runs significantly faster than any regular function call, and || (also a primitive) never evaluates stop(...) when x is numeric.
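The two-part design can be illustrated with a small base R sketch. The helpers is_nonneg_num() and stop_is_nonneg_num() below are hypothetical names written for this example; they are not the {svAssert} functions and do not reproduce their implementation or message format. The point is only that the predicate stays cheap, while the costly diagnosis runs solely on failure.

```r
# Hypothetical predicate: returns only TRUE or FALSE, as cheaply as possible
is_nonneg_num <- function(x) {
  is.numeric(x) && !anyNA(x) && all(x >= 0)
}

# Hypothetical companion: only called on failure, so it can afford to
# inspect `x` in detail to build a contextual message
stop_is_nonneg_num <- function(x, arg = deparse(substitute(x))) {
  reason <- if (!is.numeric(x)) {
    sprintf("must be numeric, not <%s>", class(x)[1])
  } else if (anyNA(x)) {
    sprintf("contains missing values (element %d)", which(is.na(x))[1])
  } else {
    sprintf("element %d is not >= 0", which(x < 0)[1])
  }
  stop(sprintf("Can't use argument `%s`: %s.", arg, reason), call. = FALSE)
}

# Assertion written as a "test || stop_test" pair
my_calc_sketch <- function(x, other_args, ...) {
  is_nonneg_num(x) || stop_is_nonneg_num(x)
  log(x)
}

# The same predicate reused for control flow instead of an assertion
safe_log <- function(x) if (is_nonneg_num(x)) log(x) else NA_real_
```

Because || short-circuits, stop_is_nonneg_num() is never evaluated when the predicate returns TRUE, so the successful path pays only for the test itself.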

Assertions with {checkmate}

All-in-one assertions, like assert_xxx() in {checkmate}, allow you to write shorter code, but are less flexible. You have little freedom to customize the error message². {checkmate}’s assert_numeric() does the job, and is based on the same C code as is_numeric() for the tests.

assert_numeric <- checkmate::assert_numeric
my_calc5 <- function(x, other_args, ...) {
  assert_numeric(x, lower = 0, any.missing = FALSE)
  
  y <- x
  (ylog <- log(y))
}
my_calc5("text")
#> Error in `my_calc5()`:
#> ! Assertion on 'x' failed: Must be of type 'numeric', not 'character'.
my_calc5(FALSE)
#> Error in `my_calc5()`:
#> ! Assertion on 'x' failed: Must be of type 'numeric', not 'logical'.
my_calc5(-5:5)
#> Error in `my_calc5()`:
#> ! Assertion on 'x' failed: Element 1 is not >= 0.
my_calc5(c(1, NA, 3))
#> Error in `my_calc5()`:
#> ! Assertion on 'x' failed: Contains missing values (element 2).
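If more control over the message is needed while staying with {checkmate}, its check_numeric() variant returns TRUE on success or a character string describing the failure. The sketch below assumes {checkmate} is installed; my_calc5b() and the wording of its prefix are hypothetical and serve only as an illustration.

```r
# Hypothetical variant of my_calc5(): check_numeric() returns TRUE or a
# string, so the result must be tested with isTRUE() before calling stop()
my_calc5b <- function(x, other_args, ...) {
  res <- checkmate::check_numeric(x, lower = 0, any.missing = FALSE)
  if (!isTRUE(res)) {
    stop("my_calc5b() needs a non-negative numeric `x`. ", res, call. = FALSE)
  }
  log(x)
}
```

Note that the result cannot be used directly as an if () condition, since a character string is not interpretable as a logical value.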

Enhanced stopifnot_()

{svAssert} also provides stopifnot_(), a drop-in replacement for base stopifnot() that can use the stop_is_xxx() functions for more meaningful error messages. Here is how you could use it in my_calc6():

my_calc6 <- function(x, other_args, ...) {
  stopifnot_(is.numeric(x), !anyNA(x), all(x >= 0))
  
  y <- x
  (ylog <- log(y))
}
my_calc6("text")
#> Error in `stopifnot_()`:
#> `is.numeric(x)` is not "TRUE"
my_calc6(FALSE)
#> Error in `stopifnot_()`:
#> `is.numeric(x)` is not "TRUE"
my_calc6(-5:5)
#> Error in `stopifnot_()`:
#> ! Invalid `mod` parameter in `stop_less_than()`: "!any".
#>  Allowed values are '', 'any', and 'all'.
#>  This is an internal error, please report it to the package authors.
my_calc6(c(1, NA, 3))
#> Error in `stopifnot_()`:
#> `!anyNA(x)` is not "TRUE"

Impact on performance

It is important that the assertions do not degrade the performance (speed and memory consumption) of the function too much when the inputs are correct. Let’s compare our different versions of my_calc() when assertions pass.

Now, the comparison:

x <- runif(10, min = 1, max = 100)
bench::mark(
  reference  = my_calc(x),
  if_stop    = my_calc2(x),
  stopifnot  = my_calc3(x),
  svAssert   = my_calc4(x),
  checkmate  = my_calc5(x),
  stopifnot_ = my_calc6(x)
)[, c("expression", "min", "median", "itr/sec", "mem_alloc", "gc/sec")]
#> # A tibble: 6 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 reference  601.05ns 650.99ns  1404630.        0B      0  
#> 2 if_stop      1.19µs   1.36µs   670597.        0B     67.1
#> 3 stopifnot    3.88µs   4.42µs   217281.        0B     21.7
#> 4 svAssert     2.87µs   3.21µs   299584.        0B     30.0
#> 5 checkmate    3.99µs   4.52µs   209887.        0B     21.0
#> 6 stopifnot_   3.47µs   4.02µs   238356.        0B     23.8

The smallest impact here is with if (...) stop(...), provided you only use quick primitive functions like is.numeric() / anyNA(), or simple comparisons like x < 0, on small objects. The second best is {svAssert}, but {checkmate}’s assert_numeric() and stopifnot()/stopifnot_() are not far behind and are, honestly, quite good³. With such a small x, there is no memory impact.

Here are the same tests with a much larger vector:

x <- runif(1e5, min = 1, max = 100)
bench::mark(iterations = 100,
  reference  = my_calc(x),
  if_stop    = my_calc2(x),
  stopifnot  = my_calc3(x),
  svAssert   = my_calc4(x),
  checkmate  = my_calc5(x),
  stopifnot_ = my_calc6(x)
)[, c("expression", "min", "median", "itr/sec", "mem_alloc", "gc/sec")]
#> # A tibble: 6 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 reference     514µs 817.53µs     1250.   781.3KB     12.6
#> 2 if_stop       763µs   1.22ms      884.    1.15MB     27.3
#> 3 stopifnot     778µs   1.28ms      833.    1.15MB     25.8
#> 4 svAssert      612µs 915.21µs     1133.   781.3KB     11.4
#> 5 checkmate     612µs 917.17µs     1145.   781.3KB     23.4
#> 6 stopifnot_    759µs   1.21ms      880.    1.15MB     27.2

The results are quite different. Now, both {svAssert} and {checkmate} are clearly better, both in terms of speed and of memory use (which is still negligible). if (...) stop(...) and stopifnot()/stopifnot_() take significantly more time and have to allocate memory, because the x < 0 / x >= 0 comparisons create a temporary logical vector as long as x. The gap widens with even larger x.
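The extra allocation made by a vectorized comparison can be checked with a few lines of base R. This is only a rough sketch of where the additional memory comes from, not a replacement for the benchmark above.

```r
x <- runif(1e5, min = 1, max = 100)

# The comparison materializes a full logical vector before any()/all() sees it
tmp <- x < 0
length(tmp) == length(x)
#> [1] TRUE

# Relative size of the temporary vector compared to `x` itself
# (a logical element takes 4 bytes, a double takes 8)
as.numeric(object.size(tmp)) / as.numeric(object.size(x))
```

A temporary vector of about half the size of x is consistent with the higher mem_alloc reported for the versions that use x < 0 or x >= 0.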

All in all, {svAssert} offers both quick and efficient options for assertions. if (...) stop(...) should be considered the preferred alternative only for functions that have to be extremely fast on very small objects.

TODO: implement a simpler is_num() function that could be competitive with if (...) stop(...).

Error message translation

The {svAssert} package provides a mechanism to translate error messages after they are thrown, for functions and packages that do not natively implement such translations. {checkmate} seems to be one example whose authors are reluctant to add translation, see issue #234. Let’s switch to French in R.
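One common way to change the message language for an R session is the LANGUAGE environment variable (R >= 4.2.0 also offers Sys.setLanguage()). The helper below, with_language(), is a hypothetical name used only for illustration. Whether translated messages actually appear depends on the session locale and on the message catalogs that are installed.

```r
# Hypothetical helper: evaluate `code` with LANGUAGE temporarily set to `lang`
with_language <- function(lang, code) {
  old <- Sys.getenv("LANGUAGE", unset = NA)
  # Restore the previous state on exit, even if `code` throws an error
  on.exit(
    if (is.na(old)) Sys.unsetenv("LANGUAGE") else Sys.setenv(LANGUAGE = old)
  )
  Sys.setenv(LANGUAGE = lang)
  code  # lazy evaluation: `code` is only forced here, after the switch
}

# Usage: the variable is "fr" only while the expression is evaluated
with_language("fr", Sys.getenv("LANGUAGE"))
#> [1] "fr"
```

For a whole session, a plain Sys.setenv(LANGUAGE = "fr") at the top of the script has the same effect without the automatic restore.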

Here are a few error messages we get with {checkmate}’s assert_numeric(), still in English unfortunately:

my_calc5(FALSE)
#> Error in `my_calc5()`:
#> ! Assertion on 'x' failed: Must be of type 'numeric', not 'logical'.
my_calc5(-5:5)
#> Error in `my_calc5()`:
#> ! Assertion on 'x' failed: Element 1 is not >= 0.

Although {svAssert}’s is_numeric() is internally based on {checkmate} code, and thus receives the same untranslated messages, stop_is_numeric() manages to translate them into French, including messages that contain contextual parts (but the “Error in” prefix produced by {rlang} is not translated yet):

my_calc4(FALSE)
#> Error in `my_calc4()`:
#> ! Argument `x` inapproprié (a vector of type <logical> and 1 element}).
#>  Doit être de type <numeric>, et non <logical>.
my_calc4(-5:5)
#> Error in `my_calc4()`:
#> ! Argument `x` inapproprié (a vector of type <integer> and 11
#>   elements}).
#>  L'élément 1 n'est pas >= 0

Also note that the translated messages adopt the better rlang::abort() layout (a list with two bullets here).


  1. The cond || stop(...) pattern is usually considered bad style in R, as it is less explicit than if (cond) stop(...). Here we consider it the distinctive mark of a “test || stop_test” pair that forms an assertion. Of course, you are free to use if instead if you are not convinced.

  2. {checkmate} also provides check_xxx() and test_xxx() functions. You can use test_xxx() in place of is_xxx(), but then you lose the contextual information. The check_xxx() functions return either TRUE in case of success, or a string with the contextual error message in case of failure, but you have to use a more complex construct to manage it, something like: if (!isTRUE(msg <- check_numeric(...))) stop(msg).

  3. There are many other implementations of assertions on CRAN that we do not review here. Some of them have a huge impact on performance! Always balance performance against features when you decide how to write your assertions. You probably do not want to end up with a function that is significantly slower, uses more memory, or both, than the one with just the code that performs the actual computation.