
Read and return an R object from data on disk, from URL, or from packages.

read(
  file,
  type = NULL,
  header = "#",
  header.max = 50L,
  skip = 0L,
  locale = default_locale(),
  lang = getOption("SciViews_lang", "en"),
  lang_encoding = "UTF-8",
  as_dataframe = FALSE,
  as_labelled = FALSE,
  comments = NULL,
  package = NULL,
  sidecar_file = TRUE,
  fun_list = NULL,
  hfun = NULL,
  fun = NULL,
  data,
  cache_file = NULL,
  method = "auto",
  quiet = FALSE,
  force = FALSE,
  ...
)

type_from_extension(file, full = FALSE)

hread_text(file, header.max, skip = 0L, locale = default_locale(), ...)

hread_xls(file, header.max, skip = 0L, locale = default_locale(), ...)

hread_xlsx(file, header.max, skip = 0L, locale = default_locale(), ...)

# S3 method for class 'read_function_subset'
.DollarNames(x, pattern = "")

Arguments

file

The path to the file to read, or the name of the dataset to get from an R package (in that case, you must provide the package= argument).

type

The type (format) of data to read.

header

The character to use for the header and other comments.

header.max

The maximum number of lines to consider for the header.

skip

The number of lines to skip at the beginning of the file.

locale

A readr locale object with all the data required to correctly interpret country-related items. The default value matches R defaults (US English with UTF-8 encoding), and you are advised to use it as much as possible.

lang

The language to use (mainly for comment, label and units), but also for factor levels or other character strings if a translation exists and if the language is spelled with uppercase characters (e.g., "FR"). The default value can be set with, e.g., options(SciViews_lang = "fr") for French.

lang_encoding

Encoding used by R scripts for translation. They should all be encoded as UTF-8, which is the default. However, this argument lets you specify a different encoding if needed.

as_dataframe

Deprecated: use options(SciViews.as_dtx = as_XXX) instead to specify whether you want a data.frame (as_dtf), a data.table (as_dtt, the default), or a tibble (as_dtbl). Formerly: should the resulting object be converted into a dataframe (inheriting from data.frame, tbl and tbl_df, alias tibble)? If FALSE, no conversion is attempted. As part of the deprecation, this argument is now always treated as FALSE, whatever value you supply.

as_labelled

Are variables converted into 'labelled' objects? This keeps labels and units when the vector is manipulated, but it can lead to incompatibilities with some R code (hence, it is FALSE by default).

comments

Comments to add in the created object.

package

The package where to look for the dataset. If file= is not provided, a list of available datasets in the package is displayed.

sidecar_file

If TRUE and a file with the same name as file= + .R is found in the same directory, it is considered as code to import these data and it is sourced with local = TRUE, chdir = TRUE and verbose = FALSE. That script must create an object named dataset, which is the result returned by the function. It is advised to encode this script in UTF-8, which is the default, but a different encoding can be specified through the lang_encoding= argument.

fun_list

The table with correspondence of the types, read, and write functions.

hfun

The function to read the header (lines starting with a special mark, usually '#' at the beginning of the file). This function must have the same arguments as hread_text() and should return a character string with the first header.max lines.

fun

The function to delegate reading of the data. If NULL (default), the function is chosen from fun_list.

data

A synonym to file= (the name makes more sense when the dataset is loaded from a package). You cannot use data= and file= at the same time.

cache_file

The path to a local file to use as a cache when file= is downloaded (http://, https://, ftp://, or file:// protocols). If cache_file already exists, data are read from this cache, unless force = TRUE (see below). Otherwise, data are saved in it before being used. If cache_file = NULL (the default), a temporary file is used and data are read from the Internet every time. This cache mechanism is particularly useful to provide data associated with a git repository. Put cache_file in .gitignore and use cache_file= in the code (with force = FALSE). That way, the data are downloaded once in a freshly cloned repository, and they are not included in the versioning system (useful for large datasets).

method

The downloading method used ("auto" by default), see utils::download.file().

quiet

If files have to be downloaded, is it done silently (TRUE), or with feedback and a progress bar (FALSE, the default)?

force

If TRUE and a URL is provided for file= and a path for cache_file=, then the content is downloaded every time, even if the cache file already exists (it is overwritten). By default, it is FALSE, which is the most useful setting to make good use of the cache mechanism.

...

Further arguments passed to the function fun=.

full

Do we return the full extension, like csv.tar.gz (TRUE), or only the main extension, like csv (FALSE, by default).

x

An object.

pattern

A regular expression to list matching names.
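The interplay between cache_file= and force= described above can be sketched with base R only. This is a minimal illustration under stated assumptions, not the data.io implementation: fetch_with_cache() is a hypothetical helper, and a local file:// URL built from a POSIX-style path stands in for a remote file so that no Internet access is needed.

```r
# Hypothetical helper, NOT part of data.io: it only mimics the documented
# cache_file=/force= behaviour using utils::download.file().
fetch_with_cache <- function(url, cache_file = NULL, force = FALSE,
                             method = "auto", quiet = TRUE) {
  if (is.null(cache_file)) {
    # No cache: download into a temporary file every time
    dest <- tempfile()
    utils::download.file(url, dest, method = method, quiet = quiet)
    return(dest)
  }
  if (force || !file.exists(cache_file)) {
    # First use, or forced refresh: (over)write the cache
    utils::download.file(url, cache_file, method = method, quiet = quiet)
  }
  cache_file
}

# A local file exposed as a file:// URL (POSIX-style path assumed)
src <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3), src, row.names = FALSE)
url <- paste0("file://", normalizePath(src, winslash = "/"))
cache <- tempfile(fileext = ".csv")

fetch_with_cache(url, cache_file = cache)               # fills the cache
write.csv(data.frame(x = 4:6), src, row.names = FALSE)  # the source changes
old <- read.csv(fetch_with_cache(url, cache_file = cache))$x
new <- read.csv(fetch_with_cache(url, cache_file = cache, force = TRUE))$x
```

Here old still holds the cached values, while new reflects the updated source because force = TRUE overwrote the cache.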

Value

An R object with the data (its class depends on the data being read).
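When a sidecar file applies (see sidecar_file=), the value returned is the object named dataset that the script creates. The sketch below illustrates that contract with base R. It is an assumption-laden illustration, not the actual data.io code: read_with_sidecar() and the demo file names are hypothetical.

```r
# Hypothetical helper, NOT the data.io implementation
read_with_sidecar <- function(file, encoding = "UTF-8") {
  sidecar <- paste0(file, ".R")
  # Without a sidecar script, fall back to a plain CSV import
  if (!file.exists(sidecar)) return(utils::read.csv(file))
  # local = TRUE evaluates the script in this function's environment;
  # chdir = TRUE temporarily switches to the script's directory
  source(sidecar, local = TRUE, chdir = TRUE, verbose = FALSE,
         encoding = encoding)
  if (!exists("dataset", inherits = FALSE))
    stop("the sidecar script must create an object named 'dataset'")
  dataset
}

# Demo: a CSV file in inches plus a sidecar script converting it to cm
demo_dir <- tempfile("sidecar_demo")
dir.create(demo_dir)
csv <- file.path(demo_dir, "demo.csv")
write.csv(data.frame(Len.In = c(1, 2)), csv, row.names = FALSE)
writeLines(c(
  "dataset <- read.csv('demo.csv')  # relative path works with chdir = TRUE",
  "names(dataset) <- 'len_cm'",
  "dataset$len_cm <- dataset$len_cm * 2.54"
), paste0(csv, ".R"))

res <- read_with_sidecar(csv)
```

The result res has a single column len_cm, as produced by the script, instead of the raw Len.In column stored in the file.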

Details

read() provides a single entry point to read various kinds of data, but it delegates the actual work to other functions spread across several R packages. See getOption("read_write").
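The delegation idea can be sketched as a lookup in a table that maps a type to a reading function. This is only an illustration under assumptions: the table below is hypothetical and much smaller than whatever getOption("read_write") contains, read_sketch() is not a data.io function, and the type is guessed with tools::file_ext() rather than type_from_extension().

```r
# Hypothetical correspondence table, NOT the content of getOption("read_write")
fun_table <- data.frame(
  type     = c("csv", "tsv", "rds"),
  read_fun = c("utils::read.csv", "utils::read.delim", "base::readRDS"),
  stringsAsFactors = FALSE
)

read_sketch <- function(file, type = NULL, fun_list = fun_table,
                        fun = NULL, ...) {
  # Guess the type from the file extension when it is not given
  if (is.null(type)) type <- tolower(tools::file_ext(file))
  if (is.null(fun)) {
    entry <- fun_list$read_fun[fun_list$type == type]
    if (length(entry) != 1L)
      stop("no read function registered for type '", type, "'")
    # Resolve the "pkg::name" string into the actual function
    parts <- strsplit(entry, "::", fixed = TRUE)[[1]]
    fun <- getExportedValue(parts[1], parts[2])
  }
  fun(file, ...)  # further arguments go to the delegated function
}

# Usage: the .tsv extension selects utils::read.delim() from the table
tmp <- tempfile(fileext = ".tsv")
write.table(head(datasets::iris, 3), tmp, sep = "\t",
            row.names = FALSE, quote = FALSE)
dat <- read_sketch(tmp)
```

Passing fun= explicitly bypasses the table, which mirrors the documented rule that the function is taken from fun_list only when fun is NULL.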

Author

Philippe Grosjean phgrosjean@sciviews.org

Examples

# Use read() as a more flexible substitute for data(): you can change the
# dataset name, and the syntax is the same for R datasets and for files
read() # List all available datasets in your installed version of R
# List datasets in one particular package
read(package = "data.io")

# Read one dataset from this package, possibly changing its name
(urchin <- read("urchin_bio", package = "data.io"))
#> # A data.trame: [421 × 19]
#>    origin  diameter1 diameter2 height buoyant_weight weight solid_parts
#>    <fct>       <dbl>     <dbl>  <dbl>          <dbl>  <dbl>       <dbl>
#>  1 Fishery       9.9      10.2    5               NA  0.522       0.478
#>  2 Fishery      10.5      10.6    5.7             NA  0.642       0.589
#>  3 Fishery      10.8      10.8    5.2             NA  0.734       0.677
#>  4 Fishery       9.6       9.3    4.6             NA  0.370       0.344
#>  5 Fishery      10.4      10.7    4.8             NA  0.610       0.559
#>  6 Fishery      10.5      11.1    5               NA  0.610       0.551
#>  7 Fishery      11        11      5.2             NA  0.672       0.605
#>  8 Fishery      11.1      11.2    5.7             NA  0.703       0.628
#>  9 Fishery       9.4       9.2    4.6             NA  0.413       0.375
#> 10 Fishery      10.1       9.5    4.7             NA  0.449       0.398
#> # ℹ 411 more rows
#> # ℹ 12 more variables: integuments <dbl>, dry_integuments <dbl>,
#> #   digestive_tract <dbl>, dry_digestive_tract <dbl>, gonads <dbl>,
#> #   dry_gonads <dbl>, skeleton <dbl>, lantern <dbl>, test <dbl>, spines <dbl>,
#> #   maturity <int>, sex <fct>
# Same, but using labels in French
(urchin <- read("urchin_bio", package = "data.io", lang = "fr"))
#> # A data.trame: [421 × 19]
#>    origin  diameter1 diameter2 height buoyant_weight weight solid_parts
#>    <fct>       <dbl>     <dbl>  <dbl>          <dbl>  <dbl>       <dbl>
#>  1 Fishery       9.9      10.2    5               NA  0.522       0.478
#>  2 Fishery      10.5      10.6    5.7             NA  0.642       0.589
#>  3 Fishery      10.8      10.8    5.2             NA  0.734       0.677
#>  4 Fishery       9.6       9.3    4.6             NA  0.370       0.344
#>  5 Fishery      10.4      10.7    4.8             NA  0.610       0.559
#>  6 Fishery      10.5      11.1    5               NA  0.610       0.551
#>  7 Fishery      11        11      5.2             NA  0.672       0.605
#>  8 Fishery      11.1      11.2    5.7             NA  0.703       0.628
#>  9 Fishery       9.4       9.2    4.6             NA  0.413       0.375
#> 10 Fishery      10.1       9.5    4.7             NA  0.449       0.398
#> # ℹ 411 more rows
#> # ℹ 12 more variables: integuments <dbl>, dry_integuments <dbl>,
#> #   digestive_tract <dbl>, dry_digestive_tract <dbl>, gonads <dbl>,
#> #   dry_gonads <dbl>, skeleton <dbl>, lantern <dbl>, test <dbl>, spines <dbl>,
#> #   maturity <int>, sex <fct>
# ... and also the levels of factors in French (note: uppercase FR)
(urchin <- read("urchin_bio", package = "data.io", lang = "FR"))
#> # A data.trame: [421 × 19]
#>    origin   diameter1 diameter2 height buoyant_weight weight solid_parts
#>    <fct>        <dbl>     <dbl>  <dbl>          <dbl>  <dbl>       <dbl>
#>  1 Pêcherie       9.9      10.2    5               NA  0.522       0.478
#>  2 Pêcherie      10.5      10.6    5.7             NA  0.642       0.589
#>  3 Pêcherie      10.8      10.8    5.2             NA  0.734       0.677
#>  4 Pêcherie       9.6       9.3    4.6             NA  0.370       0.344
#>  5 Pêcherie      10.4      10.7    4.8             NA  0.610       0.559
#>  6 Pêcherie      10.5      11.1    5               NA  0.610       0.551
#>  7 Pêcherie      11        11      5.2             NA  0.672       0.605
#>  8 Pêcherie      11.1      11.2    5.7             NA  0.703       0.628
#>  9 Pêcherie       9.4       9.2    4.6             NA  0.413       0.375
#> 10 Pêcherie      10.1       9.5    4.7             NA  0.449       0.398
#> # ℹ 411 more rows
#> # ℹ 12 more variables: integuments <dbl>, dry_integuments <dbl>,
#> #   digestive_tract <dbl>, dry_digestive_tract <dbl>, gonads <dbl>,
#> #   dry_gonads <dbl>, skeleton <dbl>, lantern <dbl>, test <dbl>, spines <dbl>,
#> #   maturity <int>, sex <fct>

# Read one dataset from another package, but with labels and comments
data(iris) # The R way: you get the original dataset
# Same result, using read()
ir2 <- read("iris", package = "datasets", lang = NULL)
# ir2 records that it comes from datasets::iris
attr(comment(ir2), "src")
#> [1] "datasets::iris"
# otherwise, it is identical to iris, except it may be a data.table or a
# tibble, depending on user preferences
comment(ir2) <- NULL
# Force coercion into a data.frame
ir2 <- svBase::as_dtf(ir2)
identical(iris, ir2)
#> [1] TRUE
# More interesting: you can get an enhanced version of iris with read():
# (note that variable names are in snake_case now!)
(ir3 <- read("iris", package = "datasets"))
#> # A data.trame: [150 × 5]
#>    sepal_length sepal_width petal_length petal_width species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows
class(ir3)
#> [1] "data.trame" "data.frame"
comment(ir3)
#> [1] "The 'iris' from 'datasets', but with variables names in snake_case"
#> [2] "(Sepal.Length -> sepal_length, Species -> species)."               
#> attr(,"lang")
#> [1] "en"
#> attr(,"lang_encoding")
#> [1] "UTF-8"
#> attr(,"src")
#> [1] "datasets::iris"
ir3$sepal_length
#>   [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
#>  [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
#>  [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
#>  [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
#>  [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
#>  [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
#> [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
#> [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
#> [145] 6.7 6.7 6.3 6.5 6.2 5.9
#> attr(,"label")
#> [1] "Length of the sepals"
#> attr(,"units")
#> [1] "cm"
# ... and you can get it in French too!
(ir_fr <- read("iris", package = "datasets", lang = "fr"))
#> # A data.trame: [150 × 5]
#>    sepal_length sepal_width petal_length petal_width species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows
class(ir_fr)
#> [1] "data.trame" "data.frame"
comment(ir_fr)
#> [1] "Jeu de données 'iris' de 'datasets', mais avec noms de variables modifiées"
#> [2] "(Sepal.Length -> sepal_length, Species -> species)."                       
#> attr(,"lang")
#> [1] "fr"
#> attr(,"lang_encoding")
#> [1] "UTF-8"
#> attr(,"src")
#> [1] "datasets::iris"
ir_fr$sepal_length
#>   [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
#>  [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
#>  [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
#>  [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
#>  [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
#>  [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
#> [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
#> [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
#> [145] 6.7 6.7 6.3 6.5 6.2 5.9
#> attr(,"label")
#> [1] "Longueur des sépales"
#> attr(,"units")
#> [1] "cm"

# Sometimes, datasets are more deeply reworked. For instance, trees has
# variables in imperial units (in, ft, and cubic ft), but it is automatically
# reworked by read() into metric variables (m or m^3):
data(trees)
head(trees)
#>   Girth Height Volume
#> 1   8.3     70   10.3
#> 2   8.6     65   10.3
#> 3   8.8     63   10.2
#> 4  10.5     72   16.4
#> 5  10.7     81   18.8
#> 6  10.8     83   19.7
(trees2 <- read("trees", package = "datasets"))
#> # A data.trame: [31 × 3]
#>    diameter height volume
#>       <dbl>  <dbl>  <dbl>
#>  1    0.211   21.3  0.292
#>  2    0.218   19.8  0.292
#>  3    0.224   19.2  0.289
#>  4    0.267   21.9  0.464
#>  5    0.272   24.7  0.532
#>  6    0.274   25.3  0.558
#>  7    0.279   20.1  0.442
#>  8    0.279   22.9  0.515
#>  9    0.282   24.4  0.64 
#> 10    0.284   22.9  0.563
#> # ℹ 21 more rows
comment(trees2)
#> [1] "The 'trees' from 'datasets' but with variables renamed and in m or m^3"
#> [2] "(Girth [in] -> diameter [m], Height [ft] -> height [m],"               
#> [3] "Volume [ft^3] -> volume [m^3])."                                       
#> attr(,"lang")
#> [1] "en"
#> attr(,"lang_encoding")
#> [1] "UTF-8"
#> attr(,"src")
#> [1] "datasets::trees"
trees2$volume
#>  [1] 0.292 0.292 0.289 0.464 0.532 0.558 0.442 0.515 0.640 0.563 0.685 0.595
#> [13] 0.606 0.603 0.541 0.629 0.957 0.776 0.728 0.705 0.977 0.898 1.028 1.085
#> [25] 1.206 1.569 1.577 1.651 1.458 1.444 2.180
#> attr(,"label")
#> [1] "Volume of timber"
#> attr(,"units")
#> [1] "m^3"
# \donttest{
# Read from a GitHub Gist (the type must be specified here!)
# (ble <- read$csv("http://tinyurl.com/Biostat-Ble"))

# Various versions of the famous iris dataset
(iris <- read(data_example("iris.csv")))
#> # A data.trame: [150 × 5]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows
(iris <- read(data_example("iris.csv.zip")))
#> # A data.trame: [150 × 5]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows
(iris <- read(data_example("iris.csv.gz")))
#> # A data.trame: [150 × 5]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows
(iris <- read(data_example("iris.csv.bz2")))
#> # A data.trame: [150 × 5]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows
(iris <- read(data_example("iris.tsv")))
#> # A data.trame: [150 × 5]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows
(iris <- read(data_example("iris.xls")))
#> New names:
#>  `` -> `...1`
#> # A data.trame: [150 × 5]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows
(iris <- read(data_example("iris.xlsx")))
#> New names:
#>  `` -> `...1`
#> # A data.trame: [150 × 5]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows
(iris <- read(data_example("iris.rds")))
#> # A data.trame: [150 × 5]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows
#(iris <- read(data_example("iris.syd"))) ##
#(iris <- read(data_example("iris.csvy"))) ##
#(iris <- read(data_example("iris.csvy.zip"))) ##

# A file with a header both in English (default) and in French
(iris <- read(data_example("iris_short_header.csv")))
#> # A data.trame: [150 × 5]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>    <labelled>   <labelled>  <labelled>   <labelled>  <fct>  
#>  1 5.1          3.5         1.4          0.2         setosa 
#>  2 4.9          3.0         1.4          0.2         setosa 
#>  3 4.7          3.2         1.3          0.2         setosa 
#>  4 4.6          3.1         1.5          0.2         setosa 
#>  5 5.0          3.6         1.4          0.2         setosa 
#>  6 5.4          3.9         1.7          0.4         setosa 
#>  7 4.6          3.4         1.4          0.3         setosa 
#>  8 5.0          3.4         1.5          0.2         setosa 
#>  9 4.4          2.9         1.4          0.2         setosa 
#> 10 4.9          3.1         1.5          0.1         setosa 
#> # ℹ 140 more rows
(iris_fr <- read(data_example("iris_short_header.csv"), lang = "fr"))
#> # A data.trame: [150 × 5]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>    <labelled>   <labelled>  <labelled>   <labelled>  <fct>  
#>  1 5.1          3.5         1.4          0.2         setosa 
#>  2 4.9          3.0         1.4          0.2         setosa 
#>  3 4.7          3.2         1.3          0.2         setosa 
#>  4 4.6          3.1         1.5          0.2         setosa 
#>  5 5.0          3.6         1.4          0.2         setosa 
#>  6 5.4          3.9         1.7          0.4         setosa 
#>  7 4.6          3.4         1.4          0.3         setosa 
#>  8 5.0          3.4         1.5          0.2         setosa 
#>  9 4.4          2.9         1.4          0.2         setosa 
#> 10 4.9          3.1         1.5          0.1         setosa 
#> # ℹ 140 more rows
# Headers are also recognized in xls/xlsx files
(iris_fr <- read(data_example("iris_short_header.xls"), lang = "fr"))
#> New names:
#>  `` -> `...1`
#> # A data.trame: [150 × 5]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>    <labelled>   <labelled>  <labelled>   <labelled>  <fct>  
#>  1 5.1          3.5         1.4          0.2         setosa 
#>  2 4.9          3.0         1.4          0.2         setosa 
#>  3 4.7          3.2         1.3          0.2         setosa 
#>  4 4.6          3.1         1.5          0.2         setosa 
#>  5 5.0          3.6         1.4          0.2         setosa 
#>  6 5.4          3.9         1.7          0.4         setosa 
#>  7 4.6          3.4         1.4          0.3         setosa 
#>  8 5.0          3.4         1.5          0.2         setosa 
#>  9 4.4          2.9         1.4          0.2         setosa 
#> 10 4.9          3.1         1.5          0.1         setosa 
#> # ℹ 140 more rows

# Read a file with a sidecar file (same name + '.R')
(iris <- read(data_example("iris_sidecar.csv"))) # lang = "en" by default
#> # A data.trame: [150 × 5]
#>    sepal_length sepal_width petal_length petal_width species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows
(iris <- read(data_example("iris_sidecar.csv"), lang = "EN")) # Full lang
#> # A data.trame: [150 × 5]
#>    sepal_length sepal_width petal_length petal_width species  
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>    
#>  1          5.1         3.5          1.4         0.2 I. setosa
#>  2          4.9         3            1.4         0.2 I. setosa
#>  3          4.7         3.2          1.3         0.2 I. setosa
#>  4          4.6         3.1          1.5         0.2 I. setosa
#>  5          5           3.6          1.4         0.2 I. setosa
#>  6          5.4         3.9          1.7         0.4 I. setosa
#>  7          4.6         3.4          1.4         0.3 I. setosa
#>  8          5           3.4          1.5         0.2 I. setosa
#>  9          4.4         2.9          1.4         0.2 I. setosa
#> 10          4.9         3.1          1.5         0.1 I. setosa
#> # ℹ 140 more rows
(iris <- read(data_example("iris_sidecar.csv"), lang = "en_us")) # US (in)
#> # A data.trame: [150 × 5]
#>    sepal_length sepal_width petal_length petal_width species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1         2.01        1.38        0.551      0.0787 setosa 
#>  2         1.93        1.18        0.551      0.0787 setosa 
#>  3         1.85        1.26        0.512      0.0787 setosa 
#>  4         1.81        1.22        0.591      0.0787 setosa 
#>  5         1.97        1.42        0.551      0.0787 setosa 
#>  6         2.13        1.54        0.669      0.157  setosa 
#>  7         1.81        1.34        0.551      0.118  setosa 
#>  8         1.97        1.34        0.591      0.0787 setosa 
#>  9         1.73        1.14        0.551      0.0787 setosa 
#> 10         1.93        1.22        0.591      0.0394 setosa 
#> # ℹ 140 more rows
(iris <- read(data_example("iris_sidecar.csv"), lang = "fr")) # French
#> # A data.trame: [150 × 5]
#>    sepal_length sepal_width petal_length petal_width species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows
(iris <- read(data_example("iris_sidecar.csv"), lang = "FR_BE")) # Belgian
#> # A data.trame: [150 × 5]
#>    sepal_length sepal_width petal_length petal_width species  
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>    
#>  1          5.1         3.5          1.4         0.2 I. setosa
#>  2          4.9         3            1.4         0.2 I. setosa
#>  3          4.7         3.2          1.3         0.2 I. setosa
#>  4          4.6         3.1          1.5         0.2 I. setosa
#>  5          5           3.6          1.4         0.2 I. setosa
#>  6          5.4         3.9          1.7         0.4 I. setosa
#>  7          4.6         3.4          1.4         0.3 I. setosa
#>  8          5           3.4          1.5         0.2 I. setosa
#>  9          4.4         2.9          1.4         0.2 I. setosa
#> 10          4.9         3.1          1.5         0.1 I. setosa
#> # ℹ 140 more rows
(iris <- read(data_example("iris_sidecar.csv"), lang = NULL)) # No labels
#> # A data.trame: [150 × 5]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows

# Require the feather package
#(iris <- read(data_example("iris.feather"))) # Not available for all Win

# Challenging datasets from the readr package
library(readr)
(mtcars <- read(readr_example("mtcars.csv")))
#> # A data.trame: [32 × 11]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ℹ 22 more rows
(mtcars <- read(readr_example("mtcars.csv.zip")))
#> # A data.trame: [32 × 11]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ℹ 22 more rows
(mtcars <- read(readr_example("mtcars.csv.bz2")))
#> # A data.trame: [32 × 11]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ℹ 22 more rows
(challenge <- read(readr_example("challenge.csv")))
#> # A data.trame: [2,000 × 2]
#>        x y      
#>    <dbl> <IDate>
#>  1   404 NA     
#>  2  4172 NA     
#>  3  3004 NA     
#>  4   787 NA     
#>  5    37 NA     
#>  6  2332 NA     
#>  7  2489 NA     
#>  8  1449 NA     
#>  9  3665 NA     
#> 10  3863 NA     
#> # ℹ 1,990 more rows
# Or using readr::read_csv()... There are differences!
(challenge2 <- read$csv_alt(readr_example("challenge.csv"), guess_max = 1001))
#> Rows: 2000 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl  (1): x
#> date (1): y
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A data.trame: [2,000 × 2]
#>        x y     
#>    <dbl> <date>
#>  1   404 NA    
#>  2  4172 NA    
#>  3  3004 NA    
#>  4   787 NA    
#>  5    37 NA    
#>  6  2332 NA    
#>  7  2489 NA    
#>  8  1449 NA    
#>  9  3665 NA    
#> 10  3863 NA    
#> # ℹ 1,990 more rows
sapply(challenge, class)
#> $x
#> [1] "numeric"
#> 
#> $y
#> [1] "IDate" "Date" 
#> 
sapply(challenge2, class)
#>         x         y 
#> "numeric"    "Date" 
(massey <- read(readr_example("massey-rating.txt")))
#> [1] "UCC PAY LAZ KPK  RT   COF BIH DII ENG ACU Rank Team            Conf\n  1   1   1   1   1     1   1   1   1   1    1 Ohio St          B10 \n  2   2   2   2   2     2   2   2   4   2    2 Oregon           P12 \n  3   4   3   4   3     4   3   4   2   3    3 Alabama          SEC \n  4   3   4   3   4     3   5   3   3   4    4 TCU              B12 \n  6   6   6   5   5     7   6   5   6  11    5 Michigan St      B10 \n  7   7   7   6   7     6  11   8   7   8    6 Georgia          SEC \n  5   5   5   7   6     8   4   6   5   5    7 Florida St       ACC \n  8   8   9   9  10     5   7   7  10   7    8 Baylor           B12 \n  9  11   8  13  11    11  12   9  14   9    9 Georgia Tech     ACC \n 13  10  13  11   8     9  10  11   9  10   10 Mississippi      SEC \n"
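# A minimal base-R sketch (not part of the original examples): the extension
# of this file is only "txt", which says nothing about how fields are
# separated. tools::file_ext() is used here purely for illustration; it is
# an assumption that it mirrors what read() inspects internally.
tools::file_ext("massey-rating.txt")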
# By default, the type cannot be guessed from the extension
# This is a space-separated values file (ssv)
(massey <- read(readr_example("massey-rating.txt"), type = "ssv"))
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   UCC = col_double(),
#>   PAY = col_double(),
#>   LAZ = col_double(),
#>   KPK = col_double(),
#>   RT = col_double(),
#>   COF = col_double(),
#>   BIH = col_double(),
#>   DII = col_double(),
#>   ENG = col_double(),
#>   ACU = col_double(),
#>   Rank = col_double(),
#>   Team = col_character(),
#>   Conf = col_character()
#> )
#> Warning: 10 parsing failures.
#> row col   expected     actual                                                              file
#>   1  -- 13 columns 15 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#>   2  -- 13 columns 14 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#>   3  -- 13 columns 14 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#>   4  -- 13 columns 14 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#>   5  -- 13 columns 15 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#> ... ... .......... .......... .................................................................
#> See problems(...) for more details.
#> # A data.trame: [10 × 13]
#>      UCC   PAY   LAZ   KPK    RT   COF   BIH   DII   ENG   ACU  Rank Team  Conf 
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#>  1     1     1     1     1     1     1     1     1     1     1     1 Ohio  St   
#>  2     2     2     2     2     2     2     2     2     4     2     2 Oreg… P12  
#>  3     3     4     3     4     3     4     3     4     2     3     3 Alab… SEC  
#>  4     4     3     4     3     4     3     5     3     3     4     4 TCU   B12  
#>  5     6     6     6     5     5     7     6     5     6    11     5 Mich… St   
#>  6     7     7     7     6     7     6    11     8     7     8     6 Geor… SEC  
#>  7     5     5     5     7     6     8     4     6     5     5     7 Flor… St   
#>  8     8     8     9     9    10     5     7     7    10     7     8 Bayl… B12  
#>  9     9    11     8    13    11    11    12     9    14     9     9 Geor… Tech 
#> 10    13    10    13    11     8     9    10    11     9    10    10 Miss… SEC  
# or ...
(massey <- read$ssv(readr_example("massey-rating.txt")))
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   UCC = col_double(),
#>   PAY = col_double(),
#>   LAZ = col_double(),
#>   KPK = col_double(),
#>   RT = col_double(),
#>   COF = col_double(),
#>   BIH = col_double(),
#>   DII = col_double(),
#>   ENG = col_double(),
#>   ACU = col_double(),
#>   Rank = col_double(),
#>   Team = col_character(),
#>   Conf = col_character()
#> )
#> Warning: 10 parsing failures.
#> row col   expected     actual                                                              file
#>   1  -- 13 columns 15 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#>   2  -- 13 columns 14 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#>   3  -- 13 columns 14 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#>   4  -- 13 columns 14 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#>   5  -- 13 columns 15 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#> ... ... .......... .......... .................................................................
#> See problems(...) for more details.
#> # A data.trame: [10 × 13]
#>      UCC   PAY   LAZ   KPK    RT   COF   BIH   DII   ENG   ACU  Rank Team  Conf 
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#>  1     1     1     1     1     1     1     1     1     1     1     1 Ohio  St   
#>  2     2     2     2     2     2     2     2     2     4     2     2 Oreg… P12  
#>  3     3     4     3     4     3     4     3     4     2     3     3 Alab… SEC  
#>  4     4     3     4     3     4     3     5     3     3     4     4 TCU   B12  
#>  5     6     6     6     5     5     7     6     5     6    11     5 Mich… St   
#>  6     7     7     7     6     7     6    11     8     7     8     6 Geor… SEC  
#>  7     5     5     5     7     6     8     4     6     5     5     7 Flor… St   
#>  8     8     8     9     9    10     5     7     7    10     7     8 Bayl… B12  
#>  9     9    11     8    13    11    11    12     9    14     9     9 Geor… Tech 
#> 10    13    10    13    11     8     9    10    11     9    10    10 Miss… SEC  
(epa <- read$ssv(readr_example("epa78.txt"), col_names = FALSE))
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   X1 = col_character(),
#>   X2 = col_character(),
#>   X3 = col_character(),
#>   X4 = col_character(),
#>   X5 = col_double()
#> )
#> Warning: 17 parsing failures.
#> row col  expected     actual                                                      file
#>   2  -- 5 columns 10 columns '/home/runner/work/_temp/Library/readr/extdata/epa78.txt'
#>   3  -- 5 columns 6 columns  '/home/runner/work/_temp/Library/readr/extdata/epa78.txt'
#>   4  -- 5 columns 3 columns  '/home/runner/work/_temp/Library/readr/extdata/epa78.txt'
#>   5  -- 5 columns 8 columns  '/home/runner/work/_temp/Library/readr/extdata/epa78.txt'
#>   6  -- 5 columns 8 columns  '/home/runner/work/_temp/Library/readr/extdata/epa78.txt'
#> ... ... ......... .......... .........................................................
#> See problems(...) for more details.
#> # A data.trame: [20 × 5]
#>    X1      X2     X3       X4           X5
#>    <chr>   <chr>  <chr>    <chr>     <dbl>
#>  1 ALFA    ROMEO  ALFA     ROMEO  78010003
#>  2 ALFETTA 03     81       8            74
#>  3 SPIDER  2000   01       SPIDER     2000
#>  4 AMC     AMC    78020002 NA           NA
#>  5 GREMLIN 03     79       9            79
#>  6 PACER   04     89       11           89
#>  7 PACER   WAGON  07       90           26
#>  8 CONCORD 04     88       12           90
#>  9 CONCORD WAGON  07       91           30
#> 10 MATADOR COUPE  05       97           14
#> 11 MATADOR SEDAN  06       110          20
#> 12 MATADOR WAGON  09       112          50
#> 13 ASTON   MARTIN ASTON    MARTIN 78040002
#> 14 ASTON   MARTIN ASTON    MARTIN 78040053
#> 15 AUDI    AUDI   78050002 NA           NA
#> 16 FOX     03     84       11           84
#> 17 FOX     WAGON  07       83           40
#> 18 5000    04     90       15           90
#> 19 AVANTI  AVANTI 78065002 NA           NA
#> 20 AVANTI  II     02       75            8
(example_log <- read(readr_example("example.log")))
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   X1 = col_character(),
#>   X2 = col_logical(),
#>   X3 = col_character(),
#>   X4 = col_character(),
#>   X5 = col_character(),
#>   X6 = col_double(),
#>   X7 = col_double()
#> )
#> # A data.trame: [2 × 7]
#>   X1           X2    X3                   X4                   X5       X6    X7
#>   <chr>        <lgl> <chr>                <chr>                <chr> <dbl> <dbl>
#> 1 172.21.13.45 NA    "Microsoft\\JohnDoe" 08/Apr/2001:17:39:0… GET …   200  3401
#> 2 127.0.0.1    NA    "frank"              10/Oct/2000:13:55:3… GET …   200  2326
# There are different ways to specify columns for fixed-width files (fwf)
# See ?read_fwf in package readr
(fwf_sample <- read$fwf(readr_example("fwf-sample.txt"),
   col_positions =  fwf_cols(name = 20, state = 10, ssn = 12)))
#> Rows: 3 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> 
#> chr (3): name, state, ssn
#> 
#>  Use `spec()` to retrieve the full column specification for this data.
#>  Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A data.trame: [3 × 3]
#>   name          state ssn         
#>   <chr>         <chr> <chr>       
#> 1 John Smith    WA    418-Y11-4111
#> 2 Mary Hartford CA    319-Z19-4341
#> 3 Evan Nolan    IL    219-532-c301

# Various examples of Excel datasets from readxl
library(readxl)
(xl <- read(readxl_example("datasets.xls")))
#> New names:
#>  `` -> `...1`
#> # A data.trame: [32 × 11]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ℹ 22 more rows
(xl <- read(readxl_example("datasets.xlsx"), sheet = "mtcars"))
#> New names:
#>  `` -> `...1`
#> # A data.trame: [32 × 11]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#> # ℹ 22 more rows
(xl <- read(readxl_example("datasets.xlsx"), sheet = 3))
#> New names:
#>  `` -> `...1`
#> # A data.trame: [1,000 × 5]
#>      lat  long depth   mag stations
#>    <dbl> <dbl> <dbl> <dbl>    <dbl>
#>  1 -20.4  182.   562   4.8       41
#>  2 -20.6  181.   650   4.2       15
#>  3 -26    184.    42   5.4       43
#>  4 -18.0  182.   626   4.1       19
#>  5 -20.4  182.   649   4         11
#>  6 -19.7  184.   195   4         12
#>  7 -11.7  166.    82   4.8       43
#>  8 -28.1  182.   194   4.4       15
#>  9 -28.7  182.   211   4.7       35
#> 10 -17.5  180.   622   4.3       19
#> # ℹ 990 more rows
# Accommodate a column with disparate types via col_types = "list"
(clip <- read(readxl_example("clippy.xls"), col_types = c("text", "list")))
#> New names:
#>  `` -> `...1`
#> # A data.trame: [4 × 2]
#>   name                 value     
#>   <chr>                <list>    
#> 1 Name                 <chr [1]> 
#> 2 Species              <chr [1]> 
#> 3 Approx date of death <dttm [1]>
#> 4 Weight in grams      <dbl [1]> 
(clip <- read(readxl_example("clippy.xlsx"), col_types = c("text", "list")))
#> New names:
#>  `` -> `...1`
#> # A data.trame: [4 × 2]
#>   name                 value     
#>   <chr>                <list>    
#> 1 Name                 <chr [1]> 
#> 2 Species              <chr [1]> 
#> 3 Approx date of death <dttm [1]>
#> 4 Weight in grams      <dbl [1]> 
tibble::deframe(clip)
#> $Name
#> [1] "Clippy"
#> 
#> $Species
#> [1] "paperclip"
#> 
#> $`Approx date of death`
#> [1] "2007-01-01 UTC"
#> 
#> $`Weight in grams`
#> [1] 0.9
#> 
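# A rough base-R sketch of what deframe() does with a two-column table:
# the first column supplies the names, the second the values. `kv` is a
# hypothetical stand-in table, not one of the example datasets.
kv <- data.frame(name = c("a", "b"), value = c(1, 2))
setNames(kv$value, kv$name)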
# Read from a specific range in a sheet
(xl <- read(readxl_example("datasets.xlsx"), range = "mtcars!B1:D5"))
#> New names:
#>  `` -> `...1`
#> # A data.trame: [4 × 3]
#>     cyl  disp    hp
#>   <dbl> <dbl> <dbl>
#> 1     6   160   110
#> 2     6   160   110
#> 3     4   108    93
#> 4     6   258   110
(deaths <- read(readxl_example("deaths.xls"), range = cell_rows(5:15)))
#> New names:
#>  `` -> `...1`
#> # A data.trame: [10 × 6]
#>    Name      Profession   Age `Has kids` `Date of birth`     `Date of death`    
#>    <chr>     <chr>      <dbl> <lgl>      <dttm>              <dttm>             
#>  1 David Bo… musician      69 TRUE       1947-01-08 00:00:00 2016-01-10 00:00:00
#>  2 Carrie F… actor         60 TRUE       1956-10-21 00:00:00 2016-12-27 00:00:00
#>  3 Chuck Be… musician      90 TRUE       1926-10-18 00:00:00 2017-03-18 00:00:00
#>  4 Bill Pax… actor         61 TRUE       1955-05-17 00:00:00 2017-02-25 00:00:00
#>  5 Prince    musician      57 TRUE       1958-06-07 00:00:00 2016-04-21 00:00:00
#>  6 Alan Ric… actor         69 FALSE      1946-02-21 00:00:00 2016-01-14 00:00:00
#>  7 Florence… actor         82 TRUE       1934-02-14 00:00:00 2016-11-24 00:00:00
#>  8 Harper L… author        89 FALSE      1926-04-28 00:00:00 2016-02-19 00:00:00
#>  9 Zsa Zsa … actor         99 TRUE       1917-02-06 00:00:00 2016-12-18 00:00:00
#> 10 George M… musician      53 FALSE      1963-06-25 00:00:00 2016-12-25 00:00:00
(deaths <- read(readxl_example("deaths.xlsx"), range = cell_rows(5:15)))
#> New names:
#>  `` -> `...1`
#> # A data.trame: [10 × 6]
#>    Name      Profession   Age `Has kids` `Date of birth`     `Date of death`    
#>    <chr>     <chr>      <dbl> <lgl>      <dttm>              <dttm>             
#>  1 David Bo… musician      69 TRUE       1947-01-08 00:00:00 2016-01-10 00:00:00
#>  2 Carrie F… actor         60 TRUE       1956-10-21 00:00:00 2016-12-27 00:00:00
#>  3 Chuck Be… musician      90 TRUE       1926-10-18 00:00:00 2017-03-18 00:00:00
#>  4 Bill Pax… actor         61 TRUE       1955-05-17 00:00:00 2017-02-25 00:00:00
#>  5 Prince    musician      57 TRUE       1958-06-07 00:00:00 2016-04-21 00:00:00
#>  6 Alan Ric… actor         69 FALSE      1946-02-21 00:00:00 2016-01-14 00:00:00
#>  7 Florence… actor         82 TRUE       1934-02-14 00:00:00 2016-11-24 00:00:00
#>  8 Harper L… author        89 FALSE      1926-04-28 00:00:00 2016-02-19 00:00:00
#>  9 Zsa Zsa … actor         99 TRUE       1917-02-06 00:00:00 2016-12-18 00:00:00
#> 10 George M… musician      53 FALSE      1963-06-25 00:00:00 2016-12-25 00:00:00
(type_me <- read(readxl_example("type-me.xls"), sheet = "logical_coercion",
  col_types = c("logical", "text")))
#> New names:
#>  `` -> `...1`
#> Warning: Expecting logical in A5 / R5C1: got a date
#> Warning: Expecting logical in A8 / R8C1: got 'cabbage'
#> # A data.trame: [10 × 2]
#>    `maybe boolean?` description                         
#>    <lgl>            <chr>                               
#>  1 NA               "empty"                             
#>  2 FALSE            "0 (numeric)"                       
#>  3 TRUE             "1 (numeric)"                       
#>  4 NA               "datetime"                          
#>  5 TRUE             "boolean true"                      
#>  6 FALSE            "boolean false"                     
#>  7 NA               "\"cabbage\""                       
#>  8 TRUE             "the string \"true\""               
#>  9 FALSE            "the letter \"F\""                  
#> 10 FALSE            "\"False\" preceded by single quote"
(type_me <- read(readxl_example("type-me.xlsx"), sheet = "numeric_coercion",
  col_types = c("numeric", "text")))
#> New names:
#>  `` -> `...1`
#> Warning: Coercing boolean to numeric in A3 / R3C1
#> Warning: Coercing boolean to numeric in A4 / R4C1
#> Warning: Expecting numeric in A5 / R5C1: got a date
#> Warning: Coercing text to numeric in A6 / R6C1: '123456'
#> Warning: Expecting numeric in A8 / R8C1: got 'cabbage'
#> # A data.trame: [7 × 2]
#>   `maybe numeric?` explanation            
#>              <dbl> <chr>                  
#> 1               NA "empty"                
#> 2                1 "boolean true"         
#> 3                0 "boolean false"        
#> 4            40534 "datetime"             
#> 5           123456 "the string \"123456\""
#> 6           123456 "the number 123456"    
#> 7               NA "\"cabbage\""          
(type_me <- read(readxl_example("type-me.xls"), sheet = "date_coercion",
  col_types = c("date", "text")))
#> New names:
#>  `` -> `...1`
#> Warning: Expecting date in A5 / R5C1: got boolean
#> Warning: Expecting date in A6 / R6C1: got 'cabbage'
#> Warning: Coercing numeric to date in A7 / R7C1
#> Warning: Coercing numeric to date in A8 / R8C1
#> # A data.trame: [7 × 2]
#>   `maybe a datetime?` explanation           
#>   <dttm>              <chr>                 
#> 1 NA                  "empty"               
#> 2 2016-05-23 00:00:00 "date only format"    
#> 3 2016-04-28 11:30:00 "date and time format"
#> 4 NA                  "boolean true"        
#> 5 NA                  "\"cabbage\""         
#> 6 1904-01-05 07:12:00 "4.3 (numeric)"       
#> 7 2012-01-02 00:00:00 "another numeric"     
(type_me <- read(readxl_example("type-me.xlsx"), sheet = "text_coercion",
  col_types = c("text", "text")))
#> New names:
#>  `` -> `...1`
#> # A data.trame: [6 × 2]
#>   text     explanation      
#>   <chr>    <chr>            
#> 1 NA       "empty"          
#> 2 cabbage  "\"cabbage\""    
#> 3 TRUE     "boolean true"   
#> 4 1.3      "numeric"        
#> 5 41175    "datetime"       
#> 6 36436153 "another numeric"
(xl <- read(readxl_example("geometry.xls"), col_names = FALSE))
#> New names:
#>  `` -> `...1`
#>  `` -> `...2`
#>  `` -> `...3`
#> # A data.trame: [4 × 3]
#>   ...1  ...2  ...3 
#>   <chr> <chr> <chr>
#> 1 B3    C3    D3   
#> 2 B4    C4    D4   
#> 3 B5    C5    D5   
#> 4 B6    C6    D6   
(xl <- read(readxl_example("geometry.xlsx"), range = cell_rows(4:8)))
#> # A data.trame: [4 × 3]
#>   B4    C4    D4   
#>   <chr> <chr> <chr>
#> 1 B5    C5    D5   
#> 2 B6    C6    D6   
#> 3 NA    NA    NA   
#> 4 NA    NA    NA   

# Various examples from haven
library(haven)
haven_example <- function(path)
  system.file("examples", path, package = "haven", mustWork = TRUE)
(iris2 <- read(haven_example("iris.dta"))) # Stata v. 8-14
#> # A data.trame: [150 × 5]
#>    sepallength sepalwidth petallength petalwidth species
#>          <dbl>      <dbl>       <dbl>      <dbl> <chr>  
#>  1        5.10       3.5         1.40      0.200 setosa 
#>  2        4.90       3           1.40      0.200 setosa 
#>  3        4.70       3.20        1.30      0.200 setosa 
#>  4        4.60       3.10        1.5       0.200 setosa 
#>  5        5          3.60        1.40      0.200 setosa 
#>  6        5.40       3.90        1.70      0.400 setosa 
#>  7        4.60       3.40        1.40      0.300 setosa 
#>  8        5          3.40        1.5       0.200 setosa 
#>  9        4.40       2.90        1.40      0.200 setosa 
#> 10        4.90       3.10        1.5       0.100 setosa 
#> # ℹ 140 more rows
(iris2 <- read(haven_example("iris.sav"))) # SPSS, TODO: labelled -> factor?
#> # A data.trame: [150 × 5]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
#>           <dbl>       <dbl>        <dbl>       <dbl> <dbl+lbl> 
#>  1          5.1         3.5          1.4         0.2 1 [setosa]
#>  2          4.9         3            1.4         0.2 1 [setosa]
#>  3          4.7         3.2          1.3         0.2 1 [setosa]
#>  4          4.6         3.1          1.5         0.2 1 [setosa]
#>  5          5           3.6          1.4         0.2 1 [setosa]
#>  6          5.4         3.9          1.7         0.4 1 [setosa]
#>  7          4.6         3.4          1.4         0.3 1 [setosa]
#>  8          5           3.4          1.5         0.2 1 [setosa]
#>  9          4.4         2.9          1.4         0.2 1 [setosa]
#> 10          4.9         3.1          1.5         0.1 1 [setosa]
#> # ℹ 140 more rows
(pbc <- read(data_example("pbc.por"))) # SPSS, POR format
#> # A data.trame: [418 × 20]
#>      AGE   ALB ALKPHOS ASCITES  BILI  CHOL EDEMA EDTRT HEPMEG  TIME PLATELET
#>    <dbl> <dbl>   <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>    <dbl>
#>  1  58.8  2.6    1718        1  14.5   261     1   1        1   400      190
#>  2  56.4  4.14   7395.       0   1.1   302     0   0        1  4500      221
#>  3  70.1  3.48    516        0   1.4   176     1   0.5      0  1012      151
#>  4  54.7  2.54   6122.       0   1.8   244     1   0.5      1  1925      183
#>  5  38.1  3.53    671        0   3.4   279     0   0        1  1504      136
#>  6  66.3  3.98    944        0   0.8   248     0   0        1  2503       -9
#>  7  55.5  4.09    824        0   1     322     0   0        1  1832      204
#>  8  53.1  4      4651.       0   0.3   280     0   0        0  2466      373
#>  9  42.5  3.08   2276        0   3.2   562     0   0        0  2400      251
#> 10  70.6  2.74    918        1  12.6   200     1   1        0    51      302
#> # ℹ 408 more rows
#> # ℹ 9 more variables: PROTIME <dbl>, SEX <dbl>, SGOT <dbl>, SPIDERS <dbl>,
#> #   STAGE <dbl>, STATUS <dbl>, TRT <dbl>, TRIG <dbl>, COPPER <dbl>
(iris2 <- read$sas(haven_example("iris.sas7bdat"))) # SAS file
#> # A data.trame: [150 × 5]
#>    Sepal_Length Sepal_Width Petal_Length Petal_Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows
(afalfa <- read(data_example("afalfa.xpt"))) # SAS transport file
#> # A data.trame: [40 × 6]
#>    POP   SAMPLE   REP SEEDWT HARV1 HARV2
#>    <chr>  <dbl> <dbl>  <dbl> <dbl> <dbl>
#>  1 min        0     1     64  172.  180.
#>  2 min        1     1     54  138.  151.
#>  3 min        2     1     40  146.  129.
#>  4 min        3     1     45  170.  191.
#>  5 min        4     1     64  125.  173.
#>  6 MAX        5     1     75  179   235.
#>  7 MAX        6     1     45  166.  174.
#>  8 MAX        7     1     63  170.  156.
#>  9 MAX        8     1     65  193.  178.
#> 10 MAX        9     1     59  186.  179.
#> # ℹ 30 more rows

# Note that, where completion is available, typing read$<tab> shows a
# completion list of the supported file formats
# }