Read and return an R object from data on disk, from a URL, or from a package.
read(
file,
type = NULL,
header = "#",
header.max = 50L,
skip = 0L,
locale = default_locale(),
lang = getOption("SciViews_lang", "en"),
lang_encoding = "UTF-8",
as_dataframe = FALSE,
as_labelled = FALSE,
comments = NULL,
package = NULL,
sidecar_file = TRUE,
fun_list = NULL,
hfun = NULL,
fun = NULL,
data,
cache_file = NULL,
method = "auto",
quiet = FALSE,
force = FALSE,
...
)
type_from_extension(file, full = FALSE)
hread_text(file, header.max, skip = 0L, locale = default_locale(), ...)
hread_xls(file, header.max, skip = 0L, locale = default_locale(), ...)
hread_xlsx(file, header.max, skip = 0L, locale = default_locale(), ...)
# S3 method for class 'read_function_subset'
.DollarNames(x, pattern = "")
The path to the file to read, or the name of the dataset to get
from an R package (in that case, you must provide the package=
argument).
The type (format) of data to read.
The character to use for the header and other comments.
The maximum number of lines to consider for the header.
The number of lines to skip at the beginning of the file.
A readr locale object with all the data required to correctly interpret country-related items. The default value matches R defaults (US English + UTF-8 encoding), and it is advised to use it as much as possible.
The language to use (mainly for comment, label and units), but
also for factor levels or other character strings if a translation exists
and if the language is spelled with uppercase characters (e.g., "FR").
The default value can be set with, e.g., options(SciViews_lang = "fr") for
French.
Encoding used by R scripts for translation. They should
all be encoded as UTF-8, which is the default. However, this argument
allows you to specify a different encoding if needed.
Deprecated: now use options(SciViews.as_dtx = as_XXX) to specify if you
want a data.frame (as_dtf), a data.table (as_dtt, by default), or a tibble
(as_dtbl). Formerly, this argument indicated whether to convert the
resulting object into a dataframe (inheriting from data.frame, tbl and
tbl_df, alias tibble); if FALSE, no conversion was attempted. As part of
the deprecation, it is now always assumed to be FALSE, whatever you
indicate.
Are variables converted into 'labelled' objects? This
keeps labels and units when the vector is manipulated, but it can
lead to incompatibilities with some R code (hence, it is FALSE by default).
Comments to add in the created object.
The package where to look for the dataset. If file= is not
provided, a list of available datasets in the package is displayed.
If TRUE and a file with the same name as file= + .R is
found in the same directory, it is considered as code to import these data
and it is sourced with local = TRUE, chdir = TRUE and verbose = FALSE.
That script must create an object named dataset, which is the result
returned by the function. It is advised to encode this script in UTF-8,
which is the default value, but it is possible to specify a different
encoding through the lang_encoding= parameter.
The table with correspondence of the types, read, and write functions.
The function to read the header (lines starting with a special
mark, usually '#', at the beginning of the file). This function must have
the same arguments as hread_text() and should return a character string
with the first header.max lines.
The function to delegate reading of the data. If NULL (default),
the function is chosen from fun_list.
A synonym for file= (the name makes more sense when the dataset
is loaded from a package). You cannot use data= and file= at the same
time.
The path to a local file to use as a cache when the file is
downloaded (http://, https://, ftp://, or file:// protocols). If cache_file
already exists, data are read from this cache, unless force = TRUE
(see below). Otherwise, data are saved in it before being used. If
cache_file = NULL (the default), a temporary file is used and data are
read from the Internet every time. This cache mechanism is particularly
useful to provide data associated with a git repository: put cache_file in
.gitignore and use cache_file= in the code (with force = FALSE). That
way, the data are downloaded once in a freshly cloned repository, and they
are not included in the versioning system (useful for large datasets).
The downloading method used ("auto" by default), see
utils::download.file().
If files have to be downloaded, do it silently (TRUE), or
provide feedback and a progress bar (FALSE, by default)?
If TRUE and a URL is provided for file= and a path for
cache_file=, then the content is downloaded every time, even if the
cache file already exists (it is overwritten). By default, it is FALSE,
which is the most useful setting to make good use of the cache mechanism.
Further arguments passed to the function fun=.
Do we return the full extension, like csv.tar.gz (TRUE), or
only the main extension, like csv (FALSE, by default)?
An object.
A regular expression to list matching names.
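The interplay of cache_file= and force= described above can be pictured with a small base R sketch. This is only an illustration of the documented behaviour, not the code used by read(): the names read_cached(), reader and download are hypothetical, and a fake downloader replaces the network so that the example runs offline.

```r
# Sketch only: NOT the data.io implementation. read_cached() and its
# `reader` and `download` arguments are hypothetical, for illustration.
read_cached <- function(url, cache_file = NULL, force = FALSE,
                        reader = utils::read.csv,
                        download = utils::download.file) {
  if (is.null(cache_file)) {
    # No cache: download into a temporary file every time
    target <- tempfile()
    on.exit(unlink(target), add = TRUE)
    download(url, target)
  } else {
    target <- cache_file
    # Download only if the cache is missing, or if a refresh is forced
    if (force || !file.exists(target))
      download(url, target)
  }
  reader(target)
}

# Offline demonstration: a fake downloader that counts its calls
calls <- 0L
fake_download <- function(url, destfile) {
  calls <<- calls + 1L
  writeLines(c("x,y", "1,2"), destfile)
  invisible(0L)
}
cache <- tempfile(fileext = ".csv")
url <- "https://example.org/data.csv" # placeholder URL, never contacted
d1 <- read_cached(url, cache, download = fake_download)  # downloads
d2 <- read_cached(url, cache, download = fake_download)  # reads the cache
d3 <- read_cached(url, cache, force = TRUE, download = fake_download)
calls # number of downloads actually performed
```

With a real URL, the default download = utils::download.file would fetch the file once and later calls would read the cached copy, which is the workflow suggested for git repositories.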
An R object with the data (its class depends on the data being read).
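To make the sidecar_file= mechanism more concrete, here is a minimal base R sketch of what a sidecar script could look like and how it might be sourced. The file names, the column names and the content of the script are hypothetical; the only points taken from the description are that the script has the same name as the data file plus .R, that it is sourced with chdir = TRUE and verbose = FALSE, and that it must create an object named dataset. A dedicated environment is used for local= here so that the result is easy to retrieve.

```r
# Sketch only: file names and script content are hypothetical.
dir <- tempfile("sidecar_demo")
dir.create(dir)
data_file <- file.path(dir, "mydata.csv")
write.csv(data.frame(len_in = c(1, 2)), data_file, row.names = FALSE)

# The sidecar script has the same name as the data file + ".R"
sidecar <- paste0(data_file, ".R")
writeLines(c(
  "# With chdir = TRUE, paths are relative to the script directory",
  "dataset <- read.csv('mydata.csv')",
  "dataset$len_cm <- dataset$len_in * 2.54"
), sidecar)

# Source the script in its own environment, then pick up `dataset`
env <- new.env()
source(sidecar, local = env, chdir = TRUE, verbose = FALSE)
dataset <- get("dataset", envir = env)
dataset
```

Because the script runs with the working directory set to its own folder, it can refer to the data file by its bare name, which keeps the pair of files relocatable.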
read() provides a single entry point to read various kinds of
data, but it delegates the actual work to other functions spread
across several R packages. See getOption("read_write").
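The delegation idea can be sketched in a few lines of base R: derive a type from the file extension, look up a reader for that type in a table, and call it. Everything below is an assumption made for illustration (the names ext_of(), readers and read_by_type(), the list of compression suffixes, and the three registered readers); it does not reproduce type_from_extension() or the table returned by getOption("read_write").

```r
# Sketch only: ext_of(), readers and read_by_type() are hypothetical names.
ext_of <- function(file, full = FALSE) {
  # Everything after the first dot of the file name (a simplification:
  # names that contain extra dots are not handled)
  ext <- sub("^[^.]*\\.", "", basename(file))
  if (!full) # Drop a trailing compression suffix to keep the main extension
    ext <- sub("\\.(zip|gz|bz2|xz|tar\\.gz)$", "", ext)
  tolower(ext)
}

# A tiny correspondence table between types and reading functions
readers <- list(
  csv = utils::read.csv,
  tsv = utils::read.delim,
  rds = readRDS
)

read_by_type <- function(file, type = NULL, ...) {
  if (is.null(type))
    type <- ext_of(file)
  fun <- readers[[type]]
  if (is.null(fun))
    stop("no reader registered for type '", type, "'")
  fun(file, ...) # Further arguments go to the delegated function
}

tmp <- tempfile(fileext = ".csv")
write.csv(head(iris, 3), tmp, row.names = FALSE)
ext_of("iris.csv.tar.gz")              # main extension
ext_of("iris.csv.tar.gz", full = TRUE) # full extension
ir <- read_by_type(tmp)
nrow(ir)
```

A lookup table keeps the entry point small: supporting a new format only means registering one more function, and an explicit type= overrides the guess when the extension is not informative.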
# Use read() as a more flexible substitute for data(): the dataset can be
# renamed, and the syntax is the same for R datasets and for datasets in files
read() # List all available datasets in your installed version of R
# List datasets in one particular package
read(package = "data.io")
# Read one dataset from this package, possibly changing its name
(urchin <- read("urchin_bio", package = "data.io"))
#> # A data.trame: [421 × 19]
#> origin diameter1 diameter2 height buoyant_weight weight solid_parts
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Fishery 9.9 10.2 5 NA 0.522 0.478
#> 2 Fishery 10.5 10.6 5.7 NA 0.642 0.589
#> 3 Fishery 10.8 10.8 5.2 NA 0.734 0.677
#> 4 Fishery 9.6 9.3 4.6 NA 0.370 0.344
#> 5 Fishery 10.4 10.7 4.8 NA 0.610 0.559
#> 6 Fishery 10.5 11.1 5 NA 0.610 0.551
#> 7 Fishery 11 11 5.2 NA 0.672 0.605
#> 8 Fishery 11.1 11.2 5.7 NA 0.703 0.628
#> 9 Fishery 9.4 9.2 4.6 NA 0.413 0.375
#> 10 Fishery 10.1 9.5 4.7 NA 0.449 0.398
#> # ℹ 411 more rows
#> # ℹ 12 more variables: integuments <dbl>, dry_integuments <dbl>,
#> # digestive_tract <dbl>, dry_digestive_tract <dbl>, gonads <dbl>,
#> # dry_gonads <dbl>, skeleton <dbl>, lantern <dbl>, test <dbl>, spines <dbl>,
#> # maturity <int>, sex <fct>
# Same, but using labels in French
(urchin <- read("urchin_bio", package = "data.io", lang = "fr"))
#> # A data.trame: [421 × 19]
#> origin diameter1 diameter2 height buoyant_weight weight solid_parts
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Fishery 9.9 10.2 5 NA 0.522 0.478
#> 2 Fishery 10.5 10.6 5.7 NA 0.642 0.589
#> 3 Fishery 10.8 10.8 5.2 NA 0.734 0.677
#> 4 Fishery 9.6 9.3 4.6 NA 0.370 0.344
#> 5 Fishery 10.4 10.7 4.8 NA 0.610 0.559
#> 6 Fishery 10.5 11.1 5 NA 0.610 0.551
#> 7 Fishery 11 11 5.2 NA 0.672 0.605
#> 8 Fishery 11.1 11.2 5.7 NA 0.703 0.628
#> 9 Fishery 9.4 9.2 4.6 NA 0.413 0.375
#> 10 Fishery 10.1 9.5 4.7 NA 0.449 0.398
#> # ℹ 411 more rows
#> # ℹ 12 more variables: integuments <dbl>, dry_integuments <dbl>,
#> # digestive_tract <dbl>, dry_digestive_tract <dbl>, gonads <dbl>,
#> # dry_gonads <dbl>, skeleton <dbl>, lantern <dbl>, test <dbl>, spines <dbl>,
#> # maturity <int>, sex <fct>
# ... and also the levels of factors in French (note: uppercase FR)
(urchin <- read("urchin_bio", package = "data.io", lang = "FR"))
#> # A data.trame: [421 × 19]
#> origin diameter1 diameter2 height buoyant_weight weight solid_parts
#> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Pêcherie 9.9 10.2 5 NA 0.522 0.478
#> 2 Pêcherie 10.5 10.6 5.7 NA 0.642 0.589
#> 3 Pêcherie 10.8 10.8 5.2 NA 0.734 0.677
#> 4 Pêcherie 9.6 9.3 4.6 NA 0.370 0.344
#> 5 Pêcherie 10.4 10.7 4.8 NA 0.610 0.559
#> 6 Pêcherie 10.5 11.1 5 NA 0.610 0.551
#> 7 Pêcherie 11 11 5.2 NA 0.672 0.605
#> 8 Pêcherie 11.1 11.2 5.7 NA 0.703 0.628
#> 9 Pêcherie 9.4 9.2 4.6 NA 0.413 0.375
#> 10 Pêcherie 10.1 9.5 4.7 NA 0.449 0.398
#> # ℹ 411 more rows
#> # ℹ 12 more variables: integuments <dbl>, dry_integuments <dbl>,
#> # digestive_tract <dbl>, dry_digestive_tract <dbl>, gonads <dbl>,
#> # dry_gonads <dbl>, skeleton <dbl>, lantern <dbl>, test <dbl>, spines <dbl>,
#> # maturity <int>, sex <fct>
# Read one dataset from another package, but with labels and comments
data(iris) # The R way: you get the original dataset
# Same result, using read()
ir2 <- read("iris", package = "datasets", lang = NULL)
# ir2 records that it comes from datasets::iris
attr(comment(ir2), "src")
#> [1] "datasets::iris"
# otherwise, it is identical to iris, except it may be a data.table or a
# tibble, depending on user preferences
comment(ir2) <- NULL
# Force coercion into a data.frame
ir2 <- svBase::as_dtf(ir2)
identical(iris, ir2)
#> [1] TRUE
# More interesting: you can get an enhanced version of iris with read():
# (note that variable names are in snake_case now!)
(ir3 <- read("iris", package = "datasets"))
#> # A data.trame: [150 × 5]
#> sepal_length sepal_width petal_length petal_width species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
class(ir3)
#> [1] "data.trame" "data.frame"
comment(ir3)
#> [1] "The 'iris' from 'datasets', but with variables names in snake_case"
#> [2] "(Sepal.Length -> sepal_length, Species -> species)."
#> attr(,"lang")
#> [1] "en"
#> attr(,"lang_encoding")
#> [1] "UTF-8"
#> attr(,"src")
#> [1] "datasets::iris"
ir3$sepal_length
#> [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
#> [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
#> [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
#> [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
#> [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
#> [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
#> [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
#> [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
#> [145] 6.7 6.7 6.3 6.5 6.2 5.9
#> attr(,"label")
#> [1] "Length of the sepals"
#> attr(,"units")
#> [1] "cm"
# ... and you can get it in French too!
(ir_fr <- read("iris", package = "datasets", lang = "fr"))
#> # A data.trame: [150 × 5]
#> sepal_length sepal_width petal_length petal_width species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
class(ir_fr)
#> [1] "data.trame" "data.frame"
comment(ir_fr)
#> [1] "Jeu de données 'iris' de 'datasets', mais avec noms de variables modifiées"
#> [2] "(Sepal.Length -> sepal_length, Species -> species)."
#> attr(,"lang")
#> [1] "fr"
#> attr(,"lang_encoding")
#> [1] "UTF-8"
#> attr(,"src")
#> [1] "datasets::iris"
ir_fr$sepal_length
#> [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
#> [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
#> [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
#> [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
#> [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
#> [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
#> [109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
#> [127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
#> [145] 6.7 6.7 6.3 6.5 6.2 5.9
#> attr(,"label")
#> [1] "Longueur des sépales"
#> attr(,"units")
#> [1] "cm"
# Sometimes, datasets are more deeply reworked. For instance, trees has
# variables in imperial units (in, ft, and cubic ft), but it is automatically
# reworked by read() into metric variables (m or m^3):
data(trees)
head(trees)
#> Girth Height Volume
#> 1 8.3 70 10.3
#> 2 8.6 65 10.3
#> 3 8.8 63 10.2
#> 4 10.5 72 16.4
#> 5 10.7 81 18.8
#> 6 10.8 83 19.7
(trees2 <- read("trees", package = "datasets"))
#> # A data.trame: [31 × 3]
#> diameter height volume
#> <dbl> <dbl> <dbl>
#> 1 0.211 21.3 0.292
#> 2 0.218 19.8 0.292
#> 3 0.224 19.2 0.289
#> 4 0.267 21.9 0.464
#> 5 0.272 24.7 0.532
#> 6 0.274 25.3 0.558
#> 7 0.279 20.1 0.442
#> 8 0.279 22.9 0.515
#> 9 0.282 24.4 0.64
#> 10 0.284 22.9 0.563
#> # ℹ 21 more rows
comment(trees2)
#> [1] "The 'trees' from 'datasets' but with variables renamed and in m or m^3"
#> [2] "(Girth [in] -> diameter [m], Height [ft] -> height [m],"
#> [3] "Volume [ft^3] -> volume [m^3])."
#> attr(,"lang")
#> [1] "en"
#> attr(,"lang_encoding")
#> [1] "UTF-8"
#> attr(,"src")
#> [1] "datasets::trees"
trees2$volume
#> [1] 0.292 0.292 0.289 0.464 0.532 0.558 0.442 0.515 0.640 0.563 0.685 0.595
#> [13] 0.606 0.603 0.541 0.629 0.957 0.776 0.728 0.705 0.977 0.898 1.028 1.085
#> [25] 1.206 1.569 1.577 1.651 1.458 1.444 2.180
#> attr(,"label")
#> [1] "Volume of timber"
#> attr(,"units")
#> [1] "m^3"
# \donttest{
# Read from a Github Gist (need to specify the type here!)
# (ble <- read$csv("http://tinyurl.com/Biostat-Ble"))
# Various versions of the famous iris dataset
(iris <- read(data_example("iris.csv")))
#> # A data.trame: [150 × 5]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
(iris <- read(data_example("iris.csv.zip")))
#> # A data.trame: [150 × 5]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
(iris <- read(data_example("iris.csv.gz")))
#> # A data.trame: [150 × 5]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
(iris <- read(data_example("iris.csv.bz2")))
#> # A data.trame: [150 × 5]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
(iris <- read(data_example("iris.tsv")))
#> # A data.trame: [150 × 5]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
(iris <- read(data_example("iris.xls")))
#> New names:
#> • `` -> `...1`
#> # A data.trame: [150 × 5]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
(iris <- read(data_example("iris.xlsx")))
#> New names:
#> • `` -> `...1`
#> # A data.trame: [150 × 5]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
(iris <- read(data_example("iris.rds")))
#> # A data.trame: [150 × 5]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
#(iris <- read(data_example("iris.syd"))) ##
#(iris <- read(data_example("iris.csvy"))) ##
#(iris <- read(data_example("iris.csvy.zip"))) ##
# A file with a header both in English (default) and in French
(iris <- read(data_example("iris_short_header.csv")))
#> # A data.trame: [150 × 5]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <labelled> <labelled> <labelled> <labelled> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5.0 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
(iris_fr <- read(data_example("iris_short_header.csv"), lang = "fr"))
#> # A data.trame: [150 × 5]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <labelled> <labelled> <labelled> <labelled> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5.0 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
# Headers are also recognized in xls/xlsx files
(iris_fr <- read(data_example("iris_short_header.xls"), lang = "fr"))
#> New names:
#> • `` -> `...1`
#> # A data.trame: [150 × 5]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <labelled> <labelled> <labelled> <labelled> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5.0 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
# Read a file with a sidecar file (same name + '.R')
(iris <- read(data_example("iris_sidecar.csv"))) # lang = "en" by default
#> # A data.trame: [150 × 5]
#> sepal_length sepal_width petal_length petal_width species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
(iris <- read(data_example("iris_sidecar.csv"), lang = "EN")) # Full lang
#> # A data.trame: [150 × 5]
#> sepal_length sepal_width petal_length petal_width species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 I. setosa
#> 2 4.9 3 1.4 0.2 I. setosa
#> 3 4.7 3.2 1.3 0.2 I. setosa
#> 4 4.6 3.1 1.5 0.2 I. setosa
#> 5 5 3.6 1.4 0.2 I. setosa
#> 6 5.4 3.9 1.7 0.4 I. setosa
#> 7 4.6 3.4 1.4 0.3 I. setosa
#> 8 5 3.4 1.5 0.2 I. setosa
#> 9 4.4 2.9 1.4 0.2 I. setosa
#> 10 4.9 3.1 1.5 0.1 I. setosa
#> # ℹ 140 more rows
(iris <- read(data_example("iris_sidecar.csv"), lang = "en_us")) # US (in)
#> # A data.trame: [150 × 5]
#> sepal_length sepal_width petal_length petal_width species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 2.01 1.38 0.551 0.0787 setosa
#> 2 1.93 1.18 0.551 0.0787 setosa
#> 3 1.85 1.26 0.512 0.0787 setosa
#> 4 1.81 1.22 0.591 0.0787 setosa
#> 5 1.97 1.42 0.551 0.0787 setosa
#> 6 2.13 1.54 0.669 0.157 setosa
#> 7 1.81 1.34 0.551 0.118 setosa
#> 8 1.97 1.34 0.591 0.0787 setosa
#> 9 1.73 1.14 0.551 0.0787 setosa
#> 10 1.93 1.22 0.591 0.0394 setosa
#> # ℹ 140 more rows
(iris <- read(data_example("iris_sidecar.csv"), lang = "fr")) # French
#> # A data.trame: [150 × 5]
#> sepal_length sepal_width petal_length petal_width species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
(iris <- read(data_example("iris_sidecar.csv"), lang = "FR_BE")) # Belgian
#> # A data.trame: [150 × 5]
#> sepal_length sepal_width petal_length petal_width species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 I. setosa
#> 2 4.9 3 1.4 0.2 I. setosa
#> 3 4.7 3.2 1.3 0.2 I. setosa
#> 4 4.6 3.1 1.5 0.2 I. setosa
#> 5 5 3.6 1.4 0.2 I. setosa
#> 6 5.4 3.9 1.7 0.4 I. setosa
#> 7 4.6 3.4 1.4 0.3 I. setosa
#> 8 5 3.4 1.5 0.2 I. setosa
#> 9 4.4 2.9 1.4 0.2 I. setosa
#> 10 4.9 3.1 1.5 0.1 I. setosa
#> # ℹ 140 more rows
(iris <- read(data_example("iris_sidecar.csv"), lang = NULL)) # No labels
#> # A data.trame: [150 × 5]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <fct>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
# Require the feather package
#(iris <- read(data_example("iris.feather"))) # Not available for all Win
# Challenging datasets from the readr package
library(readr)
(mtcars <- read(readr_example("mtcars.csv")))
#> # A data.trame: [32 × 11]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ℹ 22 more rows
(mtcars <- read(readr_example("mtcars.csv.zip")))
#> # A data.trame: [32 × 11]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ℹ 22 more rows
(mtcars <- read(readr_example("mtcars.csv.bz2")))
#> # A data.trame: [32 × 11]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ℹ 22 more rows
(challenge <- read(readr_example("challenge.csv")))
#> # A data.trame: [2,000 × 2]
#> x y
#> <dbl> <IDate>
#> 1 404 NA
#> 2 4172 NA
#> 3 3004 NA
#> 4 787 NA
#> 5 37 NA
#> 6 2332 NA
#> 7 2489 NA
#> 8 1449 NA
#> 9 3665 NA
#> 10 3863 NA
#> # ℹ 1,990 more rows
# Or using readr::read_csv()... There are differences!
(challenge2 <- read$csv_alt(readr_example("challenge.csv"), guess_max = 1001))
#> Rows: 2000 Columns: 2
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (1): x
#> date (1): y
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A data.trame: [2,000 × 2]
#> x y
#> <dbl> <date>
#> 1 404 NA
#> 2 4172 NA
#> 3 3004 NA
#> 4 787 NA
#> 5 37 NA
#> 6 2332 NA
#> 7 2489 NA
#> 8 1449 NA
#> 9 3665 NA
#> 10 3863 NA
#> # ℹ 1,990 more rows
sapply(challenge, class)
#> $x
#> [1] "numeric"
#>
#> $y
#> [1] "IDate" "Date"
#>
sapply(challenge2, class)
#> x y
#> "numeric" "Date"
(massey <- read(readr_example("massey-rating.txt")))
#> [1] "UCC PAY LAZ KPK RT COF BIH DII ENG ACU Rank Team Conf\n 1 1 1 1 1 1 1 1 1 1 1 Ohio St B10 \n 2 2 2 2 2 2 2 2 4 2 2 Oregon P12 \n 3 4 3 4 3 4 3 4 2 3 3 Alabama SEC \n 4 3 4 3 4 3 5 3 3 4 4 TCU B12 \n 6 6 6 5 5 7 6 5 6 11 5 Michigan St B10 \n 7 7 7 6 7 6 11 8 7 8 6 Georgia SEC \n 5 5 5 7 6 8 4 6 5 5 7 Florida St ACC \n 8 8 9 9 10 5 7 7 10 7 8 Baylor B12 \n 9 11 8 13 11 11 12 9 14 9 9 Georgia Tech ACC \n 13 10 13 11 8 9 10 11 9 10 10 Mississippi SEC \n"
# By default, the type cannot be guessed from the extension
# This is a space-separated values file (ssv)
(massey <- read(readr_example("massey-rating.txt"), type = "ssv"))
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> UCC = col_double(),
#> PAY = col_double(),
#> LAZ = col_double(),
#> KPK = col_double(),
#> RT = col_double(),
#> COF = col_double(),
#> BIH = col_double(),
#> DII = col_double(),
#> ENG = col_double(),
#> ACU = col_double(),
#> Rank = col_double(),
#> Team = col_character(),
#> Conf = col_character()
#> )
#> Warning: 10 parsing failures.
#> row col expected actual file
#> 1 -- 13 columns 15 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#> 2 -- 13 columns 14 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#> 3 -- 13 columns 14 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#> 4 -- 13 columns 14 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#> 5 -- 13 columns 15 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#> ... ... .......... .......... .................................................................
#> See problems(...) for more details.
#> # A data.trame: [10 × 13]
#> UCC PAY LAZ KPK RT COF BIH DII ENG ACU Rank Team Conf
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 1 1 1 1 1 1 1 1 1 1 1 Ohio St
#> 2 2 2 2 2 2 2 2 2 4 2 2 Oreg… P12
#> 3 3 4 3 4 3 4 3 4 2 3 3 Alab… SEC
#> 4 4 3 4 3 4 3 5 3 3 4 4 TCU B12
#> 5 6 6 6 5 5 7 6 5 6 11 5 Mich… St
#> 6 7 7 7 6 7 6 11 8 7 8 6 Geor… SEC
#> 7 5 5 5 7 6 8 4 6 5 5 7 Flor… St
#> 8 8 8 9 9 10 5 7 7 10 7 8 Bayl… B12
#> 9 9 11 8 13 11 11 12 9 14 9 9 Geor… Tech
#> 10 13 10 13 11 8 9 10 11 9 10 10 Miss… SEC
# or ...
(massey <- read$ssv(readr_example("massey-rating.txt")))
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> UCC = col_double(),
#> PAY = col_double(),
#> LAZ = col_double(),
#> KPK = col_double(),
#> RT = col_double(),
#> COF = col_double(),
#> BIH = col_double(),
#> DII = col_double(),
#> ENG = col_double(),
#> ACU = col_double(),
#> Rank = col_double(),
#> Team = col_character(),
#> Conf = col_character()
#> )
#> Warning: 10 parsing failures.
#> row col expected actual file
#> 1 -- 13 columns 15 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#> 2 -- 13 columns 14 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#> 3 -- 13 columns 14 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#> 4 -- 13 columns 14 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#> 5 -- 13 columns 15 columns '/home/runner/work/_temp/Library/readr/extdata/massey-rating.txt'
#> ... ... .......... .......... .................................................................
#> See problems(...) for more details.
#> # A data.trame: [10 × 13]
#> UCC PAY LAZ KPK RT COF BIH DII ENG ACU Rank Team Conf
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 1 1 1 1 1 1 1 1 1 1 1 Ohio St
#> 2 2 2 2 2 2 2 2 2 4 2 2 Oreg… P12
#> 3 3 4 3 4 3 4 3 4 2 3 3 Alab… SEC
#> 4 4 3 4 3 4 3 5 3 3 4 4 TCU B12
#> 5 6 6 6 5 5 7 6 5 6 11 5 Mich… St
#> 6 7 7 7 6 7 6 11 8 7 8 6 Geor… SEC
#> 7 5 5 5 7 6 8 4 6 5 5 7 Flor… St
#> 8 8 8 9 9 10 5 7 7 10 7 8 Bayl… B12
#> 9 9 11 8 13 11 11 12 9 14 9 9 Geor… Tech
#> 10 13 10 13 11 8 9 10 11 9 10 10 Miss… SEC
(epa <- read$ssv(readr_example("epa78.txt"), col_names = FALSE))
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> X1 = col_character(),
#> X2 = col_character(),
#> X3 = col_character(),
#> X4 = col_character(),
#> X5 = col_double()
#> )
#> Warning: 17 parsing failures.
#> row col expected actual file
#> 2 -- 5 columns 10 columns '/home/runner/work/_temp/Library/readr/extdata/epa78.txt'
#> 3 -- 5 columns 6 columns '/home/runner/work/_temp/Library/readr/extdata/epa78.txt'
#> 4 -- 5 columns 3 columns '/home/runner/work/_temp/Library/readr/extdata/epa78.txt'
#> 5 -- 5 columns 8 columns '/home/runner/work/_temp/Library/readr/extdata/epa78.txt'
#> 6 -- 5 columns 8 columns '/home/runner/work/_temp/Library/readr/extdata/epa78.txt'
#> ... ... ......... .......... .........................................................
#> See problems(...) for more details.
#> # A data.trame: [20 × 5]
#> X1 X2 X3 X4 X5
#> <chr> <chr> <chr> <chr> <dbl>
#> 1 ALFA ROMEO ALFA ROMEO 78010003
#> 2 ALFETTA 03 81 8 74
#> 3 SPIDER 2000 01 SPIDER 2000
#> 4 AMC AMC 78020002 NA NA
#> 5 GREMLIN 03 79 9 79
#> 6 PACER 04 89 11 89
#> 7 PACER WAGON 07 90 26
#> 8 CONCORD 04 88 12 90
#> 9 CONCORD WAGON 07 91 30
#> 10 MATADOR COUPE 05 97 14
#> 11 MATADOR SEDAN 06 110 20
#> 12 MATADOR WAGON 09 112 50
#> 13 ASTON MARTIN ASTON MARTIN 78040002
#> 14 ASTON MARTIN ASTON MARTIN 78040053
#> 15 AUDI AUDI 78050002 NA NA
#> 16 FOX 03 84 11 84
#> 17 FOX WAGON 07 83 40
#> 18 5000 04 90 15 90
#> 19 AVANTI AVANTI 78065002 NA NA
#> 20 AVANTI II 02 75 8
(example_log <- read(readr_example("example.log")))
#>
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#> X1 = col_character(),
#> X2 = col_logical(),
#> X3 = col_character(),
#> X4 = col_character(),
#> X5 = col_character(),
#> X6 = col_double(),
#> X7 = col_double()
#> )
#> # A data.trame: [2 × 7]
#> X1 X2 X3 X4 X5 X6 X7
#> <chr> <lgl> <chr> <chr> <chr> <dbl> <dbl>
#> 1 172.21.13.45 NA "Microsoft\\JohnDoe" 08/Apr/2001:17:39:0… GET … 200 3401
#> 2 127.0.0.1 NA "frank" 10/Oct/2000:13:55:3… GET … 200 2326
# There are different ways to specify columns for fixed-width files (fwf)
# See ?read_fwf in package readr
(fwf_sample <- read$fwf(readr_example("fwf-sample.txt"),
col_positions = fwf_cols(name = 20, state = 10, ssn = 12)))
#> Rows: 3 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#>
#> chr (3): name, state, ssn
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A data.trame: [3 × 3]
#> name state ssn
#> <chr> <chr> <chr>
#> 1 John Smith WA 418-Y11-4111
#> 2 Mary Hartford CA 319-Z19-4341
#> 3 Evan Nolan IL 219-532-c301
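# A hedged sketch of two other column specifications, assuming that read$fwf()
# forwards col_positions= to readr::read_fwf() as in the call above.
# fwf_widths() and fwf_positions() are readr helpers (see ?read_fwf);
# the object names below are illustrative only.
# Same three columns, given as widths plus names
spec_widths <- fwf_widths(c(20, 10, 12), c("name", "state", "ssn"))
fwf_w <- read$fwf(readr_example("fwf-sample.txt"), col_positions = spec_widths)
# Only name and ssn, given as paired start and end positions
spec_pos <- fwf_positions(c(1, 30), c(20, 42), c("name", "ssn"))
fwf_p <- read$fwf(readr_example("fwf-sample.txt"), col_positions = spec_pos)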
# Various examples of Excel datasets from readxl
library(readxl)
(xl <- read(readxl_example("datasets.xls")))
#> New names:
#> • `` -> `...1`
#> # A data.trame: [32 × 11]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ℹ 22 more rows
(xl <- read(readxl_example("datasets.xlsx"), sheet = "mtcars"))
#> New names:
#> • `` -> `...1`
#> # A data.trame: [32 × 11]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ℹ 22 more rows
(xl <- read(readxl_example("datasets.xlsx"), sheet = 3))
#> New names:
#> • `` -> `...1`
#> # A data.trame: [1,000 × 5]
#> lat long depth mag stations
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -20.4 182. 562 4.8 41
#> 2 -20.6 181. 650 4.2 15
#> 3 -26 184. 42 5.4 43
#> 4 -18.0 182. 626 4.1 19
#> 5 -20.4 182. 649 4 11
#> 6 -19.7 184. 195 4 12
#> 7 -11.7 166. 82 4.8 43
#> 8 -28.1 182. 194 4.4 15
#> 9 -28.7 182. 211 4.7 35
#> 10 -17.5 180. 622 4.3 19
#> # ℹ 990 more rows
# Accommodate a column with disparate types via col_types = "list"
(clip <- read(readxl_example("clippy.xls"), col_types = c("text", "list")))
#> New names:
#> • `` -> `...1`
#> # A data.trame: [4 × 2]
#> name value
#> <chr> <list>
#> 1 Name <chr [1]>
#> 2 Species <chr [1]>
#> 3 Approx date of death <dttm [1]>
#> 4 Weight in grams <dbl [1]>
(clip <- read(readxl_example("clippy.xlsx"), col_types = c("text", "list")))
#> New names:
#> • `` -> `...1`
#> # A data.trame: [4 × 2]
#> name value
#> <chr> <list>
#> 1 Name <chr [1]>
#> 2 Species <chr [1]>
#> 3 Approx date of death <dttm [1]>
#> 4 Weight in grams <dbl [1]>
tibble::deframe(clip)
#> $Name
#> [1] "Clippy"
#>
#> $Species
#> [1] "paperclip"
#>
#> $`Approx date of death`
#> [1] "2007-01-01 UTC"
#>
#> $`Weight in grams`
#> [1] 0.9
#>
# Read from a specific range in a sheet
(xl <- read(readxl_example("datasets.xlsx"), range = "mtcars!B1:D5"))
#> New names:
#> • `` -> `...1`
#> # A data.trame: [4 × 3]
#> cyl disp hp
#> <dbl> <dbl> <dbl>
#> 1 6 160 110
#> 2 6 160 110
#> 3 4 108 93
#> 4 6 258 110
(deaths <- read(readxl_example("deaths.xls"), range = cell_rows(5:15)))
#> New names:
#> • `` -> `...1`
#> # A data.trame: [10 × 6]
#> Name Profession Age `Has kids` `Date of birth` `Date of death`
#> <chr> <chr> <dbl> <lgl> <dttm> <dttm>
#> 1 David Bo… musician 69 TRUE 1947-01-08 00:00:00 2016-01-10 00:00:00
#> 2 Carrie F… actor 60 TRUE 1956-10-21 00:00:00 2016-12-27 00:00:00
#> 3 Chuck Be… musician 90 TRUE 1926-10-18 00:00:00 2017-03-18 00:00:00
#> 4 Bill Pax… actor 61 TRUE 1955-05-17 00:00:00 2017-02-25 00:00:00
#> 5 Prince musician 57 TRUE 1958-06-07 00:00:00 2016-04-21 00:00:00
#> 6 Alan Ric… actor 69 FALSE 1946-02-21 00:00:00 2016-01-14 00:00:00
#> 7 Florence… actor 82 TRUE 1934-02-14 00:00:00 2016-11-24 00:00:00
#> 8 Harper L… author 89 FALSE 1926-04-28 00:00:00 2016-02-19 00:00:00
#> 9 Zsa Zsa … actor 99 TRUE 1917-02-06 00:00:00 2016-12-18 00:00:00
#> 10 George M… musician 53 FALSE 1963-06-25 00:00:00 2016-12-25 00:00:00
(deaths <- read(readxl_example("deaths.xlsx"), range = cell_rows(5:15)))
#> New names:
#> • `` -> `...1`
#> # A data.trame: [10 × 6]
#> Name Profession Age `Has kids` `Date of birth` `Date of death`
#> <chr> <chr> <dbl> <lgl> <dttm> <dttm>
#> 1 David Bo… musician 69 TRUE 1947-01-08 00:00:00 2016-01-10 00:00:00
#> 2 Carrie F… actor 60 TRUE 1956-10-21 00:00:00 2016-12-27 00:00:00
#> 3 Chuck Be… musician 90 TRUE 1926-10-18 00:00:00 2017-03-18 00:00:00
#> 4 Bill Pax… actor 61 TRUE 1955-05-17 00:00:00 2017-02-25 00:00:00
#> 5 Prince musician 57 TRUE 1958-06-07 00:00:00 2016-04-21 00:00:00
#> 6 Alan Ric… actor 69 FALSE 1946-02-21 00:00:00 2016-01-14 00:00:00
#> 7 Florence… actor 82 TRUE 1934-02-14 00:00:00 2016-11-24 00:00:00
#> 8 Harper L… author 89 FALSE 1926-04-28 00:00:00 2016-02-19 00:00:00
#> 9 Zsa Zsa … actor 99 TRUE 1917-02-06 00:00:00 2016-12-18 00:00:00
#> 10 George M… musician 53 FALSE 1963-06-25 00:00:00 2016-12-25 00:00:00
(type_me <- read(readxl_example("type-me.xls"), sheet = "logical_coercion",
col_types = c("logical", "text")))
#> New names:
#> • `` -> `...1`
#> Warning: Expecting logical in A5 / R5C1: got a date
#> Warning: Expecting logical in A8 / R8C1: got 'cabbage'
#> # A data.trame: [10 × 2]
#> `maybe boolean?` description
#> <lgl> <chr>
#> 1 NA "empty"
#> 2 FALSE "0 (numeric)"
#> 3 TRUE "1 (numeric)"
#> 4 NA "datetime"
#> 5 TRUE "boolean true"
#> 6 FALSE "boolean false"
#> 7 NA "\"cabbage\""
#> 8 TRUE "the string \"true\""
#> 9 FALSE "the letter \"F\""
#> 10 FALSE "\"False\" preceded by single quote"
(type_me <- read(readxl_example("type-me.xlsx"), sheet = "numeric_coercion",
col_types = c("numeric", "text")))
#> New names:
#> • `` -> `...1`
#> Warning: Coercing boolean to numeric in A3 / R3C1
#> Warning: Coercing boolean to numeric in A4 / R4C1
#> Warning: Expecting numeric in A5 / R5C1: got a date
#> Warning: Coercing text to numeric in A6 / R6C1: '123456'
#> Warning: Expecting numeric in A8 / R8C1: got 'cabbage'
#> # A data.trame: [7 × 2]
#> `maybe numeric?` explanation
#> <dbl> <chr>
#> 1 NA "empty"
#> 2 1 "boolean true"
#> 3 0 "boolean false"
#> 4 40534 "datetime"
#> 5 123456 "the string \"123456\""
#> 6 123456 "the number 123456"
#> 7 NA "\"cabbage\""
(type_me <- read(readxl_example("type-me.xls"), sheet = "date_coercion",
col_types = c("date", "text")))
#> New names:
#> • `` -> `...1`
#> Warning: Expecting date in A5 / R5C1: got boolean
#> Warning: Expecting date in A6 / R6C1: got 'cabbage'
#> Warning: Coercing numeric to date in A7 / R7C1
#> Warning: Coercing numeric to date in A8 / R8C1
#> # A data.trame: [7 × 2]
#> `maybe a datetime?` explanation
#> <dttm> <chr>
#> 1 NA "empty"
#> 2 2016-05-23 00:00:00 "date only format"
#> 3 2016-04-28 11:30:00 "date and time format"
#> 4 NA "boolean true"
#> 5 NA "\"cabbage\""
#> 6 1904-01-05 07:12:00 "4.3 (numeric)"
#> 7 2012-01-02 00:00:00 "another numeric"
(type_me <- read(readxl_example("type-me.xlsx"), sheet = "text_coercion",
col_types = c("text", "text")))
#> New names:
#> • `` -> `...1`
#> # A data.trame: [6 × 2]
#> text explanation
#> <chr> <chr>
#> 1 NA "empty"
#> 2 cabbage "\"cabbage\""
#> 3 TRUE "boolean true"
#> 4 1.3 "numeric"
#> 5 41175 "datetime"
#> 6 36436153 "another numeric"
(xl <- read(readxl_example("geometry.xls"), col_names = FALSE))
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> • `` -> `...3`
#> # A data.trame: [4 × 3]
#> ...1 ...2 ...3
#> <chr> <chr> <chr>
#> 1 B3 C3 D3
#> 2 B4 C4 D4
#> 3 B5 C5 D5
#> 4 B6 C6 D6
(xl <- read(readxl_example("geometry.xlsx"), range = cell_rows(4:8)))
#> # A data.trame: [4 × 3]
#> B4 C4 D4
#> <chr> <chr> <chr>
#> 1 B5 C5 D5
#> 2 B6 C6 D6
#> 3 NA NA NA
#> 4 NA NA NA
# Various examples from haven
library(haven)
haven_example <- function(path)
system.file("examples", path, package = "haven", mustWork = TRUE)
(iris2 <- read(haven_example("iris.dta"))) # Stata v. 8-14
#> # A data.trame: [150 × 5]
#> sepallength sepalwidth petallength petalwidth species
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 5.10 3.5 1.40 0.200 setosa
#> 2 4.90 3 1.40 0.200 setosa
#> 3 4.70 3.20 1.30 0.200 setosa
#> 4 4.60 3.10 1.5 0.200 setosa
#> 5 5 3.60 1.40 0.200 setosa
#> 6 5.40 3.90 1.70 0.400 setosa
#> 7 4.60 3.40 1.40 0.300 setosa
#> 8 5 3.40 1.5 0.200 setosa
#> 9 4.40 2.90 1.40 0.200 setosa
#> 10 4.90 3.10 1.5 0.100 setosa
#> # ℹ 140 more rows
(iris2 <- read(haven_example("iris.sav"))) # SPSS, TODO: labelled -> factor?
#> # A data.trame: [150 × 5]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> <dbl> <dbl> <dbl> <dbl> <dbl+lbl>
#> 1 5.1 3.5 1.4 0.2 1 [setosa]
#> 2 4.9 3 1.4 0.2 1 [setosa]
#> 3 4.7 3.2 1.3 0.2 1 [setosa]
#> 4 4.6 3.1 1.5 0.2 1 [setosa]
#> 5 5 3.6 1.4 0.2 1 [setosa]
#> 6 5.4 3.9 1.7 0.4 1 [setosa]
#> 7 4.6 3.4 1.4 0.3 1 [setosa]
#> 8 5 3.4 1.5 0.2 1 [setosa]
#> 9 4.4 2.9 1.4 0.2 1 [setosa]
#> 10 4.9 3.1 1.5 0.1 1 [setosa]
#> # ℹ 140 more rows
(pbc <- read(data_example("pbc.por"))) # SPSS, POR format
#> # A data.trame: [418 × 20]
#> AGE ALB ALKPHOS ASCITES BILI CHOL EDEMA EDTRT HEPMEG TIME PLATELET
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 58.8 2.6 1718 1 14.5 261 1 1 1 400 190
#> 2 56.4 4.14 7395. 0 1.1 302 0 0 1 4500 221
#> 3 70.1 3.48 516 0 1.4 176 1 0.5 0 1012 151
#> 4 54.7 2.54 6122. 0 1.8 244 1 0.5 1 1925 183
#> 5 38.1 3.53 671 0 3.4 279 0 0 1 1504 136
#> 6 66.3 3.98 944 0 0.8 248 0 0 1 2503 -9
#> 7 55.5 4.09 824 0 1 322 0 0 1 1832 204
#> 8 53.1 4 4651. 0 0.3 280 0 0 0 2466 373
#> 9 42.5 3.08 2276 0 3.2 562 0 0 0 2400 251
#> 10 70.6 2.74 918 1 12.6 200 1 1 0 51 302
#> # ℹ 408 more rows
#> # ℹ 9 more variables: PROTIME <dbl>, SEX <dbl>, SGOT <dbl>, SPIDERS <dbl>,
#> # STAGE <dbl>, STATUS <dbl>, TRT <dbl>, TRIG <dbl>, COPPER <dbl>
(iris2 <- read$sas(haven_example("iris.sas7bdat"))) # SAS file
#> # A data.trame: [150 × 5]
#> Sepal_Length Sepal_Width Petal_Length Petal_Width Species
#> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#> 7 4.6 3.4 1.4 0.3 setosa
#> 8 5 3.4 1.5 0.2 setosa
#> 9 4.4 2.9 1.4 0.2 setosa
#> 10 4.9 3.1 1.5 0.1 setosa
#> # ℹ 140 more rows
(afalfa <- read(data_example("afalfa.xpt"))) # SAS transport file
#> # A data.trame: [40 × 6]
#> POP SAMPLE REP SEEDWT HARV1 HARV2
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 min 0 1 64 172. 180.
#> 2 min 1 1 54 138. 151.
#> 3 min 2 1 40 146. 129.
#> 4 min 3 1 45 170. 191.
#> 5 min 4 1 64 125. 173.
#> 6 MAX 5 1 75 179 235.
#> 7 MAX 6 1 45 166. 174.
#> 8 MAX 7 1 63 170. 156.
#> 9 MAX 8 1 65 193. 178.
#> 10 MAX 9 1 59 186. 179.
#> # ℹ 30 more rows
# Note that, where completion is available, typing read$<tab> displays a
# completion list of the supported file formats
# }