Unified (formula-based) interface version of the learning vector quantization
algorithms provided by class::olvq1(), class::lvq1(), class::lvq2(), and
class::lvq3().
Usage
mlLvq(train, ...)
ml_lvq(train, ...)
# S3 method for formula
mlLvq(
  formula,
  data,
  k.nn = 5,
  size,
  prior,
  algorithm = "olvq1",
  ...,
  subset,
  na.action
)
# S3 method for default
mlLvq(train, response, k.nn = 5, size, prior, algorithm = "olvq1", ...)
# S3 method for mlLvq
summary(object, ...)
# S3 method for summary.mlLvq
print(x, ...)
# S3 method for mlLvq
predict(
  object,
  newdata,
  type = "class",
  method = c("direct", "cv"),
  na.action = na.exclude,
  ...
)
Arguments
- train
a matrix or data frame with predictors.
- ...
further arguments passed to the classification method or its predict() method (not used here for now).
- formula
a formula with the factor variable to predict as its left term and, as its right term, the list of independent, predictive variables separated by plus signs. If the data frame provided contains only the dependent and independent variables, one can use the class ~ . short version (that one is strongly encouraged). Variables preceded by a minus sign are eliminated. Calculations on variables are possible according to the usual formula convention (possibly protected by using I()).
- data
a data.frame to use as a training set.
- k.nn
the k used for k-NN, i.e., the number of neighbors considered. Default is 5.
- size
the size of the codebook. Defaults to min(round(0.4 * nc * (nc - 1 + p/2), 0), n), where nc is the number of classes, p the number of predictors, and n the number of training cases.
- prior
probabilities to represent classes in the codebook (default values are the proportions in the training set).
- algorithm
"olvq1" (by default, the optimized 'lvq1' version), or "lvq1", "lvq2", "lvq3".
- subset
index vector with the cases to define the training set in use (this argument must be named, if provided).
- na.action
function to specify the action to be taken if NAs are found. For ml_lvq(), na.fail is used by default: the calculation is stopped if there is any NA in the data. Another option is na.omit, where cases with missing values on any required variable are dropped (this argument must be named, if provided). For the predict() method, the default, and most suitable option, is na.exclude. In that case, rows with NAs in newdata= are excluded from prediction, but reinjected in the final results so that the number of items is still the same (and in the same order as newdata=).
- response
a factor vector with the classes.
- x, object
an mlLvq object
- newdata
a new dataset with the same conformation as the training set (same variables, except maybe the class for classification or the dependent variable for regression). Usually a test set, or a new dataset to be predicted.
- type
the type of prediction to return. For this method, only "class" is accepted, and it is the default. It returns the predicted classes.
- method
"direct" (default) or "cv". "direct" predicts new cases in newdata= if this argument is provided, or the cases in the training set if not. Take care that not providing newdata= means that you just calculate the self-consistency of the classifier but cannot use the metrics derived from these results for the assessment of its performances. Either use a different dataset in newdata= or use the alternate cross-validation ("cv") technique. If you specify method = "cv" then cvpredict() is used and you cannot provide newdata= in that case.
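Two of the arguments above can be checked quickly in base R. The sketch below evaluates the documented default for size and mimics the na.exclude behavior described for predict(). The helper default_codebook_size() is hypothetical (it is not part of mlearning), the labels in pred are made up, and stats::napredict() is used only to illustrate how excluded rows can be reinjected as NA; it is not claimed to be what predict() does internally.

```r
# Hypothetical helper reproducing the documented default for `size`
# nc = number of classes, p = number of predictors, n = number of cases
default_codebook_size <- function(nc, p, n) {
  min(round(0.4 * nc * (nc - 1 + p / 2), 0), n)
}

# Three classes, four predictors, 99 complete training cases
size_default <- default_codebook_size(nc = 3, p = 4, n = 99)
size_default
#> [1] 5

# na.exclude drops incomplete rows but remembers where they were
newdata <- data.frame(x = c(1.2, NA, 3.4))
complete <- na.exclude(newdata)

# Stand-in for predictions computed on the two complete rows (made-up labels)
pred <- factor(c("a", "b"))

# Reinject NA so that the result lines up with the rows of `newdata`
padded <- napredict(attr(complete, "na.action"), pred)
length(padded)  # same number of items as nrow(newdata)
```

With na.omit instead of na.exclude, napredict() would leave pred unchanged, so the result would be shorter than newdata.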
Value
ml_lvq()/mlLvq() creates an mlLvq, mlearning object containing the classifier
and a lot of additional metadata used by the functions and methods you can
apply to it like predict() or cvpredict(). In case you want to program new
functions or extract specific components, inspect the "unclassed" object using
unclass().
See also
mlearning(), cvpredict(), confusion(), also class::olvq1(), class::lvq1(),
class::lvq2(), and class::lvq3() that actually do the classification.
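For orientation, here is a minimal sketch of the class functions used directly, without the unified interface: initialize a codebook, train it with olvq1(), and classify new cases. It assumes the class package is installed, uses complete data to keep things simple, and picks an arbitrary seed. How ml_lvq() calls these functions internally is not shown here and may differ.

```r
library(class)  # provides lvqinit(), olvq1() and lvqtest()

# Same split as in the Examples section, but without injected NAs
data("iris", package = "datasets")
train <- c(1:34, 51:83, 101:133)
x_train <- as.matrix(iris[train, 1:4])
y_train <- iris$Species[train]
x_test <- as.matrix(iris[-train, 1:4])

set.seed(123)  # arbitrary seed: codebook initialization samples cases at random
init <- lvqinit(x_train, y_train, k = 5)   # initial codebook
codebook <- olvq1(x_train, y_train, init)  # train with the "olvq1" algorithm
pred <- lvqtest(codebook, x_test)          # classify the test cases

# Cross-tabulate actual against predicted classes
table(Actual = iris$Species[-train], Predicted = pred)
```

Replacing olvq1() with lvq1(), lvq2(), or lvq3() corresponds to the other values of the algorithm argument.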
Examples
# Prepare data: split into training set (2/3) and test set (1/3)
data("iris", package = "datasets")
train <- c(1:34, 51:83, 101:133)
iris_train <- iris[train, ]
iris_test <- iris[-train, ]
# One case with missing data in train set, and another case in test set
iris_train[1, 1] <- NA
iris_test[25, 2] <- NA
iris_lvq <- ml_lvq(data = iris_train, Species ~ .)
summary(iris_lvq)
#> Codebook:
#> Class Sepal.Length Sepal.Width Petal.Length Petal.Width
#> 9 setosa 4.660377 3.100943 1.433962 0.1773585
#> 8 setosa 5.230061 3.622699 1.507362 0.2625767
#> 60 versicolor 5.562766 2.612224 3.855116 1.1475967
#> 69 versicolor 6.465909 2.950533 4.420159 1.3611252
#> 109 virginica 6.378270 2.903770 5.501573 2.0378277
#> 106 virginica 7.575410 3.265574 6.472131 2.1245902
predict(iris_lvq) # This object only returns classes
#> [1] setosa setosa setosa setosa setosa setosa
#> [7] setosa setosa setosa setosa setosa setosa
#> [13] setosa setosa setosa setosa setosa setosa
#> [19] setosa setosa setosa setosa setosa setosa
#> [25] setosa setosa setosa setosa setosa setosa
#> [31] setosa setosa setosa versicolor versicolor versicolor
#> [37] versicolor versicolor versicolor versicolor versicolor versicolor
#> [43] versicolor versicolor versicolor versicolor versicolor versicolor
#> [49] versicolor versicolor versicolor versicolor versicolor versicolor
#> [55] versicolor versicolor versicolor versicolor versicolor versicolor
#> [61] virginica versicolor versicolor versicolor versicolor versicolor
#> [67] virginica virginica virginica virginica virginica virginica
#> [73] versicolor virginica virginica virginica virginica virginica
#> [79] virginica virginica virginica virginica virginica virginica
#> [85] virginica versicolor virginica virginica virginica virginica
#> [91] virginica virginica versicolor virginica virginica virginica
#> [97] virginica virginica virginica
#> Levels: setosa versicolor virginica
# Self-consistency, do not use for assessing classifier performances!
confusion(iris_lvq)
#> 99 items classified with 95 true positives (error rate = 4%)
#> Predicted
#> Actual 01 02 03 (sum) (FNR%)
#> 01 setosa 33 0 0 33 0
#> 02 versicolor 0 32 1 33 3
#> 03 virginica 0 3 30 33 9
#> (sum) 33 35 31 99 4
# Use an independent test set instead
confusion(predict(iris_lvq, newdata = iris_test), iris_test$Species)
#> 50 items classified with 47 true positives (error rate = 6%)
#> Predicted
#> Actual 01 02 03 04 (sum) (FNR%)
#> 01 setosa 16 0 0 0 16 0
#> 02 NA 0 0 0 0 0
#> 03 versicolor 0 1 15 1 17 12
#> 04 virginica 0 0 1 16 17 6
#> (sum) 16 1 16 17 50 6
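When no independent test set is available, the alternative to self-consistency is cross-validation, as offered by method = "cv". The sketch below only illustrates the idea of k-fold cross-validated prediction, using the class functions directly. The number of folds (10), the random fold assignment, and the seed are assumptions made for this illustration; cvpredict() may proceed differently.

```r
library(class)  # provides lvqinit(), olvq1() and lvqtest()

data("iris", package = "datasets")
x <- as.matrix(iris[, 1:4])
y <- iris$Species

set.seed(42)  # arbitrary seed, for reproducible folds and codebooks
k <- 10       # number of folds (an assumption for this sketch)
fold <- sample(rep(seq_len(k), length.out = nrow(x)))

# One prediction per case, each made by a model that never saw that case
cv_pred <- factor(rep(NA_character_, nrow(x)), levels = levels(y))
for (i in seq_len(k)) {
  held_out <- fold == i
  init <- lvqinit(x[!held_out, ], y[!held_out])
  codebook <- olvq1(x[!held_out, ], y[!held_out], init)
  cv_pred[held_out] <- lvqtest(codebook, x[held_out, , drop = FALSE])
}

# Cross-tabulate actual against cross-validated predictions
table(Actual = y, Predicted = cv_pred)
```

Because every case is predicted by a codebook trained without it, metrics derived from this table are less optimistic than those computed on the training set itself.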