Unified (formula-based) interface version of the support vector machine
algorithm provided by e1071::svm().
Usage
mlSvm(train, ...)
ml_svm(train, ...)
# S3 method for formula
mlSvm(
formula,
data,
scale = TRUE,
type = NULL,
kernel = "radial",
classwt = NULL,
...,
subset,
na.action
)
# S3 method for default
mlSvm(
train,
response,
scale = TRUE,
type = NULL,
kernel = "radial",
classwt = NULL,
...
)
# S3 method for mlSvm
predict(
object,
newdata,
type = c("class", "membership", "both"),
method = c("direct", "cv"),
na.action = na.exclude,
...
)
Arguments
- train
a matrix or data frame with predictors.
- ...
further arguments passed to the classification or regression method. See e1071::svm().
- formula
a formula with the left term being the factor variable to predict (for supervised classification), a vector of numbers (for regression) or nothing (for unsupervised classification), and the right term with the list of independent, predictive variables, separated with a plus sign. If the data frame provided contains only the dependent and independent variables, one can use the class ~ . short version (that one is strongly encouraged). Variables with a minus sign are eliminated. Calculations on variables are possible according to usual formula conventions (possibly protected by using I()).
- data
a data.frame to use as a training set.
- scale
are the variables scaled (so that mean = 0 and standard deviation = 1)? TRUE by default. If a vector is provided, it is applied to variables with recycling.
- type
for ml_svm()/mlSvm(), the type of classification or regression machine to use. The default value of NULL uses "C-classification" if the response variable is a factor and "eps-regression" if it is numeric. It can also be "nu-classification" or "nu-regression". The "C" and "nu" versions are basically the same but with a different parameterisation: the range of C is from zero to infinity, while the range of nu is from zero to one. A fifth option is "one-classification", which is specific to novelty detection (finding the items that are different from the rest). For predict(), the type of prediction to return: "class" by default, the predicted classes. Other options are "membership", the membership (a number between 0 and 1) to the different classes, or "both" to return both classes and memberships.
- kernel
the kernel used by svm, see e1071::svm() for further explanations. Can be "radial", "linear", "polynomial" or "sigmoid".
- classwt
priors of the classes. Need not add up to one.
- subset
index vector with the cases to define the training set in use (this argument must be named, if provided).
- na.action
function to specify the action to be taken if NAs are found. For ml_svm(), na.fail is used by default: the calculation is stopped if there is any NA in the data. Another option is na.omit, where cases with missing values on any required variable are dropped (this argument must be named, if provided). For the predict() method, the default, and most suitable, option is na.exclude. In that case, rows with NAs in newdata= are excluded from prediction, but reinjected in the final results so that the number of items is still the same (and in the same order as newdata=).
- response
a vector of factor (classification) or numeric (regression).
- object
an mlSvm object
- newdata
a new dataset with the same conformation as the training set (same variables, except maybe the class for classification or the dependent variable for regression). Usually a test set, or a new dataset to be predicted.
- method
"direct"
(default) or"cv"
."direct"
predicts new cases innewdata=
if this argument is provided, or the cases in the training set if not. Take care that not providingnewdata=
means that you just calculate the self-consistency of the classifier but cannot use the metrics derived from these results for the assessment of its performances. Either use a different data set innewdata=
or use the alternate cross-validation ("cv") technique. If you specifymethod = "cv"
thencvpredict()
is used and you cannot providenewdata=
in that case.
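To make the interplay of these arguments concrete, here is a hedged sketch (it assumes the mlearning package is installed; object names are illustrative only):

```r
library(mlearning)

# Introduce a missing value so that na.action= matters
iris2 <- datasets::iris
iris2[1, 1] <- NA

# The default na.fail would stop on the NA; na.omit drops that row instead
fit <- ml_svm(data = iris2, Species ~ ., na.action = na.omit)

# method = "direct" without newdata= gives self-consistency predictions only
pred_direct <- predict(fit, method = "direct")

# method = "cv" delegates to cvpredict(); newdata= cannot be combined with it
pred_cv <- predict(fit, method = "cv")
```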
Value
ml_svm()/mlSvm() creates an mlSvm, mlearning object containing the classifier and a lot of additional metadata used by the functions and methods you can apply to it, like predict() or cvpredict(). In case you want to program new functions or extract specific components, inspect the "unclassed" object using unclass().
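For instance, a minimal sketch of such an inspection (again assuming the mlearning package is installed):

```r
library(mlearning)
fit <- ml_svm(data = datasets::iris, Species ~ .)
# List the components stored in the object, bypassing class-based dispatch
names(unclass(fit))
```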
See also
mlearning(), cvpredict(), confusion(), and also e1071::svm(), which actually does the calculation.
Examples
# Prepare data: split into training set (2/3) and test set (1/3)
data("iris", package = "datasets")
train <- c(1:34, 51:83, 101:133)
iris_train <- iris[train, ]
iris_test <- iris[-train, ]
# One case with missing data in train set, and another case in test set
iris_train[1, 1] <- NA
iris_test[25, 2] <- NA
iris_svm <- ml_svm(data = iris_train, Species ~ .)
summary(iris_svm)
#> A mlearning object of class mlSvm (support vector machine):
#> Initial call: mlSvm.formula(formula = Species ~ ., data = iris_train)
#>
#> Call:
#> svm.default(x = sapply(train, as.numeric), y = response, scale = scale,
#> type = type, kernel = kernel, class.weights = classwt, probability = TRUE,
#> .args. = ..1)
#>
#>
#> Parameters:
#> SVM-Type: C-classification
#> SVM-Kernel: radial
#> cost: 1
#>
#> Number of Support Vectors: 42
#>
#> ( 8 17 17 )
#>
#>
#> Number of Classes: 3
#>
#> Levels:
#> setosa versicolor virginica
#>
#>
#>
predict(iris_svm) # Default type is class
#> [1] setosa setosa setosa setosa setosa setosa
#> [7] setosa setosa setosa setosa setosa setosa
#> [13] setosa setosa setosa setosa setosa setosa
#> [19] setosa setosa setosa setosa setosa setosa
#> [25] setosa setosa setosa setosa setosa setosa
#> [31] setosa setosa setosa versicolor versicolor versicolor
#> [37] versicolor versicolor versicolor versicolor versicolor versicolor
#> [43] versicolor versicolor versicolor versicolor versicolor versicolor
#> [49] versicolor versicolor versicolor versicolor versicolor versicolor
#> [55] versicolor versicolor versicolor versicolor versicolor versicolor
#> [61] virginica versicolor versicolor versicolor versicolor versicolor
#> [67] virginica virginica virginica virginica virginica virginica
#> [73] virginica virginica virginica virginica virginica virginica
#> [79] virginica virginica virginica virginica virginica virginica
#> [85] virginica versicolor virginica virginica virginica virginica
#> [91] virginica virginica virginica virginica virginica virginica
#> [97] virginica virginica virginica
#> Levels: setosa versicolor virginica
predict(iris_svm, type = "membership")
#> setosa versicolor virginica
#> 1 0.952774913 0.030871412 0.016353675
#> 2 0.968246698 0.017814836 0.013938466
#> 3 0.962422695 0.022536988 0.015040318
#> 4 0.964604420 0.019090493 0.016305087
#> 5 0.952920314 0.028915268 0.018164418
#> 6 0.964235804 0.019066326 0.016697870
#> 7 0.967382674 0.018700065 0.013917261
#> 8 0.940953600 0.037583456 0.021462944
#> 9 0.962462310 0.022665107 0.014872582
#> 10 0.957392104 0.025273315 0.017334581
#> 11 0.967071625 0.018274876 0.014653499
#> 12 0.957854393 0.025941969 0.016203638
#> 13 0.951230899 0.027373121 0.021395980
#> 14 0.938694244 0.036062239 0.025243518
#> 15 0.930932398 0.037660337 0.031407266
#> 16 0.953219533 0.028252123 0.018528344
#> 17 0.965037235 0.020515079 0.014447686
#> 18 0.942181745 0.037607045 0.020211210
#> 19 0.956476189 0.025217590 0.018306221
#> 20 0.951143415 0.031490947 0.017365638
#> 21 0.959398293 0.023903159 0.016698548
#> 22 0.951029511 0.024514896 0.024455593
#> 23 0.944831578 0.038361661 0.016806761
#> 24 0.962832146 0.021532656 0.015635198
#> 25 0.943963470 0.038356821 0.017679709
#> 26 0.961153875 0.024412795 0.014433330
#> 27 0.963690404 0.021316634 0.014992962
#> 28 0.964058935 0.021234779 0.014706286
#> 29 0.965595320 0.020291904 0.014112777
#> 30 0.960520933 0.024951684 0.014527383
#> 31 0.946807644 0.035864733 0.017327624
#> 32 0.940206660 0.032229594 0.027563746
#> 33 0.943120577 0.031535798 0.025343625
#> 34 0.020963596 0.915961853 0.063074551
#> 35 0.016993047 0.952505456 0.030501498
#> 36 0.017671232 0.819575341 0.162753427
#> 37 0.009421227 0.950487262 0.040091511
#> 38 0.010969701 0.896696242 0.092334057
#> 39 0.010661393 0.968091338 0.021247269
#> 40 0.021361907 0.899152303 0.079485790
#> 41 0.029940743 0.947635850 0.022423407
#> 42 0.012796178 0.969508994 0.017694828
#> 43 0.014664266 0.944933707 0.040402026
#> 44 0.024259722 0.934219500 0.041520778
#> 45 0.013588403 0.964252771 0.022158826
#> 46 0.018468182 0.969363152 0.012168667
#> 47 0.010214462 0.949759268 0.040026270
#> 48 0.023092401 0.969530046 0.007377553
#> 49 0.015699093 0.965414193 0.018886714
#> 50 0.016055557 0.929710698 0.054233745
#> 51 0.015149669 0.980865568 0.003984763
#> 52 0.019918527 0.787181492 0.192899980
#> 53 0.011021435 0.981596035 0.007382529
#> 54 0.022867705 0.619562645 0.357569650
#> 55 0.010899323 0.983325760 0.005774916
#> 56 0.013576732 0.678881884 0.307541384
#> 57 0.010787973 0.976565011 0.012647017
#> 58 0.011807199 0.980322599 0.007870202
#> 59 0.013159217 0.968493106 0.018347677
#> 60 0.014841249 0.898495677 0.086663074
#> 61 0.015575825 0.423357433 0.561066741
#> 62 0.010068025 0.934621023 0.055310951
#> 63 0.019267710 0.975995279 0.004737011
#> 64 0.011135553 0.978660908 0.010203539
#> 65 0.013676848 0.978899513 0.007423639
#> 66 0.011502970 0.983392125 0.005104905
#> 67 0.019235585 0.006796298 0.973968117
#> 68 0.010494832 0.046342605 0.943162563
#> 69 0.010480653 0.005983146 0.983536201
#> 70 0.011887374 0.044459946 0.943652680
#> 71 0.010915737 0.003342065 0.985742198
#> 72 0.013081741 0.007100951 0.979817308
#> 73 0.015892993 0.392434228 0.591672779
#> 74 0.013193374 0.015848064 0.970958562
#> 75 0.013684208 0.030398101 0.955917690
#> 76 0.019045733 0.015013759 0.965940508
#> 77 0.016725909 0.096380607 0.886893484
#> 78 0.010731795 0.029645356 0.959622849
#> 79 0.011009667 0.009681532 0.979308801
#> 80 0.010060303 0.027605123 0.962334574
#> 81 0.014098795 0.004140644 0.981760561
#> 82 0.015411488 0.012837844 0.971750668
#> 83 0.013296118 0.073259072 0.913444810
#> 84 0.024332649 0.023191460 0.952475891
#> 85 0.023234439 0.016001691 0.960763869
#> 86 0.018080835 0.572952116 0.408967049
#> 87 0.011863861 0.005932962 0.982203177
#> 88 0.012470468 0.050151824 0.937377708
#> 89 0.017177365 0.013410635 0.969412000
#> 90 0.012209514 0.170174585 0.817615901
#> 91 0.013586337 0.019251949 0.967161714
#> 92 0.012729685 0.030355806 0.956914509
#> 93 0.012665785 0.251274790 0.736059425
#> 94 0.015233431 0.338331853 0.646434716
#> 95 0.010240692 0.004450885 0.985308423
#> 96 0.017185457 0.118323789 0.864490754
#> 97 0.013799572 0.016625171 0.969575258
#> 98 0.025818577 0.031130094 0.943051329
#> 99 0.010528869 0.002922124 0.986549007
predict(iris_svm, type = "both")
#> $class
#> [1] setosa setosa setosa setosa setosa setosa
#> [7] setosa setosa setosa setosa setosa setosa
#> [13] setosa setosa setosa setosa setosa setosa
#> [19] setosa setosa setosa setosa setosa setosa
#> [25] setosa setosa setosa setosa setosa setosa
#> [31] setosa setosa setosa versicolor versicolor versicolor
#> [37] versicolor versicolor versicolor versicolor versicolor versicolor
#> [43] versicolor versicolor versicolor versicolor versicolor versicolor
#> [49] versicolor versicolor versicolor versicolor versicolor versicolor
#> [55] versicolor versicolor versicolor versicolor versicolor versicolor
#> [61] virginica versicolor versicolor versicolor versicolor versicolor
#> [67] virginica virginica virginica virginica virginica virginica
#> [73] virginica virginica virginica virginica virginica virginica
#> [79] virginica virginica virginica virginica virginica virginica
#> [85] virginica versicolor virginica virginica virginica virginica
#> [91] virginica virginica virginica virginica virginica virginica
#> [97] virginica virginica virginica
#> Levels: setosa versicolor virginica
#>
#> $membership
#> setosa versicolor virginica
#> 1 0.952774913 0.030871412 0.016353675
#> 2 0.968246698 0.017814836 0.013938466
#> 3 0.962422695 0.022536988 0.015040318
#> 4 0.964604420 0.019090493 0.016305087
#> 5 0.952920314 0.028915268 0.018164418
#> 6 0.964235804 0.019066326 0.016697870
#> 7 0.967382674 0.018700065 0.013917261
#> 8 0.940953600 0.037583456 0.021462944
#> 9 0.962462310 0.022665107 0.014872582
#> 10 0.957392104 0.025273315 0.017334581
#> 11 0.967071625 0.018274876 0.014653499
#> 12 0.957854393 0.025941969 0.016203638
#> 13 0.951230899 0.027373121 0.021395980
#> 14 0.938694244 0.036062239 0.025243518
#> 15 0.930932398 0.037660337 0.031407266
#> 16 0.953219533 0.028252123 0.018528344
#> 17 0.965037235 0.020515079 0.014447686
#> 18 0.942181745 0.037607045 0.020211210
#> 19 0.956476189 0.025217590 0.018306221
#> 20 0.951143415 0.031490947 0.017365638
#> 21 0.959398293 0.023903159 0.016698548
#> 22 0.951029511 0.024514896 0.024455593
#> 23 0.944831578 0.038361661 0.016806761
#> 24 0.962832146 0.021532656 0.015635198
#> 25 0.943963470 0.038356821 0.017679709
#> 26 0.961153875 0.024412795 0.014433330
#> 27 0.963690404 0.021316634 0.014992962
#> 28 0.964058935 0.021234779 0.014706286
#> 29 0.965595320 0.020291904 0.014112777
#> 30 0.960520933 0.024951684 0.014527383
#> 31 0.946807644 0.035864733 0.017327624
#> 32 0.940206660 0.032229594 0.027563746
#> 33 0.943120577 0.031535798 0.025343625
#> 34 0.020963596 0.915961853 0.063074551
#> 35 0.016993047 0.952505456 0.030501498
#> 36 0.017671232 0.819575341 0.162753427
#> 37 0.009421227 0.950487262 0.040091511
#> 38 0.010969701 0.896696242 0.092334057
#> 39 0.010661393 0.968091338 0.021247269
#> 40 0.021361907 0.899152303 0.079485790
#> 41 0.029940743 0.947635850 0.022423407
#> 42 0.012796178 0.969508994 0.017694828
#> 43 0.014664266 0.944933707 0.040402026
#> 44 0.024259722 0.934219500 0.041520778
#> 45 0.013588403 0.964252771 0.022158826
#> 46 0.018468182 0.969363152 0.012168667
#> 47 0.010214462 0.949759268 0.040026270
#> 48 0.023092401 0.969530046 0.007377553
#> 49 0.015699093 0.965414193 0.018886714
#> 50 0.016055557 0.929710698 0.054233745
#> 51 0.015149669 0.980865568 0.003984763
#> 52 0.019918527 0.787181492 0.192899980
#> 53 0.011021435 0.981596035 0.007382529
#> 54 0.022867705 0.619562645 0.357569650
#> 55 0.010899323 0.983325760 0.005774916
#> 56 0.013576732 0.678881884 0.307541384
#> 57 0.010787973 0.976565011 0.012647017
#> 58 0.011807199 0.980322599 0.007870202
#> 59 0.013159217 0.968493106 0.018347677
#> 60 0.014841249 0.898495677 0.086663074
#> 61 0.015575825 0.423357433 0.561066741
#> 62 0.010068025 0.934621023 0.055310951
#> 63 0.019267710 0.975995279 0.004737011
#> 64 0.011135553 0.978660908 0.010203539
#> 65 0.013676848 0.978899513 0.007423639
#> 66 0.011502970 0.983392125 0.005104905
#> 67 0.019235585 0.006796298 0.973968117
#> 68 0.010494832 0.046342605 0.943162563
#> 69 0.010480653 0.005983146 0.983536201
#> 70 0.011887374 0.044459946 0.943652680
#> 71 0.010915737 0.003342065 0.985742198
#> 72 0.013081741 0.007100951 0.979817308
#> 73 0.015892993 0.392434228 0.591672779
#> 74 0.013193374 0.015848064 0.970958562
#> 75 0.013684208 0.030398101 0.955917690
#> 76 0.019045733 0.015013759 0.965940508
#> 77 0.016725909 0.096380607 0.886893484
#> 78 0.010731795 0.029645356 0.959622849
#> 79 0.011009667 0.009681532 0.979308801
#> 80 0.010060303 0.027605123 0.962334574
#> 81 0.014098795 0.004140644 0.981760561
#> 82 0.015411488 0.012837844 0.971750668
#> 83 0.013296118 0.073259072 0.913444810
#> 84 0.024332649 0.023191460 0.952475891
#> 85 0.023234439 0.016001691 0.960763869
#> 86 0.018080835 0.572952116 0.408967049
#> 87 0.011863861 0.005932962 0.982203177
#> 88 0.012470468 0.050151824 0.937377708
#> 89 0.017177365 0.013410635 0.969412000
#> 90 0.012209514 0.170174585 0.817615901
#> 91 0.013586337 0.019251949 0.967161714
#> 92 0.012729685 0.030355806 0.956914509
#> 93 0.012665785 0.251274790 0.736059425
#> 94 0.015233431 0.338331853 0.646434716
#> 95 0.010240692 0.004450885 0.985308423
#> 96 0.017185457 0.118323789 0.864490754
#> 97 0.013799572 0.016625171 0.969575258
#> 98 0.025818577 0.031130094 0.943051329
#> 99 0.010528869 0.002922124 0.986549007
#>
# Self-consistency, do not use for assessing classifier performances!
confusion(iris_svm)
#> 99 items classified with 97 true positives (error rate = 2%)
#> Predicted
#> Actual 01 02 03 (sum) (FNR%)
#> 01 setosa 33 0 0 33 0
#> 02 versicolor 0 32 1 33 3
#> 03 virginica 0 1 32 33 3
#> (sum) 33 33 33 99 2
# Use an independent test set instead
confusion(predict(iris_svm, newdata = iris_test), iris_test$Species)
#> 50 items classified with 47 true positives (error rate = 6%)
#> Predicted
#> Actual 01 02 03 04 (sum) (FNR%)
#> 01 setosa 16 0 0 0 16 0
#> 02 NA 0 0 0 0 0
#> 03 versicolor 0 1 15 1 17 12
#> 04 virginica 0 0 1 16 17 6
#> (sum) 16 1 16 17 50 6
# Another dataset
data("HouseVotes84", package = "mlbench")
house_svm <- ml_svm(data = HouseVotes84, Class ~ ., na.action = na.omit)
summary(house_svm)
#> A mlearning object of class mlSvm (support vector machine):
#> Initial call: mlSvm.formula(formula = Class ~ ., data = HouseVotes84, na.action = na.omit)
#>
#> Call:
#> svm.default(x = sapply(train, as.numeric), y = response, scale = scale,
#> type = type, kernel = kernel, class.weights = classwt, probability = TRUE,
#> .args. = ..1)
#>
#>
#> Parameters:
#> SVM-Type: C-classification
#> SVM-Kernel: radial
#> cost: 1
#>
#> Number of Support Vectors: 78
#>
#> ( 43 35 )
#>
#>
#> Number of Classes: 2
#>
#> Levels:
#> democrat republican
#>
#>
#>
# Cross-validated confusion matrix
confusion(cvpredict(house_svm), na.omit(HouseVotes84)$Class)
#> 232 items classified with 224 true positives (error rate = 3.4%)
#> Predicted
#> Actual 01 02 (sum) (FNR%)
#> 01 democrat 118 6 124 5
#> 02 republican 2 106 108 2
#> (sum) 120 112 232 3
# Regression using support vector machine
data(airquality, package = "datasets")
ozone_svm <- ml_svm(data = airquality, Ozone ~ ., na.action = na.omit)
summary(ozone_svm)
#> A mlearning object of class mlSvm (support vector machine):
#> [regression variant]
#> Initial call: mlSvm.formula(formula = Ozone ~ ., data = airquality, na.action = na.omit)
#>
#> Call:
#> svm.default(x = sapply(train, as.numeric), y = response, scale = scale,
#> type = type, kernel = kernel, class.weights = classwt, probability = TRUE,
#> .args. = ..1)
#>
#>
#> Parameters:
#> SVM-Type: eps-regression
#> SVM-Kernel: radial
#> cost: 1
#> gamma: 0.2
#> epsilon: 0.1
#>
#> Sigma: 0.3644775
#>
#>
#> Number of Support Vectors: 90
#>
#>
#>
#>
#>
plot(na.omit(airquality)$Ozone, predict(ozone_svm))
abline(a = 0, b = 1)
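The agreement seen in the plot can also be summarised numerically, for instance with the root mean squared error of these self-consistency predictions (a rough sketch using base R only; remember that self-consistency overstates real performance):

```r
# Observed vs predicted Ozone on the complete cases used for fitting
obs <- na.omit(airquality)$Ozone
pred <- predict(ozone_svm)
sqrt(mean((obs - pred)^2))  # RMSE on the training data
```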