Most metrics in supervised classification are sensitive to the relative proportions of the items in the different classes. A confusion matrix computed on a test set uses the class proportions observed in that set. If these are representative of the proportions in the population, the metrics are unbiased. When they are not, the priors of a confusion object can be adjusted to reflect the class proportions expected in the population, in order to obtain more accurate metrics.
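The adjustment amounts to rescaling each row of the confusion matrix (one row per actual class) so that it sums to the corresponding prior weight, leaving the within-row proportions untouched. A minimal sketch of that arithmetic, in Python with NumPy rather than through the package API, using a made-up 3-class matrix and priors:

```python
import numpy as np

# Hypothetical confusion matrix: rows = actual class, columns = predicted class
conf = np.array([
    [50.0,  5.0,  5.0],   # class A: 60 test items
    [ 4.0, 30.0,  6.0],   # class B: 40 test items
    [ 2.0,  3.0, 15.0],   # class C: 20 test items
])

# Suppose class A is actually rare in the population
priors = np.array([10.0, 100.0, 100.0])

# Rescale each row so that it sums to its prior weight
rescaled = conf / conf.sum(axis=1, keepdims=True) * priors[:, None]

# Rows now sum to the priors: 10, 100, 100
print(rescaled.sum(axis=1))
```

With equal priors (e.g. a single weight of 1 or 100), the same rescaling yields relative frequencies per row, as in the later examples.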
Usage
prior(object, ...)
# S3 method for confusion
prior(object, ...)
prior(object, ...) <- value
# S3 method for confusion
prior(object, ...) <- value
Arguments
- object
a confusion object (or another class if a method is implemented)
- ...
further arguments passed to methods
- value
a (named) vector of positive or zero numbers, of the same length as the number of classes in the confusion object. It can also be a single number >= 0, in which case equal weights are applied to all classes (use 1 to get relative frequencies and 100 to get relative frequencies in percent). If the value has zero length or is NULL, the original prior probabilities (those of the test set) are restored. If the vector is named, the names must match existing class names in the confusion object.
Value
prior()
returns the current class frequencies associated with
the first classification tabulated in the confusion object, i.e., for
rows in the confusion matrix.
Examples
data("Glass", package = "mlbench")
# Use a little bit more informative labels for Type
Glass$Type <- as.factor(paste("Glass", Glass$Type))
# Use learning vector quantization to classify the glass types
# (using default parameters)
summary(glass_lvq <- ml_lvq(Type ~ ., data = Glass))
#> Codebook:
#> Class RI Na Mg Al Si K
#> 39 Glass 1 1.521277 14.36424 4.0239650 0.2303602 72.02336 0.05438544
#> 32 Glass 1 1.519196 12.65659 4.8674990 0.4665660 73.85181 -0.70557100
#> 42 Glass 1 1.517072 13.05397 3.5624747 1.1790037 73.04297 0.36584618
#> 62 Glass 1 1.519485 13.65144 3.9084643 1.1892587 72.02706 0.17018101
#> 17 Glass 1 1.517766 12.73950 3.5976780 1.2351181 73.12186 0.58917126
#> 53 Glass 1 1.512427 13.32208 3.9524905 0.4489736 73.89021 -0.41723689
#> 7 Glass 1 1.517945 13.24449 3.6731433 1.0806355 73.10465 0.51905537
#> 65 Glass 1 1.521897 13.29258 3.9273652 0.7334249 72.17823 0.12908872
#> 99 Glass 2 1.517391 12.49545 3.4441722 1.0178870 73.56065 0.20786576
#> 112 Glass 2 1.527390 11.02000 0.0000000 0.7500000 73.08000 0.00000000
#> 90 Glass 2 1.516678 12.63550 4.0566619 1.7910617 73.35970 -0.21845344
#> 123 Glass 2 1.517596 13.34163 3.8999283 1.3457780 72.48911 0.51160215
#> 146 Glass 2 1.518161 12.94614 3.8968416 1.1444819 72.40279 0.61233719
#> 74 Glass 2 1.514592 14.31787 3.7701862 1.4697558 72.65758 0.01233319
#> 140 Glass 2 1.516436 13.01225 3.5639963 1.5689297 73.06020 0.44685888
#> 111 Glass 2 1.529245 12.18383 0.0000000 1.3056921 71.14729 0.23481414
#> 147 Glass 3 1.516647 13.38319 4.2593376 0.9504240 72.69514 0.12440480
#> 153 Glass 3 1.518183 13.64508 3.6166110 0.6917897 72.88964 0.09196932
#> 168 Glass 5 1.520283 11.71126 0.8515020 2.1059832 73.20731 0.56499288
#> 178 Glass 6 1.517372 14.38524 2.2219272 1.2808706 73.32370 -0.14151907
#> 203 Glass 7 1.515268 14.47779 -0.1009342 2.7167221 73.54135 -0.24604345
#> 209 Glass 7 1.514531 14.25746 -0.6027840 3.6910000 73.68328 -1.82460000
#> 214 Glass 7 1.518406 14.80140 0.1955234 1.8482661 72.96822 -0.15294447
#> Ca Ba Fe
#> 39 9.201240 -0.082865256 -0.02213713
#> 32 8.766666 0.000000000 -0.27342900
#> 42 8.617063 -0.016637285 0.07246322
#> 62 8.772069 0.177102475 -0.04529924
#> 17 8.595615 -0.001611879 0.04024410
#> 53 9.165447 -0.272307692 -0.14355668
#> 7 8.361845 -0.077109478 -0.01304638
#> 65 9.642040 0.000000000 0.08271237
#> 99 8.930357 0.015530409 0.07132736
#> 112 14.960000 0.000000000 0.00000000
#> 90 8.192072 0.000000000 0.03076215
#> 123 8.218452 -0.104701969 0.06487314
#> 146 8.644896 0.000000000 0.27817524
#> 74 8.354815 -0.925046496 0.09880781
#> 140 8.164653 -0.044621318 -0.01078383
#> 111 14.260043 0.681301667 0.15537564
#> 147 8.560674 0.000000000 0.00000000
#> 153 8.895123 0.051116040 0.08178566
#> 168 11.387505 0.012140868 0.04744348
#> 178 9.101822 -0.229954212 -0.05084183
#> 203 8.726922 0.840061733 0.02440280
#> 209 9.261000 1.530912000 0.00000000
#> 214 8.307199 1.956745508 0.02198064
# Calculate cross-validated confusion matrix
(glass_conf <- confusion(cvpredict(glass_lvq), Glass$Type))
#> 214 items classified with 141 true positives (error rate = 34.1%)
#> Predicted
#> Actual 01 02 03 04 05 06 (sum) (FNR%)
#> 01 Glass 3 0 11 6 0 0 0 17 100
#> 02 Glass 1 0 58 12 0 0 0 70 17
#> 03 Glass 2 0 21 51 3 0 1 76 33
#> 04 Glass 5 0 0 4 7 0 2 13 46
#> 05 Glass 6 0 2 1 0 2 4 9 78
#> 06 Glass 7 0 2 3 1 0 23 29 21
#> (sum) 0 94 77 11 2 30 214 34
# When the class proportions in the training set do not match those in the
# population, the statistics above are misleading. If the real proportions
# (the so-called priors) are known, one should first reweight the confusion
# matrix before calculating statistics, for instance:
prior1 <- c(10, 10, 10, 100, 100, 100) # Glass types 1-3 are rare
prior(glass_conf) <- prior1
glass_conf
#> 214 items classified with 141 true positives (error rate = 34.1%)
#> with initial row frequencies:
#> Glass 1 Glass 2 Glass 3 Glass 5 Glass 6 Glass 7
#> 70 76 17 13 9 29
#> Rescaled to:
#> Predicted
#> Actual 01 02 03 04 05 06 (sum) (FNR%)
#> 01 Glass 2 7 0 0 0 3 0 10 33
#> 02 Glass 5 31 54 0 15 0 0 100 46
#> 03 Glass 6 11 0 22 44 22 0 100 78
#> 04 Glass 7 10 3 0 79 7 0 100 21
#> 05 Glass 1 2 0 0 0 8 0 10 17
#> 06 Glass 3 4 0 0 0 6 0 10 100
#> (sum) 64 58 22 139 47 0 330 48
summary(glass_conf, type = c("Fscore", "Recall", "Precision"))
#> 214 items classified with 141 true positives (error = 34.1%)
#>
#> Global statistics on reweighted data:
#> Error rate: 48.4%, F(micro-average): 0.486, F(macro-average): 0.364
#>
#> Fscore Recall Precision
#> Glass 5 0.6829404 0.5384615 0.9333842
#> Glass 7 0.6629332 0.7931034 0.5694678
#> Glass 6 0.3636364 0.2222222 1.0000000
#> Glass 1 0.2925838 0.8285714 0.1776593
#> Glass 2 0.1809270 0.6710526 0.1045589
#> Glass 3 0.0000000 0.0000000 NaN
# This is very different from the case where glass types 1-3 are abundant!
prior2 <- c(100, 100, 100, 10, 10, 10) # Glass types 1-3 are abundant
prior(glass_conf) <- prior2
glass_conf
#> 214 items classified with 141 true positives (error rate = 34.1%)
#> with initial row frequencies:
#> Glass 1 Glass 2 Glass 3 Glass 5 Glass 6 Glass 7
#> 70 76 17 13 9 29
#> Rescaled to:
#> Predicted
#> Actual 01 02 03 04 05 06 (sum) (FNR%)
#> 01 Glass 2 67 28 0 4 0 1 100 33
#> 02 Glass 1 17 83 0 0 0 0 100 17
#> 03 Glass 3 35 65 0 0 0 0 100 100
#> 04 Glass 5 3 0 0 5 0 2 10 46
#> 05 Glass 6 1 2 0 0 2 4 10 78
#> 06 Glass 7 1 1 0 0 0 8 10 21
#> (sum) 125 178 0 10 2 15 330 50
summary(glass_conf, type = c("Fscore", "Recall", "Precision"))
#> 214 items classified with 141 true positives (error = 34.1%)
#>
#> Global statistics on reweighted data:
#> Error rate: 49.8%, F(micro-average): 0.511, F(macro-average): 0.455
#>
#> Fscore Recall Precision
#> Glass 7 0.6287055 0.7931034 0.5207600
#> Glass 2 0.5971155 0.6710526 0.5378543
#> Glass 1 0.5958663 0.8285714 0.4652113
#> Glass 5 0.5473057 0.5384615 0.5564452
#> Glass 6 0.3636364 0.2222222 1.0000000
#> Glass 3 0.0000000 0.0000000 NaN
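Note in the two summaries above that recall is identical under both priors (e.g. about 0.83 for Glass 1 in both), while precision and F-score shift: rescaling rows changes column totals but not within-row ratios, so recall (diagonal over row sums) is invariant, whereas precision (diagonal over column sums) is not. A quick numeric check of this property, in plain NumPy with a made-up matrix, not the package API:

```python
import numpy as np

# Hypothetical confusion matrix: rows = actual, columns = predicted
conf = np.array([[50.0,  5.0,  5.0],
                 [ 4.0, 30.0,  6.0],
                 [ 2.0,  3.0, 15.0]])

def recall(m):
    # Diagonal over row sums (per actual class)
    return np.diag(m) / m.sum(axis=1)

def precision(m):
    # Diagonal over column sums (per predicted class)
    return np.diag(m) / m.sum(axis=0)

priors = np.array([10.0, 100.0, 100.0])
rescaled = conf / conf.sum(axis=1, keepdims=True) * priors[:, None]

# Recall is unchanged by row rescaling...
assert np.allclose(recall(conf), recall(rescaled))
# ...but precision (and hence the F-score) depends on the priors
print(precision(conf).round(3))
print(precision(rescaled).round(3))
```

This is why adjusting priors is worthwhile: precision and F-score computed on unrepresentative test-set frequencies can differ substantially from their population values, even though recall does not.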
# Priors can also be used to construct a matrix of relative frequencies:
# with a single weight of 1, all rows sum to one
prior(glass_conf) <- 1
print(glass_conf, digits = 2)
#> 214 items classified with 141 true positives (error rate = 34.1%)
#> with initial row frequencies:
#> Glass 1 Glass 2 Glass 3 Glass 5 Glass 6 Glass 7
#> 70 76 17 13 9 29
#> Rescaled to:
#> Predicted
#> Actual 01 02 03 04 05 06 (sum) (FNR%)
#> 01 Glass 6 0.22 0.44 0.22 0.00 0.11 0.00 1.00 78.00
#> 02 Glass 7 0.00 0.79 0.07 0.00 0.10 0.03 1.00 21.00
#> 03 Glass 1 0.00 0.00 0.83 0.00 0.17 0.00 1.00 17.00
#> 04 Glass 3 0.00 0.00 0.65 0.00 0.35 0.00 1.00 100.00
#> 05 Glass 2 0.00 0.01 0.28 0.00 0.67 0.04 1.00 33.00
#> 06 Glass 5 0.00 0.15 0.00 0.00 0.31 0.54 1.00 46.00
#> (sum) 0.22 1.40 2.04 0.00 1.72 0.61 6.00 49.00
# However, relative frequencies in percent are often easier to work with
# and give a more compact presentation
prior(glass_conf) <- 100
glass_conf
#> 214 items classified with 141 true positives (error rate = 34.1%)
#> with initial row frequencies:
#> Glass 1 Glass 2 Glass 3 Glass 5 Glass 6 Glass 7
#> 70 76 17 13 9 29
#> Rescaled to:
#> Predicted
#> Actual 01 02 03 04 05 06 (sum) (FNR%)
#> 01 Glass 6 22 44 22 0 11 0 100 78
#> 02 Glass 7 0 79 7 0 10 3 100 21
#> 03 Glass 1 0 0 83 0 17 0 100 17
#> 04 Glass 3 0 0 65 0 35 0 100 100
#> 05 Glass 2 0 1 28 0 67 4 100 33
#> 06 Glass 5 0 15 0 0 31 54 100 46
#> (sum) 22 140 204 0 172 61 600 49
# To reset row class frequencies to the original proportions, just assign NULL
prior(glass_conf) <- NULL
glass_conf
#> 214 items classified with 141 true positives (error rate = 34.1%)
#> Predicted
#> Actual 01 02 03 04 05 06 (sum) (FNR%)
#> 01 Glass 3 0 11 6 0 0 0 17 100
#> 02 Glass 1 0 58 12 0 0 0 70 17
#> 03 Glass 2 0 21 51 3 0 1 76 33
#> 04 Glass 5 0 0 4 7 0 2 13 46
#> 05 Glass 6 0 2 1 0 2 4 9 78
#> 06 Glass 7 0 2 3 1 0 23 29 21
#> (sum) 0 94 77 11 2 30 214 34
prior(glass_conf)
#> Glass 1 Glass 2 Glass 3 Glass 5 Glass 6 Glass 7
#> 70 76 17 13 9 29