
Most metrics in supervised classification are sensitive to the relative proportions of the items in the different classes. When a confusion matrix is computed on a test set, it uses the proportions observed in that test set. If these are representative of the proportions in the population, the metrics are not biased. When this is not the case, the priors of a confusion object can be adjusted to better reflect the proportions expected in the different classes, in order to obtain more accurate metrics.
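Concretely, as the examples below illustrate, the reweighting amounts to rescaling each row of the confusion matrix proportionally, so that it sums to its prior. A minimal sketch with made-up two-class counts (the actual work is done by the prior() replacement method on a confusion object):

cm <- matrix(c(40, 10,
                5, 45), nrow = 2, byrow = TRUE,
  dimnames = list(Actual = c("A", "B"), Predicted = c("A", "B")))
priors <- c(A = 10, B = 90)              # assumed population class frequencies
sweep(cm, 1, priors / rowSums(cm), "*")  # each row now sums to its prior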

Usage

prior(object, ...)

# S3 method for confusion
prior(object, ...)

prior(object, ...) <- value

# S3 method for confusion
prior(object, ...) <- value

Arguments

object

a confusion object (or another class if a method is implemented)

...

further arguments passed to methods

value

a (named) vector of positive numbers or zeros, of the same length as the number of classes in the confusion object. It can also be a single number >= 0; in that case, equal probabilities are applied to all the classes (use 1 for relative frequencies and 100 for relative frequencies in percent). If the value has zero length or is NULL, the original prior probabilities (those observed in the test set) are used. If the vector is named, the names must correspond to existing class names in the confusion object (see the last example below).

Value

prior() returns the current class frequencies associated with the first classification tabulated in the confusion object, i.e., for rows in the confusion matrix.

See also

Examples

data("Glass", package = "mlbench")
# Use slightly more informative labels for Type
Glass$Type <- as.factor(paste("Glass", Glass$Type))
# Use learning vector quantization to classify the glass types
# (using default parameters)
summary(glass_lvq <- ml_lvq(Type ~ ., data = Glass))
#> Codebook:
#>       Class       RI       Na         Mg        Al       Si           K
#> 39  Glass 1 1.521277 14.36424  4.0239650 0.2303602 72.02336  0.05438544
#> 32  Glass 1 1.519196 12.65659  4.8674990 0.4665660 73.85181 -0.70557100
#> 42  Glass 1 1.517072 13.05397  3.5624747 1.1790037 73.04297  0.36584618
#> 62  Glass 1 1.519485 13.65144  3.9084643 1.1892587 72.02706  0.17018101
#> 17  Glass 1 1.517766 12.73950  3.5976780 1.2351181 73.12186  0.58917126
#> 53  Glass 1 1.512427 13.32208  3.9524905 0.4489736 73.89021 -0.41723689
#> 7   Glass 1 1.517945 13.24449  3.6731433 1.0806355 73.10465  0.51905537
#> 65  Glass 1 1.521897 13.29258  3.9273652 0.7334249 72.17823  0.12908872
#> 99  Glass 2 1.517391 12.49545  3.4441722 1.0178870 73.56065  0.20786576
#> 112 Glass 2 1.527390 11.02000  0.0000000 0.7500000 73.08000  0.00000000
#> 90  Glass 2 1.516678 12.63550  4.0566619 1.7910617 73.35970 -0.21845344
#> 123 Glass 2 1.517596 13.34163  3.8999283 1.3457780 72.48911  0.51160215
#> 146 Glass 2 1.518161 12.94614  3.8968416 1.1444819 72.40279  0.61233719
#> 74  Glass 2 1.514592 14.31787  3.7701862 1.4697558 72.65758  0.01233319
#> 140 Glass 2 1.516436 13.01225  3.5639963 1.5689297 73.06020  0.44685888
#> 111 Glass 2 1.529245 12.18383  0.0000000 1.3056921 71.14729  0.23481414
#> 147 Glass 3 1.516647 13.38319  4.2593376 0.9504240 72.69514  0.12440480
#> 153 Glass 3 1.518183 13.64508  3.6166110 0.6917897 72.88964  0.09196932
#> 168 Glass 5 1.520283 11.71126  0.8515020 2.1059832 73.20731  0.56499288
#> 178 Glass 6 1.517372 14.38524  2.2219272 1.2808706 73.32370 -0.14151907
#> 203 Glass 7 1.515268 14.47779 -0.1009342 2.7167221 73.54135 -0.24604345
#> 209 Glass 7 1.514531 14.25746 -0.6027840 3.6910000 73.68328 -1.82460000
#> 214 Glass 7 1.518406 14.80140  0.1955234 1.8482661 72.96822 -0.15294447
#>            Ca           Ba          Fe
#> 39   9.201240 -0.082865256 -0.02213713
#> 32   8.766666  0.000000000 -0.27342900
#> 42   8.617063 -0.016637285  0.07246322
#> 62   8.772069  0.177102475 -0.04529924
#> 17   8.595615 -0.001611879  0.04024410
#> 53   9.165447 -0.272307692 -0.14355668
#> 7    8.361845 -0.077109478 -0.01304638
#> 65   9.642040  0.000000000  0.08271237
#> 99   8.930357  0.015530409  0.07132736
#> 112 14.960000  0.000000000  0.00000000
#> 90   8.192072  0.000000000  0.03076215
#> 123  8.218452 -0.104701969  0.06487314
#> 146  8.644896  0.000000000  0.27817524
#> 74   8.354815 -0.925046496  0.09880781
#> 140  8.164653 -0.044621318 -0.01078383
#> 111 14.260043  0.681301667  0.15537564
#> 147  8.560674  0.000000000  0.00000000
#> 153  8.895123  0.051116040  0.08178566
#> 168 11.387505  0.012140868  0.04744348
#> 178  9.101822 -0.229954212 -0.05084183
#> 203  8.726922  0.840061733  0.02440280
#> 209  9.261000  1.530912000  0.00000000
#> 214  8.307199  1.956745508  0.02198064

# Calculate cross-validated confusion matrix
(glass_conf <- confusion(cvpredict(glass_lvq), Glass$Type))
#> 214 items classified with 141 true positives (error rate = 34.1%)
#>             Predicted
#> Actual        01  02  03  04  05  06 (sum) (FNR%)
#>   01 Glass 3   0  11   6   0   0   0    17    100
#>   02 Glass 1   0  58  12   0   0   0    70     17
#>   03 Glass 2   0  21  51   3   0   1    76     33
#>   04 Glass 5   0   0   4   7   0   2    13     46
#>   05 Glass 6   0   2   1   0   2   4     9     78
#>   06 Glass 7   0   2   3   1   0  23    29     21
#>   (sum)        0  94  77  11   2  30   214     34

# When the probabilities of the different classes do not match the proportions
# in the training set, all these calculations are useless. If one has an idea
# of the real proportions (the so-called priors), one should first reweight the
# confusion matrix before calculating statistics, for instance:
prior1 <- c(10, 10, 10, 100, 100, 100) # Glass types 1-3 are rare
prior(glass_conf) <- prior1
glass_conf
#> 214 items classified with 141 true positives (error rate = 34.1%)
#> with initial row frequencies:
#> Glass 1 Glass 2 Glass 3 Glass 5 Glass 6 Glass 7 
#>      70      76      17      13       9      29 
#> Rescaled to:
#>             Predicted
#> Actual        01  02  03  04  05  06 (sum) (FNR%)
#>   01 Glass 2   7   0   0   0   3   0    10     33
#>   02 Glass 5  31  54   0  15   0   0   100     46
#>   03 Glass 6  11   0  22  44  22   0   100     78
#>   04 Glass 7  10   3   0  79   7   0   100     21
#>   05 Glass 1   2   0   0   0   8   0    10     17
#>   06 Glass 3   4   0   0   0   6   0    10    100
#>   (sum)       64  58  22 139  47   0   330     48
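# A quick hand check of the rescaling, using the raw counts shown above: the
# Glass 1 row (0, 58, 12, 0, 0, 0) sums to 70 and its prior is 10, so every
# count is multiplied by 10/70
round(c(0, 58, 12, 0, 0, 0) * 10 / 70)
#> [1] 0 8 2 0 0 0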
summary(glass_conf, type = c("Fscore", "Recall", "Precision"))
#> 214 items classified with 141 true positives (error = 34.1%)
#> 
#> Global statistics on reweighted data:
#> Error rate: 48.4%, F(micro-average): 0.486, F(macro-average): 0.364
#> 
#>            Fscore    Recall Precision
#> Glass 5 0.6829404 0.5384615 0.9333842
#> Glass 7 0.6629332 0.7931034 0.5694678
#> Glass 6 0.3636364 0.2222222 1.0000000
#> Glass 1 0.2925838 0.8285714 0.1776593
#> Glass 2 0.1809270 0.6710526 0.1045589
#> Glass 3 0.0000000 0.0000000       NaN
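# The Fscore column is simply the harmonic mean of the (reweighted) Precision
# and Recall; for instance, for Glass 5 as reported above:
r <- 0.5384615                 # Recall for Glass 5
p <- 0.9333842                 # Precision for Glass 5
2 * p * r / (p + r)            # ~0.683, the Fscore reported for Glass 5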

# This is very different from the situation where glass types 1-3 are abundant!
prior2 <- c(100, 100, 100, 10, 10, 10) # Glass types 1-3 are abundant
prior(glass_conf) <- prior2
glass_conf
#> 214 items classified with 141 true positives (error rate = 34.1%)
#> with initial row frequencies:
#> Glass 1 Glass 2 Glass 3 Glass 5 Glass 6 Glass 7 
#>      70      76      17      13       9      29 
#> Rescaled to:
#>             Predicted
#> Actual        01  02  03  04  05  06 (sum) (FNR%)
#>   01 Glass 2  67  28   0   4   0   1   100     33
#>   02 Glass 1  17  83   0   0   0   0   100     17
#>   03 Glass 3  35  65   0   0   0   0   100    100
#>   04 Glass 5   3   0   0   5   0   2    10     46
#>   05 Glass 6   1   2   0   0   2   4    10     78
#>   06 Glass 7   1   1   0   0   0   8    10     21
#>   (sum)      125 178   0  10   2  15   330     50
summary(glass_conf, type = c("Fscore", "Recall", "Precision"))
#> 214 items classified with 141 true positives (error = 34.1%)
#> 
#> Global statistics on reweighted data:
#> Error rate: 49.8%, F(micro-average): 0.511, F(macro-average): 0.455
#> 
#>            Fscore    Recall Precision
#> Glass 7 0.6287055 0.7931034 0.5207600
#> Glass 2 0.5971155 0.6710526 0.5378543
#> Glass 1 0.5958663 0.8285714 0.4652113
#> Glass 5 0.5473057 0.5384615 0.5564452
#> Glass 6 0.3636364 0.2222222 1.0000000
#> Glass 3 0.0000000 0.0000000       NaN

# Prior weights can also be used to construct a matrix of relative frequencies
# In this case, all rows sum to one
prior(glass_conf) <- 1
print(glass_conf, digits = 2)
#> 214 items classified with 141 true positives (error rate = 34.1%)
#> with initial row frequencies:
#> Glass 1 Glass 2 Glass 3 Glass 5 Glass 6 Glass 7 
#>      70      76      17      13       9      29 
#> Rescaled to:
#>             Predicted
#> Actual           01     02     03     04     05     06  (sum) (FNR%)
#>   01 Glass 6   0.22   0.44   0.22   0.00   0.11   0.00   1.00  78.00
#>   02 Glass 7   0.00   0.79   0.07   0.00   0.10   0.03   1.00  21.00
#>   03 Glass 1   0.00   0.00   0.83   0.00   0.17   0.00   1.00  17.00
#>   04 Glass 3   0.00   0.00   0.65   0.00   0.35   0.00   1.00 100.00
#>   05 Glass 2   0.00   0.01   0.28   0.00   0.67   0.04   1.00  33.00
#>   06 Glass 5   0.00   0.15   0.00   0.00   0.31   0.54   1.00  46.00
#>   (sum)        0.22   1.40   2.04   0.00   1.72   0.61   6.00  49.00
# However, it is often easier to work with relative frequencies in percent,
# which also gives a more compact presentation
prior(glass_conf) <- 100
glass_conf
#> 214 items classified with 141 true positives (error rate = 34.1%)
#> with initial row frequencies:
#> Glass 1 Glass 2 Glass 3 Glass 5 Glass 6 Glass 7 
#>      70      76      17      13       9      29 
#> Rescaled to:
#>             Predicted
#> Actual        01  02  03  04  05  06 (sum) (FNR%)
#>   01 Glass 6  22  44  22   0  11   0   100     78
#>   02 Glass 7   0  79   7   0  10   3   100     21
#>   03 Glass 1   0   0  83   0  17   0   100     17
#>   04 Glass 3   0   0  65   0  35   0   100    100
#>   05 Glass 2   0   1  28   0  67   4   100     33
#>   06 Glass 5   0  15   0   0  31  54   100     46
#>   (sum)       22 140 204   0 172  61   600     49

# To reset row class frequencies to the original proportions, just assign NULL
prior(glass_conf) <- NULL
glass_conf
#> 214 items classified with 141 true positives (error rate = 34.1%)
#>             Predicted
#> Actual        01  02  03  04  05  06 (sum) (FNR%)
#>   01 Glass 3   0  11   6   0   0   0    17    100
#>   02 Glass 1   0  58  12   0   0   0    70     17
#>   03 Glass 2   0  21  51   3   0   1    76     33
#>   04 Glass 5   0   0   4   7   0   2    13     46
#>   05 Glass 6   0   2   1   0   2   4     9     78
#>   06 Glass 7   0   2   3   1   0  23    29     21
#>   (sum)        0  94  77  11   2  30   214     34
prior(glass_conf)
#> Glass 1 Glass 2 Glass 3 Glass 5 Glass 6 Glass 7 
#>      70      76      17      13       9      29
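
# The assigned value can also be a named vector, as long as the names match
# the class names in the confusion object (here, hypothetical priors
# equivalent to prior2 above), before resetting once more
prior(glass_conf) <- c(`Glass 1` = 100, `Glass 2` = 100, `Glass 3` = 100,
  `Glass 5` = 10, `Glass 6` = 10, `Glass 7` = 10)
prior(glass_conf) <- NULL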