Most metrics in supervised classification are sensitive to the relative proportions of the items in the different classes. A confusion matrix computed on a test set uses the class proportions observed in that set. If these are representative of the proportions in the population, the metrics are unbiased. When they are not, the priors of a confusion object can be adjusted to reflect the class proportions expected in the population, in order to obtain more accurate metrics.
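The adjustment amounts to rescaling each row of the confusion matrix (one row per actual class) so that it sums to the corresponding prior weight, leaving the within-row proportions untouched. A minimal sketch of that arithmetic, in Python with NumPy rather than through the package API, using a made-up 3-class matrix and priors:

```python
import numpy as np

# Hypothetical confusion matrix: rows = actual class, columns = predicted class
conf = np.array([
    [50.0,  5.0,  5.0],   # class A: 60 test items
    [ 4.0, 30.0,  6.0],   # class B: 40 test items
    [ 2.0,  3.0, 15.0],   # class C: 20 test items
])

# Suppose class A is actually rare in the population
priors = np.array([10.0, 100.0, 100.0])

# Rescale each row so that it sums to its prior weight
rescaled = conf / conf.sum(axis=1, keepdims=True) * priors[:, None]

# Rows now sum to the priors: 10, 100, 100
print(rescaled.sum(axis=1))
```

With equal priors (e.g. a single weight of 1 or 100), the same rescaling yields relative frequencies per row, as in the later examples.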
Usage
prior(object, ...)
# S3 method for confusion
prior(object, ...)
prior(object, ...) <- value
# S3 method for confusion
prior(object, ...) <- value
Arguments
- object
a confusion object (or another class if a method is implemented)
- ...
further arguments passed to methods
- value
a (named) vector of positive or zero numbers, of the same length as the number of classes in the confusion object. It can also be a single number >= 0, in which case equal weights are applied to all classes (use 1 to get relative frequencies and 100 to get relative frequencies in percent). If the value has zero length or is NULL, the original prior probabilities (those of the test set) are restored. If the vector is named, the names must match existing class names in the confusion object.
Value
prior()
returns the current class frequencies associated with
the first classification tabulated in the confusion object, i.e., for
rows in the confusion matrix.
Examples
data("Glass", package = "mlbench")
# Use a little bit more informative labels for Type
Glass$Type <- as.factor(paste("Glass", Glass$Type))
# Use learning vector quantization to classify the glass types
# (using default parameters)
summary(glass_lvq <- ml_lvq(Type ~ ., data = Glass))
#> Codebook:
#> Class RI Na Mg Al Si K
#> 39 Glass 1 1.521277 14.36424 4.0239650 0.2303602 72.02336 0.05438544
#> 32 Glass 1 1.519196 12.65659 4.8674990 0.4665660 73.85181 -0.70557100
#> 42 Glass 1 1.517072 13.05397 3.5624747 1.1790037 73.04297 0.36584618
#> 62 Glass 1 1.519485 13.65144 3.9084643 1.1892587 72.02706 0.17018101
#> 17 Glass 1 1.517766 12.73950 3.5976780 1.2351181 73.12186 0.58917126
#> 53 Glass 1 1.512427 13.32208 3.9524905 0.4489736 73.89021 -0.41723689
#> 7 Glass 1 1.517945 13.24449 3.6731433 1.0806355 73.10465 0.51905537
#> 65 Glass 1 1.521897 13.29258 3.9273652 0.7334249 72.17823 0.12908872
#> 99 Glass 2 1.517391 12.49545 3.4441722 1.0178870 73.56065 0.20786576
#> 112 Glass 2 1.527390 11.02000 0.0000000 0.7500000 73.08000 0.00000000
#> 90 Glass 2 1.516678 12.63550 4.0566619 1.7910617 73.35970 -0.21845344
#> 123 Glass 2 1.517596 13.34163 3.8999283 1.3457780 72.48911 0.51160215
#> 146 Glass 2 1.518161 12.94614 3.8968416 1.1444819 72.40279 0.61233719
#> 74 Glass 2 1.514592 14.31787 3.7701862 1.4697558 72.65758 0.01233319
#> 140 Glass 2 1.516436 13.01225 3.5639963 1.5689297 73.06020 0.44685888
#> 111 Glass 2 1.529245 12.18383 0.0000000 1.3056921 71.14729 0.23481414
#> 147 Glass 3 1.516647 13.38319 4.2593376 0.9504240 72.69514 0.12440480
#> 153 Glass 3 1.518183 13.64508 3.6166110 0.6917897 72.88964 0.09196932
#> 168 Glass 5 1.520283 11.71126 0.8515020 2.1059832 73.20731 0.56499288
#> 178 Glass 6 1.517372 14.38524 2.2219272 1.2808706 73.32370 -0.14151907
#> 203 Glass 7 1.515268 14.47779 -0.1009342 2.7167221 73.54135 -0.24604345
#> 209 Glass 7 1.514531 14.25746 -0.6027840 3.6910000 73.68328 -1.82460000
#> 214 Glass 7 1.518406 14.80140 0.1955234 1.8482661 72.96822 -0.15294447
#> Ca Ba Fe
#> 39 9.201240 -0.082865256 -0.02213713
#> 32 8.766666 0.000000000 -0.27342900
#> 42 8.617063 -0.016637285 0.07246322
#> 62 8.772069 0.177102475 -0.04529924
#> 17 8.595615 -0.001611879 0.04024410
#> 53 9.165447 -0.272307692 -0.14355668
#> 7 8.361845 -0.077109478 -0.01304638
#> 65 9.642040 0.000000000 0.08271237
#> 99 8.930357 0.015530409 0.07132736
#> 112 14.960000 0.000000000 0.00000000
#> 90 8.192072 0.000000000 0.03076215
#> 123 8.218452 -0.104701969 0.06487314
#> 146 8.644896 0.000000000 0.27817524
#> 74 8.354815 -0.925046496 0.09880781
#> 140 8.164653 -0.044621318 -0.01078383
#> 111 14.260043 0.681301667 0.15537564
#> 147 8.560674 0.000000000 0.00000000
#> 153 8.895123 0.051116040 0.08178566
#> 168 11.387505 0.012140868 0.04744348
#> 178 9.101822 -0.229954212 -0.05084183
#> 203 8.726922 0.840061733 0.02440280
#> 209 9.261000 1.530912000 0.00000000
#> 214 8.307199 1.956745508 0.02198064
# Calculate cross-validated confusion matrix
(glass_conf <- confusion(cvpredict(glass_lvq), Glass$Type))
#> 214 items classified with 141 true positives (error rate = 34.1%)
#> Predicted
#> Actual 01 02 03 04 05 06 (sum) (FNR%)
#> 01 Glass 3 0 11 6 0 0 0 17 100
#> 02 Glass 1 0 58 12 0 0 0 70 17
#> 03 Glass 2 0 21 51 3 0 1 76 33
#> 04 Glass 5 0 0 4 7 0 2 13 46
#> 05 Glass 6 0 2 1 0 2 4 9 78
#> 06 Glass 7 0 2 3 1 0 23 29 21
#> (sum) 0 94 77 11 2 30 214 34
# When the class proportions in the training set do not match those in the
# population, the statistics above are misleading. If the real proportions
# (the so-called priors) are known, one should first reweight the confusion
# matrix before calculating statistics, for instance:
prior1 <- c(10, 10, 10, 100, 100, 100) # Glass types 1-3 are rare
prior(glass_conf) <- prior1
glass_conf
#> 214 items classified with 141 true positives (error rate = 34.1%)
#> with initial row frequencies:
#> Glass 1 Glass 2 Glass 3 Glass 5 Glass 6 Glass 7
#> 70 76 17 13 9 29
#> Rescaled to:
#> Predicted
#> Actual 01 02 03 04 05 06 (sum) (FNR%)
#> 01 Glass 2 7 0 0 0 3 0 10 33
#> 02 Glass 5 31 54 0 15 0 0 100 46
#> 03 Glass 6 11 0 22 44 22 0 100 78
#> 04 Glass 7 10 3 0 79 7 0 100 21
#> 05 Glass 1 2 0 0 0 8 0 10 17
#> 06 Glass 3 4 0 0 0 6 0 10 100
#> (sum) 64 58 22 139 47 0 330 48
summary(glass_conf, type = c("Fscore", "Recall", "Precision"))
#> 214 items classified with 141 true positives (error = 34.1%)
#>
#> Global statistics on reweighted data:
#> Error rate: 48.4%, F(micro-average): 0.486, F(macro-average): 0.364
#>
#> Fscore Recall Precision
#> Glass 5 0.6829404 0.5384615 0.9333842
#> Glass 7 0.6629332 0.7931034 0.5694678
#> Glass 6 0.3636364 0.2222222 1.0000000
#> Glass 1 0.2925838 0.8285714 0.1776593
#> Glass 2 0.1809270 0.6710526 0.1045589
#> Glass 3 0.0000000 0.0000000 NaN
# This is very different from the case where glass types 1-3 are abundant!
prior2 <- c(100, 100, 100, 10, 10, 10) # Glass types 1-3 are abundant
prior(glass_conf) <- prior2
glass_conf
#> 214 items classified with 141 true positives (error rate = 34.1%)
#> with initial row frequencies:
#> Glass 1 Glass 2 Glass 3 Glass 5 Glass 6 Glass 7
#> 70 76 17 13 9 29
#> Rescaled to:
#> Predicted
#> Actual 01 02 03 04 05 06 (sum) (FNR%)
#> 01 Glass 2 67 28 0 4 0 1 100 33
#> 02 Glass 1 17 83 0 0 0 0 100 17
#> 03 Glass 3 35 65 0 0 0 0 100 100
#> 04 Glass 5 3 0 0 5 0 2 10 46
#> 05 Glass 6 1 2 0 0 2 4 10 78
#> 06 Glass 7 1 1 0 0 0 8 10 21
#> (sum) 125 178 0 10 2 15 330 50
summary(glass_conf, type = c("Fscore", "Recall", "Precision"))
#> 214 items classified with 141 true positives (error = 34.1%)
#>
#> Global statistics on reweighted data:
#> Error rate: 49.8%, F(micro-average): 0.511, F(macro-average): 0.455
#>
#> Fscore Recall Precision
#> Glass 7 0.6287055 0.7931034 0.5207600
#> Glass 2 0.5971155 0.6710526 0.5378543
#> Glass 1 0.5958663 0.8285714 0.4652113
#> Glass 5 0.5473057 0.5384615 0.5564452
#> Glass 6 0.3636364 0.2222222 1.0000000
#> Glass 3 0.0000000 0.0000000 NaN
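Note in the two summaries above that recall is identical under both priors (e.g. about 0.83 for Glass 1 in both), while precision and F-score shift: rescaling rows changes column totals but not within-row ratios, so recall (diagonal over row sums) is invariant, whereas precision (diagonal over column sums) is not. A quick numeric check of this property, in plain NumPy with a made-up matrix, not the package API:

```python
import numpy as np

# Hypothetical confusion matrix: rows = actual, columns = predicted
conf = np.array([[50.0,  5.0,  5.0],
                 [ 4.0, 30.0,  6.0],
                 [ 2.0,  3.0, 15.0]])

def recall(m):
    # Diagonal over row sums (per actual class)
    return np.diag(m) / m.sum(axis=1)

def precision(m):
    # Diagonal over column sums (per predicted class)
    return np.diag(m) / m.sum(axis=0)

priors = np.array([10.0, 100.0, 100.0])
rescaled = conf / conf.sum(axis=1, keepdims=True) * priors[:, None]

# Recall is unchanged by row rescaling...
assert np.allclose(recall(conf), recall(rescaled))
# ...but precision (and hence the F-score) depends on the priors
print(precision(conf).round(3))
print(precision(rescaled).round(3))
```

This is why adjusting priors is worthwhile: precision and F-score computed on unrepresentative test-set frequencies can differ substantially from their population values, even though recall does not.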
# Priors can also be used to construct a matrix of relative frequencies:
# with a single weight of 1, all rows sum to one
prior(glass_conf) <- 1
print(glass_conf, digits = 2)
#> 214 items classified with 141 true positives (error rate = 34.1%)
#> with initial row frequencies:
#> Glass 1 Glass 2 Glass 3 Glass 5 Glass 6 Glass 7
#> 70 76 17 13 9 29
#> Rescaled to:
#> Predicted
#> Actual 01 02 03 04 05 06 (sum) (FNR%)
#> 01 Glass 6 0.22 0.44 0.22 0.00 0.11 0.00 1.00 78.00
#> 02 Glass 7 0.00 0.79 0.07 0.00 0.10 0.03 1.00 21.00
#> 03 Glass 1 0.00 0.00 0.83 0.00 0.17 0.00 1.00 17.00
#> 04 Glass 3 0.00 0.00 0.65 0.00 0.35 0.00 1.00 100.00
#> 05 Glass 2 0.00 0.01 0.28 0.00 0.67 0.04 1.00 33.00
#> 06 Glass 5 0.00 0.15 0.00 0.00 0.31 0.54 1.00 46.00
#> (sum) 0.22 1.40 2.04 0.00 1.72 0.61 6.00 49.00
# However, relative frequencies in percent are often easier to work with
# and give a more compact presentation
prior(glass_conf) <- 100
glass_conf
#> 214 items classified with 141 true positives (error rate = 34.1%)
#> with initial row frequencies:
#> Glass 1 Glass 2 Glass 3 Glass 5 Glass 6 Glass 7
#> 70 76 17 13 9 29
#> Rescaled to:
#> Predicted
#> Actual 01 02 03 04 05 06 (sum) (FNR%)
#> 01 Glass 6 22 44 22 0 11 0 100 78
#> 02 Glass 7 0 79 7 0 10 3 100 21
#> 03 Glass 1 0 0 83 0 17 0 100 17
#> 04 Glass 3 0 0 65 0 35 0 100 100
#> 05 Glass 2 0 1 28 0 67 4 100 33
#> 06 Glass 5 0 15 0 0 31 54 100 46
#> (sum) 22 140 204 0 172 61 600 49
# To reset row class frequencies to the original proportions, just assign NULL
prior(glass_conf) <- NULL
glass_conf
#> 214 items classified with 141 true positives (error rate = 34.1%)
#> Predicted
#> Actual 01 02 03 04 05 06 (sum) (FNR%)
#> 01 Glass 3 0 11 6 0 0 0 17 100
#> 02 Glass 1 0 58 12 0 0 0 70 17
#> 03 Glass 2 0 21 51 3 0 1 76 33
#> 04 Glass 5 0 0 4 7 0 2 13 46
#> 05 Glass 6 0 2 1 0 2 4 9 78
#> 06 Glass 7 0 2 3 1 0 23 29 21
#> (sum) 0 94 77 11 2 30 214 34
prior(glass_conf)
#> Glass 1 Glass 2 Glass 3 Glass 5 Glass 6 Glass 7
#> 70 76 17 13 9 29