A confusion matrix is a technique for summarizing the performance of a classification algorithm. Classification accuracy alone can be misleading if you have an unequal number of observations in each class or if you have more than two classes in your data set. Calculating a confusion matrix can give you a better idea of what your classification model is getting right and what types of errors it is making.


Classification Accuracy and its Limitations:

Classification Accuracy = Correct Predictions/Total Predictions


The main problem with classification accuracy is that it hides the detail you need to better understand the performance of your classification model. Below are two examples:

 

1.  When your data has more than two classes. With three or more classes you may get a classification accuracy of 80%, but you don’t know whether that is because all classes are being predicted equally well or because one or two classes are being neglected by the model.

 

2.  When your data does not have an even number of observations in each class. You may achieve an accuracy of 90% or more, but this is not a good score if 90 of every 100 records belong to one class, because you can reach that score simply by always predicting the most common class value.
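
Both pitfalls can be seen in a small sketch. The labels below are hypothetical (90 records of class "a" and 10 of class "b"), and the helper names accuracy and per_class_recall are illustrative, not taken from any library. A model that always predicts the most common class scores 90% accuracy while never identifying class "b":

```python
# Hypothetical imbalanced data set: 90 records of class "a", 10 of class "b".
actual = ["a"] * 90 + ["b"] * 10
# A trivial "model" that always predicts the most common class.
predicted = ["a"] * len(actual)


def accuracy(actual, predicted):
    """Fraction of predictions that match the true labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def per_class_recall(actual, predicted):
    """For each true class, the fraction of its records predicted correctly."""
    recall = {}
    for label in sorted(set(actual)):
        hits = sum(a == label and p == label for a, p in zip(actual, predicted))
        recall[label] = hits / actual.count(label)
    return recall


overall = accuracy(actual, predicted)
by_class = per_class_recall(actual, predicted)
print(overall)   # 0.9
print(by_class)  # {'a': 1.0, 'b': 0.0}
```

The single accuracy figure looks respectable, but the per-class breakdown shows that class "b" is never predicted correctly.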

 

Classification accuracy can hide the detail you need to diagnose the performance of your model. But thankfully we can tease apart this detail by using a confusion matrix.


Confusion Matrix Terminology:

A confusion matrix is a table that is often used to describe the performance of a classification model on a set of test data for which the true values are known.

 

Let’s start with an example for a binary classifier:


N=165          Predicted no    Predicted yes
Actual no      50              10
Actual yes     5               100


What can we learn from the confusion matrix?

 

There are two possible predicted classes: "yes" and "no". If we were predicting the presence of a disease, for example, "yes" would mean they have the disease, and "no" would mean they don't have the disease.

 

  • The classifier made a total of 165 predictions (e.g., 165 patients were being tested for the presence of that disease).
  • Out of those 165 cases, the classifier predicted "yes" 110 times, and "no" 55 times.
  • In reality, 105 patients in the sample have the disease, and 60 patients do not.
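
These totals follow directly from the four cells of the matrix. As a minimal sketch, assuming the cells are stored in a nested dictionary indexed as matrix[actual][predicted] (a layout chosen here only for illustration), the row and column sums can be checked like this:

```python
# Cell counts from the example, indexed as matrix[actual][predicted].
matrix = {
    "no": {"no": 50, "yes": 10},
    "yes": {"no": 5, "yes": 100},
}

# Grand total: every prediction the classifier made.
total = sum(sum(row.values()) for row in matrix.values())

# Column sums: how often each class was predicted.
predicted_yes = sum(row["yes"] for row in matrix.values())
predicted_no = sum(row["no"] for row in matrix.values())

# Row sums: how often each class actually occurs.
actual_yes = sum(matrix["yes"].values())
actual_no = sum(matrix["no"].values())

print(total)                        # 165
print(predicted_yes, predicted_no)  # 110 55
print(actual_yes, actual_no)        # 105 60
```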


Let's now define the most basic terms, which are whole numbers (not rates):

 

True positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.

 

True negatives (TN): We predicted no, and they don't have the disease.

 

False positives (FP): We predicted yes, but they don't actually have the disease. (Also known as a "Type I error.")

 

False negatives (FN): We predicted no, but they actually do have the disease. (Also known as a "Type II error.")
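
The four definitions above can be sketched as a simple tally over paired labels. The function name confusion_counts and the label lists are assumptions made for illustration; the lists are constructed only so that they reproduce the counts of the 165-patient example.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive="yes"):
    """Tally TP, TN, FP and FN for a binary classifier."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted yes: correct if the patient has the disease.
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted no: correct if the patient does not have it.
            counts["FN" if a == positive else "TN"] += 1
    return counts


# Labels built to match the example: 60 actual "no", then 105 actual "yes".
actual = ["no"] * 60 + ["yes"] * 105
predicted = ["no"] * 50 + ["yes"] * 10 + ["no"] * 5 + ["yes"] * 100

counts = confusion_counts(actual, predicted)
print(counts["TP"], counts["TN"], counts["FP"], counts["FN"])  # 100 50 10 5
```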


 

N=165          Predicted No    Predicted Yes    Total
Actual No      TN=50           FP=10            60
Actual Yes     FN=5            TP=100           105
Total          55              110

 

 


This is a list of rates that are often computed from a confusion matrix for a binary classifier:


Accuracy: Overall, how often is the classifier correct?

1. (TP+TN)/total = (100+50)/165 = 0.91


Misclassification Rate: Overall, how often is it wrong?

1. (FP+FN)/total = (10+5)/165 = 0.09

2. Equivalent to 1 minus Accuracy

3. Also known as "Error Rate"

 

True Positive Rate: When it's actually yes, how often does it predict yes?

1. TP/actual yes = 100/105 = 0.95

2. Also known as "Sensitivity" or "Recall"

 

False Positive Rate: When it's actually no, how often does it predict yes?

1. FP/actual no = 10/60 = 0.17

 

Specificity: When it's actually no, how often does it predict no?

1. TN/actual no = 50/60 = 0.83

2. Equivalent to 1 minus False Positive Rate

 

Precision: When it predicts yes, how often is it correct?

1. TP/predicted yes = 100/110 = 0.91

 

Prevalence: How often does the yes condition actually occur in our sample?

1. Actual yes/total = 105/165 = 0.64
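
As a quick check, the rates listed above can be recomputed from the four counts. This is a minimal sketch in plain Python; the dictionary keys are illustrative names, not a library API.

```python
# Counts from the example confusion matrix.
TP, TN, FP, FN = 100, 50, 10, 5

total = TP + TN + FP + FN   # 165 predictions
actual_yes = TP + FN        # 105 patients with the disease
actual_no = TN + FP         # 60 patients without it
predicted_yes = TP + FP     # 110 "yes" predictions

rates = {
    "accuracy": (TP + TN) / total,
    "misclassification_rate": (FP + FN) / total,
    "true_positive_rate": TP / actual_yes,   # sensitivity / recall
    "false_positive_rate": FP / actual_no,
    "specificity": TN / actual_no,
    "precision": TP / predicted_yes,
    "prevalence": actual_yes / total,
}

for name, value in rates.items():
    print(f"{name}: {value:.2f}")
# accuracy: 0.91
# misclassification_rate: 0.09
# true_positive_rate: 0.95
# false_positive_rate: 0.17
# specificity: 0.83
# precision: 0.91
# prevalence: 0.64
```

The two equivalences noted in the list also hold here: the misclassification rate equals 1 minus accuracy, and specificity equals 1 minus the false positive rate.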