ROC

ROC Curve

An ROC curve is a graph showing the performance of a classification model at all classification thresholds (e.g. for logistic regression the default threshold is 0.5; we can adjust it to, say, 0.8 and assign examples to classes using that threshold). The curve plots two parameters:

  • True Positive Rate: \(\text{TPR} = \text{sensitivity} = \frac{\text{TP}}{\text{TP} + \text{FN}}\)
  • False Positive Rate: \(\text{FPR} = 1 - \text{specificity} = 1 - \frac{\text{TN}}{\text{TN} + \text{FP}} = \frac{\text{FP}}{\text{TN} + \text{FP}}\)

An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the classification threshold (e.g. to 0.1 for logistic regression) classifies more items as positive, thus increasing both false positives and true positives.
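
For concreteness, here is a minimal NumPy sketch (not from the referenced article; the labels and scores are made-up toy data) that computes one (FPR, TPR) point per threshold by sweeping the threshold over the predicted scores:

```python
import numpy as np

def roc_points(y_true, y_score, thresholds):
    """Return one (FPR, TPR) point per classification threshold."""
    points = []
    for t in thresholds:
        y_pred = (y_score >= t).astype(int)          # classify with threshold t
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        tn = np.sum((y_pred == 0) & (y_true == 0))
        tpr = tp / (tp + fn)                         # sensitivity
        fpr = fp / (fp + tn)                         # 1 - specificity
        points.append((fpr, tpr))
    return points

# Made-up labels and predicted scores for illustration.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.5])
print(roc_points(y_true, y_score, thresholds=[0.9, 0.5, 0.1]))
```

With this toy data, the lowest threshold (0.1) labels every example positive, pushing both TPR and FPR to 1.0.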

To compute the points on an ROC curve, we could evaluate the classification model repeatedly, once per threshold, but this is inefficient. Fortunately, there is an efficient, sorting-based algorithm that can provide this information, called AUC.

AUC

AUC stands for Area Under the ROC Curve. It measures the entire two-dimensional area underneath the ROC curve. Since \(TPR \in [0, 1]\) and \(FPR \in [0, 1]\), AUC can be interpreted as a probability: it is the probability that the model ranks a random positive example more highly than a random negative example. Equivalently, if we arrange the predictions from left to right by predicted score, AUC is the probability that a random positive example is positioned to the right of a random negative example.
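
A small sketch of this pairwise interpretation, reusing the made-up toy data from above: AUC is computed as the fraction of (positive, negative) pairs in which the positive example receives the higher score, counting ties as half.

```python
import numpy as np

def auc_by_pairs(y_true, y_score):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()    # positive scored higher
    ties = (pos[:, None] == neg[None, :]).sum()   # a tie counts as half a win
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Made-up labels and predicted scores for illustration.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.5])
print(auc_by_pairs(y_true, y_score))  # 0.875 for this toy data
```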

A model whose predictions are 100% wrong has an AUC of 0.0; a model whose predictions are 100% correct has an AUC of 1.0.


Properties

AUC is desirable for the following two reasons:

  • Scale-invariant: It measures how well predictions are ranked, rather than their absolute values.
  • Classification-threshold-invariant: It measures the quality of the model's predictions irrespective of what classification threshold is chosen. (When we draw the ROC curve we vary the threshold over all values, so the only thing that matters to AUC is the predicted scores, not the predicted classes, and how positive examples rank against negative examples with respect to those scores; see the sketch after this list.)
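
The sketch below (using scikit-learn's roc_auc_score and the same made-up toy data) illustrates both properties: any strictly increasing transform of the scores leaves the ranking, and therefore the AUC, unchanged, and no threshold appears anywhere in the computation.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up labels and predicted scores for illustration.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.5])

print(roc_auc_score(y_true, y_score))            # raw scores
print(roc_auc_score(y_true, y_score ** 3))       # monotone transform: same AUC
print(roc_auc_score(y_true, 10 * y_score - 2))   # affine rescale: same AUC
```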

However, AUC is not desirable in the following cases:

  1. Sometimes probabilities matter: AUC does not tell you how well calibrated the predicted probabilities are (see the sketch after this list).
  2. Sometimes the classification threshold matters: when there are wide disparities in the cost of false negatives vs. false positives, it may be critical to minimize one type of classification error. For example, in email spam detection you likely want to prioritize minimizing false positives (even if that results in a significant increase in false negatives). AUC isn't a useful metric for this type of optimization.
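
To illustrate the first point, here is a sketch (made-up data; the cube transform is just one way to distort probabilities) comparing two score vectors with identical ranking: the AUC is the same, but the Brier score, which does measure probability quality, differs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss

# Made-up labels; cubing the scores keeps the ranking but distorts the probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
p_good = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.5])
p_bad = p_good ** 3

print(roc_auc_score(y_true, p_good), roc_auc_score(y_true, p_bad))        # identical AUC
print(brier_score_loss(y_true, p_good), brier_score_loss(y_true, p_bad))  # calibration differs
```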

Ref

https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc