Need help understanding ROC and AUC
In the Classification lecture, there's a section that I need help understanding:
> Classification: ROC Curve and AUC

1. >To compute the points in an ROC curve, we could evaluate a logistic regression model many times with different classification thresholds, but this would be inefficient. Fortunately, there's an efficient, sorting-based algorithm that can provide this information for us, called AUC.

This appears to say that you don't need the points on the curve and can simply take the area under it instead. But you can't take the area under a curve you don't have. As far as I can see, there are only two ways to get a curve: a magic logistic calculation that produces it out of nowhere, or a function that approximates a set of measured points. The lecture doesn't mention any magic calculation, which leaves approximating the curve through many points, and those points can only be obtained by evaluating the model many times with different classification thresholds.

Is this statement trying to say that you don't need the curve in order to find the area under said curve? That doesn't make sense. What are they trying to say?
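One way to read the quoted passage: the curve is not skipped, but all of its points can come from a single sort of the predictions instead of one full evaluation per threshold. The sketch below assumes binary labels (1 = positive, 0 = negative), at least one example of each class, and made-up sample data; `roc_points` and `auc_trapezoid` are hypothetical names, not the course's code, and the area is taken with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR) from one sort, sweeping the threshold
    from above the highest score down past the lowest score."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    num_pos = sum(labels)
    num_neg = len(labels) - num_pos
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    i = 0
    while i < len(pairs):
        # Lower the threshold to the next distinct score; tied examples cross together.
        score = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / num_neg, tp / num_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve through the given (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical sample data: 1 = positive, 0 = negative.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.35, 0.4, 0.6, 0.8, 0.9]

pts = roc_points(labels, scores)
print(pts)
print(auc_trapezoid(pts))
```

Any threshold that falls between the same two adjacent sorted scores yields the same confusion matrix, so the points from this one pass are the only distinct points the curve has; trying more thresholds would just repeat them. On this sample data the area comes out to 8/9 (about 0.889).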

2. Figure 6 is described like this:

> AUC provides an aggregate measure of performance across all possible classification thresholds. One way of interpreting AUC is as the probability that the model ranks a random positive example more highly than a random negative example. For example, given the following examples, which are arranged from left to right in ascending order of logistic regression predictions:

The section on Logistic Regression does not explain this concept of ranking examples.

What is this "ranking"?
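A plausible reading of "ranking" is simply the order you get by sorting the examples by the model's prediction, which is the left-to-right, ascending arrangement the Figure 6 caption describes. The brute-force sketch below, with a hypothetical `pairwise_auc` helper and made-up data, prints that order and then computes AUC exactly as the quoted interpretation states: the fraction of (positive, negative) pairs in which the positive example has the higher prediction, counting ties as half. It is quadratic in the number of examples, so it is only meant for small illustrations.

```python
from itertools import product


def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive example
    gets the higher score; ties count as half a correctly ranked pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical sample data: 1 = positive, 0 = negative.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.35, 0.4, 0.6, 0.8, 0.9]

# The "ranking": (prediction, label) pairs in ascending order of prediction.
print(sorted(zip(scores, labels)))
print(pairwise_auc(labels, scores))
```

Here 8 of the 9 (positive, negative) pairs are ranked correctly; the one exception is the positive at 0.4 sitting below the negative at 0.6.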

3. > AUC is desirable for the following two reasons:
>AUC is scale-invariant. It measures how well predictions are ranked, rather than their absolute values.

This statement distinguishes between a prediction's "ranking" and its "absolute value". Predictions are probabilities, which lie in the range 0 to 1, so their absolute value is the same as the value itself. If "ranking" simply means "probability of positive", then a prediction's ranking (a probability from 0 to 1) and its absolute value (a probability from 0 to 1) are the exact same number, and the statement would contradict itself by saying, in effect, "It measures probability rather than probability."
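A likely reading is that "absolute value" means the raw number the model outputs (not |x|), and "ranking" means the order those numbers put the examples in. The sketch below, with a hypothetical `auc` helper and made-up data, applies several strictly increasing transformations to the same predictions: the numbers change, and some are no longer valid probabilities, but the order of the examples, and therefore the AUC, does not.

```python
import math


def auc(labels, scores):
    # Probability that a random positive outscores a random negative (ties = 0.5).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical sample data: 1 = positive, 0 = negative.
labels = [0, 0, 1, 0, 1, 1]
probs = [0.1, 0.35, 0.4, 0.6, 0.8, 0.9]

# Strictly increasing transformations: different values, same ordering.
squared = [p ** 2 for p in probs]
shrunk = [p / 10 for p in probs]
logits = [math.log(p / (1 - p)) for p in probs]  # no longer in the range 0 to 1

for name, s in [("probs", probs), ("squared", squared),
                ("shrunk", shrunk), ("logits", logits)]:
    print(name, [round(v, 3) for v in s], auc(labels, s))
```

All four rows report the same AUC, which is the sense in which it "measures how well predictions are ranked, rather than their absolute values".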

> AUC is classification-threshold-invariant. It measures the quality of the model's predictions irrespective of what classification threshold is chosen.

What is "quality"?
And how does it measure it?
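One reading of "quality" here is ranking quality: how consistently the model scores positive examples above negative ones. The sketch below (hypothetical `accuracy_at` and `auc` helpers, made-up data, and the assumption that an example is predicted positive when its score is at or above the threshold) contrasts a threshold-dependent metric, accuracy, with AUC, which takes no threshold argument at all.

```python
def accuracy_at(labels, scores, threshold):
    # Classify as positive when the score is at or above the threshold.
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(labels, scores):
    # No threshold parameter: every (positive, negative) pair is compared directly.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical sample data: 1 = positive, 0 = negative.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.35, 0.4, 0.6, 0.8, 0.9]

thresholds = [0.3, 0.5, 0.7]
accuracies = [accuracy_at(labels, scores, t) for t in thresholds]
print(accuracies)           # changes with the chosen threshold
print(auc(labels, scores))  # one number, independent of any threshold
```

On this data, accuracy is 4/6 at thresholds 0.3 and 0.5 and 5/6 at 0.7, while AUC stays at 8/9 because no threshold enters its calculation.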

There's a follow-up section that also needs explaining:
> Classification: Check Your Understanding (ROC and AUC)

1. > How would multiplying all of the predictions from a given model by 2.0 (for example, if the model predicts 0.4, we multiply by 2.0 to get a prediction of 0.8) change the model's performance as measured by AUC?

Problem: All predictions are probabilities that are clamped to the range 0 to 1. If you multiply all predictions by 2, then every prediction that was above 0.5 is now clamped to 1, and the predictions that were at or below 0.5 stretch to fill the space that was left. Effectively, the latter half of both axes has been chopped off, and the shape of the first half has been stretched into the range 0 to 1. The question asks how the model's performance would change, but I can't answer that, because the very shape of the curve has been altered.

Please explain what the question is getting at.
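The two readings of the multiplication can be compared directly. The sketch below, with a hypothetical `auc` helper and made-up data, scores the same examples three ways: the original predictions, the predictions multiplied by 2.0 with no clamping (so they are no longer probabilities, but their order is unchanged), and the doubled predictions clamped to 1.0 (which turns every prediction at or above 0.5 into a tie at 1.0).

```python
def auc(labels, scores):
    # Probability that a random positive outscores a random negative (ties = 0.5).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical sample data: 1 = positive, 0 = negative.
labels = [0, 0, 1, 0, 1, 1]
probs = [0.1, 0.35, 0.4, 0.6, 0.8, 0.9]

doubled = [2.0 * p for p in probs]            # order preserved; some values exceed 1
clamped = [min(1.0, 2.0 * p) for p in probs]  # values >= 0.5 collapse into ties at 1.0

print(auc(labels, probs))
print(auc(labels, doubled))
print(auc(labels, clamped))
```

On this data, the unclamped version keeps an AUC of 8/9, matching the original, while clamping lowers it to 7/9: the examples tied at 1.0 can no longer be told apart, and a tied (positive, negative) pair only counts as half.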