Coursera - Machine Learning in Production - Week 2 - Section 4

Coursera - Machine Learning in Production - Week 2 - Section 4 - Modeling challenges

January 6, 2025

n = 100 examples

	Actual YES	Actual NO
Predicted YES	TRUE POSITIVES (TP)	FALSE POSITIVES (FP)
Predicted NO	FALSE NEGATIVES (FN)	TRUE NEGATIVES (TN)

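Worked example for Question 2 (the algorithm is print("1")):
Actual negatives: FP + TN = 98, so actual positives: TP + FN = 2
Every example is predicted YES: TP + FP = 100 => FN = 0 and TN = 0, so TP = 2 and FP = 98
Recall = TP/(TP+FN) = 2/(2 + 0) = 100%
Precision = TP/(TP+FP) = 2/(2 + 98) = 2%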
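The same arithmetic can be checked with a short Python sketch. It assumes labels are encoded as 1 (has the disease) and 0 (does not), and the helper names confusion_counts and precision_recall are illustrative, not from the course.

```python
# Minimal sketch: confusion-matrix counts and metrics for the trivial
# classifier print("1") on n = 100 examples (2 positives, 98 negatives).

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 = has the disease."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall(tp, fp, fn):
    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1] * 2 + [0] * 98      # 2% positive, 98% negative
y_pred = [1] * len(y_true)       # print("1"): predict YES for everyone

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(tp, fp, fn)
print(tp, fp, fn, tn)                                     # 2 98 0 0
print(f"recall={recall:.0%} precision={precision:.0%}")  # recall=100% precision=2%
```

Recall is perfect only because no positive can ever be missed when everything is predicted positive; the 2% precision exposes the trick.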

Question 1
You are working on a binary classification ML algorithm that detects whether a patient has a specific disease. In your dataset, 98% of the training examples (patients) don't have the disease, so the dataset is very skewed. Accuracy on both positive and negative classes is important. You read a research paper claiming to have developed a system that achieves 95% on ____ metric. What metric would give you the most confidence they've built a useful and non-trivial system? (Select one)

Accuracy
Precision
Recall
F1 score
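To see why accuracy is a weak signal on such a skewed dataset, here is a small Python sketch comparing two trivial baselines. It assumes the same 2-positive / 98-negative split as the notes above; the metrics helper is hypothetical and uses the standard definition of F1 as the harmonic mean of precision and recall.

```python
# Minimal sketch: accuracy vs. F1 for constant predictors on a 98%-negative dataset.
# F1 = 2 * P * R / (P + R), defined as 0 when P + R = 0.

def metrics(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

y_true = [1] * 2 + [0] * 98   # 2 patients with the disease, 98 without

for name, constant in [('print("0")', 0), ('print("1")', 1)]:
    acc, p, r, f1 = metrics(y_true, [constant] * len(y_true))
    print(f"{name}: accuracy={acc:.1%} precision={p:.1%} recall={r:.1%} F1={f1:.1%}")
```

print("0") scores 98.0% accuracy with 0.0% F1, and print("1") gets 100.0% recall with only about 3.9% F1. Because F1 stays low unless both precision and recall are high, a 95% F1 is much harder to reach with a trivial system than 95% on any of the other three metrics.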

Question 2
In the previous problem, with 98% negative examples, suppose your algorithm is print("1") (i.e., it says everyone has the disease). Which of these statements is true?

The algorithm achieves 0% recall.
The algorithm achieves 0% precision.
The algorithm achieves 100% recall.
The algorithm achieves 100% precision.

Question 3
True or False? During error analysis, each example should only be assigned one tag. For example, in a speech recognition application you may have the tags: "car noise", "people noise" and "low bandwidth". If you encounter an example with both car noise and low bandwidth audio, you should use your judgement to assign just one of these two tags rather than apply both tags.

False
True
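One way to keep tags non-exclusive during error analysis is to store a set of tags per example and count each tag independently. The Python sketch below assumes a plain dictionary of hypothetical clip IDs and tags; it is an illustration, not a tool from the course.

```python
# Minimal sketch: each error example carries a *set* of tags, so one clip can be
# tagged both "car noise" and "low bandwidth". Clip IDs and tags are hypothetical.
from collections import Counter

error_examples = {
    "clip_01": {"car noise", "low bandwidth"},
    "clip_02": {"people noise"},
    "clip_03": {"car noise"},
    "clip_04": {"low bandwidth"},
}

tag_counts = Counter(tag for tags in error_examples.values() for tag in tags)
total = len(error_examples)
for tag, count in tag_counts.most_common():
    # Percentages can sum to more than 100% because tags are not mutually exclusive.
    print(f"{tag}: {count}/{total} errors ({count / total:.0%})")
```

Forcing a single tag on clip_01 would undercount one of its two real problems, which would skew the estimate of how much each category contributes to the errors.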

Question 4
You are building a visual inspection system. Error analysis finds:

Type of defect	Accuracy	HLP (human-level performance)	% of data
Scratch	95%	98%	50%
Discoloration	90%	90%	50%

Based on this, what is the more promising type of defect to work on?

Discoloration, because the algorithm's accuracy is lower and thus there's more room for improvement.
Discoloration, because HLP is lower which suggests this is therefore the harder problem that thus needs more attention.
Scratch defects, because the gap to HLP is higher and thus there’s more room for improvement.
Work on both classes equally because they are each 50% of the data.
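A common back-of-the-envelope check, assuming HLP is a reasonable ceiling for each category, is to multiply the gap to HLP by the share of data that category represents. This Python sketch applies that assumption to the numbers in the table; the dictionary layout is illustrative.

```python
# Minimal sketch: potential overall accuracy gain per defect type, estimated as
# (HLP - current accuracy) * fraction of data, using the numbers from the table.

defects = {
    # type: (accuracy, human-level performance, fraction of data)
    "Scratch": (0.95, 0.98, 0.50),
    "Discoloration": (0.90, 0.90, 0.50),
}

potential_gain = {
    name: (hlp - acc) * share for name, (acc, hlp, share) in defects.items()
}

for name, gain in sorted(potential_gain.items(), key=lambda kv: kv[1], reverse=True):
    acc, hlp, _ = defects[name]
    print(f"{name}: gap to HLP={hlp - acc:.1%}, potential overall gain={gain:.2%}")
```

Scratch has a 3.0% gap to HLP on half of the data (about 1.50% potential overall gain), while Discoloration already matches HLP, so there is little evidence that further work there will pay off.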

Question 5
You're considering applying data augmentation to a phone visual inspection problem. Which of the following statements are true about data augmentation? (Select all that apply)

Data augmentation should try to generate more examples in the parts of the input space where you'd like to see improvement in the algorithm's performance.
Data augmentation should distort the input enough that the resulting examples are hard for humans to classify as well.
GANs can be used for data augmentation.
Data augmentation should try to generate more examples in the parts of the input space where the algorithm is already doing well and there's no need for improvement.
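The targeted-augmentation idea can be sketched in Python with tiny grayscale images stored as nested lists. Everything here is a hypothetical assumption for illustration: the category names, the pixel values, and the choice of a horizontal flip plus a mild brightness shift as the transforms. A GAN-based generator would be an alternative source of examples and is not shown.

```python
# Minimal sketch of targeted data augmentation: generate extra examples only for a
# category where the model underperforms. Images are tiny grayscale grids (0-255);
# category names, transforms, and pixel values are hypothetical.
import random

def flip_horizontal(img):
    return [row[::-1] for row in img]

def adjust_brightness(img, delta):
    # Clip to the valid pixel range so the result still looks like a real image.
    return [[min(255, max(0, px + delta)) for px in row] for row in img]

def augment(img, rng):
    # Mild distortions keep the defect recognizable to a human inspector.
    out = flip_horizontal(img) if rng.random() < 0.5 else img
    return adjust_brightness(out, rng.randint(-30, 30))

dataset = {
    "scratch": [[[10, 200, 10], [10, 200, 10]]],   # model already does well here
    "pit_mark": [[[50, 50, 50], [50, 0, 50]]],      # category that needs improvement
}
needs_improvement = {"pit_mark"}
rng = random.Random(0)

augmented = {
    label: imgs + ([augment(img, rng) for img in imgs for _ in range(3)]
                   if label in needs_improvement else [])
    for label, imgs in dataset.items()
}
print({label: len(imgs) for label, imgs in augmented.items()})  # {'scratch': 1, 'pit_mark': 4}
```

Only the underperforming category grows, and the transforms stay small so that a human could still label every generated image.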



Sky Cone
