Coursera - Machine Learning in Production - Week 2 - Section 2 - Error analysis and performance auditing

January 3, 2025


Week 2: Modeling Challenges and Strategies


Section 2: Error analysis and performance auditing


1. Error analysis example


Speech recognition example



Useful metrics for each tag

  • What fraction of errors has that tag?
  • Of all data with that tag, what fraction is misclassified?
  • What fraction of all the data has that tag?
  • How much room for improvement is there on data with that tag?
One way to estimate the room for improvement, as seen earlier, is to measure human-level performance (HLP) on data with that tag.
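
The first three metrics can be computed directly from a tagged error-analysis sheet. The following is a minimal sketch, assuming each example is stored as a dict with a set of tags and an error flag; the field names and the sample data are hypothetical and not from the course.

```python
from collections import Counter


def tag_metrics(examples):
    """Compute per-tag metrics.

    examples: list of dicts with 'tags' (set of str) and 'error' (bool).
    This layout is an assumption made for illustration.
    """
    total = len(examples)
    total_errors = sum(1 for ex in examples if ex["error"])
    tagged = Counter()         # number of examples carrying each tag
    tagged_errors = Counter()  # number of misclassified examples carrying each tag
    for ex in examples:
        for tag in ex["tags"]:
            tagged[tag] += 1
            if ex["error"]:
                tagged_errors[tag] += 1
    return {
        tag: {
            # What fraction of errors has that tag?
            "fraction_of_errors": tagged_errors[tag] / total_errors if total_errors else 0.0,
            # Of all data with that tag, what fraction is misclassified?
            "error_rate_within_tag": tagged_errors[tag] / tagged[tag],
            # What fraction of all the data has that tag?
            "fraction_of_data": tagged[tag] / total,
        }
        for tag in tagged
    }


# Hypothetical tagged examples
examples = [
    {"tags": {"car_noise"}, "error": True},
    {"tags": {"car_noise"}, "error": False},
    {"tags": {"people_noise"}, "error": True},
    {"tags": set(), "error": False},
]
metrics = tag_metrics(examples)
```

The fourth metric (room for improvement) is not computed here because it needs an external reference such as human-level performance on the same tag.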

2. Prioritizing what to work on


Prioritizing what to work on

Type           Accuracy  Human-level performance  Gap to HLP  % of data  Gap × % of data
Clean Speech   94%       95%                      1%          60%        0.6%
Car Noise      89%       93%                      4%          4%         0.16%
People Noise   87%       89%                      2%          30%        0.6%
Low Bandwidth  70%       70%                      0%          6%         ~0%
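
The last column of the table is the gap to HLP multiplied by the share of data in that category, which estimates how much overall accuracy could rise if the category reached human-level performance. A small sketch of that arithmetic, using the numbers from the table (the function and variable names are my own):

```python
rows = [
    # (type, accuracy %, human-level performance %, % of data)
    ("Clean Speech", 94, 95, 60),
    ("Car Noise", 89, 93, 4),
    ("People Noise", 87, 89, 30),
    ("Low Bandwidth", 70, 70, 6),
]


def potential_gain(accuracy, hlp, share_of_data):
    """Overall accuracy gain (in percentage points) if this category reached HLP."""
    gap = hlp - accuracy
    return gap * share_of_data / 100


gains = {name: potential_gain(acc, hlp, share) for name, acc, hlp, share in rows}

# List categories from largest to smallest potential gain
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<14} {gain:.2f}%")
```

Car Noise has the largest gap (4%) but only 4% of the data, so its potential gain is smaller than that of Clean Speech or People Noise.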

Prioritizing what to work on

Decide on most important categories to work on based on:
  • How much room for improvement there is.
  • How frequently that category appears.
  • How easy it is to improve accuracy in that category.
  • How important it is to improve in that category.

3. Skewed datasets


Confusion matrix: Precision and Recall



What happens with print("0")?
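
To see why accuracy alone is misleading on a skewed dataset, consider a "model" that always outputs 0. The sketch below uses a made-up dataset of 10 positive and 990 negative examples (these counts are an assumption for illustration, not the numbers from the lecture):

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    # Precision is undefined (None) when the model never predicts positive
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall


# Hypothetical skewed dataset: 1% positive
y_true = [1] * 10 + [0] * 990
y_pred = [0] * len(y_true)  # the equivalent of print("0")

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall = precision_recall(y_true, y_pred)
```

Here accuracy is 99%, yet recall is 0 and precision is undefined, which is what exposes the model as useless.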



Combining precision and recall - F1 score

         Precision (P)  Recall (R)  F1
Model 1  88.3%          79.1%       83.4%
Model 2  97.0%          7.3%        13.6%
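
The F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so it is pulled toward the lower of the two values. A short sketch that reproduces the F1 column of the table (the function name is my own):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0 if both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


model_1 = f1_score(88.3, 79.1)
model_2 = f1_score(97.0, 7.3)

print(f"Model 1 F1: {model_1:.1f}%")
print(f"Model 2 F1: {model_2:.1f}%")
```

Model 2 has the higher precision, but its very low recall drags its F1 far below that of Model 1.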



Multi-class metrics

Multi-class classification problems

Classes: Scratch, Dent, Pit mark, Discoloration

Defect Type    Precision  Recall  F1
Scratch        82.1%      99.2%   89.8%
Dent           92.1%      99.5%   95.7%
Pit mark       85.3%      98.7%   91.5%
Discoloration  72.1%      97%     82.7%
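
Per-class precision, recall, and F1 can be computed by treating each class in turn as the positive class (one-vs-rest). The following is a minimal sketch, assuming one label per example; the labels and predictions are invented for illustration and do not reproduce the table:

```python
def per_class_metrics(y_true, y_pred, classes):
    """Return {class: (precision, recall, f1)} using one-vs-rest counts."""
    results = {}
    pairs = list(zip(y_true, y_pred))
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Fall back to 0.0 when a ratio is undefined
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        results[c] = (precision, recall, f1)
    return results


classes = ["Scratch", "Dent", "Pit mark", "Discoloration"]
# Hypothetical ground truth and predictions
y_true = ["Scratch", "Scratch", "Dent", "Pit mark", "Discoloration", "Dent"]
y_pred = ["Scratch", "Dent", "Dent", "Pit mark", "Scratch", "Dent"]

per_class = per_class_metrics(y_true, y_pred, classes)
for name, (p, r, f1) in per_class.items():
    print(f"{name:<14} P={p:.2f} R={r:.2f} F1={f1:.2f}")
```

Looking at F1 per class makes it easy to spot the weakest class, which in the table is Discoloration.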

4. Performance auditing


Auditing framework

Check for accuracy, fairness/bias, and other problems.
1. Brainstorm the ways the system might go wrong.
  • Performance on subsets of data (e.g., ethnicity, gender).
  • How common certain errors are (e.g., false positives, false negatives).
  • Performance on rare classes.
2. Establish metrics to assess performance against these issues on appropriate slices of data.
3. Get business/product owner buy-in.

Speech recognition example

1. Brainstorm the ways the system might go wrong.
  • Accuracy on different genders and ethnicities.
  • Accuracy on different devices.
  • Prevalence of rude mis-transcriptions.
2. Establish metrics to assess performance against these issues on appropriate slices of data.
  • Mean accuracy for different genders and major accents.
  • Mean accuracy on different devices.
  • Check for prevalence of offensive words in the output.
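
The metrics in step 2 amount to computing accuracy on slices of the data and scanning the output for unwanted words. A rough sketch, assuming each evaluation record carries slice attributes and a correctness flag; the record fields, the sample data, and the blocklist are all hypothetical:

```python
from collections import defaultdict


def accuracy_by_slice(records, key):
    """Mean accuracy per value of records[i][key]."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        totals[r[key]] += 1
        correct[r[key]] += int(r["correct"])
    return {k: correct[k] / totals[k] for k in totals}


def offensive_rate(transcripts, blocklist):
    """Fraction of transcripts containing at least one blocklisted word."""
    if not transcripts:
        return 0.0
    flagged = sum(
        1 for t in transcripts
        if any(word in blocklist for word in t.lower().split())
    )
    return flagged / len(transcripts)


# Hypothetical evaluation records
records = [
    {"device": "phone", "accent": "A", "correct": True},
    {"device": "phone", "accent": "B", "correct": False},
    {"device": "smart_speaker", "accent": "A", "correct": True},
    {"device": "smart_speaker", "accent": "B", "correct": True},
]
by_device = accuracy_by_slice(records, "device")
by_accent = accuracy_by_slice(records, "accent")

# "badword" stands in for a real blocklist entry
rate = offensive_rate(["turn on the lights", "you badword"], {"badword"})
```

A large difference between slices (here, accent B versus accent A) is the kind of issue the audit is meant to surface before deployment.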


