Coursera - Machine Learning in Production - Week 2 - Section 2 - Error analysis and performance auditing
January 3, 2025
Week 2: Modeling Challenges and Strategies
Section 2: Error analysis and performance auditing
1. Error analysis example
Speech recognition example
Useful metrics for each tag
- What fraction of errors has that tag?
- Of all data with that tag, what fraction is misclassified?
- What fraction of all the data has that tag?
- How much room for improvement is there on data with that tag?
One way to estimate the room for improvement, which you have already seen, is to measure human-level performance (HLP) on the data with that tag.
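A minimal sketch of how these per-tag fractions might be computed; the `tags`/`is_error` record layout is a hypothetical assumption for illustration, not something prescribed by the course:

```python
from collections import defaultdict

def per_tag_metrics(examples):
    """Per-tag error-analysis fractions described above.

    `examples` is assumed to be a list of dicts like
    {"tags": ["car_noise"], "is_error": True}  (hypothetical layout).
    """
    n_total = len(examples)
    n_errors = sum(e["is_error"] for e in examples)
    tag_total = defaultdict(int)    # examples carrying each tag
    tag_errors = defaultdict(int)   # misclassified examples carrying each tag

    for e in examples:
        for tag in e["tags"]:
            tag_total[tag] += 1
            tag_errors[tag] += int(e["is_error"])

    report = {}
    for tag in tag_total:
        report[tag] = {
            "fraction_of_errors_with_tag": tag_errors[tag] / max(n_errors, 1),
            "error_rate_on_tag": tag_errors[tag] / tag_total[tag],
            "fraction_of_data_with_tag": tag_total[tag] / n_total,
        }
    # Room for improvement on each tag would additionally require measuring
    # human-level performance (HLP) on the data with that tag.
    return report
```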
2. Prioritizing what to work on
| Type | Accuracy | Human level performance | Gap to HLP | % of data | Gap × % of data |
| --- | --- | --- | --- | --- | --- |
| Clean Speech | 94% | 95% | 1% | 60% | 0.6% |
| Car Noise | 89% | 93% | 4% | 4% | 0.16% |
| People Noise | 87% | 89% | 2% | 30% | 0.6% |
| Low Bandwidth | 70% | 70% | 0% | 6% | ~0% |
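The last column is just the gap to HLP multiplied by the fraction of data in that category, i.e., the most that overall accuracy could improve by closing that gap. A minimal sketch of that calculation (numbers copied from the table; this ranking only captures room for improvement and frequency, not ease or importance):

```python
# Estimate how much overall accuracy could improve by closing the
# gap to human-level performance (HLP) in each category.
categories = {
    # name: (model accuracy, human-level performance, fraction of data)
    "Clean Speech":  (0.94, 0.95, 0.60),
    "Car Noise":     (0.89, 0.93, 0.04),
    "People Noise":  (0.87, 0.89, 0.30),
    "Low Bandwidth": (0.70, 0.70, 0.06),
}

potential = {
    name: (hlp - acc) * frac
    for name, (acc, hlp, frac) in categories.items()
}

# Rank categories by potential overall improvement (largest first).
for name, gain in sorted(potential.items(), key=lambda kv: -kv[1]):
    print(f"{name:14s} potential gain: {gain:.2%}")
```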
Decide on the most important categories to work on based on:
- How much room for improvement there is.
- How frequently that category appears.
- How easy it is to improve accuracy in that category.
- How important it is to improve in that category.
3. Skewed datasets
Confusion matrix: Precision and Recall
What happens with print("0")? On a heavily skewed dataset (the vast majority of examples are y = 0), a program that simply prints 0 for every input achieves very high accuracy while learning nothing; its recall on the positive class is 0, which the confusion matrix makes obvious. This is why accuracy alone is misleading on skewed data.
Combining precision and recall - F1 score
F1 = 2 / (1/P + 1/R) = 2PR / (P + R), the harmonic mean of precision and recall, so a very low value of either metric drags F1 down (see Model 2 below).
| Model | Precision (P) | Recall (R) | F1 |
| --- | --- | --- | --- |
| Model 1 | 88.3% | 79.1% | 83.4% |
| Model 2 | 97.0% | 7.3% | 13.6% |
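A minimal sketch (binary labels, with 1 as the rare positive class; the 99%/1% split is made up for illustration) showing how precision, recall, and F1 expose the weakness of the print("0") baseline:

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0  # TP / (TP + FP)
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0     # TP / (TP + FN)
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) > 0 else 0.0)           # harmonic mean
    return precision, recall, f1

# Skewed toy dataset: 99% negatives, 1% positives (hypothetical numbers).
y_true = [0] * 990 + [1] * 10
always_zero = [0] * 1000          # the print("0") baseline

accuracy = sum(t == p for t, p in zip(y_true, always_zero)) / len(y_true)
print(accuracy)                                 # 0.99 -- looks great
print(precision_recall_f1(y_true, always_zero)) # (0.0, 0.0, 0.0) -- useless
```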
Multi-class metrics
Example: a multi-class defect-detection problem with classes Scratch, Dent, Pit mark, and Discoloration.
| Defect Type | Precision | Recall | F1 |
| --- | --- | --- | --- |
| Scratch | 82.1% | 99.2% | 89.8% |
| Dent | 92.1% | 99.5% | 95.7% |
| Pit mark | 85.3% | 98.7% | 91.5% |
| Discoloration | 72.1% | 97.0% | 82.7% |
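One way to get a per-class table like the one above is scikit-learn's classification_report (the course does not prescribe a library; the labels below are made-up examples):

```python
from sklearn.metrics import classification_report

classes = ["Scratch", "Dent", "Pit mark", "Discoloration"]

# Hypothetical ground-truth and predicted labels, for illustration only.
y_true = ["Scratch", "Dent", "Pit mark", "Discoloration", "Scratch", "Dent"]
y_pred = ["Scratch", "Dent", "Pit mark", "Scratch",       "Scratch", "Dent"]

# Per-class precision, recall, and F1 in one call; each defect type is
# treated as the positive class in turn (one-vs-rest).
print(classification_report(y_true, y_pred, labels=classes, zero_division=0))
```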
4. Performance auditing
Auditing framework
Check for accuracy, fairness/bias, and other problems.
1. Brainstorm the ways the system might go wrong.
- Performance on subsets of data (e.g., ethnicity, gender).
- How common certain types of errors are (e.g., false positives (FP) and false negatives (FN)).
- Performance on rare classes.
2. Establish metrics to assess performance against these issues on appropriate slices of data.
3. Get business/product owner buy-in.
Speech recognition example
1. Brainstorm the ways the system might go wrong.
- Accuracy on different genders and ethnicities.
- Accuracy on different devices.
- Prevalence of rude mis-transcriptions.
2. Establish metrics to assess performance against these issues:
- Mean accuracy for different genders and major accents.
- Mean accuracy on different devices.
- Prevalence of offensive words in the output.
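A small sketch of computing mean accuracy on slices of the evaluation set, as in the metrics above (the `device`/`accent`/`correct` field names are hypothetical):

```python
from collections import defaultdict

def accuracy_by_slice(examples, slice_key):
    """Mean accuracy per slice (e.g., device type or accent).

    `examples` is assumed to be a list of dicts with a `correct` boolean
    and metadata fields such as "device" or "accent" (hypothetical layout).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for e in examples:
        group = e[slice_key]
        totals[group] += 1
        correct[group] += int(e["correct"])
    return {g: correct[g] / totals[g] for g in totals}

# Example usage with made-up data:
eval_set = [
    {"device": "phone", "accent": "A", "correct": True},
    {"device": "phone", "accent": "B", "correct": False},
    {"device": "laptop", "accent": "A", "correct": True},
]
print(accuracy_by_slice(eval_set, "device"))   # {'phone': 0.5, 'laptop': 1.0}
print(accuracy_by_slice(eval_set, "accent"))   # {'A': 1.0, 'B': 0.0}
```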