Coursera - Machine Learning in Production - Week 1 - Section 3 - Deployment
January 3, 2025
Week 1: Overview of the ML Lifecycle and Deployment
Section 3: Deployment
1. Key challenges
Software engineering issues
Checklist of questions
- Realtime or Batch
- Cloud vs. Edge/Browser
- Compute resources (CPU/GPU/memory)
- Latency, throughput (QPS)
- Logging
- Security and privacy
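A minimal sketch of how the checklist answers might be recorded as a deployment config; the class and field names below are illustrative, not from the course.

```python
from dataclasses import dataclass

@dataclass
class DeploymentConfig:
    """Hypothetical record of the checklist decisions above."""
    realtime: bool            # realtime serving vs. batch jobs
    target: str               # "cloud", "edge", or "browser"
    needs_gpu: bool           # compute resources: CPU-only vs. GPU
    max_latency_ms: int       # latency budget per request
    target_qps: int           # required throughput (queries per second)
    log_inputs: bool          # whether raw inputs may be logged
    inputs_contain_pii: bool  # security/privacy consideration

# Example: a cloud-served speech system with a 500 ms latency budget.
config = DeploymentConfig(
    realtime=True, target="cloud", needs_gpu=True,
    max_latency_ms=500, target_qps=1000,
    log_inputs=True, inputs_contain_pii=True,
)
print(config)
```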
2. Deployment patterns
Common deployment cases
1. New product/capability
2. Automate/assist with manual task
3. Replace previous ML system
Key ideas:
- Gradual ramp up with monitoring
- Rollback
Shadow mode deployment
- The ML system shadows the human and runs in parallel.
- The ML system's output is not used for any decisions during this phase.
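A minimal sketch of shadow mode, with hypothetical stand-ins for the human inspector and the model: the model runs on every example and its prediction is logged for comparison, but the returned decision is always the human's.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)

def human_decision(example):
    # Stand-in for the human inspector's judgment (hypothetical).
    return example["defect_score"] > 0.7

def shadow_model(example):
    # Stand-in for the ML model running in shadow mode (hypothetical).
    return example["defect_score"] > 0.5

def inspect(example):
    human = human_decision(example)
    shadow = shadow_model(example)   # runs in parallel with the human
    logging.info(json.dumps(
        {"human": human, "shadow": shadow, "agree": human == shadow}))
    return human                     # only the human's output drives decisions

inspect({"defect_score": 0.6})  # logs a disagreement; the decision stays human
```

Comparing the logged shadow predictions against the human labels is what lets you judge whether the model is good enough to start serving real traffic.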
Canary deployment
- Roll out to small fraction (say 5%) of traffic initially.
- Monitor system and ramp up traffic gradually.
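A minimal sketch of canary routing with hypothetical old/new model handlers: a small fraction of requests goes to the new version, and the fraction is raised as the monitored metrics stay healthy.

```python
import random

CANARY_FRACTION = 0.05   # start with ~5% of traffic

def old_model(x):
    return "old:" + x    # placeholder for the current production model

def new_model(x):
    return "new:" + x    # placeholder for the candidate model

def serve(x):
    # Send a small random fraction of requests to the canary.
    if random.random() < CANARY_FRACTION:
        return new_model(x)
    return old_model(x)

# Ramp up gradually (5% -> 20% -> 50% -> 100%) while monitoring;
# on a bad metric, set CANARY_FRACTION back to 0 to roll back.
print(serve("query"))
```

In practice the routing decision is usually keyed on a stable hash of a user or request ID rather than a fresh random draw, so each user consistently sees one version.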
Blue/green deployment
- Old: blue version; new: green version.
- A router sends traffic to the blue version; deploying means switching the router over to green. Rollback is easy: just switch traffic back to blue.
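A minimal sketch of the blue/green switch (names illustrative): both versions stay running, the router reads a single pointer, and rollback is flipping the pointer back.

```python
# Both versions stay deployed; only the router's pointer changes.
deployments = {
    "blue": lambda x: "blue prediction for " + x,    # old version, kept warm
    "green": lambda x: "green prediction for " + x,  # new version
}
live = "blue"

def route(x):
    return deployments[live](x)

def switch_to(env):
    global live
    live = env               # instant cutover; switch_to("blue") rolls back

print(route("img.png"))      # served by blue
switch_to("green")           # cut traffic over to the new version
print(route("img.png"))      # served by green
```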
3. Monitoring
Metric types:
- Software metrics
- Input metrics
- Output metrics
Examples of metrics to track
Software metrics: Memory, compute, latency, throughput, server load
Input metrics:
- Avg input length
- Avg input volume
- Num missing values
- Avg image brightness
Output metrics:
- # times return "" (null)
- # times user redoes search
- # times user switches from speech to typing
- CTR (click-through rate)
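A minimal sketch of turning a few of these metrics into alarms; the metric names, record fields, and thresholds are illustrative, chosen from a healthy baseline period.

```python
from statistics import mean

# Hypothetical acceptable ranges, set by observing a healthy baseline.
THRESHOLDS = {
    "avg_input_length": (1.0, 30.0),    # e.g. seconds of audio
    "null_return_rate": (0.0, 0.05),    # fraction of "" responses
    "missing_value_rate": (0.0, 0.10),
}

def check_metrics(window):
    """Compare one window of logged requests against the thresholds."""
    metrics = {
        "avg_input_length": mean(r["input_len"] for r in window),
        "null_return_rate": mean(r["output"] == "" for r in window),
        "missing_value_rate": mean(r["missing"] for r in window),
    }
    for name, value in metrics.items():
        lo, hi = THRESHOLDS[name]
        if not lo <= value <= hi:
            print(f"ALERT: {name}={value:.3f} outside [{lo}, {hi}]")

# One logged request whose input is far longer than the baseline.
check_metrics([{"input_len": 45.0, "output": "", "missing": False}])
```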
Model maintenance
- Manual retraining (far more common in practice)
- Automatic retraining
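A minimal sketch of the distinction, assuming a hypothetical drift check: with automatic retraining the system kicks off a retrain job itself, while manual retraining only raises an alert for an engineer to review.

```python
def drift_detected(metrics):
    # Hypothetical check: has the input distribution shifted vs. baseline?
    return metrics["avg_brightness_delta"] > 0.2

def retrain_and_deploy():
    print("retraining on recent data and redeploying...")  # placeholder

def maintain(metrics, automatic=False):
    if not drift_detected(metrics):
        return
    if automatic:
        retrain_and_deploy()   # automatic retraining path
    else:
        print("ALERT: drift detected; queue a manual retrain and review")

maintain({"avg_brightness_delta": 0.3})  # manual path: alert only
```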
4. Pipeline monitoring
Many ML systems are pipelines of several modules (e.g., a VAD module that clips audio before it reaches a speech recognizer). A change in one module's output shifts the next module's input distribution, so track metrics at each stage of the pipeline, not just end to end.
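A minimal sketch of per-stage monitoring for a two-step pipeline; the modules and metrics below are illustrative stand-ins.

```python
def vad(audio):
    """Voice activity detection stand-in: keep only loud-enough segments."""
    return [seg for seg in audio if seg > 0.1]

def speech_recognizer(segments):
    """Recognizer stand-in: returns "" when nothing survives the VAD."""
    return "transcript" if segments else ""

def run_pipeline(audio):
    segments = vad(audio)
    text = speech_recognizer(segments)
    # One metric per stage: a drop in the VAD's kept fraction upstream
    # explains a rise in null transcripts downstream.
    print({"vad_kept_fraction": len(segments) / max(len(audio), 1),
           "null_transcript": text == ""})
    return text

run_pipeline([0.05, 0.2, 0.4])   # VAD keeps 2 of 3 segments
```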
5. Week 1 Optional References
If you wish to dive more deeply into the topics covered this week, feel free to check out these optional references. You won’t have to read these to complete this week’s practice quizzes.
Concept and Data Drift
Monitoring ML Models
A Chat with Andrew on MLOps: From Model-centric to Data-centric
Papers
Katsiapis, K., Karmarkar, A., Altay, A., Zaks, A., Polyzotis, N., … Li, Z. (2020). Towards ML Engineering: A brief history of TensorFlow Extended (TFX). http://arxiv.org/abs/2010.02013
Paleyes, A., Urma, R.-G., & Lawrence, N. D. (2020). Challenges in deploying machine learning: A survey of case studies. http://arxiv.org/abs/2011.09926
Sculley, D., Holt, G., Golovin, D., Davydov, E., & Phillips, T. (2015). Hidden technical debt in machine learning systems. Retrieved April 28, 2021, from https://papers.nips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf