AWS re:Invent 2022 -- Building modern apps: Architecting for observability & resilience (ARC217-L)

December 8, 2022




Observability & Resilience
-----------------------------
AWS's Well-Architected Framework



Resilience in the cloud
- Anticipating
- Monitoring
- Responding
- Learning

Categories of failure


Recovery-oriented patterns
- Backoff and retry
- Circuit breaker
- Graceful degradation
- Throttling
- Load shedding
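
The first two patterns above can be sketched in a few lines. This is an illustrative sketch, not code from the session: the names `retry_with_backoff`, `CircuitBreaker`, and `CircuitOpenError`, and all default thresholds and delays, are assumptions chosen for the example.

```python
import random
import time


def retry_with_backoff(call, max_attempts=4, base_delay=0.1, max_delay=2.0,
                       sleep=time.sleep):
    """Call `call()` until it succeeds, sleeping with capped exponential
    backoff and full jitter between failed attempts (hypothetical helper)."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # jitter spreads retries apart


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    """Fails fast after repeated failures, then allows a trial call
    once `reset_timeout` seconds have passed (assumed defaults)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if (self.opened_at is not None
                    or self.failures >= self.failure_threshold):
                self.opened_at = self.clock()  # open or re-open the circuit
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result
```

Retries suit transient faults, while the breaker protects a dependency that is failing persistently. Combining them usually means retrying inside the breaker, so that a run of exhausted retries counts as one failure.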

AWS investments in resilience


Amazon Route 53 Application Recovery Controller (ARC) lets you control application recovery across multiple AWS Regions, Availability Zones, and on-premises environments.

Zonal shift feature for Amazon Route 53 ARC (new)
Speeds recovery for multi-AZ applications


[18:15 - 18:22] Capital One


Implementing essential metrics
In order to detect, investigate, and respond to impact
- Customer experience metrics
- Impact assessment metrics
- Operational health metrics


Monitoring in the cloud
Three pillars of observability tooling
- Metrics
- Logs
- Traces
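
As a rough illustration of how the three signals differ, the sketch below records all of them for one request using only the Python standard library. Everything here is hypothetical (`handle_request`, the in-memory `metrics`, `log_lines`, and `spans` stores, and the field names); a real system would send these to a monitoring backend instead.

```python
import json
import time
import uuid
from collections import Counter

metrics = Counter()  # metrics: aggregated numbers over time
log_lines = []       # logs: discrete, structured event records
spans = []           # traces: timed operations linked by a trace ID


def handle_request(path, trace_id=None):
    """Serve a fake request and emit one metric, one log line, one span."""
    trace_id = trace_id or uuid.uuid4().hex
    start = time.monotonic()
    status = 404 if path == "/missing" else 200  # stand-in for real work
    duration_ms = (time.monotonic() - start) * 1000

    metrics[f"requests.{status}"] += 1
    log_lines.append(json.dumps(
        {"level": "info", "path": path, "status": status,
         "trace_id": trace_id}))
    spans.append({"trace_id": trace_id, "name": "handle_request",
                  "duration_ms": duration_ms})
    return status, trace_id
```

Carrying the same `trace_id` in the log line and the span is what lets you move from an alarming metric to the specific requests behind it.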



[34:15 - 43:02] FINRA


Reliability design principles
- Automatically recover from failure
- Test recovery procedures
- Scale horizontally to increase aggregate workload availability
- Stop guessing capacity
- Manage change in automation


Operational excellence principles
- Perform operations as code
- Make frequent, small, reversible changes
- Refine operations procedures frequently
- Anticipate failure
- Learn from all operational failures


Game days
Regularly test your people, processes, and systems
- Exercise your procedures
- Ensure no impact to users
- Simulate exceptional events


Infrastructure event management
- Planning for large-scale events
- Framework to ensure alignment
- Planning, review, and risk mitigation
- Post-event recap
AWS uses this process for events such as Amazon Prime Day.


AWS Well-Architected Framework
https://aws.amazon.com/cn/architecture/well-architected/?wa-lens-whitepapers.sort-by=item.additionalFields.sortDate&wa-lens-whitepapers.sort-order=desc&wa-guidance-whitepapers.sort-by=item.additionalFields.sortDate&wa-guidance-whitepapers.sort-order=desc

AWS Fault Isolation Boundaries Whitepaper
https://docs.aws.amazon.com/whitepapers/latest/aws-fault-isolation-boundaries/abstract-and-introduction.html

AWS Solutions Library
https://aws.amazon.com/cn/solutions/resilience/





