My checklist for when a new service is ready for production.
I know, nothing new, no discovery here. But even within one team, different engineers will put different weight on different items or keep different lists. So here’s mine.
I won’t mention testing here, which belongs more to the development phase.
- monitoring (work metrics, resource metrics, events, alerts, dashboards, service health checks, end-to-end checks)
- runbooks covering at least the basic failures
- logs shipped to a central log store, no writing to local disk
- if for some reason logs can’t be shipped to the central store: log rotation, and yes, for Docker too
- scaling runbooks or, even better, scaling automation
- performance testing scenario
- backup scenario
- Chaos Monkey scenarios
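
To make the log rotation item concrete, here is a minimal sketch of the fallback case, assuming a Python service that has to write to local disk. The logger name, file name, and size limits are made up for illustration; the point is only that total disk usage stays capped.

```python
import logging
import logging.handlers
import os
import tempfile


def build_rotating_logger(path, max_bytes=1024, backup_count=3):
    """Return a logger that writes to `path` and rotates the file by size."""
    logger = logging.getLogger("service.fallback")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep output out of the root logger
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backup_count
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.handlers.clear()
    logger.addHandler(handler)
    return logger


# Illustrative run: a temp directory stands in for the service's log directory.
log_dir = tempfile.mkdtemp()
log_path = os.path.join(log_dir, "service.log")
logger = build_rotating_logger(log_path)

# Write far more than max_bytes * (backup_count + 1) worth of log lines.
for i in range(500):
    logger.info("request %d handled", i)

# Only the active file plus backup_count rotated files remain on disk.
log_files = sorted(os.listdir(log_dir))
print(log_files)
```

The limits here are tiny so the rotation is easy to see; real values depend on the disk budget. For containers, Docker’s `json-file` logging driver has comparable `max-size` and `max-file` options.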
What’s yours? What would you add?