May 21, 2024

This article has a concept I'd never heard of: invisible downtime. This is the idea that there are problems in your application that the customer sees, even though your own measurements say everything is fine. Your servers are running, but the application doesn't work correctly or pauses long enough to impact customers. From an IT perspective, the SLA is being met and there aren't any problems. From a customer viewpoint, they're ready to start looking at a competitor's offering.

Lots of developers and operations people know there are issues in our systems. We know networks go down or connectivity to some service is delayed. We also know the database gets slow, or at least slower than we'd like. We know there is poorly performing code and undersized hardware, running with storage that doesn't deliver as many IOPS as our workload demands. We would also like time to fix these issues, but often we aren't given any resources.

