The Scenario
Imagine you’re on call late at night, and suddenly your application becomes unresponsive. The monitoring tools show a significant spike in resource usage. After digging through the logs, you find the culprit: a single database query, running over and over again like a bad horror movie on repeat. Whether it’s due to a bug, a poorly written request, or just bad luck, this one query is hogging all the CPU and memory, bringing your entire system to its knees. And the worst part? Every time it runs, the app crashes again, creating an endless cycle of outages.
Mitigating the Impact
The quickest way to remedy this situation is to prevent the query from ever reaching your application. One approach could be to configure your ingress controller or API gateway to detect this specific query and return a 404 error or a redirect response instead of allowing it to pass through to the backend. This can be achieved by setting up a rule that matches the query’s characteristics and intercepts it before it causes damage. Additionally, you could deploy rate limiting or circuit breakers that detect when a query starts causing problems and automatically mitigate the risk.
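To make the interception rule concrete, here is a minimal sketch in Python of what such a rule does logically. It is not the configuration syntax of any particular ingress controller or API gateway. The names `BlockRule`, `match_rule`, and `handle`, as well as the `/api/reports` path and the `range` parameter, are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional
from urllib.parse import parse_qs, urlsplit


@dataclass
class BlockRule:
    """Hypothetical emergency rule: a path regex plus query parameters that must all match."""
    path_pattern: str
    params: dict = field(default_factory=dict)  # parameter name -> regex for its value
    status: int = 404  # status returned instead of forwarding the request


def match_rule(url: str, rules: list) -> Optional[BlockRule]:
    """Return the first rule matching the request URL, or None if no rule matches."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    for rule in rules:
        if not re.fullmatch(rule.path_pattern, parts.path):
            continue
        # Every parameter named in the rule must be present with a matching value.
        if all(
            any(re.fullmatch(rx, value) for value in query.get(name, []))
            for name, rx in rule.params.items()
        ):
            return rule
    return None


def handle(url: str, rules: list, backend: Callable) -> tuple:
    """Intercept requests that match a rule; forward everything else to the backend."""
    rule = match_rule(url, rules)
    if rule is not None:
        return rule.status, "blocked by emergency rule"
    return backend(url)
```

In this sketch, a rule such as `BlockRule(r"/api/reports", {"range": r"all"})` would answer `/api/reports?range=all` with a 404 while still forwarding `/api/reports?range=7d`. Keeping the rule as narrow as possible limits the collateral damage to legitimate traffic.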
Understanding the Pattern & Preventing Future Incidents
The “Query of Death” is a common pattern where a specific query or API call causes an application to crash, lock up, or consume excessive resources. This pattern often occurs when a query is poorly optimized, contains a logical flaw, or is repeatedly executed in an unintended way.
The primary solution in these scenarios is to block or redirect the offending query before it reaches the application. This can be accomplished through various means, such as configuring the API gateway, ingress controller, or firewall to detect and handle the query differently. By preventing the query from executing, you stabilize the application and can then investigate and resolve the root cause without the pressure of an ongoing outage.
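Blocking can also be automated. The following is a simplified sketch of a circuit breaker keyed by a query "fingerprint" (for example, a normalized query string). It assumes that the caller can tell when a query has failed or timed out. The class name `QueryCircuitBreaker`, its parameters, and the default thresholds are hypothetical, and a production breaker would typically add a half-open probing state that this sketch omits.

```python
import time
from collections import defaultdict, deque


class QueryCircuitBreaker:
    """Rejects a query fingerprint after too many failures within a time window."""

    def __init__(self, max_failures=3, window_s=60.0, cooldown_s=300.0,
                 clock=time.monotonic):
        self.max_failures = max_failures  # failures tolerated within the window
        self.window_s = window_s          # length of the sliding window in seconds
        self.cooldown_s = cooldown_s      # how long a tripped breaker stays open
        self.clock = clock                # injectable clock, useful for testing
        self._failures = defaultdict(deque)  # fingerprint -> failure timestamps
        self._open_until = {}                # fingerprint -> time the block ends

    def allow(self, fingerprint: str) -> bool:
        """Return True if the query may run, False while its breaker is open."""
        until = self._open_until.get(fingerprint)
        if until is None:
            return True
        if self.clock() >= until:
            # Cooldown is over: close the breaker and forget old failures.
            del self._open_until[fingerprint]
            self._failures[fingerprint].clear()
            return True
        return False

    def record_failure(self, fingerprint: str) -> None:
        """Record a failure and open the breaker once the threshold is reached."""
        now = self.clock()
        failures = self._failures[fingerprint]
        failures.append(now)
        # Drop failures that have fallen out of the sliding window.
        while failures and now - failures[0] > self.window_s:
            failures.popleft()
        if len(failures) >= self.max_failures:
            self._open_until[fingerprint] = now + self.cooldown_s
```

The application would call `allow(fingerprint)` before executing a query and `record_failure(fingerprint)` whenever the query times out or crashes a worker. Because the state is tracked per fingerprint, one misbehaving query is rejected while all other queries continue to run.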
Stay tuned!
Is your application ready to handle sudden spikes in load without crashing? In our next episode, we’ll discuss managing high resource demands to keep your system running smoothly through the night, and how to do better in the future.
More Info & Contact
Expired certificates or credentials can lead to unexpected outages. However, these issues can be prevented through automated renewal, monitoring, and regular testing. We’ll show you how – contact us at hello@qualityminds.de or call us at +49 911 660732011!
This blog post is part of our multi-part series, where we describe common software outages and help you resolve them quickly. You can find all other posts under Foreword: Navigating the Storms of Software Outages.
Write us an email. We look forward to hearing from you! hello@qualityminds.de or on LinkedIn