Incident Summary
Cloudi-fi identified (February 15th at 11:30 UTC+1) an incident - authentication on captive portals (for Guests) and authentication on the Admin platform (for Administrators) were no longer available 75 minutes before the fix.
Status
Resolved
Incident time window
Start: February 15th, 2023, at 11:30 UTC+1
- Problem definition
- Operation to identify the root cause
- Workaround and development of the patch
End: February 15th, 2023, at 12:45 UTC+1
Root cause
There is an overload of our API—an unusual volume of API calls coming from multi-team tenants. As a reminder, our APIs are used for processing our workflows, such as Guest registrations or Administration authentications.
Impact
Thus, these API calls produced a few queries that took unexpected duration to run, hogging database resources and preventing it from processing any other queries (such as Guests registration or administration authentication).
Actions plan
Action 1 - Monitoring - Done
Monitoring detected an overload on the database without producing a critical alert.
Change the alert thresholds to raise a critical alert
Action 2 - Patch - In progress
Optimize queries performed by our Admin API on multi-teams enabled tenants to reduce database stress.
Action 3 - WAF - In progress
Activate WAF to reduce unwanted traffic ( bot, monitoring, etc.)