Incident Summary
On August 25th 2023 at 17:30 CET, Cloudi-fi identified an incident: the captive portal and authentication on the Admin platform (for Administrators) were unavailable.
Status
Resolved
Incident time window
Start: August 25th 2023, at 17:30 CET
- Problem definition
- Operation to identify the root cause
- Workaround and development of the patch
End: August 29th 2023, at 12:00 CET
Impact
Loss of access to the Admin console and the captive portal
Root cause
The issue was caused by an overload of the encryption service that secures communication between our servers. The overload resulted from an increase in TCP connections. Analysis showed that this increase was caused by a recent change (August 22nd) to the servers in charge of storing guest sessions.
- The objective of this change was to improve the redundancy of the servers in charge of storing guest sessions.
- The change was tested in our QA environment and no issues or impacts were identified.
To quickly restore the service, we rolled back this change. As next steps, we will:
- Review the capacity of our encryption service
- Replay the change in QA and confirm that no new issue occurs (before any new implementation)
Action plan
Action 1 - August 25th 2023, at 18:30 CET
Restart of our NGINX and PHP servers -> captive portal and Admin console available again
Action 2 - August 26th 2023, at 10:56 CET
Issue reappeared - restart of our PHP FastCGI Process Manager (PHP-FPM) -> captive portal and Admin console available again
Action 3 - August 26th 2023, at 17:33 CET
Remove the DB persistence configuration on our servers -> intermittent issues still present the following day
Action 4 - August 27th 2023, at 15:54 CET
Disable the Redis cluster configuration -> service down again on August 28th 2023, at 08:04 CET
Action 5 - August 28th 2023, at 08:54 CET
Restart of our PHP FastCGI Process Manager (PHP-FPM) -> captive portal and Admin console available again
Action 6 - August 28th 2023, at 14:49 CET
Change the caching method -> service down again on August 28th 2023, at 17:06 CET
Action 7 - August 28th 2023, at 15:00 CET
Rollback of our recent change (implemented August 22nd) on the servers in charge of storing guest sessions.
Action 8 - August 29th 2023, at 12:00 CET
Change the workflow of our guest session storage servers to ensure guest sessions are shared across all our nodes.