Major Incident for Classic LMS, Learn and Perform
Created Thu, 04 Jul 2019 07:22:00 +0000
Post-Mortem
Between 08:00 and 12:51 BST on 04/07/2019, customers were unable to access Kallidus Perform, Learn and Classic LMS services. The issue was caused by an underlying problem with our hosting provider, Microsoft Azure, which experienced an outage in its storage infrastructure in the UK South region that was not limited to Kallidus. As a result, several of the services we provide were unavailable, and we were reliant on Microsoft to resolve the outage before they could be restored.
Microsoft Azure has provided a preliminary root cause, which is subject to change as they continue to investigate. If there are any major changes, we will send out an update.
Azure - Summary of impact: Between 06:00 UTC and 16:25 UTC on 04 July 2019, a subset of customers leveraging Storage in UK South may have experienced service availability issues. In addition, resources with dependencies on Storage may also have experienced downstream impact in the form of availability issues.
Azure - Preliminary root cause: Engineers identified high levels of resource utilization on a single storage scale unit. As a result, services dependent on the storage scale unit experienced a high number of failures and latency which manifested in availability issues.
Azure - Mitigation: Engineers manually applied load balancing configuration changes to bring the affected storage scale unit back to a healthy state. As a consequence, resource utilization levels were brought back to normal, mitigating the issue.
Azure - Next steps: Engineers will continue to investigate to establish the full root cause and prevent future occurrences.
Posted: Fri, 05 Jul 2019 16:23:00 +0000