Kallidus Status

Essential reporting maintenance

Created Fri, 19 Jul 2019 07:30:00 +0000

Resolved This maintenance was completed successfully by 8am as anticipated. All schedules have been released as planned and normal service has resumed. Thank you for your patience whilst we carried this out.
Posted: Fri, 19 Jul 2019 09:05:00 +0000

Issue with Reporting on Classic LMS and Learn

Created Mon, 15 Jul 2019 07:22:00 +0000

Resolved
Summary of impact: Between 11:17 (BST) on 14 July and 09:18 (BST) on 15 July 2019, reporting services for the Kallidus LMS, Learn and Perform products were unavailable.
Preliminary root cause: Kallidus engineers have determined that this was caused by a critical issue with one of the system databases on the server that hosts customer report databases. Engineers continue to investigate the trigger and will update this RCA once it has been established.
Mitigation: Upon becoming aware of the issue, engineers recycled the database services on the affected server and normal service resumed.
Next steps: Kallidus will review the alerting for failed reports and the notifications sent to its on-call engineers when critical database issues occur.
Posted: Tue, 16 Jul 2019 10:40:00 +0000

Degraded Service on Kallidus Reporting

Created Fri, 12 Jul 2019 09:55:00 +0000

Post-Mortem
Summary of impact: Between 10:59 (BST) and 11:06 (BST) on 11 July 2019, reporting services for the Kallidus LMS, Learn and Perform products were unavailable.
Preliminary root cause: Kallidus engineers have determined that this was due to a CPU resource bottleneck caused by a Business Objects process on one of its two core Business Objects servers.
Mitigation: Upon becoming aware of the issue, engineers restarted the affected service and normal service resumed.
Next steps: Kallidus has engaged SAP (the owners of Business Objects) and made some configuration changes to the affected component on their recommendation. Engineers continue to work with SAP to identify the root cause and will update this RCA once it has been established.
Posted: Wed, 17 Jul 2019 12:16:00 +0000

Kallidus Suite Availability

Created Wed, 10 Jul 2019 09:16:00 +0000

Resolved Between 09:16 and 09:27 BST on 10 July 2019, some of the incoming traffic to the Kallidus suite was incorrectly routed to an inactive, standby device. Any customers attempting to connect to the Kallidus application suite during this period will have received an error. Engineers have determined that this was caused by the restart of one of the devices controlling inbound access to the Kallidus cloud infrastructure. Although the device was operating in standby mode at the time, an undesired configuration change to a health service meant that traffic was incorrectly routed to it. Upon becoming aware of the disruption, engineers manually reconfigured the problem service. In future, the configuration scripts for the impacted devices will be modified to ensure that health services have started successfully before a device can participate in the routing of network traffic.
Posted: Wed, 10 Jul 2019 09:16:00 +0000
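The planned safeguard, where a device only joins traffic routing once its health service has started, can be illustrated with a minimal sketch. This is not Kallidus's configuration script: the Device class, its fields and the device names are hypothetical, chosen only to show the gating rule.

```python
from dataclasses import dataclass


@dataclass
class Device:
    # Hypothetical model of an inbound-access device; the field names are assumptions.
    name: str
    active: bool              # False means the device is in standby mode
    health_service_up: bool   # True once the health service has started successfully


def eligible_for_traffic(device: Device) -> bool:
    """A device may receive traffic only if it is active and its health service is up."""
    return device.active and device.health_service_up


def route_targets(devices: list[Device]) -> list[str]:
    """Return the names of the devices that are allowed to receive inbound traffic."""
    return [d.name for d in devices if eligible_for_traffic(d)]


devices = [
    Device("gateway-a", active=True, health_service_up=True),
    # A restarted standby device whose health service has not come up yet.
    Device("gateway-b", active=False, health_service_up=False),
]
targets = route_targets(devices)
```

With this rule, the restarted standby device is left out of the routing targets until both conditions hold.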

Major Incident for Classic LMS, Learn and Perform

Created Thu, 04 Jul 2019 07:22:00 +0000

Post-Mortem Between 08:00 and 12:51 BST on 05/07/2019 we identified an issue accessing all Kallidus Perform, Learn and Classic LMS services. The issue was caused by an underlying problem with our hosting provider, Microsoft Azure. Azure experienced an outage within its storage infrastructure across data centre regions, and the impact was not limited to Kallidus. This caused various services that we provide to be unavailable. Under these circumstances, we were reliant on Microsoft to restore its service. Microsoft Azure has provided a preliminary root cause, which is subject to change as it continues to investigate. If there are any major changes we will send out an update:
Azure - Summary of impact: Between 06:00 UTC and 16:25 UTC on 04 July 2019, a subset of customers leveraging Storage in UK South may have experienced service availability issues. In addition, resources with dependencies on Storage may also have experienced downstream impact in the form of availability issues.
Azure - Preliminary root cause: Engineers identified high levels of resource utilization on a single storage scale unit. As a result, services dependent on the storage scale unit experienced a high number of failures and latency, which manifested in availability issues.
Azure - Mitigation: Engineers manually applied load balancing configuration changes to bring the affected storage scale unit back to a healthy state. As a consequence, resource utilization levels were brought back to normal, mitigating the issue.
Azure - Next steps: Engineers will continue to investigate to establish the full root cause and prevent future occurrences.
Posted: Fri, 05 Jul 2019 16:23:00 +0000
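The Kallidus times in this post-mortem are given in BST while the Azure summary uses UTC. As a reading aid, the following sketch converts the Azure impact window to BST. It assumes a fixed UTC+1 offset, which holds for July dates in the UK; the helper name utc_to_bst is illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+1 offset; an assumption that is valid for dates in July (British Summer Time).
BST = timezone(timedelta(hours=1), "BST")


def utc_to_bst(year: int, month: int, day: int, hour: int, minute: int) -> datetime:
    """Interpret the given wall-clock time as UTC and return it as a BST datetime."""
    return datetime(year, month, day, hour, minute, tzinfo=timezone.utc).astimezone(BST)


# Azure impact window as quoted in the summary above (UTC).
azure_start = utc_to_bst(2019, 7, 4, 6, 0)
azure_end = utc_to_bst(2019, 7, 4, 16, 25)

print(f"{azure_start:%H:%M} to {azure_end:%H:%M} BST")  # 07:00 to 17:25 BST
```

On that basis, the Azure window of 06:00 to 16:25 UTC corresponds to 07:00 to 17:25 BST on the same day.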

Degraded Service on Kallidus Reporting

Created Mon, 01 Jul 2019 10:11:00 +0000

Resolved All queued report schedules have now caught up and normal service has resumed.
Posted: Mon, 01 Jul 2019 11:34:00 +0000

We share details of service availability and performance for Kallidus products
