MURAL Authentication Error

Incident Report for Mural

Postmortem

What happened:

At 08:20 UTC on May 18, 2021, we experienced a spike in CPU usage on our primary database servers. This was caused by an unexpectedly-large number of concurrent requests, which resulted in the MURAL application being unresponsive and users no longer being able to log in.

‌

Details and corrective actions:

At 09:01 UTC we initiated a manual fail-over to a new primary database server. This was completed at 09:05 UTC, at which time full service was restored with no data loss.

‌

What we’ve done to avoid this happening again:

Over the weekend of May 22-23 we performed a series of maintenance operations to significantly improve our database performance. We are also improving our automated internal alerts to identify potential issues earlier. This will help us to react faster in the event of a similar occurrence in future.

Posted May 26, 2021 - 20:14 GMT-03:00

Resolved

The issue with logging in and interacting with the MURAL application has been resolved, and full service has been resolved.

We apologize for the inconvenience this may have caused, and will be conducting a full review to avoid a repeat in the future.

Posted May 18, 2021 - 09:16 GMT-03:00

Monitoring

Our developers have just sent out a fix and we continue to monitor the issue.

Posted May 18, 2021 - 06:12 GMT-03:00

Investigating

We are currently investigating this issue.

Posted May 18, 2021 - 05:42 GMT-03:00

This incident affected: Website.