Major service degradation, issues logging in
Incident Report for MURAL
Postmortem

What happened:

At 10:31 Pacific Time on April 29, 2021, we executed a routine data maintenance procedure to bulk-delete deactivated user accounts. The procedure ran a high number of concurrent operations, which generated unexpectedly high load on our main database servers and caused the primary to become unresponsive while processing them. As a result, the MURAL application was unavailable for all users.

Details and corrective actions:

At 11:06 Pacific Time we initiated a manual failover to a new primary database server. This was completed at 11:17 Pacific Time and we were able to restore full service, without any loss of data.

What we’ve done to avoid this happening again:

We have temporarily restricted the procedure that triggered this incident while we review and upgrade how it works. This work also aims to improve how we handle large bulk data operations so that they do not impact service availability.
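As an illustration only, the general idea of keeping a bulk operation from overloading a database can be sketched as small sequential batches with a pause between them. This is not the actual MURAL procedure: the function names, the batch size, and the pause length are all hypothetical, and `delete_batch` stands in for whatever call really removes the records.

```python
import time
from typing import Callable, Iterator, Sequence


def chunked(ids: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size ids."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def delete_in_batches(
    ids: Sequence[str],
    delete_batch: Callable[[Sequence[str]], None],  # hypothetical deletion call
    batch_size: int = 100,       # assumed value, tune against real load
    pause_seconds: float = 0.5,  # assumed value, tune against real load
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Delete ids one batch at a time, pausing between batches.

    Batches run sequentially, so at most batch_size deletions are in
    flight at once, and the pause gives the database time to recover.
    Returns the number of ids handed to delete_batch.
    """
    deleted = 0
    for index, batch in enumerate(chunked(ids, batch_size)):
        if index > 0:
            sleep(pause_seconds)  # no pause needed before the first batch
        delete_batch(batch)
        deleted += len(batch)
    return deleted
```

The `sleep` parameter is injectable so the pacing can be checked without waiting in real time. A production version would likely also watch database load and back off or stop when it rises.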

MURAL administrators can continue to disable users on an individual or pick-list basis, via the admin interface.

We apologize for any inconvenience caused by this incident and thank you for your understanding.

Posted May 05, 2021 - 19:26 GMT-03:00

Resolved
This issue has been resolved and full access should be restored. We will post a postmortem on this issue on status.mural.co in the coming days. Thank you for your patience, and we apologize again for the inconvenience.
Posted Apr 29, 2021 - 16:19 GMT-03:00
Monitoring
We have sent out a fix for this issue and will continue to monitor the situation. We will update here when we have received confirmation that the issue has been resolved.
Posted Apr 29, 2021 - 15:21 GMT-03:00
Identified
We have identified the root cause of this issue and are actively working on deploying a fix. We will update here with any news and sincerely apologize for any issues this may be causing.
Posted Apr 29, 2021 - 14:54 GMT-03:00
Investigating
We are currently experiencing a major service degradation. Users are experiencing latency and may be logged out of the application, after which they will not be able to log in. We are investigating this issue and will update here with any news. We sincerely apologize for any issues this may be causing.
Posted Apr 29, 2021 - 14:43 GMT-03:00
This incident affected: Authentication and Realtime collaboration.