Users are unable to sign in to MURAL

Incident Report for Mural

Postmortem

What happened?
At 06:15 UTC on August 9, 2021, we experienced a spike in CPU usage on our primary database servers. A very large number of simultaneous operations in a single workspace, specifically users joining the workspace and moving content between workspaces, generated write conflicts that caused the primary database server to lock up. Users who were logged in at the time had their sessions terminated, and no new login requests could be processed.

Details and corrective actions
We identified the cause of the incident and initiated a failover to a new primary database server at 06:44 UTC. The failover completed at 07:35 UTC, at which point full service was restored. We immediately started investigating the root cause of the write conflicts and optimizing the workflows for joining workspaces so that they cannot impact system availability again.

Summary
The outage resulted in 1 hour and 20 minutes of downtime, from 06:15 UTC to 07:35 UTC. No data saved before the outage was lost.
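
For readers reconciling the timeline: the postmortem gives times in UTC, while the status updates further down are stamped in GMT-03:00. The short sketch below recomputes the downtime from the times stated above and converts the update timestamps to UTC. The variable names are ours and purely illustrative; the times themselves are taken from this report.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
GMT_MINUS_3 = timezone(timedelta(hours=-3))  # offset used by the status updates

# Times stated in the postmortem (UTC).
outage_start = datetime(2021, 8, 9, 6, 15, tzinfo=UTC)
failover_started = datetime(2021, 8, 9, 6, 44, tzinfo=UTC)
service_restored = datetime(2021, 8, 9, 7, 35, tzinfo=UTC)

downtime = service_restored - outage_start
failover_duration = service_restored - failover_started

# Status updates as posted (GMT-03:00), converted to UTC for comparison.
status_updates = {
    "Investigating": datetime(2021, 8, 9, 3, 35, tzinfo=GMT_MINUS_3),
    "Identified": datetime(2021, 8, 9, 3, 55, tzinfo=GMT_MINUS_3),
    "Monitoring": datetime(2021, 8, 9, 4, 48, tzinfo=GMT_MINUS_3),
    "Resolved": datetime(2021, 8, 9, 5, 48, tzinfo=GMT_MINUS_3),
}
updates_utc = {label: ts.astimezone(UTC) for label, ts in status_updates.items()}

print(f"Downtime: {downtime}")                    # Downtime: 1:20:00
print(f"Failover duration: {failover_duration}")  # Failover duration: 0:51:00
for label, ts in updates_utc.items():
    print(f"{label}: {ts:%H:%M} UTC")
```

In UTC, the updates were posted at 06:35 (Investigating), 06:55 (Identified), 07:48 (Monitoring), and 08:48 (Resolved).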

What we’ve done to avoid this happening again
As an immediate action, we implemented an optimization to the workflow for joining the impacted workspace. We are working towards applying this update to all workspaces in an upcoming release.

Posted Sep 02, 2021 - 19:21 GMT-03:00

Resolved

This incident has been resolved.

Posted Aug 09, 2021 - 05:48 GMT-03:00

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Aug 09, 2021 - 04:48 GMT-03:00

Identified

The issue has been identified and a fix is being implemented.

Posted Aug 09, 2021 - 03:55 GMT-03:00

Investigating

Users are currently unable to sign in to MURAL. We know this is a major service disruption for everyone. We're investigating the issue and will restore regular service ASAP.

Please check our status page for the most up-to-date info 👉 status.mural.co/

Posted Aug 09, 2021 - 03:35 GMT-03:00

This incident affected: Mural Application (Authentication, Realtime collaboration).