Mural users are unable to log in

Incident Report for Mural

Postmortem

Summary:

On Saturday, November 11th at 03:00 UTC, Mural performed scheduled maintenance on our production clusters. Post-migration checks indicated all functions were performing as expected. On Monday, November 13th, some Mural customers reported difficulty logging into the Mural web application. Mural’s incident response team was immediately engaged in troubleshooting these reports.

Initial investigations revealed that the platform upgrade performed over the preceding weekend included incorrect settings for the DNS infrastructure and for a key backend application's auto-scaling. This resulted in unstable connections for some users.

During the course of this investigation, we also discovered that load-balancing improvements had altered how our system interpreted the client's IP address for clients with specific network and application configurations, preventing access for those clients.
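As a general illustration of how a load-balancing change can alter the IP address a backend attributes to a client, the sketch below derives the client address from a forwarding header. It is a minimal example under stated assumptions, not a description of Mural's implementation: the use of the X-Forwarded-For header, the function name client_ip, and the trusted_proxies list are all hypothetical.

```python
import ipaddress


def client_ip(remote_addr, forwarded_for, trusted_proxies):
    """Return the best-guess client IP for a request (illustrative only).

    remote_addr     -- address of the direct TCP peer
    forwarded_for   -- raw X-Forwarded-For header value, or None (assumed header)
    trusted_proxies -- CIDR strings for proxies we operate (hypothetical list)
    """
    networks = [ipaddress.ip_network(cidr) for cidr in trusted_proxies]

    def is_trusted(addr):
        ip = ipaddress.ip_address(addr)  # raises ValueError if malformed
        return any(ip in net for net in networks)

    # Only honour the header when the direct peer is one of our own proxies.
    if not forwarded_for or not is_trusted(remote_addr):
        return remote_addr

    hops = [h.strip() for h in forwarded_for.split(",") if h.strip()]

    # Walk from the nearest hop outwards, skipping proxies we trust.
    # The first untrusted hop is treated as the client.
    for hop in reversed(hops):
        try:
            if not is_trusted(hop):
                return hop
        except ValueError:
            # Malformed entry: fall back to the direct peer.
            return remote_addr

    # Every hop was a trusted proxy: use the left-most entry if there is one.
    return hops[0] if hops else remote_addr


if __name__ == "__main__":
    header = "203.0.113.7, 10.1.0.4, 10.0.0.9"
    # Trusted list covers both proxy layers: the real client is found.
    print(client_ip("10.0.0.5", header, ["10.0.0.0/8"]))
    # Trusted list misses the new proxy layer: the proxy is mistaken for the client.
    print(client_ip("10.0.0.5", header, ["10.0.0.0/16"]))
```

In this sketch, a proxy hop that is missing from the trusted list (10.1.0.4 in the second call) is returned as the client address. A mismatch of that kind is one way IP-based rules could end up rejecting legitimate clients.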

Our incident response team addressed the auto-scaling configuration, resolved the DNS-related issues, and restored access for the majority of users. Next, a new load-balancing configuration underwent adjustments and testing to restore stable connections for the previously impacted users.

The total time from when our incident response team started working on this incident to when the final fix was deployed was 9 hours 40 minutes.

What we’ve done to prevent this happening again:

As part of Mural’s post-incident procedure, our engineering teams conducted a thorough review to identify the root cause and outline necessary improvements. Eight separate changes have been identified and will be implemented in the coming weeks. These changes cover monitoring to detect this scenario sooner, enhanced post-migration checks to ensure this scenario and others are included in our use cases, and a review of our migration process to reduce the risks.

We apologize for any inconvenience this incident may have caused and sincerely thank you for your patience whilst we worked through it.

Posted Nov 16, 2023 - 09:54 GMT-03:00

Resolved

The correction we implemented earlier has been successful in resolving the issue and full service has been restored.

Some users reported connectivity issues after the earlier correction. In all cases this has been solved by clearing the browser cache and using the link app.mural.co/bye to clear any previous session data.

We apologize for the inconvenience this interruption caused. We will be conducting a full review and will publish a root cause analysis in the coming days.

Posted Nov 13, 2023 - 19:00 GMT-03:00

Monitoring

The performance degradation issue has been addressed and service has returned to normal.

Users can resume logging in and using Mural as normal.

We'll continue to monitor the results of our corrections to ensure service remains stable, and will publish a full root cause analysis in the coming days.

Posted Nov 13, 2023 - 13:58 GMT-03:00

Update

The issue with logging in to Mural has been largely resolved. Users that were unable to access Mural should be able to log in again.

There continues to be an intermittent performance degradation. Our team are investigating and we will continue to update our status page as this develops.

Stay up-to-date with the latest info via 👉 status.mural.co

Posted Nov 13, 2023 - 13:34 GMT-03:00

Update

We are continuing to work on a fix for this issue. We appreciate your patience while we resolve this.

Posted Nov 13, 2023 - 12:48 GMT-03:00

Update

We are continuing to work on a fix for this issue. We appreciate your patience while we resolve this.

Posted Nov 13, 2023 - 11:53 GMT-03:00

Update

We are continuing to work on a fix for this issue. We appreciate your patience while we resolve this.

Posted Nov 13, 2023 - 10:57 GMT-03:00

Update

We are continuing to work on a fix for this issue.

Posted Nov 13, 2023 - 10:06 GMT-03:00

Identified

The issue has been identified and we are working towards implementing the fix.

Posted Nov 13, 2023 - 09:58 GMT-03:00

Investigating

We're experiencing a service disruption that is preventing users from logging in to Mural. We're investigating the issue and will restore regular service as soon as possible.

Please check our status page for the most up-to-date info 👉 status.mural.co/

Posted Nov 13, 2023 - 09:01 GMT-03:00

This incident affected: Mural Application (Authentication).