At 06:40 UTC we began receiving reports from Mural users who were unable to access their content in our European datacenter. This was caused by a configuration change that resulted in connections to our EU database being rejected.
Once the cause was identified, an emergency fix was created, tested, and deployed. Normal service was restored at 07:17 UTC. We monitored the impact of the emergency fix for an hour, after which the incident was declared ‘resolved’.
Murals in our EU datacenter were unavailable from 06:40 to 07:17 UTC, for a total of 37 minutes of service interruption.
What we're doing to prevent this from happening again
We are improving our deployment processes and extending our monitoring to detect anomalies in the volume of realtime connections.
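As an illustration of what monitoring for anomalies in connection volume can look like, here is a minimal sketch. It is not a description of our production monitoring: the class name `ConnectionVolumeMonitor`, the window size, the threshold, and the sample values are all hypothetical, chosen only to show the idea of comparing the current connection count against a rolling baseline.

```python
from collections import deque
from statistics import mean, pstdev


class ConnectionVolumeMonitor:
    """Hypothetical rolling-baseline check for realtime connection counts."""

    def __init__(self, window=30, threshold=3.0, min_samples=10):
        # Recent "normal" samples; the oldest are dropped automatically.
        self.samples = deque(maxlen=window)
        self.threshold = threshold      # allowed deviation, in units of spread
        self.min_samples = min_samples  # warm-up before any alerting

    def observe(self, count):
        """Record one sample; return True if it deviates from the baseline."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            baseline = mean(self.samples)
            # Floor the spread so very flat traffic does not make the
            # check oversensitive or divide by zero.
            spread = max(pstdev(self.samples), 0.01 * baseline, 1.0)
            anomalous = abs(count - baseline) / spread > self.threshold
        if not anomalous:
            # Only normal samples feed the baseline, so an ongoing
            # outage does not gradually become the new "normal".
            self.samples.append(count)
        return anomalous


if __name__ == "__main__":
    monitor = ConnectionVolumeMonitor()
    # Assumed steady traffic of roughly 1000 connections per interval.
    for i in range(20):
        monitor.observe(1000 + i % 5)
    # A sudden collapse in connections, as when a database rejects them.
    if monitor.observe(50):
        print("ALERT: realtime connection volume is anomalous")
```

A check along these lines flags both sudden drops and sudden spikes. In practice the window and threshold would need tuning against real traffic patterns, including daily and weekly cycles, which this sketch ignores.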