Real-time collaboration issues detected
Incident Report for Mural
Postmortem

The degradation of the real-time collaboration service on Oct 30th was caused by a self-inflicted DDoS: our WebSocket authentication protocol retried indefinitely whenever the handshake timed out. This was triggered by high server-side response times, caused by high load on our servers during a slow deployment process. The slow responses made authentication handshake timeouts more frequent, and every timeout generated another retry that added to the load. We fixed the issue by making the authentication protocol more tolerant of slow server-side responses and by adding an exponential backoff strategy to prevent a self-inflicted DDoS in the future.
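The postmortem does not describe the revised protocol in detail, so the following is only a minimal sketch of the general technique, not Mural's implementation. It assumes a client that calls a hypothetical `handshake(timeout)` function, which raises a hypothetical `HandshakeTimeout` when the server responds too slowly. "More flexibility in server-side response time" is modeled here as a configurable, more generous handshake timeout. The retry waits use exponential backoff with "full jitter" (a random wait between zero and an exponentially growing, capped ceiling), and the number of attempts is bounded. All names and default values are illustrative assumptions.

```python
import random
import time
from typing import Callable, List, Optional


class HandshakeTimeout(Exception):
    """Hypothetical error raised when the server does not answer the handshake in time."""


def backoff_delays(base: float, cap: float, max_attempts: int,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Return the wait before each retry (max_attempts - 1 values).

    Full jitter: the wait before retry n is drawn uniformly from
    [0, min(cap, base * 2**n)], so clients that timed out together
    do not all retry at the same instant.
    """
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * (2 ** n))) for n in range(max_attempts - 1)]


def authenticate_with_backoff(handshake: Callable[[float], str],
                              timeout: float = 10.0,   # assumed generous handshake timeout
                              base: float = 0.5,       # assumed first backoff ceiling (seconds)
                              cap: float = 30.0,       # assumed maximum backoff ceiling (seconds)
                              max_attempts: int = 6,   # bounded: never retry forever
                              sleep: Callable[[float], None] = time.sleep,
                              rng: Optional[random.Random] = None) -> str:
    """Run the (hypothetical) handshake, retrying timeouts with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delays = backoff_delays(base, cap, max_attempts, rng)
    for attempt in range(max_attempts):
        try:
            return handshake(timeout)
        except HandshakeTimeout:
            if attempt == max_attempts - 1:
                raise  # give up and surface the error instead of looping forever
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

The two changes in the sketch address the two halves of the failure: a longer timeout means a slow but healthy server is less likely to trigger a retry at all, and bounded, jittered backoff means that when retries do happen they spread out and eventually stop rather than piling more load onto an already overloaded server.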

Posted Nov 05, 2018 - 11:05 GMT-03:00

Resolved
This incident has been resolved.
Posted Oct 30, 2018 - 14:15 GMT-03:00
Monitoring
The issue was identified and fixed. We are continuing to monitor the behavior of the web application.
Posted Oct 30, 2018 - 14:14 GMT-03:00
Identified
The socket authentication service is down. Users cannot edit murals. We are working on a fix.
Posted Oct 30, 2018 - 12:30 GMT-03:00
Investigating
We're investigating problems with real-time collaboration in murals.
Posted Oct 30, 2018 - 12:22 GMT-03:00
This incident affected: Mural Application (Canvas).