At 14:19 UTC on April 22nd, part of our realtime collaboration service began processing requests significantly slower than usual, and a few minutes later our monitoring systems reported that part of the service as unavailable.
The slowdown was caused by increased latency in a backing pub-sub service we use to synchronize a portion of the realtime collaboration events among users on the same murals. The rest of our servers were behaving normally, and most users were able to collaborate without issues.
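The kind of latency-based health check that flags a service as degraded or unavailable can be sketched roughly as follows. This is an illustrative sketch only; the thresholds and function names are hypothetical and do not reflect our actual monitoring configuration:

```python
from statistics import quantiles

# Illustrative thresholds (hypothetical, not our real alerting values).
P95_DEGRADED_MS = 500       # flag "degraded" at or above this p95 latency
P95_UNAVAILABLE_MS = 2000   # flag "unavailable" at or above this p95 latency

def classify_service(latencies_ms):
    """Classify a service from the p95 of its recent request latencies (ms)."""
    if not latencies_ms:
        return "unknown"
    # quantiles(n=20) returns 19 cut points; index 18 is the 95th percentile.
    p95 = quantiles(latencies_ms, n=20)[18]
    if p95 >= P95_UNAVAILABLE_MS:
        return "unavailable"
    if p95 >= P95_DEGRADED_MS:
        return "degraded"
    return "healthy"

print(classify_service([120, 130, 110, 140, 125] * 10))   # consistently fast
print(classify_service([2500, 2600, 2400, 3000, 2800] * 10))  # consistently slow
```

A percentile-based check like this is less noisy than alerting on single slow requests, which is why a few minutes can pass between the onset of a slowdown and the service being marked unavailable.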
The increased response times in the pub-sub service were caused by unplanned server patching performed by our cloud provider outside of our requested maintenance window. Some API servers were affected with increased load on requests that relied on that service.
We immediately triggered a rotation of the affected API servers. While those servers were being rotated, our API remained slow because the backing pub-sub service continued to experience high latency.
Once we determined that our cloud provider had performed unannounced patching on one of our pub-sub servers, we rotated that server as well. This process takes some time, but once it completed, dependent services returned to their expected latency.
We understand the frustration this incident may have caused, and we are deeply sorry for the disruption.