As we continued to monitor load after the prior incident, we noticed more load than we wanted in the JFK region. Around 16:22 UTC we changed the weighting of the emergency capacity so that it would absorb more load. Due to a bug in our monitoring system, this change marked other regions as having capacity that was not actually available. As a result, a small percentage of requests in the affected regions were routed to missing nodes, which resulted in timeouts for clients. By 16:46 UTC we had fully rolled back the bad configuration and restored service.
Posted Jan 16, 2020 - 22:04 UTC
Resolved
Our fix was successful. We are continuing to monitor but do not expect any recurrence.
Posted Jan 07, 2020 - 17:15 UTC
Monitoring
We have applied a fix for the regions that were affected and are continuing to monitor the situation.
Posted Jan 07, 2020 - 17:00 UTC
Identified
We are experiencing new issues with our CDN connectivity; some sites are failing to load in some regions on our non-enterprise CDN. The cause has been identified and the team is working to resolve it.