New CDN connectivity issues

Incident Report for Netlify

Postmortem

While monitoring load after the prior incident, we saw more load in the JFK region than we wanted. Around 16:22 UTC we changed the weighting of the emergency capacity so that it would absorb more load. Due to a bug in our monitoring system, this change marked other regions as having capacity that was not actually available. As a result, a small percentage of requests in the impacted regions were routed to missing nodes, which caused timeouts for clients. By 16:46 UTC we had fully rolled back the bad configuration and restored service.
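
To illustrate the failure mode described above, here is a minimal sketch of weighted region routing. It is not our production code: the `Region` fields, the function names, and the region name `region-b` are all hypothetical and exist only for illustration. The sketch assumes that routing should trust live health-check results rather than reported capacity, so that a region with capacity reported but no live nodes never receives traffic.

```python
import random
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    weight: float        # configured share of traffic (hypothetical field)
    reported_nodes: int  # capacity the monitoring system claims exists
    live_nodes: int      # nodes that actually pass health checks


def routable_weights(regions):
    """Keep only regions that have a positive weight and at least one live node."""
    return {r.name: r.weight for r in regions if r.weight > 0 and r.live_nodes > 0}


def phantom_capacity(regions):
    """Names of regions whose reported capacity exceeds their live capacity."""
    return [r.name for r in regions if r.reported_nodes > r.live_nodes]


def pick_region(regions, rng=random):
    """Pick a region at random, in proportion to weight, among routable regions."""
    weights = routable_weights(regions)
    if not weights:
        raise RuntimeError("no region with live capacity")
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Illustrative data only: "region-b" is reported as having capacity it lacks.
regions = [
    Region("jfk", weight=3.0, reported_nodes=4, live_nodes=4),
    Region("region-b", weight=1.0, reported_nodes=2, live_nodes=0),
]
chosen = pick_region(regions)
```

In this sketch `region-b` is excluded from routing despite its nonzero weight, and `phantom_capacity` flags it as a mismatch that could be alerted on before a weighting change is rolled out.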

Posted Jan 16, 2020 - 22:04 UTC

Resolved

Our fix was successful. We are continuing to monitor but do not expect any recurrence.

Posted Jan 07, 2020 - 17:15 UTC

Monitoring

We have applied a fix for the regions that were affected and are continuing to monitor the situation.

Posted Jan 07, 2020 - 17:00 UTC

Identified

We are experiencing new issues with our CDN connectivity: some sites are failing to load in some regions on our non-enterprise CDN. The cause has been identified and the team is working to resolve it.

Posted Jan 07, 2020 - 16:46 UTC

This incident affected: Standard Edge Network.