As we continued to monitor the load after the prior incident, we noticed that there was more load than we wanted in the JFK region. Around 16:22 UTC we changed the weighting of the emergency capacity to absorb more load. Due to a bug in our monitoring system, this marked other regions as having capacity that was not available. This in turn caused some requests, a small percentage, in those impacted regions to route to missing nodes - which resulted in timeouts for the clients. By 16:46 UTC we had fully rolled back the bad configuration, and fully restored service.