We attempted to double our CDN capacity so that the network could absorb more requests and attacks without impacting customers. A bug in our configuration management system pushed a bad configuration into the monitoring system. As a result, most of the nodes on the regular ADN were removed from the DNS record that serves that ADN. The change was reverted and service was restored by 16:10 UTC.
Since these incidents, we have revamped the runbooks and the deployment/configuration scripts so that they describe the mitigation procedures in more detail, and we have added detailed validation to the monitoring system's configuration. We have also doubled the ADN capacity to be better prepared for future high-load situations.
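To illustrate the kind of validation that can catch a bad monitoring configuration before it empties a DNS record, here is a minimal sketch in Python. It is not our actual implementation: the names `PoolChange` and `validate_pool_change`, the 25% removal threshold, and the minimum node count are all assumptions chosen for the example. The idea is simply to reject any change that would remove too large a share of the nodes currently serving a record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolChange:
    """Hypothetical description of a proposed change to a DNS record's node set."""
    current_nodes: frozenset
    proposed_nodes: frozenset


def validate_pool_change(change, max_removed_fraction=0.25, min_nodes=1):
    """Return a list of reasons to reject the change; an empty list means it passes.

    The threshold values are illustrative assumptions, not production settings.
    """
    problems = []
    removed = change.current_nodes - change.proposed_nodes

    # Never allow the record to drop below a minimum number of nodes.
    if len(change.proposed_nodes) < min_nodes:
        problems.append(
            f"only {len(change.proposed_nodes)} node(s) would remain; "
            f"minimum is {min_nodes}"
        )

    # Reject changes that remove too large a share of the current nodes at once.
    if change.current_nodes:
        fraction = len(removed) / len(change.current_nodes)
        if fraction > max_removed_fraction:
            problems.append(
                f"{len(removed)} of {len(change.current_nodes)} nodes would be "
                f"removed ({fraction:.0%}); limit is {max_removed_fraction:.0%}"
            )

    return problems
```

A deployment script could call this check before publishing the new record and stop for manual review whenever the returned list is not empty.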