Service Interruption 10/21 (2)
Incident Report for Pardot
Postmortem

Summary
The Pardot application and website experienced intermittent service disruptions on Friday, October 21, from approximately 11:10 UTC to 13:20 UTC and again from 15:50 UTC to 20:40 UTC. The first disruption impacted the East Coast of the United States, while the second disruption impacted a broader, global audience.

Root Cause
Websites rely on DNS (Domain Name System) to translate domain names to IP addresses and route web traffic to the appropriate location on the Internet. The root cause of the incident was an intermittent service disruption at Pardot’s DNS service provider, Dyn, caused by a large-scale distributed denial-of-service (DDoS) attack. During the incident, users may have experienced issues loading http://www.pardot.com/, https://pi.pardot.com/, and Pardot-hosted assets such as forms, files, and landing pages. All impacted Pardot services were fully restored once the underlying DNS disruption was resolved.
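To illustrate the translation step described above, here is a minimal Python sketch of a hostname lookup that reports a failure instead of raising. The function name resolve and its lookup parameter are hypothetical, introduced only for this example; this is not Pardot's monitoring code.

```python
import socket


def resolve(hostname, lookup=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses for hostname, or [] if resolution fails.

    `lookup` is a hypothetical injection point so the function can be
    exercised without network access; it defaults to the standard resolver.
    """
    try:
        results = lookup(hostname, 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Name resolution failed, for example because the authoritative
        # nameservers for the domain could not be reached.
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address as a string.
    return sorted({entry[4][0] for entry in results})
```

Calling resolve("pi.pardot.com") returns a list of addresses when DNS is healthy. During a disruption like the one described here, a lookup that cannot reach the nameservers would be expected to return an empty list.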

Timeline
On Friday 10/21 at 11:15 UTC, the Pardot Site Reliability team noticed issues resolving https://pi.pardot.com and engaged Salesforce’s Global Site Reliability team to investigate the cause. During the investigation, we discovered that our DNS vendor, Dyn, had been experiencing a service disruption affecting some of its public nameservers since 11:10 UTC. At 13:20 UTC, we learned that Dyn had mitigated the disruption and was continuing to monitor the situation. All impacted Pardot services were also restored at this time. At 15:50 UTC, Dyn reported that the disruption had returned, this time impacting a broader audience. At 20:50 UTC, Dyn resolved the underlying DNS service disruption, and we confirmed that Pardot functionality had recovered while continuing to monitor. At 22:30 UTC, the “all clear” was given.

Remedy & Future Prevention
Our team is taking several steps to mitigate the impact should a similar DNS disruption occur in the future. First, we plan to improve the redundancy of our DNS infrastructure by adding a second provider as a failover in the event of an emergency. We will also migrate from Dyn public nameservers to Salesforce private-label Dyn nameservers, which were not impacted during this disruption. Lastly, the Pardot team will continue to investigate and implement further improvements to DNS stability as Salesforce continues to improve our DNS practices.
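The failover idea can be sketched in a few lines of Python. This is an illustration only, under the assumption that each provider is modeled as a callable that either returns addresses or raises an error. The name resolve_with_failover and the provider labels are hypothetical, and the sketch does not describe how Pardot or Salesforce actually configured their DNS.

```python
def resolve_with_failover(hostname, providers):
    """Try each (name, lookup) pair in order and return the first usable answer.

    `providers` is a hypothetical list of (provider_name, lookup_callable)
    pairs. Each callable takes a hostname and returns a list of addresses,
    or raises OSError when that provider cannot be reached.
    """
    errors = []
    for name, lookup in providers:
        try:
            addresses = lookup(hostname)
        except OSError as exc:
            # This provider is unavailable; record why and try the next one.
            errors.append(f"{name}: {exc}")
            continue
        if addresses:
            return name, addresses
        errors.append(f"{name}: empty answer")
    # No provider answered, so the name cannot be resolved at all.
    raise LookupError(f"all providers failed for {hostname}: " + "; ".join(errors))
```

With a single provider, any outage at that provider ends in the LookupError branch. With a second, independent provider in the list, the lookup still succeeds as long as one of them answers.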

We appreciate your patience during this interruption. Thank you for your continued trust in us.

Zach Bailey, VP of Pardot Engineering

Posted Oct 27, 2016 - 11:44 EDT

Resolved
Application performance has returned to normal. A full postmortem will be posted as soon as next steps have been identified.
Posted Oct 21, 2016 - 18:33 EDT
Monitoring
Affected services have returned to normal as of approximately 8:30 p.m. UTC. We're continuing to monitor.
Posted Oct 21, 2016 - 17:52 EDT
Identified
We're continuing to track this incident closely with our DNS vendor. More information can be found at https://www.dynstatus.com/.
Posted Oct 21, 2016 - 15:26 EDT
Investigating
As of approximately 4:15 p.m. UTC, we're investigating an extension of this morning's issues with pardot.com and pi.pardot.com. Users may have trouble loading pi.pardot.com, pardot.com, and Pardot-hosted assets. We hope to have another update to share shortly.
Posted Oct 21, 2016 - 12:26 EDT