On 12/16, between 11:10 a.m. and 12:25 p.m. EST, a subset of Pardot customers experienced a disruption in the portion of our background job processing that schedules new jobs. This disruption delayed processing for jobs scheduled to begin during this timeframe, including email sending, imports, exports, and CRM syncing.
We determined the root cause to be a spike in job process metadata, which in turn led to low-memory errors that caused the system to stop scheduling jobs. While we have metrics and alerting in place to proactively detect similar issues, in investigating this incident we found a misconfigured monitoring metric that, had it been configured correctly, would have brought the problem to light sooner.
As part of our follow-up to prevent similar issues in the future, we will be taking two steps. First, we will improve our handling of job metadata to prevent the memory issues that result when a spike occurs. Second, we will correct the misconfigured alerting so that it provides better and more visible notifications should this class of error occur again.
We appreciate your patience and continued trust while we worked to resolve this situation.
Zach Bailey, Sr. Director of Engineering