Issue Processing Events on our Events API
Incident Report for PagerDuty


On May 3, 2017, at 20:01 UTC, PagerDuty suffered a service degradation affecting our Events API; this incident lasted for three hours. Customers would have experienced difficulties sending events to the PagerDuty Events API. We apologize to any customers who were affected by the outage.

What Happened?

At 19:12 UTC, PagerDuty began maintenance on one of the Cassandra-based services responsible for processing events from the Events API. During this maintenance, the Cassandra cluster became unstable while engineers increased the capacity of the overall system. PagerDuty engineers were immediately alerted to the issue and worked to bring the cluster back into a stable state. At 22:10 UTC, the cluster was stable and the API was able to process events.

What Are We Doing About This?

To avoid future issues like this one, we have put additional checks into place around how we scale our Cassandra cluster. We sincerely apologize if this degradation negatively impacted your team's usage of PagerDuty. If you have questions or concerns please contact us at

Posted May 16, 2017 - 23:10 UTC

We are now processing events and notifications normally. All systems are functional.
Posted May 03, 2017 - 22:10 UTC
We believe we have resolved the root cause of the issue with event ingestion and are working on processing the current backlog of events. We are monitoring the situation closely until we are fully recovered.
Posted May 03, 2017 - 21:46 UTC
We have identified the issue with event processing on our Events API and are currently working on a resolution.
Posted May 03, 2017 - 21:06 UTC
We are currently experiencing an issue processing events on our Events API. We are actively investigating the issue.
Posted May 03, 2017 - 20:33 UTC