Web/Mobile/APIs Functional - Some Notification Delays
Incident Report for PagerDuty
Postmortem

Summary

On March 1st, 2017, starting at 16:59 UTC, PagerDuty suffered a service degradation lasting 22 minutes. During this time, approximately 0.043% of notifications were delayed; no alerts were lost.

What Happened?

An update to one of the services powering our event pipeline was released. This release contained a bug which was not detected during our testing. This bug caused errors in the application and prevented events from being processed correctly. As soon as the issue was discovered, the change was reverted, and event processing immediately started to recover.

What Are We Doing About This?

We will be increasing our test coverage and improving the testing stages of our deployment pipeline in order to detect these types of failures before they reach our production systems. Additionally, we will be reassessing our rollback strategy in order to mitigate long recovery times when reverting a recent deployment becomes necessary.

We apologize to all of our customers for any inconvenience that this delay may have caused. If there are any questions, comments, or concerns regarding this issue, please reach out to us at support@pagerduty.com.

Posted Mar 17, 2017 - 20:12 UTC

Resolved
We have resolved the issue with notification delays and are fully recovered.
Posted Mar 01, 2017 - 17:28 UTC
Investigating
Our web and mobile apps, as well as our APIs, are currently fully functional. We are experiencing a delay in some notifications.
Posted Mar 01, 2017 - 17:18 UTC