On May 11, 2017, starting at 15:00 UTC, we suffered a degradation to the data pipeline service responsible for processing incident details. This caused a delay in the display of incident details for approximately thirty minutes. This was followed by an additional thirty minute period of severe degradation to the service, during which incident details were not visible.
A transient network issue caused our data pipeline cluster to enter an inconsistent state, exacerbated by a bug in third-party software that we use. This resulted in a delay in the processing of event details. In the initial response, restarting a subset of cluster control hosts resulted in further state inconsistency, which resulted in the elevated service degradation.
We have updated our incident response documentation concerning the data pipeline, to more effectively restore the cluster in the event that it enters an inconsistent state. We also plan to upgrade the cluster control software so that it is no longer affected by the aforementioned bug.
We would like to again apologize for any inconvenience this issue caused. If you have any questions, do not hesitate to contact us at email@example.com.