This article outlines how our alerting mechanism works, from the moment an alert is raised to its resolution. Before a project can be provisioned to the Live environment, it must be able to generate alerts when it is not running in its normal state.
Alerts can have one of three statuses:
- OK - the alert has passed and the service is operating normally
- Warning - the alert has passed a warning threshold; the service is operating normally but is showing unusual activity
- Critical - the alert has not passed; the service may have reduced functionality or may not be operating at all
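The three statuses can be illustrated with a minimal sketch. It assumes a numeric check where higher values are worse and where a separate critical threshold sits above the warning threshold; the function name `classify` and the parameters `warning_at` and `critical_at` are hypothetical and are not taken from our monitoring tooling.

```python
from enum import Enum


class Status(Enum):
    OK = "OK"
    WARNING = "Warning"
    CRITICAL = "Critical"


def classify(value: float, warning_at: float, critical_at: float) -> Status:
    """Map a measured value to a status (assumption: higher values are worse)."""
    if critical_at < warning_at:
        raise ValueError("critical_at must not be below warning_at")
    if value >= critical_at:
        return Status.CRITICAL
    if value >= warning_at:
        return Status.WARNING
    return Status.OK


# Hypothetical check: error rate as a percentage of requests.
for error_rate in (0.5, 4.0, 25.0):
    print(error_rate, classify(error_rate, warning_at=2.0, critical_at=10.0).value)
```

Real checks may not be threshold-based at all (for example, a simple up/down probe), in which case only OK and Critical would apply.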
Alerts can have one or more handlers. A handler routes an alert to the correct destination. Currently, handlers are used to route alerts to the appropriate PagerDuty service. The current handlers are:
- TADC - alerts for Talis Aspire Digitised Content
- TN - alerts for Talis primitives and the Talis Network
- TARL - alerts for Talis Aspire Reading Lists
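As a rough illustration of handler-based routing, the sketch below maps each handler named above to a PagerDuty service and fans an alert out to every service its handlers point at. The service identifiers, the `Alert` class, and the `route` function are all hypothetical placeholders; the real mapping lives in the alerting configuration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Handler names come from the list above; the service identifiers are
# placeholders, not real PagerDuty service names.
HANDLER_TO_SERVICE: Dict[str, str] = {
    "TADC": "pagerduty-service-tadc",
    "TN": "pagerduty-service-tn",
    "TARL": "pagerduty-service-tarl",
}


@dataclass(frozen=True)
class Alert:
    name: str
    status: str
    handlers: Tuple[str, ...]


def route(alert: Alert) -> List[str]:
    """Return the service for each of the alert's handlers, in order."""
    unknown = [h for h in alert.handlers if h not in HANDLER_TO_SERVICE]
    if unknown:
        raise ValueError(f"unknown handler(s): {', '.join(unknown)}")
    return [HANDLER_TO_SERVICE[h] for h in alert.handlers]


# An alert with two handlers is delivered to both services.
shared = Alert(name="database-latency", status="Warning", handlers=("TN", "TARL"))
print(route(shared))
```

Rejecting unknown handlers, rather than silently dropping them, is a design choice in this sketch: an alert that routes nowhere would never reach the rota.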
PagerDuty is used to deliver alerts to the rota. The medium through which an alert is received is left to the user: SMS, email, or Android/iPhone push notifications. Alerts are also delivered to the applicable Slack channel.
Alerts should self-clear when the issue is resolved. Alerts should be resolved manually only in special circumstances.
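Self-clearing can be sketched as a status transition that produces either a trigger or a resolve event. The example assumes alerts reach PagerDuty through its Events API v2, where an event carries a `routing_key`, a `dedup_key`, and an `event_action`; this article does not state which integration is actually in use, and the function names and key values below are hypothetical. The sketch only builds the event body and does not send it.

```python
from typing import Optional

# Severity strings used for trigger events; statuses are those listed above.
SEVERITY = {"Warning": "warning", "Critical": "critical"}


def event_action(previous: str, current: str) -> Optional[str]:
    """Return 'trigger', 'resolve', or None when the status has not changed."""
    if current == previous:
        return None
    return "resolve" if current == "OK" else "trigger"


def build_event(routing_key: str, dedup_key: str, summary: str, source: str,
                previous: str, current: str) -> Optional[dict]:
    """Build an event body for a status transition, or None if nothing changed."""
    action = event_action(previous, current)
    if action is None:
        return None
    event = {
        "routing_key": routing_key,
        "dedup_key": dedup_key,  # same key on trigger and resolve ties them together
        "event_action": action,
    }
    if action == "trigger":
        event["payload"] = {
            "summary": summary,
            "source": source,
            "severity": SEVERITY[current],
        }
    return event


# Placeholder values: the alert fires, then clears itself once the check recovers.
fired = build_event("ROUTING_KEY_PLACEHOLDER", "tarl-api-latency",
                    "TARL API latency high", "tarl-api", "OK", "Critical")
cleared = build_event("ROUTING_KEY_PLACEHOLDER", "tarl-api-latency",
                      "TARL API latency high", "tarl-api", "Critical", "OK")
print(fired["event_action"], cleared["event_action"])
```

Because the resolve event reuses the `dedup_key` of the trigger, the incident closes without anyone on the rota having to resolve it by hand.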