Alerts

Destination

Alerts are sent to a Slack channel by default, but they can be routed to any receiver that Alertmanager supports.

Some alerts are common to all of the following components:

App: API, UI, MongoDB, PostgreSQL
Observability: Prometheus, OpenTelemetry, Grafana, Alertmanager, Kubernetes Events Exporter

Alert	Metric	Environment	Description
CPU Utilisation > 70%	k8s_pod_cpu_limit_utilization_ratio	dev, staging	Triggers an alert when a component's CPU utilisation exceeds 70% of its CPU limit
RAM Utilisation > 70%	k8s_pod_memory_limit_utilization_ratio	dev, staging	Triggers an alert when a component's RAM utilisation exceeds 70% of its memory limit
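As a rough illustration of the threshold, the sketch below checks per-pod limit-utilisation ratios against 0.7. It assumes the ratio metrics report usage divided by the configured limit (so 0.7 means 70%), and the pod names and sample values are made up. In practice the comparison is done by a Prometheus alerting rule (roughly `k8s_pod_cpu_limit_utilization_ratio > 0.7`), not by Python.

```python
# Minimal sketch of the 70% limit-utilisation check.
# Assumption: ratio = usage / limit, so 0.70 corresponds to 70% of the limit.
from typing import Dict, List

THRESHOLD = 0.70  # 70% of the CPU or RAM limit allocated to the component


def pods_over_limit(ratios: Dict[str, float], threshold: float = THRESHOLD) -> List[str]:
    """Return the pods whose limit-utilisation ratio is strictly greater than the threshold."""
    return sorted(pod for pod, ratio in ratios.items() if ratio > threshold)


if __name__ == "__main__":
    # Hypothetical samples of k8s_pod_cpu_limit_utilization_ratio keyed by pod name.
    cpu_samples = {"api-0": 0.82, "ui-0": 0.35, "mongodb-0": 0.70}
    print(pods_over_limit(cpu_samples))  # ['api-0']
```

A pod sitting exactly at 0.70 does not fire, matching the "greater than 70%" wording in the table.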

In addition to the common alerts above, some components have alerts specific to them:

API

Alert	Metric	Environment	Description
API Health is 0	httpcheck_status (staging), app_api_health (dev)	dev, staging	Triggers an alert when the API is down. Staging uses a pull-based check (httpcheck_status); dev uses a push-based metric (app_api_health).
Internal Server Error in API	http_server_duration_milliseconds_count	staging	Triggers an alert whenever an API call returns an Internal Server Error (HTTP 500) and reports the endpoint where it occurred, to aid debugging.
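A rough sketch of the per-endpoint logic behind the Internal Server Error alert: compare two scrapes of the request counter and report the routes whose HTTP 500 count went up. It assumes each series carries a route and a status-code label (modelled here as tuple keys; the real label names depend on the instrumentation's semantic-convention version), and the routes and counts are made up. In Prometheus itself this would typically be an `increase(...)` over a time window grouped by the route label, not Python.

```python
# Sketch: find endpoints whose HTTP 500 count grew between two scrapes.
# Assumption: series are keyed by (route, status_code); label names are hypothetical.
from typing import Dict, List, Tuple

# (route, status_code) -> cumulative request count from http_server_duration_milliseconds_count
Snapshot = Dict[Tuple[str, str], float]


def endpoints_with_500s(previous: Snapshot, current: Snapshot) -> List[str]:
    """Return routes that served at least one new HTTP 500 since the previous scrape."""
    failing = []
    for (route, status), value in current.items():
        if status != "500":
            continue
        before = previous.get((route, status), 0.0)
        # A counter that went down was reset (e.g. pod restart), so count from zero.
        increase = value - before if value >= before else value
        if increase > 0:
            failing.append(route)
    return sorted(failing)


if __name__ == "__main__":
    prev = {("/users", "200"): 120, ("/users", "500"): 2, ("/orders", "500"): 5}
    curr = {("/users", "200"): 130, ("/users", "500"): 3,
            ("/orders", "500"): 5, ("/items", "500"): 1}
    print(endpoints_with_500s(prev, curr))  # ['/items', '/users']
```

Grouping by route is what lets the alert name the failing endpoint; `/orders` is not reported because its 500 count did not change between scrapes.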

MongoDB

Alert	Metric	Environment	Description
MongoDB Health is 0	mongodb_health_ratio	staging	Triggers an alert whenever the MongoDB health metric reports a value of 0