Overview
Metrics provide a way of monitoring and understanding system behavior in aggregate. There are two broad categories of metrics:
- System Metrics
- Application Metrics
System Metrics
System metrics help determine the health of the system. Examples include:
- CPU and RAM utilisation of a component
- Health of components (API, DB)
- API response times, errors, etc.
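Metrics such as API response time and error counts are aggregates over many individual requests. The minimal sketch below shows that idea. It reduces raw request samples to three values: a request count, an error rate, and a nearest-rank p95 latency. The `RequestSample` record is hypothetical, and treating only 5xx status codes as errors is an assumption. The sketch illustrates aggregation only; it is not how the collectors or Prometheus compute these values.

```python
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One observed API call (hypothetical record shape, for illustration)."""
    latency_ms: float
    status_code: int


def percentile_nearest_rank(sorted_values: list[float], percentile: int) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    n = len(sorted_values)
    # Integer ceiling of percentile * n / 100, clamped to at least rank 1.
    rank = max(1, -(-percentile * n // 100))
    return sorted_values[rank - 1]


def summarize_api_metrics(samples: list[RequestSample]) -> dict[str, float]:
    """Aggregate individual requests into count, error rate and p95 latency."""
    if not samples:
        return {"count": 0, "error_rate": 0.0, "p95_latency_ms": 0.0}
    latencies = sorted(s.latency_ms for s in samples)
    # Assumption: only 5xx responses count as errors; adapt to your own policy.
    errors = sum(1 for s in samples if s.status_code >= 500)
    return {
        "count": len(samples),
        "error_rate": errors / len(samples),
        "p95_latency_ms": percentile_nearest_rank(latencies, 95),
    }


# 20 requests at 10 ms, 20 ms, ..., 200 ms; requests 3 and 7 failed with a 500.
samples = [RequestSample(10.0 * i, 500 if i in (3, 7) else 200) for i in range(1, 21)]
print(summarize_api_metrics(samples))
# {'count': 20, 'error_rate': 0.1, 'p95_latency_ms': 190.0}
```

The nearest-rank method always returns an observed value. This keeps the example easy to verify by hand; interpolating percentile methods would give slightly different numbers.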
Application Metrics
These metrics capture data about what is happening at the application level.
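As an illustration of an application metric, here is a minimal sketch of how an application might report a counter to a collector over OTLP/HTTP using JSON encoding. It relies on two OTLP conventions: 4318 is the default OTLP/HTTP port and `/v1/metrics` is the standard metrics path. Several names are hypothetical: the collector address `otel-app-collector`, the metric name `documents_processed`, the scope name `example-app`, and the service name `my-app`. The code only builds the request and does not send it.

```python
import json
import time
import urllib.request

# Hypothetical in-cluster address of the application-metrics collector.
COLLECTOR_URL = "http://otel-app-collector:4318/v1/metrics"


def _attr(key: str, value: str) -> dict:
    """Encode a string attribute in OTLP/JSON form."""
    return {"key": key, "value": {"stringValue": value}}


def build_counter_payload(name: str, value: int, attributes: dict[str, str],
                          service_name: str, start_ns: int, now_ns: int) -> dict:
    """Build an OTLP/JSON metrics export request with one cumulative counter point."""
    return {
        "resourceMetrics": [{
            "resource": {"attributes": [_attr("service.name", service_name)]},
            "scopeMetrics": [{
                "scope": {"name": "example-app"},  # hypothetical scope name
                "metrics": [{
                    "name": name,
                    "sum": {
                        "dataPoints": [{
                            "attributes": [_attr(k, v) for k, v in sorted(attributes.items())],
                            # 64-bit integers are encoded as strings in OTLP/JSON.
                            "startTimeUnixNano": str(start_ns),
                            "timeUnixNano": str(now_ns),
                            "asInt": str(value),
                        }],
                        "aggregationTemporality": 2,  # CUMULATIVE
                        "isMonotonic": True,
                    },
                }],
            }],
        }]
    }


def build_export_request(payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the HTTP POST to the collector."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        COLLECTOR_URL, data=body, method="POST",
        headers={"Content-Type": "application/json"},
    )


now = time.time_ns()
payload = build_counter_payload("documents_processed", 42, {"pipeline": "ingest"},
                                service_name="my-app",
                                start_ns=now - 60_000_000_000, now_ns=now)
req = build_export_request(payload)
# urllib.request.urlopen(req) would perform the POST once the collector is reachable.
```

In practice, an OpenTelemetry SDK builds, batches, and sends these payloads for you. Writing the JSON by hand here only makes the wire format visible.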
How do we capture these metrics?
- We use the OpenTelemetry Collector as an intermediary layer to collect these metrics. This also lets customers ship the metrics to their own backends.
- Three OpenTelemetry Collectors are deployed in the Kubernetes cluster:
Deployment Mode | Metric Category | Purpose |
---|---|---|
DaemonSet | System Metrics | Deployed on each node to capture system metrics from every node |
Deployment | System Metrics | Captures cluster-level system metrics |
Deployment | Application Metrics | Receives metrics sent by the application and delivers them to the appropriate backend |
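Each collector in the table receives metrics and forwards them to one or more destinations. The sketch below models that receive-and-fan-out pattern in plain Python. It shows how shipping metrics to a customer's own backend amounts to registering one more exporter. `CollectorPipeline`, the backend names, and the in-memory stores are all hypothetical. The real OpenTelemetry Collector is configured declaratively, through pipelines of receivers, processors, and exporters, rather than with code like this.

```python
from dataclasses import dataclass, field
from typing import Callable

# A batch is simplified to a list of {"name": ..., "value": ...} dicts.
Exporter = Callable[[list[dict]], None]


@dataclass
class CollectorPipeline:
    """Toy model of one collector: every received batch goes to all exporters."""
    name: str
    exporters: dict[str, Exporter] = field(default_factory=dict)

    def add_exporter(self, backend: str, export: Exporter) -> None:
        """Register an additional destination, e.g. a customer's own backend."""
        self.exporters[backend] = export

    def receive(self, batch: list[dict]) -> list[str]:
        """Forward the batch to each exporter and return the backends it reached."""
        delivered = []
        for backend, export in self.exporters.items():
            export(batch)
            delivered.append(backend)
        return delivered


# In-memory lists stand in for real backends (hypothetical names).
prometheus_store: list[dict] = []
customer_store: list[dict] = []

node_collector = CollectorPipeline("node-system-metrics")
node_collector.add_exporter("prometheus", prometheus_store.extend)
node_collector.add_exporter("customer-backend", customer_store.extend)

delivered = node_collector.receive([{"name": "node_cpu_utilisation", "value": 0.42}])
print(delivered)  # ['prometheus', 'customer-backend']
```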
Where do we store these metrics?
- If you have opted for Metrics Storage from DynamoAI, a Prometheus instance is shipped as part of the DynamoAI package.
- We send all collected system metrics to the Prometheus instance deployed in the Kubernetes cluster.
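Once metrics are in Prometheus, they can be read back through its HTTP API. The minimal sketch below builds a URL for the instant-query endpoint `/api/v1/query`. The address `http://prometheus:9090` is a hypothetical in-cluster service name; 9090 is Prometheus's default port. The metric name `system_cpu_utilization` is also hypothetical, so check which names your collectors actually export.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical in-cluster Prometheus address.
PROMETHEUS_URL = "http://prometheus:9090"


def instant_query_url(base_url: str, promql: str,
                      eval_time: Optional[float] = None) -> str:
    """Build a URL for the Prometheus HTTP API instant-query endpoint."""
    params = {"query": promql}
    if eval_time is not None:
        params["time"] = str(eval_time)  # Unix timestamp in seconds
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


# Hypothetical metric name; replace with one your collectors actually export.
url = instant_query_url(PROMETHEUS_URL, "avg(system_cpu_utilization)")
print(url)
```

Fetching the URL, for example with `urllib.request.urlopen(url)`, returns a JSON document whose `data.result` field holds the matching series. Without a `time` parameter, Prometheus evaluates the query at the current time.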
Visualization
The Visualization package comes with predefined dashboards containing graphs built on these metrics. See Visualization for details.
Alerts
The Alerts package comes with an alerting tool and predefined alerts built on these metrics. See Alerts for details.