Overview

Metrics provide a way to monitor and understand system behavior in aggregate. We collect two broad categories of metrics:

System Metrics
Application Metrics

System Metrics

System metrics help determine the health of the system. Examples include:

CPU and RAM utilisation of a component
Health metrics of components (API, DB)
API response times, error counts, etc.
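Metrics such as API response time and errors start as raw per-request observations that are collapsed into a few aggregate numbers. The sketch below shows that collapse for a batch of requests. It assumes a hypothetical `RequestSample` record, treats HTTP 5xx responses as errors, and uses the nearest-rank definition of a percentile, which is one of several common definitions.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One observed API request (hypothetical record shape)."""
    latency_ms: float
    status_code: int


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples <= it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(samples: list[RequestSample]) -> dict[str, float]:
    """Collapse raw request samples into aggregate health metrics."""
    latencies = [s.latency_ms for s in samples]
    # Assumption: only server-side failures (5xx) count as errors.
    errors = sum(1 for s in samples if s.status_code >= 500)
    return {
        "request_count": len(samples),
        "p95_latency_ms": percentile(latencies, 95),
        "error_rate": errors / len(samples),
    }
```

Reporting a percentile rather than a mean keeps a handful of very slow requests visible instead of averaging them away.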

Application Metrics

Application metrics capture data about what is happening at the application level.
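Application metrics are usually recorded by the application itself as named instruments with attributes. The sketch below is a minimal in-process counter keyed by metric name and attributes. The `CounterRegistry` class and the metric name `documents_processed_total` are hypothetical stand-ins for whatever instrumentation SDK (for example, the OpenTelemetry SDK) the application actually uses.

```python
from collections import defaultdict
from threading import Lock


class CounterRegistry:
    """Hypothetical in-process store of monotonic counters, keyed by name + attributes."""

    def __init__(self) -> None:
        self._values: dict[tuple, float] = defaultdict(float)
        self._lock = Lock()  # counters may be incremented from several threads

    def add(self, name: str, amount: float = 1.0, **attributes: str) -> None:
        """Increment the series identified by name and attributes."""
        if amount < 0:
            raise ValueError("counters are monotonic; amount must be >= 0")
        # Sort attributes so the same label set always maps to the same series.
        key = (name, tuple(sorted(attributes.items())))
        with self._lock:
            self._values[key] += amount

    def snapshot(self) -> dict[tuple, float]:
        """Copy of the current values, e.g. for periodic export to a collector."""
        with self._lock:
            return dict(self._values)


registry = CounterRegistry()
registry.add("documents_processed_total", status="ok")
registry.add("documents_processed_total", status="failed")
```

Each distinct attribute combination becomes its own time series, so attributes should have a small, bounded set of values.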

How do we capture these metrics?

We use the OpenTelemetry Collector as an intermediary layer to collect these metrics. This also lets customers ship the metrics to their own backends.
Three OpenTelemetry Collectors are deployed in the Kubernetes cluster:

Deployment Mode	Metric Level	Purpose
DaemonSet	System Metrics	Runs on every node to capture system metrics from each node
Deployment	System Metrics	Captures cluster-level system metrics
Deployment	Application Metrics	Receives the metrics sent by the application and delivers them to the appropriate backend
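An application typically ships its metrics to the application-metrics collector over OTLP. The sketch below hand-builds an OTLP/HTTP JSON request carrying one cumulative counter value, only to make the wire format visible. The collector address `otel-app-collector.monitoring.svc` and the scope name are hypothetical; port 4318 and the `/v1/metrics` path are the OTLP/HTTP defaults, and the collector is assumed to have its OTLP HTTP receiver enabled.

```python
import json
import time
import urllib.request

# Hypothetical in-cluster address of the application-metrics collector;
# 4318 is the default OTLP/HTTP port and /v1/metrics the standard path.
COLLECTOR_URL = "http://otel-app-collector.monitoring.svc:4318/v1/metrics"

# Cumulative counters report the time at which counting began.
PROCESS_START_NS = time.time_ns()


def build_counter_payload(service: str, metric: str, value: int,
                          attributes: dict[str, str]) -> dict:
    """Build an OTLP/JSON request body carrying one cumulative counter point."""
    return {
        "resourceMetrics": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeMetrics": [{
                "scope": {"name": "example-instrumentation"},  # hypothetical name
                "metrics": [{
                    "name": metric,
                    "unit": "1",
                    "sum": {
                        "aggregationTemporality": 2,  # 2 = CUMULATIVE
                        "isMonotonic": True,
                        "dataPoints": [{
                            "attributes": [
                                {"key": k, "value": {"stringValue": v}}
                                for k, v in sorted(attributes.items())
                            ],
                            # 64-bit integers are encoded as JSON strings.
                            "startTimeUnixNano": str(PROCESS_START_NS),
                            "timeUnixNano": str(time.time_ns()),
                            "asInt": str(value),
                        }],
                    },
                }],
            }],
        }],
    }


def send(payload: dict, url: str = COLLECTOR_URL) -> int:
    """POST the payload to the collector and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

In a real service the OpenTelemetry SDK and its OTLP exporter would build, batch, and send these requests. Because the application talks only to the collector, the collector's exporter configuration decides which backend receives the data, which is what allows metrics to be redirected to a customer's own backend without changing the application.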

Where do we store these metrics?

If you have opted for Metrics Storage from DynamoAI, a Prometheus instance is shipped as part of the DynamoAI package.
All collected system metrics are sent to this Prometheus instance, which is deployed in the Kubernetes cluster.
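Once stored in Prometheus, the metrics can be read back through the Prometheus HTTP API. The sketch below issues an instant query against `/api/v1/query` and flattens the JSON response into `(labels, value)` pairs. The in-cluster address `prometheus.monitoring.svc:9090` is an assumption (9090 is the Prometheus default port); adjust it to however the bundled instance is exposed in your cluster.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical in-cluster Prometheus address; 9090 is the Prometheus default port.
PROMETHEUS_URL = "http://prometheus.monitoring.svc:9090"


def build_query_url(promql: str, base_url: str = PROMETHEUS_URL) -> str:
    """URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body: str) -> list[tuple[dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant-vector query response."""
    data = json.loads(body)
    if data.get("status") != "success":
        raise RuntimeError(f"query failed: {data.get('error')}")
    if data["data"]["resultType"] != "vector":
        raise ValueError("expected an instant vector")
    # Each sample's value is [unix_timestamp, "value-as-string"].
    return [(item["metric"], float(item["value"][1]))
            for item in data["data"]["result"]]


def instant_query(promql: str) -> list[tuple[dict[str, str], float]]:
    """Run a PromQL instant query and return the parsed samples."""
    with urllib.request.urlopen(build_query_url(promql), timeout=5) as response:
        return parse_vector(response.read().decode("utf-8"))
```

Keeping `parse_vector` separate from the network call makes the response handling easy to test without a running Prometheus.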

Visualization

The Visualization package comes with predefined dashboards containing graphs built on these metrics. See Visualization for details.

Alerts

The Alerts package includes an alerting tool with predefined alerts based on these metrics. See Alerts for details.
