Overview

Metrics provide a way to monitor and understand system behavior in aggregate. We have two broad categories of metrics:

  • System Metrics
  • Application Metrics

System Metrics

System metrics help determine the health of the system, for example:

  • CPU and RAM utilisation of a component
  • Health of components such as the API and DB
  • API response time, errors, etc.
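To illustrate what "in aggregate" means for the API metrics above, here is a minimal Python sketch that rolls raw request samples up into a request count, average and p95 response time, and error rate. The `RequestSample` record, the `summarize` helper, and the choice to count HTTP 5xx responses as errors are hypothetical assumptions for illustration; this is not how the collectors described below compute these metrics.

```python
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One observed API call (hypothetical record shape, for illustration only)."""
    latency_ms: float
    status_code: int


def summarize(samples: list[RequestSample]) -> dict[str, float]:
    """Aggregate raw request samples into response-time and error metrics."""
    if not samples:
        raise ValueError("need at least one sample")
    latencies = sorted(s.latency_ms for s in samples)
    # Nearest-rank p95 using integer ceiling division to avoid float rounding.
    rank = -(-95 * len(latencies) // 100)
    # Assumption: only server-side failures (5xx) count as errors.
    errors = sum(1 for s in samples if s.status_code >= 500)
    return {
        "request_count": float(len(samples)),
        "avg_latency_ms": sum(latencies) / len(latencies),
        "p95_latency_ms": latencies[rank - 1],
        "error_rate": errors / len(samples),
    }
```

For four requests taking 100, 200, 300 and 400 ms, one of which returned a 500, this sketch reports an average of 250 ms, a p95 of 400 ms and an error rate of 0.25.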

Application Metrics

These metrics are targeted at capturing what is happening at the application level.

How do we capture these metrics?

  • We use the OpenTelemetry Collector as an intermediary layer to collect these metrics. This also lets customers ship the metrics to their own backends.
  • Three OpenTelemetry collectors are deployed in the Kubernetes cluster:
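| Deployment Mode | Metric Level        | Purpose                                                                             |
|-----------------|---------------------|-------------------------------------------------------------------------------------|
| DaemonSet       | System metrics      | Deployed on each node to capture system metrics across all nodes                    |
| Deployment      | System metrics      | Captures cluster-level system metrics                                               |
| Deployment      | Application metrics | Receives the metrics sent by applications and delivers them to the appropriate backend |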
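As a rough sketch of the application-metrics path, the Python snippet below encodes a single counter in the OTLP/JSON format and POSTs it over OTLP/HTTP, which the OpenTelemetry Collector's OTLP receiver can accept. The collector hostname `otel-app-collector`, the scope, service and metric names, and the assumption that the application-metrics collector exposes an OTLP/HTTP receiver on the default port 4318 are all hypothetical. Real applications would normally use an OpenTelemetry SDK instead of hand-building payloads.

```python
import json
import urllib.request

# Hypothetical in-cluster address of the application-metrics collector.
# 4318 is the default OTLP/HTTP port and /v1/metrics the standard metrics path.
COLLECTOR_URL = "http://otel-app-collector:4318/v1/metrics"


def build_counter_payload(service: str, name: str, value: int, time_unix_nano: int) -> dict:
    """Encode one cumulative, monotonic counter data point as OTLP/JSON."""
    return {
        "resourceMetrics": [{
            "resource": {"attributes": [
                {"key": "service.name", "value": {"stringValue": service}},
            ]},
            "scopeMetrics": [{
                # Hypothetical instrumentation scope name.
                "scope": {"name": "example-instrumentation"},
                "metrics": [{
                    "name": name,
                    "sum": {
                        # OTLP/JSON encodes 64-bit integer fields as decimal strings.
                        "dataPoints": [{
                            "asInt": str(value),
                            "timeUnixNano": str(time_unix_nano),
                        }],
                        "aggregationTemporality": 2,  # AGGREGATION_TEMPORALITY_CUMULATIVE
                        "isMonotonic": True,
                    },
                }],
            }],
        }]
    }


def send(payload: dict, url: str = COLLECTOR_URL) -> int:
    """POST the payload to the collector and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

A call such as `send(build_counter_payload("checkout-api", "orders_processed", 42, time.time_ns()))` (after `import time`) would push one data point, which the collector then forwards to whichever backend it is configured for.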

Where do we store these metrics?

  • If you've opted for Metrics Storage from DynamoAI, a Prometheus instance is shipped as part of the DynamoAI package.
  • All collected system metrics are sent to the Prometheus instance deployed on the Kubernetes cluster.
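Because the system metrics end up in Prometheus, they can be read back through Prometheus' HTTP API. The minimal Python sketch below calls the instant-query endpoint `/api/v1/query` and turns a vector result into a dict keyed by the `instance` label. The address `http://prometheus:9090` is an assumed in-cluster service name (9090 is Prometheus' default port), and the PromQL you pass depends on which metric names the collectors actually export.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical in-cluster Prometheus address.
PROMETHEUS_URL = "http://prometheus:9090"


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for Prometheus' instant-query endpoint."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body: str) -> dict[str, float]:
    """Map each series' 'instance' label to its sample value from a vector result."""
    data = json.loads(body)
    if data.get("status") != "success":
        raise RuntimeError(f"query failed: {data.get('error')}")
    # Each vector entry carries "value": [unix_timestamp, "sample_as_string"].
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in data["data"]["result"]
    }


def query(promql: str, base_url: str = PROMETHEUS_URL) -> dict[str, float]:
    """Run an instant query and return per-instance values."""
    with urllib.request.urlopen(instant_query_url(base_url, promql), timeout=5) as resp:
        return parse_vector(resp.read().decode("utf-8"))
```

Keying by `instance` is a simplifying choice for this sketch; series that share an instance but differ in other labels would overwrite each other, so a fuller client would key by the whole label set.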

Visualization

The Visualization package comes with predefined dashboards containing graphs built on these metrics. Check Visualization for details.

Alerts

The Alerts package comes with an alerting tool and some predefined alerts built on these metrics. Check Alerts for details.