System Metrics

System metrics fall into five broad categories:

Type | OTel Receiver used | Enabled in | Description
Cluster Info Metrics | K8sClusterReceiver | OpenTelemetry Collector Deployment | The Kubernetes Cluster receiver collects cluster-level metrics from the Kubernetes API server, e.g. CPU/RAM requested by a container.
KubeStats Metrics | KubeletStatsReceiver | OpenTelemetry Collector DaemonSet | The Kubelet Stats receiver pulls node, pod, container, and volume metrics from the API server on a kubelet, e.g. % of CPU/RAM utilisation by a container.
API Metrics | otlp receiver | OpenTelemetry Collector Deployment | HTTP-level metrics covering response time, error occurrences, and the health of the API. The API pushes these metrics to the OpenTelemetry Collector Deployment.
MongoDB Metrics | MongoDBReceiver | OpenTelemetry Collector Deployment | Size of the database, active connections, size of collections, etc.
PostgreSQL Metrics | PostgreSQLReceiver | OpenTelemetry Collector Deployment | Size of the database, active connections, size of tables, number of rows.

Custom Configuration

  • Apart from the API metrics, which the API pushes itself, all other metrics are collected by an OpenTelemetry receiver.
  • Each receiver can be customized to enable or disable individual metrics, as well as the labels attached to those metrics.
    • Each receiver usually has a metadata.yaml that lists its metrics, their descriptions, and whether each one is enabled by default.
    • E.g. k8sclusterreceiver ships such a metadata.yaml.
  • For each of the receivers listed above, we have enabled some metrics and disabled a few that were not relevant.
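
The enable/disable switches described above are normally written in the collector configuration as `metrics: <metric name>: enabled: <true|false>` under the receiver. The following is a minimal sketch of how the overrides for one receiver could be worked out. The helper `metric_overrides` and the default values in `DEFAULTS` are hypothetical and for illustration only; the real defaults are whatever the receiver's metadata.yaml states.

```python
import json


def metric_overrides(defaults, wanted):
    """Return only the settings that differ from the receiver defaults.

    defaults: metric name -> enabled by default (taken from metadata.yaml)
    wanted:   metric name -> whether we want the metric collected
    """
    overrides = {}
    for name, enabled in wanted.items():
        if name not in defaults:
            raise KeyError(f"unknown metric: {name}")
        if defaults[name] != enabled:
            overrides[name] = {"enabled": enabled}
    return overrides


# Hypothetical defaults, NOT copied from the real metadata.yaml.
DEFAULTS = {
    "k8s.container.cpu_request": True,
    "k8s.container.restarts": True,
    "k8s.pod.status_reason": False,
}

wanted = {"k8s.container.restarts": True, "k8s.pod.status_reason": True}

# Shape of the receiver section; printed as JSON, which is also valid YAML.
receiver_config = {"k8s_cluster": {"metrics": metric_overrides(DEFAULTS, wanted)}}
print(json.dumps(receiver_config, indent=2))
```

Writing only the differences keeps the configuration short and makes it obvious which metrics deviate from the upstream defaults.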

List of Metrics

1. Cluster Info Metrics

Metric Name | Description
k8s.container.cpu_request | Resource requested for the container.
k8s.container.cpu_limit | Maximum resource limit set for the container.
k8s.container.memory_request | Resource requested for the container.
k8s.container.memory_limit | Maximum resource limit set for the container.
k8s.container.storage_request | Resource requested for the container.
k8s.container.storage_limit | Maximum resource limit set for the container.
k8s.container.restarts | How many times the container has restarted in the recent past. This value can go indefinitely high and be reset to 0 depending on kubelet configuration. Rather than the exact value, consider it as either == 0 (no recent restarts) or > 0.
k8s.container.ready | Whether a container has passed its readiness probe (0 for no, 1 for yes).
k8s.pod.phase | Current phase of the pod (1 - Pending, 2 - Running, 3 - Succeeded, 4 - Failed, 5 - Unknown).
k8s.pod.status_reason | Current status reason of the pod (1 - Evicted, 2 - NodeAffinity, 3 - NodeLost, 4 - Shutdown, 5 - UnexpectedAdmissionError, 6 - Unknown).
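
Because k8s.pod.phase and k8s.pod.status_reason are reported as numbers, dashboards and alerts usually need to map them back to names. A minimal sketch using the value tables listed above; the helper names `decode` and `had_recent_restarts` are hypothetical:

```python
# Value tables as listed in the metric descriptions.
POD_PHASES = {1: "Pending", 2: "Running", 3: "Succeeded", 4: "Failed", 5: "Unknown"}
POD_STATUS_REASONS = {
    1: "Evicted",
    2: "NodeAffinity",
    3: "NodeLost",
    4: "Shutdown",
    5: "UnexpectedAdmissionError",
    6: "Unknown",
}


def decode(value, table):
    """Map a numeric metric value to its name; unmapped values are reported as-is."""
    return table.get(int(value), f"unmapped({value})")


def had_recent_restarts(restarts):
    """Treat k8s.container.restarts as a flag (== 0 or > 0), not an exact count."""
    return restarts > 0


print(decode(2, POD_PHASES))          # Running
print(decode(1, POD_STATUS_REASONS))  # Evicted
print(had_recent_restarts(3))         # True
```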

2. KubeStats Metrics

Metric Name | Description
k8s.node.cpu.utilization | Node CPU utilization.
k8s.node.cpu.time | Total cumulative CPU time (sum of all cores) spent by the container/pod/node since its creation.
k8s.node.memory.available | Node memory available.
k8s.node.memory.usage | Node memory usage.
k8s.node.memory.working_set | Node memory working_set.
k8s.node.filesystem.available | Node filesystem available.
k8s.node.filesystem.capacity | Node filesystem capacity.
k8s.node.filesystem.usage | Node filesystem usage.
k8s.pod.cpu.utilization | Pod CPU utilization.
k8s.pod.cpu.time | Total cumulative CPU time (sum of all cores) spent by the container/pod/node since its creation.
k8s.pod.memory.available | Pod memory available.
k8s.pod.memory.usage | Pod memory usage.
k8s.pod.cpu_limit_utilization | Pod CPU utilization as a ratio of the pod's total container limits, emitted only if all limits are set.
k8s.pod.cpu_request_utilization | Pod CPU utilization as a ratio of the pod's total container requests, emitted only if all requests are set.
k8s.pod.memory_limit_utilization | Pod memory utilization as a ratio of the pod's total container limits, emitted only if all limits are set.
k8s.pod.memory_request_utilization | Pod memory utilization as a ratio of the pod's total container requests, emitted only if all requests are set.
k8s.pod.memory.working_set | Pod memory working_set.
k8s.pod.filesystem.available | Pod filesystem available.
k8s.pod.filesystem.capacity | Pod filesystem capacity.
k8s.pod.filesystem.usage | Pod filesystem usage.
container.cpu.utilization | Container CPU utilization.
container.cpu.time | Total cumulative CPU time (sum of all cores) spent by the container/pod/node since its creation.
container.memory.available | Container memory available.
container.memory.usage | Container memory usage.
container.memory.working_set | Container memory working_set.
container.filesystem.available | Container filesystem available.
container.filesystem.capacity | Container filesystem capacity.
container.filesystem.usage | Container filesystem usage.
k8s.container.cpu_limit_utilization | Container CPU utilization as a ratio of the container's limits.
k8s.container.cpu_request_utilization | Container CPU utilization as a ratio of the container's requests.
k8s.container.memory_limit_utilization | Container memory utilization as a ratio of the container's limits.
k8s.container.memory_request_utilization | Container memory utilization as a ratio of the container's requests.
k8s.volume.available | The number of available bytes in the volume.
k8s.volume.capacity | The total capacity in bytes of the volume.
k8s.volume.inodes | The total inodes in the filesystem.
k8s.volume.inodes.free | The free inodes in the filesystem.
k8s.volume.inodes.used | The inodes used by the filesystem. This may not equal inodes - free because the filesystem may share inodes with other filesystems.
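
The `*_limit_utilization` and `*_request_utilization` metrics above are ratios of usage to the configured limit or request. The sketch below mirrors the description of the pod-level variants (usage divided by the sum of the container limits, with nothing emitted unless every container has a limit). It is an illustration of the arithmetic, not the receiver's actual code, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def pod_limit_utilization(
    usage: float, container_limits: Sequence[Optional[float]]
) -> Optional[float]:
    """Return usage / sum(limits), or None if any container has no limit set."""
    if not container_limits:
        return None
    if any(limit is None or limit <= 0 for limit in container_limits):
        return None
    return usage / sum(container_limits)


# A pod using 0.5 cores whose two containers are each limited to 1 core.
print(pod_limit_utilization(0.5, [1.0, 1.0]))   # 0.25
# One container has no limit, so no ratio is produced.
print(pod_limit_utilization(0.5, [1.0, None]))  # None
```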

3. API Metrics

Metric Name | Description
http_client_request_error_count_total | Total number of client error requests.
http_server_duration_milliseconds_bucket | Cumulative counters for the observation buckets (the duration of the inbound HTTP request).
http_server_duration_milliseconds_count | Count of events that have been observed for the histogram metric (the duration of the inbound HTTP request).
http_server_duration_milliseconds_sum | Total sum of all observed values for the histogram metric (the duration of the inbound HTTP request).
http_server_request_count_requests_total | Total number of HTTP requests.
http_server_request_size_bytes_bucket | Cumulative counters for the observation buckets (size of incoming bytes).
http_server_request_size_bytes_count | Count of events that have been observed for the histogram metric (size of incoming bytes).
http_server_request_size_bytes_sum | Total sum of all observed values for the histogram metric (size of incoming bytes).
http_server_response_count_responses_total | Total number of HTTP responses.
http_server_response_error_count_total | Total number of all response errors.
http_server_response_size_bytes_bucket | Cumulative counters for the observation buckets (size of outgoing bytes).
http_server_response_size_bytes_count | Count of events that have been observed for the histogram metric (size of outgoing bytes).
http_server_response_size_bytes_sum | Total sum of all observed values for the histogram metric (size of outgoing bytes).
http_server_response_success_count_responses_total | Total number of all successful responses.
httpcheck_duration_milliseconds | Measures the duration of the HTTP check.
httpcheck_status | 1 if the check resulted in status_code matching the status_class, otherwise 0.
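
The `_bucket`, `_count`, and `_sum` series above together describe one histogram. The sketch below shows how they are commonly combined, assuming the usual Prometheus-style convention in which each bucket holds a cumulative count for an upper bound and the last bound is +Inf. The function names and sample numbers are hypothetical; the quantile is a linear-interpolation estimate, not an exact value.

```python
import math


def mean_duration_ms(sum_ms, count):
    """Average request duration: _sum divided by _count."""
    return sum_ms / count if count else None


def error_ratio(error_responses, total_responses):
    """Share of responses that were errors."""
    return error_responses / total_responses if total_responses else None


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative buckets.

    buckets: list of (upper_bound, cumulative_count), sorted by bound,
             ending with (math.inf, total_count).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return None
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the open-ended bucket; report its lower bound.
                return prev_bound
            # Interpolate linearly inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound


# Made-up sample: 100 requests, bounds in milliseconds.
buckets = [(100.0, 50), (250.0, 90), (500.0, 100), (math.inf, 100)]
print(mean_duration_ms(18000.0, 100))
print(estimate_quantile(0.5, buckets))
print(estimate_quantile(0.95, buckets))
print(error_ratio(4, 100))
```

In practice these calculations are usually done in the query layer of the metrics backend; the code is only meant to make the relationship between the three series explicit.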

4. MongoDB Metrics

Metric Name | Description
mongodb.collection.count | The number of collections.
mongodb.data.size | The size of the collection. Data compression does not affect this value.
mongodb.connection.count | The number of connections.
mongodb.memory.usage | The amount of memory used.
mongodb.object.count | The number of objects.
mongodb.database.count | The number of existing databases.
mongodb.health | The health status of the server. A value of '1' indicates healthy. A value of '0' indicates unhealthy.
mongodb.session.count | The total number of active sessions.

5. PostgreSQL Metrics

Metric Name | Description
postgresql.commits | The number of commits.
postgresql.database.count | Number of user databases.
postgresql.db_size | The database disk usage.
postgresql.connection.max | Configured maximum number of client connections allowed.
postgresql.rows | The number of rows in the database.
postgresql.index.scans | The number of index scans on a table.
postgresql.index.size | The size of the index on disk.
postgresql.operations | The number of db row operations.
postgresql.table.count | Number of user tables in a database.
postgresql.table.size | Disk space used by a table.