System Metrics
There are five broad categories of system metrics:
Type | OTel receiver used | Enabled in | Description |
---|---|---|---|
Cluster Info Metrics | K8sClusterReceiver | OpenTelemetry Collector Deployment | Collects cluster-level metrics from the Kubernetes API server, e.g. the CPU/RAM requested by a container. |
KubeStats Metrics | KubeletStatsReceiver | OpenTelemetry Collector DaemonSet | Pulls node, pod, container, and volume metrics from the API server on a kubelet, e.g. the percentage of CPU/RAM utilised by a container. |
API Metrics | otlp receiver | OpenTelemetry Collector Deployment | HTTP-level metrics covering response time, error counts, and the health of the API. The API pushes these metrics to the OpenTelemetry Collector Deployment. |
MongoDB Metrics | MongoDBReceiver | OpenTelemetry Collector Deployment | Size of the database, active connections, size of collections, etc. |
PostgreSQL Metrics | PostgreSQLReceiver | OpenTelemetry Collector Deployment | Size of the database, active connections, size of tables, number of rows. |
Custom Configuration
- Apart from the API metrics, which the API pushes itself, all other metrics are collected by an OpenTelemetry receiver.
- Each receiver can be customized to enable or disable individual metrics, as well as the labels attached to those metrics.
- Each receiver usually ships a metadata.yaml that lists its metrics, their descriptions, and whether each one is enabled by default.
- E.g. in the case of k8sclusterreceiver, see its metadata.yaml.
- For each of the receivers listed above, we have enabled some of the metrics and disabled a few that weren't relevant.
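As a minimal sketch of what this customization can look like, the fragment below toggles two metrics and one label on the k8s_cluster receiver. The metric names come from the tables in this document; the choice of which ones to switch on or off, the collection interval, and the `container.image.tag` attribute are illustrative assumptions, not our deployed configuration. Check the receiver's metadata.yaml for the names and defaults that apply to your collector version.

```yaml
# Illustrative fragment only; values are assumptions, not the deployed config.
receivers:
  k8s_cluster:
    collection_interval: 30s          # assumed interval
    metrics:
      # Turn on a metric that may be disabled by default in some versions.
      k8s.pod.status_reason:
        enabled: true
      # Turn off a metric that is not relevant for this setup (example choice).
      k8s.container.storage_limit:
        enabled: false
    resource_attributes:
      # Labels are toggled the same way as metrics (example attribute).
      container.image.tag:
        enabled: false
```

The same `metrics: <name>: enabled:` pattern should apply to the other receivers listed above, since their metadata.yaml files follow the same layout.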
List of Metrics
1. Cluster Info Metrics
Metric Name | Description |
---|---|
k8s.container.cpu_request | CPU requested for the container. |
k8s.container.cpu_limit | Maximum CPU limit set for the container. |
k8s.container.memory_request | Memory requested for the container. |
k8s.container.memory_limit | Maximum memory limit set for the container. |
k8s.container.storage_request | Storage requested for the container. |
k8s.container.storage_limit | Maximum storage limit set for the container. |
k8s.container.restarts | How many times the container has restarted in the recent past. This value can go indefinitely high and be reset to 0 depending on kubelet configuration. Rather than the exact value, consider it as either == 0 (no recent restarts) or > 0. |
k8s.container.ready | Whether a container has passed its readiness probe (0 for no, 1 for yes) |
k8s.pod.phase | Current phase of the pod (1 - Pending, 2 - Running, 3 - Succeeded, 4 - Failed, 5 - Unknown) |
k8s.pod.status_reason | Current status reason of the pod (1 - Evicted, 2 - NodeAffinity, 3 - NodeLost, 4 - Shutdown, 5 - UnexpectedAdmissionError, 6 - Unknown) |
2. KubeStats Metrics
Metric Name | Description |
---|---|
k8s.node.cpu.utilization | Node CPU utilization |
k8s.node.cpu.time | Total cumulative CPU time (sum of all cores) spent by the node since its creation |
k8s.node.memory.available | Node memory available |
k8s.node.memory.usage | Node memory usage |
k8s.node.memory.working_set | Node memory working_set |
k8s.node.filesystem.available | Node filesystem available |
k8s.node.filesystem.capacity | Node filesystem capacity |
k8s.node.filesystem.usage | Node filesystem usage |
k8s.pod.cpu.utilization | Pod CPU utilization |
k8s.pod.cpu.time | Total cumulative CPU time (sum of all cores) spent by the pod since its creation |
k8s.pod.memory.available | Pod memory available |
k8s.pod.memory.usage | Pod memory usage |
k8s.pod.cpu_limit_utilization | Pod CPU utilization as a ratio of the pod's total container limits, emitted only if all limits are set |
k8s.pod.cpu_request_utilization | Pod CPU utilization as a ratio of the pod's total container requests, emitted only if all requests are set |
k8s.pod.memory_limit_utilization | Pod memory utilization as a ratio of the pod's total container limits, emitted only if all limits are set |
k8s.pod.memory_request_utilization | Pod memory utilization as a ratio of the pod's total container requests, emitted only if all requests are set |
k8s.pod.memory.working_set | Pod memory working_set |
k8s.pod.filesystem.available | Pod filesystem available |
k8s.pod.filesystem.capacity | Pod filesystem capacity |
k8s.pod.filesystem.usage | Pod filesystem usage |
container.cpu.utilization | Container CPU utilization |
container.cpu.time | Total cumulative CPU time (sum of all cores) spent by the container since its creation |
container.memory.available | Container memory available |
container.memory.usage | Container memory usage |
container.memory.working_set | Container memory working_set |
container.filesystem.available | Container filesystem available |
container.filesystem.capacity | Container filesystem capacity |
container.filesystem.usage | Container filesystem usage |
k8s.container.cpu_limit_utilization | Container CPU utilization as a ratio of the container's limits |
k8s.container.cpu_request_utilization | Container CPU utilization as a ratio of the container's requests |
k8s.container.memory_limit_utilization | Container memory utilization as a ratio of the container's limits |
k8s.container.memory_request_utilization | Container memory utilization as a ratio of the container's requests |
k8s.volume.available | The number of available bytes in the volume |
k8s.volume.capacity | The total capacity in bytes of the volume |
k8s.volume.inodes | The total inodes in the filesystem |
k8s.volume.inodes.free | The free inodes in the filesystem |
k8s.volume.inodes.used | The inodes used by the filesystem. This may not equal total inodes minus free inodes, because the filesystem may share inodes with other filesystems |
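The node, pod, container, and volume scopes in the table above correspond to the receiver's metric groups. The fragment below is a hedged sketch, assuming the collector runs as a DaemonSet and that the node name is exposed through a hypothetical `K8S_NODE_NAME` environment variable. The endpoint, interval, and auth type are assumptions rather than our deployed values. The ratio metrics such as k8s.pod.cpu_limit_utilization appear to be disabled by default in upstream versions, so they are enabled explicitly here; verify this against the metadata.yaml for your version.

```yaml
# Illustrative fragment only; endpoint, interval, and auth are assumptions.
receivers:
  kubeletstats:
    collection_interval: 30s
    auth_type: serviceAccount
    endpoint: "${env:K8S_NODE_NAME}:10250"   # K8S_NODE_NAME is a hypothetical env var
    metric_groups:
      - node
      - pod
      - container
      - volume
    metrics:
      # Ratio metrics are only emitted when the matching limits are set.
      k8s.pod.cpu_limit_utilization:
        enabled: true
      k8s.pod.memory_limit_utilization:
        enabled: true
```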
3. API Metrics
Metric Name | Description |
---|---|
http_client_request_error_count_total | Total number of client error requests |
http_server_duration_milliseconds_bucket | Cumulative counters for the observation buckets (The duration of the inbound HTTP request) |
http_server_duration_milliseconds_count | Count of events that have been observed for the histogram metric (The duration of the inbound HTTP request) |
http_server_duration_milliseconds_sum | Total sum of all observed values for the histogram metric (The duration of the inbound HTTP request) |
http_server_request_count_requests_total | Total number of HTTP requests |
http_server_request_size_bytes_bucket | Cumulative counters for the observation buckets (Size of incoming bytes) |
http_server_request_size_bytes_count | Count of events that have been observed for the histogram metric (Size of incoming bytes) |
http_server_request_size_bytes_sum | Total sum of all observed values for the histogram metric (Size of incoming bytes) |
http_server_response_count_responses_total | Total number of HTTP responses |
http_server_response_error_count_total | Total number of all response errors |
http_server_response_size_bytes_bucket | Cumulative counters for the observation buckets (Size of outgoing bytes) |
http_server_response_size_bytes_count | Count of events that have been observed for the histogram metric (Size of outgoing bytes) |
http_server_response_size_bytes_sum | Total sum of all observed values for the histogram metric (Size of outgoing bytes) |
http_server_response_success_count_responses_total | Total number of all successful responses |
httpcheck_duration_milliseconds | Measures the duration of the HTTP check. |
httpcheck_status | 1 if the check resulted in status_code matching the status_class, otherwise 0. |
4. MongoDB Metrics
Metric Name | Description |
---|---|
mongodb.collection.count | The number of collections. |
mongodb.data.size | The size of the collection. Data compression does not affect this value. |
mongodb.connection.count | The number of connections. |
mongodb.memory.usage | The amount of memory used. |
mongodb.object.count | The number of objects. |
mongodb.database.count | The number of existing databases. |
mongodb.health | The health status of the server. A value of '1' indicates healthy. A value of '0' indicates unhealthy. |
mongodb.session.count | The total number of active sessions. |
5. PostgreSQL Metrics
Metric Name | Description |
---|---|
postgresql.commits | The number of commits. |
postgresql.database.count | Number of user databases. |
postgresql.db_size | The database disk usage. |
postgresql.connection.max | Configured maximum number of client connections allowed |
postgresql.rows | The number of rows in the database. |
postgresql.index.scans | The number of index scans on a table. |
postgresql.index.size | The size of the index on disk. |
postgresql.operations | The number of db row operations. |
postgresql.table.count | Number of user tables in a database. |
postgresql.table.size | Disk space used by a table. |