KubeForge — Hands-on Kubernetes & EKS Learning

Scenario

Your Grafana dashboard shows no `kubelet_volume_stats_*` metrics. The `amazon-cloudwatch-agent` ConfigMap is missing the `disk` and `mem` sections under `metrics_collected`. Add the missing sections so volume utilization is tracked.

EKS Observability Stack

EKS clusters emit metrics and logs from multiple sources. The primary collection components are:

Amazon CloudWatch Agent — collects host metrics (CPU, memory, disk) from EC2 nodes
CloudWatch Container Insights — enriches metrics with Kubernetes metadata (pod, namespace, node)
AWS Distro for OpenTelemetry (ADOT) — OpenTelemetry collector for traces and custom metrics
Fluent Bit — log forwarding from containers and nodes to CloudWatch Logs

CloudWatch Agent ConfigMap

The CloudWatch Agent is configured via a ConfigMap in the `amazon-cloudwatch` namespace. The `cwagentconfig.json` key controls which metrics are collected and how often:

```json
{
  "agent": { "metrics_collection_interval": 60 },
  "metrics": {
    "metrics_collected": {
      "cpu": { "measurement": ["cpu_usage_idle", "cpu_usage_iowait"] },
      "disk": {
        "measurement": ["used_percent"],
        "resources": ["*"]
      },
      "mem": { "measurement": ["mem_used_percent"] }
    }
  }
}
```
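The fix for the scenario amounts to adding the two missing sections without disturbing what is already there. The following is a minimal sketch of that patch, assuming the broken config contains only the `cpu` section; the function name `ensure_metric_sections` and the `broken` sample are hypothetical, and the section contents are copied from the sample config above.

```python
import copy
import json

# Sections the scenario reports as missing. The measurement names come from
# the sample config above, not from a live cluster.
REQUIRED_SECTIONS = {
    "disk": {"measurement": ["used_percent"], "resources": ["*"]},
    "mem": {"measurement": ["mem_used_percent"]},
}


def ensure_metric_sections(config: dict) -> dict:
    """Return a copy of config with any missing required sections added."""
    patched = copy.deepcopy(config)
    collected = patched.setdefault("metrics", {}).setdefault("metrics_collected", {})
    for name, section in REQUIRED_SECTIONS.items():
        # setdefault leaves an existing section untouched.
        collected.setdefault(name, copy.deepcopy(section))
    return patched


# Hypothetical broken config: only the cpu section is present.
broken = json.loads("""
{
  "agent": { "metrics_collection_interval": 60 },
  "metrics": {
    "metrics_collected": {
      "cpu": { "measurement": ["cpu_usage_idle", "cpu_usage_iowait"] }
    }
  }
}
""")

fixed = ensure_metric_sections(broken)
print(json.dumps(fixed, indent=2))
```

The patch is idempotent, so running it against an already-correct config changes nothing. The printed JSON is what would go under the `cwagentconfig.json` key of the ConfigMap.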

Kubelet Volume Metrics

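Kubelet exposes `kubelet_volume_stats_*` metrics for each mounted PVC, including `kubelet_volume_stats_capacity_bytes`, `kubelet_volume_stats_available_bytes`, and `kubelet_volume_stats_used_bytes`. These are critical for disk utilization alerts. In this lab, surfacing them in CloudWatch Container Insights requires the CloudWatch Agent to have `disk` in `metrics_collected`.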

Without disk metrics, `kubelet_volume_stats_used_bytes` never reaches CloudWatch, and a PVC filling up goes unnoticed until pods start failing.
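To make the utilization calculation concrete, here is a rough sketch that computes percent used per PVC from the two kubelet gauges in Prometheus text format. The sample payload, the PVC names, and the 80% threshold are hypothetical; the `namespace` and `persistentvolumeclaim` labels are the ones kubelet attaches to these metrics. The parser is deliberately minimal and is not a full implementation of the exposition format.

```python
import re

# Hypothetical sample in Prometheus text exposition format.
SAMPLE = """\
# TYPE kubelet_volume_stats_used_bytes gauge
kubelet_volume_stats_used_bytes{namespace="default",persistentvolumeclaim="data-db-0"} 9.663676416e+09
kubelet_volume_stats_capacity_bytes{namespace="default",persistentvolumeclaim="data-db-0"} 1.073741824e+10
kubelet_volume_stats_used_bytes{namespace="logs",persistentvolumeclaim="archive"} 2.147483648e+09
kubelet_volume_stats_capacity_bytes{namespace="logs",persistentvolumeclaim="archive"} 2.147483648e+10
"""

LINE_RE = re.compile(r'^(kubelet_volume_stats_\w+)\{([^}]*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def pvc_usage_percent(text: str) -> dict:
    """Map (namespace, pvc) to percent of capacity used."""
    used, capacity = {}, {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # comment lines, blank lines, unrelated metrics
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels))
        key = (labels.get("namespace"), labels.get("persistentvolumeclaim"))
        if name == "kubelet_volume_stats_used_bytes":
            used[key] = float(value)
        elif name == "kubelet_volume_stats_capacity_bytes":
            capacity[key] = float(value)
    # Skip PVCs with no capacity sample (or zero capacity) to avoid dividing by zero.
    return {k: 100.0 * used[k] / capacity[k] for k in used if capacity.get(k)}


usage = pvc_usage_percent(SAMPLE)
THRESHOLD = 80.0  # hypothetical alert threshold
for (namespace, pvc), percent in sorted(usage.items()):
    flag = "ALERT" if percent >= THRESHOLD else "ok"
    print(f"{namespace}/{pvc}: {percent:.1f}% used [{flag}]")
```

An alert rule built on the same ratio catches a volume approaching capacity before writes start to fail.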

Container Insights Namespace Metrics

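Container Insights writes performance log events in embedded metric format to the /aws/containerinsights/<cluster>/performance log group, and CloudWatch extracts metrics from them into the ContainerInsights namespace. You can query the log events with CloudWatch Logs Insights, query the resulting metrics with CloudWatch Metrics Insights, or build dashboards without running Prometheus yourself.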
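As a small illustration, the sketch below derives the performance log group name for a cluster and assembles a CloudWatch Logs Insights query string that could be run against it. The cluster name `demo-cluster` and both helper functions are hypothetical, and the field names (`Type`, `NodeName`, `node_cpu_utilization`) are assumptions about what the performance log events contain; check them against the events in your own log group. The code only builds strings and does not call AWS.

```python
def performance_log_group(cluster_name: str) -> str:
    """Log group that Container Insights uses for performance events."""
    return f"/aws/containerinsights/{cluster_name}/performance"


def node_cpu_query(limit: int = 10) -> str:
    """Logs Insights query: average CPU utilization per node, highest first."""
    return "\n".join([
        'filter Type = "Node"',  # assumed field: event type
        "| stats avg(node_cpu_utilization) as avg_node_cpu by NodeName",
        "| sort avg_node_cpu desc",
        f"| limit {limit}",
    ])


log_group = performance_log_group("demo-cluster")  # hypothetical cluster name
query = node_cpu_query(limit=5)
print(log_group)
print(query)
```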

Prometheus + ADOT

For teams already running Prometheus, ADOT can scrape Prometheus endpoints and remote-write the samples to Amazon Managed Service for Prometheus (AMP). Grafana can then query AMP directly, so you don't have to manage long-term Prometheus storage yourself.

Enable kubelet volume metrics for EKS observability

Intermediate, ~20 min
