May 8, 2026 7 min read
CloudWatch Logs: the bill that grows quietly until it doesn't
New log groups default to indefinite retention. VPC Flow Logs default to full mode. Together they explain four-figure monthly CloudWatch Logs bills on accounts that never thought of logs as a line item. The pattern, the audit query, and the three-step fix.
CloudWatch Logs is the bill no one budgets for. It does not show up on architecture diagrams. No engineer ever proposed it as a line item. It accumulates, month over month, until someone in finance asks why the AWS bill jumped 8 percent and traces it to a $3,400 CloudWatch line that did not exist a year ago.
We have audited dozens of these. The pattern is identical across every account: two configurations on by default that no one turned off, plus a long tail of log groups that have been accumulating since the account was created.
The two defaults that drive the bill
New log groups default to indefinite retention. AWS creates the log group, no retention policy is attached, and the data sits at $0.03 per GB per month forever. The console shows "Never expire". Most engineers who see that label do not realize it is the default rather than an explicit choice. Every Lambda function, every ECS service, and every CloudWatch agent install produces a log group with this default.
VPC Flow Logs default to "ALL" traffic when enabled, recording every accepted and rejected flow across every ENI in the VPC. Vended logs to CloudWatch are billed at $0.25 per GB ingest. A VPC with 50 EC2 instances and a busy ALB generates 1 to 3 TB of Flow Log data per month. That is $250 to $750 per month for data most teams query fewer than a dozen times per quarter.
Stack the two. Indefinite retention turns every ingested gigabyte into a storage charge that never ends. Full-mode Flow Logs inflate the ingestion bill roughly tenfold in the first place. The result is what we keep finding in audits: $3,000 to $8,000 monthly CloudWatch Logs lines on accounts that did not think they had a logging strategy.
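To make the stacking concrete, here is a rough sketch of how the two charges combine month by month. It assumes a constant ingest volume and the per-GB rates quoted in this article, and it ignores the compression CloudWatch applies to stored data, so treat the storage column as an upper bound. The function name and arguments are ours, for illustration only.

```shell
# Sketch: monthly CloudWatch Logs cost when nothing ever expires.
# Rates default to the figures quoted in this article; check your region's pricing.
# Usage: logs_cost_by_month INGEST_GB_PER_MONTH MONTHS [INGEST_RATE] [STORAGE_RATE]
logs_cost_by_month() {
  awk -v gb="$1" -v months="$2" -v ingest="${3:-0.25}" -v storage="${4:-0.03}" 'BEGIN {
    for (m = 1; m <= months; m++) {
      ingest_cost  = gb * ingest        # flat: same volume every month
      storage_cost = gb * m * storage   # grows: every prior month is still stored
      printf "month %d: ingest $%.2f + storage $%.2f = $%.2f\n", m, ingest_cost, storage_cost, ingest_cost + storage_cost
    }
  }'
}
```

Running `logs_cost_by_month 2000 12` (2 TB per month, the midpoint of the Flow Logs range above) ends the year at $500 of ingestion plus $720 of storage per month. The ingestion line stays flat; the storage line is the part that keeps climbing.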
The audit query
AWS does not break CloudWatch Logs cost down per log group on the bill. The Cost and Usage Report has aggregated lines by region and usage type but does not split by source. To attribute, run this against the account:
aws logs describe-log-groups \
--query 'logGroups[].[logGroupName,storedBytes,retentionInDays]' \
--output text | \
sort -k2 -n -r | head -20
Sort by stored bytes descending. The top 5 log groups usually account for 70 to 80 percent of the storage line. The largest is almost always one of three things: /aws/vpc/flowlogs, /aws/eks/<cluster>/cluster, or /aws/lambda/<function> for a high-traffic Lambda that logs every invocation. Those three patterns explain the majority of accounts we audit.
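The raw output is in bytes, which is hard to read at a glance. A small filter like the sketch below converts it to GB and an estimated monthly storage charge. The function name is hypothetical, the $0.03 rate is the figure used in this article, and we assume the CLI's text output renders a missing retention as "None".

```shell
# Sketch: annotate the audit output with GB and an estimated storage charge.
# Input (stdin): tab-separated logGroupName, storedBytes, retentionInDays,
# i.e. the text output of the describe-log-groups query above.
# Usage: aws logs describe-log-groups ... --output text | annotate_log_groups [RATE]
annotate_log_groups() {
  awk -F '\t' -v rate="${1:-0.03}" '{
    gb = $2 / (1024 * 1024 * 1024)                        # bytes to GB (binary)
    retention = ($3 == "None") ? "never-expires" : $3 " days"
    printf "%s\t%.1f GB\t$%.2f/mo\t%s\n", $1, gb, gb * rate, retention
  }'
}
```

Piping the sorted top-20 list through `annotate_log_groups` puts a dollar figure next to each group, which is usually what gets a retention change approved.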
For an exact monthly cost on your specific ingest, retention, and query pattern, the CloudWatch Logs cost calculator breaks the bill into ingestion, storage tail, and Logs Insights queries, and shows the savings from cutting retention.
The three-step fix
Step one: apply retention to every log group that does not have one. This is a one-liner against the AWS CLI:
aws logs describe-log-groups \
--query 'logGroups[?retentionInDays==`null`].logGroupName' \
--output text | \
# Text output is tab-separated on one line; split it so xargs gets one name per line.
tr '\t' '\n' | \
xargs -I{} aws logs put-retention-policy \
--log-group-name {} --retention-in-days 30
Pick 30 days as the default. For groups that need longer history (audit logs, security logs), set retention explicitly to 365 days or longer and route them to S3 via a subscription filter for archive. The bill drops noticeably the next billing cycle. We have seen accounts cut $2,000 per month from this single change.
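Before applying a blanket policy across an account, it can help to print the commands first and review them. The sketch below is one way to do that; the function name is ours, and it assumes the CLI's text output shows "None" for groups with no retention set.

```shell
# Sketch: dry run. Reads tab-separated "logGroupName<TAB>retentionInDays"
# lines on stdin and prints one put-retention-policy command for every
# group that has no retention. Nothing is executed.
# Usage: aws logs describe-log-groups \
#          --query 'logGroups[].[logGroupName,retentionInDays]' \
#          --output text | retention_dry_run 30
retention_dry_run() {
  awk -F '\t' -v days="${1:-30}" '$2 == "None" {
    printf "aws logs put-retention-policy --log-group-name \"%s\" --retention-in-days %d\n", $1, days
  }'
}
```

Groups that already have a policy are skipped, so an explicit 365-day audit group is never silently shortened to 30.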
Step two: switch VPC Flow Logs to reject-only or sample to 10 percent. Reject-only is the right call if you use Flow Logs for security forensics (looking for blocked attempts) and not for traffic analysis. Sampling at 10 percent is the right call if you use them to size NAT Gateways or measure cross-AZ traffic; Flow Logs have no built-in sampling rate, so in practice this means enabling them on a representative subset of subnets or ENIs rather than the whole VPC. Either change can cut Flow Log ingestion by around 90 percent. For a busy VPC that is $200 to $700 per month back.
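An existing flow log cannot be edited in place, so moving to reject-only means creating a new flow log and deleting the old one. The sketch below only prints the two commands for review; the function name is hypothetical and all four arguments are placeholders you supply from your own account.

```shell
# Sketch: print (not run) the commands that replace an ALL-traffic flow log
# with a REJECT-only one delivering to CloudWatch Logs.
# Usage: reject_only_plan VPC_ID LOG_GROUP ROLE_ARN OLD_FLOW_LOG_ID
reject_only_plan() {
  cat <<EOF
aws ec2 create-flow-logs --resource-type VPC --resource-ids $1 --traffic-type REJECT --log-destination-type cloud-watch-logs --log-group-name $2 --deliver-logs-permission-arn $3
aws ec2 delete-flow-logs --flow-log-ids $4
EOF
}
```

Creating the new flow log before deleting the old one avoids a gap in rejected-traffic records during the switch.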
Step three: route high-volume logs that need long retention to S3 instead of CloudWatch. CloudFront real-time logs are the classic case: at $0.50 per GB ingest plus $0.03 per GB per month storage, they cost roughly 10x what Kinesis Firehose to S3 costs ($0.029 per GB Firehose plus $0.023 per GB S3). For audit-grade retention where the logs are rarely queried, switch the destination. The export-to-S3 pattern preserves the logs in case Athena queries are needed later, at a fraction of the cost.
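For readers who want to check the multiple against their own rates, a minimal sketch of the arithmetic follows. It compares first-month cost per GB on each path (ingest plus one month of storage) using the four per-GB figures quoted in this step; the function name is ours.

```shell
# Sketch: first-month cost per GB, CloudWatch Logs vs Firehose to S3.
# Usage: cost_ratio CW_INGEST CW_STORAGE FIREHOSE S3_STORAGE
cost_ratio() {
  awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" 'BEGIN {
    cw = a + b   # CloudWatch: ingest + one month of storage
    s3 = c + d   # Firehose delivery + one month of S3 Standard
    printf "CloudWatch $%.3f/GB vs Firehose+S3 $%.3f/GB = %.1fx\n", cw, s3, cw / s3
  }'
}

cost_ratio 0.50 0.03 0.029 0.023
# prints: CloudWatch $0.530/GB vs Firehose+S3 $0.052/GB = 10.2x
```

The ratio covers only the first month. Swap in your region's rates before quoting the multiple to finance.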
Why "just turn off logging" is not the answer
Every audit has someone in the room ask why we are not just turning off the noisy logs. The answer is that nearly all the logs are useful sometimes. VPC Flow Logs catch a security incident. Lambda logs explain a 3 a.m. error rate spike. EKS cluster logs are how you debug a control-plane regression. The problem is not that the logs exist. The problem is that they keep existing forever, at full volume, with no retention.
The right framing is "logs need a lifecycle." Hot for 30 days for active debugging. Warm in S3 for 1 year for incident forensics. Glacier or deletion after that. CloudWatch Logs is a fine choice for the hot 30 days. It is a terrible choice for the warm 11 months. The split between the two is where the savings live.
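The warm-to-cold half of that lifecycle can be expressed as an S3 lifecycle rule. The sketch below prints one such rule as JSON; the rule ID, the prefix, and the 365-day threshold are assumptions for illustration, not a recommendation for every workload.

```shell
# Sketch: emit an S3 lifecycle configuration that moves archived logs to
# Glacier after the warm period. Prefix and day count are placeholders.
# Usage: warm_tier_lifecycle PREFIX [GLACIER_AFTER_DAYS] > lifecycle.json
warm_tier_lifecycle() {
  cat <<EOF
{
  "Rules": [
    {
      "ID": "logs-warm-then-glacier",
      "Status": "Enabled",
      "Filter": { "Prefix": "$1" },
      "Transitions": [
        { "Days": ${2:-365}, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF
}
```

The resulting file can be applied with `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://lifecycle.json`. If deletion is the right end state for a given log type, the rule would use an expiration instead of a transition.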
What we tell every audit customer
Three rules, in order:
Rule one: every log group has a retention policy. Default is 30 days. Audit logs go to 365 with explicit justification. No log group is left at "Never expire" by accident.
Rule two: vended logs go to S3 via Firehose, not to CloudWatch Logs. The exception is the active 30-day debugging window for services where you actually run Logs Insights queries. For everything else, S3 is 10x cheaper for the same data.
Rule three: CloudWatch Logs cost is a quarterly review item. The line drifts up between reviews because new services keep creating new log groups. The drift is small per quarter and large per year if no one looks.
Rule one is the biggest single check, and it costs nothing to apply. Rule two is the structural fix. Rule three is what keeps the bill from re-growing in 18 months.
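For the quarterly review in rule three, a one-line summary is often enough to spot drift. The sketch below is one possible shape for it; the function name is hypothetical, and it reads the same three-column text output as the audit query, assuming "None" marks a missing retention.

```shell
# Sketch: quarterly drift check. Summarizes how many log groups exist,
# how many have no retention policy, and how much data is stored in total.
# Input (stdin): tab-separated logGroupName, storedBytes, retentionInDays
retention_drift_summary() {
  awk -F '\t' '
    { total++; bytes += $2 }
    $3 == "None" { missing++ }
    END {
      printf "%d log groups, %d without retention, %.1f GB stored\n", total, missing, bytes / (1024 * 1024 * 1024)
    }'
}
```

If the "without retention" count is anything other than zero, new services have created log groups since the last review and rule one needs to be reapplied.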