
May 10, 2026

NAT Gateway: the AWS bill that hides in private subnets

NAT Gateway looks like plumbing until the private-subnet traffic grows. The hourly fee is predictable, the per-GB processing fee is not, and the expensive traffic is usually S3, ECR, package mirrors, or cross-AZ chatter that should not be there. What follows is the audit path, the math, and the fixes that cut the line without breaking production.

NAT Gateway is one of the cleanest AWS services to deploy and one of the easiest to ignore. Put private workloads in private subnets, give them a default route to NAT, and the app can reach package registries, APIs, S3, ECR, and every SaaS endpoint it needs. The architecture diagram looks normal. The bill does not.

AWS charges for NAT Gateway in two places: an hourly charge while the gateway is provisioned and a data-processing charge for every gigabyte processed through it, regardless of where the traffic started or ended. In US regions the common public baseline is $0.045 per hour and $0.045 per GB. Three gateways running all month cost about $99 before any data moves. Push 20 TB through them and the data-processing line adds about $921. Push 100 TB and NAT is suddenly a five-figure annual problem.

The mistake is treating that as internet egress. Most of the expensive NAT lines we audit are not user traffic. They are private workloads calling AWS services through public endpoints, container clusters pulling layers through NAT, CI workers downloading the same package archive thousands of times, or chatty services crossing an AZ boundary before they ever leave the VPC.

The baseline math

Start with the shape AWS recommends for availability: one NAT Gateway per Availability Zone, and private subnets in each AZ routing to the local gateway. That pattern is operationally sound. It also means a three-AZ VPC has three hourly lines running 730 hours per month.

The hourly line is rarely the real issue. At $0.045 per hour, three gateways cost roughly $98.55 per month. The data-processing line moves faster. Ten terabytes through NAT is about $461. Twenty terabytes is about $922. A busy build cluster, analytics job, or container platform can move that much without anyone thinking of it as networking spend.
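
For a quick sanity check, the same arithmetic fits in a Terraform locals block you can evaluate with terraform console. The rates and volumes below are illustrative defaults, not numbers from any specific account:

# Back-of-envelope NAT cost model. Rates are the public US baseline
# discussed above; swap in negotiated or region-specific rates as needed.
locals {
  nat_hourly_rate = 0.045 # USD per gateway-hour
  nat_per_gb_rate = 0.045 # USD per GB processed
  gateway_count   = 3     # one per AZ
  hours_per_month = 730
  gb_per_month    = 20480 # 20 TB

  hourly_cost     = local.gateway_count * local.hours_per_month * local.nat_hourly_rate # ~$98.55
  processing_cost = local.gb_per_month * local.nat_per_gb_rate                          # ~$921.60
  monthly_total   = local.hourly_cost + local.processing_cost                           # ~$1,020.15
}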

This is why a NAT line can sit under $200 for months and then jump to $2,500 after a platform rollout. The count of gateways did not change. The route table did.

The four traffic patterns behind the spike

1. S3 and DynamoDB without Gateway Endpoints

S3 and DynamoDB have Gateway VPC Endpoints. The endpoint has no hourly charge and no per-GB charge. Associate the endpoint with the private route tables and the traffic no longer crosses NAT.

This is the first audit check because the fix is boring and the savings are immediate. If a workload moves 30 TB per month to S3 through NAT, the avoidable NAT processing line is roughly $1,382 per month in a US region. The Terraform change is one endpoint resource and a route table association. No application rewrite. No downtime.
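
A minimal sketch of that change, assuming a VPC and per-AZ private route tables already managed in the same module (the resource names here are placeholders):

# S3 Gateway Endpoint: no hourly fee, no per-GB fee.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.main.id
  service_name      = "com.amazonaws.us-east-1.s3" # match your region
  vpc_endpoint_type = "Gateway"

  # Attach to every private route table so no subnet keeps sending
  # S3 traffic through NAT. Assumes the route tables use for_each.
  route_table_ids = [for rt in aws_route_table.private : rt.id]
}

The DynamoDB endpoint is the same resource with the dynamodb service name.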

2. ECR image pulls through public endpoints

Kubernetes and ECS platforms can turn image pulls into a networking bill fast. Every node scale-out, deploy, and rollback pulls layers. If ECR, ECR Docker, S3, CloudWatch Logs, and STS are all reached through public endpoints, NAT becomes the hidden tax on every release.

Interface Endpoints are not free, so we do not recommend blindly adding every service endpoint. ECR is different when clusters are busy. The hourly endpoint charge is usually smaller than the repeated NAT processing on image layers, especially when teams run many short-lived jobs or rebuild nodes often.
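
When measurement shows pulls dominate, the ECR pair looks roughly like the sketch below. ECR serves image layers out of S3, so the S3 Gateway Endpoint above is part of the same fix. The subnet and security group references are placeholders:

# ECR needs two Interface Endpoints: the registry API and the Docker API.
resource "aws_vpc_endpoint" "ecr" {
  for_each = toset(["ecr.api", "ecr.dkr"])

  vpc_id              = aws_vpc.main.id
  service_name        = "com.amazonaws.us-east-1.${each.value}" # match your region
  vpc_endpoint_type   = "Interface"
  private_dns_enabled = true

  subnet_ids         = [for s in aws_subnet.private : s.id] # one ENI per AZ
  security_group_ids = [aws_security_group.endpoints.id]    # must allow 443 from the VPC
}

Each interface endpoint bills per AZ-hour plus a per-GB rate, so check the break-even against the measured pull volume first.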

3. Package mirrors and SaaS APIs with no cache

Private CI workers pulling npm, NuGet, Maven, PyPI, apt, or container base images through NAT are a predictable source of waste. The traffic is legitimate. The repeated download pattern is the problem.

A package proxy, artifact cache, or internal mirror turns thousands of identical downloads into one outbound fetch plus local reads. This is a cost fix and a reliability fix: when a public registry has a bad day, builds keep moving.
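
On AWS, one low-lift version of this is CodeArtifact with an external connection, which fetches a package from the public registry once and serves repeat requests from the cache; an Artifactory or Nexus mirror accomplishes the same thing. A minimal sketch for npm, with illustrative names:

# Domain plus a repository that proxies and caches the public npm registry.
resource "aws_codeartifact_domain" "build" {
  domain = "build"
}

resource "aws_codeartifact_repository" "npm_cache" {
  repository = "npm-cache"
  domain     = aws_codeartifact_domain.build.domain

  # First request for a package goes upstream; repeats are cache hits.
  external_connections {
    external_connection_name = "public:npmjs"
  }
}

Pair the repository with the CodeArtifact Interface Endpoints so the cached reads also stay off NAT.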

4. Cross-AZ routes before NAT

NAT should be local to the AZ whenever possible. If a private subnet in one AZ routes through a NAT Gateway in another AZ, the account pays for NAT processing and may also pay standard cross-AZ data transfer. The NAT line gets blamed, but the architecture issue is in the subnet route table.

This often appears after a failure drill or a rushed VPC module change. One NAT is deleted, subnets are pointed at the surviving gateway, and nobody puts the local gateway back. The app works. The bill doubles on the loud paths.

The audit path

We do not start with a rewrite. We start with three questions.

First, what is the monthly NatGateway-Bytes usage line (the Cost Explorer usage type for NAT data processing) by VPC, account, and AZ? This splits idle NAT gateways from traffic-driven NAT gateways. The hourly line is inventory cleanup. The byte line is route and endpoint cleanup.

Second, does each VPC have Gateway Endpoints for S3 and DynamoDB attached to every private route table? Missing S3 endpoints are the fastest win in most accounts. Missing DynamoDB endpoints are less common but just as clean when the workload uses DynamoDB heavily.

Third, which destinations dominate the NAT flow logs? If the top rows are AWS services, move them to endpoints. If they are package registries, add a cache. If they are SaaS APIs, decide whether the traffic is normal customer value or a polling loop that should be throttled.
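
If flow logs are not already on, a minimal sketch that captures them to an existing S3 bucket looks like this (the VPC and bucket references are placeholders); aggregating the dstaddr and bytes fields surfaces the top talkers:

# VPC flow logs to S3. No IAM role is required for the S3 destination.
resource "aws_flow_log" "nat_audit" {
  vpc_id               = aws_vpc.main.id
  traffic_type         = "ALL"
  log_destination_type = "s3"
  log_destination      = aws_s3_bucket.flow_logs.arn
}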

The fixes, in order

Enable S3 and DynamoDB Gateway Endpoints first. It is the least risky fix and the only one with a zero-dollar endpoint bill. The route table change is easy to review and easy to roll back.

Add Interface Endpoints for the high-volume AWS services next. ECR, CloudWatch Logs, SSM, Secrets Manager, KMS, and STS are the usual candidates. Measure traffic first because the endpoint hourly fee is real. The right endpoint set for a CI account is different from the right endpoint set for a quiet production app.

Fix cross-AZ NAT routes. Every private subnet should route to the NAT Gateway in the same AZ unless you are deliberately paying for a reduced gateway count. If the design intentionally uses fewer gateways, write that down as a cost trade-off instead of letting it hide.
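
A sketch of the per-AZ wiring, assuming the private route tables and NAT gateways are both keyed by AZ in the same module:

# Keep the default route AZ-local: each private route table points at
# the NAT Gateway in its own AZ, never at a survivor in another AZ.
resource "aws_route" "private_default" {
  for_each = aws_route_table.private # keyed by AZ, e.g. "us-east-1a"

  route_table_id         = each.value.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.per_az[each.key].id
}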

Cache the repeat downloads. Artifact caches and package mirrors are not only for build speed. They are network cost controls. Once a platform team sees package download volume next to the NAT line, this usually becomes an easy sell.

Use the calculator before the Terraform change

The free NAT Gateway cost calculator separates hourly cost from data-processing cost so the fix is obvious. If the hourly line dominates, clean up idle gateways and decide whether every AZ needs its own NAT. If the data line dominates, go straight to endpoints, route tables, flow logs, and caches.

For the source pricing model, AWS documents the NAT hourly and per-GB charges on the Amazon VPC pricing page and recommends endpoint and same-AZ routing strategies in the NAT Gateway pricing guide. The calculator uses editable defaults because real accounts may have a private pricing agreement or region-specific rates.

The 14-day Cloud Horizon audit takes the same path with real account data: NAT spend by VPC, missing endpoint coverage, cross-AZ route checks, and the exact Terraform diff that removes the waste. Read-only access, one-page summary, and no spreadsheet archaeology.

Run this on your real account

Free 14-day NAT Gateway audit

We pull your real NAT spend, find missing endpoint coverage, identify cross-AZ routes, and hand your engineer the Terraform diff. Free, read-only, and focused on the line items you can actually reduce.
