
May 8, 2026 · 9 min read

Cross-AZ data transfer: the quiet tax on every chatty AWS workload

Cross-AZ data transfer costs $0.01 per GB in each direction. That sounds like nothing until you see what a chatty microservice mesh, a multi-AZ RDS, and a misplaced NAT Gateway can do to the bill. Here are the patterns we see in audits, and the architectural fixes that pay back in weeks.

Cross-AZ data transfer is the single most underestimated line on the AWS bill. The price is $0.01 per GB in each direction, billed on both the sending and receiving instance. A round trip is therefore $0.02 per GB. For a workload pushing 50 TB a month between AZs, that is $1,000 every month for traffic that never leaves the region.

It hides because nobody plans for it. Engineers optimize for availability and reach for multi-AZ defaults. The data transfer column in the bill labels these charges as DataTransfer-Regional-Bytes, which most teams do not recognize, and the per-GB rate looks small enough to ignore. By the time the line item is large enough to notice, the architecture is already shaped around it.

The four patterns that produce most of the bill

Across audits, four architectural shapes account for almost all the cross-AZ traffic we find. They compound, which is why a single workload can rack up tens of thousands a month without anyone signing off on it.

1. Chatty microservice mesh spread across AZs

Service A in AZ-1 calls Service B in AZ-2, which calls Service C back in AZ-1. Each hop crosses an AZ boundary. Each crossed boundary bills on both sides. A single user request that fans out to ten internal calls easily moves a megabyte of payload across AZs and bills it ten times.

The fix is topology-aware routing, not zone-pinning. Most service meshes (Linkerd, Istio, App Mesh, Consul) support locality-weighted load balancing where requests prefer same-AZ replicas and only cross zones when the local replica is unhealthy. That single setting often cuts cross-AZ traffic by 60 to 80 percent on a chatty mesh.
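In Istio terms, the change is one DestinationRule per service. Here is a minimal sketch applied through the Kubernetes Python client, with a hypothetical payments service and prod namespace standing in for your own; note that Istio only activates locality preference when outlier detection is also configured, so the rule carries both.

```python
# Sketch: locality-weighted load balancing for one service via an Istio
# DestinationRule. Assumes Istio is installed and kubeconfig is set up;
# the "payments" service and "prod" namespace are hypothetical.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

destination_rule = {
    "apiVersion": "networking.istio.io/v1beta1",
    "kind": "DestinationRule",
    "metadata": {"name": "payments-locality", "namespace": "prod"},
    "spec": {
        "host": "payments.prod.svc.cluster.local",
        "trafficPolicy": {
            "loadBalancer": {
                # Prefer same-zone endpoints; cross zones only on failover.
                "localityLbSetting": {"enabled": True},
            },
            # Locality preference stays inert without outlier detection:
            # this is what marks local endpoints unhealthy and triggers
            # the cross-zone fallback.
            "outlierDetection": {
                "consecutive5xxErrors": 5,
                "interval": "30s",
                "baseEjectionTime": "30s",
            },
        },
    },
}

api.create_namespaced_custom_object(
    group="networking.istio.io", version="v1beta1",
    namespace="prod", plural="destinationrules", body=destination_rule,
)
```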

2. Multi-AZ RDS or Aurora with a single-AZ application tier

RDS Multi-AZ replicates writes to a standby in a second AZ. That replication traffic is free. What is not free is the application tier in AZ-1 reading from a primary that fails over to AZ-2 and keeps serving reads from there. Every query response now crosses an AZ boundary on the way back to the application.

Aurora makes this worse. The cluster has a writer in one AZ and readers spread across all AZs. If the application uses the cluster-wide reader endpoint, its round-robin DNS will land in a different AZ on most calls. Pinning reads to instance endpoints, or to a custom endpoint scoped to the caller's AZ, costs nothing and removes the entire line item.
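What pinning looks like in practice, as a hedged boto3 sketch: read the caller's AZ from instance metadata, then pick a reader instance endpoint in the same AZ, falling back to the cluster-wide reader endpoint when no local reader exists. The cluster identifier is hypothetical.

```python
# Sketch: resolve an Aurora reader endpoint in the caller's own AZ.
# Assumes boto3 credentials and EC2 instance metadata (IMDSv2); the
# cluster identifier "app-cluster" is hypothetical.
import boto3
import urllib.request

IMDS = "http://169.254.169.254/latest"

def local_az() -> str:
    # IMDSv2: fetch a session token, then the instance's AZ.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    token = urllib.request.urlopen(token_req).read().decode()
    az_req = urllib.request.Request(
        f"{IMDS}/meta-data/placement/availability-zone",
        headers={"X-aws-ec2-metadata-token": token})
    return urllib.request.urlopen(az_req).read().decode()

def local_reader_endpoint(cluster_id: str = "app-cluster") -> str:
    rds = boto3.client("rds")
    cluster = rds.describe_db_clusters(
        DBClusterIdentifier=cluster_id)["DBClusters"][0]
    az = local_az()
    for member in cluster["DBClusterMembers"]:
        if member["IsClusterWriter"]:
            continue
        inst = rds.describe_db_instances(
            DBInstanceIdentifier=member["DBInstanceIdentifier"]
        )["DBInstances"][0]
        if inst["AvailabilityZone"] == az:
            return inst["Endpoint"]["Address"]  # same-AZ reader
    # No reader in this AZ: fall back to the round-robin reader endpoint.
    return cluster["ReaderEndpoint"]
```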

3. NAT Gateway in the wrong AZ

A NAT Gateway lives in one AZ. Every private subnet that uses it sends traffic to that AZ before the NAT forwards it to the internet. If the NAT is in AZ-1 and the workload is in AZ-2, every byte of outbound internet traffic pays the cross-AZ tax on top of the NAT data-processing fee and the internet egress fee.

Three line items, stacked: $0.01 cross-AZ, $0.045 NAT processing, and $0.09 egress on the first 10 TB. That is $0.145 per GB on traffic that needs to cost only $0.135 with a NAT in the path. The fix is one NAT Gateway per AZ, with subnet route tables that route to the local NAT. Hourly cost goes up by a few NAT Gateways; the $0.01 cross-AZ charge on every outbound gigabyte disappears.
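The route-table change itself is small. A boto3 sketch, assuming a NAT Gateway already exists in each AZ and each private subnet has its own route table; every ID below is a placeholder.

```python
# Sketch: point each private subnet's route table at the NAT Gateway in
# the same AZ. All IDs are placeholders for your own VPC's resources.
import boto3

ec2 = boto3.client("ec2")

# AZ -> NAT Gateway in that AZ (one per AZ already provisioned).
nat_by_az = {
    "us-east-1a": "nat-0aaaaaaaaaaaaaaaa",
    "us-east-1b": "nat-0bbbbbbbbbbbbbbbb",
    "us-east-1c": "nat-0cccccccccccccccc",
}

# Private-subnet route table -> the AZ its subnet lives in.
route_table_az = {
    "rtb-01111111111111111": "us-east-1a",
    "rtb-02222222222222222": "us-east-1b",
    "rtb-03333333333333333": "us-east-1c",
}

for rtb_id, az in route_table_az.items():
    # replace_route assumes a default route already exists; use
    # create_route on a route table that has none.
    ec2.replace_route(
        RouteTableId=rtb_id,
        DestinationCidrBlock="0.0.0.0/0",
        NatGatewayId=nat_by_az[az],
    )
```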

4. EKS pods scheduled across AZs without topology awareness

Default EKS scheduling spreads pods across AZs for resilience. That is the right default. The wrong default is a Service in front of those pods that round-robins across all AZs without checking where the caller is. Every internal Service-to-Service call has a two-thirds chance of crossing an AZ.

Kubernetes 1.21 added topology-aware hints, and EKS supports them natively. With one annotation on the Service, kube-proxy prefers endpoints in the caller's zone. The traffic graph stays the same, the bill drops noticeably, and resilience does not change because cross-zone fallback still happens when local endpoints are gone.
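The annotation, applied with the Kubernetes Python client. The key shown is the upstream one for Kubernetes 1.21 through 1.26; 1.27 and later spell it service.kubernetes.io/topology-mode. The Service name and namespace are hypothetical.

```python
# Sketch: enable topology-aware hints on an existing Service. kube-proxy
# will then prefer endpoints hinted for the caller's zone.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

patch = {"metadata": {"annotations": {
    # "auto" lets the EndpointSlice controller allocate hints per zone.
    "service.kubernetes.io/topology-aware-hints": "auto",
}}}

v1.patch_namespaced_service(name="payments", namespace="prod", body=patch)
```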

How to measure it before you fix it

VPC Flow Logs are the only ground truth. Cost Explorer reports the total per region and per service, but it does not tell you which workload is producing the traffic. Flow Logs do: the az-id field records the AZ of the interface that captured the flow, and joining source and destination addresses against an ENI inventory gives you both ends of every flow.

Enable Flow Logs on the VPC, ship them to S3 in Parquet format, and query them with Athena. The query that matters sums bytes by source ENI and source/destination AZ pair, keeping only the flows where the two AZs differ. The top ten ENIs almost always account for 80 percent of the cross-AZ traffic. From there it is one tag lookup to map each ENI back to a workload.
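A sketch of that query, fired through boto3. Every name here is an assumption: vpc_flow_logs stands for a Parquet flow-log table whose custom format includes interface_id, az_id, and the addresses, and eni_inventory is a private_ip-to-AZ lookup table built once from describe-network-interfaces.

```python
# Sketch: rank ENIs by cross-AZ bytes in Athena. Table names, database,
# and results bucket are all hypothetical.
import boto3

QUERY = """
SELECT f.interface_id,
       f.az_id      AS source_az,
       e.az         AS destination_az,
       SUM(f.bytes) AS cross_az_bytes
FROM vpc_flow_logs f
JOIN eni_inventory e ON f.dstaddr = e.private_ip
WHERE f.az_id <> e.az            -- keep only flows that crossed an AZ
GROUP BY f.interface_id, f.az_id, e.az
ORDER BY cross_az_bytes DESC
LIMIT 10
"""

athena = boto3.client("athena")
athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "network_audit"},
    ResultConfiguration={"OutputLocation": "s3://audit-athena-results/"},
)
```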

For teams that already have a cost and usage report (CUR) loaded somewhere, the line item to filter is lineItem/UsageType matching %DataTransfer-Regional-Bytes%. The CUR will not tell you the ENI, but it will tell you the cost and GB volume of the line per account and region, which is enough to scope the work.
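If the CUR lands in S3 as CSV, scoping it is a few lines of pandas. The path is hypothetical and the column spellings follow the CSV CUR schema; the Parquet CUR renames them to snake_case.

```python
# Sketch: scope cross-AZ spend from a CSV-format CUR. The S3 path is
# hypothetical; column names follow the standard CSV CUR schema.
import pandas as pd

cur = pd.read_csv("s3://billing-cur/2026/05/report.csv.gz",
                  usecols=["lineItem/UsageType",
                           "lineItem/UsageAccountId",
                           "lineItem/UnblendedCost"])

regional = cur[cur["lineItem/UsageType"].str.contains(
    "DataTransfer-Regional-Bytes", na=False)]

# Cost per account is enough to decide where to start digging.
print(regional.groupby("lineItem/UsageAccountId")["lineItem/UnblendedCost"]
      .sum().sort_values(ascending=False))
```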

What the math actually looks like

Take a moderate-sized SaaS application: 200 microservice instances across three AZs, each one chatting at roughly 50 KB per request, 100 requests per second per instance. That is one gigabyte per second of internal traffic, two-thirds of which crosses AZ boundaries by default. Sustained, that produces about 1.7 PB of cross-AZ traffic per month, which bills at $34,000 a month at $0.02 per GB round-trip.

After topology-aware routing brings local-AZ preference up to 95 percent, the same workload moves about 130 TB cross-AZ a month. That is $2,600 a month. The change is one annotation on each Service object and a Helm chart bump. Two engineering days, one deploy window, $31,000 a month back.
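The arithmetic is worth sanity-checking. A few lines of Python reproduce both figures; nothing here is new data, just the inputs above replayed.

```python
# Back-of-envelope check of the worked example; inputs are the article's.
instances   = 200
req_per_sec = 100            # per instance
payload_kb  = 50
seconds     = 60 * 60 * 24 * 30
rate        = 0.02           # $/GB, round trip

gb_per_sec = instances * req_per_sec * payload_kb / 1e6   # 1.0 GB/s
total_gb   = gb_per_sec * seconds                         # ~2.59M GB/month

for label, cross_share in [("default (2/3 cross-AZ)", 2 / 3),
                           ("95% locality", 0.05)]:
    cross_gb = total_gb * cross_share
    print(f"{label}: {cross_gb / 1e6:.2f} PB, ${cross_gb * rate:,.0f}/month")

# default (2/3 cross-AZ): 1.73 PB, $34,560/month
# 95% locality: 0.13 PB, $2,592/month
```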

Plug your own numbers into the AWS data transfer cost calculator and see the line items broken down by route. The cross-AZ row is almost always the surprise.

The audit checklist we run

  • Is the service mesh configured with locality-weighted load balancing? If yes, what is the weight bias?
  • Are read endpoints on RDS and Aurora pinned to local-AZ readers, or is the application using the cluster endpoint?
  • Is there one NAT Gateway per AZ, with route tables that send local-AZ private subnets to the local-AZ NAT?
  • Are EKS Services using topology-aware hints, or are they routing across all AZs by default?
  • Are VPC Flow Logs enabled, and is the Athena workgroup that queries them owned by someone who looks at the data quarterly?

Five questions. Most teams fail two of them. Each fix is a one-week ticket and pays back in weeks, not quarters.

What this looks like on the bill afterward

DataTransfer-Regional-Bytes does not disappear. There is a baseline of cross-AZ traffic that resilience requires: writes to a multi-AZ primary, replication to a standby, failover paths. A healthy production workload on AWS spends roughly 5 to 10 percent of its compute bill on cross-AZ data transfer. Anything above 15 percent is signal that one of the four patterns above is in play.
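One way to keep an eye on that ratio, sketched against the Cost Explorer API. CE filters match exact values, so the sketch groups by usage type and filters client-side; treating BoxUsage (on-demand instance hours) as the compute bill is a deliberate simplification.

```python
# Sketch: cross-AZ data transfer as a share of EC2 instance spend for
# one month. Dates are illustrative.
import boto3

ce = boto3.client("ce")
resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2026-04-01", "End": "2026-05-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "USAGE_TYPE"}],
)

cost = {g["Keys"][0]: float(g["Metrics"]["UnblendedCost"]["Amount"])
        for g in resp["ResultsByTime"][0]["Groups"]}

cross_az = sum(v for k, v in cost.items()
               if "DataTransfer-Regional-Bytes" in k)
compute = sum(v for k, v in cost.items() if "BoxUsage" in k)

print(f"cross-AZ / compute = {cross_az / compute:.1%}")
```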

The teams that have this line under control share one habit: they treat cross-AZ traffic as a budgeted resource, not a free one. They review the top ENIs by cross-AZ bytes monthly. They tag every Service with the locality strategy. They notice when the line item steps up. The rest of us discover it during the audit.
