AWS Security from First Principles: Network and Infrastructure (Part 1)
AWS security is a big topic, and most people learn it in fragments. A blog post about WAF here, a tutorial about Security Groups there. But the pieces don't connect unless you see the full picture.
This is Part 1 of a two-part series. This post covers the network and infrastructure layer: how traffic flows, how to filter it, and how to keep it private. Part 2 covers identity, encryption, and governance.
Control Plane vs Data Plane
Before anything else, you need to understand this distinction. Every AWS service has two planes:
The control plane manages and configures resources. Creating an EC2 instance, making an S3 bucket, modifying a security group. These are API calls that change your infrastructure. They're authorized by IAM policies and SCPs, and logged by CloudTrail.
The data plane handles the actual traffic and data flow. An HTTP request hitting your load balancer, a user reading an S3 object, an app querying DynamoDB. These are protected by Security Groups, NACLs, WAF, TLS, and encryption. For API-based services like S3 and DynamoDB, IAM and resource policies are also checked on every data request.
Why does this matter? Because they're attacked differently and defended differently.
A control plane compromise means an attacker can create, delete, or modify your infrastructure. Open a security group to the internet, delete your backups, create a new IAM role to exfiltrate data. You defend against this with IAM, MFA, SCPs, and least privilege.
A data plane compromise means an attacker can access or modify your actual data. SQL injection, intercepting traffic, reading S3 objects they shouldn't. You defend against this with security groups, NACLs, WAF, TLS, and encryption.
A simple example: creating an RDS instance is a control plane operation (IAM decides whether you can). Connecting to that instance and running a SQL query is a data plane operation (the security group decides whether your IP can reach the database port, such as 3306 for MySQL, and TLS encrypts the connection when you enable it).
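To make the split concrete, here is a minimal sketch that records this section's examples as data, tagging each with its plane and the controls the text names for it. The `Operation` type, the `OPERATIONS` list, and `on_plane` are hypothetical illustrations, not an AWS API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    """One example action, tagged with its plane and primary defenses."""
    description: str
    plane: str                 # "control" or "data"
    defenses: tuple[str, ...]  # controls this article associates with it


# Hypothetical catalogue built from the examples in this section; not exhaustive.
OPERATIONS = [
    Operation("Create an RDS instance", "control", ("IAM", "SCPs", "CloudTrail")),
    Operation("Modify a security group", "control", ("IAM", "SCPs", "CloudTrail")),
    Operation("Run a SQL query against RDS", "data", ("Security Groups", "TLS")),
    Operation("Send an HTTP request to the ALB", "data", ("Security Groups", "WAF", "TLS")),
]


def on_plane(plane: str) -> list[str]:
    """Return the descriptions of every catalogued operation on one plane."""
    return [op.description for op in OPERATIONS if op.plane == plane]


if __name__ == "__main__":
    for plane in ("control", "data"):
        print(plane, "->", on_plane(plane))
```

The point of keeping the plane next to the defenses is that an incident review can ask two separate questions: who could call the API (control plane), and who could reach the data (data plane).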
Security Groups vs NACLs
Both filter network traffic, but they work at different levels and behave differently.
A Security Group is a firewall around a specific resource (technically, around its network interface). It's stateful, meaning if you allow inbound traffic on port 443, the return traffic is automatically allowed. You can only write allow rules, not deny rules. All rules are evaluated together.
A NACL (Network Access Control List) is a firewall around an entire subnet. It's stateless, meaning you must explicitly allow both inbound and outbound traffic. You can write both allow and deny rules. Rules are evaluated in order, lowest number first, and the first match wins.
The practical difference: Security Groups are your primary tool. Apply them to every resource. Think of them as the lock on each apartment door. NACLs are the building's front gate. Most teams leave the VPC's default NACL as it is (it allows all traffic in both directions) and rely on Security Groups for fine-grained control. NACLs are useful for blocking known bad IP ranges at the subnet level. Be careful with custom NACLs, though: a newly created one denies all traffic until you add rules.
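The evaluation differences above can be modeled in a few lines. This is a simplified sketch, assuming IPv4 and TCP only and ignoring protocols; the class and function names are hypothetical, and `198.51.100.0/24` is a documentation range standing in for a known bad range.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class SgRule:
    """Security Group rule. There is no action field: every rule is an allow."""
    cidr: str
    from_port: int
    to_port: int


@dataclass(frozen=True)
class NaclRule:
    """NACL rule: numbered, with an explicit "allow" or "deny" action."""
    number: int
    action: str
    cidr: str
    from_port: int
    to_port: int


def _hit(cidr: str, lo: int, hi: int, ip: str, port: int) -> bool:
    """True if ip falls inside cidr and port is within [lo, hi]."""
    in_range = ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)
    return in_range and lo <= port <= hi


def sg_allows(rules: list[SgRule], ip: str, port: int) -> bool:
    # All rules are considered together: any match allows, order is irrelevant,
    # and traffic that matches nothing is denied.
    return any(_hit(r.cidr, r.from_port, r.to_port, ip, port) for r in rules)


def nacl_allows(rules: list[NaclRule], ip: str, port: int) -> bool:
    # Lowest rule number first; the first matching rule decides.
    # If nothing matches, the catch-all "*" rule denies.
    for rule in sorted(rules, key=lambda r: r.number):
        if _hit(rule.cidr, rule.from_port, rule.to_port, ip, port):
            return rule.action == "allow"
    return False


def sg_round_trip(inbound: list[SgRule], client_ip: str, server_port: int) -> bool:
    # Stateful: once the request is admitted, the reply is allowed
    # automatically, so outbound rules are never consulted for it.
    return sg_allows(inbound, client_ip, server_port)


def nacl_round_trip(inbound: list[NaclRule], outbound: list[NaclRule],
                    client_ip: str, server_port: int, client_port: int) -> bool:
    # Stateless: the request and the reply are judged separately.
    # The reply goes back to the client's ephemeral port.
    return (nacl_allows(inbound, client_ip, server_port)
            and nacl_allows(outbound, client_ip, client_port))


# Example rules (hypothetical).
SG_IN = [SgRule("0.0.0.0/0", 443, 443)]
NACL_IN = [
    NaclRule(100, "deny", "198.51.100.0/24", 0, 65535),  # known bad range first
    NaclRule(200, "allow", "0.0.0.0/0", 443, 443),
]
NACL_OUT_TOO_NARROW = [NaclRule(100, "allow", "0.0.0.0/0", 443, 443)]
NACL_OUT = [NaclRule(100, "allow", "0.0.0.0/0", 1024, 65535)]  # ephemeral ports
```

With these rules, `nacl_round_trip(NACL_IN, NACL_OUT_TOO_NARROW, "203.0.113.10", 443, 50000)` is `False`: the request gets in, but the reply to port 50000 has no outbound allow rule. The Security Group version needs no outbound rule at all, which is exactly the stateful/stateless difference.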
NAT Gateway Security
A NAT Gateway lets resources in a private subnet reach the internet (outbound) without being directly reachable from the internet (inbound). You can't attach a Security Group to a NAT Gateway. Instead, you control access through the Security Groups on the instances behind it and the NACLs on the subnets. Place the NAT Gateway in a public subnet, then add a default route (0.0.0.0/0) in the private subnet's route table that points to it.
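VPC route tables send each packet along the most specific route that matches its destination. A minimal sketch of that longest-prefix lookup for a private subnet, assuming a hypothetical VPC CIDR of `10.0.0.0/16` and a placeholder NAT Gateway ID:

```python
from __future__ import annotations

import ipaddress
from typing import Optional

# Hypothetical route table for a private subnet. The local route covers the
# VPC CIDR; everything else goes to the NAT Gateway (placeholder ID).
PRIVATE_ROUTES = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "nat-0123456789abcdef0",
}


def next_hop(routes: dict[str, str], dst_ip: str) -> Optional[str]:
    """Return the target of the most specific route containing dst_ip."""
    addr = ipaddress.ip_address(dst_ip)
    best_len, best_target = -1, None
    for cidr, target in routes.items():
        net = ipaddress.ip_network(cidr)
        # Longest prefix wins: /16 beats /0 for addresses inside the VPC.
        if addr in net and net.prefixlen > best_len:
            best_len, best_target = net.prefixlen, target
    return best_target


if __name__ == "__main__":
    print(next_hop(PRIVATE_ROUTES, "10.0.3.7"))      # stays inside the VPC
    print(next_hop(PRIVATE_ROUTES, "203.0.113.50"))  # leaves via the NAT Gateway
```

The NAT Gateway's own subnet then needs its 0.0.0.0/0 route to point at an internet gateway, which is what makes that subnet public.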
WAF (Web Application Firewall)
WAF operates at Layer 7 (HTTP). Security Groups and NACLs operate at Layer 3/4 (IP and port). They solve different problems.
WAF sits in front of your application and inspects the actual HTTP requests. You attach it to CloudFront, ALB, API Gateway, or AppSync. Use it to block SQL injection and cross-site scripting, rate-limit requests to blunt application-layer (HTTP flood) DDoS attacks, geo-block countries, or filter on request headers, body, or URI.
WAF does not replace Security Groups or NACLs. A Security Group blocks traffic by IP and port before it ever reaches your application. WAF blocks traffic by inspecting the content of HTTP requests that already made it through the network layer.
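Inside a web ACL, rules are checked in priority order, a terminating action (Allow or Block) ends evaluation, Count only records a match and moves on, and the ACL's default action applies if nothing terminates. A minimal sketch of that flow; the matchers are deliberately naive toys, and every name, including the placeholder country code `XX`, is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Request:
    country: str   # in practice derived from the client IP
    uri: str
    query: str


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int
    action: str                        # "block", "allow", or "count"
    matches: Callable[[Request], bool]


def evaluate(rules: list[Rule], default_action: str, req: Request) -> tuple[str, list[str]]:
    """Return the final action and the names of any "count" rules that matched."""
    counted: list[str] = []
    for rule in sorted(rules, key=lambda r: r.priority):
        if not rule.matches(req):
            continue
        if rule.action == "count":
            counted.append(rule.name)  # non-terminating: keep going
            continue
        return rule.action, counted    # block/allow stop evaluation
    return default_action, counted


# Toy matchers (hypothetical). Real web ACLs use geo match statements,
# SQL injection match statements, or managed rule groups, not substrings.
BLOCKED_COUNTRIES = {"XX"}  # placeholder code
RULES = [
    Rule("geo-block", 0, "block", lambda r: r.country in BLOCKED_COUNTRIES),
    Rule("naive-sqli", 1, "block", lambda r: "' or 1=1" in r.query.lower()),
    Rule("admin-count", 2, "count", lambda r: r.uri.startswith("/admin")),
]
```

Count rules are a common way to trial a new rule against live traffic before switching it to Block.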
Securing an ALB
An Application Load Balancer is often the front door to your application. Here's how to lock it down:
- Attach a Security Group that allows only port 443 inbound (HTTPS)
- Use an ACM certificate for TLS termination
- Attach WAF for Layer 7 filtering
- Enable access logs to S3
- Backend instances should only allow traffic from the ALB's Security Group, not from the internet
That last point is important. Your EC2 instances or containers should reference the ALB's security group as the source, not 0.0.0.0/0. This way, even if someone discovers your instance's IP, they can't bypass the ALB.
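Here is a sketch of the two Security Group rules as the keyword arguments that boto3's EC2 client method `authorize_security_group_ingress` accepts. The group IDs and the backend port 8080 are hypothetical; the key detail is that the backend rule uses `UserIdGroupPairs` (a Security Group reference) instead of `IpRanges` (a CIDR).

```python
# Hypothetical IDs and port; replace with your own.
ALB_SG_ID = "sg-0aaaaaaaaaaaaaaaa"
APP_SG_ID = "sg-0bbbbbbbbbbbbbbbb"
APP_PORT = 8080


def alb_ingress(alb_sg_id: str) -> dict:
    """HTTPS from anywhere to the ALB."""
    return {
        "GroupId": alb_sg_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "HTTPS from the internet"}],
        }],
    }


def backend_ingress(app_sg_id: str, alb_sg_id: str, port: int) -> dict:
    """App port open only to resources that carry the ALB's Security Group."""
    return {
        "GroupId": app_sg_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            # Source is a Security Group reference, not a CIDR block.
            "UserIdGroupPairs": [{"GroupId": alb_sg_id, "Description": "From the ALB only"}],
        }],
    }


if __name__ == "__main__":
    # With credentials configured, you would apply these with:
    #   import boto3
    #   ec2 = boto3.client("ec2")
    #   ec2.authorize_security_group_ingress(**alb_ingress(ALB_SG_ID))
    #   ec2.authorize_security_group_ingress(**backend_ingress(APP_SG_ID, ALB_SG_ID, APP_PORT))
    print(backend_ingress(APP_SG_ID, ALB_SG_ID, APP_PORT))
```

Because the backend rule names a group rather than an address, it keeps working as the ALB's nodes change IPs, and no internet address ever matches it.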
Securing API Gateway
API Gateway has its own security model:
- Use IAM auth, Cognito authorizers, or Lambda authorizers for authentication
- Attach WAF for Layer 7 filtering
- Use usage plans and API keys for throttling (but not for authentication; API keys are not secrets)
- Use resource policies to restrict access by IP or VPC
- For private APIs, use VPC endpoints so traffic never leaves the AWS network
- Enable mutual TLS (mTLS) for certificate-based client authentication
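As an example of the resource-policy bullet, here is a sketch that builds a policy limiting a private REST API to a single VPC endpoint: an explicit Deny for requests whose `aws:SourceVpce` differs, plus an Allow for everything else. The Region, account ID, API ID, and endpoint ID are hypothetical placeholders.

```python
import json

# Hypothetical identifiers for illustration.
REGION = "us-east-1"
ACCOUNT_ID = "123456789012"
API_ID = "abcde12345"
VPCE_ID = "vpce-0123456789abcdef0"


def private_api_policy(region: str, account_id: str, api_id: str, vpce_id: str) -> dict:
    """Resource policy allowing execute-api:Invoke only via one VPC endpoint."""
    api_arn = f"arn:aws:execute-api:{region}:{account_id}:{api_id}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Explicit deny wins over any allow, so traffic from anywhere
                # other than the named endpoint is rejected.
                "Effect": "Deny",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": api_arn,
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            },
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": api_arn,
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(private_api_policy(REGION, ACCOUNT_ID, API_ID, VPCE_ID), indent=2))
```

To restrict by client IP instead, the Deny condition would use `NotIpAddress` on `aws:SourceIp`. For REST APIs, remember to redeploy the API after changing its resource policy.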
ALB vs NLB
Both are load balancers, but they operate at different layers and have different strengths.
ALB works at Layer 7 (HTTP/HTTPS). It understands URLs, headers, cookies, and query strings. It can route based on path (/users goes to one service, /orders goes to another) or hostname (api.example.com vs admin.example.com). It supports WAF, Lambda targets, and TLS termination. But it doesn't give you static IPs.
NLB works at Layer 4 (TCP/UDP). It only sees IP addresses and ports. It adds less latency than ALB, handles millions of requests per second, gives you a static IP per Availability Zone (you can assign your own Elastic IPs), and supports PrivateLink. But it can't do path-based routing or attach WAF.
When to use ALB: web applications with microservices, path or host-based routing, WAF protection, Lambda as a target.
When to use NLB: PrivateLink/VPC Endpoint Services, gaming or IoT protocols (TCP/UDP), static IPs for partner whitelisting, extreme performance requirements, TLS passthrough.
A common pattern is NLB in front of ALB. You get static IPs and PrivateLink from the NLB, plus path routing and WAF from the ALB. NLB supports ALB as a target.
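The path- and host-based routing that sets ALB apart works through listener rules: the lowest priority number is checked first, the first rule whose conditions all match wins, and the listener's default action catches the rest. A minimal sketch with hypothetical rules and target group names; `fnmatch` stands in for ALB's `*`/`?` wildcards (it also accepts `[...]` sets, which ALB patterns do not).

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from typing import NamedTuple, Optional


class ListenerRule(NamedTuple):
    priority: int
    host: Optional[str]   # host-header pattern, or None to ignore the host
    path: Optional[str]   # path pattern, or None to ignore the path
    target_group: str


# Hypothetical rules mirroring the /users, /orders, and admin host examples.
RULES = [
    ListenerRule(10, "admin.example.com", None, "tg-admin"),
    ListenerRule(20, None, "/users*", "tg-users"),
    ListenerRule(30, None, "/orders*", "tg-orders"),
]
DEFAULT_TARGET = "tg-web"  # the listener's default action


def route(host: str, path: str) -> str:
    """Return the target group for a request, checking rules by priority."""
    for rule in sorted(RULES, key=lambda r: r.priority):
        # Host matching is case-insensitive; path matching is case-sensitive.
        if rule.host is not None and not fnmatchcase(host.lower(), rule.host.lower()):
            continue
        if rule.path is not None and not fnmatchcase(path, rule.path):
            continue
        return rule.target_group
    return DEFAULT_TARGET
```

An NLB has no equivalent of `route`: it never sees the host or path, only the IP and port, which is why the NLB-in-front-of-ALB pattern delegates this step to the ALB.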
VPC Endpoints vs PrivateLink
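When your application in a private subnet needs to call an AWS service (like S3, KMS, or STS), the traffic normally goes out through a NAT Gateway to the service's public endpoint. That adds NAT Gateway data processing charges, and it means your private workloads need an outbound path toward public endpoints, which is a wider exposure than they need.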
VPC Endpoints keep that traffic on the AWS backbone network.
There are two types. Gateway Endpoints work for S3 and DynamoDB only. They're free. You add a route table entry and traffic stays on the AWS network. Always use these.
Interface Endpoints work for 100+ AWS services (KMS, STS, ECR, CloudWatch, Secrets Manager, etc.). They create an ENI with a private IP in your subnet. They cost money (hourly plus per-GB), but they save NAT Gateway data processing costs for high-volume calls and keep traffic private.
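Whether an Interface Endpoint saves money depends on volume, and the arithmetic is simple enough to sketch. The rates below are assumed example values in USD; verify them against current AWS pricing for your Region. The sketch leaves out the NAT Gateway's hourly charge (on the assumption that you keep the NAT for other traffic) and any cross-AZ data transfer.

```python
from __future__ import annotations

# Assumed example rates (USD); check current pricing before relying on them.
NAT_PER_GB = 0.045   # NAT Gateway data processing, per GB
EP_PER_HOUR = 0.01   # Interface Endpoint, per hour per AZ
EP_PER_GB = 0.01     # Interface Endpoint data processing, per GB


def monthly_costs(gb: float, *, azs: int = 2, hours: float = 730,
                  nat_per_gb: float = NAT_PER_GB, ep_per_hour: float = EP_PER_HOUR,
                  ep_per_gb: float = EP_PER_GB) -> tuple[float, float]:
    """Return (cost via NAT, cost via Interface Endpoint) for gb of traffic per month."""
    via_nat = gb * nat_per_gb
    via_endpoint = azs * hours * ep_per_hour + gb * ep_per_gb
    return via_nat, via_endpoint


def break_even_gb(*, azs: int = 2, hours: float = 730,
                  nat_per_gb: float = NAT_PER_GB, ep_per_hour: float = EP_PER_HOUR,
                  ep_per_gb: float = EP_PER_GB) -> float:
    """Monthly volume above which the endpoint is cheaper than NAT processing."""
    return azs * hours * ep_per_hour / (nat_per_gb - ep_per_gb)
```

With these assumed rates and two AZs, the fixed endpoint cost is 2 × 730 × 0.01 = 14.60 per month, and each GB saves 0.035, so the endpoint pays for itself above roughly 417 GB per month. Below that, the privacy benefit is the main reason to use one.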
PrivateLink is the underlying technology that powers Interface Endpoints. It creates a private connection between your VPC and a service without traffic crossing the public internet.
The important use case: third-party SaaS services like Snowflake, Databricks, and MongoDB Atlas expose their services as PrivateLink endpoint services. You create an Interface Endpoint in your VPC pointing to their service, they accept the connection, and traffic flows over the AWS backbone. No internet, no public IPs, often lower latency, and a smaller attack surface. Many teams rely on it to help meet compliance requirements (PCI-DSS, HIPAA, FedRAMP).
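On the consumer side, that connection comes down to one API call. A minimal sketch that builds the keyword arguments for boto3's EC2 `create_vpc_endpoint`; the VPC, subnet, and Security Group IDs are hypothetical, and the endpoint service name is a placeholder (the real one comes from the SaaS provider).

```python
from __future__ import annotations


def interface_endpoint_request(vpc_id: str, service_name: str,
                               subnet_ids: list[str], sg_ids: list[str]) -> dict:
    """Keyword arguments for an Interface Endpoint to a provider's endpoint service."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": service_name,       # supplied by the provider
        "SubnetIds": list(subnet_ids),     # an ENI is created in each subnet
        "SecurityGroupIds": list(sg_ids),  # SGs control who can reach the ENIs
    }


# Hypothetical IDs and service name.
REQUEST = interface_endpoint_request(
    vpc_id="vpc-0123456789abcdef0",
    service_name="com.amazonaws.vpce.us-east-1.vpce-svc-0123456789abcdef0",
    subnet_ids=["subnet-0aaa1111bbbb2222c", "subnet-0ddd3333eeee4444f"],
    sg_ids=["sg-0cccccccccccccccc"],
)

if __name__ == "__main__":
    # With credentials configured, the call would be:
    #   import boto3
    #   boto3.client("ec2").create_vpc_endpoint(**REQUEST)
    print(REQUEST)
```

If the provider requires acceptance, the endpoint stays in a `pendingAcceptance` state until they approve it. The Security Groups on the endpoint ENIs are your own control over which workloads in the VPC can use the connection.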
Defense in Depth
No single layer is sufficient. Each layer adds protection that the others don't cover:
```
Internet
   |
   v
CloudFront + WAF      -- L7 filtering, geo-block, rate limit
   |
   v
ALB + SG + ACM cert   -- TLS termination, L3/4 filtering
   |
   v
Subnet + NACL         -- Coarse L3/4 deny rules
   |
   v
Instance + SG         -- Fine-grained L3/4 allow rules
   |
   v
Data at rest          -- KMS encryption
```
In Part 2, we'll cover the other half: IAM, SCPs, encryption, KMS, and the security services that tie everything together.