AWS VPC: Security Group vs NACL
The AWS VPC network layer can be protected with Security Groups and with NACLs (Network ACLs). These constructs provide "similar" functionality, so it can be confusing to decide which one to use.
The first point to understand is that these are complementary constructs, which means you should use both of them. Together they form a "Swiss cheese model".
Virtual Network Firewall
To understand this better, we should start with the VPC itself. As you know, a VPC is the basic protective building block of the AWS cloud, inside which (almost) all your AWS resources should live. The whole idea of a VPC is to isolate resources. If you have not created a VPC of your own, AWS provides one for you: a default VPC. One distinctive feature of a VPC is that it can span Availability Zones. The VPC is further divided into smaller network components called subnets. Each subnet is confined to a single Availability Zone.
A virtual firewall protects the traffic entering and leaving the VPC. Network ACLs (NACLs) and Security Groups help to build this firewall. There are two distinct traffic flows, as you might have guessed already: inbound traffic, i.e. traffic entering the VPC, and outbound traffic, i.e. traffic leaving the VPC. Both flows need to be protected.
As you will learn below, the Security Group and the NACL start from opposite directions and work to complement each other.
What is the Network ACL?
Traffic entering the VPC meets the NACL first, and the NACL is the last thing traffic meets on its way out. The NACL protects traffic at the network layer, at the subnet boundary, using inbound and outbound rules for this purpose. It is crucial to understand that the default NACL allows all traffic to enter and leave the subnet. (A custom NACL that you create yourself denies all traffic until you add rules.) With each VPC, AWS creates a default NACL, which you cannot delete.
You may associate a single NACL with many subnets if required, but a subnet is associated with only one NACL at any moment. A NACL is a numbered list of rules. Rules are evaluated in ascending order of rule number until one matches the traffic. Once a rule matches, it is applied and no further rules are evaluated.
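To make the "first match wins" behaviour concrete, here is a minimal sketch in plain Python. It is only a simulation, not AWS code: the NaclRule class, the evaluate_nacl function, and the sample IP ranges are all made up for illustration, and real NACL rules also match on protocol and port ranges.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class NaclRule:
    number: int   # lower numbers are evaluated first
    cidr: str     # source CIDR block the rule applies to
    port: int     # single destination port, kept simple for the sketch
    action: str   # "allow" or "deny"


def evaluate_nacl(rules, source_ip, port):
    """Return the action of the first matching rule, in ascending rule number."""
    for rule in sorted(rules, key=lambda r: r.number):
        if ip_address(source_ip) in ip_network(rule.cidr) and port == rule.port:
            return rule.action  # first match wins, later rules are ignored
    return "deny"  # nothing matched: the implicit final deny applies


# Hypothetical rule set: block one range on SSH, allow everyone else.
rules = [
    NaclRule(200, "0.0.0.0/0", 22, "allow"),
    NaclRule(100, "203.0.113.0/24", 22, "deny"),
]

print(evaluate_nacl(rules, "203.0.113.7", 22))   # deny (rule 100 matches first)
print(evaluate_nacl(rules, "198.51.100.9", 22))  # allow (falls through to rule 200)
print(evaluate_nacl(rules, "198.51.100.9", 80))  # deny (no rule matches)
```

Notice that rule 200 would allow 203.0.113.7 as well, but it is never reached for that address because rule 100 has the lower number.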
The NACL is stateless. In simple terms, allowing an inbound connection from an IP on a specific port does not automatically allow the outbound traffic for the same connection. The return traffic needs its own outbound rule.
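A small simulation shows what statelessness means in practice. This is a sketch under simplifying assumptions: the nacl dictionary, the nacl_allows helper, and the client port 50000 are hypothetical, and rules are reduced to port ranges only.

```python
def nacl_allows(nacl, direction, port):
    """Stateless check: only the rule list for this direction is consulted."""
    return any(low <= port <= high for low, high in nacl[direction])


# Inbound HTTPS is allowed, but no outbound rule exists yet.
nacl = {"inbound": [(443, 443)], "outbound": []}
client_port = 50000  # assumed ephemeral port picked by the client

request_ok = nacl_allows(nacl, "inbound", 443)
reply_ok = nacl_allows(nacl, "outbound", client_port)
print(request_ok, reply_ok)  # True False: the request gets in, the reply is dropped

# Adding an outbound rule for the ephemeral port range lets the reply out.
nacl["outbound"].append((1024, 65535))
reply_ok_after = nacl_allows(nacl, "outbound", client_port)
print(reply_ok_after)  # True
```

The NACL keeps no memory of the request it just admitted, so the reply is judged purely against the outbound rules.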
So you can say the NACL is an optional form of network protection. This is because, although a subnet must have a NACL attached, the default NACL allows all traffic. But that doesn't mean the NACL is useless. In fact, when used alongside the finer-grained Security Group, it bolsters network security.
What is a Security Group?
When a VPC is created AWS creates a default Security group as well. You can add and remove rules from a default security group, but you can't delete the security group itself.
The Security Group is used for instance-level security and can be applied to many resources, even across subnets. The Security Group follows the least privilege model: it denies all inbound traffic by default and can contain only "allow" rules. Security Group rules are stateful. The group remembers each connection it allowed in, and the response traffic for that connection is allowed out automatically. Hence, for traffic coming to your instance, you need to specify only inbound rules. (Connections that the instance itself initiates are governed by outbound rules, which allow all traffic by default.)
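The stateful behaviour can be sketched with a tiny connection tracker. This is an illustrative model only: the SecurityGroup class and its method names are invented here and are not the AWS API.

```python
class SecurityGroup:
    """Toy model: allow rules only, plus memory of admitted connections."""

    def __init__(self, inbound_ports):
        self.inbound_ports = set(inbound_ports)  # there are no deny rules
        self.tracked = set()                     # connections that were let in

    def inbound(self, client_ip, port):
        if port in self.inbound_ports:
            self.tracked.add((client_ip, port))  # remember the connection
            return True
        return False  # no allow rule matched: implicit deny

    def reply(self, client_ip, port):
        # Responses are allowed only for connections that were admitted earlier.
        return (client_ip, port) in self.tracked


sg = SecurityGroup({443})
print(sg.inbound("198.51.100.9", 443))  # True: an allow rule matches
print(sg.reply("198.51.100.9", 443))    # True: reply needs no outbound rule
print(sg.inbound("198.51.100.9", 22))   # False: nothing allows SSH
print(sg.reply("198.51.100.9", 22))     # False: that connection was never admitted
```

Compare this with the NACL: here the reply is allowed because the group remembers the request, not because of any outbound rule.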
Security Groups are actually attached to network interfaces. Their rules can reference other Security Groups as well. This enables instances in different subnets to talk to each other.
One more distinctive feature is that, unlike NACLs, the rules are applied as a group. You may attach many Security Groups to an instance, again unlike NACLs, and all their rules are combined into one set of allow rules. Traffic is allowed if any rule allows it, i.e. your instance is only as secure as your most permissive rule.
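Here is a rough sketch of how several Security Groups combine. The group names and port sets are hypothetical, and each group is reduced to a plain set of allowed inbound ports.

```python
def instance_allows(security_groups, port):
    """Traffic is allowed if ANY attached group has a matching allow rule."""
    return any(port in allowed_ports for allowed_ports in security_groups)


web_sg = {80, 443}   # assumed group for web traffic
admin_sg = {22}      # assumed group for SSH access

print(instance_allows([web_sg], 22))            # False: web_sg alone blocks SSH
print(instance_allows([web_sg, admin_sg], 22))  # True: admin_sg opens it
print(instance_allows([web_sg, admin_sg], 3306))  # False: no group allows it
```

Attaching one more group can only open more ports, never close them, which is why a single permissive group weakens the whole instance.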
Each instance must have an associated Security Group, which makes it a required form of network protection. You cannot disassociate all the Security Groups from an instance. This ensures you do not accidentally expose your instances to the world.
By now, you should understand why I said that the virtual network firewall constructs, NACL and Security Group, start from opposite directions and complement each other.
How do NACLs and Security Groups complement each other?
The NACL operates at the subnet level and hence sees inbound traffic first. This allows it to filter the traffic before it reaches the next level, which is the Security Group. If traffic is denied by the NACL, the Security Group never sees it. The Security Group can still deny traffic that the NACL allowed.
This has an interesting implication for how we can make these two work together. You can have finer rules defined in the NACL and broader rules in the Security Group, or vice versa.
As we saw earlier, the Security Group is a required form of protection while the NACL is an optional one. Thus, it would suffice to define only Security Groups. But relying on them alone is not wise. Let's see why.
The NACL works at the subnet level (each subnet has exactly one NACL), i.e. it applies to all instances within that subnet. So even if an overly permissive Security Group accidentally gets associated with some instances, a sufficiently restrictive NACL still protects them. This is an example of the "Swiss cheese model" at work: each layer has holes, but the holes rarely line up. It lowers the risk of resource misconfiguration, data leakage, and attacks.
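The layering can be sketched as two checks in sequence. This is a simplified model, not AWS code: the function names, the "office" CIDR range, and the wide-open Security Group are all assumptions made for the example.

```python
from ipaddress import ip_address, ip_network


def nacl_allows(allowed_cidrs, source_ip):
    """Subnet-level check: is the source inside any allowed CIDR block?"""
    return any(ip_address(source_ip) in ip_network(cidr) for cidr in allowed_cidrs)


def sg_allows(allowed_ports, port):
    """Instance-level check: is the port covered by an allow rule?"""
    return port in allowed_ports


def reaches_instance(nacl_cidrs, sg_ports, source_ip, port):
    # The NACL is consulted first. The Security Group only sees what it admits.
    return nacl_allows(nacl_cidrs, source_ip) and sg_allows(sg_ports, port)


office_only_nacl = ["203.0.113.0/24"]  # restrictive subnet layer
overly_open_sg = set(range(0, 65536))  # misconfigured: every port allowed

print(reaches_instance(office_only_nacl, overly_open_sg, "198.51.100.9", 22))  # False
print(reaches_instance(office_only_nacl, overly_open_sg, "203.0.113.7", 22))   # True
```

Even with every port open on the Security Group, a stranger on the internet is stopped at the subnet boundary. Only the office range gets through the hole in the first slice.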
Let's sum up what we have learned in this infographic.