"Knowing is half the battle won", a wise soul once said. History is testimony to it. Victory favoured those who knew the exact state of their troops, rations and weapons and, just as importantly, any weaknesses that adversaries might pick on. If you knew nothing about where your army divisions were roaming, or about depleted rations, you were bound to fail. Knowing nothing, or knowing only a little, is a recipe for disaster.
The phrase resonates not just in battles but in today's technical environments as well. To be fair, modern workloads (read: cloud) are no lesser evil than battlefields. Things change every moment: resources are created and destroyed at the whim of some weird elasticity rules, configurations are updated with each deployment or production issue, and so on. To manage your workloads you need up-to-date information about them. And not just about the workloads, but also about the surrounding systems with which they interact: the relationships.
Who knows AWS better than AWS itself? So when AWS brings its own native service, which promises to "know everything" about AWS workloads and "protect" them, why should one not believe it?
AWS Config debuted in November 2014 with the promise of complete visibility into AWS resources, access to the full configuration history, and the grand promise of being the configuration auditor for AWS. Looking back at that promise after seven long years, one starts wondering: why are there so many CSPM startups after all?
AWS Config is AWS's configuration auditor. A configuration auditor is especially useful in:
- Cloud Asset Inventory Management
- Cost Management
- Change Management
- Security Management
The AWS Maze
AWS is a huge conglomerate of services. These services are abstraction layers on top of basic compute, network and storage services. For example, AWS Lambda, behind the scenes, uses EC2 compute infrastructure, uses AWS's network infrastructure for data flow, and is isolated from other workloads using VPC technology. When the service is made available to users, it is exposed through a defined set of APIs (wrapped in AWS SDKs for easy adoption and use), and users can control some aspects of the service configuration. As part of this abstracted view, AWS exposes certain entities to the user. These are commonly called cloud assets or cloud resource types.
In the case of AWS Lambda, the user writes code inside Lambda Functions, while reusable code is packaged in the form of Layers. Each such layer can be versioned as a Layer Version. From a security point of view, you need to be absolutely sure that only your code runs in Lambda, which is accomplished in part by code signing. Code signing is done by a different AWS service, AWS Signer. The user provides the code signing inputs in the form of a Code Signing Configuration. A Lambda function can be invoked in response to an event. The source of these events, say SQS, needs to be configured. This configuration is captured as an Event Source Mapping. If you're running a mission-critical workload in Lambda, you may want to avoid cold starts. This is done through a Provisioned Concurrency Config. And I'm not even covering the Function EventInvoke Config. Now step back a little and look at all the named entities in this paragraph: Functions, Layers, Layer Versions, Code Signing Configurations, Event Source Mappings, Provisioned Concurrency Configs and Function EventInvoke Configs. Those are cloud assets.
|Service|Asset Type|AWS Service Relationships|
|---|---|---|
|AWS Lambda|Code Signing Configuration|AWS Signer|
|AWS Lambda|Event Source Mapping|SQS, Kinesis, ...|
|AWS Lambda|Provisioned Concurrency Config||
|AWS Lambda|Function EventInvoke Config||
The cloud is a complex maze of cloud assets and the relationships between them. These are the asset types and services that one has to deal with, each and every one of them, if you really care about security, cost and operations.
Ignorance is not bliss in the cloud. For example, if you're blind to the provisioned concurrency configuration, it will hurt on the cost front: the function becomes ineligible for the free tier and you pay for the reserved capacity. Not just that, the Provisioned Concurrency level counts towards the function’s Reserved Concurrency limit and also towards the account's regional limits. It is common for such values to be tweaked during incident resolution in production and then forgotten.
Traveling down a foggy road
The first tenet of any configuration auditor is visibility. Security follows visibility.
AWS Config, unfortunately, has a very poor record on this front. In March 2019, it supported 26 AWS services and 72 resource types. By 2021, AWS Config had managed to increase its coverage by a whopping 43% to 103 resource types. Quite an effort, isn't it?
Now look at the number of AWS services as of July 2021: approximately 230. Terraform is the de facto IaC standard, and it has a notion of resources corresponding to configuration items. Going back to our AWS Lambda example, this one-to-one correspondence can be clearly seen. Terraform supports ~630 resource types. Another IaC tool, AWS's own CloudFormation, supports ~330 resource types. This puts AWS Config's coverage at roughly 30% when compared to CloudFormation and under 20% when compared to Terraform. And a frequent complaint about CloudFormation is that it lags way behind Terraform.
Even for the AWS services that AWS Config supports, there are many resources that AWS Config does not track. You can see from the AWS Config page that even common AWS services such as Route53 are not supported by AWS Config. In fact, support for SQS, AWS’s oldest service, was added only in 2020.
The asset inventory forms the foundation for cost, security and management, so this lack of coverage in AWS Config is discouraging.
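The coverage figures are easy to re-derive. Here is a small sketch that uses only the approximate resource-type counts quoted above; the variable names and the `percent` helper are mine, purely for illustration.

```python
# Resource-type counts quoted in this post (approximate, as of July 2021).
CONFIG_TYPES_2019 = 72
CONFIG_TYPES_2021 = 103
CLOUDFORMATION_TYPES = 330  # approximate
TERRAFORM_TYPES = 630       # approximate


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Growth of AWS Config coverage between 2019 and 2021.
growth = percent(CONFIG_TYPES_2021 - CONFIG_TYPES_2019, CONFIG_TYPES_2019)
# AWS Config coverage relative to the two IaC tools.
vs_cloudformation = percent(CONFIG_TYPES_2021, CLOUDFORMATION_TYPES)
vs_terraform = percent(CONFIG_TYPES_2021, TERRAFORM_TYPES)

print(f"Growth 2019 -> 2021: {growth}%")
print(f"Coverage vs CloudFormation: {vs_cloudformation}%")
print(f"Coverage vs Terraform: {vs_terraform}%")
```

With these inputs the growth comes out at about 43%, and the coverage at about 31% of CloudFormation and about 16% of Terraform. Since the CloudFormation and Terraform counts are themselves approximate, treat the results as ballpark figures.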
Nothing is as it should be
Another stated purpose of a configuration auditor is change management. This is especially important when dealing with incidents. Consider, for example, the rules in a load balancer or the min/max sizes of an auto-scaling group. These are often changed manually during production issues, and if you don't have backups, it's quite easy to misconfigure them and then spend time trying to recall the previous values. This is where configuration change history shines.
Just imagine it as git versioning, but applied to the cloud. Having an established baseline or golden snapshot is often a good idea. Obviously, IaC should be the first choice for establishing the baseline, but reality is often different from the ideal. Even in companies that have invested heavily in IaC, there are often resources that were created long ago and haven't been brought under IaC.
AWS Config has useful features such as the resource change timeline. If you need to view information across accounts and regions, though, an aggregator needs to be created. The expectation from AWS Config is that it provides this versioning in a handy way.
AWS Config is a regional service, meaning you need to set it up in every region for every AWS account. When you have a good number of AWS accounts, this quickly translates into a sizeable effort. This is a hindrance to adoption. Often, AWS Config is not enabled in regions where you do not work. If AWS Config is not enabled for a region, that region becomes a blind spot. You need to make sure to turn it on in every region (you work in).
In short, AWS Config demands quite a lot of effort from its users when configuring the service for multiple accounts and regions:
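To make the "git for the cloud" idea concrete, here is a minimal sketch of comparing a golden snapshot against the live configuration. It is not how AWS Config works internally; `diff_config` and the auto-scaling values are hypothetical and only illustrate the kind of answer you want from a change history.

```python
from typing import Any, Dict, Tuple

Snapshot = Dict[str, Any]


def diff_config(baseline: Snapshot, current: Snapshot) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (baseline_value, current_value)} for every key that differs.

    A key missing on one side is reported with None on that side.
    """
    changes: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(set(baseline) | set(current)):
        old, new = baseline.get(key), current.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical auto-scaling group settings: the golden snapshot versus what
# is live after someone tweaked the group during an incident.
golden = {"MinSize": 2, "MaxSize": 10, "DesiredCapacity": 4}
live = {"MinSize": 2, "MaxSize": 40, "DesiredCapacity": 12}

drift = diff_config(golden, live)
for key, (old, new) in drift.items():
    print(f"{key}: {old} -> {new}")
```

Here the diff flags `DesiredCapacity` and `MaxSize` and leaves `MinSize` alone, which is exactly the "what did we change at 3 a.m.?" answer you need after an incident.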
- Enable the configuration recorder in each region.
- Decide whether to cover all resource types or only specific ones.
- For multi-account, multi-region setups, create an aggregator.
- When you add a new region, remember to set up the recorder in that region for each account.
- Update the aggregator to include the new region.
AWS Config's multi-step, per-region setup requires sizeable effort, further limiting visibility.
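The bookkeeping behind these steps can be sketched as a simple set difference: every (account, region) pair needs a recorder, and any pair without one is a blind spot. The account IDs, regions and the `enabled` set below are made up; in practice you would collect the enabled pairs from each account and region rather than hard-code them.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (account_id, region)


def required_recorders(accounts: Iterable[str], regions: Iterable[str]) -> Set[Pair]:
    """Every account needs a recorder in every region."""
    return set(product(accounts, regions))


def blind_spots(accounts: Iterable[str], regions: Iterable[str],
                enabled: Set[Pair]) -> List[Pair]:
    """Return the (account, region) pairs that have no recorder enabled."""
    return sorted(required_recorders(accounts, regions) - set(enabled))


# Hypothetical organisation: two accounts, three regions.
accounts = ["111111111111", "222222222222"]
regions = ["us-east-1", "eu-west-1", "ap-south-1"]
enabled = {
    ("111111111111", "us-east-1"),
    ("111111111111", "eu-west-1"),
    ("222222222222", "us-east-1"),
}

missing = blind_spots(accounts, regions, enabled)
print(f"{len(required_recorders(accounts, regions))} recorders needed, "
      f"{len(missing)} missing")
for account, region in missing:
    print(f"  no recorder: account {account} in {region}")
```

Even this toy setup needs six recorders, and the count grows as accounts times regions, which is why the setup effort adds up so quickly.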
And it costs nothing
The AWS Config pricing model is volume-based and has multiple parts, making it complex and unpredictable.
Each recorded resource change costs $0.003. Mind that a recording is counted whenever a configuration or a relationship changes. In a highly dynamic environment, this can quickly become unpredictable. For example, ECS tasks running in awsvpc networking mode attach and detach ENIs; does each of those count as a configuration recording? The same applies to Lambda functions running in the user's VPC. You need to be careful of situations where an accidental configuration change may trigger volume-based charges.
Another cost contribution comes from rule execution, which is again volume-based: the first 100,000 rule evaluations cost $0.001 per evaluation per region, and so on. In addition, there is a cost associated with the objects (snapshots and history) that are saved in the S3 bucket.
In short, remember that AWS Config pricing is volume-based. If you have a dynamic environment, the cost can be unpredictable.
Volume not only impacts the cost; it may also affect how long AWS Config takes to detect configuration changes. AFAIK, there is no official documentation that gives a clear SLA between a resource change or creation and Config detecting and evaluating it. The higher the activity, the longer the delay.
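To get a feel for how the volume-based parts add up, here is a rough estimator for a single account and region. Only the $0.003 per recording and the $0.001 rate for the first 100,000 evaluations come from the figures above; the rates for the later tiers are placeholders I have assumed for illustration, so check the current AWS Config pricing page before relying on them. S3 storage is left out.

```python
from typing import List, Optional, Tuple

CONFIG_ITEM_PRICE = 0.003  # USD per recorded configuration change (from the text)

# (tier size, USD per evaluation). Only the first tier comes from the text;
# the later tiers are ASSUMED placeholders, not confirmed prices.
RULE_TIERS: List[Tuple[Optional[int], float]] = [
    (100_000, 0.001),
    (400_000, 0.0008),  # assumption
    (None, 0.0005),     # assumption; None means "everything beyond"
]


def rule_evaluation_cost(evaluations: int,
                         tiers: List[Tuple[Optional[int], float]] = RULE_TIERS) -> float:
    """Price rule evaluations tier by tier, cheapest tiers last."""
    cost, remaining = 0.0, evaluations
    for size, price in tiers:
        in_tier = remaining if size is None else min(remaining, size)
        cost += in_tier * price
        remaining -= in_tier
        if remaining == 0:
            break
    return cost


def monthly_cost(config_items: int, evaluations: int) -> float:
    """Estimated monthly bill for one account in one region, excluding S3."""
    return config_items * CONFIG_ITEM_PRICE + rule_evaluation_cost(evaluations)


# Hypothetical month: 50,000 recorded changes and 150,000 rule evaluations.
estimate = monthly_cost(config_items=50_000, evaluations=150_000)
print(f"Estimated monthly cost: ${estimate:,.2f}")
```

The point of the sketch is the shape of the bill, not the exact number: both inputs are driven by how busy your environment is, so a noisy deployment pattern feeds straight into the cost.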
So if you're considering AWS Config as your configuration auditor, you should keep in mind:
- Limited visibility into cloud assets
- Region-based setup effort
- Unpredictable pricing model
Thanks for reading!