ACM.110 CloudFormation is an amazing concept but it needs a little TLC
This is a continuation of my series of posts on Automating Cybersecurity Metrics.
In the last post we looked at adding a policy to our VPC Endpoint that provides access to CloudFormation via a private network (i.e. without traversing the Internet.)
Add a Policy to an AWS VPC Endpoint
Some people struggle with CloudFormation, understandably so, based on the things I’ve been writing about and some of the challenges we’ve had trying to deploy resources with CloudFormation in these blog posts. Although I get tripped up and frustrated with CloudFormation at times, I still love it. Whoever designed it was a genius who understands the fundamentals of good system design with proper separation of concerns, a topic of my next post.
I wrote an earlier post about getting started with CloudFormation and about how to make writing CloudFormation templates easier, whether you’re struggling to learn it or simply want less friction:
There are a lot of little things that can trip you up, but I would argue those things could probably be fixed by AWS if enough people asked. Check out this post on the AWS Wishlist where you can submit feature requests to AWS on Twitter.
You can also request changes to the CloudFormation RoadMap on GitHub.
CloudFormation needs additional attention on error messages
All the nuances of getting the spaces, dashes, and colons in exactly the right place are kind of painful. I agree. But this is not a problem with the concept or design of CloudFormation itself. It is a problem caused by insufficient testing and a lack of thoughtful error messages.
The importance of CloudFormation
It is likely not the fault of the team that supports CloudFormation at AWS that these problems exist. I imagine they are doing the best they can with the resources they have and need more or better resources assigned to the problem to resolve it.
Likely people are clamoring to get on the team that is going to produce the next big thing announced at AWS re:Invent rather than fixing error messages in CloudFormation. The company needs to reward the people who maintain the most fundamental aspects of the whole cloud platform highly so the core of the product maintains its integrity.
That is the problem a lot of companies have. They try to build the next big new shiny thing and don’t focus on the fundamentals. Their product is not easy to use or has issues that don’t meet the customer’s needs and so the customer opts to use another product that is simpler or more aligned with their particular problem. I hope AWS continues to address and improve on CloudFormation because it is so fundamental to everything in the platform.
I would argue that the CloudFormation team, or whoever is working on it, are some of the most important people at AWS. CloudFormation can help prevent a lot of security problems when used properly. When people can’t figure it out or don’t use it, they go around clicking buttons in their AWS accounts and open up S3 buckets to the world and so on. Make it easy to write secure CloudFormation stacks that disallow things like public S3 buckets, something I address in my cloud security classes.
The problems may be bubbling up from other teams
The problems with CloudFormation may not be the CloudFormation team’s fault at all. When an error message occurs that is absolutely unhelpful in solving a particular configuration problem, that could be coming from the team that designs the related service. In a rush to get new features to market, CloudFormation and related errors may be an afterthought.
In other cases, the team may be intentionally hiding certain information from error messages in the name of security (KMS?) but I don’t really think it’s adding any security value. In fact, it makes people throw up their hands and skip security to come back and “do it later” when it is too hard to fix and project managers are breathing down their necks. As we all know, later often never comes.
In general, I find that companies spend a lot less time formally testing their infrastructure deployment code, if they test it at all. Perhaps AWS needs a more rigorous testing process for CloudFormation error messages across teams. That process needs to be enforced throughout AWS to ensure every team writes clear error messages that tell a customer how to fix the error they are getting in CloudFormation.
Test rollbacks and delete statuses to make sure they can be fixed
Whoever is responsible for developing CloudFormation resources that can have dependencies needs to test every path that can get that CloudFormation stack into a bad state. Here are some examples:
A CloudFormation stack cannot be deleted because something depends on it. It gets into a rollback state. Then there’s no way to fix it after that point. BETTER: Allow a customer to accept that issue and return the stack to a normal state. It’s ugly when it sits there in an error state and the customer decides they just want to leave it. At that point there’s no problem, so return it to a valid “green” status.
Make the dependency hierarchy easier to discern. Make sure it is easy for a customer to remove resources and fix the underlying problem when they hit a dependency issue. Test every variation of action a customer may take: manually deleting the dependency, deleting the related stack, etc. Ensure that things are always in a recoverable state. Perhaps add a way to list out dependencies in the AWS console so it’s easy to see what may be affected when an action is taken on a CloudFormation stack.
Can’t deploy stacks in a rollback state. Sometimes stacks get into a rollback state and CloudFormation can’t deploy over them. This is silly. I had to write some code to automatically delete a stack and then redeploy it. AWS could easily do this, or provide some switch on the CLI if you don’t want that to happen automatically. If it is there, I couldn’t find it. I see methods for overriding certain things, but they did not resolve that particular problem. The code I wrote is in one of the other blog posts in this series.
An underlying resource gets altered outside of CloudFormation (sometimes by AWS when it comes to key policies or trust policies, which, as I have written many times, is a big problem). Once this happens, stacks get into weird states that are very difficult to resolve, as I’ve written in the past. These things should be tested and resolved so they don’t require a customer to end up deleting an entire stack of resources just to fix a problem.
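To make the "delete it and redeploy" workaround concrete, here is a minimal Python sketch. It is not the code from the other post in this series. The names plan_recovery, clear_failed_stack, and MUST_DELETE are made up for illustration, and the mapping from stack status to action is my own reading of the documented stack statuses, so treat it as an assumption to verify. The cfn argument is assumed to be a boto3 CloudFormation client.

```python
# Hypothetical helper names; the status-to-action mapping is an assumption
# based on the documented CloudFormation stack statuses.

# Statuses where the stack cannot be updated and has to be deleted first.
MUST_DELETE = {"ROLLBACK_COMPLETE", "ROLLBACK_FAILED", "DELETE_FAILED"}


def plan_recovery(status):
    """Decide what to do before deploying over a stack with this status."""
    if status in MUST_DELETE:
        return "delete_then_deploy"
    if status == "UPDATE_ROLLBACK_FAILED":
        return "continue_update_rollback"
    if status.endswith("_IN_PROGRESS"):
        return "wait"
    return "deploy"


def clear_failed_stack(cfn, stack_name):
    """Apply the planned action using a CloudFormation client.

    cfn is assumed to be boto3.client("cloudformation"). The caller is
    expected to handle the case where the stack does not exist.
    """
    stacks = cfn.describe_stacks(StackName=stack_name)["Stacks"]
    action = plan_recovery(stacks[0]["StackStatus"])
    if action == "delete_then_deploy":
        cfn.delete_stack(StackName=stack_name)
        # Block until the delete finishes so the redeploy starts clean.
        cfn.get_waiter("stack_delete_complete").wait(StackName=stack_name)
    elif action == "continue_update_rollback":
        cfn.continue_update_rollback(StackName=stack_name)
    return action
```

A deployment script could call clear_failed_stack before every deploy and only proceed when the returned action is "deploy" or "delete_then_deploy". Keeping the decision in a pure function makes it easy to test without touching an AWS account.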
Someone at AWS should be tracking error metrics (if they aren’t)
What do I mean by tracking metrics? Once I wrote a system that emailed me every single error message experienced by an end user. I wanted to fix every single problem and bug people were facing.
Amazon could do the same. Track which errors users get most often, especially those where a person submits a template multiple times and gets the same error, and one by one find a way to make those things easier to resolve via better CloudFormation error messages. Fewer error messages and faster time to resolve issues will mean less load on the AWS systems that support CloudFormation. I imagine customers and AWS will save an inordinate amount of time and money by fixing some of the most frequently recurring problems.
If you want to see the errors I’ve hit while writing this blog series, they are either in the blog posts themselves or called out separately in my Bugs That Bite blog, where I try to tell people how to fix the error messages they get and report bugs.
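As a rough illustration of that kind of tracking, here is a small Python sketch. I have no knowledge of how AWS records errors internally, so the event shape (user, template, message), the function names, and the placeholder error strings are all invented for the example.

```python
from collections import Counter


def top_errors(events, n=3):
    """Return the n most common error messages with their counts."""
    return Counter(message for _user, _template, message in events).most_common(n)


def repeated_failures(events, threshold=2):
    """Find users who hit the same error on the same template repeatedly."""
    counts = Counter(events)
    return {key: count for key, count in counts.items() if count >= threshold}


# Made-up sample data: (user, template, error message).
events = [
    ("alice", "vpc.yaml", "placeholder error A"),
    ("alice", "vpc.yaml", "placeholder error A"),
    ("alice", "vpc.yaml", "placeholder error A"),
    ("bob", "kms.yaml", "placeholder error B"),
]

print(top_errors(events, n=1))
print(repeated_failures(events))
```

The second function is the more interesting signal: someone submitting the same template three times and getting the same message suggests the message did not tell them how to fix the problem.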
I explain why I wrote that blog rather than trying to report the issues directly in the first post on that blog. I don’t have a lot of time to interact with companies to help them fix their products or search for security bugs when no bug bounty exists. As a business owner, I need to get paid for my time — except the things I write for free on this blog which is getting the majority of my “donated” time at the moment. I’m just hoping someone who can fix the problems might run across it or someone who has a large account at AWS might point someone there to a blog post that explains the problem so it can be fixed.
Move the error messages closer to the user
Better yet, add parsing in front end tools like the CLI, Python, or whatever tool people happen to be using to deploy their CloudFormation to tell them what the problem is. Present an accurate error as close as possible to the point where the user makes the mistake.
Don’t accept the generic parsing errors from underlying libraries like JSON and YAML as good enough. The error messages should be specific to the structure and requirements of the particular resource being deployed on AWS. If the error message is due to an invalid policy document, explain why it is invalid:
AWS CloudFormation Policy Document Error Messages Could Be Nicer
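Here is a minimal sketch of what a more specific, client-side check could look like, assuming the policy document has already been parsed into a Python dict. The function name explain_policy_problems and the wording of the messages are my own invention, not anything AWS provides, and it only covers a few rules of the IAM policy grammar (Version, Statement, Effect, Action).

```python
VALID_VERSIONS = {"2012-10-17", "2008-10-17"}
VALID_EFFECTS = {"Allow", "Deny"}


def explain_policy_problems(policy):
    """Return human-readable problems found in a parsed policy document."""
    problems = []

    version = policy.get("Version")
    if version is not None and version not in VALID_VERSIONS:
        problems.append(
            f'Version is "{version}"; expected one of {sorted(VALID_VERSIONS)}.'
        )

    statements = policy.get("Statement")
    if statements is None:
        problems.append('Policy has no "Statement" element; add at least one.')
        return problems
    if isinstance(statements, dict):
        # A single statement may be written without the surrounding list.
        statements = [statements]

    for index, statement in enumerate(statements):
        where = f"Statement[{index}]"
        effect = statement.get("Effect")
        if effect is None:
            problems.append(f'{where} is missing "Effect"; use "Allow" or "Deny".')
        elif effect not in VALID_EFFECTS:
            problems.append(
                f'{where} has Effect "{effect}"; the value is case-sensitive '
                'and must be "Allow" or "Deny".'
            )
        if "Action" not in statement and "NotAction" not in statement:
            problems.append(f'{where} is missing "Action" (or "NotAction").')

    return problems


bad_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "allow", "Resource": "*"}],
}
for problem in explain_policy_problems(bad_policy):
    print(problem)
```

Each message names the statement, the element, and the fix, which is the level of detail I would like to get back from whatever tool I use to deploy.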
No, I don’t want to use a Cloud IDE. I just want the tool I’m already using to give me errors that tell me exactly what the problem is. I also don’t want to use some tool that overlays CloudFormation like the CDK. CloudFormation can be a work of art unto itself and you can write elegant code with proper separation of concerns directly without going through additional layers. I want the underlying error messages to be accurate, not to have to add some tool on top to get a better error message, please.
TLC for CloudFormation
I love CloudFormation. OK, sometimes it’s a love-hate relationship. But I hope AWS will invest some additional time to fix the things I’ve mentioned in my posts. And by the way, other cloud platforms are not any better in my experience. I’m not picking on AWS by any means, because I’ve most definitely had worse problems with Azure, though Azure does some things well too. GCP error messages have also wasted hours of my life. I just happen to be working with AWS in this particular blog series.
I don’t have time to provide a lot of free support but maybe this will help someone. I write these blog posts both because I need the thing I’m building and to get people thinking about how to better secure their cloud systems and then maybe they will hire me for a consulting call through IANS or alternatively cloud security training, or a pentest through my own company.
I hope someone reads this who can give CloudFormation a little more TLC over at AWS. 🙂
Teri Radichel
© 2nd Sight Lab 2022
All the posts in this series:
Automating Cybersecurity Metrics (ACM)
GitHub – tradichel/SecurityMetricsAutomation
How to Fix CloudFormation was originally published in Cloud Security on Medium, where people are continuing the conversation by highlighting and responding to this story.