Mackenzie Jackson discusses secrets sprawl — the spread of credentials through source code — and how to find and remediate the issue at scale.
If you've missed the intro: hi everyone, I'm Mackenzie, a security and developer advocate here at GitGuardian, and in this talk we're going to be talking about taming secrets sprawl. We're going to focus on remediation — but remediation at scale — and on preventing new leaks. I'm going to talk about why remediation is so hard, but first a little light coverage of the basics; if you're very familiar with the topic, don't worry, we won't spend too much time here. Quickly, what am I talking about when I say secrets? Secrets are digital authentication credentials, and they typically take the form of API keys, credential pairs, or security certificates. But there's a key distinction between how I use the word "secrets" and how you might think of, say, a password. A password is still a secret, but in software development we're talking about secrets that are used programmatically. That's a big distinction: it means machine-to-machine communication. You have an application that needs to connect to a database or a third-party service; it's not you typing in that secret to connect, it's your application, and the secret is made to be used programmatically. Why is that such an important distinction? Because it's the reason secrets typically end up in places they shouldn't: they get injected into source code. So why do secrets leak into code? I have a bit of a silly example here, but if we think about our secrets, where do they want to be? We want to keep them in a vault or a secrets manager, but they often end up inside the code. Why is that? Let me give you a scenario — it's on my screen. One of the most common places we see secrets is buried in our history. In this example we've got a Git branch, and we're probably all familiar with Git
version control. In this case a developer has been told: hey, I want you to work on a feature branch and build out this feature. So what they've done is hard-coded a secret, just to get it working quickly. They know they're not going to leave it there for the final version; they just want to get connected to the API, the server, the database, and make sure everything works. Later on they remove it and use an environment variable, or however they're meant to handle it. But once that feature branch goes through code review back into the main branch, that secret may be a hundred commits deep in the history, completely buried, and no reviewer has the time to go back a hundred commits to look at all the mistakes you made along the way — that's what history is. But that secret is still there, still in the history. This is why we often find secrets inside source control that no one really knows about. They get there in other ways too: auto-generated files and logs. If you write a debug log, it will often print out your environment, and your environment may contain environment variables, which are typically secrets. If that log gets captured into your git repository because you've run git add -A, it's going to be in your version control history, and it's going to be really hard to get out, just like in the first example. What will sometimes save you is a .gitignore file, but we see so many repositories that don't have one. It's a simple file that says: hey, ignore any log files, ignore the environment variable file — the .env file, the number one file containing secrets that we saw — so it will really save you in the long run. We also see — one of my favorites — the secrets.txt file. We see a lot of these, and the typical scenario is that the team probably has some secrets management in place, with parameters around how to access it.
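To make that hard-coded-secret pattern from the feature-branch story concrete, here is a minimal sketch — the variable name `THIRD_PARTY_API_KEY` and the placeholder value are hypothetical, not from the talk:

```python
import os

# Anti-pattern: hard-coding a credential "just to get it working".
# Even if it is deleted before review, it survives in git history.
# api_key = "sk_live_hypothetical_placeholder"   # never commit this

# Safer pattern: read the secret from the environment at runtime, so
# the value itself never appears in version-controlled source code.
api_key = os.environ.get("THIRD_PARTY_API_KEY", "")

if not api_key:
    print("THIRD_PARTY_API_KEY is not set; refusing to start")
```

Paired with a .gitignore'd .env file, the value stays on the machine that needs it instead of riding along in every clone.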
And the developer goes: oh, this is so hard, I don't want to deal with the security team issuing me secrets, I'm just going to save them in my secrets.txt — and then git add -A, and it ends up in your repository. And even if it doesn't, those secrets are sitting on your local machine. Secrets can also be in template files: if you create a Django template — they've recently changed this a little so it says "example" in the secret — it would typically have generated a secret for you, and unless you remove it and handle it properly, if you version control that, it's in your source code. And the other thing we see is a lot of secrets purposely being put into git repositories, hidden behind the idea that "this is private, I need authentication, so why can't I put it in there?" I'll show you the answer to why this is such a problem, particularly in private repositories. So we have our secrets in a vault or a secrets manager like Doppler, who was just on the panel here. What's the typical path? A developer needs a secret, requests it through the vault, gets the secret, and it ends up in our source code repositories through any of the various means we've covered. But you're saying: this source code repository is private, so why is that a problem? The reason "private" doesn't help is that source code, particularly through git, will sprawl everywhere. It'll end up on your networks, in your messaging systems, in your wikis. It's on all your developers' machines working on that project, because they've cloned it; it's in your drives and your backups, and it's in your documentation. Now, me as an adversary, I'm looking to target any one of those systems, because I know I'll be able to find secrets there to escalate my privileges. That's really why this can be such a big problem, and why it sprawls,
and it doesn't matter that the source code repository in this case is private. Now, we've already shown you some results from the State of Secrets Sprawl in the keynote, where we talked about how we scanned all the public commits on GitHub — over 1 billion commits — and found over 10 million secrets. Those are public secrets, so I don't want to talk much more about the public side of things; I want to get into private repositories and then into remediating them. But before I do, a couple of quick things you may find interesting: what are the secrets we typically find in git repositories? They're probably different from what you'd think, and they're pretty sensitive. For example, 25 percent give access to data storage, and 20 percent — remember, we found 10 million, so 2 million — were cloud provider keys. Now, we validate cloud provider keys, so that's 2 million valid cloud provider keys out there in public: a pretty scary number. But as I said, private repositories are much worse. Private repositories are really where the treasure lives, and this is why we've recently seen a shift in adversaries targeting the code repositories of companies. They can do this in a number of different ways; one of them is through supply chain attacks — we saw this with CircleCI and also Codecov — targeting these supply chains to get access into the private repositories of companies. Why? Because they're treasure troves: they contain lots and lots of secrets. So I want to give you some examples of just how many secrets are in there and why that happens. The first thing I want to talk about is how private repositories are really not that private. Last year we saw a huge number of companies that had
their private source code leaked — or, as I like to call it, involuntarily open-sourced. The Lapsus$ group at the start of last year caused absolute havoc, getting into Microsoft, Samsung, and Nvidia and leaking their source code publicly, and it created a huge number of problems. So how did these bad actors actually get access to that source code? Well, it was less sophisticated than you might think. Lapsus$ were mostly just paying for access: they were either buying credentials off the dark web — this is how they got into Uber later in the year — or paying employees of certain companies to simply give them access to the network and to their private repositories. These weren't really sophisticated attacks, and they didn't need to be: source code has sprawled everywhere, so you need only minimal access to get a foothold. We've seen huge numbers of these source code leaks, but I want to hone in on one: Twitch. Twitch, in October 2021, had all of their source code leaked, and this is going to be important when I talk about remediation. So let's take a look at what we saw with Twitch. What actually happened was a misconfiguration on the git server that allowed remote connection: it allowed an adversary to remotely access all their git repositories, basically clone them, and then share them on a torrent. We got access to that source code and scanned it, and what we found was over 6,000 secrets inside Twitch's source code. Now bear in mind there were six thousand repositories — this was a huge code base, three million documents. You might be thinking: wow, that's so bad, six thousand secrets. You'd be wrong — this is actually really good compared with what we typically see in an organization. Twitch has done really well. But we still found 194 AWS credentials in their source code; we
still found 69 Twilio keys, and we still found 14 GitHub OAuth keys. What do these give me as an attacker? The ability to persist my attack and elevate my privileges into different areas. So while Twitch has obviously done fairly well at managing secrets, they've still got 6,000 secrets exposed in their repositories — and this just shows the state we're actually in when it comes to sensitive information in source code. So why do we have so much sensitive information, if surely we can find and remediate it? I want to take you on a bit of a journey now — one that will probably sound familiar to anyone working in appsec — about why this is so hard. Take a typical company with around 400 developers. That's a smaller company, but a typical one: we have a lot of customers around this size, and a lot that are much bigger. Now, the industry standard is about one appsec engineer per 100 developers — that's just the reality of the situation — so 400 developers equals around four appsec engineers. Let's say this organization has said: hey, I've listened to Mackenzie's talk at CodeSec Days, I want to know how many secrets we actually have in our repositories. What you'll typically find in a company this size is about 13,000 secrets inside private code repositories. That's a pretty big number, much higher than what we just saw at Twitch, and it's pretty typical. So you've discovered 13,000 secrets. About a thousand of these will be unique, because once a secret enters your repository it clones itself around. But how do you know a secret is repeated? How do you know which occurrences map to the same thousand secrets? You've got 13,000 secrets, which basically means 13,000 incidents you have to investigate. It doesn't matter that they're repeated, because at this point you don't have that information.
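One way to see why thousands of occurrences can collapse into far fewer incidents is to group occurrences by a fingerprint of the secret value. This is a hedged sketch with made-up data — not GitGuardian's actual implementation:

```python
from hashlib import sha256

# Made-up occurrences: (repository, file, secret value).
occurrences = [
    ("api-service", "config.py", "AKIAEXAMPLE111"),
    ("api-service", "old/config.py", "AKIAEXAMPLE111"),  # same secret, copied
    ("web-frontend", ".env", "AKIAEXAMPLE111"),          # same secret again
    ("web-frontend", ".env", "ghp_exampletoken22"),
]

# Group occurrences by a hash of the secret value: each group is ONE
# incident to investigate, however many times the value was copied.
incidents = {}
for repo, path, secret in occurrences:
    fingerprint = sha256(secret.encode()).hexdigest()
    incidents.setdefault(fingerprint, []).append((repo, path))

print(len(occurrences), "occurrences ->", len(incidents), "unique incidents")
```

Hashing rather than storing the plaintext value also means the dedup index itself doesn't become one more place the secret lives.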
So what does that work out to? About 3,250 secrets per appsec engineer. An appsec engineer has to investigate, communicate with the developers, figure out what the secret does and how critical it is; then they need to rotate that secret and redeploy a new one; they need to do all of that without creating any downtime; and then perhaps they need to clean up the history and make sure the secret is really out. They have to do that 3,250 times. That is totally unmanageable, and it's part of the reason a lot of people just ignore this problem: once you know you have 13,000 secrets in your source code, you kind of have to do something about it — but if you don't know, no harm, no foul, right? So the head-in-the-sand approach can be typical here. This is why the problem of remediation is so hard, and it keeps getting harder. First of all, we have to catch them all, without false positives, and that's really difficult because detection is probabilistic: secrets are typically higher-entropy strings, but not all high-entropy strings are secrets, so catching them all is a complex challenge. They're everywhere — in your source code, in your build pipelines, in your Slack conversations, in your Jira tickets. And as we discussed at the start, the debt keeps growing: you've got 13,000 secrets now, but next week you might have more. That's exhausting. And here we go again, right? Rinse and repeat: we're rotating, we're replacing, and then a secret gets back into your source code and you have to deal with it all over again. So it's a genuinely hard challenge. But what's the hardest part? A lot of people think the detection part is the biggest challenge, and it is a challenge — getting high-quality secrets detection with a low false-positive rate is a big challenge.
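The "high entropy, but not every high-entropy string is a secret" point can be sketched with a Shannon-entropy score — the threshold and the sample strings below are purely illustrative:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, estimated from character frequencies."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

for candidate in [
    "password",                          # low entropy: not flagged
    "AKIA7F3K9QX2LMWPBT5E",              # random-looking: flagged
    "9e107d9d372bb6826bd81d3542a419d6",  # an MD5 hash: flagged, yet NOT a secret
]:
    flagged = shannon_entropy(candidate) > 3.0  # illustrative threshold
    print(f"{candidate}: {shannon_entropy(candidate):.2f} flagged={flagged}")
```

The third string is why entropy alone produces false positives: hashes, UUIDs, and commit SHAs all score high, so real detectors combine entropy with context and per-provider patterns.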
But it's not the biggest challenge. Remediation is absolutely the biggest challenge, because it's where we have the smallest teams. This is where I want to bring you up to speed on the GitGuardian product and what we've been doing, because I want to introduce the methodology and mindset we use to make this possible. In the company I just described, yes, you have 400 developers and four appsec engineers — but if you actively implement DevSecOps, then you have 400 developers who can potentially help you remediate this problem. You have 400 people; your team has expanded massively, and you could actually get on top of it. Now, you need tools to be able to trust those developers and ensure it happens, but this is the methodology GitGuardian takes: bringing everyone together to actually get on top of this. Because it doesn't matter how many tools you introduce — if you have four appsec engineers dealing with this on their own, it's going to be an absolute nightmare. I have these pretty slides here with our product that look really nice, but I think I'm going to go rogue — I hope that's okay — for a minute, because I want to get off the slideshow and just show you what the product actually looks like and how it can help solve this problem. This isn't just about the product; it's also about the mindset shift in how we deal with it. You'll notice on my screen I'm in my perimeter, in a fake organization — so if you see any secrets here, don't worry, they're fake. We have all these repositories connected, and let's say we've just rolled out and run a historical scan, and now all of a sudden I have 2,774 secrets here. This is the point where a lot of people are confronted: they get here and go, oh my gosh — I almost let a swear word
slip. Oh my gosh — how do we actually deal with that? There are a couple of ways, but I want to get on to some of the magic. The first thing we do is validity checks: we can see, hey, are these secrets valid? That's a good place to start. Let's filter to valid secrets, or perhaps ones whose validity we can't verify, and then we can also ask how many of these secrets are actually exposed publicly. Now we see the types of secrets we have that are valid and exposed publicly, and we can add lots of other related filters as well. You'll see here that we have a lot of secrets with a high severity, and if we take just the valid ones, a few that are critical. That might lead you to ask: how does this critical severity actually get assigned? This is one of the really cool and important parts: we can assign it automatically based on your own rules. Going into the secrets detection area, we have severity rules, which tell us how much coverage we have — in this example organization, 81 percent of the secrets we found could be assigned a severity — and we can set the rules: valid database credentials, obviously critical; valid secrets related to highly sensitive detectors, obviously critical. This immediately helps your remediation process, because instead of sorting by all these different things I can just say: hey, show me all the critical secrets, let's start there. Now let's go into one of these secrets. We're able to see the secret itself — I want to stress again that this is fake, so anyone trying to take a picture of the screen to do some crypto mining later is going to be disappointed. We have lots of issues here that we can see. But, as I said, there's still a problem: we've
still only got four appsec engineers, right? We've helped you prioritize the incidents, but you still have to get through all of them. So how do we actually reduce this burden, this load? This is where it starts getting really cool: we can start getting the developer involved. Incident sharing is something very cool: we can share the incident, and because we know who the developer is that committed it — in nearly all cases — when a developer in the organization leaks a credential, we can immediately start getting information from them. We can send them a survey, which looks like this once it loads, asking a few questions about what the secret actually is, and they can answer right there. You can already see the remarks that have been made, and they're displayed in the dashboard. Now, what's really interesting is that the developers can actually remediate this themselves. Say they know what the secret does and they have access to the AWS account that issued it, so they revoke it and mark it as invalid. What's really cool is that we know the secret is valid, and we keep checking whether it's still valid. Once the secret is confirmed invalid, and we can see it's been removed from the timeline, the developer can mark the incident as resolved. The incident will still be there so you can investigate it, but developers can actually start working through the incidents themselves. So now you can involve developers in the security process; you can trust them to remediate secrets, because we can verify that the secret has been remediated to the standard you want — we can also check whether it's still in the git history, for example, if you want it removed. We have these processes of checks, so you can actually trust your developers to
go about this remediation process. Now, I could spend all day on the GitGuardian platform, but in this talk I wanted to focus on remediation and how the platform helps with that critical part, so I'll skip through the slides. This is the workflow I just showed you: getting the developers involved to help remediate the incidents. And a great thing about GitGuardian is that through single sign-on (SSO) you can also give developers access to the dashboard, where they can see all their incidents — but only their incidents — and actually be a part of it. It's really about taking a proactive approach. DevSecOps gets thrown around all the time as a buzzword, but what does DevSecOps look like in reality? It looks like developers playing an active role in the security that is their responsibility, in a way that can be verified by the right people. That's what it looks like, and that's how we start to get control of these programs. It gives you centralized control, but also dedicated roles and team leaders who can look at it too — role-based access control. Your centralized appsec team has visibility over everything, and then you have smaller perimeters that help you navigate between your teams and grant different permissions, so different people see the incidents that affect them and that they can control. This goes all the way down from the appsec team to the developers, so we can really accelerate that trust. Now I want to talk about one other thing — I've got about five minutes left, but that's just enough time to talk about prevention. We've talked about the remediation process and how developers can be part of it, but what we haven't done is actually stop the bleeding. Yes, better education and getting developers involved will help them be more
aware, but you're still going to have secrets coming in. So I want to talk about how we can empower developers to stop leaking secrets, and we've got some great tools that can really help you do this. I don't know if anyone here has tried to get developers to use security tools — sometimes it works fantastically, and sometimes they just don't want to leave their environments. If you are going to give them security tools, my recommendation is that the tools fit into their workflow: let the tools fit what they're used to, so that they'll actually use them. GitGuardian has done a great job of integrating secrets detection into CI/CD and into the CLIs — the command-line interfaces — that developers are used to working in. Right now, what we've just talked about looks like this: we don't have guardrails on. We have our remote environment, and you'll see the little GitGuardian owl on the screen here, connected to our git server — that's where we're running secrets detection. But we don't have anything happening on the developer's machine; no secrets detection is actively running there. If we start putting secrets detection in those other areas, we can drastically reduce the work of that appsec team, because we reduce the secrets entering in the first place. So again, I'm going to go a little rogue — I apologize to my team for doing this — but if you're used to developer environments, this may look familiar; let me make it a bit bigger. What I have here is just a regular environment, and I have a .env file — remember, the .env file was the number one file that leaked secrets, so I'm sticking with that trend — and in it we have an AWS access key. Now let's say I've run git add -A, so I've accidentally captured this environment variable file, and I'm going about my workflow and I try to commit it. If we run a commit here, what happens is that we're blocked. That's because GitGuardian is running on the CLI: it's detected the AWS key, it's detected that it's valid — again, it's a dummy credential, a honeytoken, so you're going to be disappointed if you want to do some crypto mining later — but it also gives the developer instructions on how to remediate it. What's key about this? We're still in the developer's workflow; we haven't left it. But what we have done is actively prevent this secret from entering the remote repository. So this is all about enabling developers in DevSecOps, so that we can actually start to get on top of this seemingly unattainable problem. It seems so big at the start, but it can be solved with the right collaboration and by really embracing DevSecOps.
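What blocked that commit was the GitGuardian CLI; as a hedged illustration of the underlying idea — not the actual product — a toy git pre-commit hook could scan the staged diff for one AWS-style key pattern and reject the commit:

```python
"""Toy pre-commit hook: fail the commit if the staged diff contains an
AWS-access-key-shaped string. A real tool (like the GitGuardian CLI)
uses many detectors plus validity checks; this single regex is only a
sketch. Install by saving the logic into .git/hooks/pre-commit."""
import re
import subprocess

# AWS access key IDs look like "AKIA" + 16 uppercase alphanumerics.
AWS_KEY_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

def staged_diff() -> str:
    # --cached limits the diff to what is about to be committed.
    result = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def check_commit(diff_text: str) -> int:
    """Return the hook's exit code: non-zero blocks the commit."""
    matches = AWS_KEY_PATTERN.findall(diff_text)
    if matches:
        print(f"Blocked: {len(matches)} AWS-key-like string(s) staged.")
        print("Move the value to an environment variable and re-commit.")
        return 1
    return 0

# In the real hook file you would finish with:
#     import sys; sys.exit(check_commit(staged_diff()))
```

Because git runs the hook before the commit is created, the secret never reaches the remote repository — the same "guardrail in the developer's workflow" idea the talk demonstrates.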