1. Planning
The planning phase is critical: it answers essential questions, and it is when most architectural decisions are made. Some of these decisions concern access control, network infrastructure, and data security. For each of them, building in automation, and therefore security, from the start allows fast and consistent replication.
- Access control: you can leverage Infrastructure as Code (IaC) tools such as Terraform to define groups, roles, and permissions as code. With additional providers, you can even use your IaC tool to manage users on other platforms, such as GitHub. Once all groups, roles, and permissions are defined as code, you are only one commit away from any access control update. If you need to onboard a new team member into both an AWS group and a GitHub group, instead of doing it manually in two places, you can make a minimal change “as code” and apply it. Growing organizational complexity and the need for better security call for more role-based access control, and automation tools make it both practical and less error-prone.
- Network infrastructure: you can also use IaC here, but you will need a security layer on top of it because, like everything that is coded, it can contain errors. For Terraform, you can use tfsec and Terrascan: they scan your code statically and, based on predefined rules, generate alerts and advise you on improving your security. If you prefer CloudFormation, use cfn-lint, which even allows you to define your own rules.
- Data: once your infrastructure is ready, the next big thing to think about is your data. You need encryption at rest to make sure data is stored securely, and transport layer security (TLS) on connections to your databases to protect traffic in transit. Besides the static analysis tools and ready-to-use services mentioned above, you can automate your own compliance pipeline, which is especially useful when you have multiple agile teams: if a policy is violated, fix it right away with code, then notify the owner automatically - this is the DevSecOps way.
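To make the access-control-as-code idea concrete, here is a minimal Terraform sketch that puts an AWS IAM group, a policy attachment, and a new hire's group membership under version control. All resource, user, and group names are hypothetical, not taken from a real setup:

```hcl
# Illustrative sketch: onboarding a user is a one-commit change.
# Names ("developers", "jane.doe") are hypothetical.
resource "aws_iam_group" "developers" {
  name = "developers"
}

# Attach an AWS-managed policy to the group.
resource "aws_iam_group_policy_attachment" "developers_readonly" {
  group      = aws_iam_group.developers.name
  policy_arn = "arn:aws:iam::aws:policy/ReadOnlyAccess"
}

resource "aws_iam_user" "new_hire" {
  name = "jane.doe"
}

# Adding or removing the user from groups is just an edit to this list.
resource "aws_iam_user_group_membership" "new_hire_groups" {
  user   = aws_iam_user.new_hire.name
  groups = [aws_iam_group.developers.name]
}
```

With a GitHub provider configured alongside this, the same commit could also manage the user's GitHub team membership, so both platforms stay in sync from one pull request.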
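The static scanning described above for network infrastructure code can be wired directly into CI. The following is a hedged sketch of a GitHub Actions job running tfsec and Terrascan against a Terraform directory; the action versions and the `terraform/` path are assumptions:

```yaml
# Illustrative CI job: scan IaC on every push.
# Action versions and the terraform/ path are assumptions.
jobs:
  iac-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tfsec
        uses: aquasecurity/tfsec-action@v1.0.0
        with:
          working_directory: terraform/
      - name: Run Terrascan
        uses: tenable/terrascan-action@v1.5.0
        with:
          iac_type: terraform
          iac_dir: terraform/
```

Both steps fail the job on findings by default, so insecure infrastructure code is caught before it is ever applied.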
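For encryption at rest and TLS in transit, a Terraform sketch along these lines is one possible starting point; the database identifier, engine version, and parameter group family are illustrative assumptions:

```hcl
# Illustrative sketch: RDS PostgreSQL with encryption at rest and forced TLS.
# Identifier, versions, and sizes are assumptions, not recommendations.
resource "aws_db_parameter_group" "require_tls" {
  name   = "app-require-tls"
  family = "postgres16"

  # Reject any client connection that does not use SSL/TLS.
  parameter {
    name  = "rds.force_ssl"
    value = "1"
  }
}

resource "aws_db_instance" "app" {
  identifier                  = "app-db"
  engine                      = "postgres"
  engine_version              = "16.3"
  instance_class              = "db.t3.micro"
  allocated_storage           = 20
  username                    = "app"
  manage_master_user_password = true  # let AWS manage the password secret
  storage_encrypted           = true  # encryption at rest via KMS
  parameter_group_name        = aws_db_parameter_group.require_tls.name
  skip_final_snapshot         = true
}
```

Because both requirements live in code, tools like tfsec can verify them automatically, and a violation is fixed with a commit rather than a console click.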
---
2. Development
During the development phase, you need to address three major security topics: code repository security, Continuous Integration (CI) security, and container image security.
- Code repository security means addressing repository access control under the principle of least privilege, but also checking that no repository is accidentally made public and that none contains sensitive elements such as secrets. In a multi-team, agile, microservice setup, you need automation to eliminate these time-consuming, error-prone tasks and to keep your SDLC from slowing down. We already mentioned automating Git user management; there are also a number of tools for static code analysis and secrets detection. You can include this detection in your CI pipeline and integrate it with your alerting workflow to eliminate the risk at its source.
- CI security is easy to overlook, but it is very important: a misconfigured CI can leak data. Access to your CI needs to be controlled with proper permissions and access rights, and doing this manually will definitely impact your SDLC velocity, hence again the need for automation. You can integrate your CI with your Git provider or company SSO for authentication, and rely on the group membership information those SSO services provide for authorization. Secrets are also a security topic for CIs. Modern CI systems let you store secrets at the project level; once configured, you can use them directly in the build without putting the sensitive data in clear text, and even if you try to print a variable containing sensitive data, it will not show up. For instance, to use secrets directly from a Jenkins pipeline, you can store them in a Kubernetes cluster and use the Kubernetes Credentials Provider or the HashiCorp Vault Plugin.
- Container image security: while containers have simplified deployment, scaling, and failover, they bring their own challenges. If an image has known vulnerabilities, your containers can be exploited, and the integrity of the whole machine, or even the entire system, can be compromised. Automation is again the go-to solution to ensure the images you build don't contain vulnerabilities: CLI-based tools like Trivy can easily be integrated into your CI pipelines. Automation also significantly reduces your operational overhead. No more managing machines (physical or virtual), no more configuration management, no more manual patching, less toil - all of which means your SDLC is not only more secure but also faster.
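Secrets detection in the repository can be automated as described above. A minimal sketch using gitleaks in a GitHub Actions workflow (the workflow name and action version are assumptions) might look like:

```yaml
# Illustrative workflow: scan every push and pull request for
# committed secrets. The action version is an assumption.
name: secret-scan
on: [push, pull_request]
jobs:
  gitleaks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history, so old commits are scanned too
      - uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

A finding fails the check on the pull request, so a leaked credential is flagged before it ever reaches the main branch.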
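To illustrate the Jenkins approach to CI secrets, here is a hedged sketch of a declarative pipeline that reads a secret from the Jenkins credentials store (populated, for example, by the Kubernetes Credentials Provider); the credential id and deploy command are hypothetical:

```groovy
// Illustrative Jenkinsfile: the credential id "db-password" and the
// deploy script are hypothetical. The bound value is masked if printed.
pipeline {
  agent any
  stages {
    stage('Deploy') {
      steps {
        withCredentials([string(credentialsId: 'db-password',
                                variable: 'DB_PASSWORD')]) {
          // Single-quoted so the shell, not Groovy, expands the variable.
          sh './deploy.sh --db-password "$DB_PASSWORD"'
        }
      }
    }
  }
}
```

The sensitive value never appears in the Jenkinsfile or the console log; rotating it is a change in the secrets backend, not in code.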
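Integrating Trivy into a pipeline can be as simple as a scan step that fails the build on serious findings. A sketch using the Trivy GitHub Action, with an illustrative image name and an assumed pinned version:

```yaml
# Illustrative job: build the image, then block the pipeline if it
# contains HIGH or CRITICAL vulnerabilities. Names are assumptions.
jobs:
  image-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t myapp:${{ github.sha }} .
      - name: Scan image with Trivy
        uses: aquasecurity/trivy-action@0.24.0
        with:
          image-ref: myapp:${{ github.sha }}
          severity: HIGH,CRITICAL
          exit-code: '1'   # non-zero exit fails the pipeline
```

Because the scan gates the pipeline, a vulnerable base image simply cannot be promoted to deployment.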
---
3. Integration and deployment
Kubernetes (container orchestration) is the current go-to solution for streamlining integration and deployment in our agile world. But Kubernetes is a complex system and comes with its own security challenges, the most important being secrets management.
The worst thing you could do is define secrets in YAML files and store them in Git repos. You could use Kubernetes Secrets as the single source of truth, but then what about secrets created by people who don't need access to Kubernetes? This is where a secrets manager is a good choice: it acts as a single source of truth for your sensitive data, giving you an automated process that eliminates manual toil while guaranteeing both security and speed. Learn more about K8s security in our dedicated article.
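As one illustration of the secrets-manager pattern, the External Secrets Operator can sync values from an external manager (such as HashiCorp Vault) into Kubernetes Secrets, so nothing sensitive lives in Git. A hedged sketch, with all names and paths illustrative:

```yaml
# Illustrative ExternalSecret: the operator reads from the configured
# secrets manager and materializes a Kubernetes Secret. The store name,
# secret path, and keys are assumptions.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  refreshInterval: 1h            # re-sync periodically after rotation
  secretStoreRef:
    name: vault-backend          # a SecretStore pointing at e.g. Vault
    kind: SecretStore
  target:
    name: db-credentials         # the Kubernetes Secret to create/update
  data:
    - secretKey: password
      remoteRef:
        key: prod/db
        property: password
```

People who manage secrets only touch the secrets manager, while workloads consume the resulting Kubernetes Secret as usual; the Git repo holds only this non-sensitive pointer.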
---
4. Maintenance
Incident response playbooks are a crucial component of DevSecOps. Playbooks are a set of generalized, summarized processes that provide a consistent way of handling issues, so they are responded to and resolved more quickly. Learning from every incident, whether it's a security issue or a new vulnerability, is also part of the playbook.
Their content can include everything from runbooks and checklists to templates, training exercises, security attack scenarios, and simulation drills. The goal is simple: a set of policies, processes, and practices for quickly responding to and resolving unplanned outages, helping teams fix issues faster and limit the blast radius.