Giovanni Salvador
Apr 6, 2023 · 5 min read
Updated: Jun 28, 2023
Azure Kubernetes, AKS, PCI-DSS, Checklist
Disable administrator access on the container registry.
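A minimal sketch of an automated check for the item above. It assumes registry definitions are exported as ARM-style dicts, where `adminUserEnabled` is the registry property that controls the admin account. The registry names and the export step are hypothetical.

```python
from typing import Any


def registries_with_admin_enabled(registries: list[dict[str, Any]]) -> list[str]:
    """Return the names of container registries whose admin user account is enabled.

    Each registry is a dict shaped like an ARM resource, for example
    {"name": "myacr", "properties": {"adminUserEnabled": False}}.
    """
    flagged = []
    for registry in registries:
        props = registry.get("properties", {})
        # The ACR admin account is disabled by default, so a missing flag counts as disabled.
        if props.get("adminUserEnabled", False):
            flagged.append(registry["name"])
    return flagged


if __name__ == "__main__":
    # Hypothetical inventory; in practice, export it from your subscription.
    inventory = [
        {"name": "pciacr", "properties": {"adminUserEnabled": False}},
        {"name": "legacyacr", "properties": {"adminUserEnabled": True}},
    ]
    print(registries_with_admin_enabled(inventory))  # ['legacyacr']
```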
Ensure that jump boxes and build agents follow user management procedures, removing unneeded system users.
Do not generate SSH keys for, or provide SSH access to, AKS nodes for administrator users. If emergency access is necessary, use the Azure recovery process to get just-in-time access.
Azure responsibilities: Azure Active Directory has password policies that are enforced on new passwords supplied by users. Changing a password requires validation of the old password, and passwords reset by an administrator must be changed at the next sign-in.
Don't connect wireless networks to the cloud environment.
Don't run kubectl commands directly from engineers' laptops.
Don't connect to databases that hold PCI data from engineers' laptops.
Implement the recommendations in the Azure security benchmark. It provides a single, consolidated view of Azure security recommendations, covering industry frameworks such as PCI-DSS 3.2.1.
Use Microsoft Defender for Cloud features and Azure Policy to help track compliance against these standards.
Build additional automated checks in Azure Policy and Azure Tenant Security Solution (AzTS).
Document the desired configuration state of all components in the CDE, especially AKS nodes, jump boxes, build agents, and other components that interact with the cluster.
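A documented desired state also makes configuration drift detectable. Below is a minimal sketch that assumes both the documented state and the observed state of a component are available as nested dicts. The setting names are hypothetical.

```python
from typing import Any


def find_drift(desired: dict[str, Any], actual: dict[str, Any], prefix: str = "") -> list[str]:
    """Recursively compare a documented desired state with the observed state.

    Returns findings for settings that are missing, unexpected, or changed.
    """
    findings = []
    for key in sorted(desired.keys() | actual.keys()):
        path = f"{prefix}{key}"
        if key not in actual:
            findings.append(f"missing: {path}")
        elif key not in desired:
            findings.append(f"unexpected: {path}")
        elif isinstance(desired[key], dict) and isinstance(actual[key], dict):
            # Descend into nested sections, e.g. "ssh.passwordAuthentication".
            findings.extend(find_drift(desired[key], actual[key], prefix=f"{path}."))
        elif desired[key] != actual[key]:
            findings.append(f"changed: {path} (desired={desired[key]!r}, actual={actual[key]!r})")
    return findings


if __name__ == "__main__":
    desired_jump_box = {"ssh": {"passwordAuthentication": False}, "packages": ["azure-cli", "kubectl"]}
    observed_jump_box = {
        "ssh": {"passwordAuthentication": True},
        "packages": ["azure-cli", "kubectl"],
        "publicIp": True,
    }
    for finding in find_drift(desired_jump_box, observed_jump_box):
        print(finding)
    # unexpected: publicIp
    # changed: ssh.passwordAuthentication (desired=False, actual=True)
```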
Azure responsibilities: Azure provides security configuration standards that are consistent with industry-accepted hardening standards. The standards are reviewed at least annually.
Deploy in-scope and out-of-scope components in separate clusters.
Or
Use segmentation strategies, such as separate node pools, to maintain the separation.
Use Kubernetes taints (with matching tolerations) so that in-scope and out-of-scope pods are deployed to separate node pools.
Ensure that in-scope and out-of-scope pods never share a node VM.
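Taints repel pods that lack a matching toleration. A toleration, however, only permits scheduling; it does not attract a pod to a node. To keep in-scope pods on the in-scope pool, you therefore also need a node selector (or node affinity). The following simplified Python model of that filtering shows how the pieces combine. The `pci-scope` label and taint key and the pool names are hypothetical, and the model ignores wildcard tolerations, `PreferNoSchedule`, and every other scheduler rule.

```python
from dataclasses import dataclass, field


@dataclass
class Taint:
    key: str
    value: str
    effect: str  # e.g. "NoSchedule"


@dataclass
class Toleration:
    key: str
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # an empty effect matches every effect


@dataclass
class NodePool:
    name: str
    labels: dict[str, str]
    taints: list[Taint] = field(default_factory=list)


@dataclass
class Pod:
    name: str
    node_selector: dict[str, str] = field(default_factory=dict)
    tolerations: list[Toleration] = field(default_factory=list)


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """True if a single toleration matches a single taint (simplified)."""
    if tol.key != taint.key:
        return False
    if tol.effect and tol.effect != taint.effect:
        return False
    return tol.operator == "Exists" or tol.value == taint.value


def schedulable(pod: Pod, pool: NodePool) -> bool:
    """The nodeSelector must match, and every NoSchedule/NoExecute taint must be tolerated."""
    if any(pool.labels.get(k) != v for k, v in pod.node_selector.items()):
        return False
    return all(
        any(tolerates(t, taint) for t in pod.tolerations)
        for taint in pool.taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


if __name__ == "__main__":
    in_scope_pool = NodePool("pcipool", {"pci-scope": "in"}, [Taint("pci-scope", "in", "NoSchedule")])
    general_pool = NodePool("generalpool", {"pci-scope": "out"})
    payments = Pod("payments", {"pci-scope": "in"}, [Toleration("pci-scope", value="in", effect="NoSchedule")])
    web = Pod("marketing-web", {"pci-scope": "out"})
    for pod in (payments, web):
        print(pod.name, [p.name for p in (in_scope_pool, general_pool) if schedulable(pod, p)])
    # payments ['pcipool']
    # marketing-web ['generalpool']
```

If you drop the node selector from `payments`, it becomes schedulable on both pools. That is exactly the node sharing this item rules out.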
Each container instance is responsible for only one function in the system.
Workloads use pod-managed identity. They must not inherit any cluster-level or node-level identity.
Use external storage instead of on-node (in-cluster) storage where possible.
And
Keep cluster pods reserved exclusively for work that must be performed as part of cardholder data processing.
Move the out-of-scope operations outside the cluster. This guidance applies to build agents, unrelated workloads, or activities such as having a jump box inside the cluster.
Review features and document their implications before enabling them.
Check default settings to identify features that are unnecessary or overly permissive. For example, check whether the ciphers in the default SSL policy for Azure Application Gateway are overly permissive. The recommendation is to create a custom policy that selects only the ciphers you need.
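A minimal sketch of that review. It assumes the gateway's `sslPolicy` is available as an ARM-style dict with `policyType`, `minProtocolVersion`, and `cipherSuites`. The approved cipher list is a hypothetical example, not a recommendation.

```python
# Hypothetical allowlist; derive yours from your organisation's cryptography standard.
APPROVED_CIPHERS = {
    "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256",
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
}
# Protocol identifiers as used by the Application Gateway SSL policy, oldest first.
PROTOCOLS = ["TLSv1_0", "TLSv1_1", "TLSv1_2", "TLSv1_3"]
MIN_PROTOCOL = "TLSv1_2"


def audit_ssl_policy(ssl_policy: dict) -> list[str]:
    """Return findings for an Application Gateway sslPolicy object (ARM-style dict)."""
    findings = []
    if ssl_policy.get("policyType") not in ("Custom", "CustomV2"):
        findings.append("not a custom policy: review the ciphers the predefined policy enables")
    # Treat a missing value as the oldest protocol, so it is flagged rather than assumed safe.
    min_version = ssl_policy.get("minProtocolVersion", PROTOCOLS[0])
    if min_version not in PROTOCOLS:
        findings.append(f"unknown minimum protocol: {min_version}")
    elif PROTOCOLS.index(min_version) < PROTOCOLS.index(MIN_PROTOCOL):
        findings.append(f"minimum protocol {min_version} is below {MIN_PROTOCOL}")
    for cipher in ssl_policy.get("cipherSuites", []):
        if cipher not in APPROVED_CIPHERS:
            findings.append(f"cipher not approved: {cipher}")
    return findings


if __name__ == "__main__":
    policy = {
        "policyType": "Custom",
        "minProtocolVersion": "TLSv1_2",
        "cipherSuites": [
            "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
            "TLS_RSA_WITH_3DES_EDE_CBC_SHA",
        ],
    }
    print(audit_ssl_policy(policy))  # ['cipher not approved: TLS_RSA_WITH_3DES_EDE_CBC_SHA']
```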
For components where you have complete control, such as jump boxes and build agents, remove all unnecessary system services from the images.
For components where you only have visibility, such as AKS nodes, document what Azure installs on the nodes. Consider using DaemonSets to provide any additional auditing necessary for these cloud-controlled components.
Application Gateway has the integrated WAF enabled and negotiates the TLS handshake for requests sent to its public endpoint, allowing only secure ciphers. The reference implementation supports only TLS 1.2 and approved ciphers.
If a legacy device requires a weaker protocol, document the exception and monitor whether that protocol is used by anything other than the legacy device. Disable the protocol as soon as the legacy interaction is discontinued.
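A minimal sketch of that monitoring. It separates weak-protocol handshakes that match a documented exception from those that don't. The log record shape (`client_ip`, `tls_version`), the version strings, and the IP addresses are all hypothetical; adapt them to whatever your gateway diagnostics actually emit.

```python
from collections import Counter

SECURE_VERSIONS = {"TLSv1.2", "TLSv1.3"}


def review_weak_tls(records, exceptions):
    """Split weak-protocol handshakes into documented-exception traffic and violations.

    records: iterable of {"client_ip": str, "tls_version": str} (hypothetical log shape).
    exceptions: maps each documented legacy client IP to the weak version it may use.
    Returns (per-client count of expected weak handshakes, list of violating records).
    """
    expected = Counter()
    violations = []
    for record in records:
        version = record["tls_version"]
        if version in SECURE_VERSIONS:
            continue
        if exceptions.get(record["client_ip"]) == version:
            expected[record["client_ip"]] += 1
        else:
            violations.append(record)
    return expected, violations


if __name__ == "__main__":
    exceptions = {"10.20.0.15": "TLSv1.1"}  # hypothetical legacy terminal
    records = [
        {"client_ip": "10.20.0.15", "tls_version": "TLSv1.1"},
        {"client_ip": "10.20.0.40", "tls_version": "TLSv1.1"},
        {"client_ip": "10.20.0.41", "tls_version": "TLSv1.2"},
    ]
    expected, violations = review_weak_tls(records, exceptions)
    print(dict(expected))  # {'10.20.0.15': 1}
    print(violations)      # [{'client_ip': '10.20.0.40', 'tls_version': 'TLSv1.1'}]
```

If the expected count for an excepted client stays at zero across a review period, the legacy interaction has likely ended and the protocol can be disabled.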
Application Gateway must not respond to requests on port 80.
Do not perform redirects at the application level. This reference implementation has an NSG rule on the Application Gateway subnet that blocks port 80 traffic.
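NSG rules are evaluated in priority order, lowest number first, and processing stops at the first matching rule. A port 80 deny therefore holds only if no higher-priority rule allows that port. Below is a simplified sketch of that evaluation, assuming rules as ARM-style dicts. It ignores address prefixes and the default allow rules, and the rule names are hypothetical.

```python
def port_in_range(port: int, port_range: str) -> bool:
    """Match a port against an NSG-style range: "*", "80", or "65200-65535"."""
    if port_range == "*":
        return True
    if "-" in port_range:
        low, high = (int(p) for p in port_range.split("-"))
        return low <= port <= high
    return port == int(port_range)


def inbound_decision(rules: list[dict], port: int, protocol: str = "Tcp") -> str:
    """Return "Allow" or "Deny" from the first matching inbound rule, lowest priority number first."""
    for rule in sorted(rules, key=lambda r: r["priority"]):
        if rule["direction"] != "Inbound" or rule["protocol"] not in ("*", protocol):
            continue
        if port_in_range(port, rule["destinationPortRange"]):
            return rule["access"]
    # Fall back to Deny, approximating the default DenyAllInBound rule for Internet traffic.
    return "Deny"


if __name__ == "__main__":
    rules = [
        {"name": "AllowHttpsInbound", "priority": 200, "direction": "Inbound",
         "access": "Allow", "protocol": "Tcp", "destinationPortRange": "443"},
        {"name": "DenyHttpInbound", "priority": 100, "direction": "Inbound",
         "access": "Deny", "protocol": "Tcp", "destinationPortRange": "80"},
    ]
    print(inbound_decision(rules, 80))   # Deny
    print(inbound_decision(rules, 443))  # Allow
    # A broader allow with a lower priority number silently overrides the deny.
    override = {"name": "AllowWebPorts", "priority": 90, "direction": "Inbound",
                "access": "Allow", "protocol": "Tcp", "destinationPortRange": "80-443"}
    print(inbound_decision(rules + [override], 80))  # Allow
```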
Document exceptions for workload security compliance profiles or other controls (for example, limits and quotas), and monitor to ensure that only the expected exception functionality is performed.
All Azure services used in the architecture must follow the recommendations provided by the Azure security benchmark.
Compare your configuration against the baseline implementation for that service. For more information about the security baselines, see Security baselines for Azure.
Use the Open Policy Agent admission controller, which works in conjunction with Azure Policy, to detect and prevent misconfigurations on the cluster.
Use Azure Policy to restrict the type and configuration of Azure resources. For example, create a policy that denies the creation of AKS clusters that aren't private.
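Azure Policy rules use an `if`/`then` structure. The sketch below writes such a deny rule as a Python dict and evaluates it locally with a tiny evaluator that supports only `allOf`, `anyOf`, `equals`, and `notEquals`. This is an illustration of the rule logic, not how Azure Policy itself evaluates resources. Resources are modelled as hypothetical flat maps from field or alias to value. The alias `Microsoft.ContainerService/managedClusters/apiServerAccessProfile.enablePrivateCluster` is our understanding of the AKS private-cluster property; confirm it against the Azure Policy alias list before relying on it.

```python
# Assumed alias for the AKS private-cluster setting; verify it in the Azure Policy alias list.
PRIVATE_CLUSTER_ALIAS = (
    "Microsoft.ContainerService/managedClusters/apiServerAccessProfile.enablePrivateCluster"
)

DENY_PUBLIC_AKS = {
    "if": {
        "allOf": [
            {"field": "type", "equals": "Microsoft.ContainerService/managedClusters"},
            {"field": PRIVATE_CLUSTER_ALIAS, "notEquals": True},
        ]
    },
    "then": {"effect": "deny"},
}


def condition_matches(condition: dict, resource: dict) -> bool:
    """Simplified local evaluator: supports allOf, anyOf, equals, and notEquals only."""
    if "allOf" in condition:
        return all(condition_matches(c, resource) for c in condition["allOf"])
    if "anyOf" in condition:
        return any(condition_matches(c, resource) for c in condition["anyOf"])
    # A missing property evaluates as None, so notEquals True matches it (treated as public).
    value = resource.get(condition["field"])
    if "equals" in condition:
        return value == condition["equals"]
    if "notEquals" in condition:
        return value != condition["notEquals"]
    raise ValueError(f"unsupported condition: {condition}")


def effect_for(policy: dict, resource: dict):
    """Return the policy effect if the rule matches the resource, else None."""
    return policy["then"]["effect"] if condition_matches(policy["if"], resource) else None


if __name__ == "__main__":
    aks = "Microsoft.ContainerService/managedClusters"
    print(effect_for(DENY_PUBLIC_AKS, {"type": aks, PRIVATE_CLUSTER_ALIAS: False}))  # deny
    print(effect_for(DENY_PUBLIC_AKS, {"type": aks, PRIVATE_CLUSTER_ALIAS: True}))   # None
    print(effect_for(DENY_PUBLIC_AKS, {"type": "Microsoft.Storage/storageAccounts"}))  # None
```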
Create a process for periodically reviewing all documented exceptions, and document each review itself: who, when, where, and why.
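A minimal sketch of tracking those reviews, assuming each review is recorded with who, when, where, and why. The 90-day interval and the exception IDs are hypothetical; use the review frequency your own process defines.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ExceptionReview:
    exception_id: str
    reviewer: str       # who
    reviewed_on: date   # when
    location: str       # where: the system or environment reviewed
    rationale: str      # why the exception is (still) justified


def overdue_exceptions(exception_ids, reviews, today, interval=timedelta(days=90)):
    """Return the sorted IDs of exceptions with no review within the interval."""
    latest: dict[str, date] = {}
    for review in reviews:
        current = latest.get(review.exception_id)
        if current is None or review.reviewed_on > current:
            latest[review.exception_id] = review.reviewed_on
    return sorted(
        eid for eid in exception_ids
        if eid not in latest or today - latest[eid] > interval
    )


if __name__ == "__main__":
    reviews = [
        ExceptionReview("EXC-001", "alice", date(2023, 6, 1), "aks-pci", "legacy terminal still in use"),
        ExceptionReview("EXC-002", "bob", date(2023, 1, 10), "appgw", "TLS 1.1 exception"),
    ]
    print(overdue_exceptions({"EXC-001", "EXC-002", "EXC-003"}, reviews, date(2023, 6, 28)))
    # ['EXC-002', 'EXC-003']
```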
Azure responsibilities: Azure ensures that only authorised personnel are able to configure Azure platform security controls, by using multi-factor access controls and a documented business need.
Don't install software on jump boxes or build agents unless it participates in transaction processing or provides observability for compliance requirements, such as security agents. This recommendation also applies to cluster entities, such as DaemonSets and pods.
Make sure all installations are detected and logged.
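A minimal sketch of detecting and logging installations. It diffs package inventories between snapshots and checks them against an approved list. How the inventory is collected (package manager, agent, or image scan) is left open, and the host and package names are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("install-audit")


def audit_installations(host: str, previous: set[str], current: set[str], approved: set[str]) -> set[str]:
    """Log packages added or removed since the last snapshot and flag unapproved ones.

    Returns the set of currently installed packages that are not on the approved list.
    """
    for pkg in sorted(current - previous):
        log.info("%s: installed %s", host, pkg)
    for pkg in sorted(previous - current):
        log.info("%s: removed %s", host, pkg)
    unapproved = current - approved
    for pkg in sorted(unapproved):
        log.warning("%s: unapproved software present: %s", host, pkg)
    return unapproved


if __name__ == "__main__":
    audit_installations(
        "jumpbox-01",
        previous={"azure-cli", "kubectl"},
        current={"azure-cli", "kubectl", "media-player"},
        approved={"azure-cli", "kubectl"},
    )
```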
All administrative access to the cluster is done by using the console.
The cluster's control plane is not exposed.
Azure responsibilities: Azure ensures the use of strong cryptography is enforced when accessing the hypervisor infrastructure. It ensures that customers using the Microsoft Azure Management Portal are able to access their service/IaaS consoles with strong cryptography.
Ensure that all Azure resources used in the architecture are tagged properly to aid data classification, and indicate whether the service is in-scope or out-of-scope.
Tag resources meticulously to enable querying for resources, keep an inventory, help track costs, and set alerts.
Periodically take a snapshot of the documentation.
Avoid tagging in-scope or out-of-scope resources at a granular level, as out-of-scope resources might become in-scope as the solution evolves. Instead, tag at a higher level, at the subscription and/or cluster level.
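Tags applied to a subscription or resource group are not automatically inherited by the resources inside it. An inventory that tags only at a higher level therefore has to resolve scope by walking up the hierarchy. Below is a minimal sketch, assuming resources exported as dicts and a hypothetical `pci-scope` tag; the nearest container that carries the tag wins.

```python
SCOPE_TAG = "pci-scope"  # hypothetical tag name


def effective_scope(resource: dict, resource_groups: dict, subscriptions: dict) -> str:
    """Resolve scope from the resource, then its resource group, then its subscription."""
    sub_id, rg = resource["subscriptionId"], resource["resourceGroup"]
    # Resource group names are only unique within a subscription, hence the tuple key.
    candidates = (
        resource.get("tags", {}),
        resource_groups.get((sub_id, rg), {}),
        subscriptions.get(sub_id, {}),
    )
    for tags in candidates:
        if SCOPE_TAG in tags:
            return tags[SCOPE_TAG]
    return "untagged"


def inventory_by_scope(resources: list[dict], resource_groups: dict, subscriptions: dict) -> dict[str, list[str]]:
    """Group resource names by their effective scope."""
    inventory: dict[str, list[str]] = {}
    for resource in resources:
        scope = effective_scope(resource, resource_groups, subscriptions)
        inventory.setdefault(scope, []).append(resource["name"])
    return inventory


if __name__ == "__main__":
    subscriptions = {"sub-cde": {SCOPE_TAG: "in-scope"}, "sub-shared": {SCOPE_TAG: "out-of-scope"}}
    resource_groups = {}  # no resource-group-level tags in this example
    resources = [
        {"name": "aks-pci", "subscriptionId": "sub-cde", "resourceGroup": "rg-aks"},
        {"name": "kv-ops", "subscriptionId": "sub-shared", "resourceGroup": "rg-ops"},
        {"name": "vm-sandbox", "subscriptionId": "sub-dev", "resourceGroup": "rg-dev"},
    ]
    print(inventory_by_scope(resources, resource_groups, subscriptions))
    # {'in-scope': ['aks-pci'], 'out-of-scope': ['kv-ops'], 'untagged': ['vm-sandbox']}
```

Untagged resources are good candidates for alerts, since they can't be placed in or out of scope.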
Tag in-cluster objects by applying Kubernetes labels to organise objects, select a collection of objects, and report inventory.
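Label selectors are how such collections are selected, for example `kubectl get pods -l pci-scope=in-scope`. The following minimal Python sketch illustrates the semantics of equality-based selector matching (`=`, `==`, `!=`). Set-based selectors are not covered, and the label keys and object names are hypothetical. As in Kubernetes, `!=` also matches objects that lack the key.

```python
def parse_selector(selector: str) -> list[tuple[str, str, str]]:
    """Parse a comma-separated, equality-based selector into (key, op, value) tuples."""
    requirements = []
    for part in filter(None, (p.strip() for p in selector.split(","))):
        # Check "!=" and "==" before "=", because both contain "=".
        for token, op in (("!=", "!="), ("==", "="), ("=", "=")):
            if token in part:
                key, value = part.split(token, 1)
                requirements.append((key.strip(), op, value.strip()))
                break
        else:
            raise ValueError(f"only equality-based requirements are supported: {part!r}")
    return requirements


def matches(labels: dict[str, str], requirements: list[tuple[str, str, str]]) -> bool:
    """True if the labels satisfy every requirement."""
    for key, op, value in requirements:
        if op == "=" and labels.get(key) != value:
            return False
        if op == "!=" and labels.get(key) == value:
            return False
    return True


def select(objects: dict[str, dict[str, str]], selector: str) -> list[str]:
    """Return the names of objects whose labels match the selector."""
    requirements = parse_selector(selector)
    return [name for name, labels in objects.items() if matches(labels, requirements)]


if __name__ == "__main__":
    pods = {
        "payments-api": {"pci-scope": "in-scope", "app": "payments"},
        "fraud-worker": {"pci-scope": "in-scope", "app": "fraud"},
        "docs-site": {"pci-scope": "out-of-scope"},
    }
    print(select(pods, "pci-scope=in-scope"))            # ['payments-api', 'fraud-worker']
    print(select(pods, "pci-scope=in-scope,app!=fraud"))  # ['payments-api']
```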
Thorough documentation of the processes and policies is in place.
Personnel are trained in the security features and configuration settings of each Azure resource.
People operating regulated environments are educated, informed, and incentivised to support the security assurances.
Treat AKS nodes as a dedicated host for the workload. Azure provides security assurances for hosted environment components that are shared; even so, it's highly recommended that all compute for this workload follow a single-tenant model.
Don't share compute with other workloads you may operate.
Consider using Azure Dedicated Hosts to provide complete compute isolation at the Azure infrastructure level, if desired. Understand the significant cost and capacity planning impact of using Azure Dedicated Hosts before making this architectural choice.