Kubernetes & Cloud Security Hardening Guide

1. Kubernetes Hardening

1.1 RBAC — Least Privilege Access

Role-Based Access Control is your first line of defense inside a cluster. Most misconfigurations come from over-permissive bindings created during initial setup and never reviewed.

Most common finding: ClusterRoleBindings with cluster-admin attached to service accounts used by applications. This gives the workload full control over the entire cluster.

Audit all ClusterRoleBindings: Run kubectl get clusterrolebindings -o wide and review anything bound to cluster-admin. Every binding should have a documented justification.
Use namespace-scoped Roles instead of ClusterRoles wherever the workload doesn't need cluster-wide access.
Avoid wildcard permissions: verbs: ["*"] or resources: ["*"] in Role specs should be treated as a finding.
Never bind service accounts to cluster-admin unless you have a documented and reviewed reason. Operators (like ArgoCD) often need elevated access — scope it to what they actually use.
Disable automounting of service account tokens for pods that don't need Kubernetes API access: automountServiceAccountToken: false
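
The audit steps above can be partly automated. The following is a minimal sketch, assuming the bindings have been exported with kubectl get clusterrolebindings -o json; the function names and the sample data are hypothetical and exist only for illustration.

```python
def risky_cluster_admin_bindings(bindings: dict) -> list:
    """Return (binding name, namespace/name) pairs where a ServiceAccount holds cluster-admin."""
    findings = []
    for item in bindings.get("items", []):
        if item.get("roleRef", {}).get("name") != "cluster-admin":
            continue
        for subject in item.get("subjects") or []:
            if subject.get("kind") == "ServiceAccount":
                who = f'{subject.get("namespace", "?")}/{subject.get("name", "?")}'
                findings.append((item["metadata"]["name"], who))
    return findings


def wildcard_rules(role: dict) -> list:
    """Return the rules of a Role or ClusterRole that use '*' in verbs or resources."""
    return [
        rule
        for rule in role.get("rules") or []
        if "*" in rule.get("verbs", []) or "*" in rule.get("resources", [])
    ]


# Illustrative data shaped like the kubectl JSON output (not from a real cluster).
SAMPLE_BINDINGS = {
    "items": [
        {
            "metadata": {"name": "app-admin"},
            "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
            "subjects": [{"kind": "ServiceAccount", "name": "api", "namespace": "payments"}],
        },
        {
            "metadata": {"name": "ops-admin"},
            "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
            "subjects": [{"kind": "Group", "name": "platform-team"}],
        },
        {
            "metadata": {"name": "viewer"},
            "roleRef": {"kind": "ClusterRole", "name": "view"},
            "subjects": [{"kind": "ServiceAccount", "name": "dash", "namespace": "monitoring"}],
        },
    ]
}

SAMPLE_ROLE = {
    "rules": [
        {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]},
        {"apiGroups": ["*"], "resources": ["*"], "verbs": ["*"]},
    ]
}

if __name__ == "__main__":
    for binding, subject in risky_cluster_admin_bindings(SAMPLE_BINDINGS):
        print(f"cluster-admin bound to service account {subject} via {binding}")
```

Against a live cluster, the same functions could be fed json.load(sys.stdin) with the kubectl output piped in. Group and User subjects are skipped here on purpose; they still deserve a manual review.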

1.2 Pod Security Standards

Kubernetes replaced PodSecurityPolicy (removed in 1.25) with Pod Security Standards (PSS) enforced via the Pod Security Admission controller. Many clusters still have no policy enforced.

PSS Level	What it prevents	Recommended for
Restricted	Privileged containers, host namespaces, unsafe sysctls, root users	All application workloads
Baseline	Most known privilege escalation paths	Operators and system components
Privileged	Nothing — unrestricted	Only trusted system namespaces (kube-system)

Label namespaces with pod-security.kubernetes.io/enforce: restricted for application workloads.
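Apply the warn and audit modes first (pod-security.kubernetes.io/warn: restricted and pod-security.kubernetes.io/audit: restricted) to identify workloads that need remediation, then add the enforce label.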
Run containers as non-root: securityContext.runAsNonRoot: true and set an explicit runAsUser.
Drop all capabilities and add back only what's needed: capabilities: { drop: ["ALL"], add: ["NET_BIND_SERVICE"] }
Set readOnlyRootFilesystem: true wherever the app supports it.
Disable privilege escalation: allowPrivilegeEscalation: false
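
As a rough illustration of the container settings listed above, the sketch below checks one container spec (as a parsed dict) for them. The function name and sample specs are hypothetical, and the check is deliberately simplified: it only looks at the container-level securityContext, whereas runAsNonRoot can also be set at the pod level.

```python
def security_context_findings(container: dict) -> list:
    """List the hardening settings from this section that a container spec is missing."""
    ctx = container.get("securityContext") or {}
    caps = ctx.get("capabilities") or {}
    findings = []
    if ctx.get("runAsNonRoot") is not True:
        findings.append("runAsNonRoot is not true")
    if ctx.get("allowPrivilegeEscalation") is not False:
        findings.append("allowPrivilegeEscalation is not false")
    if ctx.get("readOnlyRootFilesystem") is not True:
        findings.append("readOnlyRootFilesystem is not true")
    if "ALL" not in (caps.get("drop") or []):
        findings.append("capabilities.drop does not include ALL")
    return findings


# Illustrative container specs (hypothetical names and values).
HARDENED = {
    "name": "api",
    "image": "registry.example.internal/api:1.4.2",
    "securityContext": {
        "runAsNonRoot": True,
        "runAsUser": 10001,
        "allowPrivilegeEscalation": False,
        "readOnlyRootFilesystem": True,
        "capabilities": {"drop": ["ALL"], "add": ["NET_BIND_SERVICE"]},
    },
}

UNHARDENED = {"name": "legacy", "image": "registry.example.internal/legacy:2.0"}

if __name__ == "__main__":
    for finding in security_context_findings(UNHARDENED):
        print(f"{UNHARDENED['name']}: {finding}")
```

Note the "is not False" comparison for allowPrivilegeEscalation: an unset field is treated as a finding because the setting has to be disabled explicitly.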

1.3 Network Policies

By default, all pods in a Kubernetes cluster can communicate with all other pods — including across namespaces. Without NetworkPolicies, a compromised pod can reach your database, your secrets store, and your other services without restriction.

Baseline rule: Every namespace should have a default-deny-all ingress and egress policy. Then explicitly open only what's needed.

Start with a default deny-all policy in every namespace that hosts workloads.
Allow only required ingress — e.g., only the frontend can reach the backend, only the backend can reach the database.
Restrict egress: workloads should not be able to reach arbitrary external IPs unless required.
Use namespace selectors in addition to pod selectors to prevent cross-namespace lateral movement.
If using Cilium, leverage CiliumNetworkPolicy for L7-aware policies (HTTP method, path, DNS).
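
The default-deny baseline and a single explicit allow rule can be sketched as manifests built in Python and serialized to JSON, which kubectl apply accepts alongside YAML. The helper names, namespace, labels, and port below are assumptions made for illustration only.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """NetworkPolicy that selects every pod in the namespace and allows no traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector matches all pods in the namespace
            "policyTypes": ["Ingress", "Egress"],  # no rules listed, so both are denied
        },
    }


def allow_ingress_policy(namespace: str, name: str, target: dict, source: dict, port: int) -> dict:
    """NetworkPolicy that lets pods labelled `source` reach pods labelled `target` on one TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": target},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": source}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace and labels.
    policies = [
        default_deny_policy("shop"),
        allow_ingress_policy("shop", "backend-from-frontend", {"app": "backend"}, {"app": "frontend"}, 8080),
    ]
    for policy in policies:
        print(json.dumps(policy, indent=2))
```

A default-deny egress policy also blocks DNS lookups, so in practice an additional egress rule for the cluster DNS service is usually needed before workloads behave normally.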

1.4 Secrets Management

Kubernetes Secrets are base64-encoded, not encrypted — and are often stored in etcd unencrypted. Hardcoded secrets in manifests committed to Git are one of the most common findings in assessments.

Never commit secrets to Git. Use a secrets management system — Doppler, Vault, AWS Secrets Manager, or GCP Secret Manager — and inject at runtime.
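Enable encryption at rest for Secrets in etcd. Configure an EncryptionConfiguration with a KMS provider where one is available; a local-key provider such as AES-GCM also works, but it leaves key storage and rotation to you.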
Use the External Secrets Operator (ESO) to sync secrets from your external store into Kubernetes Secrets automatically.
Restrict access to Secrets via RBAC — get, list, and watch on Secrets are often granted too broadly.
Audit your Git history for accidentally committed secrets. Tools like trufflehog and gitleaks can scan for them.
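
To make the "encoded, not encrypted" point concrete, the short sketch below recovers the plaintext of a Secret manifest with nothing but the standard library. The Secret contents are made up for the example.

```python
import base64


def decode_secret_data(secret: dict) -> dict:
    """Decode every value under `data` in a Kubernetes Secret manifest."""
    return {
        key: base64.b64decode(value).decode("utf-8")
        for key, value in (secret.get("data") or {}).items()
    }


# Illustrative Secret, shaped like `kubectl get secret <name> -o json` output.
SAMPLE_SECRET = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "db-credentials"},
    "data": {"password": base64.b64encode(b"s3cr3t").decode("ascii")},
}

if __name__ == "__main__":
    # No key is required: anyone who can read the manifest can read the value.
    print(decode_secret_data(SAMPLE_SECRET))
```

This is why a Secret manifest committed to Git, or readable through a broad RBAC grant, should be treated as a plaintext credential leak.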

1.5 API Server Hardening

Disable anonymous authentication: --anonymous-auth=false
Enable audit logging: configure --audit-log-path and a policy file that captures reads and writes to sensitive resources.
Restrict access to the API server by network — it should not be publicly reachable from the internet.
Use --authorization-mode=Node,RBAC — never AlwaysAllow.
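Disable the insecure port on older clusters: --insecure-port=0. The flag was removed in Kubernetes 1.24, where the insecure port no longer exists, so omit it on current versions.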
Set --profiling=false to disable API server profiling endpoints.
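
The flag checks above lend themselves to a small script. This is a sketch only, assuming the kube-apiserver command line is available as a list of --flag=value strings (for example, copied from a static pod manifest); the function names and the sample argument lists are hypothetical.

```python
def parse_flags(args: list) -> dict:
    """Turn ['--name=value', ...] into {'name': 'value', ...}."""
    flags = {}
    for arg in args:
        if arg.startswith("--"):
            name, _, value = arg[2:].partition("=")
            flags[name] = value
    return flags


def api_server_findings(args: list) -> list:
    """Report which of the hardening flags from this section are missing or unsafe."""
    flags = parse_flags(args)
    findings = []
    if flags.get("anonymous-auth") != "false":
        findings.append("anonymous-auth is not explicitly disabled")
    modes = flags.get("authorization-mode", "").split(",")
    if "RBAC" not in modes:
        findings.append("authorization-mode does not include RBAC")
    if "AlwaysAllow" in modes:
        findings.append("authorization-mode includes AlwaysAllow")
    if not flags.get("audit-log-path"):
        findings.append("audit-log-path is not set")
    if flags.get("profiling") != "false":
        findings.append("profiling is not disabled")
    return findings


# Illustrative argument lists (the audit log path is a placeholder).
GOOD_ARGS = [
    "--anonymous-auth=false",
    "--authorization-mode=Node,RBAC",
    "--audit-log-path=/var/log/kubernetes/audit.log",
    "--profiling=false",
]
BAD_ARGS = ["--authorization-mode=AlwaysAllow"]

if __name__ == "__main__":
    for finding in api_server_findings(BAD_ARGS):
        print(finding)
```

On managed control planes (EKS, GKE, AKS) these flags are not directly visible, so a check like this applies mainly to self-managed clusters.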

1.6 Image Security

Use specific image tags or SHA digests — never :latest. Floating tags can pull different code between deployments.
Scan images for vulnerabilities in CI before deployment. Trivy and Grype are both free and effective.
Use a private registry with access controls — don't pull directly from public registries in production.
Implement an admission controller (OPA/Gatekeeper or Kyverno) to enforce image policies — e.g., only images from your trusted registry are allowed.
Keep base images minimal. Distroless or Alpine-based images have a significantly smaller attack surface than Ubuntu or Debian full images.
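
The tag-versus-digest rule above is easy to check mechanically. Below is a minimal sketch of a classifier for image references; the function name and the example references are hypothetical, and the parsing is simplified rather than a full implementation of the image reference grammar.

```python
def image_pinning(ref: str) -> str:
    """Classify an image reference as 'digest', 'tag', or 'floating'.

    'floating' covers both an explicit :latest tag and a missing tag,
    since a missing tag resolves to latest.
    """
    if "@sha256:" in ref:
        return "digest"
    # Only look at the final path segment so a registry port such as
    # registry.example.internal:5000 is not mistaken for a tag.
    last_segment = ref.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return "floating"
    tag = last_segment.rsplit(":", 1)[1]
    return "floating" if tag == "latest" else "tag"


if __name__ == "__main__":
    for ref in [
        "nginx",
        "nginx:latest",
        "registry.example.internal:5000/app",
        "registry.example.internal:5000/app:1.4.2",
        "nginx@sha256:0123456789abcdef",  # shortened placeholder digest
    ]:
        print(f"{ref}: {image_pinning(ref)}")
```

A check like this could run in CI over rendered manifests, or the same rule could be expressed as a Kyverno or Gatekeeper policy so it is enforced at admission time.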

2. Cloud Configuration

2.1 IAM — Least Privilege

Overly permissive IAM is one of the most common cloud findings. AdministratorAccess attached to service accounts, combined with long-lived access keys, is a recurring root cause of cloud breaches: one leaked key then grants full control of the account.

Key principle: Every identity (human or service) should have only the permissions it needs to do its job — nothing more. Review this quarterly.

Eliminate long-lived access keys for services running in cloud environments. Use IAM roles for EC2 (AWS), Workload Identity (GKE), or Managed Identity (Azure) instead.
Never use root/owner accounts for day-to-day operations. Create a separate admin account with MFA enforced.
Enable MFA for all human IAM users, especially those with write or admin access.
Audit and remove unused IAM users, roles, and access keys. AWS IAM Access Analyzer and GCP Policy Analyzer can help.
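Use permissions boundaries in AWS to cap the maximum permissions a role or user can have, even if an attached policy is misconfigured.
Prefer attached service accounts or Workload Identity over exported service account keys in GCP: key files can be copied and exfiltrated, while credentials issued to an attached identity are short-lived and never leave the platform as a file.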
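
One way to start a least-privilege review is to scan exported policy documents for administrator-equivalent statements. The sketch below assumes AWS-style IAM policy JSON already loaded into a dict; the function names and sample policy are hypothetical, and the check covers only the literal "*" wildcard, not partial wildcards such as "iam:*".

```python
def _as_list(value) -> list:
    """IAM policy fields may hold a single value or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def admin_like_statements(policy: dict) -> list:
    """Return Allow statements that grant every action on every resource."""
    findings = []
    for statement in _as_list(policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action"))
        resources = _as_list(statement.get("Resource"))
        if "*" in actions and "*" in resources:
            findings.append(statement)
    return findings


# Illustrative policy document (bucket name is a placeholder).
SAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "*", "Resource": "*"},
        {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "arn:aws:s3:::example-bucket/*"},
        {"Effect": "Deny", "Action": "*", "Resource": "*"},
    ],
}

if __name__ == "__main__":
    print(f"{len(admin_like_statements(SAMPLE_POLICY))} admin-equivalent statement(s) found")
```

Tools such as IAM Access Analyzer go much further than this, but a quick scan like the one above is often enough to find the worst offenders during a first pass.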

2.2 Storage Security

Exposed S3 buckets and GCS buckets have been responsible for some of the largest data breaches on record. "Public" access is often set unintentionally during development and never removed.

Block all public access at the account level in AWS (S3 Block Public Access settings). Enable this at the organization level in AWS Organizations.
Enable bucket versioning and object lock for buckets containing critical data.
Enforce server-side encryption for all buckets — at minimum SSE-S3, preferably SSE-KMS with a customer-managed key.
Review bucket policies and ACLs regularly. Any policy that contains "Principal": "*" should be treated as a critical finding.
Enable access logging on buckets that contain sensitive data — CloudTrail data events for S3 in AWS.
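
The "Principal": "*" review can be sketched the same way. The example below assumes an S3-style bucket policy loaded as a dict; names and the sample policy are hypothetical. It ignores Condition blocks, which can narrow a wildcard principal, so any hit is a prompt for manual review rather than proof of public exposure.

```python
def _as_list(value) -> list:
    """Normalize a single value or a list to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def public_statements(bucket_policy: dict) -> list:
    """Return Allow statements whose principal is the wildcard '*'."""
    findings = []
    for statement in _as_list(bucket_policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal")
        is_public = principal == "*" or (
            isinstance(principal, dict) and "*" in _as_list(principal.get("AWS"))
        )
        if is_public:
            findings.append(statement)
    return findings


# Illustrative bucket policy (account ID and bucket name are placeholders).
SAMPLE_BUCKET_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
        {
            "Sid": "TeamAccess",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/data-team"},
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::example-bucket/*",
        },
    ],
}

if __name__ == "__main__":
    for statement in public_statements(SAMPLE_BUCKET_POLICY):
        print(f"public statement: {statement.get('Sid', '(no Sid)')}")
```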

2.3 Network Security

No security group or firewall rule should allow inbound access from 0.0.0.0/0 except for ports 80 and 443 on public-facing load balancers.
SSH (port 22) and RDP (port 3389) should never be open to the internet. Use a bastion host, Session Manager (AWS), or Identity-Aware Proxy (GCP) instead.
Enable VPC Flow Logs (available in both AWS and GCP) for all production networks.
Use private endpoints / Private Service Connect to access cloud services without traversing the public internet.
Segment workloads by VPC and use VPC peering or Transit Gateway with explicit route control — don't put everything in one flat network.
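
The first two rules above can be expressed as a simple check over ingress rules. The rule shape used here (cidr, from_port, to_port) is a hypothetical simplification, not any provider's API format, and the IPv6 wildcard ::/0 is included as an assumption since the text only names 0.0.0.0/0.

```python
WEB_PORTS = {80, 443}
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}  # ::/0 added as the IPv6 equivalent (assumption)


def exposed_rules(rules: list) -> list:
    """Return ingress rules open to the internet on any port other than 80 or 443."""
    findings = []
    for rule in rules:
        if rule["cidr"] not in OPEN_CIDRS:
            continue
        ports = set(range(rule["from_port"], rule["to_port"] + 1))
        if not ports.issubset(WEB_PORTS):
            findings.append(rule)
    return findings


# Illustrative rules in the simplified shape described above.
SAMPLE_RULES = [
    {"cidr": "0.0.0.0/0", "from_port": 443, "to_port": 443},   # fine: HTTPS
    {"cidr": "0.0.0.0/0", "from_port": 22, "to_port": 22},     # finding: SSH
    {"cidr": "10.0.0.0/8", "from_port": 3389, "to_port": 3389},  # internal only
    {"cidr": "0.0.0.0/0", "from_port": 80, "to_port": 8080},   # finding: range too wide
]

if __name__ == "__main__":
    for rule in exposed_rules(SAMPLE_RULES):
        print(f"open to internet: ports {rule['from_port']}-{rule['to_port']}")
```

Whether ports 80 and 443 are acceptable still depends on the rule being attached to a public-facing load balancer, which this check cannot see.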

2.4 Logging and Monitoring

You can't detect what you don't log. The most critical logs are often disabled by default and have cost implications — but the cost of not having them during an incident is much higher.

Enable CloudTrail (AWS) or Cloud Audit Logs (GCP) for management and data events in all regions and all accounts.
Enable CloudTrail log file validation so that tampering with delivered log files can be detected.
Configure alerts on high-risk events: root account login, policy changes, security group modifications, large data exports.
Enable GuardDuty (AWS) or Security Command Center (GCP) — these are the fastest wins for threat detection with minimal configuration.
Centralize logs in an account or project that developers cannot modify — an attacker with write access to the account they compromised can delete their own trail.
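
Alerting on high-risk events comes down to matching a few fields in each audit record. The sketch below classifies CloudTrail-style records by eventName and userIdentity.type; the event-name sets are a small illustrative selection rather than a complete list, and the function name and category labels are hypothetical.

```python
from typing import Optional

# Small illustrative selections of CloudTrail event names, not exhaustive lists.
POLICY_CHANGE_EVENTS = {
    "AttachRolePolicy",
    "AttachUserPolicy",
    "PutRolePolicy",
    "PutUserPolicy",
    "CreatePolicyVersion",
}
SECURITY_GROUP_EVENTS = {
    "AuthorizeSecurityGroupIngress",
    "RevokeSecurityGroupIngress",
}
LOGGING_TAMPER_EVENTS = {"StopLogging", "DeleteTrail"}


def classify_event(event: dict) -> Optional[str]:
    """Return an alert category for a CloudTrail-style record, or None if it is not high-risk."""
    name = event.get("eventName")
    identity_type = (event.get("userIdentity") or {}).get("type")
    if name == "ConsoleLogin" and identity_type == "Root":
        return "root-login"
    if name in LOGGING_TAMPER_EVENTS:
        return "logging-tampering"
    if name in POLICY_CHANGE_EVENTS:
        return "policy-change"
    if name in SECURITY_GROUP_EVENTS:
        return "security-group-change"
    return None


if __name__ == "__main__":
    sample = {"eventName": "ConsoleLogin", "userIdentity": {"type": "Root"}}
    print(classify_event(sample))
```

In practice this logic usually lives in the provider's own rule engine (for example, an event rule or log metric filter) rather than in custom code, but the matching criteria are the same.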

2.5 Kubernetes on Cloud (EKS / GKE / AKS)

Enable Workload Identity (GKE) or IRSA / Pod Identity (EKS) to give pods cloud credentials — never mount service account key files into pods.
Use private clusters: API server endpoint not publicly accessible, nodes on private subnets.
Enable managed node upgrades and stay within two minor versions of the current Kubernetes release. EOL Kubernetes versions stop receiving security patches.
Enable network policy enforcement at the CNI level (Calico, Cilium, or the cloud provider's native network policy).
Review the cloud provider's security benchmarks — GKE Security Posture, EKS Best Practices Guide, and AKS security baseline all have specific guidance for their platform.
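
The "within two minor versions" rule can be checked with a few lines of arithmetic. This is a sketch with hypothetical function names; the version strings in the example are illustrative, and the latest release would need to be supplied from the upstream release page or the provider's API.

```python
def _major_minor(version: str) -> tuple:
    """Parse 'v1.28.3' or '1.28' into (1, 28)."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def minor_versions_behind(cluster: str, latest: str) -> int:
    """Number of minor releases the cluster trails the latest release by."""
    cluster_major, cluster_minor = _major_minor(cluster)
    latest_major, latest_minor = _major_minor(latest)
    if cluster_major != latest_major:
        raise ValueError("major versions differ; compare manually")
    return latest_minor - cluster_minor


def within_upgrade_window(cluster: str, latest: str, max_behind: int = 2) -> bool:
    """True if the cluster is no more than `max_behind` minor versions behind."""
    return max_behind >= minor_versions_behind(cluster, latest)


if __name__ == "__main__":
    # Illustrative versions only.
    print(within_upgrade_window("v1.28.3", "1.30"))
    print(within_upgrade_window("1.26", "1.30"))
```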

3. Compliance Readiness

3.1 SOC 2 — What engineering teams actually need to do

SOC 2 is the most common compliance requirement for B2B SaaS companies. The Trust Service Criteria (TSC) map to specific technical controls your engineering team owns.

TSC Category	What you need	Common gap
Logical Access	MFA everywhere, access reviews, role-based access	Shared accounts, no offboarding process
Change Management	Code review process, deployment approvals, audit trail	Direct pushes to main, no PR required
Incident Response	Documented IR plan, on-call rotation, post-mortems	No documented plan, no runbooks
Monitoring	Alerting on security events, log retention 90+ days	No centralized logging, no alerts on privilege changes
Availability	SLOs defined, backup and recovery tested	Backups never tested for restore

3.2 CIS Kubernetes Benchmark — Priority Controls

The CIS Kubernetes Benchmark has 100+ controls. The ones below give the most risk reduction for the least effort and should be your first pass.

CRITICAL — Enable RBAC and disable ABAC (--authorization-mode must include RBAC)
CRITICAL — Disable anonymous authentication on API server and kubelet
CRITICAL — Enable etcd encryption at rest
HIGH — Enable audit logging with a policy that captures sensitive resource access
HIGH — Configure NetworkPolicies in all workload namespaces
HIGH — Enforce Pod Security Standards at the namespace level
MED — Disable service account token auto-mounting for pods that don't need it
MED — Set resource requests and limits on all containers
LOW — Enable node restriction admission plugin

3.3 Secrets and Data Classification

Classify the data your application stores — not all data requires the same controls. PII, PHI, financial data, and credentials need stricter handling than logging data.
Document your data flows: where does sensitive data enter, where does it go, who can access it, and how is it deleted? Auditors will ask this.
Implement a secrets rotation policy. Credentials should have a maximum lifetime and be rotated automatically where possible.
Never log sensitive data. Review your application logs for accidental PII leakage — full request bodies, auth headers, and error messages are common offenders.
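
As one possible approach to the logging point above, sensitive values can be masked before a line is written. The sketch below redacts Authorization header values and email addresses with regular expressions; the patterns and function name are assumptions for illustration and will not catch every form of PII.

```python
import re

# Matches "Authorization: Bearer <token>" or "Authorization: Basic <token>".
_AUTH_HEADER = re.compile(r"(authorization:\s*(?:bearer|basic)\s+)\S+", re.IGNORECASE)
# Deliberately simple email pattern; good enough for masking, not for validation.
_EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact(line: str) -> str:
    """Mask auth header values and email addresses in a log line."""
    line = _AUTH_HEADER.sub(r"\1[REDACTED]", line)
    line = _EMAIL.sub("[EMAIL]", line)
    return line


if __name__ == "__main__":
    print(redact("Authorization: Bearer abc.def.ghi"))
    print(redact("login failed for user=jane@example.com"))
```

A function like this could be wrapped in a logging.Filter so it applies to every handler. Redaction is a safety net, though; the more reliable fix is not passing request bodies and headers to the logger in the first place.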

4. Where to Start — Priority Action List

If you're doing a first pass on security hardening, this is the order of operations that gives you the most risk reduction for the least effort.

Audit RBAC bindings — find and remove any cluster-admin bindings that don't have a clear justification.
Enable MFA for all cloud IAM users and your Kubernetes API access (if using SSO/OIDC).
Rotate and remove long-lived credentials — access keys, service account key files, hardcoded tokens.
Enable cloud audit logging — CloudTrail, Cloud Audit Logs, or Azure Monitor. Don't let this stay off.
Block public access on storage: S3 Block Public Access at the account level, and GCS public access prevention together with uniform bucket-level access.
Review security group / firewall rules — remove any rule allowing inbound 0.0.0.0/0 on non-web ports.
Apply Pod Security Standards — start in warn mode, fix issues, then switch to enforce.
Add NetworkPolicies — default-deny in workload namespaces, then open only required traffic.
Scan your container images — integrate Trivy or Grype into CI and fail the build on critical findings.
Set up GuardDuty or Security Command Center — one-click enable, immediate threat detection coverage.

Note: This guide covers the highest-impact controls but isn't exhaustive. Production security requires ongoing review, not a one-time pass.
