Securing AKS Workloads with Azure Key Vault CSI Driver: Beyond Environment Variables
Deep technical guide to implementing Azure Key Vault Secrets Store CSI Driver in AKS for secure secret management, including workload identity integration, rotation strategies, and production patterns.
Let me be honest: managing secrets in Kubernetes has always been a pain point. Sure, Kubernetes Secrets work as a native mechanism, but they’re just base64-encoded (not actually encrypted at rest by default) and lack the enterprise-grade features you’d expect—things like automatic rotation, audit logging, and centralized management. Azure Key Vault CSI Driver solves these problems by mounting secrets directly from Azure Key Vault into pods as volumes.
In this article, I’ll walk you through a production-grade implementation based on patterns we’ve deployed across multiple enterprise environments. We’ll cover workload identity integration, rotation strategies, and the operational lessons I’ve learned along the way.
The Problem with Traditional Kubernetes Secrets
Before we dive into the solution, let’s talk about why traditional approaches fall short:
# Anti-pattern: Hardcoded secrets in manifests
apiVersion: v1
kind: Secret
metadata:
name: db-credentials
type: Opaque
data:
username: YWRtaW4= # base64, not encrypted
password: cGFzc3dvcmQxMjM=
Issues:
- Secrets stored in Git (even if base64 encoded)
- No rotation mechanism
- No audit trail of secret access
- Not encrypted at rest unless etcd encryption is explicitly enabled
- Difficult to manage across multiple clusters
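To make the point concrete, anyone with read access to that manifest can recover the password in one command:

```bash
# base64 is an encoding, not encryption
echo 'cGFzc3dvcmQxMjM=' | base64 -d
# Output: password123
```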
Azure Key Vault CSI Driver Architecture
The CSI driver enables pods to mount secrets from Azure Key Vault as volumes:
graph TB
subgraph AKS["AKS Cluster"]
Pod["Application Pod"]
SPC["SecretProviderClass
(CRD)"]
CSI["CSI Driver
DaemonSet"]
Pod -->|"1. Mounts volume"| SPC
SPC -->|"2. Defines secrets"| CSI
Pod -.->|"3. Volume mount"| CSI
end
CSI -->|"4. Workload Identity
OIDC Token"| AAD["Azure AD"]
AAD -->|"5. Validates identity"| KV["Azure Key Vault"]
KV -->|"6. Returns secrets"| CSI
CSI -.->|"7. Mounts as files"| Pod
subgraph KV_Contents["Key Vault Contents"]
Secrets["Secrets"]
Certs["Certificates"]
Keys["Keys"]
end
KV --- KV_Contents
style AKS fill:#e1f5ff
style KV fill:#d4edda
style AAD fill:#fff3cd
style Pod fill:#f8d7da
Prerequisites and Installation
Enable CSI Driver on AKS Cluster
# New cluster with CSI driver and workload identity
az aks create \
--resource-group production-rg \
--name prod-aks-cluster \
--kubernetes-version 1.28.5 \
--node-count 3 \
--enable-addons azure-keyvault-secrets-provider \
--enable-oidc-issuer \
--enable-workload-identity \
--generate-ssh-keys
# Existing cluster - enable CSI driver
az aks enable-addons \
--resource-group production-rg \
--name prod-aks-cluster \
--addons azure-keyvault-secrets-provider
# Enable workload identity if not already enabled
az aks update \
--resource-group production-rg \
--name prod-aks-cluster \
--enable-oidc-issuer \
--enable-workload-identity
Terraform Implementation
resource "azurerm_kubernetes_cluster" "aks" {
name = "prod-aks-cluster"
location = azurerm_resource_group.aks.location
resource_group_name = azurerm_resource_group.aks.name
kubernetes_version = "1.28.5"
dns_prefix = "prod-aks"
default_node_pool {
name = "default"
node_count = 3
vm_size = "Standard_D4s_v5"
}
identity {
type = "SystemAssigned"
}
# Enable Key Vault CSI driver
key_vault_secrets_provider {
secret_rotation_enabled = true
secret_rotation_interval = "2m"
}
# Enable workload identity
oidc_issuer_enabled = true
workload_identity_enabled = true
network_profile {
network_plugin = "azure"
network_policy = "calico"
}
}
# Get OIDC issuer URL for workload identity federation
output "oidc_issuer_url" {
value = azurerm_kubernetes_cluster.aks.oidc_issuer_url
}
Verify Installation
# Check CSI driver pods
kubectl get pods -n kube-system -l app=secrets-store-csi-driver
# Check provider pods
kubectl get pods -n kube-system -l app=csi-secrets-store-provider-azure
# Verify workload identity mutating webhook
kubectl get mutatingwebhookconfigurations | grep azure-wi
Workload Identity Configuration
Workload Identity is the recommended authentication method (replaces deprecated pod-managed identity).
Authentication Methods Comparison
| Method | Security | Complexity | Maintenance | Recommendation |
|---|---|---|---|---|
| Workload Identity | ✅ High (OIDC federation) | Medium | Low (automatic token rotation) | ✅ Recommended |
| Pod-Managed Identity | Medium (Azure metadata) | Medium | Medium | ⚠️ Deprecated (use WI instead) |
| Service Principal | Low (static credentials) | Low | High (manual rotation) | ❌ Not recommended |
| Access Policies | Medium | Low | Medium | ⚠️ Legacy (use RBAC instead) |
Why Workload Identity?
I can’t stress this enough: Workload Identity is the way to go. Here’s why:
- No secrets stored in cluster
- Automatic token rotation (1 hour default)
- Fine-grained RBAC permissions
- Supports multiple service accounts
- Azure-native OIDC integration
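Once everything below is wired up, you can see the federation machinery for yourself: the workload identity webhook injects a projected token plus the Azure environment variables into any pod that opts in. A quick inspection (pod name is a placeholder):

```bash
# Inspect what the workload identity webhook injected into an opted-in pod
kubectl exec <pod-name> -n production -- env | grep '^AZURE_'
# Expect AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE, AZURE_AUTHORITY_HOST
```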
Step 1: Create Azure Key Vault
# Create Key Vault
az keyvault create \
--name prod-app-kv \
--resource-group production-rg \
--location eastus \
--enable-rbac-authorization true
# Add secrets
az keyvault secret set \
--vault-name prod-app-kv \
--name database-password \
--value "SuperSecretPassword123!"
az keyvault secret set \
--vault-name prod-app-kv \
--name api-key \
--value "sk-proj-abc123xyz789"
Step 2: Create Managed Identity and Federate with Service Account
# Create user-assigned managed identity
az identity create \
--resource-group production-rg \
--name aks-workload-identity
# Get identity details
IDENTITY_CLIENT_ID=$(az identity show \
--resource-group production-rg \
--name aks-workload-identity \
--query clientId -o tsv)
IDENTITY_PRINCIPAL_ID=$(az identity show \
--resource-group production-rg \
--name aks-workload-identity \
--query principalId -o tsv)
# Grant Key Vault permissions
az role assignment create \
--role "Key Vault Secrets User" \
--assignee $IDENTITY_PRINCIPAL_ID \
--scope /subscriptions/{subscription-id}/resourceGroups/production-rg/providers/Microsoft.KeyVault/vaults/prod-app-kv
# Get AKS OIDC issuer
OIDC_ISSUER=$(az aks show \
--resource-group production-rg \
--name prod-aks-cluster \
--query oidcIssuerProfile.issuerUrl -o tsv)
# Create federated credential
az identity federated-credential create \
--name aks-federated-credential \
--identity-name aks-workload-identity \
--resource-group production-rg \
--issuer $OIDC_ISSUER \
--subject system:serviceaccount:production:app-service-account
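Because a subject mismatch is the most common failure mode later, I like to verify the federated credential right after creating it:

```bash
# Confirm issuer and subject on the federated credential
az identity federated-credential list \
  --identity-name aks-workload-identity \
  --resource-group production-rg \
  --query "[].{name:name, issuer:issuer, subject:subject}" -o table
```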
Terraform Implementation
# User-assigned managed identity
resource "azurerm_user_assigned_identity" "aks_workload" {
name = "aks-workload-identity"
resource_group_name = azurerm_resource_group.aks.name
location = azurerm_resource_group.aks.location
}
# Key Vault
resource "azurerm_key_vault" "app" {
name = "prod-app-kv"
location = azurerm_resource_group.aks.location
resource_group_name = azurerm_resource_group.aks.name
tenant_id = data.azurerm_client_config.current.tenant_id
sku_name = "standard"
enable_rbac_authorization = true
purge_protection_enabled = true
network_acls {
default_action = "Deny"
bypass = "AzureServices"
ip_rules = []
}
}
# Grant managed identity access to Key Vault
resource "azurerm_role_assignment" "kv_secrets_user" {
scope = azurerm_key_vault.app.id
role_definition_name = "Key Vault Secrets User"
principal_id = azurerm_user_assigned_identity.aks_workload.principal_id
}
# Federated identity credential
resource "azurerm_federated_identity_credential" "aks" {
name = "aks-federated-credential"
resource_group_name = azurerm_resource_group.aks.name
parent_id = azurerm_user_assigned_identity.aks_workload.id
audience = ["api://AzureADTokenExchange"]
issuer = azurerm_kubernetes_cluster.aks.oidc_issuer_url
subject = "system:serviceaccount:production:app-service-account"
}
# Outputs for Kubernetes configuration
output "workload_identity_client_id" {
value = azurerm_user_assigned_identity.aks_workload.client_id
}
output "key_vault_name" {
value = azurerm_key_vault.app.name
}
Step 3: Create Kubernetes Service Account
# service-account.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: app-service-account
namespace: production
annotations:
azure.workload.identity/client-id: "${WORKLOAD_IDENTITY_CLIENT_ID}"
labels:
azure.workload.identity/use: "true"
Apply the service account:
# Get client ID from Terraform or Azure CLI (export it so envsubst can substitute it)
export WORKLOAD_IDENTITY_CLIENT_ID=$(terraform output -raw workload_identity_client_id)
# Apply with substitution
envsubst < service-account.yaml | kubectl apply -f -
SecretProviderClass Configuration
Now we get to the heart of it. The SecretProviderClass is where you define which secrets to mount and how to handle them:
# secret-provider-class.yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
name: app-secrets
namespace: production
spec:
provider: azure
parameters:
usePodIdentity: "false" # Not using pod identity
useVMManagedIdentity: "false" # Not using VM identity
clientID: "${WORKLOAD_IDENTITY_CLIENT_ID}" # Workload identity client ID
keyvaultName: "prod-app-kv"
cloudName: "AzurePublicCloud"
objects: |
array:
- |
objectName: database-password
objectType: secret
objectVersion: "" # Empty = latest version
- |
objectName: api-key
objectType: secret
objectVersion: ""
- |
objectName: tls-cert
objectType: cert
objectVersion: ""
tenantId: "${AZURE_TENANT_ID}"
# Optional: Sync as Kubernetes secret
secretObjects:
- secretName: app-secrets-k8s
type: Opaque
data:
- objectName: database-password
key: db-password
- objectName: api-key
key: api-key
Key parameters:
- objects: Array of secrets/certificates to retrieve from Key Vault
- objectVersion: Specific version, or empty for the latest
- secretObjects: Optional sync to a Kubernetes secret for env vars
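Before wiring this into a deployment, apply it and confirm the object exists; note that nothing is fetched from Key Vault until a pod actually mounts the volume:

```bash
# Apply with variable substitution and confirm the SecretProviderClass exists
envsubst < secret-provider-class.yaml | kubectl apply -f -
kubectl get secretproviderclass app-secrets -n production
```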
Application Deployment Patterns
Pattern 1: Volume Mount (Recommended)
# deployment-volume-mount.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-deployment
namespace: production
spec:
replicas: 3
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
azure.workload.identity/use: "true" # Enable workload identity injection
spec:
serviceAccountName: app-service-account
containers:
- name: app
image: myregistry.azurecr.io/myapp:v1.0.0
volumeMounts:
- name: secrets-store
mountPath: "/mnt/secrets-store"
readOnly: true
env:
# Read secrets from mounted files
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: app-secrets-k8s # Created by secretObjects
key: db-password
# Or read directly from file in application
- name: SECRETS_PATH
value: "/mnt/secrets-store"
volumes:
- name: secrets-store
csi:
driver: secrets-store.csi.k8s.io
readOnly: true
volumeAttributes:
secretProviderClass: "app-secrets"
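Once the rollout completes, a quick way to confirm the mount (using the deployment name defined above):

```bash
# Each Key Vault object shows up as a file named after its objectName
kubectl exec deploy/app-deployment -n production -- ls -la /mnt/secrets-store
kubectl exec deploy/app-deployment -n production -- cat /mnt/secrets-store/database-password
```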
Pattern 2: Environment Variables via Sync
Some applications really want environment variables. I get it. Here’s how to make that work:
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-deployment-envvar
namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
metadata:
labels:
app: myapp
azure.workload.identity/use: "true"
spec:
serviceAccountName: app-service-account
containers:
- name: app
image: myregistry.azurecr.io/myapp:v1.0.0
env:
- name: DATABASE_PASSWORD
valueFrom:
secretKeyRef:
name: app-secrets-k8s
key: db-password
- name: API_KEY
valueFrom:
secretKeyRef:
name: app-secrets-k8s
key: api-key
        # The CSI volume must be defined AND mounted for the sync to occur
        volumeMounts:
        - name: secrets-store
          mountPath: "/mnt/secrets-store"
          readOnly: true
      volumes:
      - name: secrets-store
        csi:
          driver: secrets-store.csi.k8s.io
          readOnly: true
          volumeAttributes:
            secretProviderClass: "app-secrets"
Important: Here’s something that trips people up—the volume must be mounted (even if you’re not directly using it) for the sync to occur. Don’t skip this step.
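To confirm the sync actually happened, read the mirrored secret back (the key names come from the secretObjects mapping above):

```bash
# The mirrored secret only exists while at least one pod mounts the CSI volume
kubectl get secret app-secrets-k8s -n production -o jsonpath='{.data.db-password}' | base64 -d
```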
Pattern 3: Certificate Mounting for TLS
# secret-provider-class-tls.yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
name: app-tls-secrets
namespace: production
spec:
provider: azure
parameters:
usePodIdentity: "false"
useVMManagedIdentity: "false"
clientID: "${WORKLOAD_IDENTITY_CLIENT_ID}"
keyvaultName: "prod-app-kv"
objects: |
array:
- |
objectName: tls-certificate
objectType: cert
objectAlias: tls.crt
- |
objectName: tls-private-key
objectType: secret
objectAlias: tls.key
tenantId: "${AZURE_TENANT_ID}"
secretObjects:
- secretName: app-tls-secret
type: kubernetes.io/tls
data:
- objectName: tls.crt
key: tls.crt
- objectName: tls.key
key: tls.key
---
# Ingress using the TLS secret
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: app-ingress
namespace: production
spec:
ingressClassName: nginx
tls:
- hosts:
- app.example.com
secretName: app-tls-secret # Auto-synced from Key Vault
rules:
- host: app.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: app-service
port:
number: 80
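A quick sanity check that the synced certificate is the one you expect, assuming openssl is available on your workstation:

```bash
# Decode the synced TLS cert and inspect subject and expiry
kubectl get secret app-tls-secret -n production -o jsonpath='{.data.tls\.crt}' \
  | base64 -d | openssl x509 -noout -subject -enddate
```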
Secret Rotation Strategy
Automatic Rotation Configuration
One of my favorite features of this setup is automatic rotation. The CSI driver handles this for you:
# Configure rotation interval (already set in Terraform)
az aks update \
--resource-group production-rg \
--name prod-aks-cluster \
--enable-secret-rotation \
--rotation-poll-interval 2m
How it works:
- CSI driver polls Key Vault every rotation-poll-interval
- If the secret version changed, the driver updates the mounted files
- The application must reload secrets from the files
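Here's a simple end-to-end rotation test I use, assuming the 2-minute poll interval configured earlier:

```bash
# Write a new secret version, wait out one poll interval, then re-read the mounted file
az keyvault secret set --vault-name prod-app-kv --name database-password --value "RotatedPassword456!"
sleep 150  # 2m poll interval plus headroom
kubectl exec deploy/app-deployment -n production -- cat /mnt/secrets-store/database-password
```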
Now here’s the catch: the CSI driver updates the files, but your application needs to actually reload them. Let me show you two approaches I’ve used successfully.
Application-Side Reload Implementation
Option 1: File Watcher (Recommended)
// Go example using fsnotify (go get github.com/fsnotify/fsnotify)
package main

import (
	"log"
	"os"

	"github.com/fsnotify/fsnotify"
)

func watchSecrets(secretPath string) {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatalf("failed to create watcher: %v", err)
	}
	defer watcher.Close()

	if err := watcher.Add(secretPath); err != nil {
		log.Fatalf("failed to watch %s: %v", secretPath, err)
	}

	// The CSI driver swaps mounted files atomically via symlinks, so watch
	// for Create and Remove events in addition to plain Writes.
	for event := range watcher.Events {
		if event.Op&(fsnotify.Write|fsnotify.Create|fsnotify.Remove) != 0 {
			log.Println("Secret updated, reloading...")
			reloadConfig()
		}
	}
}

func reloadConfig() {
	dbPassword, err := os.ReadFile("/mnt/secrets-store/database-password")
	if err != nil {
		log.Printf("failed to read secret: %v", err)
		return
	}
	// Reconnect to the database with the new password
	// (reinitializeDBConnection is the application's own reconnect hook)
	reinitializeDBConnection(string(dbPassword))
}
Option 2: Sidecar with SIGHUP
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-with-reload
spec:
  selector:
    matchLabels:
      app: app-with-reload
  template:
    metadata:
      labels:
        app: app-with-reload
    spec:
      # Required so the sidecar's killall can signal the app container's process
      shareProcessNamespace: true
containers:
- name: app
image: myapp:v1.0.0
volumeMounts:
- name: secrets-store
mountPath: "/mnt/secrets-store"
# Sidecar container
- name: secret-reload-sidecar
image: busybox:latest
command:
- /bin/sh
- -c
- |
LAST_MODIFIED=$(stat -c %Y /mnt/secrets-store/database-password)
while true; do
sleep 30
CURRENT_MODIFIED=$(stat -c %Y /mnt/secrets-store/database-password)
if [ "$CURRENT_MODIFIED" != "$LAST_MODIFIED" ]; then
echo "Secret changed, sending SIGHUP to app"
killall -HUP myapp
LAST_MODIFIED=$CURRENT_MODIFIED
fi
done
volumeMounts:
- name: secrets-store
mountPath: "/mnt/secrets-store"
volumes:
- name: secrets-store
csi:
driver: secrets-store.csi.k8s.io
readOnly: true
volumeAttributes:
secretProviderClass: "app-secrets"
Troubleshooting Common Issues
I’ve spent more hours than I’d like to admit debugging these issues. Here are the most common problems and how to fix them.
Issue 1: Pod fails with “failed to get keyvault client”
Diagnosis:
# Check pod events
kubectl describe pod <pod-name> -n production
# Check CSI driver logs
kubectl logs -n kube-system -l app=csi-secrets-store-provider-azure --tail=100
Common causes:
- Workload identity not properly configured
- Service account missing annotation
- Federated credential subject mismatch
Fix:
# Verify service account annotation
kubectl get sa app-service-account -n production -o yaml | grep azure.workload.identity
# Verify federated credential subject matches
az identity federated-credential show \
--name aks-federated-credential \
--identity-name aks-workload-identity \
--resource-group production-rg \
--query subject
Issue 2: Secrets not syncing to Kubernetes secret
Diagnosis:
# Check if Kubernetes secret created
kubectl get secret app-secrets-k8s -n production
# Verify volume mount in pod
kubectl exec -it <pod-name> -n production -- ls -la /mnt/secrets-store
Fix:
Ensure secretObjects is defined in the SecretProviderClass and that the CSI volume is actually mounted by a container in the pod spec. The mirrored Kubernetes secret is created when the first pod mounting the volume starts and removed when the last one terminates.
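If the sync still doesn't happen, the driver logs usually say why; a hedged example (container layout can vary by driver version):

```bash
# Look for sync errors in the CSI driver logs
kubectl logs -n kube-system -l app=secrets-store-csi-driver --all-containers --tail=100 | grep -i error
```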
Issue 3: Permission denied accessing Key Vault
Diagnosis:
# Check role assignments
az role assignment list \
--assignee $IDENTITY_PRINCIPAL_ID \
--scope /subscriptions/{sub}/resourceGroups/production-rg/providers/Microsoft.KeyVault/vaults/prod-app-kv
Fix:
# Grant proper role
az role assignment create \
--role "Key Vault Secrets User" \
--assignee $IDENTITY_PRINCIPAL_ID \
--scope /subscriptions/{sub}/resourceGroups/production-rg/providers/Microsoft.KeyVault/vaults/prod-app-kv
Security Best Practices
1. Use Separate Key Vaults per Environment
# Terraform: Environment-specific Key Vaults
locals {
environments = ["dev", "staging", "prod"]
}
resource "azurerm_key_vault" "env" {
for_each = toset(local.environments)
name = "${each.key}-app-kv"
resource_group_name = azurerm_resource_group.aks.name
location = azurerm_resource_group.aks.location
tenant_id = data.azurerm_client_config.current.tenant_id
sku_name = "standard"
enable_rbac_authorization = true
purge_protection_enabled = each.key == "prod"
}
2. Enable Key Vault Diagnostic Logging
az monitor diagnostic-settings create \
--name kv-audit-logs \
--resource /subscriptions/{sub}/resourceGroups/production-rg/providers/Microsoft.KeyVault/vaults/prod-app-kv \
--logs '[{"category": "AuditEvent", "enabled": true}]' \
--workspace /subscriptions/{sub}/resourceGroups/production-rg/providers/Microsoft.OperationalInsights/workspaces/prod-workspace
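Once audit events land in the workspace, you can query who read what. A sketch using the Log Analytics CLI (the workspace GUID is a placeholder):

```bash
# Query recent secret reads from the Key Vault audit log
az monitor log-analytics query \
  --workspace <workspace-guid> \
  --analytics-query 'AzureDiagnostics
    | where ResourceProvider == "MICROSOFT.KEYVAULT" and OperationName == "SecretGet"
    | project TimeGenerated, CallerIPAddress, ResultSignature
    | take 20'
```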
3. Implement Network Restrictions
resource "azurerm_key_vault" "app" {
name = "prod-app-kv"
# ... other config
network_acls {
default_action = "Deny"
bypass = "AzureServices"
# Allow only from AKS subnet
virtual_network_subnet_ids = [
azurerm_subnet.aks.id
]
}
}
4. Use Pod Security Standards
apiVersion: v1
kind: Namespace
metadata:
name: production
labels:
pod-security.kubernetes.io/enforce: restricted
pod-security.kubernetes.io/audit: restricted
pod-security.kubernetes.io/warn: restricted
Performance Considerations
Caching and Performance Impact
Let’s talk about what this costs you in terms of performance. The good news is that the CSI driver is pretty smart about caching:
- Initial mount: ~200-500ms (Key Vault API call)
- Subsequent pod starts (same node): <10ms (cached)
- Rotation polling: 2-minute interval (configurable)
In practice, I've found this overhead to be negligible. If fast rotation isn't critical for an environment, you can cut down on Key Vault API calls by polling less often:
Optimization:
# Poll less frequently in non-critical environments
# Via Terraform
key_vault_secrets_provider {
secret_rotation_enabled = true
secret_rotation_interval = "10m" # Less frequent polling
}
Migration Strategy from Kubernetes Secrets
Phase 1: Parallel Operation
# Keep existing Kubernetes secret, add Key Vault mount
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-migration
spec:
template:
spec:
containers:
- name: app
env:
# Old: From Kubernetes secret (fallback)
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: old-k8s-secret
key: password
# New: From Key Vault (preferred)
- name: DB_PASSWORD_KV
valueFrom:
secretKeyRef:
name: app-secrets-k8s
key: db-password
# Application checks DB_PASSWORD_KV first, falls back to DB_PASSWORD
Phase 2: Application Update
Update application to prefer Key Vault source.
Phase 3: Remove Old Secrets
# After validation, remove Kubernetes secrets
kubectl delete secret old-k8s-secret -n production
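One last paranoid check before deleting, to make sure nothing in the namespace still references the old secret:

```bash
# Verify no pod still references the old secret before deleting it
kubectl get pods -n production -o yaml | grep "old-k8s-secret" || echo "no remaining references"
```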
Conclusion
If you’re running production workloads on AKS, you need enterprise-grade secret management. Period. Azure Key Vault CSI Driver with Workload Identity gives you exactly that: centralized secret storage, automatic rotation, audit logging, and fine-grained RBAC—all the things that native Kubernetes secrets simply can’t deliver.
Here are my key takeaways from implementing this across multiple production environments:
- Always use Workload Identity over pod-managed identity (trust me on this one)
- Implement application-side reload for rotation—don’t assume it’ll just work
- Enable diagnostic logging from day one for audit trails
- Use separate Key Vaults per environment (no shortcuts here)
- Monitor CSI driver performance and caching behavior
Follow these patterns, and you’ll have secure, scalable secret management that actually meets enterprise security standards. I’ve seen too many teams try to cut corners with secret management, and it always comes back to bite them.
About StriveNimbus: We specialize in AKS security architecture, including zero-trust implementations, workload identity migrations, and compliance automation. Our team has secured hundreds of production AKS clusters across regulated industries. Contact us for security assessment and implementation support.