Securing AKS Workloads with Azure Key Vault CSI Driver: Beyond Environment Variables

Deep technical guide to implementing Azure Key Vault Secrets Store CSI Driver in AKS for secure secret management, including workload identity integration, rotation strategies, and production patterns.

Let me be honest: managing secrets in Kubernetes has always been a pain point. Sure, Kubernetes Secrets work as a native mechanism, but they’re just base64-encoded (not actually encrypted at rest by default) and lack the enterprise-grade features you’d expect—things like automatic rotation, audit logging, and centralized management. Azure Key Vault CSI Driver solves these problems by mounting secrets directly from Azure Key Vault into pods as volumes.

In this article, I’ll walk you through a production-grade implementation based on patterns we’ve deployed across multiple enterprise environments. We’ll cover workload identity integration, rotation strategies, and the operational lessons I’ve learned along the way.

The Problem with Traditional Kubernetes Secrets

Before we dive into the solution, let’s talk about why traditional approaches fall short:

# Anti-pattern: Hardcoded secrets in manifests
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
data:
  username: YWRtaW4=  # base64, not encrypted
  password: cGFzc3dvcmQxMjM=

Issues:

  • Secrets stored in Git (even if base64 encoded)
  • No rotation mechanism
  • No audit trail of secret access
  • Encrypted at rest only if etcd encryption enabled
  • Difficult to manage across multiple clusters
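Base64 is encoding, not encryption — anyone with read access to the manifest recovers the plaintext instantly:

```shell
# "Decoding" the Secret values from the manifest above
echo 'YWRtaW4=' | base64 -d && echo            # admin
echo 'cGFzc3dvcmQxMjM=' | base64 -d && echo    # password123
```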

Azure Key Vault CSI Driver Architecture

The CSI driver enables pods to mount secrets from Azure Key Vault as volumes:

graph TB
    subgraph AKS["AKS Cluster"]
        Pod["Application Pod"]
        SPC["SecretProviderClass<br/>(CRD)"]
        CSI["CSI Driver<br/>DaemonSet"]
        Pod -->|"1. Mounts volume"| SPC
        SPC -->|"2. Defines secrets"| CSI
        Pod -.->|"3. Volume mount"| CSI
    end
    CSI -->|"4. Workload Identity<br/>OIDC Token"| AAD["Azure AD"]
    AAD -->|"5. Validates identity"| KV["Azure Key Vault"]
    KV -->|"6. Returns secrets"| CSI
    CSI -.->|"7. Mounts as files"| Pod

    subgraph KV_Contents["Key Vault Contents"]
        Secrets["Secrets"]
        Certs["Certificates"]
        Keys["Keys"]
    end
    KV --- KV_Contents

    style AKS fill:#e1f5ff
    style KV fill:#d4edda
    style AAD fill:#fff3cd
    style Pod fill:#f8d7da

Prerequisites and Installation

Enable CSI Driver on AKS Cluster

# New cluster with CSI driver and workload identity
az aks create \
  --resource-group production-rg \
  --name prod-aks-cluster \
  --kubernetes-version 1.28.5 \
  --node-count 3 \
  --enable-addons azure-keyvault-secrets-provider \
  --enable-oidc-issuer \
  --enable-workload-identity \
  --generate-ssh-keys

# Existing cluster - enable CSI driver
az aks enable-addons \
  --resource-group production-rg \
  --name prod-aks-cluster \
  --addons azure-keyvault-secrets-provider

# Enable workload identity if not already enabled
az aks update \
  --resource-group production-rg \
  --name prod-aks-cluster \
  --enable-oidc-issuer \
  --enable-workload-identity

Terraform Implementation

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "prod-aks-cluster"
  location            = azurerm_resource_group.aks.location
  resource_group_name = azurerm_resource_group.aks.name
  kubernetes_version  = "1.28.5"
  dns_prefix          = "prod-aks"

  default_node_pool {
    name       = "default"
    node_count = 3
    vm_size    = "Standard_D4s_v5"
  }

  identity {
    type = "SystemAssigned"
  }

  # Enable Key Vault CSI driver
  key_vault_secrets_provider {
    secret_rotation_enabled  = true
    secret_rotation_interval = "2m"
  }

  # Enable workload identity
  oidc_issuer_enabled       = true
  workload_identity_enabled = true

  network_profile {
    network_plugin = "azure"
    network_policy = "calico"
  }
}

# Get OIDC issuer URL for workload identity federation
output "oidc_issuer_url" {
  value = azurerm_kubernetes_cluster.aks.oidc_issuer_url
}

Verify Installation

# Check CSI driver pods
kubectl get pods -n kube-system -l app=secrets-store-csi-driver

# Check provider pods
kubectl get pods -n kube-system -l app=csi-secrets-store-provider-azure

# Verify workload identity mutating webhook
kubectl get mutatingwebhookconfigurations | grep azure-wi

Workload Identity Configuration

Workload Identity is the recommended authentication method (replaces deprecated pod-managed identity).

Authentication Methods Comparison

| Method | Security | Complexity | Maintenance | Recommendation |
|--------|----------|------------|-------------|----------------|
| Workload Identity | ✅ High (OIDC federation) | Medium | Low (automatic token rotation) | Recommended |
| Pod-Managed Identity | Medium (Azure metadata) | Medium | Medium | ⚠️ Deprecated (use WI instead) |
| Service Principal | Low (static credentials) | Low | High (manual rotation) | ❌ Not recommended |
| Access Policies | Medium | Low | Medium | ⚠️ Legacy (use RBAC instead) |

Why Workload Identity?

I can’t stress this enough: Workload Identity is the way to go. Here’s why:

  • No secrets stored in cluster
  • Automatic token rotation (1 hour default)
  • Fine-grained RBAC permissions
  • Supports multiple service accounts
  • Azure-native OIDC integration
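For context, here is roughly what the mutating webhook injects into a pod labeled azure.workload.identity/use: "true" — a simplified sketch based on the azure-workload-identity documentation, not something you write yourself (exact fields vary by webhook version):

```yaml
# Injected by the workload identity webhook (simplified sketch)
env:
- name: AZURE_CLIENT_ID          # from the service account annotation
  value: "<client-id>"
- name: AZURE_TENANT_ID
  value: "<tenant-id>"
- name: AZURE_FEDERATED_TOKEN_FILE
  value: /var/run/secrets/azure/tokens/azure-identity-token
- name: AZURE_AUTHORITY_HOST
  value: https://login.microsoftonline.com/
volumes:
- name: azure-identity-token
  projected:
    sources:
    - serviceAccountToken:
        audience: api://AzureADTokenExchange
        expirationSeconds: 3600   # the "1 hour default" mentioned above
        path: azure-identity-token
```

The CSI driver (and any Azure SDK client) exchanges this projected token for an Azure AD token — no credential ever lands in the cluster.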

Step 1: Create Azure Key Vault

# Create Key Vault
az keyvault create \
  --name prod-app-kv \
  --resource-group production-rg \
  --location eastus \
  --enable-rbac-authorization true

# Add secrets
az keyvault secret set \
  --vault-name prod-app-kv \
  --name database-password \
  --value "SuperSecretPassword123!"

az keyvault secret set \
  --vault-name prod-app-kv \
  --name api-key \
  --value "sk-proj-abc123xyz789"

Step 2: Create Managed Identity and Federate with Service Account

# Create user-assigned managed identity
az identity create \
  --resource-group production-rg \
  --name aks-workload-identity

# Get identity details
IDENTITY_CLIENT_ID=$(az identity show \
  --resource-group production-rg \
  --name aks-workload-identity \
  --query clientId -o tsv)

IDENTITY_PRINCIPAL_ID=$(az identity show \
  --resource-group production-rg \
  --name aks-workload-identity \
  --query principalId -o tsv)

# Grant Key Vault permissions
az role assignment create \
  --role "Key Vault Secrets User" \
  --assignee $IDENTITY_PRINCIPAL_ID \
  --scope /subscriptions/{subscription-id}/resourceGroups/production-rg/providers/Microsoft.KeyVault/vaults/prod-app-kv

# Get AKS OIDC issuer
OIDC_ISSUER=$(az aks show \
  --resource-group production-rg \
  --name prod-aks-cluster \
  --query oidcIssuerProfile.issuerUrl -o tsv)

# Create federated credential
az identity federated-credential create \
  --name aks-federated-credential \
  --identity-name aks-workload-identity \
  --resource-group production-rg \
  --issuer $OIDC_ISSUER \
  --subject system:serviceaccount:production:app-service-account

Terraform Implementation

# User-assigned managed identity
resource "azurerm_user_assigned_identity" "aks_workload" {
  name                = "aks-workload-identity"
  resource_group_name = azurerm_resource_group.aks.name
  location            = azurerm_resource_group.aks.location
}

# Key Vault
resource "azurerm_key_vault" "app" {
  name                       = "prod-app-kv"
  location                   = azurerm_resource_group.aks.location
  resource_group_name        = azurerm_resource_group.aks.name
  tenant_id                  = data.azurerm_client_config.current.tenant_id
  sku_name                   = "standard"
  enable_rbac_authorization  = true
  purge_protection_enabled   = true

  network_acls {
    default_action = "Deny"
    bypass         = "AzureServices"
    ip_rules       = []
  }
}

# Grant managed identity access to Key Vault
resource "azurerm_role_assignment" "kv_secrets_user" {
  scope                = azurerm_key_vault.app.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = azurerm_user_assigned_identity.aks_workload.principal_id
}

# Federated identity credential
resource "azurerm_federated_identity_credential" "aks" {
  name                = "aks-federated-credential"
  resource_group_name = azurerm_resource_group.aks.name
  parent_id           = azurerm_user_assigned_identity.aks_workload.id
  audience            = ["api://AzureADTokenExchange"]
  issuer              = azurerm_kubernetes_cluster.aks.oidc_issuer_url
  subject             = "system:serviceaccount:production:app-service-account"
}

# Outputs for Kubernetes configuration
output "workload_identity_client_id" {
  value = azurerm_user_assigned_identity.aks_workload.client_id
}

output "key_vault_name" {
  value = azurerm_key_vault.app.name
}

Step 3: Create Kubernetes Service Account

# service-account.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-service-account
  namespace: production
  annotations:
    azure.workload.identity/client-id: "${WORKLOAD_IDENTITY_CLIENT_ID}"
  labels:
    azure.workload.identity/use: "true"

Apply the service account:

# Get client ID from Terraform or Azure CLI
WORKLOAD_IDENTITY_CLIENT_ID=$(terraform output -raw workload_identity_client_id)

# Apply with substitution
envsubst < service-account.yaml | kubectl apply -f -

SecretProviderClass Configuration

Now we get to the heart of it. The SecretProviderClass is where you define which secrets to mount and how to handle them:

# secret-provider-class.yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: app-secrets
  namespace: production
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"               # Not using pod identity
    useVMManagedIdentity: "false"         # Not using VM identity
    clientID: "${WORKLOAD_IDENTITY_CLIENT_ID}"  # Workload identity client ID
    keyvaultName: "prod-app-kv"
    cloudName: "AzurePublicCloud"
    objects: |
      array:
        - |
          objectName: database-password
          objectType: secret
          objectVersion: ""              # Empty = latest version
        - |
          objectName: api-key
          objectType: secret
          objectVersion: ""
        - |
          objectName: tls-cert
          objectType: cert
          objectVersion: ""
    tenantId: "${AZURE_TENANT_ID}"
  # Optional: Sync as Kubernetes secret
  secretObjects:
  - secretName: app-secrets-k8s
    type: Opaque
    data:
    - objectName: database-password
      key: db-password
    - objectName: api-key
      key: api-key

Key parameters:

  • objects: Array of secrets/certificates to retrieve from Key Vault
  • objectVersion: Specific version or empty for latest
  • secretObjects: Optional sync to Kubernetes secret for env vars

Application Deployment Patterns

Pattern 1: Volume Mount

# deployment-volume-mount.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
        azure.workload.identity/use: "true"  # Enable workload identity injection
    spec:
      serviceAccountName: app-service-account
      containers:
      - name: app
        image: myregistry.azurecr.io/myapp:v1.0.0
        volumeMounts:
        - name: secrets-store
          mountPath: "/mnt/secrets-store"
          readOnly: true
        env:
        # Read secrets from mounted files
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-secrets-k8s  # Created by secretObjects
              key: db-password
        # Or read directly from file in application
        - name: SECRETS_PATH
          value: "/mnt/secrets-store"
      volumes:
      - name: secrets-store
        csi:
          driver: secrets-store.csi.k8s.io
          readOnly: true
          volumeAttributes:
            secretProviderClass: "app-secrets"
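
Inside the container, each Key Vault object shows up as a plain file under the mount path. Here's a local sketch of entrypoint logic that loads one into an environment variable — a temp directory stands in for the CSI mount:

```shell
# Simulate the CSI mount with a temp dir (stand-in for /mnt/secrets-store)
SECRETS_PATH="$(mktemp -d)"
printf 'SuperSecretPassword123!' > "$SECRETS_PATH/database-password"

# Entrypoint logic: read the mounted file into an env var
DB_PASSWORD="$(cat "$SECRETS_PATH/database-password")"
export DB_PASSWORD
echo "loaded database password (${#DB_PASSWORD} chars)"
```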

Pattern 2: Environment Variables via Sync

Some applications really want environment variables. I get it. Here’s how to make that work:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment-envvar
  namespace: production
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: myapp
        azure.workload.identity/use: "true"
    spec:
      serviceAccountName: app-service-account
      containers:
      - name: app
        image: myregistry.azurecr.io/myapp:v1.0.0
        env:
        - name: DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-secrets-k8s
              key: db-password
        - name: API_KEY
          valueFrom:
            secretKeyRef:
              name: app-secrets-k8s
              key: api-key
      # Still need volume mount to trigger secret sync
      volumes:
      - name: secrets-store
        csi:
          driver: secrets-store.csi.k8s.io
          readOnly: true
          volumeAttributes:
            secretProviderClass: "app-secrets"

Important: Here’s something that trips people up—the volume must be mounted (even if you’re not directly using it) for the sync to occur. Don’t skip this step.

Pattern 3: Certificate Mounting for TLS

# secret-provider-class-tls.yaml
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: app-tls-secrets
  namespace: production
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    useVMManagedIdentity: "false"
    clientID: "${WORKLOAD_IDENTITY_CLIENT_ID}"
    keyvaultName: "prod-app-kv"
    objects: |
      array:
        - |
          objectName: tls-certificate
          objectType: cert
          objectAlias: tls.crt
        - |
          objectName: tls-private-key
          objectType: secret
          objectAlias: tls.key
    tenantId: "${AZURE_TENANT_ID}"
  secretObjects:
  - secretName: app-tls-secret
    type: kubernetes.io/tls
    data:
    - objectName: tls.crt
      key: tls.crt
    - objectName: tls.key
      key: tls.key
---
# Ingress using the TLS secret
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  namespace: production
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls-secret  # Auto-synced from Key Vault
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-service
            port:
              number: 80

Secret Rotation Strategy

Automatic Rotation Configuration

One of my favorite features of this setup is automatic rotation. The CSI driver handles this for you:

# Configure rotation interval (already set in Terraform)
az aks update \
  --resource-group production-rg \
  --name prod-aks-cluster \
  --enable-secret-rotation \
  --rotation-poll-interval 2m

How it works:

  1. CSI driver polls Key Vault every rotation-poll-interval
  2. If secret version changed, updates mounted files
  3. Application must reload secrets from files

Now here’s the catch: the CSI driver updates the files, but your application needs to actually reload them. Let me show you two approaches I’ve used successfully.

Application-Side Reload Implementation

Option 1: File Watcher (Recommended)

// Go example using fsnotify
package main

import (
    "log"
    "os"

    "github.com/fsnotify/fsnotify"
)

func watchSecrets(secretPath string) {
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        log.Fatalf("creating watcher: %v", err)
    }
    defer watcher.Close()

    if err := watcher.Add(secretPath); err != nil {
        log.Fatalf("watching %s: %v", secretPath, err)
    }

    for {
        select {
        case event := <-watcher.Events:
            // Rotation may rewrite files atomically (create + rename),
            // so react to Create as well as Write events.
            if event.Op&(fsnotify.Write|fsnotify.Create) != 0 {
                log.Println("Secret updated, reloading...")
                reloadConfig()
            }
        case err := <-watcher.Errors:
            log.Printf("watcher error: %v", err)
        }
    }
}

func reloadConfig() {
    dbPassword, err := os.ReadFile("/mnt/secrets-store/database-password")
    if err != nil {
        log.Printf("reading secret: %v", err)
        return
    }
    // Reconnect to the database with the new password
    reinitializeDBConnection(string(dbPassword))
}

func reinitializeDBConnection(password string) {
    // Application-specific: rebuild the connection pool with the new credential
}

Option 2: Sidecar with SIGHUP

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-with-reload
spec:
  template:
    spec:
      shareProcessNamespace: true  # Required so the sidecar can signal the app's process
      containers:
      - name: app
        image: myapp:v1.0.0
        volumeMounts:
        - name: secrets-store
          mountPath: "/mnt/secrets-store"
      # Sidecar container
      - name: secret-reload-sidecar
        image: busybox:latest
        command:
        - /bin/sh
        - -c
        - |
          LAST_MODIFIED=$(stat -c %Y /mnt/secrets-store/database-password)
          while true; do
            sleep 30
            CURRENT_MODIFIED=$(stat -c %Y /mnt/secrets-store/database-password)
            if [ "$CURRENT_MODIFIED" != "$LAST_MODIFIED" ]; then
              echo "Secret changed, sending SIGHUP to app"
              killall -HUP myapp
              LAST_MODIFIED=$CURRENT_MODIFIED
            fi
          done
        volumeMounts:
        - name: secrets-store
          mountPath: "/mnt/secrets-store"
      volumes:
      - name: secrets-store
        csi:
          driver: secrets-store.csi.k8s.io
          readOnly: true
          volumeAttributes:
            secretProviderClass: "app-secrets"
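
The sidecar's change detection is just mtime polling. Here is the same logic in isolation, runnable locally — a temp file stands in for the mounted secret:

```shell
f="$(mktemp)"
echo "old-password" > "$f"
LAST_MODIFIED="$(stat -c %Y "$f")"

sleep 1
echo "new-password" > "$f"   # simulates the CSI driver rotating the secret

CURRENT_MODIFIED="$(stat -c %Y "$f")"
if [ "$CURRENT_MODIFIED" != "$LAST_MODIFIED" ]; then
  echo "secret changed"
fi
rm -f "$f"
```

Note that `stat -c %Y` has one-second resolution, which is fine for a 30-second polling loop.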

Troubleshooting Common Issues

I’ve spent more hours than I’d like to admit debugging these issues. Here are the most common problems and how to fix them.

Issue 1: Pod fails with “failed to get keyvault client”

Diagnosis:

# Check pod events
kubectl describe pod <pod-name> -n production

# Check CSI driver logs
kubectl logs -n kube-system -l app=csi-secrets-store-provider-azure --tail=100

Common causes:

  • Workload identity not properly configured
  • Service account missing annotation
  • Federated credential subject mismatch

Fix:

# Verify service account annotation
kubectl get sa app-service-account -n production -o yaml | grep azure.workload.identity

# Verify federated credential subject matches
az identity federated-credential show \
  --name aks-federated-credential \
  --identity-name aks-workload-identity \
  --resource-group production-rg \
  --query subject

Issue 2: Secrets not syncing to Kubernetes secret

Diagnosis:

# Check if Kubernetes secret created
kubectl get secret app-secrets-k8s -n production

# Verify volume mount in pod
kubectl exec -it <pod-name> -n production -- ls -la /mnt/secrets-store

Fix: Ensure secretObjects defined in SecretProviderClass and volume mounted in pod spec.

Issue 3: Permission denied accessing Key Vault

Diagnosis:

# Check role assignments
az role assignment list \
  --assignee $IDENTITY_PRINCIPAL_ID \
  --scope /subscriptions/{sub}/resourceGroups/production-rg/providers/Microsoft.KeyVault/vaults/prod-app-kv

Fix:

# Grant proper role
az role assignment create \
  --role "Key Vault Secrets User" \
  --assignee $IDENTITY_PRINCIPAL_ID \
  --scope /subscriptions/{sub}/resourceGroups/production-rg/providers/Microsoft.KeyVault/vaults/prod-app-kv

Security Best Practices

1. Use Separate Key Vaults per Environment

# Terraform: Environment-specific Key Vaults
locals {
  environments = ["dev", "staging", "prod"]
}

resource "azurerm_key_vault" "env" {
  for_each = toset(local.environments)

  name                = "${each.key}-app-kv"
  resource_group_name = azurerm_resource_group.aks.name
  location            = azurerm_resource_group.aks.location
  tenant_id           = data.azurerm_client_config.current.tenant_id
  sku_name            = "standard"

  enable_rbac_authorization = true
  purge_protection_enabled  = each.key == "prod"
}

2. Enable Key Vault Diagnostic Logging

az monitor diagnostic-settings create \
  --name kv-audit-logs \
  --resource /subscriptions/{sub}/resourceGroups/production-rg/providers/Microsoft.KeyVault/vaults/prod-app-kv \
  --logs '[{"category": "AuditEvent", "enabled": true}]' \
  --workspace /subscriptions/{sub}/resourceGroups/production-rg/providers/Microsoft.OperationalInsights/workspaces/prod-workspace
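
With logs flowing to Log Analytics, secret reads become queryable. A sketch in KQL against the AzureDiagnostics table (column names depend on your diagnostics schema — verify against your workspace):

```kusto
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.KEYVAULT"
| where OperationName == "SecretGet"
| project TimeGenerated, Resource, CallerIPAddress, ResultSignature
| order by TimeGenerated desc
```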

3. Implement Network Restrictions

resource "azurerm_key_vault" "app" {
  name = "prod-app-kv"
  # ... other config

  network_acls {
    default_action = "Deny"
    bypass         = "AzureServices"

    # Allow only from AKS subnet
    virtual_network_subnet_ids = [
      azurerm_subnet.aks.id
    ]
  }
}

4. Use Pod Security Standards

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

Performance Considerations

Caching and Performance Impact

Let’s talk about what this costs you in terms of performance. The good news is that the CSI driver is pretty smart about caching:

  • Initial mount: ~200-500ms (Key Vault API call)
  • Subsequent pod starts (same node): <10ms (cached)
  • Rotation polling: 2-minute interval (configurable)

In practice, I’ve found this overhead to be negligible. In non-critical environments, you can also cut Key Vault API traffic by polling less frequently:

Optimization:

# Poll less often in non-critical environments
# Via Terraform
key_vault_secrets_provider {
  secret_rotation_enabled  = true
  secret_rotation_interval = "10m"  # Longer interval = fewer Key Vault API calls
}

Migration Strategy from Kubernetes Secrets

Phase 1: Parallel Operation

# Keep existing Kubernetes secret, add Key Vault mount
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-migration
spec:
  template:
    spec:
      containers:
      - name: app
        env:
        # Old: From Kubernetes secret (fallback)
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: old-k8s-secret
              key: password
        # New: From Key Vault (preferred)
        - name: DB_PASSWORD_KV
          valueFrom:
            secretKeyRef:
              name: app-secrets-k8s
              key: db-password
        # Application checks DB_PASSWORD_KV first, falls back to DB_PASSWORD

Phase 2: Application Update

Update application to prefer Key Vault source.
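If changing application code isn't feasible right away, the preference can live in a small entrypoint wrapper instead. A sketch, with demo values standing in for the injected secrets (variable names match the deployment above):

```shell
# Values injected by Kubernetes in the real deployment (demo stand-ins here)
DB_PASSWORD_KV="from-key-vault"   # from app-secrets-k8s
DB_PASSWORD="from-old-secret"     # from old-k8s-secret

# Prefer the Key Vault-sourced value; fall back to the legacy secret
DB_PASSWORD="${DB_PASSWORD_KV:-$DB_PASSWORD}"
export DB_PASSWORD
echo "using: $DB_PASSWORD"
```

The `${VAR:-fallback}` expansion keeps the old value working until the Key Vault sync is validated.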

Phase 3: Remove Old Secrets

# After validation, remove Kubernetes secrets
kubectl delete secret old-k8s-secret -n production

Conclusion

If you’re running production workloads on AKS, you need enterprise-grade secret management. Period. Azure Key Vault CSI Driver with Workload Identity gives you exactly that: centralized secret storage, automatic rotation, audit logging, and fine-grained RBAC—all the things that native Kubernetes secrets simply can’t deliver.

Here are my key takeaways from implementing this across multiple production environments:

  • Always use Workload Identity over pod-managed identity (trust me on this one)
  • Implement application-side reload for rotation—don’t assume it’ll just work
  • Enable diagnostic logging from day one for audit trails
  • Use separate Key Vaults per environment (no shortcuts here)
  • Monitor CSI driver performance and caching behavior

Follow these patterns, and you’ll have secure, scalable secret management that actually meets enterprise security standards. I’ve seen too many teams try to cut corners with secret management, and it always comes back to bite them.


About StriveNimbus: We specialize in AKS security architecture, including zero-trust implementations, workload identity migrations, and compliance automation. Our team has secured hundreds of production AKS clusters across regulated industries. Contact us for security assessment and implementation support.