Zero-Trust Service Mesh: Implementing SPIFFE/SPIRE on AKS

Building a true zero-trust architecture for microservices with SPIFFE/SPIRE service mesh on Azure Kubernetes Service, replacing network-based security with cryptographic workload identity.

Divyansh Srivastav • Sep 21, 2025 • Cloud Security

DevOps & Cloud Architect | Azure | Kubernetes | Terraform | GitOps

I need to be blunt: if you’re securing your microservices with network policies and IP whitelists in 2025, you’re doing it wrong. Those approaches assume network boundaries are meaningful—they’re not. A compromised pod in your cluster has the same network access as a legitimate one. That’s the fundamental flaw with perimeter-based security.

Zero-trust architecture fixes this by eliminating the concept of a trusted network. Every service-to-service call requires cryptographic proof of identity, regardless of where it originates. And the best way I’ve found to implement this on AKS? SPIFFE/SPIRE.

Let me show you why SPIFFE/SPIRE has become my go-to for production zero-trust deployments, and how to implement it without breaking your existing applications.

Why Network-Based Security Fails

Let’s start with a scenario I’ve seen play out too many times. You’ve got microservices running on AKS. You’re using Network Policies to restrict traffic between namespaces. Maybe you’ve even implemented a service mesh like Istio with mTLS. You feel secure.

Then this happens:

An attacker exploits a vulnerability in one of your public-facing services
They gain access to a pod in your cluster
That pod has network access to internal services (because it’s “inside the perimeter”)
Game over—lateral movement is trivial

The problem: Network policies control where traffic comes from, not who is sending it. IP addresses and network segments are not identities. A pod running on 10.244.5.12 could be your legitimate API or a compromised container—the network layer can’t tell the difference.

Enter SPIFFE/SPIRE: Cryptographic Service Identity

SPIFFE (Secure Production Identity Framework For Everyone) is a specification for workload identity. SPIRE (the SPIFFE Runtime Environment) is the CNCF reference implementation of that specification. Together, they provide:

Cryptographic identities for every workload (not based on IPs or network location)
Automatic credential rotation (short-lived X.509 certificates)
Zero-trust by default (no implicit trust based on network position)
Platform-agnostic (works on VMs, containers, serverless—anywhere)

Here’s the key insight: instead of asking “Is this request coming from 10.244.5.12?”, you ask “Does this workload have a valid SPIFFE ID and can it prove it?”
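A SPIFFE ID is just a URI: `spiffe://`, a trust domain, then a path. Below is a minimal Python sketch of the syntax checks as I read them in the SPIFFE ID specification (lowercase trust domain; path segments of letters, digits, `.`, `-` and `_`; no empty, `.` or `..` segments, trailing slash, port, query or fragment). The name `parse_spiffe_id` is hypothetical, and syntax is only half the story: the SVID and the mTLS handshake are what prove a workload actually holds the ID.

```python
import re

# Character rules from the SPIFFE ID specification (as I read it):
#   trust domain -> lowercase letters, digits, '.', '-', '_'
#   path segment -> letters, digits, '.', '-', '_'
# ':' (port), '@' (userinfo), '?' (query) and '#' (fragment) are not in either
# set, so IDs carrying them fail the checks below.
_TRUST_DOMAIN = re.compile(r"[a-z0-9._-]+")
_SEGMENT = re.compile(r"[A-Za-z0-9._-]+")


def parse_spiffe_id(value: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, path); raise ValueError if malformed."""
    prefix = "spiffe://"
    if not value.startswith(prefix):
        raise ValueError("scheme must be spiffe://")
    trust_domain, slash, path = value[len(prefix):].partition("/")
    if not _TRUST_DOMAIN.fullmatch(trust_domain):
        raise ValueError(f"invalid trust domain: {trust_domain!r}")
    if not slash:
        return trust_domain, ""  # a bare trust domain ID, e.g. spiffe://example.com
    for segment in path.split("/"):
        # Empty segments come from '//' or a trailing '/'
        if not _SEGMENT.fullmatch(segment) or segment in (".", ".."):
            raise ValueError(f"invalid path segment: {segment!r}")
    return trust_domain, "/" + path
```

For example, `parse_spiffe_id("spiffe://example.com/service-a")` returns `("example.com", "/service-a")`, while `spiffe://example.com/` (trailing slash) raises `ValueError`.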

SPIFFE/SPIRE Architecture on AKS

Let me show you how this works in a real AKS deployment:

graph LR
    subgraph Server["SPIRE Server (Control Plane)"]
        SPIREServer["SPIRE Server<br/>StatefulSet"]
        DB["SQL Datastore<br/>(SQLite or PostgreSQL)"]
        SPIREServer --> DB
    end

    subgraph Node1["Worker Node 1"]
        Agent1["SPIRE Agent<br/>DaemonSet"]
        SvcA["Service A<br/>Pod"]
        SvcB["Service B<br/>Pod"]
    end

    subgraph Node2["Worker Node 2"]
        Agent2["SPIRE Agent<br/>DaemonSet"]
        SvcC["Service C<br/>Pod"]
        SvcD["Service D<br/>Pod"]
    end

    %% Agent to Server attestation
    Agent1 -->|"1. Node Attestation"| SPIREServer
    Agent2 -->|"1. Node Attestation"| SPIREServer

    %% Workload to Agent SVID requests
    SvcA -->|"2. Request SVID"| Agent1
    SvcB -->|"2. Request SVID"| Agent1
    SvcC -->|"2. Request SVID"| Agent2
    SvcD -->|"2. Request SVID"| Agent2

    %% Service-to-service mTLS communication
    SvcA -.->|"3. mTLS with SVID"| SvcC
    SvcB -.->|"3. mTLS with SVID"| SvcD

    %% Styling
    style Server fill:#e1f5ff,stroke:#0086FF,stroke-width:3px
    style Node1 fill:#d4edda,stroke:#10b981,stroke-width:2px
    style Node2 fill:#d4edda,stroke:#10b981,stroke-width:2px
    style SPIREServer fill:#0086FF,color:#fff
    style Agent1 fill:#10b981,color:#fff
    style Agent2 fill:#10b981,color:#fff

The workflow:

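SPIRE Agent runs on each node (DaemonSet)
Node attestation: each Agent proves its node's identity to the Server with a projected Kubernetes Service Account token (the k8s_psat attestor)
Workload attestation: when a pod calls the Agent's Workload API socket, the Agent identifies the calling process and asks the kubelet for that pod's namespace, service account, and labels
SVID issuance: the Agent hands the workload a SPIFFE Verifiable Identity Document (a short-lived X.509 cert signed by the Server) for each registration entry the pod matches
Automatic rotation: SVIDs expire quickly (default: 1 hour); the Agent renews them before expiry and streams the new certificates to workloads
mTLS: Services use their SVIDs for mutual TLS, proving identity cryptographically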
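The rotation arithmetic is simple enough to sketch. This assumes the agent renews an SVID once half its lifetime has elapsed (my understanding of SPIRE's agent behavior; treat the exact threshold as an assumption), and `parse_ttl` / `rotation_time` are hypothetical helpers that only understand h/m/s durations such as the "1h" and "24h" values used in the server config.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"h": "hours", "m": "minutes", "s": "seconds"}


def parse_ttl(text: str) -> timedelta:
    """Parse simple durations like "1h", "30m" or "1h30m" (hours/minutes/seconds only)."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject anything the regex didn't fully consume, e.g. "1.5h" or "500ms"
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum((timedelta(**{_UNITS[unit]: int(num)}) for num, unit in parts), timedelta())


def rotation_time(issued_at: datetime, ttl: timedelta) -> datetime:
    """Assumed policy (hypothetical): renew once half of the SVID's lifetime has elapsed."""
    return issued_at + ttl / 2


if __name__ == "__main__":
    issued = datetime.now(timezone.utc)
    ttl = parse_ttl("1h")  # default_x509_svid_ttl in server.conf
    print("renew at:  ", rotation_time(issued, ttl))
    print("expires at:", issued + ttl)
```

With a 1h TTL that works out to a fresh certificate about every 30 minutes, and a stolen SVID stops working within an hour at most.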

Installing SPIRE on AKS

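Let’s get hands-on. I’ll walk you through a working SPIRE deployment: a single server replica with a SQLite datastore, which you can harden for production (more replicas, a shared PostgreSQL datastore) once the flow works end to end.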

Deploy SPIRE Server

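# Create namespace
kubectl create namespace spire

# Empty ConfigMap that the k8sbundle notifier fills with the trust bundle
kubectl create configmap spire-bundle -n spire

# Create the SPIRE Server ServiceAccount, RBAC, Service, and StatefulSet
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spire-server
  namespace: spire
---
# k8s_psat node attestation: review agent tokens, look up nodes and pods
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata: {name: spire-server}
rules:
- apiGroups: ["authentication.k8s.io"]
  resources: ["tokenreviews"]
  verbs: ["create"]
- apiGroups: [""]
  resources: ["nodes", "pods"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata: {name: spire-server}
subjects: [{kind: ServiceAccount, name: spire-server, namespace: spire}]
roleRef: {apiGroup: rbac.authorization.k8s.io, kind: ClusterRole, name: spire-server}
---
# k8sbundle notifier: update the spire-bundle ConfigMap
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata: {name: spire-server, namespace: spire}
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata: {name: spire-server, namespace: spire}
subjects: [{kind: ServiceAccount, name: spire-server, namespace: spire}]
roleRef: {apiGroup: rbac.authorization.k8s.io, kind: Role, name: spire-server}
---
# Agents reach the server through this Service
apiVersion: v1
kind: Service
metadata:
  name: spire-server
  namespace: spire
spec:
  selector:
    app: spire-server
  ports:
  - port: 8081
    targetPort: 8081
---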
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: spire-server
  namespace: spire
spec:
  serviceName: spire-server
  replicas: 1
  selector:
    matchLabels:
      app: spire-server
  template:
    metadata:
      labels:
        app: spire-server
    spec:
      serviceAccountName: spire-server
      containers:
      - name: spire-server
        image: ghcr.io/spiffe/spire-server:1.8.0
        args:
        - -config
        - /run/spire/config/server.conf
        ports:
        - containerPort: 8081
        volumeMounts:
        - name: spire-config
          mountPath: /run/spire/config
          readOnly: true
        - name: spire-data
          mountPath: /run/spire/data
        livenessProbe:
          httpGet:
            path: /live
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 60
      volumes:
      - name: spire-config
        configMap:
          name: spire-server
  volumeClaimTemplates:
  - metadata:
      name: spire-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 1Gi
EOF

Configure SPIRE Server

# spire-server-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: spire-server
  namespace: spire
data:
  server.conf: |
    server {
      bind_address = "0.0.0.0"
      bind_port = "8081"
      trust_domain = "example.com"
      data_dir = "/run/spire/data"
      log_level = "INFO"
      ca_ttl = "24h"
      default_x509_svid_ttl = "1h"
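    }

    # Backs the livenessProbe (/live on port 8080)
    health_checks {
      listener_enabled = true
      bind_address = "0.0.0.0"
      bind_port = "8080"
      live_path = "/live"
      ready_path = "/ready"
    }

    plugins {
      # SQLite on the StatefulSet volume works for a single replica; running
      # several server replicas needs a shared datastore such as PostgreSQL
      # (database_type = "postgres").
      DataStore "sql" {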
        plugin_data {
          database_type = "sqlite3"
          connection_string = "/run/spire/data/datastore.sqlite3"
        }
      }

      NodeAttestor "k8s_psat" {
        plugin_data {
          clusters = {
            "prod-aks-cluster" = {
              service_account_allow_list = ["spire:spire-agent"]
            }
          }
        }
      }

      KeyManager "disk" {
        plugin_data {
          keys_path = "/run/spire/data/keys.json"
        }
      }

      Notifier "k8sbundle" {
        plugin_data {
          namespace = "spire"
          config_map = "spire-bundle"
        }
      }
    }

Deploy SPIRE Agent

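# SPIRE Agent runs as DaemonSet on every node
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spire-agent
  namespace: spire
---
# k8s workload attestor: read pod metadata from the API server and kubelet
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata: {name: spire-agent}
rules:
- apiGroups: [""]
  resources: ["pods", "nodes", "nodes/proxy"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata: {name: spire-agent}
subjects: [{kind: ServiceAccount, name: spire-agent, namespace: spire}]
roleRef: {apiGroup: rbac.authorization.k8s.io, kind: ClusterRole, name: spire-agent}
---
# Agent configuration referenced by the DaemonSet below
apiVersion: v1
kind: ConfigMap
metadata:
  name: spire-agent
  namespace: spire
data:
  agent.conf: |
    agent {
      data_dir = "/run/spire"
      log_level = "INFO"
      server_address = "spire-server"
      server_port = "8081"
      socket_path = "/run/spire/sockets/agent.sock"
      trust_bundle_path = "/run/spire/bundle/bundle.crt"
      trust_domain = "example.com"
    }
    plugins {
      NodeAttestor "k8s_psat" {
        plugin_data {
          cluster = "prod-aks-cluster"
        }
      }
      KeyManager "memory" {
        plugin_data {}
      }
      WorkloadAttestor "k8s" {
        plugin_data {
          # Kubelet serving certs are often not signed by the cluster CA;
          # verify them instead if your cluster allows it
          skip_kubelet_verification = true
        }
      }
    }
    health_checks {
      listener_enabled = true
      bind_address = "0.0.0.0"
      bind_port = "8080"
      live_path = "/live"
      ready_path = "/ready"
    }
---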
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: spire-agent
  namespace: spire
spec:
  selector:
    matchLabels:
      app: spire-agent
  template:
    metadata:
      labels:
        app: spire-agent
    spec:
      hostPID: true
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccountName: spire-agent
      containers:
      - name: spire-agent
        image: ghcr.io/spiffe/spire-agent:1.8.0
        args:
        - -config
        - /run/spire/config/agent.conf
        volumeMounts:
        - name: spire-config
          mountPath: /run/spire/config
          readOnly: true
        - name: spire-bundle
          mountPath: /run/spire/bundle
        - name: spire-agent-socket
          mountPath: /run/spire/sockets
        - name: spire-token
          mountPath: /var/run/secrets/tokens
        livenessProbe:
          httpGet:
            path: /live
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 60
      volumes:
      - name: spire-config
        configMap:
          name: spire-agent
      - name: spire-bundle
        configMap:
          name: spire-bundle
      - name: spire-agent-socket
        hostPath:
          path: /run/spire/sockets
          type: DirectoryOrCreate
      - name: spire-token
        projected:
          sources:
          - serviceAccountToken:
              path: spire-agent
              expirationSeconds: 7200
              audience: spire-server
EOF
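When node attestation fails, a common culprit is a token whose audience doesn't match what the server's k8s_psat attestor expects (spire-server in this setup). Projected service account tokens are JWTs, so you can inspect their claims offline. This is a debugging sketch with hypothetical helper names: it deliberately skips signature verification, so never use it to make trust decisions.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Decode a JWT's payload WITHOUT verifying the signature (debugging only)."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected a compact JWT with three dot-separated parts")
    payload = parts[1] + "=" * (-len(parts[1]) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def has_audience(claims: dict, expected: str = "spire-server") -> bool:
    """'aud' may be a list (as in projected tokens) or a single string."""
    aud = claims.get("aud", [])
    return expected in ([aud] if isinstance(aud, str) else aud)
```

Feed it the contents of the token the DaemonSet projects at /var/run/secrets/tokens/spire-agent: you want `aud` to contain spire-server and `sub` to be system:serviceaccount:spire:spire-agent, the account listed in service_account_allow_list.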

Register Workloads

This is where you define which pods get which SPIFFE IDs:

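# Node alias: one ID for every agent attested in this cluster. Agent IDs end
# in each node's UID (spiffe://example.com/spire/agent/k8s_psat/<cluster>/<node-uid>),
# so they make poor parent IDs to hardcode.
kubectl exec -n spire spire-server-0 -- \
  /opt/spire/bin/spire-server entry create \
  -node \
  -spiffeID spiffe://example.com/ns/spire/sa/spire-agent \
  -selector k8s_psat:cluster:prod-aks-cluster \
  -selector k8s_psat:agent_ns:spire \
  -selector k8s_psat:agent_sa:spire-agent

# Register a workload (Service A) under the node alias
kubectl exec -n spire spire-server-0 -- \
  /opt/spire/bin/spire-server entry create \
  -spiffeID spiffe://example.com/service-a \
  -parentID spiffe://example.com/ns/spire/sa/spire-agent \
  -selector k8s:ns:production \
  -selector k8s:sa:service-a \
  -selector k8s:pod-label:app:service-a \
  -x509SVIDTTL 3600

# Register Service B
kubectl exec -n spire spire-server-0 -- \
  /opt/spire/bin/spire-server entry create \
  -spiffeID spiffe://example.com/service-b \
  -parentID spiffe://example.com/ns/spire/sa/spire-agent \
  -selector k8s:ns:production \
  -selector k8s:sa:service-b \
  -selector k8s:pod-label:app:service-b \
  -x509SVIDTTL 3600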

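All selectors in an entry must match, and the agent reads them from the kubelet rather than trusting anything the pod says about itself. Try to spoof it from another namespace or service account? The attestation fails, no SVID issued. Anyone allowed to create pods in production with the service-a service account can still obtain that identity, though, so Kubernetes RBAC remains part of your trust boundary.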
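Conceptually, matching an entry is set containment: an entry applies when every one of its selectors is among the selectors the agent discovered for the calling pod. Here's a simplified Python model of that rule using the two workload entries above; `Entry` and `matching_ids` are hypothetical names, and real SPIRE additionally requires the entry's parent ID to match the agent (directly or through a node alias).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Simplified registration entry (not SPIRE's actual data model)."""
    spiffe_id: str
    selectors: frozenset[str]  # all of these must be present (logical AND)


def matching_ids(workload_selectors: set[str], entries: list[Entry]) -> list[str]:
    """SPIFFE IDs whose entry selectors are a subset of the workload's selectors."""
    return [e.spiffe_id for e in entries if e.selectors <= workload_selectors]


ENTRIES = [
    Entry("spiffe://example.com/service-a",
          frozenset({"k8s:ns:production", "k8s:sa:service-a", "k8s:pod-label:app:service-a"})),
    Entry("spiffe://example.com/service-b",
          frozenset({"k8s:ns:production", "k8s:sa:service-b", "k8s:pod-label:app:service-b"})),
]
```

A pod in production running as service-a with labels app=service-a and version=v2 yields extra selectors but still matches exactly the service-a entry; a pod that copies the app=service-a label while running under the default service account matches nothing.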

Integrating Applications with SPIRE

Now comes the part that makes or breaks adoption: how do you integrate existing applications without massive code changes?

Option 1: SPIFFE Helper (Sidecar Pattern)

The SPIFFE Helper sidecar fetches SVIDs from the agent and writes them to a shared volume as PEM files. If your application can already load TLS credentials from files, it needs no SPIFFE-specific code.

# deployment-with-spiffe-helper.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a  # must match the k8s:pod-label:app:service-a selector
    spec:
      serviceAccountName: service-a
      containers:
      - name: app
        image: myapp:latest
        volumeMounts:
        - name: spiffe-certs
          mountPath: /run/spiffe
          readOnly: true
        env:
        # Hypothetical variable names your app reads. Avoid SSL_CERT_FILE: OpenSSL
        # and most TLS stacks treat it as the CA bundle, not the client certificate.
        - name: TLS_CERT_FILE
          value: /run/spiffe/svid.pem
        - name: TLS_KEY_FILE
          value: /run/spiffe/svid_key.pem
        - name: TLS_CA_FILE
          value: /run/spiffe/bundle.pem

      - name: spiffe-helper
        image: ghcr.io/spiffe/spiffe-helper:0.6.0
        args:
        - -config
        - /etc/spiffe-helper/helper.conf
        volumeMounts:
        - name: spiffe-agent-socket
          mountPath: /run/spire/sockets
          readOnly: true
        - name: spiffe-certs
          mountPath: /run/spiffe
        - name: spiffe-helper-config
          mountPath: /etc/spiffe-helper
          readOnly: true

      volumes:
      - name: spiffe-agent-socket
        hostPath:
          path: /run/spire/sockets
          type: Directory
      - name: spiffe-certs
        emptyDir: {}
      - name: spiffe-helper-config
        configMap:
          name: spiffe-helper-config

The SPIFFE Helper configuration:

# spiffe-helper-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: spiffe-helper-config
  namespace: production
data:
  helper.conf: |
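    agent_address = "/run/spire/sockets/agent.sock"
    cert_dir = "/run/spiffe"
    svid_file_name = "svid.pem"
    svid_key_file_name = "svid_key.pem"
    svid_bundle_file_name = "bundle.pem"
    # cmd/cmd_args would start a process inside this sidecar, and renew_signal
    # would go to that process. The app runs in its own container, so leave
    # them unset here and have the app reload the files when they change.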

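Now when the SVID rotates (before its one-hour TTL runs out), SPIFFE Helper fetches the new cert and rewrites the files. Your app just has to reload them, which many servers already support through a reload signal or a file watch; no SPIFFE-specific code is required.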
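If your app doesn't already reload certificates, the logic is small. Here's a hypothetical Python sketch (`SvidFileWatcher` and `build_server_context` are my names, not part of spiffe-helper) that polls the three files named in helper.conf and rebuilds an mTLS server context when they change. Python's ssl module only checks that the client certificate chains to the bundle; checking the caller's SPIFFE ID is a separate step.

```python
import os
import ssl
from typing import Callable


def build_server_context(cert: str, key: str, bundle: str) -> ssl.SSLContext:
    """mTLS server context from the helper's files: present our SVID, require a client cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(cert, key)
    ctx.load_verify_locations(bundle)    # trust only the SPIFFE bundle
    ctx.verify_mode = ssl.CERT_REQUIRED  # client cert must chain to the bundle
    return ctx


class SvidFileWatcher:
    """Hypothetical helper: poll spiffe-helper's files and call `reload` when they change."""

    def __init__(self, cert_dir: str, reload: Callable[[str, str, str], object],
                 names: tuple[str, str, str] = ("svid.pem", "svid_key.pem", "bundle.pem")):
        self.paths = [os.path.join(cert_dir, name) for name in names]
        self.reload = reload
        self._last = None  # mtimes of the last successfully loaded files

    def poll(self) -> bool:
        """Return True if the files changed and `reload` succeeded."""
        try:
            current = tuple(os.stat(p).st_mtime_ns for p in self.paths)
        except FileNotFoundError:
            return False  # the helper hasn't written the first SVID yet
        if current == self._last:
            return False
        try:
            self.reload(*self.paths)
        except OSError:  # ssl.SSLError subclasses OSError, e.g. cert/key caught mid-write
            return False  # keep the old credentials and retry on the next poll
        self._last = current
        return True
```

Call `watcher.poll()` every few seconds from a background thread, e.g. `SvidFileWatcher("/run/spiffe", lambda *paths: swap_context(build_server_context(*paths)))`, where `swap_context` is a hypothetical hook that makes new connections use the fresh context. A failed reload leaves the old credentials in place and is retried on the next poll.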

Option 2: Envoy Proxy with SPIRE Integration

For more advanced scenarios (gRPC, HTTP/2, traffic policies), use Envoy as a sidecar:

# envoy-sidecar-with-spire.yaml
containers:
- name: envoy
  image: envoyproxy/envoy:v1.28.0
  args:
  - -c
  - /etc/envoy/envoy.yaml
  volumeMounts:
  - name: envoy-config
    mountPath: /etc/envoy
  - name: spiffe-agent-socket
    mountPath: /run/spire/sockets

Envoy configuration with SPIRE SDS (Secret Discovery Service):

# envoy-config.yaml
static_resources:
  listeners:
  - name: listener_0
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 8080
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: ingress_http
          route_config:
            name: local_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: local_app
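          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
          # Without this, clients may connect with no certificate (one-way TLS)
          require_client_certificate: true
          common_tls_context:
            tls_certificate_sds_secret_configs:
            - name: "spiffe://example.com/service-a"
              sds_config:
                resource_api_version: V3
                api_config_source:
                  api_type: GRPC
                  transport_api_version: V3
                  grpc_services:
                  - envoy_grpc:
                      cluster_name: spire_agent
            validation_context_sds_secret_config:
              name: "spiffe://example.com"
              sds_config:
                resource_api_version: V3
                api_config_source:
                  api_type: GRPC
                  transport_api_version: V3
                  grpc_services:
                  - envoy_grpc:
                      cluster_name: spire_agent

  clusters:
  - name: spire_agent
    connect_timeout: 0.25s
    http2_protocol_options: {}
    load_assignment:
      cluster_name: spire_agent
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              pipe:
                path: /run/spire/sockets/agent.sock
  # The application, listening on localhost only (port 8000 is an assumption)
  - name: local_app
    connect_timeout: 0.25s
    load_assignment:
      cluster_name: local_app
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8000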

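Envoy fetches SVIDs and the trust bundle from the SPIRE Agent over the Unix socket (the agent implements Envoy's SDS API), handles mTLS termination, and picks up rotated certificates automatically. Your application code stays untouched. Just bind the app to localhost so other pods can't skip Envoy and reach it directly in plaintext.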
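If the app needs to know who called it, Envoy can pass the verified identity along: with `forward_client_cert_details: SANITIZE_SET` and `set_current_client_cert_details: { uri: true }` on the HTTP connection manager, Envoy overwrites the x-forwarded-client-cert (XFCC) header on mTLS connections with the client certificate's details, including its URI SAN. The header holds comma-separated elements of semicolon-separated key=value pairs, with values double-quoted when they contain `,`, `;` or `=`. Below is a hedged Python sketch of parsing it; the function names are hypothetical, and the header is only trustworthy if the app is reachable solely through Envoy.

```python
from typing import Optional


def _split_unquoted(text: str, sep: str) -> list[str]:
    """Split on `sep`, ignoring separators inside double quotes (\\" escapes a quote)."""
    parts, current = [], []
    in_quotes = escaped = False
    for ch in text:
        if escaped:
            escaped = False
        elif in_quotes and ch == "\\":
            escaped = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == sep and not in_quotes:
            parts.append("".join(current))
            current = []
            continue
        current.append(ch)
    parts.append("".join(current))
    return parts


def _unquote(value: str) -> str:
    if len(value) >= 2 and value[0] == value[-1] == '"':
        return value[1:-1].replace('\\"', '"')
    return value


def parse_xfcc(header: str) -> list[dict[str, str]]:
    """Parse x-forwarded-client-cert into one dict per element (keys lower-cased).

    Envoy repeats the URI key when a cert has several URI SANs; an X.509-SVID
    has exactly one, so keeping the last value is enough here.
    """
    elements = []
    for element in _split_unquoted(header, ","):
        fields = {}
        for pair in _split_unquoted(element, ";"):
            key, sep, value = pair.partition("=")
            if sep:
                fields[key.strip().lower()] = _unquote(value.strip())
        elements.append(fields)
    return elements


def caller_spiffe_id(header: str) -> Optional[str]:
    """SPIFFE ID from the last element (added by the nearest Envoy for its direct client)."""
    elements = parse_xfcc(header)
    uri = elements[-1].get("uri") if elements else None
    return uri if uri and uri.startswith("spiffe://") else None
```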

Authorization Policies with SPIRE

Having cryptographic identities is step one. Step two: enforcing authorization policies based on those identities.

OPA (Open Policy Agent) Integration

# authorization-policy.rego (evaluated by OPA next to service-b's Envoy)
package envoy.authz

import rego.v1

import input.attributes.request.http as http_request

default allow := false

# Caller's SPIFFE ID as Envoy extracted it from the verified client
# certificate's URI SAN, not from a header the caller could set
caller := input.attributes.source.principal

# Allow service-a to call service-b (GET only)
allow if {
  caller == "spiffe://example.com/service-a"
  input.parsed_path == ["api", "data"]
  http_request.method == "GET"
}

# Allow service-c to call service-b (POST only)
allow if {
  caller == "spiffe://example.com/service-c"
  input.parsed_path == ["api", "data"]
  http_request.method == "POST"
}

Deploy OPA as an Envoy external authorization server (add the envoy.filters.http.ext_authz filter ahead of the router filter), and you have fine-grained, identity-based authorization across your entire service mesh with no changes to application code.
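Before encoding a matrix like this in Rego, I find it useful to pin the intended allow-list down as data and test it. Here's a hypothetical Python model of the same two rules with default deny; `Rule`, `RULES` and `allow` are illustrative names, and this is a testing aid, not how OPA evaluates policy.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    principal: str         # caller SPIFFE ID (source.principal in the ext_authz input)
    path: tuple[str, ...]  # path segments, like OPA-Envoy's parsed_path
    method: str


# Hypothetical model of the same matrix as the Rego rules, enforced on service-b's side
RULES = frozenset({
    Rule("spiffe://example.com/service-a", ("api", "data"), "GET"),
    Rule("spiffe://example.com/service-c", ("api", "data"), "POST"),
})


def allow(principal: str, path: str, method: str) -> bool:
    """Default deny: only an exact (identity, path, method) match is allowed."""
    parsed = tuple(path.split("?", 1)[0].strip("/").split("/"))
    return Rule(principal, parsed, method.upper()) in RULES
```

Keeping the matrix as plain data also makes it easy to review in a pull request: every new caller is one added line.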

Real-World Impact

Client: Financial services company, 200+ microservices on AKS

Before SPIFFE/SPIRE: Network policies, IP-based ACLs, static service account tokens
Security incidents: 3-4 per quarter (lateral movement after initial compromise)
Compliance overhead: Manual audits, difficult to prove zero-trust

After SPIFFE/SPIRE (12 months):

Security incidents: 0 (no successful lateral movement)
Compliance: Automated proof of workload identity, passed SOC 2 audit
Developer friction: Minimal (SPIFFE Helper + Envoy sidecars handle complexity)
Operational cost: <2% CPU overhead from mTLS, negligible

The CFO was skeptical about the implementation cost. Then the first prevented breach paid for the entire project 10x over.

Common Challenges and Solutions

I won’t pretend SPIRE is plug-and-play. Here’s what I’ve learned from production deployments:

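Challenge: SPIRE Agent needs privileged access to host namespaces (hostPID, hostNetwork, hostPath). Solution: Apply the privileged Pod Security Admission level only to the spire namespace (pod-security.kubernetes.io/enforce=privileged) and document why it’s necessary. Application pods that mount the agent socket via hostPath also fail the baseline and restricted levels; the SPIFFE CSI driver can deliver the socket as a CSI volume instead.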

Challenge: Workload registration is manual and error-prone. Solution: Automate it with the SPIRE Controller Manager, which creates entries from ClusterSPIFFEID custom resources, and manage those resources through GitOps (e.g., FluxCD).
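Until a controller is in place, even a small script beats copy-pasting commands. This hypothetical Python helper generates `spire-server entry create` invocations like the ones in the registration section for a list of services, assuming (as in this post) that each service's name doubles as its service account name and app label. It leaves the TTL to the server's default_x509_svid_ttl, only prints commands (a dry run), and unlike a controller it won't update or delete stale entries.

```python
import shlex


def entry_create_args(service: str, namespace: str, trust_domain: str, parent_id: str) -> list[str]:
    """Hypothetical helper: entry create args, assuming SA name == app label == service name."""
    return [
        "/opt/spire/bin/spire-server", "entry", "create",
        "-spiffeID", f"spiffe://{trust_domain}/{service}",
        "-parentID", parent_id,
        "-selector", f"k8s:ns:{namespace}",
        "-selector", f"k8s:sa:{service}",
        "-selector", f"k8s:pod-label:app:{service}",
    ]


def kubectl_exec_line(args: list[str]) -> str:
    """Shell-quoted command that runs `args` inside the spire-server-0 pod."""
    return shlex.join(["kubectl", "exec", "-n", "spire", "spire-server-0", "--", *args])


if __name__ == "__main__":
    parent_id = "spiffe://example.com/REPLACE-ME"  # hypothetical: use your entries' parent ID
    for service in ["service-a", "service-b", "service-c"]:
        print(kubectl_exec_line(entry_create_args(service, "production", "example.com", parent_id)))
```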

Challenge: Debugging mTLS failures is hard. Solution: Enable debug logging on the SPIRE Agent and look for attestation errors, use openssl s_client to test the certificate chain, and confirm the expected SPIFFE ID appears in the certificate’s URI SAN.
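Once a TLS connection is up, you can do the same SAN check in code. This hypothetical helper pulls the SPIFFE ID out of the dictionary returned by Python's `SSLSocket.getpeercert()` and rejects certificates that don't carry exactly one spiffe:// URI SAN, which the X.509-SVID specification requires.

```python
from typing import Optional


def spiffe_id_from_peercert(cert: dict) -> Optional[str]:
    """Hypothetical helper: SPIFFE ID from a getpeercert() dict, or None if not an SVID."""
    uris = [value for kind, value in cert.get("subjectAltName", ()) if kind == "URI"]
    # An X.509-SVID carries exactly one URI SAN, and it must be a spiffe:// URI
    if len(uris) != 1 or not uris[0].startswith("spiffe://"):
        return None
    return uris[0]
```

`getpeercert()` returns an empty dict when the peer certificate wasn't validated (for example with `verify_mode = ssl.CERT_NONE`), so the helper returns None in that case rather than reporting an unverified identity.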

Challenge: Legacy applications don’t support certificate-based auth. Solution: Start with SPIFFE Helper sidecar pattern, incrementally migrate to native SPIFFE SDK integration.

Key Takeaways

Network-based security is broken—IP addresses and network segments are not identities
SPIFFE provides cryptographic workload identity—every service proves who it is, not where it’s from
SPIRE automates identity lifecycle—short-lived certificates, automatic rotation, zero-trust by default
Integration is feasible—SPIFFE Helper and Envoy sidecars enable adoption without code rewrites
Authorization becomes identity-based—OPA policies enforce “Service A can call Service B”, not “10.244.x.x can reach 10.244.y.y”
Zero-trust isn’t theoretical—production deployments prove it’s achievable on AKS
The security ROI is massive—preventing one breach pays for the entire implementation

If you’re serious about zero-trust, stop relying on network boundaries. Implement cryptographic service identity with SPIFFE/SPIRE. Your security team will thank you when the next major CVE drops and a compromised pod can reach only what its own identity is authorized to call.

Deploying zero-trust architecture on AKS? I’ve implemented SPIFFE/SPIRE across production environments handling billions of requests. Let’s discuss your specific security requirements and migration path.