Building an Internal Developer Platform That Cut Onboarding Time from 2 Weeks to 2 Days

How StriveNimbus helped a Series B SaaS company eliminate infrastructure friction by building a Backstage-powered IDP—reducing new developer onboarding from 2 weeks to 2 days, achieving Elite DORA metrics, and saving $695K annually.

Executive Summary

The CTO said something I’ll never forget: “We’re hiring great engineers, but our velocity isn’t increasing. New hires spend two weeks learning tribal knowledge before they can deploy anything. Something is fundamentally broken.”

This Series B SaaS company had grown from 40 to 180 engineers in 18 months—impressive growth, but their infrastructure was still built for a 10-person startup. No standardization. No self-service. Three DevOps engineers drowning in 180 support tickets per month. New developers spending their first two weeks asking “how do I deploy this?” instead of shipping features.

We helped them build an Internal Developer Platform (IDP) using Backstage, ArgoCD, and golden path automation. The transformation was remarkable: onboarding time dropped 90% (2 weeks → 2 days), deployment frequency increased 140%, and the platform team went from firefighting tickets to building strategic capabilities.

Key Outcomes:

  • New developer onboarding: 2 weeks → 2 days (90% reduction)
  • Time to first production deploy: 5-7 days → 3 hours (98% faster)
  • Infrastructure support tickets: 180/month → 22/month (88% reduction)
  • Deployment frequency: 4/week → 9.6/week (140% increase)
  • DORA metrics: Achieved Elite performer status across all 4 metrics
  • Onboarding satisfaction: 4.1/10 → 8.9/10
  • Annual value delivered: $695K (productivity gains + reduced incidents)

Client Background

This wasn’t a story of bad engineering—it was a story of hypergrowth outpacing infrastructure maturity.

The company had gone from seed stage to Series B in 3 years, growing from 10 engineers to 180. Their product was winning in the market, customers loved it, revenue was growing 3x year-over-year. By most measures, they were incredibly successful.

But beneath the surface, the cracks were showing.

Industry: Enterprise SaaS (Collaboration & Project Management Platform)

Team Size: 180 engineers across 15 product teams

Infrastructure Scale:

  • 200+ microservices (Node.js, Python, Go, Ruby)
  • Azure-native stack (AKS, PostgreSQL, CosmosDB, Event Hubs)
  • 8 environments (1 shared dev, 3 staging tiers, 4 production regions)
  • 800+ deployments per month (across all teams)
  • Hiring velocity: 15-20 engineers per quarter

The Problem Nobody Was Talking About:

In quarterly planning, the VP of Engineering presented a troubling slide: “We added 40 engineers last quarter, but our feature delivery velocity only increased 15%. Why aren’t we scaling linearly?”

The answer was hiding in plain sight: Infrastructure friction was eating productivity.

Senior engineers with 10+ years of experience were taking 2 weeks to understand the deployment process. New hires’ first production deployment averaged 5-7 days. One engineer told me during our discovery: “I spent my first three weeks just trying to figure out where the documentation was. By week four, I realized most of it didn’t exist—it was all in people’s heads.”

The 3-person platform team was underwater. They were handling 180 support tickets per month—everything from “How do I create a database?” to “My deployment failed and I don’t know why.” Every request required manual intervention. Context switching was constant. Burnout was real.

The VP of Engineering eventually admitted: “We’re spending $36M annually on engineering salaries, but my gut says 35% of their time is wasted on infrastructure instead of features. That’s $12.6M in misallocated productivity. We have to fix this.”

The Challenge: Tribal Knowledge and No Self-Service

Let me walk you through what it actually looked like to be a new developer at this company before we built the platform.

Week 1: “How Do I Even Get Started?”

You join as a new software engineer. You’re excited—you’ve got your laptop, Slack access, GitHub credentials. Your manager says, “Go ahead and set up your development environment and clone the repos.”

Then reality hits.

Documentation is scattered everywhere:

  • Some in Confluence (last updated 8 months ago, half the links are broken)
  • Some in Notion (3 different workspaces, nobody knows which is current)
  • Some in Google Docs (linked from old Slack messages, impossible to find)
  • Most in senior engineers’ heads (tribal knowledge acquired over years)

There’s no service catalog:

  • The company has 200+ microservices
  • Nobody has a complete list of what they all do
  • No ownership information (Who owns auth-service? Who knows?)
  • No architecture diagrams (someone made one 18 months ago, it’s completely outdated)
  • No dependency mapping (What happens if I change this API?)

Onboarding is entirely manual:

  • Fill out 5 different Google Forms for various access requests
  • Wait 2-3 days for approvals (whoever approves is on vacation)
  • Manual setup of local dev environment (different instructions for each service)
  • “Shadow a senior engineer for a week to learn how things work” (that senior engineer doesn’t have time)

By day 5, you’ve attended 12 meetings, read hundreds of Slack messages, and you still don’t know how to deploy a simple service.

Week 2: “How Do I Deploy Anything?”

You’ve been assigned your first task: Add a new API endpoint to the user-service. Straightforward, right?

You write the code. Tests pass locally. PR is approved. You’re ready to deploy to the dev environment.

Now what?

The deployment process nobody documented:

  1. Figure out which of the 15 Git repositories contains user-service deployment configs
  2. Find the correct Kubernetes YAML files (teams use 3 different naming conventions)
  3. Update the YAML (hope you don’t break production with a typo—no validation)
  4. Run kubectl apply -f deployment.yaml (wait, which context? which namespace? which cluster?)
  5. Check if it worked (how? where are the logs? what’s the monitoring dashboard URL?)
  6. Something broke. Now what? (no rollback documentation, no runbooks)
  7. Slack a senior engineer: “Hey sorry to bother you, but my deployment failed and I’m not sure why…”
  8. Wait for response (they’re in a meeting, or dealing with a production incident)
  9. Eventually get help (they spend 20 minutes debugging what turned out to be a wrong namespace)
  10. Try again. It works. Total time: 4 hours for something that should take 5 minutes.

Average time to first production deployment for a new engineer: 5-7 days.

Not because the engineer was incompetent. Not because the code was complex. Because the process was impossible to discover and execute independently.

One new hire told me: “I felt stupid for three weeks. Everyone else seemed to know what they were doing, and I was constantly asking basic questions. Then I realized—they’d all struggled through the same thing. They’d just forgotten how painful it was.”

The Platform Team: Playing an Unwinnable Game

The three platform engineers were talented and dedicated—but they were fighting a losing battle.

A typical day for the platform team:

  • 9:00 AM: Check ticket queue. 23 new tickets overnight.
  • 9:15 AM: “Can you create a new PostgreSQL database for my service? I need test-db-v2 with 100GB storage.”
  • 9:45 AM: “My deployment to staging failed but I don’t know why. Can you look at the logs?”
  • 10:30 AM: “I need access to production logs for debugging. How do I get that?”
  • 11:00 AM: “What’s the URL for the staging-3 environment? It’s not in the docs.”
  • 11:30 AM: Production alert: Database connection pool exhausted. Drop everything. Fix incident.
  • 1:00 PM: Back from incident. 12 more tickets in queue.
  • 1:30 PM: “Can you help me set up Datadog monitoring for my new service?”
  • 2:00 PM: “I accidentally deleted my namespace. Can you restore it?”
  • 3:00 PM: “Our deployment pipeline is broken. Nothing is deploying.”
  • 4:30 PM: “I need a Redis instance for caching. How long does that take?”
  • 5:00 PM: Check ticket queue. Still 35 open tickets.
  • 5:30 PM: Another production alert…

Average ticket resolution time: 2-3 days.

Not because the platform team was slow—but because they were context-switching between 20 different requests while also keeping the infrastructure running and responding to production incidents.

One platform engineer said it perfectly: “I spend 90% of my time answering the same five questions over and over. I wish I could just build something that lets people help themselves. But I don’t have time to build that because I’m too busy answering tickets.”

It’s the classic “too busy chopping wood to sharpen the axe” problem.

The Solution: Building a Product for Internal Customers

Our philosophy was simple: Treat your internal developers like customers. Build them a product they’ll love to use.

We weren’t just “installing Backstage and calling it done.” We built a complete developer experience that made infrastructure disappear from the daily workflow of product engineers.

Phase 1: Developer Portal with Self-Service Catalog (Weeks 1-4)

1. Backstage as the Central Portal

We deployed Backstage.io—Spotify’s open-source developer portal—as the single pane of glass for everything infrastructure.

When a developer logs in, they see:

  • All services they own (with health, deployments, ownership)
  • Recent deployments across their team
  • Pending tasks (PRs to review, on-call shifts, runbooks)
  • Quick actions: “Create new service,” “Deploy to production,” “View logs,” “Check metrics”

It’s like a “product homepage” for your infrastructure.

2. Service Catalog with Metadata

We cataloged all 200+ microservices with rich metadata:

  • Ownership: Which team owns this? Who’s on-call?
  • Dependencies: What does this service depend on? What depends on it?
  • Documentation: Architecture docs, API docs, runbooks—all in one place
  • Health: Current deployments, error rates, latency, uptime
  • Links: Datadog dashboard, GitHub repo, ArgoCD app, PagerDuty service

No more hunting through Slack or asking “Who owns auth-service?”—just search the catalog.
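In Backstage, this metadata lives in a catalog-info.yaml file checked into each service's repository. Here is an illustrative sketch of what one entry might look like (service name, org, owner, and annotation values are invented for this example, not the client's actual configuration):

```yaml
# catalog-info.yaml — hypothetical example entry
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: user-service
  description: Manages user accounts and profiles
  annotations:
    github.com/project-slug: example-org/user-service
    datadoghq.com/dashboard-url: https://app.datadoghq.com/dashboard/abc-123
    argocd/app-name: user-service
    pagerduty.com/integration-key: <integration-key>
spec:
  type: service
  lifecycle: production
  owner: team-identity
  dependsOn:
    - resource:users-postgres
    - component:auth-service
```

The annotations are what power the one-click links to Datadog, ArgoCD, and PagerDuty from the service page, and dependsOn is what feeds the dependency mapping.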

3. Golden Path Templates (The Magic)

This was the game-changer. We created “Software Templates” for common use cases:

New Microservice Template:

  • Developer clicks “Create New Service” in Backstage

  • Fills out a simple form:

    • Service name: payment-processor
    • Language: Go
    • Database needed: Yes (PostgreSQL)
    • Team: Payments Team
    • Owner: jane.doe@company.com
  • Backstage generates (in < 2 minutes):

    • Git repository with boilerplate code (Go service template)
    • CI/CD pipeline (GitHub Actions)
    • Kubernetes deployment manifests
    • ArgoCD application
    • Terraform for PostgreSQL database
    • Datadog dashboard
    • PagerDuty service integration
    • On-call rotation schedule
  • Automated PR is created, reviewed, merged

  • Service is deployed to dev environment

  • Developer gets Slack notification: “Your service is ready! View it here: [link]”

Total time: 12 minutes from click to running service.
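Under the hood, a golden path like this is a Backstage Software Template: a YAML file declaring the form fields and the scaffolder actions to run. A trimmed, hypothetical sketch (the skeleton path, GitHub org, and parameter names are assumptions; a real template would also wire in the Terraform, Datadog, and PagerDuty steps):

```yaml
# template.yaml — illustrative Software Template sketch
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: new-service-go
  title: Create New Service (Go)
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service details
      required: [name, owner]
      properties:
        name:
          type: string
          description: e.g. payment-processor
        owner:
          type: string
          description: Owning team
  steps:
    - id: fetch
      name: Render boilerplate
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}
    - id: publish
      name: Create repository
      action: publish:github
      input:
        repoUrl: github.com?owner=example-org&repo=${{ parameters.name }}
    - id: register
      name: Register in catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps['publish'].output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
```

The form Backstage renders comes directly from the parameters block, so the template file is also the single place where the golden path's defaults get updated.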

We built templates for:

  • New API service (Node.js, Python, Go, Ruby)
  • Background worker service
  • Scheduled job (cron)
  • Database provisioning
  • Redis cache provisioning
  • Blob Storage container with lifecycle management policies
  • Event stream (Azure Event Hubs)

Developers went from spending days setting up infrastructure to clicking a button and getting a production-ready service in minutes.

4. GitOps with ArgoCD

We centralized all deployments through ArgoCD (GitOps):

  • Deployments are declarative (Git is source of truth)
  • Multi-environment promotion (dev → staging → prod)
  • Automated rollback on health check failures
  • Full audit trail (who deployed what, when, why)
  • Self-service deployment via Backstage

Developers never run kubectl apply again. They merge a PR, and ArgoCD handles deployment.
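Each service's deployment is described by an ArgoCD Application resource pointing at its configuration in Git. A minimal sketch, with the repository URL, paths, and namespace as placeholder assumptions:

```yaml
# ArgoCD Application — illustrative example
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: user-service
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/deploy-configs
    targetRevision: main
    path: user-service/overlays/dev
  destination:
    server: https://kubernetes.default.svc
    namespace: user-service
  syncPolicy:
    automated:
      prune: true
      selfHeal: true   # manual cluster changes are reverted to what Git declares
```

With automated sync and selfHeal enabled, out-of-band kubectl edits get reconciled back to the Git state, which is what makes the merge-a-PR workflow both sufficient and safe.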

Phase 2: Documentation and Observability (Weeks 5-8)

1. TechDocs Integration

We integrated Backstage TechDocs—documentation lives right alongside your service in the catalog:

  • Write docs in Markdown (version-controlled in Git)
  • Automatically rendered in Backstage
  • Linked to the service that owns it
  • Searchable across all services

Developers can now find documentation because it’s right where they’re already looking.
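TechDocs builds on MkDocs, so enabling it for a service is typically a small mkdocs.yml next to the docs folder, plus a backstage.io/techdocs-ref annotation in the service's catalog entry. An illustrative sketch (site name and nav entries are examples):

```yaml
# mkdocs.yml at the repository root
site_name: user-service
nav:
  - Overview: index.md
  - Architecture: architecture.md
  - Runbook: runbook.md
plugins:
  - techdocs-core
```

Because the docs live in the same repository as the code, they go through the same PR review as everything else.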

2. API Documentation Auto-Generation

We set up automatic API doc generation:

  • OpenAPI/Swagger specs generated from code
  • Rendered in Backstage service page
  • Always up-to-date (regenerated on every deployment)
  • Interactive API explorer (test APIs directly from the portal)

No more “the API docs are outdated”—they’re generated from the code.

3. Observability Integration

We integrated observability tools directly into Backstage:

  • Datadog dashboards: View metrics without leaving Backstage
  • Logs: One-click access to logs for your service
  • Traces: Distributed tracing for debugging
  • Alerts: Current firing alerts for your services
  • Cost metrics: “Your service costs $340/month to run”

Developers get complete visibility without juggling 5 different tools.

4. DORA Metrics Dashboard

We built a dashboard showing engineering velocity metrics:

  • Deployment Frequency: How often are we deploying?
  • Lead Time for Changes: How long from commit to production?
  • Change Failure Rate: What % of deployments cause incidents?
  • MTTR: How fast do we recover from incidents?

This gave leadership visibility into the impact of the platform.
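To make the dashboard concrete, here is a minimal Python sketch of how two of the four metrics can be derived from raw deployment records. The record fields and sample timestamps are invented for illustration, not the client's data:

```python
# Sketch: deriving Lead Time for Changes and Change Failure Rate
# from deployment records. Sample data is invented.
from datetime import datetime
from statistics import mean

deployments = [
    # (commit_time, deployed_time, caused_incident)
    (datetime(2024, 1, 1, 9, 0),  datetime(2024, 1, 1, 10, 0),  False),
    (datetime(2024, 1, 2, 9, 0),  datetime(2024, 1, 2, 9, 40),  True),
    (datetime(2024, 1, 3, 14, 0), datetime(2024, 1, 3, 14, 30), False),
]

# Lead Time for Changes: commit -> production, in hours
lead_times = [(done - commit).total_seconds() / 3600
              for commit, done, _ in deployments]

# Change Failure Rate: share of deployments that caused an incident
change_failure_rate = sum(caused for *_, caused in deployments) / len(deployments)

print(f"Mean lead time: {mean(lead_times):.2f} h")        # Mean lead time: 0.72 h
print(f"Change failure rate: {change_failure_rate:.0%}")  # Change failure rate: 33%
```

Deployment Frequency is just a count over a window, and MTTR is the same lead-time computation applied to incident open/close timestamps.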

Phase 3: Security and Governance (Weeks 9-12)

1. Policy-as-Code with OPA

We implemented automated security policies using Open Policy Agent:

  • Enforce standards: All services must have resource limits, health checks, proper labels
  • Security scanning: Container images scanned for vulnerabilities (Trivy)
  • Secrets detection: Block commits with exposed secrets (GitGuardian)
  • Compliance checks: Enforce company security policies automatically

Deployments that violate policies are blocked automatically—no manual review needed.
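For a flavor of what such a policy looks like, here is a small hypothetical Rego rule in the style of an OPA Kubernetes admission policy, rejecting Deployments whose containers omit resource limits (the package name and input shape follow the admission-review convention; this is a sketch, not the client's actual policy set):

```rego
package kubernetes.admission

# Deny any Deployment with a container that has no resource limits.
deny[msg] {
  input.request.kind.kind == "Deployment"
  container := input.request.object.spec.template.spec.containers[_]
  not container.resources.limits
  msg := sprintf("container %q must set resource limits", [container.name])
}
```

Because policies are plain code in Git, the standards themselves go through PR review like everything else.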

2. Self-Service Access Management

We built self-service access workflows:

  • Role-based access: Developers automatically get appropriate permissions
  • Just-in-Time access: Request production access for 4 hours (auto-revoked)
  • Audit logging: Complete trail of who accessed what, when
  • Access reviews: Automated quarterly reviews, automatic revocation of stale access

Developers can get access when they need it, without waiting days for manual approval.

3. Cost Visibility

We integrated cost tracking (using Kubecost):

  • Every service shows its monthly cost
  • Teams get monthly cost reports
  • Budget alerts when approaching limits
  • Rightsizing recommendations

Engineers now see: “Your service costs $890/month. You’re over-provisioned. Click here to optimize.”

The Results: From Bottleneck to Force Multiplier

Developer Onboarding Transformation

Before:

  • New developer onboarding: 2 weeks
  • Time to first production deployment: 5-7 days
  • Onboarding satisfaction: 4.1/10
  • Primary obstacle: “Figuring out how things work”

After:

  • New developer onboarding: 2 days (90% reduction)
  • Time to first production deployment: 3 hours (98% faster)
  • Onboarding satisfaction: 8.9/10
  • Feedback: “The platform made everything obvious. I was productive immediately.”

We onboarded 8 new engineers in Q3. They were all committing code to production by end of week 1.

Platform Adoption

Golden Path Usage:

  • Existing services migrated to golden-path templates: 78% (156 of 200)
  • New services using golden paths: 100% (mandatory)
  • Self-service infrastructure requests: 92% (vs 0% before)
  • Documentation usage: 2,400 views/month (vs 140 before)
  • Platform NPS: +64 (world-class score)

Engineers wanted to use the platform because it made their lives easier.

Engineering Velocity

Support Tickets:

  • Infrastructure-related tickets: 180/month → 22/month (88% reduction)
  • Average resolution time: 2-3 days → 15 minutes (99% faster)
  • Platform team capacity freed: 75% (they can now focus on strategic work)

DORA Metrics:

Before IDP:

  • Deployment Frequency: 4/week (Low performer)
  • Lead Time for Changes: 5-7 days (Low performer)
  • Change Failure Rate: 22% (Medium performer)
  • MTTR: 3.5 hours (Medium performer)

After IDP:

  • Deployment Frequency: 9.6/week (Elite performer)
  • Lead Time for Changes: < 1 hour (Elite performer)
  • Change Failure Rate: 8% (Elite performer)
  • MTTR: 18 minutes (Elite performer)

We achieved Elite across all four DORA metrics within 4 months.

Business Impact

Cost Savings:

  • Platform team support overhead reduced: 75% capacity freed (2.25 FTE)
  • Onboarding cost savings: $420K/year (15 engineers/quarter × $7K saved per engineer)
  • Reduced deployment errors: $180K/year (fewer incidents, faster recovery)
  • Infrastructure optimization: $95K/year (cost visibility enabled rightsizing)
  • Total annual value: $695K
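The headline number is the sum of the three dollar line items (the freed platform-team capacity is reported as headcount rather than dollars, so it is not included in the total):

```python
# Reproducing the $695K annual-value total from the line items above.
onboarding_savings = 15 * 7_000 * 4   # 15 engineers/quarter x $7K saved, 4 quarters
incident_savings = 180_000            # fewer deployment errors, faster recovery
rightsizing_savings = 95_000          # cost-visibility-driven optimization

total = onboarding_savings + incident_savings + rightsizing_savings
print(f"${total:,}")  # $695,000
```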

Developer Satisfaction:

  • Overall developer satisfaction: 6.2/10 → 8.9/10
  • Engineers rating infrastructure as “not a blocker”: 35% → 91%
  • Retention improvement: Exit interview feedback citing “infrastructure friction” dropped from 18% to 0%
  • Time to productivity: 8 weeks → 2 weeks

Competitive Advantage:

  • Feature delivery velocity: +45% (measured by story points per sprint)
  • Time to market for new products: 12 weeks → 6 weeks
  • Engineering confidence deploying to production: 58% → 94%
  • Security compliance: 40% of services meeting standards → 98% (policy automation)

Cultural Transformation

The most important outcome wasn’t the metrics—it was the cultural shift.

Before:

  • “How do I deploy this?” (asked constantly)
  • “Infrastructure is blocking us” (weekly complaint)
  • “I don’t know who owns that service” (daily confusion)
  • “I’m afraid to deploy on Friday” (deployment anxiety)

After:

  • “I just clicked the button and it deployed” (common feedback)
  • “The platform handles that automatically” (new default)
  • “Everything I need is in Backstage” (single source of truth)
  • “I deploy 5x per day with confidence” (deployment becomes routine)

The VP of Engineering said at our final retrospective: “You didn’t just build us tooling. You changed how engineers think about infrastructure. It went from a blocker to an enabler.”

Lessons Learned

1. Developer Experience Is a Product, Not a Project

We treated the IDP like a customer-facing product:

  • Regular user research (developer surveys, interviews)
  • Weekly iteration based on feedback
  • Clear metrics (NPS, adoption, satisfaction)
  • Roadmap driven by developer needs

This isn’t a “set it and forget it” project. It’s an ongoing product that evolves.

2. Golden Paths > Documentation

Writing docs telling developers “how to do X” doesn’t scale. Instead:

  • Build golden paths that do the right thing by default
  • Make the easy way the correct way
  • Automate the toil away

Documentation is still important, but automation is better.

3. Self-Service Requires Guardrails

You can’t just give developers “self-service root access” and hope for the best:

  • Implement policy-as-code (OPA) to enforce standards
  • Use templates to encode best practices
  • Provide automated security scanning
  • Set up cost governance

Self-service + guardrails = safe empowerment.

4. Migration Is Gradual (and That’s OK)

We didn’t force-migrate all 200 services overnight:

  • Started with 2 pilot teams (15 engineers)
  • Gathered feedback and iterated
  • Rolled out to 5 more teams
  • Eventually made golden paths mandatory for new services
  • Migrated legacy services gradually over 6 months

Teams adopted because they wanted to, not because they were forced.

5. Platform Team Transformation

The platform team went from:

  • Before: Firefighting tickets, reactive support, burnout
  • After: Strategic planning, building capabilities, innovation

One platform engineer told me: “This is the first time in 2 years I’ve had time to actually build something meaningful. I’m not just answering the same questions anymore.”

That’s what success looks like.

What to Do Next

If your engineering team is slowed down by infrastructure friction, here’s how to start building an Internal Developer Platform:

Week 1-2: Discovery

  • Survey developers: What’s painful? What takes too long?
  • Audit your infrastructure: How many services? How are they deployed?
  • Identify quick wins: What would have immediate impact?
  • Define success metrics: Onboarding time, deployment frequency, satisfaction

Week 3-6: Foundation

  • Deploy Backstage (or similar IDP framework)
  • Build service catalog (start with top 20 services)
  • Create your first golden path template (new service creation)
  • Integrate with CI/CD (GitHub Actions, ArgoCD, etc.)

Week 7-10: Expand

  • Add observability integrations (Datadog, Grafana, etc.)
  • Build documentation portal (TechDocs)
  • Add 3-5 more golden path templates
  • Onboard pilot teams (2-3 teams, 15-20 engineers)

Week 11-12: Iterate

  • Gather feedback from pilot teams
  • Iterate on templates and workflows
  • Begin org-wide rollout
  • Measure DORA metrics for baseline

Beyond 12 Weeks: Continuously Improve

  • Add new templates based on developer requests
  • Integrate more tools (cost visibility, security scanning)
  • Build platform team roadmap based on feedback
  • Treat IDP as a product (ongoing investment)

Partner with StriveNimbus for Platform Engineering

Is infrastructure friction slowing your engineering team? StriveNimbus specializes in building Internal Developer Platforms that transform developer productivity.

How We Can Help:

  • Platform Strategy & Design: Understand your needs, design the right solution
  • Backstage Implementation: Deploy and customize Backstage for your organization
  • Golden Path Development: Build templates and automation for your common workflows
  • GitOps with ArgoCD: Implement declarative, self-service deployments
  • Developer Training: Onboard your teams to the new platform
  • Ongoing Support: Platform team coaching and optimization

Our Approach:

  • Start with pilot teams (prove value quickly)
  • Iterate based on developer feedback
  • Gradual rollout (no big-bang migrations)
  • Measure impact with DORA metrics
  • Transfer knowledge to your team (build internal capability)

Ready to 10x your developer productivity? Schedule a platform assessment to discover how an Internal Developer Platform can eliminate infrastructure friction and accelerate your engineering velocity.

Let’s build a platform your developers will love.