The Hidden Scaling Trap: Why Your Kubernetes Multi-Worker Setup is Sabotaging Your Reliability

Why We Switched to 1 Worker Per Pod in Kubernetes


Our "healthy" pods were hiding failures from Kubernetes, and we had no visibility into what was actually breaking. Here's why we switched to single-worker pods—and why you might want to consider it too.

When our monitoring showed green lights but customers reported intermittent timeouts, we thought we had a network issue. After weeks of debugging, we found the culprit: multiple workers per pod were hiding failures from Kubernetes' health checks.

This isn't a universal truth—plenty of teams successfully run multi-worker pods. But if you're struggling with mysterious failures, hard-to-debug issues, or applications with worker-level bugs, this might help.

The Visibility Problem

When users reported "the API is slow sometimes" and we checked our dashboards, everything looked healthy. All green lights, normal metrics.

This went on for months.

We had the classic FastAPI setup that everyone uses:

uvicorn main:app --workers 4

Makes sense, right? One pod, four workers. More workers = more throughput. It's what the docs suggest, what Stack Overflow recommends, what we've all been doing since forever.

Except it was randomly eating requests, and we had no idea.

The Debugging Journey

It started with sporadic customer complaints. Not many, but enough to investigate. The weird part? We could never reproduce it. Our logs showed the requests coming in, but then... nothing. No error, no timeout logged, just silence.

After adding way too much instrumentation, we finally caught it:

Pod Status: ✅ Running
Health Check: ✅ 200 OK
Reality: Worker 2 had been deadlocked for 47 minutes

The pod was "healthy" because the health check happened to hit Worker 1. Meanwhile, Worker 2 was deadlocked, and every request that landed on it simply hung until the client gave up.

┌─────────────────┐
│ Pod (Running)   │
│                 │
│ Worker 1: ✅    │ ← Health check always hits this one
│ Worker 2: 💀    │ ← Deadlocked for 47 minutes
│ Worker 3: ✅    │ 
│ Worker 4: ✅    │
└─────────────────┘

Why This is Hard to Detect

The challenging part is that standard monitoring often won't catch this.

Kubernetes checks if the pod is healthy, not individual workers. Your health endpoint returns 200 OK because one worker responds. Your CPU and memory look normal because three workers are fine. Your logs are a mess because all four workers write to the same stdout.

It's like having a 4-person customer service team where one person just went home but nobody noticed because the phone still gets answered... eventually.
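
For concreteness, here's roughly what a standard probe setup looks like (illustrative paths and port, not our exact manifest). Each probe is a single HTTP request to the pod's shared port, answered by whichever worker happens to accept the connection, so one healthy worker is enough to keep the whole pod passing:

# Typical probes on a multi-worker pod (illustrative)
livenessProbe:
  httpGet:
    path: /health        # any of the 4 workers may answer this
    port: 8000
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 5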

The Core Issue

After diving deep into this, we realized something that sounds obvious in hindsight: Kubernetes manages pods, not processes within pods. When you run multiple workers, Kubernetes can't see if individual workers fail.

It's like hiring a manager and then not telling them about three quarters of your team. They can't manage what they don't know exists.

The Solution That Felt Wrong (But Worked)

We switched to 1 worker per pod.

Yes, really.

# Before: 1 pod with 4 workers
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 1
  template:
    spec:
      containers:
      - name: app
        command: ["uvicorn", "main:app", "--workers", "4"]

# After: 4 pods with 1 worker each
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 4
  template:
    spec:
      containers:
      - name: app
        command: ["uvicorn", "main:app"]  # No --workers flag
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "200m"

At first, this felt like going backwards. More pods? Isn't that wasteful?

We also added an HPA:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 4
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

This combination gave us several benefits.

What Actually Improved

1. Failures became visible

Worker crash = Pod crash = Kubernetes restarts it immediately. No more silent failures.
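
Nothing fancy is required to get this behavior; with one worker, the probes you already have become authoritative. A sketch, assuming the app serves /health on port 8000:

containers:
- name: app
  command: ["uvicorn", "main:app"]   # one process, one worker
  livenessProbe:
    httpGet:
      path: /health                  # now exercises the only worker there is
      port: 8000
    periodSeconds: 10
    failureThreshold: 3              # a hung worker fails this and gets restarted
# A crashed worker exits the container, and the default
# restartPolicy: Always brings it straight back.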

2. Debugging became possible

# Before: Which worker logged this error??
kubectl logs my-app-pod-abc123
[ERROR] Database connection failed
[INFO] Request processed
[ERROR] Database connection failed
[INFO] Request processed
# All workers mixed together 😭

# After: Clear logs per pod
kubectl logs my-app-pod-abc123
[ERROR] Database connection failed
# Ah, THIS specific instance has DB issues

3. Scaling became granular

Instead of jumping from 4 workers to 8 (doubling capacity), we could go from 4 to 5 pods. Much smoother.

4. Resource limits actually worked

Before, one worker could hog CPU and starve the others. Now, Kubernetes enforces limits per pod. Fair and predictable.

But What About...

"Isn't this more overhead?"

Yes. 4 pods use more total memory than 1 pod with 4 workers (no Copy-on-Write sharing between pods). However, the isolation is valuable. Each pod gets its own resources, no workers fighting over CPU, and when something breaks, you know exactly which instance is the problem. The trade-off in memory usage is often worth the improved debugging experience.

"What about startup time?"

Fair point. If your app takes 30 seconds to start, this might be annoying during deploys. But here's the thing: with rolling deployments and proper readiness probes, users never notice.
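
A minimal sketch of what that means in practice (the strategy and probe numbers are illustrative, not tuned recommendations):

# Only retire old pods once their replacements report Ready
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0      # never dip below current capacity
      maxSurge: 1            # bring up one extra pod at a time
  template:
    spec:
      containers:
      - name: app
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5   # adjust for your startup time
          periodSeconds: 5

New pods don't receive traffic until the readiness probe passes, and old pods aren't taken down until their replacements are Ready, so a slow startup costs deploy time, not availability.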

"What about shared resources?"

This is the one real trade-off. If you're loading a 2GB ML model, you probably don't want 10 pods each loading their own copy. More on this below.

The ML Exception (And Why It Proves The Rule)

There's one case where we kept multi-worker pods: our ML inference service.

The model takes 2 minutes to load and uses 4GB of RAM. Running 10 pods would mean 40GB of RAM just for model copies—an inefficient use of resources.

So for that service:

# ML service keeps multiple workers
uvicorn ml_service:app --workers 3

# But we added aggressive health checking
from fastapi import HTTPException  # app and run_inference_test come from ml_service

@app.get("/health/workers")
async def check_all_workers():
    """Actually verify all workers can process requests."""
    results = []
    for worker_id in range(3):
        try:
            # Each worker must prove it's alive by running a tiny test inference
            await run_inference_test(worker_id)
            results.append({"worker": worker_id, "status": "ok"})
        except Exception:
            results.append({"worker": worker_id, "status": "dead"})

    if any(r["status"] == "dead" for r in results):
        raise HTTPException(status_code=503, detail=results)

    return results

The key: if you must use multiple workers, at least monitor them properly.
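
And that endpoint only helps if something acts on it. A sketch of wiring it into the pod's liveness probe (timing values are illustrative, and generous because the inference test itself takes a while):

livenessProbe:
  httpGet:
    path: /health/workers
    port: 8000
  periodSeconds: 30
  timeoutSeconds: 10       # the inference test isn't instant
  failureThreshold: 2      # two consecutive failures: restart the whole pod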

How To Know If You Have This Problem

To verify if this affects your setup:

  1. Find a multi-worker pod
  2. Exec into it and kill -STOP one worker process
  3. Observe that health checks still pass
  4. Watch some requests timeout
  5. Consider if this visibility gap is acceptable

If you're using --workers in your deployments, you may be affected by this issue.

Migration Strategy (The Realistic Version)

Don't big-bang this. Here's what worked for us:

Week 1: Pick your least important service

  • That internal admin tool with 5 users? Perfect.
  • Remove --workers, increase replicas
  • Watch it for a week

Week 2-3: Move to a real service

  • Pick something customer-facing but not critical
  • Add good resource limits
  • Set up an HPA
  • Monitor error rates obsessively

Week 4+: Gradual rollout

  • Service by service
  • Keep a list of "exceptions" (ML services, etc.)
  • Document why each exception exists

What Changed

The migration to single-worker pods made the most difference in operational visibility and debugging workflow.

What improved:

  • Failed workers now trigger pod restarts immediately
  • Logs became easier to trace to specific instances
  • Debugging time decreased noticeably
  • The mysterious timeout issues mostly disappeared

What stayed the same:

  • Overall application performance and latency
  • Total resource usage (with proper limits configured)

What required adjustment:

  • Resource limits needed recalculation per pod
  • Deployment configurations across all services
  • Team's mental model of how pods scale

What This Taught Us

The bigger lesson here isn't about workers or pods. It's that sometimes the "best practice" from the VM era doesn't translate to Kubernetes.

We've been trained to think "fewer processes = better" because managing processes used to be hard. But Kubernetes is really good at managing lots of pods. That's literally its job.

Let it do its job.

The TL;DR

  • Multiple workers per pod can lead to silent failures and difficult debugging
  • 1 worker per pod + HPA provides visible failures and clearer debugging
  • Exception: Services with high startup costs (ML models)
  • Migration: Start small, measure everything
  • The improved visibility pays off during production debugging

I know this approach might seem like extra work initially. We had the same hesitation. But after experiencing the operational improvements, we found that in Kubernetes, visibility and clear failure modes often matter more than minimal resource usage.

Consider trying it with one non-critical service first. You can always revert if it doesn't work for your use case.


Got war stories about multi-worker pods? Found a case where they actually make sense? Hit me up. I'm genuinely curious about other experiences with this pattern.