Fix WorkerLostError: celery.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 42 in Celery
This error occurs when a Celery worker process is killed by the OS, usually because it exceeded memory limits (OOM killer) or hit the hard time limit. The task cannot catch this because SIGKILL terminates the process immediately. Fix it by reducing memory usage in tasks, increasing container memory limits, or setting task_acks_late so tasks are redelivered to a healthy worker.
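As a minimal sketch of the redelivery option (standard Celery 5 setting names; the values are starting points, not tuned recommendations):

```python
# celeryconfig.py -- sketch: let a healthy worker pick up tasks lost to SIGKILL
task_acks_late = True               # acknowledge only after the task completes
task_reject_on_worker_lost = True   # requeue the task if its worker is killed
worker_prefetch_multiplier = 1      # a dying worker takes fewer tasks with it
```

Note that redelivery means a task may run more than once, so tasks configured this way should be idempotent.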
Reading the Stack Trace
Here's what each line means:
- WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL) Job: 42'): Signal 9 (SIGKILL) indicates the OS forcefully terminated the worker — likely due to memory exhaustion or the hard time limit.
- File "/app/venv/lib/python3.12/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost: The billiard process pool detects the worker child process has exited and marks the task as lost.
- File "/app/venv/lib/python3.12/site-packages/celery/concurrency/prefork.py", line 83, in on_hard_timeout: The prefork pool's timeout handler fired, indicating the task exceeded the hard time limit.
Common Causes
1. Task consumes too much memory
The task loads large files or datasets into memory, causing the OS OOM killer to terminate the worker process.
```python
@celery.task
def process_images(image_paths):
    images = [Image.open(p) for p in image_paths]  # loads all images into memory
    for img in images:
        process(img)
```
2. Container memory limit too low
The Docker container or Kubernetes pod has a memory limit that is too small for the task's workload.
```yaml
# docker-compose.yml
worker:
  mem_limit: 256m  # too small for image processing
```
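Raising the limit is often part of the fix. A sketch (the 1g figure is an assumption; size it to your task's actual peak footprint):

```yaml
# docker-compose.yml -- sketch; pick a limit above the task's real peak usage
worker:
  mem_limit: 1g          # hard cap enforced by the OOM killer
  mem_reservation: 512m  # soft reservation for scheduling
```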
3. Hard time limit exceeded
The task exceeded the hard time limit and Celery sent SIGKILL to the worker process.
```python
@celery.task(time_limit=30)  # hard kill (SIGKILL) after 30 seconds
def long_running_task():
    process_large_dataset()  # takes 5+ minutes
```
The Fix
Process images one at a time using a context manager to release memory after each one. Set worker_max_memory_per_child to automatically restart workers before they accumulate too much memory. Adjust time limits to realistic values and always set soft_time_limit below time_limit to allow graceful cleanup.
```python
# Before: loads every image into memory at once
@celery.task
def process_images(image_paths):
    images = [Image.open(p) for p in image_paths]
    for img in images:
        process(img)
```

```python
# After: process one image at a time, releasing memory between iterations
@celery.task(bind=True, time_limit=600, soft_time_limit=540)
def process_images(self, image_paths):
    for path in image_paths:
        with Image.open(path) as img:
            process(img)

# Also set: worker_max_memory_per_child = 200000  # restart worker after 200 MB
```
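On the configuration side, a sketch of settings that cap worker memory and make lost tasks recoverable (all thresholds are assumptions to tune for your workload):

```python
# celeryconfig.py -- sketch; values are starting points, not recommendations
worker_max_memory_per_child = 200_000  # KB: recycle a child after ~200 MB
worker_max_tasks_per_child = 100       # also recycle after 100 tasks
task_acks_late = True                  # redeliver tasks whose worker died
task_time_limit = 600                  # hard limit: worker sends SIGKILL
task_soft_time_limit = 540             # raises SoftTimeLimitExceeded first
```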
Testing the Fix
```python
import pytest
from PIL import Image

from app.tasks import process_images
from app import create_app


@pytest.fixture
def app():
    app = create_app()
    app.config['CELERY_ALWAYS_EAGER'] = True  # run tasks synchronously in tests
    app.config['TESTING'] = True
    return app


def test_process_single_image(app, tmp_path):
    img_path = tmp_path / 'test.png'
    Image.new('RGB', (100, 100)).save(img_path)
    with app.app_context():
        result = process_images([str(img_path)])
        assert result is not None


def test_process_empty_list(app):
    with app.app_context():
        result = process_images([])
        assert result is not None


def test_process_multiple_images(app, tmp_path):
    paths = []
    for i in range(5):
        p = tmp_path / f'test_{i}.png'
        Image.new('RGB', (100, 100)).save(p)
        paths.append(str(p))
    with app.app_context():
        result = process_images(paths)
        assert result is not None
```
Run your tests:
```shell
pytest tests/ -v
```
Pushing Through CI/CD
```shell
git checkout -b fix/celery-worker-memory
git add app/tasks.py celeryconfig.py
git commit -m "fix: process images sequentially and set memory limits for workers"
git push origin fix/celery-worker-memory
```
Your CI config should look something like this:
```yaml
name: CI
on:
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
          cache: 'pip'
      - run: pip install -r requirements.txt
      - run: pytest tests/ -v --tb=short
        env:
          CELERY_BROKER_URL: redis://localhost:6379/0
```
The Full Manual Process: 18 Steps
Here's every step you just went through to fix this one bug:
- Notice the error alert or see it in your monitoring tool
- Open the error dashboard and read the stack trace
- Identify the file and line number from the stack trace
- Open your IDE and navigate to the file
- Read the surrounding code to understand context
- Reproduce the error locally
- Identify the root cause
- Write the fix
- Run the test suite locally
- Fix any failing tests
- Write new tests covering the edge case
- Run the full test suite again
- Create a new git branch
- Commit and push your changes
- Open a pull request
- Wait for code review
- Merge and deploy to production
- Monitor production to confirm the error is resolved
Total time: 30-60 minutes. For one bug.
Or Let bugstack Fix It in Under 2 Minutes
Every step above? bugstack does it automatically.
Step 1: Install the SDK
```shell
pip install bugstack
```
Step 2: Initialize
```python
import os

import bugstack

bugstack.init(api_key=os.environ["BUGSTACK_API_KEY"])
```
Step 3: There is no step 3.
bugstack handles everything from here:
- Captures the stack trace and request context
- Pulls the relevant source files from your GitHub repo
- Analyzes the error and understands the code context
- Generates a minimal, verified fix
- Runs your existing test suite
- Pushes through your CI/CD pipeline
- Deploys to production (or opens a PR for review)
Time from error to fix deployed: Under 2 minutes.
Human involvement: zero.
Try bugstack Free → No credit card. 5-minute setup. Cancel anytime.
Deploying the Fix (Manual Path)
- Run pytest locally to verify images are processed without memory spikes.
- Open a pull request with the memory optimization and config changes.
- Wait for CI checks to pass on the PR.
- Have a teammate review and approve the PR.
- Merge to main and monitor worker memory usage in staging.
Frequently Asked Questions
How does bugstack verify that a fix actually resolves this error?
BugStack profiles memory usage during test execution, verifies workers stay within limits, and runs your full test suite before marking it safe to deploy.
Will bugstack deploy a fix without human review?
BugStack never pushes directly to production. Every fix goes through a pull request with full CI checks, so your team can review it before merging.
What does worker_max_memory_per_child do?
It is a Celery setting that automatically restarts a worker after it has used the specified amount of memory (in KB). This prevents memory leaks from accumulating.
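The same limit can also be passed on the command line (assuming your Celery app module is named app):

```shell
# Value is in KB, same as worker_max_memory_per_child; ~200 MB here
celery -A app worker --max-memory-per-child=200000
```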
How do I tell whether the OOM killer or the time limit killed my worker?
Check dmesg or /var/log/syslog for OOM killer messages. If the worker was killed by Celery's hard time limit, the Celery logs will show 'Hard time limit exceeded'.
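For example (log paths and message formats vary by distro and Celery setup, so treat these as starting points):

```shell
# Kernel log: did the OOM killer fire? (may require root)
dmesg -T | grep -i 'killed process'
# On systemd hosts, the kernel ring buffer is also in the journal
journalctl -k | grep -i 'out of memory'
# Celery's own log records hard-time-limit kills
grep -i 'hard time limit' /var/log/celery/worker.log
```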