Fix WorkerLostError: celery.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL) Job: 42 in Celery
This error occurs when a Celery worker process is killed by the OS, usually because it exceeded memory limits (OOM killer) or hit the hard time limit. The task cannot catch this because SIGKILL terminates the process immediately. Fix it by reducing memory usage in tasks, increasing container memory limits, or setting task_acks_late so tasks are redelivered to a healthy worker.
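As a minimal sketch of the redelivery option (standard Celery 5 setting names; the values are starting points, not tuned recommendations):

```python
# celeryconfig.py -- sketch: let a healthy worker pick up tasks lost to SIGKILL
task_acks_late = True               # acknowledge only after the task completes
task_reject_on_worker_lost = True   # requeue the task if its worker is killed
worker_prefetch_multiplier = 1      # a dying worker takes fewer tasks with it
```

Note that redelivery means a task may run more than once, so tasks configured this way should be idempotent.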
Reading the Stack Trace
Here's what each line means:
- WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL) Job: 42'): Signal 9 (SIGKILL) indicates the OS forcefully terminated the worker — likely due to memory exhaustion or the hard time limit.
- File "/app/venv/lib/python3.12/site-packages/billiard/pool.py", line 1265, in mark_as_worker_lost: The billiard process pool detects the worker child process has exited and marks the task as lost.
- File "/app/venv/lib/python3.12/site-packages/celery/concurrency/prefork.py", line 83, in on_hard_timeout: The prefork pool's timeout handler fired, indicating the task exceeded the hard time limit.
Common Causes
1. Task consumes too much memory
The task loads large files or datasets into memory, causing the OS OOM killer to terminate the worker process.
```python
@celery.task
def process_images(image_paths):
    images = [Image.open(p) for p in image_paths]  # loads all images into memory
    for img in images:
        process(img)
```
2. Container memory limit too low
The Docker container or Kubernetes pod has a memory limit that is too small for the task's workload.
```yaml
# docker-compose.yml
worker:
  mem_limit: 256m  # too small for image processing
```
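Raising the limit is often part of the fix. A sketch (the 1g figure is an assumption; size it to your task's actual peak footprint):

```yaml
# docker-compose.yml -- sketch; pick a limit above the task's real peak usage
worker:
  mem_limit: 1g          # hard cap enforced by the OOM killer
  mem_reservation: 512m  # soft reservation for scheduling
```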
3. Hard time limit exceeded
The task exceeded the hard time limit and Celery sent SIGKILL to the worker process.
```python
@celery.task(time_limit=30)  # hard kill (SIGKILL) after 30 seconds
def long_running_task():
    process_large_dataset()  # takes 5+ minutes
```
The Fix
Process images one at a time using a context manager to release memory after each one. Set worker_max_memory_per_child to automatically restart workers before they accumulate too much memory. Adjust time limits to realistic values and always set soft_time_limit below time_limit to allow graceful cleanup.
```python
# Before: loads every image into memory at once
@celery.task
def process_images(image_paths):
    images = [Image.open(p) for p in image_paths]
    for img in images:
        process(img)
```

```python
# After: process one image at a time, releasing memory between iterations
@celery.task(bind=True, time_limit=600, soft_time_limit=540)
def process_images(self, image_paths):
    for path in image_paths:
        with Image.open(path) as img:
            process(img)

# Also set: worker_max_memory_per_child = 200000  # restart worker after 200 MB
```
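On the configuration side, a sketch of settings that cap worker memory and make lost tasks recoverable (all thresholds are assumptions to tune for your workload):

```python
# celeryconfig.py -- sketch; values are starting points, not recommendations
worker_max_memory_per_child = 200_000  # KB: recycle a child after ~200 MB
worker_max_tasks_per_child = 100       # also recycle after 100 tasks
task_acks_late = True                  # redeliver tasks whose worker died
task_time_limit = 600                  # hard limit: worker sends SIGKILL
task_soft_time_limit = 540             # raises SoftTimeLimitExceeded first
```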
Testing the Fix
```python
import pytest
from PIL import Image

from app.tasks import process_images
from app import create_app


@pytest.fixture
def app():
    app = create_app()
    app.config['CELERY_ALWAYS_EAGER'] = True  # run tasks synchronously in tests
    app.config['TESTING'] = True
    return app


def test_process_single_image(app, tmp_path):
    img_path = tmp_path / 'test.png'
    Image.new('RGB', (100, 100)).save(img_path)
    with app.app_context():
        result = process_images([str(img_path)])
        assert result is not None


def test_process_empty_list(app):
    with app.app_context():
        result = process_images([])
        assert result is not None


def test_process_multiple_images(app, tmp_path):
    paths = []
    for i in range(5):
        p = tmp_path / f'test_{i}.png'
        Image.new('RGB', (100, 100)).save(p)
        paths.append(str(p))
    with app.app_context():
        result = process_images(paths)
        assert result is not None
```
Run your tests:
```shell
pytest tests/ -v
```
Pushing Through CI/CD
```shell
git checkout -b fix/celery-worker-memory
git add app/tasks.py celeryconfig.py
git commit -m "fix: process images sequentially and set memory limits for workers"
git push origin fix/celery-worker-memory
```
Your CI config should look something like this:
```yaml
name: CI
on:
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
          cache: 'pip'
      - run: pip install -r requirements.txt
      - run: pytest tests/ -v --tb=short
        env:
          CELERY_BROKER_URL: redis://localhost:6379/0
```
The Full Manual Process: 18 Steps
Here's every step you just went through to fix this one bug:
- Notice the error alert or see it in your monitoring tool
- Open the error dashboard and read the stack trace
- Identify the file and line number from the stack trace
- Open your IDE and navigate to the file
- Read the surrounding code to understand context
- Reproduce the error locally
- Identify the root cause
- Write the fix
- Run the test suite locally
- Fix any failing tests
- Write new tests covering the edge case
- Run the full test suite again
- Create a new git branch
- Commit and push your changes
- Open a pull request
- Wait for code review
- Merge and deploy to production
- Monitor production to confirm the error is resolved
Total time: 30-60 minutes. For one bug.
Or Let bugstack Fix It in Under 2 Minutes
Every step above? bugstack does it automatically.
Step 1: Install the SDK
```shell
pip install bugstack
```
Step 2: Initialize
```python
import os

import bugstack

bugstack.init(api_key=os.environ["BUGSTACK_API_KEY"])
```
Step 3: There is no step 3.
bugstack handles everything from here:
- Captures the stack trace and request context
- Pulls the relevant source files from your GitHub repo
- Analyzes the error and understands the code context
- Generates a minimal, verified fix
- Runs your existing test suite
- Pushes through your CI/CD pipeline
- Deploys to production (or opens a PR for review)
Time from error to fix deployed: Under 2 minutes.
Human involvement: zero.
Try bugstack Free → No credit card. 5-minute setup. Cancel anytime.
Deploying the Fix (Manual Path)
- Run pytest locally to verify images are processed without memory spikes.
- Open a pull request with the memory optimization and config changes.
- Wait for CI checks to pass on the PR.
- Have a teammate review and approve the PR.
- Merge to main and monitor worker memory usage in staging.
Frequently Asked Questions
How does bugstack verify that a fix actually resolves this error?
BugStack profiles memory usage during test execution, verifies workers stay within limits, and runs your full test suite before marking it safe to deploy.
Will bugstack deploy a fix without human review?
BugStack never pushes directly to production. Every fix goes through a pull request with full CI checks, so your team can review it before merging.
What does worker_max_memory_per_child do?
It is a Celery setting that automatically restarts a worker after it has used the specified amount of memory (in KB). This prevents memory leaks from accumulating.
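The same limit can also be passed on the command line (assuming your Celery app module is named app):

```shell
# Value is in KB, same as worker_max_memory_per_child; ~200 MB here
celery -A app worker --max-memory-per-child=200000
```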
How do I tell whether the OOM killer or the time limit killed my worker?
Check dmesg or /var/log/syslog for OOM killer messages. If the worker was killed by Celery's hard time limit, the Celery logs will show 'Hard time limit exceeded'.
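For example (log paths and message formats vary by distro and Celery setup, so treat these as starting points):

```shell
# Kernel log: did the OOM killer fire? (may require root)
dmesg -T | grep -i 'killed process'
# On systemd hosts, the kernel ring buffer is also in the journal
journalctl -k | grep -i 'out of memory'
# Celery's own log records hard-time-limit kills
grep -i 'hard time limit' /var/log/celery/worker.log
```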