Fix MaxRetriesExceededError: celery.exceptions.MaxRetriesExceededError: Can't retry app.tasks.send_email[abc123] max retries exceeded in Celery
This error occurs when a Celery task has failed and exhausted all configured retry attempts. The underlying error keeps recurring across retries, typically because of a persistent external service failure. Fix it by adding exponential backoff between retries, catching MaxRetriesExceededError to handle the final failure gracefully, and adding a dead-letter mechanism for permanently failed tasks.
Reading the Stack Trace
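The full traceback usually looks something like this (paths, line numbers, and the task id will vary with your project):

Traceback (most recent call last):
  File "/app/app/tasks.py", line 18, in send_email
    self.retry(exc=exc)
  File "/app/venv/lib/python3.12/site-packages/celery/app/task.py", line 763, in retry
    raise self.MaxRetriesExceededError(...)
celery.exceptions.MaxRetriesExceededError: Can't retry app.tasks.send_email[abc123] max retries exceeded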
Here's what each line means:
- File "/app/app/tasks.py", line 18, in send_email: The task's retry call triggers after catching an exception from the email service.
- File "/app/venv/lib/python3.12/site-packages/celery/app/task.py", line 763, in retry: Celery's retry mechanism detects that the maximum retry count (default: 3) has been reached.
- celery.exceptions.MaxRetriesExceededError: Can't retry app.tasks.send_email[abc123] max retries exceeded: All retry attempts have failed. The task is marked as permanently failed unless this exception is handled.
Common Causes
1. External service persistently down
The email provider or API is experiencing an extended outage that outlasts all retry attempts.
@celery.task(bind=True, max_retries=3)
def send_email(self, to, subject, body):
    try:
        smtp.send(to, subject, body)
    except SMTPException as exc:
        self.retry(exc=exc)  # retries 3 times, then raises MaxRetriesExceededError
2. No exponential backoff between retries
Retries happen immediately, hammering the failing service instead of giving it time to recover.
@celery.task(bind=True, max_retries=3)
def send_email(self, to, subject, body):
    try:
        smtp.send(to, subject, body)
    except SMTPException as exc:
        self.retry(exc=exc)  # no countdown, retries instantly
3. No final failure handler
The task does not catch MaxRetriesExceededError, so the failure is not logged or routed to a dead-letter queue.
# No try/except around self.retry()
# MaxRetriesExceededError propagates as an unhandled error
The Fix
Add exponential backoff by computing the countdown as 60 * 2^retries, giving the external service increasing time to recover; with max_retries=5 this yields delays of 60, 120, 240, 480, and 960 seconds. Catch MaxRetriesExceededError to log the permanent failure and alert the operations team so the issue can be investigated.
# Before: immediate retries and no final failure handling
@celery.task(bind=True, max_retries=3)
def send_email(self, to, subject, body):
    try:
        smtp.send(to, subject, body)
    except SMTPException as exc:
        self.retry(exc=exc)
# After: exponential backoff plus a dead-letter path for permanent failures
from celery.exceptions import MaxRetriesExceededError

@celery.task(bind=True, max_retries=5, default_retry_delay=60)
def send_email(self, to, subject, body):
    try:
        smtp.send(to, subject, body)
    except SMTPException as exc:
        try:
            # Exponential backoff: 60s, 120s, 240s, 480s, 960s
            self.retry(exc=exc, countdown=60 * (2 ** self.request.retries))
        except MaxRetriesExceededError:
            # Log to dead-letter queue and alert ops
            log_failed_email(to, subject, str(exc))
            notify_ops(f'Email to {to} failed after {self.max_retries} retries: {exc}')
            raise
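If you are on Celery 4.2 or later, the same retry policy can also be expressed declaratively with autoretry_for and retry_backoff. This is a sketch of that alternative, assuming the same smtp helper as above; it keeps the task body minimal when you don't need custom logic around each retry:

from smtplib import SMTPException

@celery.task(
    autoretry_for=(SMTPException,),  # retry automatically when smtp.send raises
    max_retries=5,
    retry_backoff=60,                # exponential backoff starting at 60s
    retry_backoff_max=600,           # cap individual delays at 10 minutes
    retry_jitter=True,               # randomize delays to avoid thundering herds
)
def send_email(to, subject, body):
    smtp.send(to, subject, body)

With this form the dead-letter logging moves out of the task body, for example into an on_failure handler on a custom base Task class.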
Testing the Fix
import pytest
from smtplib import SMTPException
from unittest.mock import patch, MagicMock

from celery.exceptions import MaxRetriesExceededError

from app.tasks import send_email
from app import create_app

@pytest.fixture
def app():
    app = create_app()
    app.config['CELERY_ALWAYS_EAGER'] = True
    app.config['CELERY_EAGER_PROPAGATES'] = True
    app.config['TESTING'] = True
    return app

@patch('app.tasks.smtp')
def test_successful_email(mock_smtp, app):
    with app.app_context():
        result = send_email('user@example.com', 'Hello', 'Body')
        mock_smtp.send.assert_called_once()

@patch('app.tasks.notify_ops')
@patch('app.tasks.log_failed_email')
@patch('app.tasks.smtp')
def test_max_retries_logs_failure(mock_smtp, mock_log, mock_notify, app):
    # Raise SMTPException so the task's except clause actually catches it
    mock_smtp.send.side_effect = SMTPException('SMTP down')
    with app.app_context():
        with pytest.raises(MaxRetriesExceededError):
            send_email('user@example.com', 'Hello', 'Body')
        mock_log.assert_called_once()
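Eager mode executes retries immediately and ignores the countdown, so the backoff schedule itself is easiest to verify if you factor the calculation into a small helper. The backoff_delay function below is a hypothetical refactor, not part of the task as written:

def backoff_delay(retries, base=60, factor=2):
    # Countdown in seconds before the next retry attempt
    return base * factor ** retries

def test_backoff_schedule():
    # retries 0..4 should wait 60s, 120s, 240s, 480s, 960s
    assert [backoff_delay(n) for n in range(5)] == [60, 120, 240, 480, 960]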
Run your tests:
pytest tests/ -v
Pushing Through CI/CD
git checkout -b fix/celery-retry-exceeded-handling
git add app/tasks.py tests/test_tasks.py
git commit -m "fix: add exponential backoff and dead-letter handling for retries"
git push origin fix/celery-retry-exceeded-handling
Your CI config should look something like this:
name: CI
on:
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
          cache: 'pip'
      - run: pip install -r requirements.txt
      - run: pytest tests/ -v --tb=short
        env:
          CELERY_BROKER_URL: redis://localhost:6379/0
The Full Manual Process: 18 Steps
Here's every step you just went through to fix this one bug:
- Notice the error alert or see it in your monitoring tool
- Open the error dashboard and read the stack trace
- Identify the file and line number from the stack trace
- Open your IDE and navigate to the file
- Read the surrounding code to understand context
- Reproduce the error locally
- Identify the root cause
- Write the fix
- Run the test suite locally
- Fix any failing tests
- Write new tests covering the edge case
- Run the full test suite again
- Create a new git branch
- Commit and push your changes
- Open a pull request
- Wait for code review
- Merge and deploy to production
- Monitor production to confirm the error is resolved
Total time: 30-60 minutes. For one bug.
Or Let bugstack Fix It in Under 2 Minutes
Every step above? bugstack does it automatically.
Step 1: Install the SDK
pip install bugstack
Step 2: Initialize
import os
import bugstack

bugstack.init(api_key=os.environ["BUGSTACK_API_KEY"])
Step 3: There is no step 3.
bugstack handles everything from here:
- Captures the stack trace and request context
- Pulls the relevant source files from your GitHub repo
- Analyzes the error and understands the code context
- Generates a minimal, verified fix
- Runs your existing test suite
- Pushes through your CI/CD pipeline
- Deploys to production (or opens a PR for review)
Time from error to fix deployed: Under 2 minutes.
Human involvement: zero.
Try bugstack Free → No credit card. 5-minute setup. Cancel anytime.
Deploying the Fix (Manual Path)
- Run pytest locally to verify retry behavior and dead-letter logging.
- Open a pull request with the retry and error handling changes.
- Wait for CI checks to pass on the PR.
- Have a teammate review and approve the PR.
- Merge to main and monitor retry counts and dead-letter logs in staging.
Frequently Asked Questions
How does BugStack verify the fix before shipping it?
BugStack simulates SMTP failures, verifies exponential backoff timing, confirms dead-letter logging works, and runs your full suite before marking it safe.
Does BugStack deploy fixes directly to production?
BugStack never pushes directly to production. Every fix goes through a pull request with full CI checks, so your team can review it before merging.
How many retries should I configure?
5 retries with exponential backoff covers most transient failures. For critical tasks, consider higher values with a cap on maximum delay.
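For example, capping the countdown is a one-line change inside the task's except block (the 3600-second ceiling here is an arbitrary illustration):

# Exponential backoff capped at one hour
countdown = min(60 * (2 ** self.request.retries), 3600)
self.retry(exc=exc, countdown=countdown)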
How do I resubmit tasks from the dead-letter queue?
Store failed tasks in a database table, then create a management command or admin UI to resubmit them to the Celery queue when the external service recovers.
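A minimal sketch of that pattern, assuming a Flask-SQLAlchemy db object and the send_email task from above; FailedEmail and resubmit_failed_emails are illustrative names, and this assumes your dead-letter logging persists the message body as well as the error:

from app import db
from app.tasks import send_email

class FailedEmail(db.Model):
    # Dead-letter table for emails that exhausted their retries
    id = db.Column(db.Integer, primary_key=True)
    to = db.Column(db.String(255), nullable=False)
    subject = db.Column(db.String(255), nullable=False)
    body = db.Column(db.Text, nullable=False)
    error = db.Column(db.Text)
    resubmitted = db.Column(db.Boolean, default=False)

def resubmit_failed_emails():
    # Re-queue every dead-lettered email once the provider recovers
    for failed in FailedEmail.query.filter_by(resubmitted=False):
        send_email.delay(failed.to, failed.subject, failed.body)
        failed.resubmitted = True
    db.session.commit()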