Fix MaxRetriesExceededError: celery.exceptions.MaxRetriesExceededError: Can't retry app.tasks.send_email[abc123] max retries exceeded in Celery
This error occurs when a Celery task has failed and exhausted all configured retry attempts. The underlying error keeps recurring across retries, typically because of a persistent external service failure. Fix it by adding exponential backoff between retries, catching MaxRetriesExceededError to handle the final failure gracefully, and adding a dead-letter mechanism for permanently failed tasks.
Reading the Stack Trace
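The full traceback usually looks something like this (paths, line numbers, and the task id will vary with your project):

Traceback (most recent call last):
  File "/app/app/tasks.py", line 18, in send_email
    self.retry(exc=exc)
  File "/app/venv/lib/python3.12/site-packages/celery/app/task.py", line 763, in retry
    raise self.MaxRetriesExceededError(...)
celery.exceptions.MaxRetriesExceededError: Can't retry app.tasks.send_email[abc123] max retries exceeded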
Here's what each line means:
- File "/app/app/tasks.py", line 18, in send_email: The task's retry call triggers after catching an exception from the email service.
- File "/app/venv/lib/python3.12/site-packages/celery/app/task.py", line 763, in retry: Celery's retry mechanism detects that the maximum retry count (default: 3) has been reached.
- celery.exceptions.MaxRetriesExceededError: Can't retry app.tasks.send_email[abc123] max retries exceeded: All retry attempts have failed. The task is marked as permanently failed unless this exception is handled.
Common Causes
1. External service persistently down
The email provider or API is experiencing an extended outage that outlasts all retry attempts.
@celery.task(bind=True, max_retries=3)
def send_email(self, to, subject, body):
    try:
        smtp.send(to, subject, body)
    except SMTPException as exc:
        self.retry(exc=exc)  # retries 3 times, then raises MaxRetriesExceededError
2. No exponential backoff between retries
Retries happen immediately, hammering the failing service instead of giving it time to recover.
@celery.task(bind=True, max_retries=3)
def send_email(self, to, subject, body):
    try:
        smtp.send(to, subject, body)
    except SMTPException as exc:
        self.retry(exc=exc)  # no countdown, retries instantly
3. No final failure handler
The task does not catch MaxRetriesExceededError, so the failure is not logged or routed to a dead-letter queue.
# No try/except around self.retry()
# MaxRetriesExceededError propagates as an unhandled error
The Fix
Add exponential backoff by computing the countdown as 60 * 2^retries, giving the external service increasing time to recover; with max_retries=5 this yields delays of 60, 120, 240, 480, and 960 seconds. Catch MaxRetriesExceededError to log the permanent failure and alert the operations team so the issue can be investigated.
# Before: immediate retries and no final failure handling
@celery.task(bind=True, max_retries=3)
def send_email(self, to, subject, body):
    try:
        smtp.send(to, subject, body)
    except SMTPException as exc:
        self.retry(exc=exc)
# After: exponential backoff plus a dead-letter path for permanent failures
from celery.exceptions import MaxRetriesExceededError

@celery.task(bind=True, max_retries=5, default_retry_delay=60)
def send_email(self, to, subject, body):
    try:
        smtp.send(to, subject, body)
    except SMTPException as exc:
        try:
            # Exponential backoff: 60s, 120s, 240s, 480s, 960s
            self.retry(exc=exc, countdown=60 * (2 ** self.request.retries))
        except MaxRetriesExceededError:
            # Log to dead-letter queue and alert ops
            log_failed_email(to, subject, str(exc))
            notify_ops(f'Email to {to} failed after {self.max_retries} retries: {exc}')
            raise
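If you are on Celery 4.2 or later, the same retry policy can also be expressed declaratively with autoretry_for and retry_backoff. This is a sketch of that alternative, assuming the same smtp helper as above; it keeps the task body minimal when you don't need custom logic around each retry:

from smtplib import SMTPException

@celery.task(
    autoretry_for=(SMTPException,),  # retry automatically when smtp.send raises
    max_retries=5,
    retry_backoff=60,                # exponential backoff starting at 60s
    retry_backoff_max=600,           # cap individual delays at 10 minutes
    retry_jitter=True,               # randomize delays to avoid thundering herds
)
def send_email(to, subject, body):
    smtp.send(to, subject, body)

With this form the dead-letter logging moves out of the task body, for example into an on_failure handler on a custom base Task class.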
Testing the Fix
import pytest
from smtplib import SMTPException
from unittest.mock import patch, MagicMock

from celery.exceptions import MaxRetriesExceededError

from app.tasks import send_email
from app import create_app

@pytest.fixture
def app():
    app = create_app()
    app.config['CELERY_ALWAYS_EAGER'] = True
    app.config['CELERY_EAGER_PROPAGATES'] = True
    app.config['TESTING'] = True
    return app

@patch('app.tasks.smtp')
def test_successful_email(mock_smtp, app):
    with app.app_context():
        result = send_email('user@example.com', 'Hello', 'Body')
        mock_smtp.send.assert_called_once()

@patch('app.tasks.notify_ops')
@patch('app.tasks.log_failed_email')
@patch('app.tasks.smtp')
def test_max_retries_logs_failure(mock_smtp, mock_log, mock_notify, app):
    # Raise SMTPException so the task's except clause actually catches it
    mock_smtp.send.side_effect = SMTPException('SMTP down')
    with app.app_context():
        with pytest.raises(MaxRetriesExceededError):
            send_email('user@example.com', 'Hello', 'Body')
        mock_log.assert_called_once()
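Eager mode executes retries immediately and ignores the countdown, so the backoff schedule itself is easiest to verify if you factor the calculation into a small helper. The backoff_delay function below is a hypothetical refactor, not part of the task as written:

def backoff_delay(retries, base=60, factor=2):
    # Countdown in seconds before the next retry attempt
    return base * factor ** retries

def test_backoff_schedule():
    # retries 0..4 should wait 60s, 120s, 240s, 480s, 960s
    assert [backoff_delay(n) for n in range(5)] == [60, 120, 240, 480, 960]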
Run your tests:
pytest tests/ -v
Pushing Through CI/CD
git checkout -b fix/celery-retry-exceeded-handling
git add app/tasks.py tests/test_tasks.py
git commit -m "fix: add exponential backoff and dead-letter handling for retries"
git push origin fix/celery-retry-exceeded-handling
Your CI config should look something like this:
name: CI
on:
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      redis:
        image: redis:7-alpine
        ports:
          - 6379:6379
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
          cache: 'pip'
      - run: pip install -r requirements.txt
      - run: pytest tests/ -v --tb=short
        env:
          CELERY_BROKER_URL: redis://localhost:6379/0
The Full Manual Process: 18 Steps
Here's every step you just went through to fix this one bug:
- Notice the error alert or see it in your monitoring tool
- Open the error dashboard and read the stack trace
- Identify the file and line number from the stack trace
- Open your IDE and navigate to the file
- Read the surrounding code to understand context
- Reproduce the error locally
- Identify the root cause
- Write the fix
- Run the test suite locally
- Fix any failing tests
- Write new tests covering the edge case
- Run the full test suite again
- Create a new git branch
- Commit and push your changes
- Open a pull request
- Wait for code review
- Merge and deploy to production
- Monitor production to confirm the error is resolved
Total time: 30-60 minutes. For one bug.
Or Let bugstack Fix It in Under 2 Minutes
Every step above? bugstack does it automatically.
Step 1: Install the SDK
pip install bugstack
Step 2: Initialize
import os
import bugstack

bugstack.init(api_key=os.environ["BUGSTACK_API_KEY"])
Step 3: There is no step 3.
bugstack handles everything from here:
- Captures the stack trace and request context
- Pulls the relevant source files from your GitHub repo
- Analyzes the error and understands the code context
- Generates a minimal, verified fix
- Runs your existing test suite
- Pushes through your CI/CD pipeline
- Deploys to production (or opens a PR for review)
Time from error to fix deployed: Under 2 minutes.
Human involvement: zero.
Try bugstack Free → No credit card. 5-minute setup. Cancel anytime.
Deploying the Fix (Manual Path)
- Run pytest locally to verify retry behavior and dead-letter logging.
- Open a pull request with the retry and error handling changes.
- Wait for CI checks to pass on the PR.
- Have a teammate review and approve the PR.
- Merge to main and monitor retry counts and dead-letter logs in staging.
Frequently Asked Questions
How does BugStack verify the fix before shipping it?
BugStack simulates SMTP failures, verifies exponential backoff timing, confirms dead-letter logging works, and runs your full suite before marking it safe.
Does BugStack deploy fixes directly to production?
BugStack never pushes directly to production. Every fix goes through a pull request with full CI checks, so your team can review it before merging.
How many retries should I configure?
5 retries with exponential backoff covers most transient failures. For critical tasks, consider higher values with a cap on maximum delay.
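For example, capping the countdown is a one-line change inside the task's except block (the 3600-second ceiling here is an arbitrary illustration):

# Exponential backoff capped at one hour
countdown = min(60 * (2 ** self.request.retries), 3600)
self.retry(exc=exc, countdown=countdown)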
How do I resubmit tasks from the dead-letter queue?
Store failed tasks in a database table, then create a management command or admin UI to resubmit them to the Celery queue when the external service recovers.
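A minimal sketch of that pattern, assuming a Flask-SQLAlchemy db object and the send_email task from above; FailedEmail and resubmit_failed_emails are illustrative names, and this assumes your dead-letter logging persists the message body as well as the error:

from app import db
from app.tasks import send_email

class FailedEmail(db.Model):
    # Dead-letter table for emails that exhausted their retries
    id = db.Column(db.Integer, primary_key=True)
    to = db.Column(db.String(255), nullable=False)
    subject = db.Column(db.String(255), nullable=False)
    body = db.Column(db.Text, nullable=False)
    error = db.Column(db.Text)
    resubmitted = db.Column(db.Boolean, default=False)

def resubmit_failed_emails():
    # Re-queue every dead-lettered email once the provider recovers
    for failed in FailedEmail.query.filter_by(resubmitted=False):
        send_email.delay(failed.to, failed.subject, failed.body)
        failed.resubmitted = True
    db.session.commit()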