Fix Sidekiq::RetryExhausted: Job retries exhausted after 25 attempts in Sidekiq
This error occurs when a Sidekiq job has failed on every attempt and used up its retries (25 by default). Sidekiq then moves the job to the Dead set, where it waits for manual intervention. Investigate the root cause, fix the underlying issue, and then re-enqueue the job from the Dead set, either through the Sidekiq Web UI or programmatically with the Sidekiq API.
Reading the Stack Trace
Here's what each line means:
- sidekiq (7.2.1) lib/sidekiq/job_retry.rb:145:in `retries_exhausted': Sidekiq's retry handler determines the job has exceeded its maximum retry count.
- app/jobs/email_notification_job.rb:12:in `perform': The email notification job at line 12 has been failing consistently across all retry attempts.
- sidekiq (7.2.1) lib/sidekiq/processor.rb:160:in `execute_job': The processor executes the job which fails again on the final retry attempt.
Common Causes
1. Persistent external service failure
The job depends on an external service that has been down for the entire retry window.
class EmailNotificationJob
  include Sidekiq::Job

  def perform(user_id, template)
    user = User.find(user_id)
    ExternalEmailService.send(to: user.email, template: template)
    # ExternalEmailService has been returning 503 for days
  end
end
2. Data deleted between retries
The record the job operates on was deleted, causing every retry to fail.
class OrderProcessingJob
  include Sidekiq::Job

  def perform(order_id)
    order = Order.find(order_id) # Raises RecordNotFound if deleted
    order.process!
  end
end
3. Missing error handling
The job does not handle expected error cases, causing all retries to fail the same way.
class PaymentJob
  include Sidekiq::Job

  def perform(payment_id)
    payment = Payment.find(payment_id)
    gateway_response = PaymentGateway.charge(payment)
    # Does not handle declined cards or network errors
    payment.update!(status: 'completed')
  end
end
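One way to address the third cause is to separate permanent failures, where a retry cannot help, from transient ones, where it might. The sketch below is illustrative only: `CardDeclined`, `GatewayTimeout`, `PaymentProcessor`, and the `Payment` struct are hypothetical stand-ins, not part of Sidekiq or of any real gateway client.

```ruby
# Hypothetical stand-ins so the sketch runs on its own; a real app would
# use its gateway client's error classes and an ActiveRecord model.
class CardDeclined < StandardError; end   # permanent: retrying cannot help
class GatewayTimeout < StandardError; end # transient: a retry may succeed

Payment = Struct.new(:id, :status)

class PaymentProcessor
  def initialize(gateway)
    @gateway = gateway
  end

  # Returns normally for outcomes that should not be retried and raises
  # for outcomes that should, so only transient errors reach Sidekiq.
  def call(payment)
    @gateway.charge(payment)
    payment.status = 'completed'
  rescue CardDeclined => e
    # Record the failure and swallow the error: the job finishes cleanly.
    payment.status = "declined: #{e.message}"
  rescue GatewayTimeout
    payment.status = 'pending'
    raise # re-raise so the job is retried with backoff
  end
end
```

Inside `PaymentJob#perform`, this could be called as `PaymentProcessor.new(PaymentGateway).call(payment)`. A declined card then completes the job without burning retries, while a timeout still goes through Sidekiq's normal retry schedule.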
The Fix
Add a sidekiq_retries_exhausted callback to handle the case when all retries fail. Use find_by to gracefully handle deleted records. After a threshold of failures, fall back to an alternative email delivery method instead of continuing to retry.
# Before
class EmailNotificationJob
  include Sidekiq::Job

  def perform(user_id, template)
    user = User.find(user_id)
    ExternalEmailService.send(to: user.email, template: template)
  end
end
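# After
class EmailNotificationJob
  include Sidekiq::Job

  sidekiq_options retry: 10

  # Sidekiq::Job does not define `executions` (that is ActiveJob's API), so
  # this assumes the failed-attempt count is assigned from outside, for
  # example by a server middleware that reads job['retry_count'].
  attr_accessor :executions

  sidekiq_retries_exhausted do |job, exception|
    Rails.logger.error("Email job #{job['jid']} exhausted retries: #{exception.message}")
    FailedJobNotifier.alert(job, exception)
  end

  def perform(user_id, template)
    user = User.find_by(id: user_id)
    return if user.nil? # record was deleted; nothing to send

    ExternalEmailService.send(to: user.email, template: template)
  rescue ExternalEmailService::ServiceUnavailable
    raise if executions.to_i < 5 # let Sidekiq retry the first few failures

    FallbackMailer.send_email(user, template).deliver_later
  end
end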
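The fallback branch relies on the job knowing how many attempts have already failed, which a plain `Sidekiq::Job` does not expose inside `perform`. One possible way to supply it is a small server middleware that copies the payload's `retry_count` onto the job. This is a sketch, not Sidekiq's built-in behavior: `RetryCountMiddleware` is a hypothetical name, and it assumes the job declares `attr_accessor :executions`.

```ruby
# Sketch of a Sidekiq server middleware that exposes the retry count to the job.
# In a real app you would also `include Sidekiq::ServerMiddleware`.
class RetryCountMiddleware
  # Sidekiq invokes server middleware as call(job_instance, job_payload, queue).
  def call(job_instance, job_payload, _queue)
    if job_instance.respond_to?(:executions=)
      # 'retry_count' is absent on the first run and 0 on the first retry,
      # so this yields the number of attempts that have already failed.
      retry_count = job_payload['retry_count']
      job_instance.executions = retry_count.nil? ? 0 : retry_count + 1
    end
    yield
  end
end

# Registration (commented out so the sketch runs without Sidekiq loaded):
# Sidekiq.configure_server do |config|
#   config.server_middleware { |chain| chain.add RetryCountMiddleware }
# end
```

Jobs that do not define `executions=` pass through untouched, so the middleware can be added globally without affecting other job classes.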
Testing the Fix
require 'rails_helper'
require 'sidekiq/testing'

RSpec.describe EmailNotificationJob do
  before { Sidekiq::Testing.inline! }

  it 'sends email for existing user' do
    user = create(:user)
    expect(ExternalEmailService).to receive(:send)
    EmailNotificationJob.perform_async(user.id, 'welcome')
  end

  it 'handles deleted user gracefully' do
    expect {
      EmailNotificationJob.perform_async(999999, 'welcome')
    }.not_to raise_error
  end

  it 'falls back on persistent failure' do
    user = create(:user)
    allow(ExternalEmailService).to receive(:send).and_raise(
      ExternalEmailService::ServiceUnavailable
    )
    expect(FallbackMailer).to receive_message_chain(:send_email, :deliver_later)
    job = EmailNotificationJob.new
    allow(job).to receive(:executions).and_return(5)
    job.perform(user.id, 'welcome')
  end
end
Run your tests:
bundle exec rspec spec/jobs/email_notification_job_spec.rb
Pushing Through CI/CD
git checkout -b fix/sidekiq-retry-exhausted
git add app/jobs/email_notification_job.rb
git commit -m "fix: add retry exhaustion handler and fallback for email jobs"
git push origin fix/sidekiq-retry-exhausted
Your CI config should look something like this:
name: CI
on:
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: postgres
        ports: ['5432:5432']
      redis:
        image: redis:7
        ports: ['6379:6379']
    steps:
      - uses: actions/checkout@v4
      - uses: ruby/setup-ruby@v1
        with:
          ruby-version: '3.3'
          bundler-cache: true
      - run: bin/rails db:setup
      - run: bundle exec rspec
The Full Manual Process: 18 Steps
Here's every step you just went through to fix this one bug:
- Notice the error alert or see it in your monitoring tool
- Open the error dashboard and read the stack trace
- Identify the file and line number from the stack trace
- Open your IDE and navigate to the file
- Read the surrounding code to understand context
- Reproduce the error locally
- Identify the root cause
- Write the fix
- Run the test suite locally
- Fix any failing tests
- Write new tests covering the edge case
- Run the full test suite again
- Create a new git branch
- Commit and push your changes
- Open a pull request
- Wait for code review
- Merge and deploy to production
- Monitor production to confirm the error is resolved
Total time: 30-60 minutes. For one bug.
Or Let bugstack Fix It in Under 2 minutes
Every step above? bugstack does it automatically.
Step 1: Install the SDK
gem install bugstack
Step 2: Initialize
require 'bugstack'
Bugstack.init(api_key: ENV['BUGSTACK_API_KEY'])
Step 3: There is no step 3.
bugstack handles everything from here:
- Captures the stack trace and request context
- Pulls the relevant source files from your GitHub repo
- Analyzes the error and understands the code context
- Generates a minimal, verified fix
- Runs your existing test suite
- Pushes through your CI/CD pipeline
- Deploys to production (or opens a PR for review)
Time from error to fix deployed: Under 2 minutes.
Human involvement: zero.
Deploying the Fix (Manual Path)
- Add sidekiq_retries_exhausted callback for alerting.
- Handle edge cases like deleted records.
- Add fallback strategies for critical jobs.
- Open a pull request.
- Merge and monitor the dead job queue in staging.
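Once the fix is deployed, the dead jobs for the affected class still need to be re-enqueued. The helper below is a minimal sketch of that step: `retry_dead_jobs` is a hypothetical name, and the dead set is passed in as an argument so the logic can be exercised without Redis. In production you would pass `Sidekiq::DeadSet.new` after `require 'sidekiq/api'`; its entries respond to `klass` and `retry`.

```ruby
# Retries dead jobs of one class and returns how many were re-enqueued.
# Matching entries are collected first and retried afterwards, so the set
# is not modified while it is still being iterated.
def retry_dead_jobs(dead_set, job_class:, limit: 100)
  matches = dead_set.select { |entry| entry.klass == job_class }.first(limit)
  matches.each(&:retry)
  matches.size
end

# Example call against a live Sidekiq (commented out; needs Redis):
# require 'sidekiq/api'
# retry_dead_jobs(Sidekiq::DeadSet.new, job_class: 'EmailNotificationJob')
```

The `limit` keeps a large backlog from flooding the queue in one go; re-running the helper picks up the next batch.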
Frequently Asked Questions
bugstack runs the fix through your existing test suite, generates additional edge-case tests, and validates that no other components are affected before marking it safe to deploy.
bugstack never pushes directly to production. Every fix goes through a pull request with full CI checks, so your team can review it before merging.
Use the Sidekiq Web UI to retry individual dead jobs, or retry them all programmatically with Sidekiq::DeadSet.new.each(&:retry) (after require 'sidekiq/api'). Fix the root cause first, or the jobs will simply fail again.
Sidekiq retries 25 times with exponential backoff. The formula is (retry_count ** 4) + 15 + (rand(10) * (retry_count + 1)) seconds, spanning about 21 days total.
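To see where the total comes from, the formula can be summed over all 25 retries. The sketch below uses only the formula quoted above; `base_delay` and `max_jitter` are illustrative helper names, and the random term is replaced by its bounds (`rand(10)` returns a value from 0 to 9).

```ruby
# Deterministic part of the delay before retry number `retry_count` (0-based).
def base_delay(retry_count)
  (retry_count**4) + 15
end

# Largest jitter the formula can add: rand(10) returns at most 9.
def max_jitter(retry_count)
  9 * (retry_count + 1)
end

retries = (0...25)
base_total = retries.sum { |n| base_delay(n) }
max_total  = base_total + retries.sum { |n| max_jitter(n) }

seconds_per_day = 86_400.0
puts "first delay:  #{base_delay(0)}s"
puts "last delay:   #{(base_delay(24) / seconds_per_day).round(2)} days"
puts "total window: #{(base_total / seconds_per_day).round(2)} to " \
     "#{(max_total / seconds_per_day).round(2)} days"
```

The fourth-power term dominates: the final few retries account for most of the window, which comes out at a little over 20 days and matches the roughly three-week figure. The jitter adds less than an hour in total.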