Fix Sidekiq::DeadSet::Error: Job moved to dead set after exhausting all retries in Sidekiq
This error indicates a job was moved to Sidekiq's Dead Set after exhausting all retry attempts. Dead jobs remain in Redis for six months by default. Investigate the original error, fix the root cause, then re-enqueue the dead jobs using the Sidekiq Web UI or the DeadSet API programmatically.
Reading the Stack Trace
Here's what each line means:
- sidekiq (7.2.1) lib/sidekiq/job_retry.rb:185:in `send_to_morgue': Sidekiq moves the failed job to the dead set (morgue) after all retries are exhausted.
- sidekiq (7.2.1) lib/sidekiq/job_retry.rb:145:in `retries_exhausted': The retry handler determines the job has exceeded its maximum retry count.
- app/jobs/payment_processing_job.rb:18:in `perform': The payment processing job failed on its final retry attempt.
Common Causes
1. Persistent dependency failure
The job depends on an external payment gateway that has been unavailable during the entire retry window.
class PaymentProcessingJob
  include Sidekiq::Job

  def perform(payment_id)
    payment = Payment.find(payment_id)
    gateway = PaymentGateway.new
    gateway.charge(payment.amount, payment.card_token)
    # Gateway has been down for 21 days (entire retry window)
  end
end
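To get a feel for how long a dependency has to stay down before a job reaches the dead set, the retry window can be estimated. The sketch below assumes Sidekiq's documented backoff of roughly `(retry_count ** 4) + 15` seconds per attempt and ignores the random jitter Sidekiq adds on top; `retry_window_seconds` is an illustrative helper, not part of Sidekiq.

```ruby
# Estimate Sidekiq's retry window, ignoring the random jitter it adds.
# Assumes the documented base delay of (retry_count ** 4) + 15 seconds.
def retry_window_seconds(max_retries)
  (0...max_retries).sum { |retry_count| (retry_count**4) + 15 }
end

SECONDS_PER_DAY = 24 * 60 * 60

default_window = retry_window_seconds(25) # Sidekiq's default retry count
custom_window  = retry_window_seconds(15) # a lower, custom retry count

puts format('25 retries: ~%.1f days', default_window.to_f / SECONDS_PER_DAY)
puts format('15 retries: ~%.1f days', custom_window.to_f / SECONDS_PER_DAY)
```

Under these assumptions, 25 retries add up to a little over 20 days before jitter, while 15 retries are exhausted after about a day and a half. Lowering the retry count therefore shortens the outage a job can survive considerably.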
2. Data integrity issue
The job references data that was deleted or corrupted, causing every attempt to fail.
class InvoiceJob
  include Sidekiq::Job

  def perform(invoice_id)
    invoice = Invoice.find(invoice_id) # Record was deleted, so this raises on every attempt
    # invoice_email is an illustrative mailer action name
    InvoiceMailer.invoice_email(invoice).deliver_now
  end
end
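When the record is gone for good, retrying cannot succeed, so one option is to look it up with a nil-returning finder and exit early. The sketch below is self-contained for illustration only: `FakeInvoice` and `GuardedInvoiceJob` are hypothetical names, and in a real job the lookup would be `Invoice.find_by(id: invoice_id)` inside a class that includes `Sidekiq::Job`.

```ruby
# Hypothetical in-memory stand-in for an ActiveRecord model.
class FakeInvoice
  RECORDS = { 1 => 'Invoice #1' }.freeze

  # Mirrors find_by semantics: returns nil instead of raising when nothing matches.
  def self.find_by(id:)
    RECORDS[id]
  end
end

class GuardedInvoiceJob
  # A real job would `include Sidekiq::Job` here.

  def perform(invoice_id)
    invoice = FakeInvoice.find_by(id: invoice_id)
    # A missing record will still be missing on the next attempt,
    # so finish the job instead of raising and triggering retries.
    return :skipped if invoice.nil?

    # The real job would deliver the invoice email here.
    :delivered
  end
end

puts GuardedInvoiceJob.new.perform(1)  # existing record
puts GuardedInvoiceJob.new.perform(99) # deleted record
```

Returning early trades a dead job for a silently skipped one, so it may be worth logging the skip if a missing invoice is unexpected.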
3. Configuration error
A misconfiguration causes the job to fail consistently regardless of retries.
class S3UploadJob
  include Sidekiq::Job

  def perform(file_path)
    Aws::S3::Client.new(region: 'wrong-region')
      .put_object(bucket: 'my-bucket', key: file_path)
    # Wrong region causes auth failure every time
  end
end
The Fix
Add a sidekiq_retries_exhausted callback that marks the payment as failed and sends a notification. Handle non-retryable errors like card declines immediately without retrying. This ensures users and admins are informed when payments fail permanently.
Before:
class PaymentProcessingJob
  include Sidekiq::Job

  def perform(payment_id)
    payment = Payment.find(payment_id)
    PaymentGateway.new.charge(payment.amount, payment.card_token)
  end
end
After:
class PaymentProcessingJob
  include Sidekiq::Job

  sidekiq_options retry: 15, dead: true

  # Runs once, after the final retry has failed.
  sidekiq_retries_exhausted do |job, exception|
    payment = Payment.find_by(id: job['args'].first)
    if payment
      payment.update!(status: 'failed', failure_reason: exception.message)
      PaymentFailureNotifier.notify(payment, exception)
    end
    Rails.logger.error("Payment job #{job['jid']} dead: #{exception.message}")
  end

  def perform(payment_id)
    payment = Payment.find(payment_id)
    result = PaymentGateway.new.charge(payment.amount, payment.card_token)
    payment.update!(status: 'completed', transaction_id: result.id)
  rescue PaymentGateway::CardDeclined => e
    # A declined card will not succeed on retry, so record it and finish without re-raising.
    payment.update!(status: 'declined', failure_reason: e.message)
  end
end
Testing the Fix
require 'rails_helper'
require 'sidekiq/testing'
require 'ostruct' # OpenStruct is used for the stubbed gateway response

RSpec.describe PaymentProcessingJob do
  before { Sidekiq::Testing.inline! }

  let(:payment) { create(:payment, status: 'pending') }

  it 'marks payment as completed on success' do
    allow(PaymentGateway).to receive_message_chain(:new, :charge)
      .and_return(OpenStruct.new(id: 'txn_123'))

    PaymentProcessingJob.perform_async(payment.id)

    expect(payment.reload.status).to eq('completed')
  end

  it 'marks payment as declined without retrying' do
    allow(PaymentGateway).to receive_message_chain(:new, :charge)
      .and_raise(PaymentGateway::CardDeclined)

    expect {
      PaymentProcessingJob.perform_async(payment.id)
    }.not_to raise_error

    expect(payment.reload.status).to eq('declined')
  end
end
Run your tests:
bundle exec rspec spec/jobs/payment_processing_job_spec.rb
Pushing Through CI/CD
git checkout -b fix/sidekiq-dead-job-handling
git add app/jobs/payment_processing_job.rb
git commit -m "fix: add retry exhaustion handler for payment job dead jobs"
git push origin fix/sidekiq-dead-job-handling
Your CI config should look something like this:
name: CI
on:
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: postgres
        ports: ['5432:5432']
      redis:
        image: redis:7
        ports: ['6379:6379']
    steps:
      - uses: actions/checkout@v4
      - uses: ruby/setup-ruby@v1
        with:
          ruby-version: '3.3'
          bundler-cache: true
      - run: bin/rails db:setup
      - run: bundle exec rspec
The Full Manual Process: 18 Steps
Here's every step you just went through to fix this one bug:
- Notice the error alert or see it in your monitoring tool
- Open the error dashboard and read the stack trace
- Identify the file and line number from the stack trace
- Open your IDE and navigate to the file
- Read the surrounding code to understand context
- Reproduce the error locally
- Identify the root cause
- Write the fix
- Run the test suite locally
- Fix any failing tests
- Write new tests covering the edge case
- Run the full test suite again
- Create a new git branch
- Commit and push your changes
- Open a pull request
- Wait for code review
- Merge and deploy to production
- Monitor production to confirm the error is resolved
Total time: 30-60 minutes. For one bug.
Or Let bugstack Fix It in Under 2 Minutes
Every step above? bugstack does it automatically.
Step 1: Install the SDK
gem install bugstack
Step 2: Initialize
require 'bugstack'
Bugstack.init(api_key: ENV['BUGSTACK_API_KEY'])
Step 3: There is no step 3.
bugstack handles everything from here:
- Captures the stack trace and request context
- Pulls the relevant source files from your GitHub repo
- Analyzes the error and understands the code context
- Generates a minimal, verified fix
- Runs your existing test suite
- Pushes through your CI/CD pipeline
- Deploys to production (or opens a PR for review)
Time from error to fix deployed: Under 2 minutes.
Human involvement: zero.
Deploying the Fix (Manual Path)
- Add sidekiq_retries_exhausted callbacks to critical jobs.
- Handle non-retryable errors explicitly.
- Set up monitoring for the dead job queue.
- Open a pull request.
- Merge and verify dead job handling in staging.
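The monitoring step above can be sketched as a small check that warns once the dead set grows past a threshold. This is a minimal sketch under stated assumptions: `dead_set_warning`, the default threshold of 100, and `FakeDeadSet` are all illustrative; the only Sidekiq call it relies on is `Sidekiq::DeadSet#size`, available after `require 'sidekiq/api'`.

```ruby
# Returns a warning message when the dead set holds more jobs than allowed,
# or nil when it is within the threshold. `dead_set` only needs to respond
# to #size, so Sidekiq::DeadSet.new can be passed in a real app.
def dead_set_warning(dead_set, threshold: 100)
  size = dead_set.size
  return nil if size <= threshold

  "Sidekiq dead set has #{size} jobs (threshold: #{threshold})"
end

# Stand-in object for demonstration. In production, something like:
#   require 'sidekiq/api'
#   warning = dead_set_warning(Sidekiq::DeadSet.new)
FakeDeadSet = Struct.new(:size)

puts dead_set_warning(FakeDeadSet.new(250)).inspect
puts dead_set_warning(FakeDeadSet.new(3)).inspect
```

A check like this could run from a scheduled task and forward any non-nil message to whatever alerting channel the team already uses.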
Frequently Asked Questions
How does bugstack verify that a fix is safe?
bugstack runs the fix through your existing test suite, generates additional edge-case tests, and validates that no other components are affected before marking it safe to deploy.
Does bugstack deploy fixes directly to production?
bugstack never pushes directly to production. Every fix goes through a pull request with full CI checks, so your team can review it before merging.
How long do dead jobs stay in the dead set?
By default, Sidekiq keeps dead jobs for six months and caps the dead set at 10,000 entries. Both limits are configurable through Sidekiq's dead_timeout_in_seconds and dead_max_jobs options.
How do I retry dead jobs?
After loading the API with require 'sidekiq/api', call Sidekiq::DeadSet.new.each(&:retry) to retry every dead job, or use the Sidekiq Web UI to retry jobs selectively. Always fix the root cause first; otherwise the retried jobs will simply fail again.
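Selective retries can also be scripted rather than clicked through in the Web UI. The sketch below retries only the dead jobs of one class; `retry_dead_jobs` and `FakeEntry` are hypothetical names, and the helper assumes each entry responds to `klass` and `retry`, as the entries yielded by `Sidekiq::DeadSet` do.

```ruby
# Retries only the dead jobs of one class and returns how many were retried.
# `dead_set` is any enumerable of entries responding to #klass and #retry.
def retry_dead_jobs(dead_set, job_class:)
  # Collect matches first so the set is not modified while iterating.
  matches = dead_set.select { |entry| entry.klass == job_class }
  matches.each(&:retry)
  matches.size
end

# Stand-in entry for demonstration. In production, something like:
#   require 'sidekiq/api'
#   retry_dead_jobs(Sidekiq::DeadSet.new, job_class: 'PaymentProcessingJob')
FakeEntry = Struct.new(:klass, :retried) do
  def retry
    self.retried = true
  end
end

entries = [
  FakeEntry.new('PaymentProcessingJob', false),
  FakeEntry.new('InvoiceJob', false)
]

puts retry_dead_jobs(entries, job_class: 'PaymentProcessingJob')
```

Jobs enqueued through Active Job are stored under a wrapper class name, so matching on `klass` as shown here fits jobs that include `Sidekiq::Job` directly, like the ones in this article.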