Fix Error: Worker 1 died with code 1, signal null in Node.js
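This error means a Node.js cluster worker process crashed and exited with a non-zero code. Common causes include unhandled exceptions, out-of-memory kills, or segmentation faults in native modules. With code 1 and no signal, an unhandled exception is the most likely cause; out-of-memory kills and native crashes usually report a signal such as SIGKILL or SIGSEGV instead. Fix it by adding uncaughtException handling in workers and implementing automatic restart with crash loop detection.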
Reading the Stack Trace
Here's what each line means:
- at Worker.exitedAfterDisconnect (node:internal/cluster/primary:282:9): The cluster primary process detected that a worker exited unexpectedly (not due to a graceful disconnect).
- at Process.ChildProcess._handle.onexit (node:internal/child_process:291:12): The child process handle reported that the worker's OS process terminated with exit code 1.
- at cluster.on (src/cluster.js:24:5): Your cluster setup at line 24 listens for the 'exit' event but may not be restarting workers or logging details.
Common Causes
1. Unhandled exception in worker
The worker process encounters a thrown error that is not caught, causing it to crash immediately.
if (cluster.isWorker) {
  const app = require('./app');
  app.listen(3000);
  // No uncaughtException handler, so any unhandled error kills the worker
}
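One common source of such errors is a rejected promise inside an async route handler. The sketch below assumes `./app` exports an Express-style application whose error middleware receives errors through `next(err)`. The name `wrapAsync` is hypothetical and does not come from any library. Whether a rejection actually crashes the worker depends on your framework and Node.js version, so treat this as an illustration rather than a required change.

```javascript
// Hypothetical helper (an assumption, not part of the original article):
// forwards errors from async route handlers to Express-style error
// middleware instead of letting them become unhandled rejections.
function wrapAsync(handler) {
  return function wrapped(req, res, next) {
    // Promise.resolve().then(...) captures synchronous throws and rejections.
    return Promise.resolve()
      .then(() => handler(req, res, next))
      .catch(next);
  };
}

// Usage, assuming `app` is an Express-style application and `loadData`
// is a placeholder for your own async function:
// app.get('/data', wrapAsync(async (req, res) => {
//   res.json(await loadData());
// }));
```

With a wrapper like this, a failing handler reaches the application's error middleware, and the worker-level uncaughtException handler remains a last resort.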
2. Out of memory in worker process
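A memory leak or large operation pushes the worker past its memory limit. When the V8 heap limit is exceeded, Node.js aborts with a "JavaScript heap out of memory" fatal error; when the machine or container itself runs out of memory, the OS OOM killer terminates the process with SIGKILL.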
// Worker accumulates data without limits
const cache = [];
app.get('/data', (req, res) => {
  cache.push(expensiveComputation()); // Unbounded growth
  res.json(cache);
});
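One way to avoid the unbounded growth shown above is to cap the cache size. The following is a minimal sketch, not the article's own fix: `BoundedCache` is a hypothetical class that evicts the oldest entry once a configurable limit is exceeded, relying on the insertion order that `Map` preserves.

```javascript
// Hypothetical bounded cache (an assumption for illustration):
// evicts the oldest entry once maxEntries is exceeded.
class BoundedCache {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = new Map(); // Map iterates in insertion order
  }

  set(key, value) {
    // Deleting first moves an existing key to the newest position.
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }

  get(key) {
    return this.entries.get(key);
  }

  get size() {
    return this.entries.size;
  }
}

// Usage: keep at most 100 computed results in memory.
const resultCache = new BoundedCache(100);
resultCache.set('report:1', { total: 42 });
```

A size cap bounds the number of entries, not their byte size, so very large values can still exhaust memory.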
3. No automatic restart logic
The cluster primary does not restart workers when they die, leaving the application with reduced capacity.
if (cluster.isPrimary) {
  for (let i = 0; i < numCPUs; i++) cluster.fork();
  // No 'exit' handler to restart crashed workers
}
The Fix
Add an 'exit' handler on the cluster primary that automatically restarts dead workers with a delay. Include crash loop detection to stop restarting if workers die too frequently. Add an uncaughtException handler in workers to log the error before exiting.
// Before: workers are forked once and never restarted
const cluster = require('cluster');
const os = require('os');
if (cluster.isPrimary) {
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  require('./app').listen(3000);
}
// After: crashed workers are restarted, with crash loop detection
const cluster = require('cluster');
const os = require('os');
const RESTART_DELAY = 1000; // ms to wait before forking a replacement
const MAX_RESTARTS = 5; // restarts allowed per window
const RESTART_WINDOW = 60000; // window length in ms
const restartTimestamps = [];
if (cluster.isPrimary) {
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
  cluster.on('exit', (worker, code, signal) => {
    console.error(`Worker ${worker.process.pid} died (code: ${code}, signal: ${signal})`);
    const now = Date.now();
    restartTimestamps.push(now);
    // Drop timestamps outside the window so the array does not grow forever
    while (now - restartTimestamps[0] >= RESTART_WINDOW) {
      restartTimestamps.shift();
    }
    if (restartTimestamps.length > MAX_RESTARTS) {
      console.error('Too many worker restarts, possible crash loop. Not restarting.');
      return;
    }
    setTimeout(() => cluster.fork(), RESTART_DELAY);
  });
} else {
  // Log the error, then exit so the primary can fork a fresh worker
  process.on('uncaughtException', (err) => {
    console.error('Uncaught exception in worker:', err);
    process.exit(1);
  });
  require('./app').listen(3000);
}
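The 'exit' event also fires when a worker is stopped on purpose, for example during a rolling restart, and those exits should not count toward the crash loop limit. The sketch below uses the documented `worker.exitedAfterDisconnect` property to tell the cases apart. The function name `classifyExit` and its return labels are hypothetical.

```javascript
// Hypothetical helper (an assumption for illustration): labels a worker
// exit so the primary can decide whether a restart is appropriate.
function classifyExit(worker, code, signal) {
  // True when the primary called worker.disconnect() or worker.kill().
  if (worker.exitedAfterDisconnect) return 'intentional';
  // A signal name is set when the OS terminated the process.
  if (signal) return 'killed';
  // Exit code 0 means the worker finished without error.
  if (code === 0) return 'clean';
  return 'crashed';
}

// Possible use inside the primary's handler:
// cluster.on('exit', (worker, code, signal) => {
//   if (classifyExit(worker, code, signal) === 'intentional') return;
//   // ...crash loop check and delayed cluster.fork() as before
// });
```

For the error in this article (code 1, signal null, no prior disconnect), this helper returns 'crashed'.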
Testing the Fix
describe('Cluster restart logic', () => {
  let restartTimestamps;
  const MAX_RESTARTS = 5;
  const RESTART_WINDOW = 60000;
  beforeEach(() => {
    restartTimestamps = [];
  });
  function shouldRestart() {
    const now = Date.now();
    restartTimestamps.push(now);
    const recent = restartTimestamps.filter((t) => now - t < RESTART_WINDOW);
    return recent.length <= MAX_RESTARTS;
  }
  it('allows restart when under the limit', () => {
    expect(shouldRestart()).toBe(true);
    expect(shouldRestart()).toBe(true);
  });
  it('blocks restart when crash loop detected', () => {
    expect.hasAssertions();
    for (let i = 0; i < MAX_RESTARTS; i++) shouldRestart();
    expect(shouldRestart()).toBe(false);
  });
});
Run your tests:
npm test
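The test above re-implements the restart check instead of importing it, and it cannot verify that old crashes drop out of the window. One option is to extract the logic into a small factory with an injectable clock. This is a sketch under that assumption: `createRestartLimiter` and its option names are hypothetical and not part of the original code.

```javascript
// Hypothetical factory (an assumption for illustration). The `now` option
// defaults to Date.now and can be replaced with a fake clock in tests.
function createRestartLimiter({ maxRestarts, windowMs, now = Date.now }) {
  const timestamps = [];
  return function shouldRestart() {
    const current = now();
    timestamps.push(current);
    // Discard entries older than the window so the array stays bounded.
    while (timestamps.length > 0 && current - timestamps[0] >= windowMs) {
      timestamps.shift();
    }
    return timestamps.length <= maxRestarts;
  };
}

// Example with a fake clock: two restarts allowed per 1000 ms.
let fakeTime = 0;
const shouldRestart = createRestartLimiter({
  maxRestarts: 2,
  windowMs: 1000,
  now: () => fakeTime,
});
const firstThree = [shouldRestart(), shouldRestart(), shouldRestart()];
fakeTime = 1000; // the earlier crashes are now outside the window
const afterWindow = shouldRestart();
```

Here the third call exceeds the limit, and the call made after the window has passed is allowed again. The cluster primary and the test file can both import the same function, so the test exercises the code that runs in production.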
Pushing Through CI/CD
git checkout -b fix/nodejs-cluster-fork-restart
git add src/cluster.js src/__tests__/cluster.test.js
git commit -m "fix: auto-restart cluster workers with crash loop detection"
git push origin fix/nodejs-cluster-fork-restart
Your CI config should look something like this:
name: CI
on:
  pull_request:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci
      - run: npm test -- --coverage
      - run: npm run lint
The Full Manual Process: 18 Steps
Here's every step you just went through to fix this one bug:
- Notice the error alert or see it in your monitoring tool
- Open the error dashboard and read the stack trace
- Identify the file and line number from the stack trace
- Open your IDE and navigate to the file
- Read the surrounding code to understand context
- Reproduce the error locally
- Identify the root cause
- Write the fix
- Run the test suite locally
- Fix any failing tests
- Write new tests covering the edge case
- Run the full test suite again
- Create a new git branch
- Commit and push your changes
- Open a pull request
- Wait for code review
- Merge and deploy to production
- Monitor production to confirm the error is resolved
Total time: 30-60 minutes. For one bug.
Or Let bugstack Fix It in Under 2 Minutes
Every step above? bugstack does it automatically.
Step 1: Install the SDK
npm install bugstack-sdk
Step 2: Initialize
const { initBugStack } = require('bugstack-sdk')
initBugStack({ apiKey: process.env.BUGSTACK_API_KEY })
Step 3: There is no step 3.
bugstack handles everything from here:
- Captures the stack trace and request context
- Pulls the relevant source files from your GitHub repo
- Analyzes the error and understands the code context
- Generates a minimal, verified fix
- Runs your existing test suite
- Pushes through your CI/CD pipeline
- Deploys to production (or opens a PR for review)
Time from error to fix deployed: Under 2 minutes.
Human involvement: zero.
Deploying the Fix (Manual Path)
- Add exit event handling and worker restart logic to the cluster primary.
- Implement crash loop detection to avoid infinite restart cycles.
- Add uncaughtException handlers in worker processes.
- Run tests and verify restart behavior.
- Open a PR, merge after CI, and monitor worker stability in staging.
Frequently Asked Questions
BugStack runs the fix through your existing test suite, generates additional edge-case tests, and validates that no other modules are affected before marking it safe to deploy.
BugStack never pushes directly to production. Every fix goes through a pull request with full CI checks, so your team can review it before merging.
PM2 provides cluster management with built-in restart policies, logging, and monitoring out of the box. For production, PM2 is often simpler than implementing custom cluster logic.
Exit code 137 means the process was killed by SIGKILL (128 + 9), typically by the OS OOM killer. This indicates the worker exceeded available memory.
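The 128 + signal number arithmetic can be reversed to recover the signal from an exit status reported by a shell or container runtime. This is a minimal sketch: `signalFromExitCode` is a hypothetical helper, and it reads the platform's signal table from Node's `os.constants.signals`. In the cluster 'exit' event itself, a signal-terminated worker normally arrives with the `signal` argument set and `code` as null, so this decoding is mostly useful for statuses seen outside Node.

```javascript
const os = require('os');

// Hypothetical helper (an assumption for illustration): maps an exit
// status above 128 back to the name of the signal it encodes.
function signalFromExitCode(code) {
  if (!Number.isInteger(code) || code <= 128) return null;
  const signalNumber = code - 128;
  // os.constants.signals maps signal names to their platform numbers.
  const match = Object.entries(os.constants.signals)
    .find(([, number]) => number === signalNumber);
  return match ? match[0] : null;
}

// 137 - 128 = 9, which is SIGKILL on Linux and macOS.
const decoded = signalFromExitCode(137);
```

An ordinary exit code such as 1 yields null, since it does not encode a signal.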