Production Guide | RustQueue

Foundation

Crash-only design

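With the default redb backend, every job is persisted to storage before it is acknowledged, so no in-memory state is lost on a crash. You can kill -9 the process at any point and no committed work will be silently dropped. The faster backends described under Durability modes relax this guarantee in exchange for throughput.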

Delivery is at-least-once: a job being processed when the process died will be retried automatically. Handlers should be idempotent, or use the unique_key field to deduplicate on re-delivery.
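
To make the deduplication idea concrete, here is a minimal, std-only sketch of a handler wrapper that skips a re-delivered job. The `Delivery` and `Deduplicator` types are hypothetical illustrations and not part of RustQueue; only the `unique_key` name comes from this guide. The set of seen keys is held in memory here, whereas a real deployment would presumably keep it in durable storage alongside the side effect.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a delivered job; only `unique_key` mirrors a
/// field named in this guide.
struct Delivery {
    unique_key: String,
    payload: String,
}

/// Remembers which unique keys have already been handled.
/// Assumption: in production this set would live in durable storage.
#[derive(Default)]
struct Deduplicator {
    seen: HashSet<String>,
}

impl Deduplicator {
    /// Runs `effect` only if this key has not been handled before.
    /// Returns true if the effect ran, false if the delivery was a duplicate.
    fn handle_once<F: FnOnce(&str)>(&mut self, job: &Delivery, effect: F) -> bool {
        if self.seen.contains(&job.unique_key) {
            return false; // re-delivery: the side effect was already applied
        }
        effect(&job.payload);
        // Mark as handled only after the effect succeeded.
        self.seen.insert(job.unique_key.clone());
        true
    }
}
```

Because the key is recorded after the effect, a crash between the two steps still repeats the effect once, so the effect itself should tolerate a repeat or be committed in the same transaction as the key.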

Storage

Durability modes

Choose the backend that matches your durability/throughput tradeoff.

Backend	Durability	Throughput	Crash loss window	When to use
`redb` (default)	ACID, fsync per write	~314 push/s raw	None — zero data loss	Default production choice. Correct and zero-config.
Buffered redb (`.with_write_coalescing`)	Batched fsync	~19 K ops/s	One flush interval	High-write workloads where small loss windows are acceptable.
Hybrid (`RustQueue::hybrid`)	In-memory hot path + periodic snapshot	300 K+ ops/s	Up to `snapshot_interval`	Maximum throughput; explicitly trade durability for speed.
In-memory (`RustQueue::memory`)	None — lost on drop	Fastest	Total	Tests and ephemeral workloads only.

// Safest: ACID redb
let rq = RustQueue::redb("./jobs.db")?.build()?;

// Faster writes, tiny loss window:
use rustqueue::storage::BufferedRedbConfig;
let rq = RustQueue::redb("./jobs.db")?
    .with_write_coalescing(BufferedRedbConfig::default())
    .build()?;

// Maximum throughput, explicit durability tradeoff:
let rq = RustQueue::hybrid("./jobs.db")?.build()?;
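
The "crash loss window" column can be illustrated with a toy model of batched flushing. This is a std-only sketch, not RustQueue's implementation: `CoalescingLog` is a hypothetical type, and it flushes after a fixed number of writes, whereas the buffered backend described above flushes on an interval.

```rust
/// Toy model of write coalescing: records are buffered in memory and made
/// durable in batches. Hypothetical type, for illustration only.
struct CoalescingLog {
    durable: Vec<String>, // stands in for data that has been fsynced
    pending: Vec<String>, // accepted but not yet flushed
    batch_size: usize,    // flush once this many records are pending
    flushes: usize,       // number of simulated fsync calls
}

impl CoalescingLog {
    fn new(batch_size: usize) -> Self {
        CoalescingLog { durable: Vec::new(), pending: Vec::new(), batch_size, flushes: 0 }
    }

    /// Buffers a record and flushes when the batch is full.
    fn write(&mut self, record: &str) {
        self.pending.push(record.to_string());
        if self.pending.len() >= self.batch_size {
            self.flush();
        }
    }

    /// Moves all pending records to durable storage with one simulated fsync.
    fn flush(&mut self) {
        if self.pending.is_empty() {
            return;
        }
        self.durable.append(&mut self.pending);
        self.flushes += 1;
    }

    /// Simulates kill -9: whatever was still pending is gone.
    /// Returns the number of records lost.
    fn crash(&mut self) -> usize {
        let lost = self.pending.len();
        self.pending.clear();
        lost
    }
}
```

With a batch size of 4, ten writes cost two simulated fsyncs instead of ten, and a crash at that point loses the two records still pending. That is the tradeoff the table describes: fewer syncs per write, at the price of a bounded loss window.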

Workers

run_worker is the managed entry point for processing jobs. It replaces a hand-rolled pull / ack / fail loop and handles operational details automatically:

Auto-acks a job when the handler returns Ok(()).
Auto-fails a job (triggering retry/DLQ logic) when the handler returns Err(e).
Auto-heartbeats the in-flight job so stall detection does not reclaim a healthy long-running job.
Starts housekeeping automatically on first call — no separate setup required.

use rustqueue::RustQueue;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let rq = RustQueue::redb("./jobs.db")?.build()?;

    rq.run_worker("emails", |job| async move {
        println!("sending email: {}", job.data["to"]);
        // return Ok to ack, Err to fail (→ retry or DLQ)
        Ok::<(), String>(())
    })
    .await?;

    Ok(())
}

run_worker runs until Ctrl-C and finishes the in-flight job before returning.
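
For comparison, the hand-rolled loop that run_worker replaces looks roughly like the following. This sketch runs against a hypothetical in-memory `MockQueue` rather than the real RustQueue API, so the method names and signatures are assumptions chosen to mirror the pull / ack / fail vocabulary used above. It omits heartbeats and housekeeping, which the managed entry point handles for you.

```rust
use std::collections::VecDeque;

/// What happened to each job, recorded for inspection.
#[derive(Debug, PartialEq)]
enum Outcome {
    Acked(u64),
    Failed(u64, String),
}

/// Hypothetical in-memory queue, for illustration only.
struct MockQueue {
    waiting: VecDeque<u64>,
    log: Vec<Outcome>,
}

impl MockQueue {
    fn pull(&mut self) -> Option<u64> {
        self.waiting.pop_front()
    }
    fn ack(&mut self, id: u64) {
        self.log.push(Outcome::Acked(id));
    }
    fn fail(&mut self, id: u64, error: String) {
        self.log.push(Outcome::Failed(id, error));
    }
}

/// Pulls jobs until the queue is empty, acking on Ok and failing on Err.
fn run_loop<H>(queue: &mut MockQueue, mut handler: H)
where
    H: FnMut(u64) -> Result<(), String>,
{
    while let Some(id) = queue.pull() {
        match handler(id) {
            Ok(()) => queue.ack(id),
            Err(e) => queue.fail(id, e),
        }
    }
}
```

Every job ends in exactly one of the two outcomes, which is the invariant the auto-ack and auto-fail behavior preserves.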

Embedding in a web server

Use run_worker_with_shutdown to stop the worker when a server shuts down rather than on Ctrl-C:

use tokio::sync::oneshot;

let (tx, rx) = oneshot::channel::<()>();
// In your server shutdown handler: tx.send(()).ok();

tokio::spawn(async move {
    rq.run_worker_with_shutdown("emails", handler, async {
        rx.await.ok();
    })
    .await
    .expect("worker failed");
});

CPU-bound handlers: A handler that never .awaits will block the tokio thread and prevent heartbeats from firing. After stall_timeout the job will be reclaimed. Wrap CPU-intensive sections with tokio::task::spawn_blocking, or raise stall_timeout.

Reliability

Retries & DLQ

Every job carries retry configuration. The defaults:

Setting	Default
`max_attempts`	3
`backoff`	Exponential
`backoff_delay_ms`	1 000 ms

When a job fails (handler returns Err, or stall detection fires), the engine applies backoff and re-queues — unless attempt ≥ max_attempts, at which point the job moves to the dead-letter queue.
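
The retry-or-DLQ rule above can be sketched as a single comparison. The names `NextState` and `after_failure` are hypothetical, and the sketch assumes `attempt` is the 1-based number of the attempt that just failed.

```rust
/// Where a job goes after a failed attempt.
#[derive(Debug, PartialEq)]
enum NextState {
    Retry,
    DeadLetter,
}

/// Assumption: `attempt` counts from 1 and includes the attempt that just failed.
fn after_failure(attempt: u32, max_attempts: u32) -> NextState {
    if attempt >= max_attempts {
        NextState::DeadLetter
    } else {
        NextState::Retry
    }
}
```

Under that assumption and the default `max_attempts` of 3, the first and second failures are retried and the third sends the job to the dead-letter queue.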

Strategy	Delay formula	Example (base = 1 s)
`Fixed`	`base`	1 s, 1 s, 1 s
`Linear`	`base × attempt`	1 s, 2 s, 3 s
`Exponential` (default)	`base × 2^(attempt-1)`	1 s, 2 s, 4 s
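
The delay formulas in the table translate directly into code. The sketch below is std-only; the `Backoff` enum and `retry_delay` function are hypothetical names that mirror the table, not RustQueue's own types. Saturating arithmetic is an added safeguard so that very large attempt numbers cannot overflow.

```rust
use std::time::Duration;

/// Mirrors the three strategies in the table above (hypothetical type).
#[derive(Clone, Copy)]
enum Backoff {
    Fixed,
    Linear,
    Exponential,
}

/// Delay before the retry that follows failed attempt number `attempt` (1-based).
fn retry_delay(strategy: Backoff, base: Duration, attempt: u32) -> Duration {
    let attempt = attempt.max(1); // treat 0 as the first attempt
    match strategy {
        Backoff::Fixed => base,
        Backoff::Linear => base.saturating_mul(attempt),
        Backoff::Exponential => {
            // base × 2^(attempt-1), capped instead of overflowing
            let factor = 2u32.checked_pow(attempt - 1).unwrap_or(u32::MAX);
            base.saturating_mul(factor)
        }
    }
}
```

With a base of 1 s, attempts 1, 2, and 3 give 1 s, 2 s, 4 s for `Exponential` and 1 s, 2 s, 3 s for `Linear`, matching the example column.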

Inspecting the DLQ

let dead = rq.get_dlq_jobs("emails", 50).await?;
for job in dead {
    println!("{}: {:?}", job.id, job.last_error);
}

Operations

Housekeeping

The housekeeping loop is a background tokio task (default tick: 1 s) that drives retries, schedules, and crash recovery. run_worker starts it automatically. If you call pull/ack directly, start housekeeping explicitly:

// Call once after building; safe to call multiple times (idempotent).
rq.start_housekeeping()?;

Required for correctness: without housekeeping, delayed jobs never become runnable, stalled jobs stay stuck forever, and schedules never fire.

What it does each tick

Promotes Delayed → Waiting when delay expires.
Fires cron and interval schedules that are due.
Fails jobs that have exceeded timeout_ms.
Detects stalled Active jobs (no heartbeat within stall_timeout).
Refreshes queue depth metrics.
Cleans up expired completed/failed/DLQ jobs (every ~1 min).
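
Two of the steps listed above, delayed-job promotion and stall detection, can be sketched as one pass over the job table. This is a simplified std-only model with hypothetical `State`, `Job`, and `tick` names. The real loop also handles schedules, timeouts, metrics, and cleanup, and applies the retry/DLQ rules when it reclaims a stalled job.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Clone, Copy)]
enum State {
    Delayed,
    Waiting,
    Active,
}

/// Hypothetical job record holding only the fields this sketch needs.
struct Job {
    state: State,
    run_at: Instant,         // when a Delayed job becomes runnable
    last_heartbeat: Instant, // most recent heartbeat of an Active job
}

/// One simplified tick. Returns (promoted, reclaimed) counts.
fn tick(jobs: &mut [Job], now: Instant, stall_timeout: Duration) -> (usize, usize) {
    let mut promoted = 0;
    let mut reclaimed = 0;
    for job in jobs.iter_mut() {
        match job.state {
            // Delay expired: Delayed -> Waiting.
            State::Delayed if job.run_at <= now => {
                job.state = State::Waiting;
                promoted += 1;
            }
            // No heartbeat within stall_timeout: treat the job as stalled.
            State::Active
                if now.saturating_duration_since(job.last_heartbeat) >= stall_timeout =>
            {
                job.state = State::Waiting; // simplification: always re-queue
                reclaimed += 1;
            }
            _ => {}
        }
    }
    (promoted, reclaimed)
}
```

Passing `now` in as a parameter keeps the function deterministic, which makes it easy to check that nothing moves before the delay or timeout has elapsed.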

Tuning

use std::time::Duration;

let rq = RustQueue::redb("./jobs.db")?
    .stall_timeout(Duration::from_secs(60))    // default: 30 s
    .tick_interval(Duration::from_millis(500))  // default: 1 s
    .build()?;

Lifecycle

Graceful shutdown

run_worker catches Ctrl-C and finishes the in-flight job before returning. For programmatic shutdown (e.g., Axum shutdown signal):

use tokio::sync::watch;

// `changed()` takes `&mut self`, so the receiver must be declared mutable.
let (shutdown_tx, mut shutdown_rx) = watch::channel(false);

rq.run_worker_with_shutdown("work", handler, async move {
    shutdown_rx.changed().await.ok();
})
.await?;

// Elsewhere, to trigger shutdown:
shutdown_tx.send(true).ok();

Jobs that have not yet been pulled remain in Waiting state and will be picked up on the next startup. No work is lost.
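
The shutdown guarantee described here comes down to checking the signal only between jobs. The following std-only sketch shows that pattern with an `AtomicBool` in place of the tokio channel; `run_until_shutdown` is a hypothetical function, not part of RustQueue.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};

/// Processes jobs until `shutdown` is set. The flag is checked only between
/// jobs, so a job that has started always runs to completion.
/// Returns the ids that were processed; unpulled jobs stay in `waiting`.
fn run_until_shutdown<H>(
    waiting: &mut VecDeque<u32>,
    shutdown: &AtomicBool,
    mut handler: H,
) -> Vec<u32>
where
    H: FnMut(u32),
{
    let mut done = Vec::new();
    while !shutdown.load(Ordering::SeqCst) {
        let id = match waiting.pop_front() {
            Some(id) => id,
            None => break, // queue drained
        };
        handler(id); // in-flight job: never interrupted
        done.push(id);
    }
    done
}
```

If the flag is set while job 2 of five is running, jobs 1 and 2 complete and jobs 3 to 5 remain queued for the next startup.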

Tutorial

Crash-recovery walkthrough

The crash_recovery example demonstrates the full lifecycle: push a job, crash mid-processing, restart, and watch the job complete automatically.

Start the worker (Terminal 1)

Uses a short stall_timeout of 3 s so you don't wait long.

```
cargo run --example crash_recovery -- worker
```

Push a job (Terminal 2)

```
cargo run --example crash_recovery -- push
```

The worker picks it up and logs progress steps.

Simulate a crash

While the job is mid-run, kill the worker process:

```
kill -9 <pid printed at startup>
```

The job is now Active with no heartbeat. The database still holds it.

Restart and watch recovery

```
cargo run --example crash_recovery -- worker
```

After ~3 s, housekeeping detects no heartbeat, fails the job, and re-queues it. The new worker picks it up and completes it. No manual intervention. No data loss. No message broker.

Note: each crash consumes one retry attempt. With max_attempts=3 (the default), a job killed three times will land in the DLQ. Raise max_attempts for workloads that may experience frequent infrastructure interruptions.
