Guide
Running RustQueue in production
Durability tradeoffs, managed workers, retries, crash recovery, and graceful shutdown — without a message broker.
Crash-only design
Every job is persisted to storage before it is acknowledged. There is no in-memory state that would be lost on a crash. You can kill -9 the process at any point and no committed work will be silently dropped.
Delivery is at-least-once: a job being processed when the process died will be retried automatically. Handlers should be idempotent, or use the unique_key field to deduplicate on re-delivery.
Durability modes
Choose the backend that matches your availability/throughput tradeoff.
| Backend | Durability | Throughput | Crash loss window | When to use |
|---|---|---|---|---|
redb (default) |
ACID, fsync per write | ~314 push/s raw | None — zero data loss | Default production choice. Correct and zero-config. |
Buffered redb (.with_write_coalescing) |
Batched fsync | ~19 K ops/s | One flush interval | High-write workloads where small loss windows are acceptable. |
Hybrid (RustQueue::hybrid) |
In-memory hot path + periodic snapshot | 300 K+ ops/s | Up to snapshot_interval |
Maximum throughput; explicitly trade durability for speed. |
In-memory (RustQueue::memory) |
None — lost on drop | Fastest | Total | Tests and ephemeral workloads only. |
// Safest: ACID redb
let rq = RustQueue::redb("./jobs.db")?.build()?;
// Faster writes, tiny loss window:
use rustqueue::storage::BufferedRedbConfig;
let rq = RustQueue::redb("./jobs.db")?
.with_write_coalescing(BufferedRedbConfig::default())
.build()?;
// Maximum throughput, explicit durability tradeoff:
let rq = RustQueue::hybrid("./jobs.db")?.build()?;Workers
run_worker is the managed entry point for processing jobs. It replaces a hand-rolled pull / ack / fail loop and handles operational details automatically:
- Auto-acks a job when the handler returns
Ok(()). - Auto-fails a job (triggering retry/DLQ logic) when the handler returns
Err(e). - Auto-heartbeats the in-flight job so stall detection does not reclaim a healthy long-running job.
- Starts housekeeping automatically on first call — no separate setup required.
use rustqueue::RustQueue;
#[tokio::main]
async fn main() -> anyhow::Result<()> {
let rq = RustQueue::redb("./jobs.db")?.build()?;
rq.run_worker("emails", |job| async move {
println!("sending email: {}", job.data["to"]);
// return Ok to ack, Err to fail (→ retry or DLQ)
Ok::<(), String>(())
})
.await?;
Ok(())
}run_worker runs until Ctrl-C and finishes the in-flight job before returning.
Embedding in a web server
Use run_worker_with_shutdown to stop the worker when a server shuts down rather than on Ctrl-C:
use tokio::sync::oneshot;
let (tx, rx) = oneshot::channel::<()>();
// In your server shutdown handler: tx.send(()).ok();
tokio::spawn(async move {
rq.run_worker_with_shutdown("emails", handler, async {
rx.await.ok();
})
.await
.expect("worker failed");
});.awaits will block the tokio thread and prevent heartbeats from firing. After stall_timeout the job will be reclaimed. Wrap CPU-intensive sections with tokio::task::spawn_blocking, or raise stall_timeout.
Retries & DLQ
Every job carries retry configuration. The defaults:
| Setting | Default |
|---|---|
max_attempts | 3 |
backoff | Exponential |
backoff_delay_ms | 1 000 ms |
When a job fails (handler returns Err, or stall detection fires), the engine applies backoff and re-queues — unless attempt ≥ max_attempts, at which point the job moves to the dead-letter queue.
| Strategy | Delay formula | Example (base = 1 s) |
|---|---|---|
Fixed | base | 1 s, 1 s, 1 s |
Linear | base × attempt | 1 s, 2 s, 3 s |
Exponential (default) | base × 2^(attempt-1) | 1 s, 2 s, 4 s |
Inspecting the DLQ
let dead = rq.get_dlq_jobs("emails", 50).await?;
for job in dead {
println!("{}: {:?}", job.id, job.last_error);
}Housekeeping
The housekeeping loop is a background tokio task (default tick: 1 s) that drives retries, schedules, and crash recovery. run_worker starts it automatically. If you use pull/ack directly, call it explicitly:
// Call once after building; safe to call multiple times (idempotent).
rq.start_housekeeping()?;What it does each tick
- Promotes
Delayed→Waitingwhen delay expires. - Fires cron and interval schedules that are due.
- Fails jobs that have exceeded
timeout_ms. - Detects stalled
Activejobs (no heartbeat withinstall_timeout). - Refreshes queue depth metrics.
- Cleans up expired completed/failed/DLQ jobs (every ~1 min).
Tuning
use std::time::Duration;
let rq = RustQueue::redb("./jobs.db")?
.stall_timeout(Duration::from_secs(60)) // default: 30 s
.tick_interval(Duration::from_millis(500)) // default: 1 s
.build()?;Graceful shutdown
run_worker catches Ctrl-C and finishes the in-flight job before returning. For programmatic shutdown (e.g., Axum shutdown signal):
use tokio::sync::watch;
let (shutdown_tx, shutdown_rx) = watch::channel(false);
rq.run_worker_with_shutdown("work", handler, async move {
shutdown_rx.changed().await.ok();
})
.await?;
// Elsewhere, to trigger shutdown:
shutdown_tx.send(true).ok();Jobs that have not yet been pulled remain in Waiting state and will be picked up on the next startup. No work is lost.
Crash-recovery walkthrough
The crash_recovery example demonstrates the full lifecycle: push a job, crash mid-processing, restart, and watch the job complete automatically.
-
Start the worker (Terminal 1)
Uses a short
stall_timeoutof 3 s so you don't wait long.cargo run --example crash_recovery -- worker -
Push a job (Terminal 2)
cargo run --example crash_recovery -- pushThe worker picks it up and logs progress steps.
-
Simulate a crash
While the job is mid-run, kill the worker process:
kill -9 <pid printed at startup>The job is now
Activewith no heartbeat. The database still holds it. -
Restart and watch recovery
cargo run --example crash_recovery -- workerAfter ~3 s, housekeeping detects no heartbeat, fails the job, and re-queues it. The new worker picks it up and completes it. No manual intervention. No data loss. No message broker.
max_attempts=3 (the default), a job killed three times will land in the DLQ. Raise max_attempts for workloads that may experience frequent infrastructure interruptions.