If you have read the nonces post, you know that every Ethereum transaction carries a nonce and that the network rejects any transaction whose nonce does not match the next expected value for the sender's account. In theory this is straightforward. Keep a counter, increment it, done. In practice, this is one of the hardest problems in backend systems that interact with Ethereum.
The moment your service needs to send more than one transaction at a time, or talks to more than one RPC node, or needs to recover from failures without losing track of where it left off, the simple counter breaks. You get nonce too low (you reused a nonce that was already confirmed), or replacement transaction underpriced (you sent two different transactions with the same nonce but did not increase the gas), or silent nonce gaps that block your entire queue.
The most intuitive approach is to call eth_getTransactionCount("pending") before each transaction. This returns the next expected nonce including unconfirmed transactions in the mempool. The problem is timing. If two goroutines (or threads, or processes) call this at the same moment, they both get the same value. Both build a transaction with the same nonce. One of them will succeed, and the other will fail with nonce too low or replacement transaction underpriced.
This is a classic TOCTOU (time-of-check-time-of-use) race condition. The nonce was valid when you read it, but by the time you send your transaction, someone else has already used it.
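To make the race concrete, here is a minimal, self-contained simulation (the fakeNode type is a stand-in for an RPC client, not a real library): both callers read the pending nonce before either send lands, so they get the same value and the second send is rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeNode stands in for an RPC node: its pending count only advances
// once a transaction is actually accepted into the mempool.
type fakeNode struct {
	pending uint64
	mempool map[uint64]bool
}

func (n *fakeNode) PendingNonce() uint64 { return n.pending }

func (n *fakeNode) Send(nonce uint64) error {
	if nonce < n.pending {
		return errors.New("nonce too low")
	}
	n.mempool[nonce] = true
	n.pending = nonce + 1
	return nil
}

func main() {
	node := &fakeNode{pending: 5, mempool: map[uint64]bool{}}

	// Two concurrent callers check the nonce at the same moment...
	a := node.PendingNonce()
	b := node.PendingNonce()

	// ...and both build a transaction with nonce 5. Only one can win.
	fmt.Println(node.Send(a)) // <nil>
	fmt.Println(node.Send(b)) // nonce too low
}
```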
The natural instinct is to retry. Catch the error, call eth_getTransactionCount("pending") again, rebuild the transaction with the fresh nonce, and resend. This can work when failures are rare, but under concurrent load it makes things worse. Every failed goroutine retries at the same time, they all read the same updated nonce, and you get another round of collisions. The retries themselves become a source of contention. You end up in a retry storm where most of your sends are wasted, latency spikes, and the RPC provider may start rate-limiting you on top of it. Even if a retry eventually succeeds, you have no guarantee about the order in which transactions land on chain, which matters when operations depend on each other (like approving a token before transferring it). Retrying treats the symptom but not the cause. The root problem is that multiple callers are competing for the same nonce without coordination.
Beyond nonce too low and replacement transaction underpriced, there is a third error you will encounter: already known. It appears when your transaction has already reached the node via peer-to-peer gossip from another node. If you sent a transaction to Node A, and Node B received it through the gossip protocol before you tried to send it to Node B directly, Node B rejects your direct send as a duplicate. This is harmless (the transaction will still be mined), but it means your retry logic needs to recognize it and not treat it as a real failure.
The root cause of all these failures is re-querying PendingNonceAt. Every time you ask a node "what nonce should I use next?", the answer depends on what transactions that particular node has seen in its mempool. When nodes rotate, the new node has not seen your previous transaction yet. Even on the same node, there is an internal lag between accepting a transaction and updating the pending count. The fix is to query the nonce once at startup, then track it locally. If you sent nonce N and it worked, the next nonce is N+1. You do not need to ask.
The simplest correct solution is to serialize all sends from the same account behind a mutex. Only one transaction can be built and sent at a time. After the send call returns (not after confirmation, just after the node accepts the transaction into its mempool), you increment a local nonce counter and release the lock.
type NonceMutex struct {
	mu      sync.Mutex
	nonce   uint64
	synced  bool
	client  *ethclient.Client
	account common.Address
}

func (nm *NonceMutex) SendTx(ctx context.Context, buildTx func(uint64) *types.Transaction) error {
	nm.mu.Lock()
	defer nm.mu.Unlock()
	// First call: sync nonce from chain
	if !nm.synced {
		pending, err := nm.client.PendingNonceAt(ctx, nm.account)
		if err != nil {
			return err
		}
		nm.nonce = pending
		nm.synced = true
	}
	tx := buildTx(nm.nonce)
	if err := nm.client.SendTransaction(ctx, tx); err != nil {
		// Only nonce-related errors invalidate the local counter.
		// Other failures (timeouts, gas estimation) keep it intact.
		if IsNonceError(err) {
			nm.synced = false // resync nonce from chain on next call
		}
		return err
	}
	nm.nonce++ // only increment after successful send
	return nil
}
A critical detail in the error handling is that when a send fails, you need to distinguish nonce-related errors from other failures. Only nonce errors should trigger a resync. Other errors (network timeouts, gas estimation failures) should not reset the counter, or you will create nonce gaps.
// Detect nonce-related errors that warrant a resync
func IsNonceError(err error) bool {
	msg := strings.ToLower(err.Error())
	return strings.Contains(msg, "nonce too low") ||
		strings.Contains(msg, "already known") ||
		strings.Contains(msg, "replacement transaction underpriced")
}
This works well for moderate throughput. The trade-off is that all sends from this account are serialized. If you need to send 100 transactions per second, they queue up behind the mutex and you are limited by the round-trip time to the RPC node.
When to use: single account, moderate transaction volume, simplicity is a priority.
Before discussing multi-node setups, it is worth understanding what happens if you pin all operations to a single RPC node but keep re-querying PendingNonceAt before every send. You might expect this to solve everything since there is no cross-node inconsistency, the node always sees its own mempool. In practice, pinning alone is not enough. Even on the same node, if you send transactions faster than the node updates its pending state, PendingNonceAt returns a stale value. The node accepted your transaction into its mempool, but the counter has not caught up yet. You get nonce too low or already known errors even though you are talking to a single node.
This rules out a common assumption that pinning to one RPC solves the nonce problem. It eliminates cross-node drift, which helps, but it does not eliminate the fundamental issue of re-querying a nonce that is already stale by the time you use it.
// StickyClient keeps a single persistent connection to one RPC node.
// All sends go through this connection, avoiding cross-node mempool drift.
type StickyClient struct {
	client *ethclient.Client // one connection, reused for every send
}

func NewStickyClient(rawURL string) (*StickyClient, error) {
	// Dial once at startup, keep the connection open
	client, err := ethclient.Dial(rawURL)
	if err != nil {
		return nil, err
	}
	return &StickyClient{client: client}, nil
}

func (s *StickyClient) Send(ctx context.Context, tx *types.Transaction) error {
	return s.client.SendTransaction(ctx, tx)
}

func (s *StickyClient) PendingNonce(ctx context.Context, addr common.Address) (uint64, error) {
	return s.client.PendingNonceAt(ctx, addr)
}
When to use: when you want to eliminate cross-node inconsistency without managing nonce state yourself. This is a building block, not a complete solution on its own.
Instead of asking the node for the nonce before every send, you query the chain once at startup (the "seed"), store it locally, and increment it yourself after each successful send. This is the single most impactful change you can make. The local counter guarantees that every transaction gets a unique, strictly incrementing nonce without any RPC round-trip. Even with rotating RPC nodes, this works surprisingly well on its own.
// NonceKeeper seeds the nonce once from the chain, then tracks it locally.
// After each successful send, the counter increments in memory.
type NonceKeeper struct {
	mu     sync.Mutex
	nonce  uint64
	seeded bool
}

func (nk *NonceKeeper) Seed(ctx context.Context, client *ethclient.Client, addr common.Address) error {
	nk.mu.Lock()
	defer nk.mu.Unlock()
	pending, err := client.PendingNonceAt(ctx, addr)
	if err != nil {
		return err
	}
	nk.nonce = pending
	nk.seeded = true
	return nil
}

func (nk *NonceKeeper) Next() uint64 {
	nk.mu.Lock()
	defer nk.mu.Unlock()
	n := nk.nonce
	nk.nonce++
	return n
}

func (nk *NonceKeeper) Resync(ctx context.Context, client *ethclient.Client, addr common.Address) error {
	nk.mu.Lock()
	defer nk.mu.Unlock()
	// NonceAt with a nil block number returns the confirmed (latest) count
	confirmed, err := client.NonceAt(ctx, addr, nil)
	if err != nil {
		return err
	}
	nk.nonce = confirmed
	return nil
}
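The guarantee the local counter buys is easy to demonstrate in isolation. A sketch that strips the keeper down to its counter core (no RPC client): fifty concurrent callers receive fifty distinct, gap-free nonces, with no round-trip and no race.

```go
package main

import (
	"fmt"
	"sync"
)

// keeper is the counter core of the NonceKeeper: concurrent callers
// always receive unique, strictly increasing nonces.
type keeper struct {
	mu    sync.Mutex
	nonce uint64
}

func (k *keeper) Next() uint64 {
	k.mu.Lock()
	defer k.mu.Unlock()
	n := k.nonce
	k.nonce++
	return n
}

func main() {
	k := &keeper{nonce: 100} // pretend 100 was seeded from the chain
	var wg sync.WaitGroup
	seen := make(chan uint64, 50)
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			seen <- k.Next()
		}()
	}
	wg.Wait()
	close(seen)
	unique := map[uint64]bool{}
	for n := range seen {
		unique[n] = true
	}
	fmt.Println(len(unique)) // 50: no collisions, no gaps
}
```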
The remaining failures come from already known errors. When you send a transaction to Node A, Node A gossips it to Node B via the peer-to-peer network. If your next send happens to hit Node B (RPC rotation), Node B may have already received the transaction through gossip and rejects the direct send as a duplicate. The transaction still gets mined (it was already in Node B's mempool via gossip), so this is not a real failure, but your code sees an error and needs to handle it correctly.
When to use: when you want to eliminate stale nonce reads. This is the single most impactful pattern and a building block for all the compositions that follow.
Combining Sticky RPC (Pattern 2) with a Nonce Keeper (Pattern 3) eliminates both sources of failure: cross-node mempool drift and stale nonce reads. Most production systems do not rely on a single RPC node; you typically have a primary provider (Alchemy, Infura, QuickNode) and one or more fallback nodes. The problem is that different nodes can report different pending nonce values because their mempools are not perfectly synchronized. By pinning sends to one node and tracking the nonce locally, you get the most reliable combination for production use with failover support.
The solution is nonce pinning. Never ask any node for the nonce during normal operation. Instead, maintain the nonce locally in your service. The only time you resync from the chain is on startup, after an error, or periodically.
The resync strategy: on startup, call eth_getTransactionCount("latest") to get the confirmed nonce count, then check for pending transactions in the mempool to determine the true next nonce. On nonce too low, resync from the confirmed count and rebuild your pending queue.
type PinnedNonceManager struct {
	mu      sync.Mutex
	nonce   uint64
	clients []*ethclient.Client // primary + fallbacks
	account common.Address
}

func (p *PinnedNonceManager) Init(ctx context.Context) error {
	// Sync from confirmed count on startup
	confirmed, err := p.clients[0].NonceAt(ctx, p.account, nil)
	if err != nil {
		return err
	}
	p.nonce = confirmed
	return nil
}
func (p *PinnedNonceManager) SendWithFailover(ctx context.Context, buildTx func(uint64) *types.Transaction) error {
	p.mu.Lock()
	nonce := p.nonce
	p.nonce++
	p.mu.Unlock()
	tx := buildTx(nonce)
	// Try each node until one accepts the transaction
	var lastErr error
	for _, client := range p.clients {
		err := client.SendTransaction(ctx, tx)
		if err == nil {
			return nil
		}
		if strings.Contains(err.Error(), "already known") {
			// This node already has the tx via gossip: not a real failure
			return nil
		}
		if strings.Contains(err.Error(), "nonce too low") {
			// Nonce was already used, resync from the chain
			p.resync(ctx)
			return err
		}
		lastErr = err
	}
	return lastErr
}
The key insight here is that the nonce is assigned before contacting any node. The choice of which node to use does not affect the nonce. If the primary node is down, you try the backup node with the exact same nonce. The nonce comes from your local state, not from any RPC response.
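The resync method called in SendWithFailover is not shown above. Its core decision can be sketched as a pure function (resyncNonce is an illustrative name, not a go-ethereum API): after nonce too low, the chain's confirmed count is authoritative, but the counter should never move backwards past in-flight sends, so one reasonable policy is to take the maximum of the two.

```go
package main

import "fmt"

// resyncNonce reconciles the local counter with the chain's confirmed
// transaction count after a "nonce too low" error. Illustrative helper,
// not part of go-ethereum.
func resyncNonce(local, confirmed uint64) uint64 {
	// The chain says `confirmed` transactions have landed, so the next
	// valid nonce is at least `confirmed`. But if the local counter is
	// ahead because of unconfirmed in-flight sends, jumping backwards
	// would reuse their nonces.
	if confirmed > local {
		return confirmed
	}
	return local
}

func main() {
	// Local counter fell behind (another process sent from this account):
	fmt.Println(resyncNonce(10, 14)) // 14
	// Local counter is ahead because of unconfirmed in-flight sends:
	fmt.Println(resyncNonce(20, 14)) // 20
}
```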
When to use: any production system with multiple RPC providers or that needs failover resilience.
An intuitive alternative is to wait for each transaction to be confirmed before sending the next one. After sending, you call WaitMined (which polls the node until the transaction is included in a block), and only then query the nonce for the next transaction. This seems like it should be perfectly safe since the node confirmed the block, so surely PendingNonceAt reflects the new state?
It does not. In testing on Base Sepolia, this approach (Sticky RPC + WaitMined + re-query nonce each time) still failed frequently. The reason is an internal lag inside the node between block processing (which WaitMined sees) and pending state update (which PendingNonceAt reads). The node has confirmed the block, but its pending nonce counter has not caught up yet. This is a powerful proof that the Nonce Keeper is not just an optimization, it is essential. Even on a single pinned node, even after waiting for confirmation, re-querying the nonce is unreliable. The local counter is the only source of truth you can rely on.
// The Confirmation Trap: this looks correct but still fails.
// WaitMined confirms the block, but PendingNonceAt returns a stale value
// because of internal node lag between block processing and pending state.
func sendWithWaitMined(ctx context.Context, client *ethclient.Client, account common.Address, txs []TxIntent) error {
	for _, intent := range txs {
		// Re-query the nonce before each send (this is the mistake)
		nonce, err := client.PendingNonceAt(ctx, account)
		if err != nil {
			return err
		}
		tx := buildTx(intent, nonce)
		if err := client.SendTransaction(ctx, tx); err != nil {
			return err // often "nonce too low" despite WaitMined
		}
		// Wait for the transaction to be mined before sending the next one
		receipt, err := bind.WaitMined(ctx, client, tx)
		if err != nil {
			return err
		}
		if receipt.Status != types.ReceiptStatusSuccessful {
			return fmt.Errorf("tx %s reverted", tx.Hash().Hex())
		}
		// Block is confirmed, but PendingNonceAt may still return the OLD nonce.
		// The node has processed the block but hasn't updated its pending state yet.
		// Next iteration reads a stale nonce and the send fails.
	}
	return nil
}

// The fix is simple: use a Nonce Keeper (Pattern 3) instead of re-querying.
// Replace PendingNonceAt with keeper.Next() and the failures disappear.
When to use: never. This is an anti-pattern. It is included here because it is a trap many developers fall into. The correct version replaces the PendingNonceAt call with keeper.Next() from Pattern 3, which turns this into a fully reliable approach that also waits for confirmation before proceeding.
The mutex pattern serializes all transactions. If your system needs to send many transactions in parallel (a relayer, a mass airdrop, an arbitrage bot, a gas station network), serializing everything through one account is a bottleneck. The solution is to use multiple accounts, each with its own nonce, and distribute transactions across them.
type AccountPool struct {
	accounts []*NonceMutex
	next     uint32 // atomic round-robin counter
}

func NewAccountPool(keys []*ecdsa.PrivateKey, client *ethclient.Client) *AccountPool {
	pool := &AccountPool{}
	for _, key := range keys {
		addr := crypto.PubkeyToAddress(key.PublicKey)
		pool.accounts = append(pool.accounts, &NonceMutex{
			client:  client,
			account: addr,
		})
	}
	return pool
}

func (p *AccountPool) Send(ctx context.Context, buildTx func(uint64) *types.Transaction) error {
	// Round-robin across accounts
	idx := atomic.AddUint32(&p.next, 1) % uint32(len(p.accounts))
	return p.accounts[idx].SendTx(ctx, buildTx)
}
With N accounts, you can send N transactions in parallel without any nonce conflicts. Each account has its own independent nonce counter. The round-robin distributes the load evenly. This is the pattern used by high-throughput relayers and transaction bundlers.
The trade-off is operational. You need to fund multiple accounts with ETH for gas, manage multiple private keys securely, and ensure the pool is large enough for your peak load.
When to use: high throughput, parallel execution, systems that need to send many transactions per block.
For the most robust setups, you decouple the intent to send a transaction from the actual sending. Your application writes transaction intents to a persistent queue (a database table, Redis, Kafka, etc.), and a dedicated sender worker processes them one at a time per account.
// Application code: just enqueue the intent
db.InsertTxIntent(TxIntent{
	To:     recipient,
	Value:  amount,
	Data:   calldata,
	Status: "pending",
})

// Sender worker: runs in a loop, one per account
func senderLoop(ctx context.Context, account *NonceMutex, db *DB) {
	for {
		intent, err := db.NextPending(ctx)
		if err != nil || intent == nil {
			time.Sleep(500 * time.Millisecond)
			continue
		}
		err = account.SendTx(ctx, func(nonce uint64) *types.Transaction {
			return buildTx(intent, nonce)
		})
		if err != nil {
			db.MarkFailed(intent.ID, err)
		} else {
			db.MarkSent(intent.ID)
		}
	}
}
This gives you crash recovery (the queue is persistent, so you know which transactions were sent and which were not), retry logic (failed intents can be retried), and clean separation of concerns (the business logic does not need to know about nonces at all).
You can combine this with the Wallet Pool (Pattern 5) by running one sender worker per account, each processing from the same queue, to get both persistence and parallelism.
When to use: any system that needs crash recovery, auditability, or reliable delivery guarantees.
Each pattern addresses a specific failure mode, but no single pattern handles everything. In practice, production systems combine several of them; the effectiveness depends on which fixes you stack together. Even with everything in place, expect occasional already known errors when a node receives your transaction via p2p gossip before your direct send reaches it. A typical production setup for a high-throughput relayer combines all the layers:
Each layer addresses a different failure mode. The Send Lock prevents nonce races, the Nonce Keeper eliminates stale reads, Sticky RPC avoids gossip collisions, resync handles drift, the Wallet Pool unlocks parallelism, and the Intent Queue provides durability.
| Pattern | Throughput | Reliability | Complexity |
|---|---|---|---|
| Re-query nonce each time | Fast | Broken under load | None |
| Retry on error | Slow (wasted sends) | Unreliable | Low |
| Sticky RPC only (P2) | Fast | Fails (PendingNonceAt lag) | Low |
| Sticky RPC + WaitMined (re-query) | Slow (1 tx/block) | Poor (internal lag) | Low |
| Nonce Keeper only (P3) | Fast | Good | Low |
| Sticky RPC + Nonce Keeper (P4) | Fast | Very high | Medium |
| P4 + resync-on-failure | Fast (self-healing) | Very high | Medium |
| P4 + WaitMined | Slow (1 tx/block) | Highest | Medium |
| Wallet Pool + Intent Queue + all above | N parallel, persistent | Highest | High |
For more on how nonces work at the protocol level, see Understanding Nonces on Ethereum. To understand the full transaction lifecycle from wallet to finality, see Sending Transactions on Ethereum.