If you have read the nonces post, you know that every Ethereum transaction carries a nonce and that the network rejects any transaction whose nonce does not match the next expected value for the sender's account. In theory this is straightforward. Keep a counter, increment it, done. In practice, this is one of the hardest problems in backend systems that interact with Ethereum.
The moment your service needs to send more than one transaction at a time, or talks to more than one RPC node, or needs to recover from failures without losing track of where it left off, the simple counter breaks. You get nonce too low (you reused a nonce that was already confirmed), or replacement transaction underpriced (you sent two different transactions with the same nonce but did not increase the gas), or silent nonce gaps that block your entire queue.
The most intuitive approach is to call eth_getTransactionCount("pending") before each transaction. This returns the next expected nonce including unconfirmed transactions in the mempool. The problem is timing. If two goroutines (or threads, or processes) call this at the same moment, they both get the same value. Both build a transaction with the same nonce. One of them will succeed, and the other will fail with nonce too low or replacement transaction underpriced.
This is a classic TOCTOU (time-of-check-time-of-use) race condition. The nonce was valid when you read it, but by the time you send your transaction, someone else has already used it.
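To make the race concrete, here is a minimal, self-contained simulation (the fakeNode type is a stand-in for an RPC client, not a real library): both callers read the pending nonce before either send lands, so they get the same value and the second send is rejected.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeNode stands in for an RPC node: its pending count only advances
// once a transaction is actually accepted into the mempool.
type fakeNode struct {
	pending uint64
	mempool map[uint64]bool
}

func (n *fakeNode) PendingNonce() uint64 { return n.pending }

func (n *fakeNode) Send(nonce uint64) error {
	if nonce < n.pending {
		return errors.New("nonce too low")
	}
	n.mempool[nonce] = true
	n.pending = nonce + 1
	return nil
}

func main() {
	node := &fakeNode{pending: 5, mempool: map[uint64]bool{}}

	// Two concurrent callers check the nonce at the same moment...
	a := node.PendingNonce()
	b := node.PendingNonce()

	// ...and both build a transaction with nonce 5. Only one can win.
	fmt.Println(node.Send(a)) // <nil>
	fmt.Println(node.Send(b)) // nonce too low
}
```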
The natural instinct is to retry. Catch the error, call eth_getTransactionCount("pending") again, rebuild the transaction with the fresh nonce, and resend. This can work when failures are rare, but under concurrent load it makes things worse. Every failed goroutine retries at the same time, they all read the same updated nonce, and you get another round of collisions. The retries themselves become a source of contention. You end up in a retry storm where most of your sends are wasted, latency spikes, and the RPC provider may start rate-limiting you on top of it. Even if a retry eventually succeeds, you have no guarantee about the order in which transactions land on chain, which matters when operations depend on each other (like approving a token before transferring it). Retrying treats the symptom but not the cause. The root problem is that multiple callers are competing for the same nonce without coordination.
Beyond nonce too low and replacement transaction underpriced, there is a third error you will encounter: already known. It appears when your transaction has already reached the node via peer-to-peer gossip from another node. If you sent a transaction to Node A, and Node B received it through the gossip protocol before you tried to send it to Node B directly, Node B rejects your direct send as a duplicate. This is harmless (the transaction will still be mined), but it means your retry logic needs to recognize it and not treat it as a real failure.
The root cause of all these failures is re-querying PendingNonceAt. Every time you ask a node "what nonce should I use next?", the answer depends on what transactions that particular node has seen in its mempool. When nodes rotate, the new node has not seen your previous transaction yet. Even on the same node, there is an internal lag between accepting a transaction and updating the pending count. The fix is to query the nonce once at startup, then track it locally. If you sent nonce N and it worked, the next nonce is N+1. You do not need to ask.
The simplest correct solution is to serialize all sends from the same account behind a mutex. Only one transaction can be built and sent at a time. After the send call returns (not after confirmation, just after the node accepts the transaction into its mempool), you increment a local nonce counter and release the lock.
type NonceMutex struct {
	mu      sync.Mutex
	nonce   uint64
	synced  bool
	client  *ethclient.Client
	account common.Address
}

func (nm *NonceMutex) SendTx(ctx context.Context, buildTx func(uint64) *types.Transaction) error {
	nm.mu.Lock()
	defer nm.mu.Unlock()
	// First call: sync nonce from chain
	if !nm.synced {
		pending, err := nm.client.PendingNonceAt(ctx, nm.account)
		if err != nil {
			return err
		}
		nm.nonce = pending
		nm.synced = true
	}
	tx := buildTx(nm.nonce)
	if err := nm.client.SendTransaction(ctx, tx); err != nil {
		// Only nonce-related errors invalidate the local counter.
		// Other failures (timeouts, gas estimation) keep it intact.
		if IsNonceError(err) {
			nm.synced = false // resync nonce from chain on next call
		}
		return err
	}
	nm.nonce++ // only increment after successful send
	return nil
}
A critical detail in the error handling is that when a send fails, you need to distinguish nonce-related errors from other failures. Only nonce errors should trigger a resync. Other errors (network timeouts, gas estimation failures) should not reset the counter, or you will create nonce gaps.
// Detect nonce-related errors that warrant a resync
func IsNonceError(err error) bool {
	msg := strings.ToLower(err.Error())
	return strings.Contains(msg, "nonce too low") ||
		strings.Contains(msg, "already known") ||
		strings.Contains(msg, "replacement transaction underpriced")
}
This works well for moderate throughput. The trade-off is that all sends from this account are serialized. If you need to send 100 transactions per second, they queue up behind the mutex and you are limited by the round-trip time to the RPC node.
When to use: single account, moderate transaction volume, simplicity is a priority.
Before discussing multi-node setups, it is worth understanding what happens if you pin all operations to a single RPC node but keep re-querying PendingNonceAt before every send. You might expect this to solve everything since there is no cross-node inconsistency, the node always sees its own mempool. In practice, pinning alone is not enough. Even on the same node, if you send transactions faster than the node updates its pending state, PendingNonceAt returns a stale value. The node accepted your transaction into its mempool, but the counter has not caught up yet. You get nonce too low or already known errors even though you are talking to a single node.
This rules out a common assumption that pinning to one RPC solves the nonce problem. It eliminates cross-node drift, which helps, but it does not eliminate the fundamental issue of re-querying a nonce that is already stale by the time you use it.
// StickyClient keeps a single persistent connection to one RPC node.
// All sends go through this connection, avoiding cross-node mempool drift.
type StickyClient struct {
	client *ethclient.Client // one connection, reused for every send
}

func NewStickyClient(rawURL string) (*StickyClient, error) {
	// Dial once at startup, keep the connection open
	client, err := ethclient.Dial(rawURL)
	if err != nil {
		return nil, err
	}
	return &StickyClient{client: client}, nil
}

func (s *StickyClient) Send(ctx context.Context, tx *types.Transaction) error {
	return s.client.SendTransaction(ctx, tx)
}

func (s *StickyClient) PendingNonce(ctx context.Context, addr common.Address) (uint64, error) {
	return s.client.PendingNonceAt(ctx, addr)
}
When to use: when you want to eliminate cross-node inconsistency without managing nonce state yourself. This is a building block, not a complete solution on its own.
Instead of asking the node for the nonce before every send, you query the chain once at startup (the "seed"), store it locally, and increment it yourself after each successful send. This is the single most impactful change you can make. The local counter guarantees that every transaction gets a unique, strictly incrementing nonce without any RPC round-trip. Even with rotating RPC nodes, this works surprisingly well on its own.
// NonceKeeper seeds the nonce once from the chain, then tracks it locally.
// After each successful send, the counter increments in memory.
type NonceKeeper struct {
	mu     sync.Mutex
	nonce  uint64
	seeded bool
}

func (nk *NonceKeeper) Seed(ctx context.Context, client *ethclient.Client, addr common.Address) error {
	nk.mu.Lock()
	defer nk.mu.Unlock()
	pending, err := client.PendingNonceAt(ctx, addr)
	if err != nil {
		return err
	}
	nk.nonce = pending
	nk.seeded = true
	return nil
}

func (nk *NonceKeeper) Next() uint64 {
	nk.mu.Lock()
	defer nk.mu.Unlock()
	n := nk.nonce
	nk.nonce++
	return n
}

func (nk *NonceKeeper) Resync(ctx context.Context, client *ethclient.Client, addr common.Address) error {
	nk.mu.Lock()
	defer nk.mu.Unlock()
	// NonceAt with a nil block number returns the confirmed (latest) count
	confirmed, err := client.NonceAt(ctx, addr, nil)
	if err != nil {
		return err
	}
	nk.nonce = confirmed
	return nil
}
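The guarantee the local counter buys is easy to demonstrate in isolation. A sketch that strips the keeper down to its counter core (no RPC client): fifty concurrent callers receive fifty distinct, gap-free nonces, with no round-trip and no race.

```go
package main

import (
	"fmt"
	"sync"
)

// keeper is the counter core of the NonceKeeper: concurrent callers
// always receive unique, strictly increasing nonces.
type keeper struct {
	mu    sync.Mutex
	nonce uint64
}

func (k *keeper) Next() uint64 {
	k.mu.Lock()
	defer k.mu.Unlock()
	n := k.nonce
	k.nonce++
	return n
}

func main() {
	k := &keeper{nonce: 100} // pretend 100 was seeded from the chain
	var wg sync.WaitGroup
	seen := make(chan uint64, 50)
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			seen <- k.Next()
		}()
	}
	wg.Wait()
	close(seen)
	unique := map[uint64]bool{}
	for n := range seen {
		unique[n] = true
	}
	fmt.Println(len(unique)) // 50: no collisions, no gaps
}
```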
The remaining failures come from already known errors. When you send a transaction to Node A, Node A gossips it to Node B via the peer-to-peer network. If your next send happens to hit Node B (RPC rotation), Node B may have already received the transaction through gossip and rejects the direct send as a duplicate. The transaction still gets mined (it was already in Node B's mempool via gossip), so this is not a real failure, but your code sees an error and needs to handle it correctly.
When to use: when you want to eliminate stale nonce reads. This is the single most impactful pattern and a building block for all the compositions that follow.
Combining Sticky RPC (Pattern 2) with a Nonce Keeper (Pattern 3) eliminates both sources of failure: cross-node mempool drift and stale nonce reads. Most production systems do not rely on a single RPC node; you typically have a primary provider (Alchemy, Infura, QuickNode) and one or more fallback nodes. The problem is that different nodes can report different pending nonce values because their mempools are not perfectly synchronized. By pinning sends to one node and tracking the nonce locally, you get the most reliable combination for production use with failover support.
The solution is nonce pinning. Never ask any node for the nonce during normal operation. Instead, maintain the nonce locally in your service. The only time you resync from the chain is on startup, after an error, or periodically.
The resync strategy: on startup, call eth_getTransactionCount("latest") to get the confirmed nonce count, then check for pending transactions in the mempool to determine the true next nonce. On nonce too low, resync from the confirmed count and rebuild your pending queue.
type PinnedNonceManager struct {
	mu      sync.Mutex
	nonce   uint64
	clients []*ethclient.Client // primary + fallbacks
	account common.Address
}

func (p *PinnedNonceManager) Init(ctx context.Context) error {
	// Sync from confirmed count on startup
	confirmed, err := p.clients[0].NonceAt(ctx, p.account, nil)
	if err != nil {
		return err
	}
	p.nonce = confirmed
	return nil
}
func (p *PinnedNonceManager) SendWithFailover(ctx context.Context, buildTx func(uint64) *types.Transaction) error {
	p.mu.Lock()
	nonce := p.nonce
	p.nonce++
	p.mu.Unlock()
	tx := buildTx(nonce)
	// Try each node until one accepts the transaction
	var lastErr error
	for _, client := range p.clients {
		err := client.SendTransaction(ctx, tx)
		if err == nil {
			return nil
		}
		if strings.Contains(err.Error(), "already known") {
			// This node already has the tx via gossip: not a real failure
			return nil
		}
		if strings.Contains(err.Error(), "nonce too low") {
			// Nonce was already used, resync from the chain
			p.resync(ctx)
			return err
		}
		lastErr = err
	}
	return lastErr
}
The key insight here is that the nonce is assigned before contacting any node. The choice of which node to use does not affect the nonce. If the primary node is down, you try the backup node with the exact same nonce. The nonce comes from your local state, not from any RPC response.
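The resync method called in SendWithFailover is not shown above. Its core decision can be sketched as a pure function (resyncNonce is an illustrative name, not a go-ethereum API): after nonce too low, the chain's confirmed count is authoritative, but the counter should never move backwards past in-flight sends, so one reasonable policy is to take the maximum of the two.

```go
package main

import "fmt"

// resyncNonce reconciles the local counter with the chain's confirmed
// transaction count after a "nonce too low" error. Illustrative helper,
// not part of go-ethereum.
func resyncNonce(local, confirmed uint64) uint64 {
	// The chain says `confirmed` transactions have landed, so the next
	// valid nonce is at least `confirmed`. But if the local counter is
	// ahead because of unconfirmed in-flight sends, jumping backwards
	// would reuse their nonces.
	if confirmed > local {
		return confirmed
	}
	return local
}

func main() {
	// Local counter fell behind (another process sent from this account):
	fmt.Println(resyncNonce(10, 14)) // 14
	// Local counter is ahead because of unconfirmed in-flight sends:
	fmt.Println(resyncNonce(20, 14)) // 20
}
```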
When to use: any production system with multiple RPC providers or that needs failover resilience.
An intuitive alternative is to wait for each transaction to be confirmed before sending the next one. After sending, you call WaitMined (which polls the node until the transaction is included in a block), and only then query the nonce for the next transaction. This seems like it should be perfectly safe since the node confirmed the block, so surely PendingNonceAt reflects the new state?
It does not. In testing on Base Sepolia, this approach (Sticky RPC + WaitMined + re-query nonce each time) still failed frequently. The reason is an internal lag inside the node between block processing (which WaitMined sees) and pending state update (which PendingNonceAt reads). The node has confirmed the block, but its pending nonce counter has not caught up yet. This is a powerful proof that the Nonce Keeper is not just an optimization, it is essential. Even on a single pinned node, even after waiting for confirmation, re-querying the nonce is unreliable. The local counter is the only source of truth you can rely on.
// The Confirmation Trap: this looks correct but still fails.
// WaitMined confirms the block, but PendingNonceAt returns a stale value
// because of internal node lag between block processing and pending state.
func sendWithWaitMined(ctx context.Context, client *ethclient.Client, account common.Address, txs []TxIntent) error {
	for _, intent := range txs {
		// Re-query the nonce before each send (this is the mistake)
		nonce, err := client.PendingNonceAt(ctx, account)
		if err != nil {
			return err
		}
		tx := buildTx(intent, nonce)
		if err := client.SendTransaction(ctx, tx); err != nil {
			return err // often "nonce too low" despite WaitMined
		}
		// Wait for the transaction to be mined before sending the next one
		receipt, err := bind.WaitMined(ctx, client, tx)
		if err != nil {
			return err
		}
		if receipt.Status != types.ReceiptStatusSuccessful {
			return fmt.Errorf("tx %s reverted", tx.Hash().Hex())
		}
		// Block is confirmed, but PendingNonceAt may still return the OLD nonce.
		// The node has processed the block but hasn't updated its pending state yet.
		// Next iteration reads a stale nonce and the send fails.
	}
	return nil
}

// The fix is simple: use a Nonce Keeper (Pattern 3) instead of re-querying.
// Replace PendingNonceAt with keeper.Next() and the failures disappear.
When to use: never. This is an anti-pattern. It is included here because it is a trap many developers fall into. The correct version replaces the PendingNonceAt call with keeper.Next() from Pattern 3, which turns this into a fully reliable approach that also waits for confirmation before proceeding.
The mutex pattern serializes all transactions. If your system needs to send many transactions in parallel (a relayer, a mass airdrop, an arbitrage bot, a gas station network), serializing everything through one account is a bottleneck. The solution is to use multiple accounts, each with its own nonce, and distribute transactions across them.
type AccountPool struct {
	accounts []*NonceMutex
	next     uint32 // atomic round-robin counter
}

func NewAccountPool(keys []*ecdsa.PrivateKey, client *ethclient.Client) *AccountPool {
	pool := &AccountPool{}
	for _, key := range keys {
		addr := crypto.PubkeyToAddress(key.PublicKey)
		pool.accounts = append(pool.accounts, &NonceMutex{
			client:  client,
			account: addr,
		})
	}
	return pool
}

func (p *AccountPool) Send(ctx context.Context, buildTx func(uint64) *types.Transaction) error {
	// Round-robin across accounts
	idx := atomic.AddUint32(&p.next, 1) % uint32(len(p.accounts))
	return p.accounts[idx].SendTx(ctx, buildTx)
}
With N accounts, you can send N transactions in parallel without any nonce conflicts. Each account has its own independent nonce counter. The round-robin distributes the load evenly. This is the pattern used by high-throughput relayers and transaction bundlers.
The trade-off is operational. You need to fund multiple accounts with ETH for gas, manage multiple private keys securely, and ensure the pool is large enough for your peak load.
When to use: high throughput, parallel execution, systems that need to send many transactions per block.
For the most robust setups, you decouple the intent to send a transaction from the actual sending. Your application writes transaction intents to a persistent queue (a database table, Redis, Kafka, etc.), and a dedicated sender worker processes them one at a time per account.
// Application code: just enqueue the intent
db.InsertTxIntent(TxIntent{
	To:     recipient,
	Value:  amount,
	Data:   calldata,
	Status: "pending",
})

// Sender worker: runs in a loop, one per account
func senderLoop(ctx context.Context, account *NonceMutex, db *DB) {
	for {
		intent, err := db.NextPending(ctx)
		if err != nil || intent == nil {
			time.Sleep(500 * time.Millisecond)
			continue
		}
		err = account.SendTx(ctx, func(nonce uint64) *types.Transaction {
			return buildTx(intent, nonce)
		})
		if err != nil {
			db.MarkFailed(intent.ID, err)
		} else {
			db.MarkSent(intent.ID)
		}
	}
}
This gives you crash recovery (the queue is persistent, so you know which transactions were sent and which were not), retry logic (failed intents can be retried), and clean separation of concerns (the business logic does not need to know about nonces at all).
You can combine this with the Wallet Pool (Pattern 5) by running one sender worker per account, each processing from the same queue, to get both persistence and parallelism.
When to use: any system that needs crash recovery, auditability, or reliable delivery guarantees.
Each pattern addresses a specific failure mode, but no single pattern handles everything. In practice, production systems combine several of them; the effectiveness depends on which fixes you stack together. Even with everything in place, expect occasional already known errors when a node receives your transaction via p2p gossip before your direct send reaches it. A typical production setup for a high-throughput relayer combines all the layers:
Each layer addresses a different failure mode. The Send Lock prevents nonce races, the Nonce Keeper eliminates stale reads, Sticky RPC avoids gossip collisions, resync handles drift, the Wallet Pool unlocks parallelism, and the Intent Queue provides durability.
| Pattern | Throughput | Reliability | Complexity |
|---|---|---|---|
| Re-query nonce each time | Fast | Broken under load | None |
| Retry on error | Slow (wasted sends) | Unreliable | Low |
| Sticky RPC only (P2) | Fast | Fails (PendingNonceAt lag) | Low |
| Sticky RPC + WaitMined (re-query) | Slow (1 tx/block) | Poor (internal lag) | Low |
| Nonce Keeper only (P3) | Fast | Good | Low |
| Sticky RPC + Nonce Keeper (P4) | Fast | Very high | Medium |
| P4 + resync-on-failure | Fast (self-healing) | Very high | Medium |
| P4 + WaitMined | Slow (1 tx/block) | Highest | Medium |
| Wallet Pool + Intent Queue + all above | N parallel, persistent | Highest | High |
For more on how nonces work at the protocol level, see Understanding Nonces on Ethereum. To understand the full transaction lifecycle from wallet to finality, see Sending Transactions on Ethereum.