
RPC Pool Failover

The transaction engine uses an RPC pool with per-endpoint health scoring and automatic failover.

Configuration

Configure multiple RPC endpoints for high availability:
.env
# Primary RPC endpoint
SOLANA_RPC_URL=https://api.devnet.solana.com

# Pool of endpoints (comma-separated)
SOLANA_RPC_POOL_URLS=https://api.devnet.solana.com,https://rpc.ankr.com/solana_devnet,https://devnet.helius-rpc.com

# Health probe interval in milliseconds
SOLANA_RPC_HEALTH_PROBE_MS=15000

Health Scoring Algorithm

Each endpoint maintains a dynamic health score based on:
  • Success rate - Successful requests increase score
  • Latency - High latency reduces score
  • Failure streaks - Consecutive failures penalize score
  • Recovery - Endpoints can recover score over time
Initial Score: 1.0

On Success:
score = max(0.05, min(1.0, score + 0.06 - latencyPenalty(avgLatencyMs)))
avgLatencyMs = avgLatencyMs * 0.7 + currentLatency * 0.3
failStreak = 0
On Failure:
score = max(0.05, score * 0.7)
failStreak += 1
Latency Penalties:
  • ≤ 200ms: 0.00
  • 201-500ms: 0.04
  • 501-1000ms: 0.08
  • > 1000ms: 0.12
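Taken together, the scoring rules above can be sketched as follows. The `EndpointHealth` shape and function names are illustrative, not the engine's actual API:

```typescript
interface EndpointHealth {
  score: number
  avgLatencyMs: number
  failStreak: number
}

// Penalty tiers mirror the table above
const latencyPenalty = (avgLatencyMs: number): number => {
  if (avgLatencyMs <= 200) return 0
  if (avgLatencyMs <= 500) return 0.04
  if (avgLatencyMs <= 1000) return 0.08
  return 0.12
}

const recordSuccess = (h: EndpointHealth, latencyMs: number): EndpointHealth => {
  // Exponentially weighted moving average: 70% history, 30% new sample
  const avgLatencyMs = h.avgLatencyMs * 0.7 + latencyMs * 0.3
  return {
    score: Math.max(0.05, Math.min(1.0, h.score + 0.06 - latencyPenalty(avgLatencyMs))),
    avgLatencyMs,
    failStreak: 0,
  }
}

const recordFailure = (h: EndpointHealth): EndpointHealth => ({
  ...h,
  score: Math.max(0.05, h.score * 0.7), // multiplicative decay, floored at 0.05
  failStreak: h.failStreak + 1,
})
```

Note that the score floor of 0.05 keeps every endpoint eligible for eventual recovery: a string of successes can climb it back toward 1.0.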

Failover Behavior

1. Sort by Score

Endpoints are sorted by health score (descending), then by average latency (ascending).
2. Attempt Primary

The highest-scored endpoint is tried first for each operation.
3. Cascade on Failure

If the primary fails, the next endpoint in the sorted list is tried automatically.
4. Update Health

Success or failure updates the endpoint’s health score for future requests.
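The steps above can be sketched as a sort-then-cascade loop. The `rpcCall` callback and the simplified `PoolEndpoint` shape are illustrative assumptions:

```typescript
interface PoolEndpoint {
  url: string
  score: number
  avgLatencyMs: number
}

// Health score descending, then average latency ascending
const sortByHealth = (endpoints: PoolEndpoint[]): PoolEndpoint[] =>
  [...endpoints].sort((a, b) =>
    b.score !== a.score ? b.score - a.score : a.avgLatencyMs - b.avgLatencyMs
  )

async function withFailover<T>(
  endpoints: PoolEndpoint[],
  rpcCall: (url: string) => Promise<T>
): Promise<T> {
  let lastError: unknown
  for (const endpoint of sortByHealth(endpoints)) {
    try {
      return await rpcCall(endpoint.url) // success: caller records it for scoring
    } catch (err) {
      lastError = err // cascade to the next endpoint in the sorted list
    }
  }
  throw lastError // every endpoint failed
}
```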

Monitor Pool Status

Query the current health of all RPC endpoints:
curl -H 'x-api-key: dev-api-key' \
  http://localhost:3000/api/v1/rpc/pool/status
Response:
{
  "endpoints": [
    {
      "url": "https://api.devnet.solana.com",
      "score": 0.95,
      "successes": 142,
      "failures": 3,
      "avgLatencyMs": 230,
      "failStreak": 0,
      "lastCheckedAt": "2026-03-08T10:30:00.000Z"
    },
    {
      "url": "https://rpc.ankr.com/solana_devnet",
      "score": 0.88,
      "successes": 98,
      "failures": 8,
      "avgLatencyMs": 420,
      "failStreak": 1,
      "lastCheckedAt": "2026-03-08T10:29:45.000Z",
      "lastError": "Connection timeout"
    }
  ]
}
The pool continuously probes all endpoints in the background at the configured interval to maintain fresh health scores.

Durable Outbox Queue

The transaction engine uses a SQLite-backed outbox pattern for reliable transaction processing.

Architecture

  • Persistent Storage - Jobs survive process restarts
  • Lease-based Claiming - Workers claim jobs with time-bound leases
  • Automatic Retry - Failed jobs re-enter the queue
  • Deduplication - Prevents duplicate pending jobs for the same transaction
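A minimal sketch of how deduplication on enqueue might work, assuming an in-memory job list; the `enqueue` helper, `txId` field, and result shape are illustrative, not the engine's actual API:

```typescript
type OutboxAction = 'execute' | 'retry' | 'approve'

interface EnqueueResult {
  enqueued: boolean
  reason?: string
}

const enqueue = (
  jobs: Array<{ txId: string; action: OutboxAction; status: string }>,
  txId: string,
  action: OutboxAction
): EnqueueResult => {
  // Skip the insert if an unfinished job already exists for this tx + action
  const duplicate = jobs.some(
    (j) =>
      j.txId === txId &&
      j.action === action &&
      (j.status === 'pending' || j.status === 'processing')
  )
  if (duplicate) return { enqueued: false, reason: 'duplicate pending job' }
  jobs.push({ txId, action, status: 'pending' })
  return { enqueued: true }
}
```

Once a job reaches a terminal state (done or failed), a new job for the same transaction can be enqueued again, e.g. for a retry.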

Outbox Actions

  • execute - Process a new transaction request
  • retry - Retry a failed transaction
  • approve - Process an approval gate decision

Job States

type OutboxStatus = 
  | 'pending'      // Queued, waiting for worker
  | 'processing'   // Worker has active lease
  | 'done'         // Successfully completed
  | 'failed'       // Exceeded max retry attempts

Configuration

.env
# Lease duration - how long a worker can hold a job
TX_OUTBOX_LEASE_MS=30000

# Poll interval - how often worker checks for new jobs
TX_OUTBOX_POLL_MS=2000

# Max attempts before marking job as permanently failed
TX_OUTBOX_MAX_ATTEMPTS=6

Lease and Retry Semantics

1. Claim Job

The worker claims the oldest pending job, or a processing job whose lease has expired.
WHERE status = 'pending'
   OR (status = 'processing' AND lease_expires_at <= NOW())
ORDER BY created_at ASC
LIMIT 1
2. Process with Lease

Job moves to processing with a unique leaseId and expiration timestamp.
attempts += 1
lease_expires_at = NOW() + TX_OUTBOX_LEASE_MS
3. Complete or Fail

Worker marks job as done or failed.
  • Success: status = 'done', lease cleared
  • Retryable Failure: status = 'pending' if attempts < TX_OUTBOX_MAX_ATTEMPTS
  • Permanent Failure: status = 'failed' if max attempts exceeded
4. Automatic Recovery

If a worker crashes, its expired leases allow the jobs to be reclaimed by another worker.
Lease Expiration: Set TX_OUTBOX_LEASE_MS longer than your worst-case transaction confirmation time to avoid premature lease expiration.
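An in-memory sketch of the claim rule above (the real store is SQLite; `OutboxJob` and `claimJob` are illustrative names):

```typescript
interface OutboxJob {
  id: number
  status: 'pending' | 'processing' | 'done' | 'failed'
  createdAt: number
  attempts: number
  leaseId?: string
  leaseExpiresAt?: number
}

const claimJob = (jobs: OutboxJob[], now: number, leaseMs: number): OutboxJob | null => {
  // Claimable: pending jobs, plus processing jobs whose lease has expired
  const claimable = jobs
    .filter(
      (j) =>
        j.status === 'pending' ||
        (j.status === 'processing' && (j.leaseExpiresAt ?? 0) <= now)
    )
    .sort((a, b) => a.createdAt - b.createdAt) // oldest first (FIFO)
  const job = claimable[0]
  if (!job) return null
  job.status = 'processing'
  job.attempts += 1
  job.leaseId = `lease-${now}-${job.id}` // unique lease token (illustrative format)
  job.leaseExpiresAt = now + leaseMs     // TX_OUTBOX_LEASE_MS
  return job
}
```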

Monitor Outbox Status

curl -H 'x-api-key: dev-api-key' \
  http://localhost:3000/api/v1/outbox/stats
Response:
{
  "pending": 3,
  "processing": 2,
  "failed": 0,
  "done": 145
}

Adaptive Priority Fee Tuning

The execution tuner automatically calculates optimal priority fees based on recent network activity.

Configuration

.env
# Minimum priority fee (microlamports per compute unit)
SOLANA_PRIORITY_FEE_MIN_MICROLAMPORTS=2000

# Maximum priority fee (microlamports per compute unit)
SOLANA_PRIORITY_FEE_MAX_MICROLAMPORTS=200000

# Percentile of recent fees to target (1-99)
SOLANA_PRIORITY_FEE_PERCENTILE=75

# Multiplier applied to percentile fee (basis points)
# 1150 bps = 1.15x boost
SOLANA_PRIORITY_FEE_MULTIPLIER_BPS=1150

Fee Calculation Algorithm

1. Collect Recent Fees

Gather recent priority fees from the RPC endpoint’s recent blocks.
2. Calculate Percentile

Compute the configured percentile (e.g., 75th percentile).
const sortedFees = recentFees.filter(f => f >= 0).sort((a, b) => a - b)
const index = Math.floor((percentile / 100) * (sortedFees.length - 1))
const percentileFee = sortedFees[index]
3. Apply Multiplier

Boost the percentile fee by the configured multiplier.
const multiplier = PRIORITY_FEE_MULTIPLIER_BPS / 10000
const boostedFee = Math.floor(percentileFee * multiplier)
4. Clamp to Bounds

Ensure the final fee is within min/max bounds.
const finalFee = clamp(
  boostedFee > 0 ? boostedFee : minFee,
  minFee,
  maxFee
)

Compute Unit Estimation

The tuner also calculates compute unit limits based on transaction type:
const computeByType: Record<TransactionType, number> = {
  transfer_sol: 120_000,
  transfer_spl: 180_000,
  swap: 380_000,
  stake: 240_000,
  unstake: 240_000,
  lend_supply: 320_000,
  lend_borrow: 350_000,
  create_escrow: 320_000,
  accept_escrow: 280_000,
  release_escrow: 260_000,
  // ...
}

// Add buffer for additional instructions
const instructionBuffer = max(0, instructionCount - 1) * 15_000
const computeUnitLimit = clamp(baseUnits + instructionBuffer, 100_000, 1_200_000)
Compute budgets are automatically injected as the first instructions in every transaction.

Delta Guard Checks

Delta guard validates that observed balance changes match expected deltas to detect simulation drift or unexpected fees.

Configuration

.env
# Absolute tolerance in lamports for small variances
DELTA_GUARD_ABSOLUTE_TOLERANCE_LAMPORTS=10000

Expected Delta Calculation

const expectedLamportsDelta = (type: string, intent: Record<string, unknown>): number | null => {
  const amount = Number(intent['lamports'] ?? intent['amountLamports'] ?? intent['amount'] ?? 0)
  
  if (!Number.isFinite(amount) || amount <= 0) {
    return null // Cannot compute delta
  }
  
  // Outflows (negative delta)
  if (
    type === 'transfer_sol' ||
    type === 'stake' ||
    type === 'lend_supply' ||
    type === 'create_escrow'
  ) {
    return -amount
  }
  
  // Inflows (positive delta)
  if (
    type === 'unstake' ||
    type === 'release_escrow' ||
    type === 'refund_escrow'
  ) {
    return amount
  }
  
  return null // No delta check for this type
}

Variance Evaluation

1. Compare Deltas

Calculate the absolute difference between expected and observed deltas.
const absoluteDelta = Math.abs(observed - expected)
2. Check Absolute Tolerance

If within the configured tolerance, delta guard passes.
if (absoluteDelta <= DELTA_GUARD_ABSOLUTE_TOLERANCE_LAMPORTS) {
  return { ok: true, reason: 'within absolute tolerance' }
}
3. Calculate Variance BPS

Compute variance in basis points.
const denom = Math.max(1, Math.abs(expected))
const varianceBps = Math.round((absoluteDelta / denom) * 10000)
4. Check Threshold

Compare variance to the configured threshold (typically 200-500 bps).
const ok = varianceBps <= thresholdBps
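A minimal sketch combining steps 1-4. `GuardVerdict` is a trimmed, illustrative stand-in for the full result interface; the function name and parameters are assumptions:

```typescript
type GuardVerdict = { ok: boolean; varianceBps: number; reason?: string }

const evaluateDelta = (
  expected: number,                  // lamports, from the expected delta calculation
  observed: number,                  // lamports, measured after execution
  absoluteToleranceLamports: number, // DELTA_GUARD_ABSOLUTE_TOLERANCE_LAMPORTS
  thresholdBps: number               // variance threshold in basis points
): GuardVerdict => {
  // Step 1: absolute difference between expected and observed
  const absoluteDelta = Math.abs(observed - expected)
  // Step 2: small absolute variances pass regardless of percentage
  if (absoluteDelta <= absoluteToleranceLamports) {
    return { ok: true, varianceBps: 0, reason: 'within absolute tolerance' }
  }
  // Steps 3-4: otherwise compare relative variance to the threshold
  const denom = Math.max(1, Math.abs(expected))
  const varianceBps = Math.round((absoluteDelta / denom) * 10000)
  return { ok: varianceBps <= thresholdBps, varianceBps }
}
```

The absolute tolerance shortcut matters for small transfers, where a few thousand lamports of rent or fee noise would otherwise produce a huge variance in basis points.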

Delta Guard Result

interface DeltaGuardResult {
  ok: boolean;
  expectedLamportsDelta: number | null;
  observedLamportsDelta: number | null;
  varianceBps: number | null;
  reason?: string;
}
Delta guard failures indicate a mismatch between simulation and execution. This could be due to:
  • Unexpected transaction fees
  • Rent changes
  • Protocol fee variations
  • Simulation drift (different blockhash/slot)
Investigate failed transactions to determine if the variance is acceptable.

Restart Recovery

The outbox queue automatically drains pending work on service restart.

Recovery Flow

1. Load Persistent State

On startup, the transaction engine loads the SQLite database with all pending and processing jobs.
2. Reclaim Expired Leases

Jobs in processing state with expired leases return to pending.
3. Resume Processing

The outbox worker starts polling and claiming jobs from the queue.
4. Process Backlog

All pending jobs are processed in FIFO order (oldest first).
No manual intervention is required after a restart. The system automatically recovers and continues processing.

Database Migration

The system includes automatic migration from legacy JSON snapshots to SQLite.

Migration Behavior

  • Automatic Detection - On first startup, checks for legacy snapshot file
  • One-time Import - Migrates jobs if SQLite database is empty
  • Preserves History - All job states, attempts, and metadata are preserved
  • Idempotent - Safe to restart during migration
if (existsSync(legacySnapshotFile) && dbRowCount === 0) {
  const snapshot = JSON.parse(readFileSync(legacySnapshotFile, 'utf8'))
  
  // transaction() returns a callable that must be invoked to run
  db.transaction(() => {
    for (const job of snapshot.jobs) {
      db.insert(job) // INSERT OR IGNORE for idempotency
    }
  })()
}
After successful migration, you can safely delete the legacy JSON snapshot file.