
RPC Pool Failover

The transaction engine uses an RPC pool with per-endpoint health scoring and automatic failover.

Configuration

Configure multiple RPC endpoints for high availability:
.env
# Primary RPC endpoint
SOLANA_RPC_URL=https://api.devnet.solana.com

# Pool of endpoints (comma-separated)
SOLANA_RPC_POOL_URLS=https://api.devnet.solana.com,https://rpc.ankr.com/solana_devnet,https://devnet.helius-rpc.com

# Health probe interval in milliseconds
SOLANA_RPC_HEALTH_PROBE_MS=15000

Health Scoring Algorithm

Each endpoint maintains a dynamic health score based on:
  • Success rate - Successful requests increase score
  • Latency - High latency reduces score
  • Failure streaks - Consecutive failures penalize score
  • Recovery - Endpoints can recover score over time
Initial Score: 1.0

On Success:
score = max(0.05, min(1.0, score + 0.06 - latencyPenalty(avgLatencyMs)))
avgLatencyMs = avgLatencyMs * 0.7 + currentLatency * 0.3
failStreak = 0
On Failure:
score = max(0.05, score * 0.7)
failStreak += 1
Latency Penalties:
  • ≤ 200ms: 0.00
  • 201-500ms: 0.04
  • 501-1000ms: 0.08
  • > 1000ms: 0.12
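Taken together, the scoring rules above can be sketched as follows. The `EndpointHealth` shape and function names are illustrative, not the engine's actual API:

```typescript
interface EndpointHealth {
  score: number
  avgLatencyMs: number
  failStreak: number
}

// Penalty tiers mirror the table above
const latencyPenalty = (avgLatencyMs: number): number => {
  if (avgLatencyMs <= 200) return 0
  if (avgLatencyMs <= 500) return 0.04
  if (avgLatencyMs <= 1000) return 0.08
  return 0.12
}

const recordSuccess = (h: EndpointHealth, latencyMs: number): EndpointHealth => {
  // Exponentially weighted moving average: 70% history, 30% new sample
  const avgLatencyMs = h.avgLatencyMs * 0.7 + latencyMs * 0.3
  return {
    score: Math.max(0.05, Math.min(1.0, h.score + 0.06 - latencyPenalty(avgLatencyMs))),
    avgLatencyMs,
    failStreak: 0,
  }
}

const recordFailure = (h: EndpointHealth): EndpointHealth => ({
  ...h,
  score: Math.max(0.05, h.score * 0.7), // multiplicative decay, floored at 0.05
  failStreak: h.failStreak + 1,
})
```

Note that the score floor of 0.05 keeps every endpoint eligible for eventual recovery: a string of successes can climb it back toward 1.0.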

Failover Behavior

1. Sort by Score

Endpoints are sorted by health score (descending), then by average latency (ascending).
2. Attempt Primary

The highest-scored endpoint is tried first for each operation.
3. Cascade on Failure

If the primary fails, the next endpoint in the sorted list is tried automatically.
4. Update Health

Success or failure updates the endpoint’s health score for future requests.
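The steps above can be sketched as a sort-then-cascade loop. The `rpcCall` callback and the simplified `PoolEndpoint` shape are illustrative assumptions:

```typescript
interface PoolEndpoint {
  url: string
  score: number
  avgLatencyMs: number
}

// Health score descending, then average latency ascending
const sortByHealth = (endpoints: PoolEndpoint[]): PoolEndpoint[] =>
  [...endpoints].sort((a, b) =>
    b.score !== a.score ? b.score - a.score : a.avgLatencyMs - b.avgLatencyMs
  )

async function withFailover<T>(
  endpoints: PoolEndpoint[],
  rpcCall: (url: string) => Promise<T>
): Promise<T> {
  let lastError: unknown
  for (const endpoint of sortByHealth(endpoints)) {
    try {
      return await rpcCall(endpoint.url) // success: caller records it for scoring
    } catch (err) {
      lastError = err // cascade to the next endpoint in the sorted list
    }
  }
  throw lastError // every endpoint failed
}
```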

Monitor Pool Status

Query the current health of all RPC endpoints:
curl -H 'x-api-key: dev-api-key' \
  http://localhost:3000/api/v1/rpc/pool/status
Response:
{
  "endpoints": [
    {
      "url": "https://api.devnet.solana.com",
      "score": 0.95,
      "successes": 142,
      "failures": 3,
      "avgLatencyMs": 230,
      "failStreak": 0,
      "lastCheckedAt": "2026-03-08T10:30:00.000Z"
    },
    {
      "url": "https://rpc.ankr.com/solana_devnet",
      "score": 0.88,
      "successes": 98,
      "failures": 8,
      "avgLatencyMs": 420,
      "failStreak": 1,
      "lastCheckedAt": "2026-03-08T10:29:45.000Z",
      "lastError": "Connection timeout"
    }
  ]
}
The pool continuously probes all endpoints in the background at the configured interval to maintain fresh health scores.

Durable Outbox Queue

The transaction engine uses a SQLite-backed outbox pattern for reliable transaction processing.

Architecture

  • Persistent Storage - Jobs survive process restarts
  • Lease-based Claiming - Workers claim jobs with time-bound leases
  • Automatic Retry - Failed jobs re-enter the queue
  • Deduplication - Prevents duplicate pending jobs for the same transaction
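A minimal sketch of how deduplication on enqueue might work, assuming an in-memory job list; the `enqueue` helper, `txId` field, and result shape are illustrative, not the engine's actual API:

```typescript
type OutboxAction = 'execute' | 'retry' | 'approve'

interface EnqueueResult {
  enqueued: boolean
  reason?: string
}

const enqueue = (
  jobs: Array<{ txId: string; action: OutboxAction; status: string }>,
  txId: string,
  action: OutboxAction
): EnqueueResult => {
  // Skip the insert if an unfinished job already exists for this tx + action
  const duplicate = jobs.some(
    (j) =>
      j.txId === txId &&
      j.action === action &&
      (j.status === 'pending' || j.status === 'processing')
  )
  if (duplicate) return { enqueued: false, reason: 'duplicate pending job' }
  jobs.push({ txId, action, status: 'pending' })
  return { enqueued: true }
}
```

Once a job reaches a terminal state (done or failed), a new job for the same transaction can be enqueued again, e.g. for a retry.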

Outbox Actions

  • execute - Process a new transaction request
  • retry - Retry a failed transaction
  • approve - Process an approval gate decision

Job States

type OutboxStatus = 
  | 'pending'      // Queued, waiting for worker
  | 'processing'   // Worker has active lease
  | 'done'         // Successfully completed
  | 'failed'       // Exceeded max retry attempts

Configuration

.env
# Lease duration - how long a worker can hold a job
TX_OUTBOX_LEASE_MS=30000

# Poll interval - how often worker checks for new jobs
TX_OUTBOX_POLL_MS=2000

# Max attempts before marking job as permanently failed
TX_OUTBOX_MAX_ATTEMPTS=6

Lease and Retry Semantics

1. Claim Job

The worker claims the oldest pending job, or a processing job whose lease has expired.
WHERE status = 'pending'
   OR (status = 'processing' AND lease_expires_at <= NOW())
ORDER BY created_at ASC
LIMIT 1
2. Process with Lease

Job moves to processing with a unique leaseId and expiration timestamp.
attempts += 1
lease_expires_at = NOW() + TX_OUTBOX_LEASE_MS
3. Complete or Fail

Worker marks job as done or failed.
  • Success: status = 'done', lease cleared
  • Retryable Failure: status = 'pending' if attempts < TX_OUTBOX_MAX_ATTEMPTS
  • Permanent Failure: status = 'failed' if max attempts exceeded
4. Automatic Recovery

If a worker crashes, its expired leases allow the jobs to be reclaimed by another worker.
Lease Expiration: Set TX_OUTBOX_LEASE_MS longer than your worst-case transaction confirmation time to avoid premature lease expiration.
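An in-memory sketch of the claim rule above (the real store is SQLite; `OutboxJob` and `claimJob` are illustrative names):

```typescript
interface OutboxJob {
  id: number
  status: 'pending' | 'processing' | 'done' | 'failed'
  createdAt: number
  attempts: number
  leaseId?: string
  leaseExpiresAt?: number
}

const claimJob = (jobs: OutboxJob[], now: number, leaseMs: number): OutboxJob | null => {
  // Claimable: pending jobs, plus processing jobs whose lease has expired
  const claimable = jobs
    .filter(
      (j) =>
        j.status === 'pending' ||
        (j.status === 'processing' && (j.leaseExpiresAt ?? 0) <= now)
    )
    .sort((a, b) => a.createdAt - b.createdAt) // oldest first (FIFO)
  const job = claimable[0]
  if (!job) return null
  job.status = 'processing'
  job.attempts += 1
  job.leaseId = `lease-${now}-${job.id}` // unique lease token (illustrative format)
  job.leaseExpiresAt = now + leaseMs     // TX_OUTBOX_LEASE_MS
  return job
}
```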

Monitor Outbox Status

curl -H 'x-api-key: dev-api-key' \
  http://localhost:3000/api/v1/outbox/stats
Response:
{
  "pending": 3,
  "processing": 2,
  "failed": 0,
  "done": 145
}

Adaptive Priority Fee Tuning

The execution tuner automatically calculates optimal priority fees based on recent network activity.

Configuration

.env
# Minimum priority fee (microlamports per compute unit)
SOLANA_PRIORITY_FEE_MIN_MICROLAMPORTS=2000

# Maximum priority fee (microlamports per compute unit)
SOLANA_PRIORITY_FEE_MAX_MICROLAMPORTS=200000

# Percentile of recent fees to target (1-99)
SOLANA_PRIORITY_FEE_PERCENTILE=75

# Multiplier applied to percentile fee (basis points)
# 1150 bps = 1.15x boost
SOLANA_PRIORITY_FEE_MULTIPLIER_BPS=1150

Fee Calculation Algorithm

1. Collect Recent Fees

Gather recent priority fees from the RPC endpoint’s recent blocks.
2. Calculate Percentile

Compute the configured percentile (e.g., 75th percentile).
const sortedFees = recentFees.filter(f => f >= 0).sort((a, b) => a - b)
const index = Math.floor((percentile / 100) * (sortedFees.length - 1))
const percentileFee = sortedFees[index]
3. Apply Multiplier

Boost the percentile fee by the configured multiplier.
const multiplier = PRIORITY_FEE_MULTIPLIER_BPS / 10000
const boostedFee = Math.floor(percentileFee * multiplier)
4. Clamp to Bounds

Ensure the final fee is within min/max bounds.
const finalFee = clamp(
  boostedFee > 0 ? boostedFee : minFee,
  minFee,
  maxFee
)

Compute Unit Estimation

The tuner also calculates compute unit limits based on transaction type:
const computeByType: Record<TransactionType, number> = {
  transfer_sol: 120_000,
  transfer_spl: 180_000,
  swap: 380_000,
  stake: 240_000,
  unstake: 240_000,
  lend_supply: 320_000,
  lend_borrow: 350_000,
  create_escrow: 320_000,
  accept_escrow: 280_000,
  release_escrow: 260_000,
  // ...
}

// Add buffer for additional instructions
const instructionBuffer = max(0, instructionCount - 1) * 15_000
const computeUnitLimit = clamp(baseUnits + instructionBuffer, 100_000, 1_200_000)
Compute budgets are automatically injected as the first instructions in every transaction.

Delta Guard Checks

Delta guard validates that observed balance changes match expected deltas to detect simulation drift or unexpected fees.

Configuration

.env
# Absolute tolerance in lamports for small variances
DELTA_GUARD_ABSOLUTE_TOLERANCE_LAMPORTS=10000

Expected Delta Calculation

const expectedLamportsDelta = (type: string, intent: Record<string, unknown>): number | null => {
  const amount = Number(intent['lamports'] ?? intent['amountLamports'] ?? intent['amount'] ?? 0)
  
  if (!Number.isFinite(amount) || amount <= 0) {
    return null // Cannot compute delta
  }
  
  // Outflows (negative delta)
  if (
    type === 'transfer_sol' ||
    type === 'stake' ||
    type === 'lend_supply' ||
    type === 'create_escrow'
  ) {
    return -amount
  }
  
  // Inflows (positive delta)
  if (
    type === 'unstake' ||
    type === 'release_escrow' ||
    type === 'refund_escrow'
  ) {
    return amount
  }
  
  return null // No delta check for this type
}

Variance Evaluation

1. Compare Deltas

Calculate the absolute difference between expected and observed deltas.
const absoluteDelta = Math.abs(observed - expected)
2. Check Absolute Tolerance

If within the configured tolerance, delta guard passes.
if (absoluteDelta <= DELTA_GUARD_ABSOLUTE_TOLERANCE_LAMPORTS) {
  return { ok: true, reason: 'within absolute tolerance' }
}
3. Calculate Variance BPS

Compute variance in basis points.
const denom = Math.max(1, Math.abs(expected))
const varianceBps = Math.round((absoluteDelta / denom) * 10000)
4. Check Threshold

Compare variance to the configured threshold (typically 200-500 bps).
const ok = varianceBps <= thresholdBps
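A minimal sketch combining steps 1-4. `GuardVerdict` is a trimmed, illustrative stand-in for the full result interface; the function name and parameters are assumptions:

```typescript
type GuardVerdict = { ok: boolean; varianceBps: number; reason?: string }

const evaluateDelta = (
  expected: number,                  // lamports, from the expected delta calculation
  observed: number,                  // lamports, measured after execution
  absoluteToleranceLamports: number, // DELTA_GUARD_ABSOLUTE_TOLERANCE_LAMPORTS
  thresholdBps: number               // variance threshold in basis points
): GuardVerdict => {
  // Step 1: absolute difference between expected and observed
  const absoluteDelta = Math.abs(observed - expected)
  // Step 2: small absolute variances pass regardless of percentage
  if (absoluteDelta <= absoluteToleranceLamports) {
    return { ok: true, varianceBps: 0, reason: 'within absolute tolerance' }
  }
  // Steps 3-4: otherwise compare relative variance to the threshold
  const denom = Math.max(1, Math.abs(expected))
  const varianceBps = Math.round((absoluteDelta / denom) * 10000)
  return { ok: varianceBps <= thresholdBps, varianceBps }
}
```

The absolute tolerance shortcut matters for small transfers, where a few thousand lamports of rent or fee noise would otherwise produce a huge variance in basis points.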

Delta Guard Result

interface DeltaGuardResult {
  ok: boolean;
  expectedLamportsDelta: number | null;
  observedLamportsDelta: number | null;
  varianceBps: number | null;
  reason?: string;
}
Delta guard failures indicate a mismatch between simulation and execution. This could be due to:
  • Unexpected transaction fees
  • Rent changes
  • Protocol fee variations
  • Simulation drift (different blockhash/slot)
Investigate failed transactions to determine if the variance is acceptable.

Restart Recovery

The outbox queue automatically drains pending work on service restart.

Recovery Flow

1. Load Persistent State

On startup, the transaction engine loads the SQLite database with all pending and processing jobs.
2. Reclaim Expired Leases

Jobs in processing state with expired leases return to pending.
3. Resume Processing

The outbox worker starts polling and claiming jobs from the queue.
4. Process Backlog

All pending jobs are processed in FIFO order (oldest first).
No manual intervention is required after a restart. The system automatically recovers and continues processing.

Database Migration

The system includes automatic migration from legacy JSON snapshots to SQLite.

Migration Behavior

  • Automatic Detection - On first startup, checks for legacy snapshot file
  • One-time Import - Migrates jobs if SQLite database is empty
  • Preserves History - All job states, attempts, and metadata are preserved
  • Idempotent - Safe to restart during migration
if (existsSync(legacySnapshotFile) && dbRowCount === 0) {
  const snapshot = JSON.parse(readFileSync(legacySnapshotFile, 'utf8'))
  
  // transaction() returns a callable that must be invoked to run
  db.transaction(() => {
    for (const job of snapshot.jobs) {
      db.insert(job) // INSERT OR IGNORE for idempotency
    }
  })()
}
After successful migration, you can safely delete the legacy JSON snapshot file.