Key Pool Management and Intelligent Routing: The Core of Building a High-Availability API Relay Service

2026/06/08

In AI API relay services, stability is the top priority. Users who call your interface expect a result every time, even when an upstream provider is temporarily failing, rate limiting, or offline. TokenCircle achieves 99.7% service availability through a Key pool management and intelligent routing system. This article walks through the design and technical implementation of that system.

Why Key Pool Management is Needed

Traditional API relay services usually adopt a simple "one channel, one Key" model. This approach has obvious single points of failure:

Upstream rate limiting: Providers set RPM/TPM limits on a single API Key. Once a 429 error is triggered, all requests through that channel fail
Balance exhausted: When a Key's quota runs out, the entire channel is cut off, requiring manual intervention to top up or switch
Key invalidation: The upstream bans the Key due to violation detection, security policies, etc., causing direct service interruption
Load imbalance: All requests hit the same Key, failing to leverage the aggregated capacity of multiple Keys

The core idea of a Key pool is simple: Treat multiple upstream Keys of each channel as a resource pool, uniformly scheduled by a router. This way, an anomaly in a single Key does not affect the overall service.

Key Pool Management and Intelligent Routing Architecture

Data Model Design

TokenCircle's Key pool system is built on the channel_keys table, where each channel can be associated with multiple upstream Keys:

CREATE TABLE channel_keys (
  id            SERIAL PRIMARY KEY,
  channel_id    INTEGER NOT NULL,
  api_key       TEXT NOT NULL,
  weight        INTEGER DEFAULT 1,        -- Weight for weighted random selection
  daily_limit   NUMERIC,                  -- Daily request limit
  monthly_limit NUMERIC,                  -- Monthly request limit
  health_status VARCHAR(16) DEFAULT 'healthy',  -- healthy/degraded/unhealthy
  consecutive_failures INTEGER DEFAULT 0, -- Consecutive failure count
  last_used_at  INTEGER,
  created_at    INTEGER DEFAULT EXTRACT(epoch FROM now())::integer
);

Several key design decisions:

Weight field: Different Keys may come from different pricing plans; Keys with higher weight should bear more traffic. For example, an enterprise Key has weight 5, personal Key weight 1
Dual limits: Daily limits prevent burst consumption, monthly limits control long-term costs. The two dimensions are checked independently
Three-level health status: Not a simple binary "available/unavailable" judgment. The degraded state indicates available but risky; routing reduces its priority

Intelligent Routing Algorithm

The core of routing is a multi-layer decision tree, executed in order of priority:

Layer 1: Channel Matching

Based on the model parameter in the request, look up the corresponding channel list in model_map. A model may map to multiple channels (e.g., deepseek-v4 goes through both DeepSeek official and SiliconFlow), which provides the basis for failover.
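The lookup can be sketched as follows. This is a minimal sketch, assuming model_map is held in memory as a plain object keyed by model name; `resolveChannels` is a hypothetical helper name, not TokenCircle's actual function.

```javascript
// Hypothetical in-memory copy of model_map (the shape is an assumption).
const modelMap = {
  'deepseek-v4': [
    { channel_id: 5, priority: 2 }, // SiliconFlow
    { channel_id: 1, priority: 1 }, // DeepSeek official
  ],
};

// Return the channel IDs for a model, lowest priority number first.
// Unknown models yield an empty list so the caller can reject the request.
function resolveChannels(map, model) {
  const known = Object.prototype.hasOwnProperty.call(map, model);
  const entries = known ? map[model] : [];
  return [...entries]
    .sort((a, b) => a.priority - b.priority)
    .map((entry) => entry.channel_id);
}

console.log(resolveChannels(modelMap, 'deepseek-v4'));   // [ 1, 5 ]
console.log(resolveChannels(modelMap, 'unknown-model')); // []
```

Returning an ordered list rather than a single channel keeps the remaining channels available as failover targets.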

Layer 2: Weighted Random Selection

In the candidate Key list, perform weighted random selection based on the weight field. Unlike strict round-robin, which cycles through Keys in a fixed order, each pick is independent and proportional to weight:

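// Pick one key at random, with probability proportional to its weight.
function weightedRandomSelect(keys) {
  if (keys.length === 0) return null; // nothing to select from
  const totalWeight = keys.reduce((sum, k) => sum + k.weight, 0);
  let random = Math.random() * totalWeight;
  for (const key of keys) {
    random -= key.weight;
    if (random < 0) return key;
  }
  // Only reached through floating-point rounding or all-zero weights.
  return keys[keys.length - 1];
}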

A Key with weight 5 is 5 times as likely to be selected as a Key with weight 1. High-spec Keys therefore handle more traffic, while low-spec Keys still receive a share.
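For a pool with one weight-5 Key and one weight-1 Key, the expected shares are 5/6 and 1/6. The sketch below works through that arithmetic. `selectionShares` and `pickWeighted` are illustrative names, and the injectable `rand` parameter is an assumption added so that the selection can be checked deterministically.

```javascript
// Expected share of traffic per Key: share = weight / sum of all weights.
function selectionShares(keys) {
  const total = keys.reduce((sum, k) => sum + k.weight, 0);
  return keys.map((k) => ({ id: k.id, share: total > 0 ? k.weight / total : 0 }));
}

// Weighted random pick with an injectable random source (defaults to Math.random).
function pickWeighted(keys, rand = Math.random) {
  const total = keys.reduce((sum, k) => sum + k.weight, 0);
  let r = rand() * total;
  for (const key of keys) {
    r -= key.weight;
    if (r < 0) return key;
  }
  return keys[keys.length - 1];
}

const exampleKeys = [
  { id: 'enterprise', weight: 5 },
  { id: 'personal', weight: 1 },
];

console.log(selectionShares(exampleKeys));
// enterprise: 5/6 (about 0.833), personal: 1/6 (about 0.167)

// A random draw in [0, 5/6) lands on the enterprise Key, the rest on the personal Key.
console.log(pickWeighted(exampleKeys, () => 0.5).id); // enterprise
console.log(pickWeighted(exampleKeys, () => 0.9).id); // personal
```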

Layer 3: Health Check Filtering

Health status is applied to the candidate list before the weighted selection in Layer 2 runs:

healthy: Participates normally in routing, no additional restrictions
degraded: Participates in routing with reduced weight (actual weight = original weight × 0.3)
unhealthy: Completely skipped, not participating in this round of selection
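The three rules above can be expressed as a weight multiplier per status. The following is a minimal sketch: the 0.3 factor comes from the text, while `HEALTH_FACTOR`, `toCandidates`, and the `effectiveWeight` field are hypothetical names chosen for illustration.

```javascript
// Multiplier applied to a Key's weight for each health status.
// Modelling "completely skipped" as a factor of 0 is a choice of this sketch.
const HEALTH_FACTOR = { healthy: 1, degraded: 0.3, unhealthy: 0 };

// Attach an effective weight to every Key and drop the ones that end up at 0.
// An unrecognized status is treated as unhealthy.
function toCandidates(keys) {
  return keys
    .map((k) => ({
      ...k,
      effectiveWeight: k.weight * (HEALTH_FACTOR[k.health_status] ?? 0),
    }))
    .filter((k) => k.effectiveWeight > 0);
}

const pool = [
  { id: 1, weight: 5, health_status: 'healthy' },
  { id: 2, weight: 5, health_status: 'degraded' },
  { id: 3, weight: 1, health_status: 'unhealthy' },
];

for (const k of toCandidates(pool)) {
  console.log(k.id, k.effectiveWeight);
}
// Key 1 keeps weight 5, Key 2 drops to 1.5, Key 3 is excluded.
```

The weighted selection then runs on `effectiveWeight` instead of the raw weight, so a degraded Key still receives occasional traffic that can reveal whether it has recovered.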

Layer 4: Limit Check

After selecting a Key, check its daily and monthly limits. If exceeded, mark it as degraded and re-select.
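A minimal sketch of the check-and-re-select loop follows. It assumes per-Key usage counters are available as `{ daily, monthly }` objects and that a NULL limit means "no limit"; `isWithinLimits`, `selectWithinLimits`, and the `usageById` map are hypothetical names, and `select` stands for whichever selection function the router uses.

```javascript
// True when the Key still has headroom under both limits.
// A null or undefined limit is treated as "no limit" (an assumption).
function isWithinLimits(key, usage) {
  const underDaily = key.daily_limit == null || usage.daily < key.daily_limit;
  const underMonthly = key.monthly_limit == null || usage.monthly < key.monthly_limit;
  return underDaily && underMonthly;
}

// Select a Key, removing over-limit Keys from the candidates and
// re-selecting until one passes or no candidates remain.
function selectWithinLimits(keys, usageById, select) {
  let candidates = [...keys];
  const overLimit = [];
  while (candidates.length > 0) {
    const key = select(candidates);
    const usage = usageById[key.id] ?? { daily: 0, monthly: 0 };
    if (isWithinLimits(key, usage)) return { key, overLimit };
    overLimit.push(key.id); // the caller would mark these Keys as degraded
    candidates = candidates.filter((k) => k.id !== key.id);
  }
  return { key: null, overLimit };
}

const limitedKeys = [
  { id: 1, daily_limit: 100, monthly_limit: 1000 },
  { id: 2, daily_limit: null, monthly_limit: null },
];
const usageById = { 1: { daily: 100, monthly: 500 } };
const pickFirst = (list) => list[0]; // stand-in for the weighted selection

const result = selectWithinLimits(limitedKeys, usageById, pickFirst);
console.log(result.key.id, result.overLimit); // 2 [ 1 ]
```

Removing a rejected Key from the candidate list before re-selecting guarantees the loop terminates, which a plain "select again" would not.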

Failure Detection and Automatic Recovery

This is the most critical part of the entire system. TokenCircle's relay layer updates the Key's health status based on the response status after each request:

async function updateKeyHealth(keyId, response) {
  if (response.status === 429 || response.status === 402) {
    // Rate limited or insufficient balance
    await incrementFailures(keyId);
  } else if (response.status >= 500) {
    // Upstream server error
    await incrementFailures(keyId);
  } else if (response.ok) {
    // Success, reset failure count
    await resetFailures(keyId);
  }
}

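async function incrementFailures(keyId) {
  // Note: getKey followed by updateKey is not atomic, so concurrent
  // failures on the same key can lose an increment.
  const key = await getKey(keyId);
  const newCount = key.consecutive_failures + 1;

  // Any failure degrades the key; the third consecutive one disables it.
  const newStatus = newCount >= 3 ? 'unhealthy' : 'degraded';

  await updateKey(keyId, {
    consecutive_failures: newCount,
    health_status: newStatus
  });

  // Start the recovery timer only on the transition into unhealthy,
  // so repeated failures do not stack multiple timers.
  if (newStatus === 'unhealthy' && key.health_status !== 'unhealthy') {
    scheduleRecovery(keyId);
  }
}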

After 3 consecutive failures, the Key is marked unhealthy. On that transition, the system starts a 5-minute recovery timer:

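function scheduleRecovery(keyId) {
  // In-process timer: it is lost if the relay restarts before it fires.
  setTimeout(async () => {
    try {
      await updateKey(keyId, {
        health_status: 'degraded',   // Probe in degraded before returning to healthy
        consecutive_failures: 0
      });
      // A later successful request in the degraded state promotes the key to healthy
    } catch (err) {
      console.error('Recovery update failed for key', keyId, err);
    }
  }, 5 * 60 * 1000);
}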

Recovery does not jump straight back to healthy. The Key first moves to degraded for probing, and a successful request during the probing period upgrades it to healthy. This gradual recovery avoids the "jitter" problem: if the upstream recovers only briefly and then fails again, the Key is still carrying reduced weight, so few requests are affected before it is marked unhealthy again.
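The health lifecycle described in this section can be summarized as a small state machine. The sketch below is a pure function with no database access. The event names (`failure`, `success`, `recovery_timer`) and the `nextState` helper are hypothetical, and it assumes that a success always resets the Key to healthy.

```javascript
const FAILURE_THRESHOLD = 3; // from the text: 3 consecutive failures

// Given the current state and an event, return the next state.
function nextState(state, event) {
  switch (event) {
    case 'failure': {
      const failures = state.consecutive_failures + 1;
      return {
        health_status: failures >= FAILURE_THRESHOLD ? 'unhealthy' : 'degraded',
        consecutive_failures: failures,
      };
    }
    case 'success':
      return { health_status: 'healthy', consecutive_failures: 0 };
    case 'recovery_timer':
      // Only an unhealthy Key is moved back to degraded for probing.
      return state.health_status === 'unhealthy'
        ? { health_status: 'degraded', consecutive_failures: 0 }
        : state;
    default:
      throw new Error(`unknown event: ${event}`);
  }
}

// Replay a typical outage: three failures, the timer fires, then a probe succeeds.
const events = ['failure', 'failure', 'failure', 'recovery_timer', 'success'];
let current = { health_status: 'healthy', consecutive_failures: 0 };
const trace = events.map((e) => {
  current = nextState(current, e);
  return current.health_status;
});
console.log(trace.join(' -> '));
// degraded -> degraded -> unhealthy -> degraded -> healthy
```

Keeping the transition logic free of I/O makes it straightforward to unit test separately from the code that persists the state.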

Request Retry Strategy

When a Key fails, the relay layer does not immediately return an error to the user, but automatically tries the next available Key:

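async function retryWithFallback(channelId, request) {
  const keys = await getHealthyKeys(channelId);
  // Try healthy keys before degraded ones; copy first so the input is not mutated.
  const order = { healthy: 0, degraded: 1, unhealthy: 2 };
  const sorted = [...keys].sort(
    (a, b) => order[a.health_status] - order[b.health_status]
  );

  for (const key of sorted) {
    try {
      const response = await callUpstream(key.api_key, request);
      await updateKeyHealth(key.id, response);
      // Key-level failures (rate limit, balance, upstream 5xx) move on to the
      // next key; anything else, including client errors, goes back to the caller.
      const keyFailed =
        response.status === 429 || response.status === 402 || response.status >= 500;
      if (!keyFailed) return response;
    } catch (err) {
      // Network error or timeout: count it as an upstream failure.
      await updateKeyHealth(key.id, { status: 500 });
    }
  }

  // All keys failed, try fallback channel
  return await tryFallbackChannel(request);
}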

The entire retry process is transparent to the user. What the user perceives is just "a slightly slower response", not "the service is down".

Multi-Channel Failover

Key pools solve high availability within a single channel, but true disaster recovery requires cross-channel failover. TokenCircle's model routing supports mapping one model to multiple channels:

// model_map example
{
  "deepseek-v4": [
    { "channel_id": 1, "priority": 1 },  // DeepSeek official
    { "channel_id": 5, "priority": 2 },  // SiliconFlow
    { "channel_id": 8, "priority": 3 }   // Other alternatives
  ]
}

When all Keys of a channel are unavailable, routing automatically switches to the channel with the next priority. With this two-level fault tolerance (Key level plus channel level), a request fails only when every Key in every mapped channel is unavailable.
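The channel-level switch can be sketched as a walk over the model's channels in priority order. This is an illustrative sketch, assuming Keys are already grouped by channel in a `keysByChannel` object; `pickChannel` is a hypothetical helper name.

```javascript
// Return the highest-priority channel that still has at least one
// usable (non-unhealthy) Key, together with those Keys.
function pickChannel(channels, keysByChannel) {
  const ordered = [...channels].sort((a, b) => a.priority - b.priority);
  for (const { channel_id } of ordered) {
    const usable = (keysByChannel[channel_id] ?? []).filter(
      (k) => k.health_status !== 'unhealthy'
    );
    if (usable.length > 0) return { channel_id, keys: usable };
  }
  return null; // every channel is exhausted: surface an error to the caller
}

const channels = [
  { channel_id: 1, priority: 1 },
  { channel_id: 5, priority: 2 },
];
const keysByChannel = {
  1: [{ id: 10, health_status: 'unhealthy' }],
  5: [{ id: 20, health_status: 'degraded' }],
};

console.log(pickChannel(channels, keysByChannel).channel_id); // 5
```

Here channel 1 has no usable Key, so the request moves to channel 5 even though its only Key is degraded.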

Monitoring and Observability

TokenCircle's backend provides a complete monitoring view for each Key:

Real-time health status: Green/yellow/red three-color indicator, easy to see at a glance
Request volume statistics: Daily/monthly request counts, with limit warnings
Error details: HTTP status code and error message of the most recent error
Cost tracking: Cumulative consumption amount for each Key

Administrators can adjust Key weights, limits, add new Keys, or disable abnormal Keys at any time in the backend. All changes take effect in real time without restarting the service.
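As an illustration of how the status indicator and limit warnings could be derived from the stored fields, consider the sketch below. The mapping of statuses to colors, the 80% warning threshold, and the `keyIndicator` helper are all assumptions made for this example; the article does not specify them.

```javascript
const WARN_RATIO = 0.8; // hypothetical threshold: warn at 80% of a limit

// Derive the dashboard color and any limit warnings for one Key.
function keyIndicator(key, usage) {
  const colors = { healthy: 'green', degraded: 'yellow', unhealthy: 'red' };
  const color = colors[key.health_status] ?? 'red'; // unknown status shown as red
  const warnings = [];
  if (key.daily_limit != null && usage.daily >= key.daily_limit * WARN_RATIO) {
    warnings.push('daily');
  }
  if (key.monthly_limit != null && usage.monthly >= key.monthly_limit * WARN_RATIO) {
    warnings.push('monthly');
  }
  return { color, warnings };
}

const monitoredKey = { health_status: 'healthy', daily_limit: 100, monthly_limit: 3000 };
console.log(keyIndicator(monitoredKey, { daily: 85, monthly: 1200 }));
// color is 'green'; warnings contains only 'daily'
```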

Actual Results

After launching the Key pool system, TokenCircle's service quality has significantly improved:

Availability: Increased from 98.5% to 99.7%; users rarely notice upstream failures
Failure recovery time: Reduced from 10-30 minutes of manual intervention to seconds for automatic switching
Operational cost: No more 24/7 on-call staffing to watch upstream status
Capacity elasticity: Throughput scales roughly linearly with the number of Keys
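To put the availability figures in perspective, the arithmetic below converts them into hours of downtime. It assumes a 30-day month and that availability is measured as a fraction of wall-clock time, neither of which the article states; `downtimeHours` is an illustrative helper.

```javascript
// Downtime implied by an availability percentage over a period (default: 30 days).
function downtimeHours(availabilityPercent, periodHours = 30 * 24) {
  return ((100 - availabilityPercent) / 100) * periodHours;
}

console.log(downtimeHours(98.5).toFixed(2)); // 10.80
console.log(downtimeHours(99.7).toFixed(2)); // 2.16
```

Under those assumptions, the improvement corresponds to roughly 10.8 hours of downtime per month shrinking to about 2.2 hours.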

Conclusion

Key pool management and intelligent routing are the infrastructure of API relay services. TokenCircle's implementation follows several core principles: weighted scheduling ensures reasonable resource allocation, multi-level health checks achieve precise fault detection, gradual recovery avoids state jitter, and two-level fault tolerance ensures availability in extreme scenarios. This system has been running stably, supporting tens of thousands of API calls per day for TokenCircle.

If you are building a similar API gateway or relay service, the Key pool is a feature worth prioritizing. It is relatively simple to implement, yet it yields a substantial gain in system reliability.
