Key Pool Management and Intelligent Routing: The Core of Building a High-Availability API Relay Service
In AI API relay services, stability is the top priority. Users call your interface expecting to always get results, even if an upstream provider experiences temporary failures, hits rate limits, or goes offline entirely. TokenCircle achieves 99.7% service availability through a Key pool management and intelligent routing system. This article dissects the design and technical implementation of that system.
Why Key Pool Management is Needed
Traditional API relay services usually adopt a simple "one channel, one Key" model. This approach has obvious single points of failure:
- Upstream rate limiting: Providers set RPM/TPM limits on a single API Key. Once a 429 error is triggered, all requests through that channel fail
- Balance exhausted: When a Key's quota runs out, the entire channel is cut off, requiring manual intervention to top up or switch
- Key invalidation: The upstream bans the Key due to violation detection, security policies, etc., causing direct service interruption
- Load imbalance: All requests hit the same Key, failing to leverage the aggregated capacity of multiple Keys
The core idea of a Key pool is simple: Treat multiple upstream Keys of each channel as a resource pool, uniformly scheduled by a router. This way, an anomaly in a single Key does not affect the overall service.
Data Model Design
TokenCircle's Key pool system is built on the channel_keys table, where each channel can be associated with multiple upstream Keys:
CREATE TABLE channel_keys (
    id SERIAL PRIMARY KEY,
    channel_id INTEGER NOT NULL,
    api_key TEXT NOT NULL,
    weight INTEGER DEFAULT 1,                      -- Weight for weighted random selection
    daily_limit NUMERIC,                           -- Daily request limit
    monthly_limit NUMERIC,                         -- Monthly request limit
    health_status VARCHAR(16) DEFAULT 'healthy',   -- healthy/degraded/unhealthy
    consecutive_failures INTEGER DEFAULT 0,        -- Consecutive failure count
    last_used_at INTEGER,                          -- Unix timestamp (seconds)
    created_at INTEGER DEFAULT EXTRACT(epoch FROM now())::integer
);
Several key design decisions:
- Weight field: Different Keys may come from different pricing plans; Keys with higher weight should bear more traffic. For example, an enterprise Key has weight 5, personal Key weight 1
- Dual limits: Daily limits prevent burst consumption, monthly limits control long-term costs. The two dimensions are checked independently
- Three-level health status: Not a simple binary "available/unavailable" judgment. The degraded state indicates available but risky; routing reduces its priority
Intelligent Routing Algorithm
The core of routing is a multi-layer decision tree, executed in order of priority:
Layer 1: Channel Matching
Based on the model parameter in the request, look up the corresponding channel list in model_map. A model may map to multiple channels (e.g., deepseek-v4 goes through both DeepSeek official and SiliconFlow), which provides the basis for failover.
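The lookup described above can be sketched as follows. This is a minimal illustration, not TokenCircle's actual code: the in-memory `modelMap` object and the `matchChannels` helper are hypothetical, and the `priority` field is borrowed from the model_map example shown later in the article.

```javascript
// Hypothetical in-memory model_map; the real storage format is an assumption.
const modelMap = {
  'deepseek-v4': [
    { channel_id: 5, priority: 2 },
    { channel_id: 1, priority: 1 },
  ],
};

// Return the candidate channel ids for a model, lowest priority number first.
// Unknown models yield an empty list instead of throwing.
function matchChannels(map, model) {
  if (!Object.hasOwn(map, model)) return [];
  return [...map[model]]
    .sort((a, b) => a.priority - b.priority)
    .map((entry) => entry.channel_id);
}

console.log(matchChannels(modelMap, 'deepseek-v4'));   // [ 1, 5 ]
console.log(matchChannels(modelMap, 'unknown-model')); // []
```

Returning an ordered list rather than a single channel keeps the remaining candidates available for failover.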
Layer 2: Weighted Random Selection
In the candidate Key list, perform weighted random selection based on the weight field. Unlike simple round-robin, which cycles through Keys in a fixed order, this distributes traffic in proportion to each Key's weight:
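function weightedRandomSelect(keys) {
  const totalWeight = keys.reduce((sum, k) => sum + k.weight, 0);
  // Pick a point in [0, totalWeight) and walk the list until it is used up
  let random = Math.random() * totalWeight;
  for (const key of keys) {
    random -= key.weight;
    if (random <= 0) return key;
  }
  // Guard against floating-point drift; returns undefined for an empty list
  return keys[keys.length - 1];
}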
A Key with weight 5 has 5 times the probability of being selected compared to a Key with weight 1. This way, high-spec Keys naturally handle more traffic, but do not completely overwhelm low-spec Keys.
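To make the ratio concrete, the expected share of traffic for each Key is its weight divided by the sum of all weights. The sketch below computes that share for the enterprise/personal example from the article; `selectionProbabilities` is a hypothetical helper written for illustration only.

```javascript
// Selection probability under weighted random selection:
// P(key) = key.weight / (sum of all weights).
function selectionProbabilities(keys) {
  const total = keys.reduce((sum, k) => sum + k.weight, 0);
  return keys.map((k) => ({ id: k.id, probability: k.weight / total }));
}

const shares = selectionProbabilities([
  { id: 'enterprise', weight: 5 },
  { id: 'personal', weight: 1 },
]);

for (const share of shares) {
  console.log(share.id, (share.probability * 100).toFixed(1) + '%');
}
// enterprise 83.3%
// personal 16.7%
```

So with these two Keys, roughly five out of every six requests go to the enterprise Key, while the personal Key still receives about one in six.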
Layer 3: Health Check Filtering
Although listed third, this filter runs before the weighted selection in Layer 2, so only eligible Keys enter the draw:
- healthy: Participates normally in routing, no additional restrictions
- degraded: Participates in routing with reduced weight (actual weight = original weight × 0.3)
- unhealthy: Completely skipped, not participating in this round of selection
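The three rules above can be expressed as a small pre-processing step that runs before the weighted draw. This is a sketch under the stated rules only; `effectiveCandidates` and `DEGRADED_FACTOR` are names invented here, not identifiers from TokenCircle's code.

```javascript
// Multiplier taken from the article: degraded keys keep 30% of their weight.
const DEGRADED_FACTOR = 0.3;

// Drop unhealthy keys and scale down degraded ones. Returns new objects,
// so the stored weights are left untouched.
function effectiveCandidates(keys) {
  return keys
    .filter((k) => k.health_status !== 'unhealthy')
    .map((k) => ({
      ...k,
      weight: k.health_status === 'degraded' ? k.weight * DEGRADED_FACTOR : k.weight,
    }));
}

const pool = [
  { id: 1, weight: 5, health_status: 'healthy' },
  { id: 2, weight: 5, health_status: 'degraded' },
  { id: 3, weight: 1, health_status: 'unhealthy' },
];

for (const k of effectiveCandidates(pool)) {
  console.log(k.id, k.health_status, k.weight.toFixed(1));
}
// 1 healthy 5.0
// 2 degraded 1.5
```

The output of this step can be passed directly to a weighted selection function, which then needs no knowledge of health states.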
Layer 4: Limit Check
After selecting a Key, check its daily and monthly limits. If exceeded, mark it as degraded and re-select.
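One way to sketch the check-and-re-select loop is shown below. Several details are assumptions made for illustration: usage counters are passed in as a plain object (the article does not say where usage is tracked), a NULL limit is treated as "no limit", and the random source is injectable so the behaviour can be reproduced.

```javascript
// A key passes only if it is under BOTH limits; a null/undefined limit is
// treated as unlimited (an assumption, not stated in the article).
function withinLimits(key, usage) {
  const dailyOk = key.daily_limit == null || usage.daily < key.daily_limit;
  const monthlyOk = key.monthly_limit == null || usage.monthly < key.monthly_limit;
  return dailyOk && monthlyOk;
}

// Weighted random pick with an injectable random source.
function pickWeighted(keys, rng = Math.random) {
  const total = keys.reduce((sum, k) => sum + k.weight, 0);
  let r = rng() * total;
  for (const k of keys) {
    r -= k.weight;
    if (r < 0) return k;
  }
  return keys[keys.length - 1];
}

// Pick a key; if it is over a limit, mark it degraded and draw again
// from the remaining candidates. Returns null when nothing is usable.
function selectWithinLimits(keys, usageById, rng = Math.random) {
  let candidates = keys.filter((k) => k.health_status !== 'unhealthy');
  while (candidates.length > 0) {
    const key = pickWeighted(candidates, rng);
    const usage = usageById[key.id] ?? { daily: 0, monthly: 0 };
    if (withinLimits(key, usage)) return key;
    key.health_status = 'degraded';
    candidates = candidates.filter((k) => k !== key);
  }
  return null;
}

const keys = [
  { id: 'a', weight: 5, daily_limit: 100, monthly_limit: 2000, health_status: 'healthy' },
  { id: 'b', weight: 1, daily_limit: null, monthly_limit: null, health_status: 'healthy' },
];
const usage = { a: { daily: 100, monthly: 500 } }; // 'a' has hit its daily limit

const chosen = selectWithinLimits(keys, usage, () => 0); // rng fixed for the demo
console.log(chosen.id, keys[0].health_status); // b degraded
```

Excluding the over-limit Key from the next draw guarantees that the loop terminates even if every Key in the pool is over its limit.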
Failure Detection and Automatic Recovery
This is the most critical part of the entire system. TokenCircle's relay layer updates the Key's health status based on the response status after each request:
async function updateKeyHealth(keyId, response) {
  if (response.status === 429 || response.status === 402) {
    // Rate limited or insufficient balance
    await incrementFailures(keyId);
  } else if (response.status >= 500) {
    // Upstream server error
    await incrementFailures(keyId);
  } else if (response.ok) {
    // Success, reset failure count
    await resetFailures(keyId);
  }
}

async function incrementFailures(keyId) {
  const key = await getKey(keyId);
  const newCount = key.consecutive_failures + 1;
  // Any failure demotes the key; the third consecutive one takes it out of rotation
  let newStatus = 'degraded';
  if (newCount >= 3) newStatus = 'unhealthy';
  await updateKey(keyId, {
    consecutive_failures: newCount,
    health_status: newStatus
  });
  // Start the recovery timer only on the transition into unhealthy
  if (newStatus === 'unhealthy' && key.health_status !== 'unhealthy') {
    scheduleRecovery(keyId);
  }
}
After 3 consecutive failures, the Key is marked unhealthy. Once marked, the system starts a 5-minute recovery timer:
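async function scheduleRecovery(keyId) {
  setTimeout(async () => {
    try {
      await updateKey(keyId, {
        health_status: 'degraded', // Promote to degraded first, as a probing state
        consecutive_failures: 0
      });
      // A successful request in the degraded state then restores healthy
    } catch (err) {
      // Avoid an unhandled rejection inside the timer callback
      console.error(`Recovery update failed for key ${keyId}:`, err);
    }
  }, 5 * 60 * 1000);
}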
Recovery does not jump directly back to healthy; instead, the Key first moves to degraded for probing. If a request succeeds during the probing period, the Key is upgraded to healthy. This gradual recovery avoids the "jitter" problem: if the upstream recovers only briefly and then fails again, the Key is still carrying reduced traffic and is quickly marked unhealthy again.
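The health lifecycle described in this section can be summarized as a small state machine. The sketch below is a pure function written for illustration; the event names and the `nextState` helper are hypothetical, and only the threshold of 3 and the healthy/degraded/unhealthy states come from the article.

```javascript
const FAILURE_THRESHOLD = 3; // from the article: 3 consecutive failures

// Pure transition function: (current state, event) -> next state.
// Events: 'success', 'failure', 'recovery_timer' (names are assumptions).
function nextState(state, event) {
  switch (event) {
    case 'success':
      // Unhealthy keys are skipped by routing, so they only leave that
      // state through the recovery timer.
      return state.status === 'unhealthy' ? state : { status: 'healthy', failures: 0 };
    case 'failure': {
      const failures = state.failures + 1;
      const status = failures >= FAILURE_THRESHOLD ? 'unhealthy' : 'degraded';
      return { status, failures };
    }
    case 'recovery_timer':
      return state.status === 'unhealthy' ? { status: 'degraded', failures: 0 } : state;
    default:
      throw new Error(`unknown event: ${event}`);
  }
}

let state = { status: 'healthy', failures: 0 };
const trace = [];
for (const event of ['failure', 'failure', 'failure', 'recovery_timer', 'success']) {
  state = nextState(state, event);
  trace.push(state.status);
}
console.log(trace.join(' -> '));
// degraded -> degraded -> unhealthy -> degraded -> healthy
```

Keeping the transitions in one pure function makes the rules easy to unit test separately from database writes and timers.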
Request Retry Strategy
When a Key fails, the relay layer does not immediately return an error to the user, but automatically tries the next available Key:
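async function retryWithFallback(channelId, request) {
  const keys = await getHealthyKeys(channelId);
  // healthy > degraded > unhealthy (copy first so the input is not mutated)
  const order = { healthy: 0, degraded: 1, unhealthy: 2 };
  const sorted = [...keys].sort(
    (a, b) => order[a.health_status] - order[b.health_status]
  );
  for (const key of sorted) {
    try {
      const response = await callUpstream(key.api_key, request);
      await updateKeyHealth(key.id, response);
      // Rate limits, exhausted balance, and upstream errors move on to the next key;
      // anything else (success or a client-side 4xx) goes back to the caller
      const retryable =
        response.status === 429 || response.status === 402 || response.status >= 500;
      if (!retryable) return response;
    } catch (err) {
      // Network error or timeout: record it as an upstream failure, then try the next key
      await updateKeyHealth(key.id, { status: 500 });
    }
  }
  // All keys failed, try fallback channel
  return await tryFallbackChannel(request);
}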
The entire retry process is transparent to the user. What the user perceives is just "a slightly slower response", not "the service is down".
Multi-Channel Failover
Key pools solve high availability within a single channel, but true disaster recovery requires cross-channel failover. TokenCircle's model routing supports mapping one model to multiple channels:
// model_map example
{
  "deepseek-v4": [
    { "channel_id": 1, "priority": 1 },  // DeepSeek official
    { "channel_id": 5, "priority": 2 },  // SiliconFlow
    { "channel_id": 8, "priority": 3 }   // Other alternatives
  ]
}
When all Keys of a channel are unavailable, routing automatically switches to the channel with the next priority. With this two-level fault tolerance (Key level + channel level), a request fails only when every Key in every mapped channel is unavailable.
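A channel-level failover loop might look like the sketch below. It assumes a per-channel attempt function that throws when all of that channel's Keys are unavailable; `routeWithFailover` and `tryChannel` are hypothetical names, and the real relay layer may signal exhaustion differently.

```javascript
// Try channels in priority order (lowest number first). `tryChannel` is a
// hypothetical stand-in for the per-channel Key pool retry and is expected
// to throw when the whole channel is unavailable.
async function routeWithFailover(channels, request, tryChannel) {
  const ordered = [...channels].sort((a, b) => a.priority - b.priority);
  const attempts = [];
  for (const { channel_id } of ordered) {
    try {
      return await tryChannel(channel_id, request);
    } catch (err) {
      attempts.push({ channel_id, message: err.message });
    }
  }
  const error = new Error('all channels failed');
  error.attempts = attempts; // keep per-channel failures for diagnostics
  throw error;
}

// Demo: channel 1 is down, so the request is served by channel 5.
const channels = [
  { channel_id: 5, priority: 2 },
  { channel_id: 1, priority: 1 },
];
const fakeTryChannel = async (channelId) => {
  if (channelId === 1) throw new Error('all keys unavailable');
  return { servedBy: channelId };
};

routeWithFailover(channels, {}, fakeTryChannel)
  .then((result) => console.log(result)); // { servedBy: 5 }
```

Collecting the per-channel errors before throwing makes it possible to tell a single bad channel apart from a total outage.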
Monitoring and Observability
TokenCircle's backend provides a complete monitoring view for each Key:
- Real-time health status: Green/yellow/red three-color indicator, easy to see at a glance
- Request volume statistics: Daily/monthly request counts, with limit warnings
- Error details: HTTP status code and error message of the most recent error
- Cost tracking: Cumulative consumption amount for each Key
Administrators can adjust Key weights, limits, add new Keys, or disable abnormal Keys at any time in the backend. All changes take effect in real time without restarting the service.
Actual Results
After launching the Key pool system, TokenCircle's service quality has significantly improved:
- Availability: Increased from 98.5% to 99.7%; users rarely notice upstream failures
- Failure recovery time: Reduced from 10-30 minutes of manual intervention to seconds for automatic switching
- Operational cost: No longer need 24/7 on-duty staff to watch upstream status
- Capacity elasticity: Linearly scale throughput by increasing the number of Keys
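To put the availability figures in perspective, they can be converted into expected downtime. The calculation below is plain arithmetic on the percentages quoted above, assuming a 30-day month; it is not a measurement taken from TokenCircle's monitoring.

```javascript
// Expected downtime in hours for a given availability ratio (0..1),
// assuming a 30-day month unless told otherwise.
function downtimeHoursPerMonth(availability, daysInMonth = 30) {
  return (1 - availability) * daysInMonth * 24;
}

console.log(downtimeHoursPerMonth(0.985).toFixed(1)); // 10.8
console.log(downtimeHoursPerMonth(0.997).toFixed(1)); // 2.2
```

Under that assumption, the improvement corresponds to going from roughly 10.8 hours to about 2.2 hours of unavailability per month.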
Conclusion
Key pool management and intelligent routing are the infrastructure of API relay services. TokenCircle's implementation follows several core principles: weighted scheduling ensures reasonable resource allocation, multi-level health checks achieve precise fault detection, gradual recovery avoids state jitter, and two-level fault tolerance ensures availability in extreme scenarios. This system has been running stably, supporting tens of thousands of API calls per day for TokenCircle.
If you are building a similar API gateway or relay service, the Key pool is a feature worth prioritizing. Its complexity is low, but its improvement to system reliability is a qualitative leap.