Building a Multi-Model Intelligent Gateway: Ciyuano's Technical Architecture Revealed
Overview
The core of Ciyuano is a smart model gateway. This article walks you through its architecture design.
System Architecture
User Request β Authentication & Billing β Model Router β [DeepSeek/GLM/Qwen Channel] β Streaming Response
Core Modules
1. Channel Manager
Each upstream model provider is abstracted as a "channel":
typescript
interface Channel {
id: number;
provider: string; // deepseek | zhipu | aliyun
modelId: string; // deepseek-v4 | glm-5
baseUrl: string; // upstream API address
apiKey: string; // upstream API Key
weight: number; // load balancing weight
isEnabled: boolean; // whether enabled
healthStatus: string; // healthy | degraded | down
}
2. Smart Router
Routing decision flow:
Model parsing: auto β all channel candidates; specified model β match channels
Health filtering: exclude channels with isEnabled=false or healthStatus=down
Weighted selection: weighted random by weight, higher weight β higher probability of selection
Failover: request fails β auto-mark β retry next channel
python
def select_channel(model, channels):
candidates = [c for c in channels
if c.is_enabled and c.model_id == model]
if not candidates:
raise NoChannelAvailable()
total_weight = sum(c.weight for c in candidates)
r = random.uniform(0, total_weight)
cumulative = 0
for c in candidates:
cumulative += c.weight
if r <= cumulative:
return c
3. Health Check
The backend regularly sends lightweight probe requests to each channel, marking three states: healthy / degraded / down.
4. Protocol Conversion
API formats from upstream vendors vary widely, uniformly convert to OpenAI format output:
Upstream β Unified Intermediate Representation β OpenAI format response
5. Real-time Billing
python
cost = (
prompt_tokens * channel.prompt_price_per_1k / 1000 +
completion_tokens * channel.completion_price_per_1k / 1000
)
key.balance -= cost
key.total_cost += cost
Key Design Decisions
Why use SQLite instead of PostgreSQL?
Simple deployment: zero configuration, one file does it all
Sufficient performance: SQLite's read/write performance is more than enough for medium-scale API services
LiteFS / Turso and other solutions can easily scale to multi-node
Why not do model fine-tuning?
Our positioning is a "gateway" rather than a "model factory". Focus on channels and routing.
Why not introduce a caching layer?
Same input in AI conversations may not produce the same output (temperature, random seed). Blind caching may undermine generation diversity.
Performance Metrics
The additional latency of Ciyuano is typically 50-150ms, mainly from authentication, billing, and routing decisions.
When streaming, the time to first token (TTFT) is almost unaffected because we use a transparent proxy rather than buffered forwarding.
Summary
A good API middle station technically needs to do three things: compatibility, routing, billing. Doing these three well is the greatest value to developers.
π Related Articles
Building an Intelligent Customer Service Robot Using AI API: A Complete Practical Solution
Intelligent customer service robots are one of the most common application scenarios of AI APIs. This article will guide you from scratch to build an intelligent customer service system that supports multi-turn dialogue, knowledge base retrieval, and streaming output.
Dev PracticeBuild a DeepSeek chatbot using Streamlit and Ciyuano
Build a complete AI chat web application with Streamlit in 30 minutes, supporting streaming output, conversation memory, and multi-model switching. Entirely pure Python, zero frontend code.
Dev PracticeHands-on AI Agent Development: Building Autonomous Decision-Making Agents
Build an AI Agent from scratch with tool calling, memory management, and autonomous planning capabilities, covering mainstream frameworks such as ReAct and Plan-and-Execute.
π¬ Comments are not yet available, stay tuned