Building an Intelligent Customer Service Robot Using AI API: A Complete Practical Solution

2026/06/07·6 min read·38 views

Building an Intelligent Customer Service Robot

Intelligent customer service robots are one of the most common application scenarios for AI APIs. This article will guide you from scratch to build an intelligent customer service system that supports multi-turn conversations, knowledge base retrieval, and streaming output.

Why Use AI for Customer Service

Traditional customer service systems face several core pain points:

High labor costs: 7x24 hour coverage requires a large workforce
Slow response speed: Waiting in line during peak hours affects user experience
Inconsistent knowledge: Different agents have inconsistent responses
Cannot scale: The customer service team struggles to expand synchronously with business growth

The advantages of AI customer service include: sub-second response, 24/7 availability, consistent answers, and controllable costs. With the Ciyuano API, you can get an intelligent customer service that understands natural language and supports multi-turn conversations in just a few lines of code.

System Architecture

A complete AI customer service system includes the following components:

Component	Responsibility	Technology Choice
Frontend Interface	User chat entry point	React / Vue / Mini Program
Backend Service	Business logic, session management	Node.js / Python / Go
Knowledge Base	FAQ, product documentation	Vector database / local files
LLM API	Natural language understanding and generation	Ciyuano API

Basic Implementation

1. Initialize Client

from openai import OpenAI

client = OpenAI(
    base_url="https://www.ciyuano.com/v1",
    api_key="sk-your-api-key"
)

2. Define System Prompt

The system prompt determines the role, tone, and behavioral boundaries of the customer service agent:

SYSTEM_PROMPT = """You are「Ciyuano」's AI assistant。

Your responsibilities：
1. Answer user questions about API integration、pricing、model selection issues
2. Help troubleshoot common integration issues
3. For issues that cannot be resolved，Guide user to contact human support

Rules：
- Only answer questions related to Ciyuano products
- Don't fabricate uncertain information，Inform user confirmation is needed
- Answer concisely，Avoid verbosity
- Answer in the user's language"""

3. Implement Multi-turn Conversation

class CustomerServiceBot:
    def __init__(self):
        self.client = OpenAI(
            base_url="https://www.ciyuano.com/v1",
            api_key="sk-your-api-key"
        )
        self.conversations = {}  # session_id -> messages

    def chat(self, session_id: str, user_message: str) -> str:
        # Get or create session
        if session_id not in self.conversations:
            self.conversations[session_id] = [
                {"role": "system", "content": SYSTEM_PROMPT}
            ]

        messages = self.conversations[session_id]
        messages.append({"role": "user", "content": user_message})

        # Call LLM
        response = self.client.chat.completions.create(
            model="deepseek-v4",
            messages=messages,
            temperature=0.3,
            max_tokens=1000
        )

        assistant_message = response.choices[0].message.content
        messages.append({"role": "assistant", "content": assistant_message})

        return assistant_message

RAG Knowledge Base Retrieval

Relying solely on the general knowledge of LLMs cannot answer business-specific questions. RAG (Retrieval-Augmented Generation) solves this by first retrieving relevant documents and then having the LLM generate answers based on the retrieved results.

Knowledge Base Construction Process

import numpy as np

class KnowledgeBase:
    def __init__(self):
        self.documents = []
        self.embeddings = []

    def add_document(self, content: str, metadata: dict):
        """Add documents to knowledge base"""
        embedding = self._get_embedding(content)
        self.documents.append({"content": content, "metadata": metadata})
        self.embeddings.append(embedding)

    def search(self, query: str, top_k: int = 3) -> list:
        """Retrieve most relevant documents"""
        query_embedding = self._get_embedding(query)
        similarities = [
            self._cosine_similarity(query_embedding, emb)
            for emb in self.embeddings
        ]
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        return [self.documents[i] for i in top_indices]

    def _get_embedding(self, text: str) -> list:
        """Get text vectors"""
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        return response.data[0].embedding

    def _cosine_similarity(self, a, b) -> float:
        """Calculate cosine similarity"""
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

RAG-Enhanced Conversation

def chat_with_rag(self, session_id: str, user_message: str) -> str:
    # 1. Retrieve relevant knowledge
    relevant_docs = self.knowledge_base.search(user_message, top_k=3)
    context = "
".join([doc["content"] for doc in relevant_docs])

    # 2. Assemble prompt with knowledge
    enhanced_prompt = f"""{SYSTEM_PROMPT}

The following is relevant information retrieved from the knowledge base，Please answer user questions based on this information：

{context}
"""

    messages = [
        {"role": "system", "content": enhanced_prompt},
        {"role": "user", "content": user_message}
    ]

    # 3. Call LLM
    response = self.client.chat.completions.create(
        model="deepseek-v4",
        messages=messages,
        temperature=0.3
    )

    return response.choices[0].message.content

Streaming Output

Conversation Flow

Streaming output allows users to see a "typing effect," greatly improving the waiting experience:

def chat_stream(self, session_id: str, user_message: str):
    """Stream response，Output character by character"""
    messages = self._get_messages(session_id, user_message)

    stream = self.client.chat.completions.create(
        model="deepseek-v4",
        messages=messages,
        temperature=0.3,
        stream=True  # Enable streaming
    )

    full_response = ""
    for chunk in stream:
        if chunk.choices[0].delta.content:
            token = chunk.choices[0].delta.content
            full_response += token
            yield token  # Stream to frontend character by character

    # Save complete reply to session history
    messages.append({"role": "assistant", "content": full_response})

The frontend receives streaming data via SSE (Server-Sent Events):

// Frontend JavaScript
async function sendMessage(message) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ message })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    const text = decoder.decode(value);
    appendToChat(text);  // Display character by character
  }
}

Model Selection Recommendations

Scenario	Recommended Model	Reason
Simple FAQ Q&A	Qwen Plus	Low cost, fast response
Complex Business Consultation	DeepSeek V4	Strong reasoning ability, good Chinese
Requires Tool Calling	MiMo v2.5 PRO	Good Function Calling support
Cost-effective Priority	MiMo v2.5	Limited time free

Optimization Suggestions

Session Management

Set session expiration time (recommended to auto-close after 30 minutes of inactivity)
Limit historical message rounds (recommend retaining the last 10 rounds) to avoid token overflow
Summarize and compress long conversations, keep key information

Prompt Optimization

Define role boundaries: tell the model what to answer and what not to answer
Provide Few-shot examples: give standard Q&A pairs to unify the response style
Set fallback strategy: when the model is uncertain, guide the user to contact human service

Cost Control

Use small models for simple questions, large models for complex questions (routing strategy)
Set max_tokens to limit output length
Cache responses for repeated questions

Complete Example Code

Below is a complete example that can be run directly:

from openai import OpenAI

client = OpenAI(
    base_url="https://www.ciyuano.com/v1",
    api_key="sk-your-api-key"
)

SYSTEM_PROMPT = """You are an AI assistant，Be concise、accurately answer user questions。"""

def chat(user_message: str, history: list = None):
    if history is None:
        history = []

    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend(history)
    messages.append({"role": "user", "content": user_message})

    response = client.chat.completions.create(
        model="deepseek-v4",
        messages=messages,
        temperature=0.3,
        stream=True
    )

    full_response = ""
    for chunk in response:
        if chunk.choices[0].delta.content:
            token = chunk.choices[0].delta.content
            full_response += token
            print(token, end="", flush=True)

    history.append({"role": "user", "content": user_message})
    history.append({"role": "assistant", "content": full_response})
    return full_response, history

# Interactive conversation
if __name__ == "__main__":
    history = []
    print("AI assistant started，Input 'quit' Exit
")
    while True:
        user_input = input("User: ")
        if user_input.lower() == "quit":
            break
        print("Support: ", end="")
        response, history = chat(user_input, history)
        print("
")

Summary

The core steps to build an AI customer service system:

Choose an API: Register with Ciyuano, get an API Key
Design the prompt: Define the role, responsibilities, and answering rules of the customer service agent
Implement conversation: Multi-turn conversation management + streaming output
Integrate knowledge base: RAG retrieval augmentation to answer business-specific questions
Continuous optimization: Adjust prompts and knowledge base based on user feedback

The entire process requires only a few dozen lines of Python code, combined with the Ciyuano API. For small and medium-sized businesses, this solution can cover more than 80% of customer service scenarios.

Next Steps: Try integrating Function Calling to enable the customer service bot to perform practical operations such as checking orders and logistics.

View API Documentation · Select Model

← Back to Blog