Complete Guide to Local Deployment of Ollama: Running Large Models on Your Own Computer | Ciyuano

2026/06/04

Want to run large models on your own computer? Ollama makes it simple. This guide will take you from installation to deployment in under 10 minutes.

What is Ollama?

Ollama is an open-source tool for running large language models locally. It supports macOS, Linux, and Windows. Its features include:

Minimal Installation: Done with a single command
Model Management: Downloads and runs pre-quantized models automatically
API Compatibility: Provides an OpenAI-compatible API
No GPU Required: Models can run on the CPU alone (a GPU is faster)

Installation

macOS / Linux

curl -fsSL https://ollama.ai/install.sh | sh

Windows

Visit ollama.ai to download the installer, then double-click to install.

Run Your First Model

# Run DeepSeek V4
ollama run deepseek-v4

# Run Qwen 2.5
ollama run qwen2.5:14b

# Run Llama 3
ollama run llama3:8b

The first run downloads the model automatically (a few GB to tens of GB). Later launches skip the download and only need to load the model from disk.

Common Commands

Command	Description
`ollama list`	View downloaded models
`ollama pull <model>`	Download a model
`ollama run <model>`	Run a model (interactive)
`ollama rm <model>`	Delete a model
`ollama serve`	Start the API service
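The information that `ollama list` prints can also be read programmatically. The sketch below assumes the service is listening on the default address used in this guide and queries Ollama's native `/api/tags` endpoint; the helper names `summarize_models` and `fetch_models` are illustrative, not part of Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address assumed throughout this guide


def summarize_models(payload: dict) -> list[tuple[str, float]]:
    """Return (name, size in GB) pairs from an /api/tags response body."""
    return [
        (m["name"], round(m.get("size", 0) / 1e9, 1))
        for m in payload.get("models", [])
    ]


def fetch_models(base_url: str = OLLAMA_URL) -> list[tuple[str, float]]:
    """Ask a running Ollama server which models are already downloaded."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return summarize_models(json.load(resp))
```

With `ollama serve` running, `fetch_models()` returns one entry per downloaded model. If the service is not running, the call raises a connection error.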

Using the API

Once the service is running, Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1:

from openai import OpenAI

# Point the official OpenAI client at the local Ollama service.
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # the client requires a value; any string works here
)

response = client.chat.completions.create(
    model="deepseek-v4",
    messages=[{"role": "user", "content": "Write a quick sort in Python"}],
)

print(response.choices[0].message.content)
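The call above waits for the whole reply before printing anything, which can feel slow on CPU-only machines. As a rough sketch, the same client can stream the reply token by token with `stream=True`. The helper names `iter_text` and `stream_reply` are made up for this example, and the model tag is simply one of those pulled earlier in the guide.

```python
from typing import Iterable, Iterator


def iter_text(chunks: Iterable) -> Iterator[str]:
    """Yield the text fragments carried by streamed chat-completion chunks."""
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # some chunks carry no text (role or finish markers)
            yield delta


def stream_reply(prompt: str, model: str = "llama3:8b") -> str:
    """Print a reply as it is generated and return the full text."""
    from openai import OpenAI  # imported lazily; needs `pip install openai`

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # ask for incremental chunks instead of one response
    )
    parts = []
    for text in iter_text(stream):
        print(text, end="", flush=True)
        parts.append(text)
    print()
    return "".join(parts)
```

Keeping the chunk handling in `iter_text` separates it from the network call, so it can be checked without a running server.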

Hardware Recommendations

Model Size	Minimum Memory	Recommended Configuration
7B	8 GB	16 GB RAM
14B	16 GB	32 GB RAM
70B	64 GB	64 GB RAM + GPU
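A back-of-the-envelope calculation helps explain these figures. The sketch below counts only the memory taken by the weights, on the assumption that the model is stored at a given number of bits per weight (4-bit is assumed as the default here, which is common for quantized builds but not guaranteed for every model tag). The operating system, the context cache, and other applications need additional room, which is why the table leaves headroom above these numbers.

```python
def estimate_weight_memory_gb(params_billion: float, bits_per_weight: int = 4) -> float:
    """Approximate memory taken by the weights alone, in GB."""
    bytes_per_weight = bits_per_weight / 8
    # (params_billion * 1e9 weights) * bytes_per_weight / 1e9 bytes per GB
    return params_billion * bytes_per_weight


for size in (7, 14, 70):
    print(
        f"{size}B: ~{estimate_weight_memory_gb(size):.1f} GB at 4-bit, "
        f"~{estimate_weight_memory_gb(size, 16):.1f} GB at 16-bit"
    )
```

Under these assumptions a 7B model needs roughly 3.5 GB for its weights at 4-bit, which fits comfortably within the 8 GB minimum, while a 70B model needs about 35 GB.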

Frequently Asked Questions

Q: Can I run models without a GPU?

A: Yes. A 7B model runs on a laptop with 16 GB of memory; generation is just slower than with a GPU.

Q: Where are models stored?

A: On macOS, models are stored in ~/.ollama/models; on Linux, in /usr/share/ollama/.ollama/models.
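To check how much disk space the models take, a small script can resolve the directory first. This is a sketch under two assumptions: the paths are the defaults named in the answer above, and the `OLLAMA_MODELS` environment variable, which Ollama reads to relocate the model directory, is visible to the script as well as to the Ollama service. The function names are illustrative.

```python
import os
import sys
from pathlib import Path


def default_models_dir(platform: str = sys.platform) -> Path:
    """Return where Ollama is expected to keep models on this machine."""
    override = os.environ.get("OLLAMA_MODELS")
    if override:  # an explicit override wins over the defaults
        return Path(override)
    if platform.startswith("linux"):
        return Path("/usr/share/ollama/.ollama/models")
    return Path.home() / ".ollama" / "models"  # macOS default from this guide


def dir_size_gb(path: Path) -> float:
    """Sum the sizes of all files under `path`, in GB."""
    total = sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
    return total / 1e9
```

For example, `dir_size_gb(default_models_dir())` reports the total size of everything downloaded so far, provided the directory exists and is readable by the current user.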

Q: How do I connect to Token Circle?

A: Because Ollama's API is OpenAI-compatible, the same client code works for both. Swap the local settings for Token Circle's endpoint and API key to send a request to the cloud instead, which gives you a cloud + local hybrid deployment.
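One way to arrange such a hybrid setup is to keep the backend choice in a single function. The sketch below assumes Token Circle offers an OpenAI-compatible endpoint, as the answer implies. Its address is not given in this guide, so it is read from the environment; the variable names `TOKEN_CIRCLE_BASE_URL` and `TOKEN_CIRCLE_API_KEY` are hypothetical.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    base_url: str
    api_key: str


# Local Ollama settings, as used in the API example in this guide.
LOCAL = Backend("http://localhost:11434/v1", "ollama")


def pick_backend(env=os.environ, prefer_cloud: bool = False) -> Backend:
    """Use the cloud only when asked to and when it is fully configured."""
    url = env.get("TOKEN_CIRCLE_BASE_URL")  # hypothetical variable name
    key = env.get("TOKEN_CIRCLE_API_KEY")   # hypothetical variable name
    if prefer_cloud and url and key:
        return Backend(url, key)
    return LOCAL  # fall back to the local service
```

The returned values can be passed straight to the client, for example `OpenAI(base_url=backend.base_url, api_key=backend.api_key)`. Falling back to the local service by default keeps private prompts on the machine unless the cloud is requested explicitly.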

Conclusion

Ollama is the best choice for running large models locally. Whether you want to protect privacy, save API costs, or use AI offline, Ollama can meet your needs.
