Complete Guide to Local Deployment of Ollama: Run Large Models on Your Own Computer
Want to run large models on your own computer? Ollama makes it simple. This guide will take you from installation to deployment in under 10 minutes.
What is Ollama?
Ollama is an open-source tool for running large language models locally on macOS, Linux, and Windows. Its features include:
- Minimal Installation: Done with a single command
- Model Management: Downloads, stores, and runs models, most of which ship pre-quantized
- API Compatibility: Provides an OpenAI-compatible API
- No GPU Required: Models also run on the CPU (though a GPU is faster)
Installation
Linux
curl -fsSL https://ollama.ai/install.sh | sh
macOS / Windows
Visit ollama.ai to download the installer, then run it. (The install script above targets Linux only.)
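To confirm the installation worked, a short check like the one below can help. It is a minimal sketch that assumes the installer placed an `ollama` executable on your PATH; the helper name `ollama_version` is made up for this example.

```python
import shutil
import subprocess


def ollama_version(binary="ollama"):
    """Return the output of `<binary> --version`, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, timeout=10
    )
    # Use stdout when present, otherwise fall back to stderr.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    version = ollama_version()
    print(version if version else "ollama not found on PATH")
```

If the script reports that `ollama` is missing, reopening the terminal after installation is usually the first thing to try, since PATH changes only apply to new shells.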
Run Your First Model
# Run DeepSeek V4
ollama run deepseek-v4
# Run Qwen 2.5
ollama run qwen2.5:14b
# Run LLaMA 3
ollama run llama3:8b
The first run automatically downloads the model (a few GB to tens of GB). Later launches skip the download and only need to load the model into memory.
Common Commands
| Command | Description |
|---|---|
| ollama list | View downloaded models |
| ollama pull <model-name> | Download a model |
| ollama run <model-name> | Run a model (interactive) |
| ollama rm <model-name> | Delete a model |
| ollama serve | Start API service |
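The same information that `ollama list` prints is also available over HTTP once the service is running. The sketch below assumes the server is on the default port and uses Ollama's native `/api/tags` endpoint; `summarize_models` and `list_models` are illustrative helper names, not part of any library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address used in this guide


def summarize_models(payload):
    """Turn an /api/tags response into (name, size in GB) pairs."""
    return [
        (m["name"], round(m.get("size", 0) / 1e9, 1))
        for m in payload.get("models", [])
    ]


def list_models(base_url=OLLAMA_URL):
    """Fetch the downloaded models from a running Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return summarize_models(json.load(resp))
```

Calling `list_models()` requires `ollama serve` (or the desktop app) to be running; otherwise the request fails with a connection error.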
Using the API
Once the service is running, Ollama exposes an OpenAI-compatible API under http://localhost:11434/v1:
from openai import OpenAI

# Point the official OpenAI client at the local Ollama server.
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the client, but any value works locally
)

response = client.chat.completions.create(
    model="deepseek-v4",
    messages=[{"role": "user", "content": "Write a quick sort in Python"}],
)
print(response.choices[0].message.content)
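If you would rather not install the `openai` package, the same call can be made with the standard library. This is a rough sketch of the request an OpenAI-style client sends, assuming the default local address; `build_chat_request` and `chat` are names invented for this example.

```python
import json
import urllib.request


def build_chat_request(model, prompt, base_url="http://localhost:11434/v1"):
    """Build the POST request for the OpenAI-style chat completions route."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer ollama",  # placeholder key, as in the client example
        },
        method="POST",
    )


def chat(model, prompt):
    """Send the request to a running Ollama server and return the reply text."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=120) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

For example, `chat("llama3:8b", "Say hello")` returns the model's answer as a string, provided that model has already been pulled and the server is running.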
Hardware Recommendations
| Model Size | Minimum Memory | Recommended Configuration |
|---|---|---|
| 7B | 8 GB | 16 GB RAM |
| 14B | 16 GB | 32 GB RAM |
| 70B | 64 GB | 64 GB RAM + GPU |
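The table can be sanity-checked with some back-of-the-envelope arithmetic: weight memory is roughly the parameter count times the bytes per weight. The sketch below assumes 4-bit quantization and a 20% allowance for context cache and runtime buffers. Both numbers are illustrative assumptions, not figures from Ollama.

```python
def estimate_model_memory_gb(params_billion, bits_per_weight=4, overhead=1.2):
    """Rough memory needed to load a model, in GB.

    bits_per_weight=4 and overhead=1.2 are illustrative assumptions
    (4-bit quantization plus about 20% for cache and runtime buffers).
    """
    # 1e9 parameters * (bits / 8) bytes each, expressed in units of 1e9 bytes.
    weight_gb = params_billion * bits_per_weight / 8
    return round(weight_gb * overhead, 1)


for size in (7, 14, 70):
    print(f"{size}B: about {estimate_model_memory_gb(size)} GB")
```

Under these assumptions the three sizes come out at about 4, 8, and 42 GB, which fits the table once you leave room for the operating system and other applications. Higher-precision variants need proportionally more.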
Frequently Asked Questions
Q: Can I run without a GPU?
A: Yes. A 7B model runs on a laptop with 16 GB of memory using the CPU alone; responses are just slower than with a GPU.
Q: Where are models stored?
A: On macOS they are in ~/.ollama/models. On Linux, when Ollama is installed as a system service, they are in /usr/share/ollama/.ollama/models.
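A small helper can resolve that location in scripts, for example to check disk usage. This is a sketch under a few assumptions: it honors the `OLLAMA_MODELS` environment variable, which Ollama uses to override the storage directory; it uses the service path on Linux; and it falls back to the home-directory path everywhere else. The function name `default_models_dir` is made up for this example.

```python
import os
import platform
from pathlib import Path


def default_models_dir(system=None, env=None, home=None):
    """Guess where Ollama stores models, based on the paths in this FAQ."""
    system = system or platform.system()
    env = os.environ if env is None else env
    home = Path(home) if home else Path.home()

    # An explicit OLLAMA_MODELS setting takes priority over the defaults.
    if env.get("OLLAMA_MODELS"):
        return Path(env["OLLAMA_MODELS"])
    if system == "Linux":
        # Location used when Ollama runs as a system service.
        return Path("/usr/share/ollama/.ollama/models")
    return home / ".ollama" / "models"
```

If you start `ollama serve` by hand on Linux instead of using the service, the models may live under your own home directory, so treat the result as a starting point and not a guarantee.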
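Q: How do I connect to Token Circle?
A: Ollama's API is OpenAI-compatible, so the same client code can call either backend. Swap in Token Circle's endpoint and API key for cloud requests and keep the local settings for everything else, which gives you a hybrid cloud + local deployment.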
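One way to structure such a hybrid setup is to keep the connection settings in a single place and switch between them. The sketch below is only an illustration: `CLOUD_BASE_URL` and `CLOUD_API_KEY` are hypothetical environment variable names, and the real endpoint and key must come from the cloud provider's documentation.

```python
import os

LOCAL_BASE_URL = "http://localhost:11434/v1"


def backend_settings(use_cloud, env=None):
    """Return base_url/api_key for local Ollama or a cloud provider.

    CLOUD_BASE_URL and CLOUD_API_KEY are hypothetical variable names;
    take the real values from the provider's documentation.
    """
    env = os.environ if env is None else env
    if use_cloud:
        return {"base_url": env["CLOUD_BASE_URL"], "api_key": env["CLOUD_API_KEY"]}
    # Ollama ignores the key, but OpenAI-style clients need a non-empty value.
    return {"base_url": LOCAL_BASE_URL, "api_key": "ollama"}
```

The result can be passed straight to the client, as in `OpenAI(**backend_settings(use_cloud=False))`, so the rest of the code does not need to know which backend is in use.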
Conclusion
Ollama is the best choice for running large models locally. Whether you want to protect privacy, save API costs, or use AI offline, Ollama can meet your needs.