DeepSeek R1 is making waves as a free, open-source alternative to OpenAI's o1, the reasoning model behind the $200/month ChatGPT Pro plan. It offers impressive performance at a fraction of the cost, making it an excellent option for developers and AI enthusiasts alike.

In this guide, I’ll walk you through setting up DeepSeek R1 on your local machine (even without a GPU) and deploying it on a cloud GPU for more extensive workloads.


Table of Contents

  • Run DeepSeek R1 Locally
    1. Install Ollama
    2. Download and Run the Model
  • Serving DeepSeek R1 Over HTTP
  • Chat via Web UI
  • Deploy to a GPU Instance
    1. Launch a GPU Instance
    2. Install Ollama on Cloud
    3. Serve DeepSeek R1 via API
    4. Interact via API
  • Wrapping Up

Run DeepSeek R1 Locally

The first step is to install the Ollama CLI, a simple tool that allows you to download and run various open-source models, including DeepSeek R1.

1. Install Ollama

For macOS and Windows, follow the installation instructions on the Ollama website (https://ollama.com/download).

For Linux, use the one-liner installation script:

curl -fsSL https://ollama.com/install.sh | sh

Once installed, you’re ready to download and run a DeepSeek R1 model based on your hardware.

2. Download and Run the Model

DeepSeek R1 is available in several sizes that trade off performance against resource requirements. Note that the smaller variants are distilled models (Qwen- and Llama-based models fine-tuned on R1's outputs); only the 671B model is the full DeepSeek R1:

Model                      Size     Recommended Hardware     Command
Smaller 1.5B model         1.1GB    Consumer CPU             ollama run deepseek-r1:1.5b
Default 7B model           4.7GB    Consumer GPU             ollama run deepseek-r1
Larger 70B model           24GB     High-end GPU             ollama run deepseek-r1:70b
Full DeepSeek R1 (671B)    336GB    High-end GPU cluster     ollama run deepseek-r1:671b

For now, let’s run the lightweight 1.5B model just to try it out:

ollama run deepseek-r1:1.5b

This will automatically download the model and start an interactive chat session in your terminal (type /bye to exit):

>>> what can you do?
Hi there! I'm DeepSeek-R1, an AI assistant. How can I help you today?

Serving DeepSeek R1 Over HTTP

You can turn DeepSeek R1 into an API server by running:

ollama serve

This starts a server at http://localhost:11434 that accepts HTTP requests. (If Ollama was installed as a system service, as the Linux install script sets up, the server may already be running.)

Example request using curl:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:1.5b",
  "stream": false,
  "prompt": "What do you think about ChatGPT?"
}'

Full API documentation is available in the Ollama repository: https://github.com/ollama/ollama/blob/main/docs/api.md
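
If you prefer Python to curl, here is a minimal sketch of the same request using the requests library (the model name and prompt are simply the ones from the example above):

import requests

# Ask the local Ollama server for a single, non-streamed completion
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:1.5b",
        "stream": False,
        "prompt": "What do you think about ChatGPT?",
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["response"])  # the reply as plain text

With "stream": false the endpoint returns one JSON object; leaving streaming on would instead yield newline-delimited JSON chunks.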


Chat via Web UI

To chat with DeepSeek R1 through a web interface, you can run Open WebUI with Docker:

docker run -p 3000:8080 --rm --name open-webui \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  ghcr.io/open-webui/open-webui:main

This launches a chat UI at http://localhost:3000, assuming Ollama is running locally.


Deploy to a GPU Instance

For more intensive workloads, deploying DeepSeek R1 to a cloud GPU instance provides better performance. Here’s how to do it using DataCrunch (or any other cloud provider).

1. Launch a GPU Instance

Select an instance with enough VRAM for your chosen model. For example, the 70B model weighs about 24GB, so you need at least that much VRAM just to hold the weights, plus headroom for the context.

Example: Spot instance with Nvidia A100 (40GB VRAM) at ~€0.47/h

2. Install Ollama on Cloud

On your cloud instance, install Ollama:

curl -fsSL https://ollama.com/install.sh | sh

3. Serve DeepSeek R1 via API

Download the model, then start the API server. By default, Ollama only listens on 127.0.0.1, so set OLLAMA_HOST to make the API reachable from outside the instance:

ollama pull deepseek-r1:70b
OLLAMA_HOST=0.0.0.0 ollama serve
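
To confirm the server is reachable from your own machine, you can list its models via the /api/tags endpoint. This sketch assumes port 11434 is open in the instance's firewall and YOUR_SERVER_IP is the instance's public address:

import requests

# List the models available on the remote Ollama server
tags = requests.get("http://YOUR_SERVER_IP:11434/api/tags", timeout=10)
tags.raise_for_status()
print([m["name"] for m in tags.json()["models"]])  # should include 'deepseek-r1:70b'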

4. Interact via API

Once the model is running, you can interact with it using HTTP requests or a Python SDK.

Using the Python SDK

First, install the ollama package:

pip install ollama

Then, interact with the model programmatically:

from ollama import Client

# Point the client at the remote Ollama server
client = Client(host='http://YOUR_SERVER_IP:11434')

response = client.chat(
    model='deepseek-r1:70b',
    messages=[
        {'role': 'user', 'content': 'What do you think about ChatGPT?'}
    ]
)

# The reply text lives under message.content
print(response['message']['content'])
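
For long answers you may not want to wait for the full response. Here's a small sketch of streaming with the same client (passing stream=True makes chat return an iterator of chunks):

# Stream the reply chunk by chunk instead of waiting for the whole answer
stream = client.chat(
    model='deepseek-r1:70b',
    messages=[{'role': 'user', 'content': 'What do you think about ChatGPT?'}],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)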

Important security note: the Ollama API has no built-in authentication, so don't leave port 11434 open to the whole internet.

  • Restrict access with a firewall or a private network.
  • Disable root login.
  • Use SSH keys for authentication.

Wrapping Up

You now have DeepSeek R1 running locally and on a cloud GPU instance. You can:

  • Chat with it in the terminal.
  • Serve it over HTTP.
  • Chat with it through a web UI.
  • Access it programmatically via the API.

For more details, visit the official DeepSeek website (https://www.deepseek.com).
