Why Run DeepSeek-R1 Locally?

Running DeepSeek-R1 locally provides several benefits:

  • Privacy & Security: Data stays on your machine.
  • Uninterrupted Access: No rate limits or service disruptions.
  • Performance: No network round-trips; response speed depends only on your hardware.
  • Customization: Modify parameters and fine-tune prompts.
  • Cost Efficiency: Avoid API fees.
  • Offline Availability: Use without an internet connection after downloading the model.

Setting Up DeepSeek-R1 Locally with Ollama

Step 1: Install Ollama

Download and install Ollama from the official website (https://ollama.com).

Step 2: Download and Run DeepSeek-R1

Open a terminal and run the following command:

ollama run deepseek-r1

If your hardware cannot support the full 671B-parameter model, you can run one of the smaller distilled variants instead:

ollama run deepseek-r1:7b  # Replace the tag with 1.5b, 8b, 14b, etc., to match your hardware
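
To see which model variants are already on disk, list the models Ollama has pulled:

ollama list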

Step 3: Running DeepSeek-R1 as a Background Server

To keep Ollama running as a background server so DeepSeek-R1 stays accessible over a local HTTP API:

ollama serve
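
Once the server is up, you can confirm it is reachable by querying the local API (Ollama listens on port 11434 by default):

curl http://localhost:11434/api/tags  # lists the models available locally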

Using DeepSeek-R1 Locally

Running Inference via CLI

Once the model is downloaded, you can chat with DeepSeek-R1 directly from the terminal:

ollama run deepseek-r1
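
This opens an interactive chat session (type /bye to exit). You can also pass a one-off prompt directly:

ollama run deepseek-r1 "Explain Newton's second law of motion in one sentence."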

Accessing DeepSeek-R1 via API

Use curl to send a chat request to the local API:

curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1",
  "messages": [{ "role": "user", "content": "Solve: 25 * 25" }],
  "stream": false
}'
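
Setting "stream" to false returns a single JSON object; with streaming enabled, Ollama sends one JSON fragment per token. For single-turn completions without chat history, Ollama also exposes a generate endpoint:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1",
  "prompt": "Solve: 25 * 25",
  "stream": false
}'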

Accessing DeepSeek-R1 via Python

Install the Ollama Python package:

pip install ollama

Use Python to interact with DeepSeek-R1:

import ollama
response = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Explain Newton's second law of motion"}]
)
print(response["message"]["content"])
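
For long replies, you may prefer to stream tokens as they arrive. The same ollama.chat call accepts stream=True and then yields response chunks:

import ollama

stream = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Explain Newton's second law of motion"}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries the next piece of the reply
    print(chunk["message"]["content"], end="", flush=True)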

Building a Local RAG Application with DeepSeek-R1

Prerequisites

Install the necessary dependencies (PyMuPDF backs the PDF loader used below):

pip install langchain chromadb gradio pymupdf
pip install -U langchain-community

Processing an Uploaded PDF

import gradio as gr
from langchain_community.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
import ollama

def process_pdf(pdf_path):
    if pdf_path is None:
        return None, None, None

    # gr.File (with type="filepath") supplies the uploaded file's path
    loader = PyMuPDFLoader(pdf_path)
    data = loader.load()

    # Split the document into overlapping chunks for retrieval
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
    chunks = text_splitter.split_documents(data)

    # Embed each chunk with the local model and persist the index to disk
    embeddings = OllamaEmbeddings(model="deepseek-r1")
    vectorstore = Chroma.from_documents(
        documents=chunks, embedding=embeddings, persist_directory="./chroma_db"
    )
    retriever = vectorstore.as_retriever()
    return text_splitter, vectorstore, retriever
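
As a quick sanity check outside Gradio, you can build the index from a local file and inspect what the retriever returns (this assumes a sample.pdf exists in the working directory):

text_splitter, vectorstore, retriever = process_pdf("sample.pdf")
docs = retriever.invoke("What is this document about?")
print(f"Retrieved {len(docs)} chunks")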

Combining Retrieved Document Chunks

def combine_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

Querying DeepSeek-R1 Using Ollama

DeepSeek-R1 wraps its chain-of-thought reasoning in <think>...</think> tags, so the helper below strips that block and returns only the final answer:

import re

def ollama_llm(question, context):
    formatted_prompt = f"Question: {question}\n\nContext: {context}"

    response = ollama.chat(
        model="deepseek-r1",
        messages=[{"role": "user", "content": formatted_prompt}]
    )

    response_content = response["message"]["content"]
    final_answer = re.sub(r"<think>.*?</think>", "", response_content, flags=re.DOTALL).strip()
    return final_answer
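
You can test this helper on its own with an inline context string before wiring it into the pipeline:

print(ollama_llm(
    "What force acts on the ball?",
    "A 2 kg ball is pushed with a constant force of 10 N."
))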

The RAG Pipeline

def rag_chain(question, text_splitter, vectorstore, retriever):
    retrieved_docs = retriever.invoke(question)
    formatted_content = combine_docs(retrieved_docs)
    return ollama_llm(question, formatted_content)
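
Putting the pieces together, an end-to-end query against a local file looks like this (again assuming a sample.pdf on disk):

text_splitter, vectorstore, retriever = process_pdf("sample.pdf")
answer = rag_chain("Summarize the document in two sentences.", text_splitter, vectorstore, retriever)
print(answer)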

Creating the Gradio Interface

def ask_question(pdf_path, question):
    text_splitter, vectorstore, retriever = process_pdf(pdf_path)
    if retriever is None:
        return "Please upload a PDF first."  # No PDF uploaded
    result = rag_chain(question, text_splitter, vectorstore, retriever)
    return result

interface = gr.Interface(
    fn=ask_question,
    inputs=[
        gr.File(label="Upload PDF (optional)", type="filepath"),
        gr.Textbox(label="Ask a question")
    ],
    outputs="text",
    title="Ask questions about your PDF",
    description="Use DeepSeek-R1 to answer your questions about the uploaded PDF document."
)

interface.launch()
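
By default, launch() serves the app locally at http://127.0.0.1:7860. To get a temporary public link, for instance to test from another device, Gradio also supports:

interface.launch(share=True)  # generates a temporary shareable URL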

Conclusion

This tutorial covers:

  • Installing and running DeepSeek-R1 locally with Ollama.
  • Using the CLI, API, and Python to interact with the model.
  • Building a RAG-based application using LangChain, ChromaDB, and Gradio.
