Why Run DeepSeek-R1 Locally?
Running DeepSeek-R1 locally provides several benefits:
- Privacy & Security: Data stays on your machine.
- Uninterrupted Access: No rate limits or service disruptions.
- Performance: Low-latency inference with no network round trips (speed depends on your hardware).
- Customization: Modify parameters and fine-tune prompts.
- Cost Efficiency: Avoid API fees.
- Offline Availability: Use without an internet connection after downloading the model.
Setting Up DeepSeek-R1 Locally with Ollama
Step 1: Install Ollama
Download and install Ollama from the official website: https://ollama.com
Step 2: Download and Run DeepSeek-R1
Open a terminal and run the following command:
ollama run deepseek-r1
If your hardware cannot support the full 671B-parameter model (deepseek-r1:671b), run one of the smaller distilled versions instead:
ollama run deepseek-r1:Xb  # Replace Xb with 1.5b, 7b, 8b, 14b, 32b, or 70b
Step 3: Running DeepSeek-R1 as a Background Server
To keep DeepSeek-R1 running in the background and accessible over Ollama's HTTP API:
ollama serve
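If you want to verify that the server is reachable before calling it from code, here is a minimal Python sketch (assuming the default port 11434 and Ollama's /api/tags endpoint, which lists locally downloaded models):

import json
import urllib.request

# Ask the local Ollama server which models it has available
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    tags = json.load(resp)

print([m["name"] for m in tags.get("models", [])])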
Using DeepSeek-R1 Locally
Running Inference via CLI
Once the model is downloaded, you can chat with DeepSeek-R1 directly from the terminal:
ollama run deepseek-r1
Accessing DeepSeek-R1 via API
Use curl to send requests to the local API:
curl http://localhost:11434/api/chat -d '{
  "model": "deepseek-r1",
  "messages": [{ "role": "user", "content": "Solve: 25 * 25" }],
  "stream": false
}'
Accessing DeepSeek-R1 via Python
Install the Ollama Python package:
pip install ollama
Use Python to interact with DeepSeek-R1:
import ollama

response = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Explain Newton's second law of motion"}]
)
print(response["message"]["content"])
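For longer answers you may prefer to stream tokens as they arrive rather than wait for the full reply. A short sketch, assuming the ollama package's stream=True option, which yields partial response chunks:

import ollama

# Stream the answer chunk by chunk instead of blocking on the full response
stream = ollama.chat(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Explain Newton's second law of motion"}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
print()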
Building a Local RAG Application with DeepSeek-R1
Prerequisites
Install the necessary dependencies (pymupdf is required by PyMuPDFLoader):
pip install langchain chromadb gradio pymupdf
pip install -U langchain-community
Processing an Uploaded PDF
import gradio as gr
from langchain_community.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
import ollama

def process_pdf(pdf_bytes):
    # pdf_bytes is the path of the uploaded file provided by gr.File
    if pdf_bytes is None:
        return None, None, None
    loader = PyMuPDFLoader(pdf_bytes)
    data = loader.load()
    # Split the document into overlapping chunks for retrieval
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
    chunks = text_splitter.split_documents(data)
    # Embed the chunks with DeepSeek-R1 via Ollama and index them in Chroma
    embeddings = OllamaEmbeddings(model="deepseek-r1")
    vectorstore = Chroma.from_documents(
        documents=chunks, embedding=embeddings, persist_directory="./chroma_db"
    )
    retriever = vectorstore.as_retriever()
    return text_splitter, vectorstore, retriever
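To sanity-check the vector store before wiring up the rest of the chain, you can call process_pdf on a local file and query the retriever directly. A minimal sketch, using a hypothetical sample.pdf in the working directory:

# sample.pdf is a placeholder; point this at any PDF on your machine
text_splitter, vectorstore, retriever = process_pdf("sample.pdf")
docs = retriever.invoke("What is this document about?")
print(f"Retrieved {len(docs)} chunks")
print(docs[0].page_content[:200])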
Combining Retrieved Document Chunks
def combine_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)
Querying DeepSeek-R1 Using Ollama
import re

def ollama_llm(question, context):
    formatted_prompt = f"Question: {question}\n\nContext: {context}"
    response = ollama.chat(
        model="deepseek-r1",
        messages=[{"role": "user", "content": formatted_prompt}]
    )
    response_content = response["message"]["content"]
    # Strip DeepSeek-R1's <think>...</think> reasoning block from the answer
    final_answer = re.sub(r"<think>.*?</think>", "", response_content, flags=re.DOTALL).strip()
    return final_answer
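The re.sub call above exists because DeepSeek-R1 wraps its chain-of-thought in <think> tags before the final answer. A tiny illustration with a hard-coded sample response:

import re

sample = "<think>The user wants F = ma explained.</think>Force equals mass times acceleration."
print(re.sub(r"<think>.*?</think>", "", sample, flags=re.DOTALL).strip())
# Prints: Force equals mass times acceleration.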
The RAG Pipeline
def rag_chain(question, text_splitter, vectorstore, retriever):
    # Retrieve the most relevant chunks and hand them to the model as context
    retrieved_docs = retriever.invoke(question)
    formatted_content = combine_docs(retrieved_docs)
    return ollama_llm(question, formatted_content)
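Before adding a UI, the whole pipeline can be exercised from a plain script. A short sketch, again assuming a hypothetical sample.pdf:

text_splitter, vectorstore, retriever = process_pdf("sample.pdf")  # hypothetical local file
answer = rag_chain(
    "Summarize the main argument of this document.",
    text_splitter, vectorstore, retriever
)
print(answer)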
Creating the Gradio Interface
def ask_question(pdf_bytes, question):
    text_splitter, vectorstore, retriever = process_pdf(pdf_bytes)
    if text_splitter is None:
        return None  # No PDF uploaded
    result = rag_chain(question, text_splitter, vectorstore, retriever)
    return result

interface = gr.Interface(
    fn=ask_question,
    inputs=[
        gr.File(label="Upload PDF (optional)"),
        gr.Textbox(label="Ask a question")
    ],
    outputs="text",
    title="Ask questions about your PDF",
    description="Use DeepSeek-R1 to answer your questions about the uploaded PDF document."
)

interface.launch()
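By default launch() serves the app on a local URL (typically http://127.0.0.1:7860). If you want to share a temporary public link, Gradio supports a share flag:

interface.launch(share=True)  # creates a temporary public URL through Gradio's sharing service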
Conclusion
This tutorial covered:
- Installing and running DeepSeek-R1 locally with Ollama.
- Using the CLI, API, and Python to interact with the model.
- Building a RAG-based application using LangChain, ChromaDB, and Gradio.