How to Install Ollama on Linux (Ubuntu, Debian, Arch)

Tested on: Ubuntu 24.04 LTS, Ubuntu 22.04 LTS, Debian 12 — Last updated: June 2026
This guide shows you how to install Ollama on Linux — including Ubuntu 24.04, Debian 12, Arch Linux and Docker. Ollama handles model downloads, GPU acceleration and exposes a local API, all with a single command. No cloud required, no data leaves your machine.
Prerequisites
- Ubuntu 24.04 LTS / 22.04 LTS, or Debian 12 (Bookworm)
- At least 8 GB RAM (16 GB recommended for 7B models)
- Optional: NVIDIA GPU with CUDA drivers installed, or AMD GPU with ROCm
- curl installed (
sudo apt install curl)
Step 1 — Install Ollama
Ollama provides an official install script that detects your OS and GPU automatically:
curl -fsSL https://ollama.com/install.sh | shThe script installs Ollama as a systemd service and starts it automatically. To verify the installation:
ollama --versionYou should see something like ollama version 0.x.x. The service starts automatically at boot.
Step 2 — Download and Run Your First Model
Pull a model from the Ollama library. Start with Llama 3.2 3B if you have limited RAM, or Llama 3.1 8B if you have 16 GB or more:
# Lightweight — works on 8 GB RAM
ollama pull llama3.2:3b
# Recommended — needs 16 GB RAM
ollama pull llama3.1:8bOnce downloaded, start an interactive chat session:
ollama run llama3.2:3bType your prompt and press Enter. Use /bye to exit the session.
Step 3 — Check GPU Acceleration
If you have a compatible GPU, Ollama uses it automatically. Verify with:
ollama psThis shows running models and whether they are loaded on GPU or CPU. For NVIDIA cards, you can also confirm with:
nvidia-smi | grep ollamaIf no GPU is detected and you have an NVIDIA card, check that the CUDA drivers are installed correctly. Ollama will fall back to CPU automatically if no GPU is available — models just run slower.
Step 4 — Use the Ollama API
Ollama runs a local REST API on http://localhost:11434. You can query it directly from the terminal or from any application:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2:3b",
"prompt": "Explain what a kernel is in one sentence.",
"stream": false
}'The API is OpenAI-compatible, which means any tool that supports OpenAI's API (Open WebUI, Continue.dev, shell scripts) works with Ollama out of the box by pointing it to http://localhost:11434.
Step 5 — Manage the Ollama Service
Ollama runs as a systemd service. Standard service commands apply:
# Check service status
sudo systemctl status ollama
# Stop Ollama
sudo systemctl stop ollama
# Restart after config changes
sudo systemctl restart ollama
# View logs
journalctl -u ollama -fInstall Ollama on Arch Linux
On Arch Linux and Arch-based distributions (Manjaro, EndeavourOS), Ollama is available in the AUR:
# Using yay
yay -S ollama
# Or using paru
paru -S ollamaAfter installation, enable and start the service:
sudo systemctl enable --now ollamaInstall Ollama with Docker
If you prefer containers or want to isolate Ollama from your system, Docker is the cleanest option:
# CPU only
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
# With NVIDIA GPU
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollamaModels are stored in a named volume (ollama), so they persist across container restarts. Pull and run models the same way — just exec into the container or use the API on port 11434.
Useful Ollama Commands
# List downloaded models
ollama list
# Remove a model
ollama rm llama3.2:3b
# Pull a specific version
ollama pull mistral:7b
# Show model details
ollama show llama3.1:8b
# Copy a model under a new name
ollama cp llama3.1:8b my-custom-modelTroubleshooting
Error: could not connect to ollama app, is it running?
The Ollama service is not running. Start it:
sudo systemctl start ollamaError: model not found
Pull the model first before running it:
ollama pull llama3.2:3b
ollama run llama3.2:3bGPU not detected after installation
For NVIDIA cards, install or reinstall the CUDA drivers and reboot. For Ollama installed via the script, CUDA libraries are bundled — no separate CUDA toolkit installation is required. If GPU is still missing from ollama ps, check the service logs:
journalctl -u ollama --no-pager | grep -i "gpu\|cuda\|error"Model runs extremely slowly (CPU instead of GPU)
This happens when GPU memory (VRAM) is not enough to load the full model. Ollama falls back to CPU for the layers that don't fit. Solutions: use a smaller model (3B instead of 8B), or use a quantized version (ollama pull llama3.1:8b-instruct-q4_0).
Port 11434 already in use
Another process is using the port. Find it and stop it, or change Ollama's port by editing the systemd service file and adding Environment="OLLAMA_HOST=0.0.0.0:11435".
What to Do Next
Ollama running locally is useful on its own, but it becomes significantly more powerful with a web interface. The next step is installing Open WebUI — a full ChatGPT-like interface that connects to your local Ollama instance and runs entirely in your browser. No data leaves your machine.