How to Install Ollama on Linux (Ubuntu, Debian, Arch)

Install Ollama on Linux terminal

Tested on: Ubuntu 24.04 LTS, Ubuntu 22.04 LTS, Debian 12 — Last updated: June 2026

This guide shows you how to install Ollama on Linux — including Ubuntu 24.04, Debian 12, Arch Linux and Docker. Ollama handles model downloads, GPU acceleration and exposes a local API, all with a single command. No cloud required, no data leaves your machine.

Table
  1. Prerequisites
  2. Step 1 — Install Ollama
  3. Step 2 — Download and Run Your First Model
  4. Step 3 — Check GPU Acceleration
  5. Step 4 — Use the Ollama API
  6. Step 5 — Manage the Ollama Service
  7. Install Ollama on Arch Linux
  8. Install Ollama with Docker
  9. Useful Ollama Commands
  10. Troubleshooting
    1. Error: could not connect to ollama app, is it running?
    2. Error: model not found
    3. GPU not detected after installation
    4. Model runs extremely slowly (CPU instead of GPU)
    5. Port 11434 already in use
  11. What to Do Next

Prerequisites

  • Ubuntu 24.04 LTS / 22.04 LTS, or Debian 12 (Bookworm)
  • At least 8 GB RAM (16 GB recommended for 7B models)
  • Optional: NVIDIA GPU with CUDA drivers installed, or AMD GPU with ROCm
  • curl installed (sudo apt install curl)

Step 1 — Install Ollama

Ollama provides an official install script that detects your OS and GPU automatically:

curl -fsSL https://ollama.com/install.sh | sh

The script installs Ollama as a systemd service and starts it automatically. To verify the installation:

ollama --version

You should see something like ollama version 0.x.x. The service starts automatically at boot.

Step 2 — Download and Run Your First Model

Pull a model from the Ollama library. Start with Llama 3.2 3B if you have limited RAM, or Llama 3.1 8B if you have 16 GB or more:

# Lightweight — works on 8 GB RAM
ollama pull llama3.2:3b

# Recommended — needs 16 GB RAM
ollama pull llama3.1:8b

Once downloaded, start an interactive chat session:

ollama run llama3.2:3b

Type your prompt and press Enter. Use /bye to exit the session.

Step 3 — Check GPU Acceleration

If you have a compatible GPU, Ollama uses it automatically. Verify with:

ollama ps

This shows running models and whether they are loaded on GPU or CPU. For NVIDIA cards, you can also confirm with:

nvidia-smi | grep ollama

If no GPU is detected and you have an NVIDIA card, check that the CUDA drivers are installed correctly. Ollama will fall back to CPU automatically if no GPU is available — models just run slower.

Step 4 — Use the Ollama API

Ollama runs a local REST API on http://localhost:11434. You can query it directly from the terminal or from any application:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:3b",
  "prompt": "Explain what a kernel is in one sentence.",
  "stream": false
}'

The API is OpenAI-compatible, which means any tool that supports OpenAI's API (Open WebUI, Continue.dev, shell scripts) works with Ollama out of the box by pointing it to http://localhost:11434.

Step 5 — Manage the Ollama Service

Ollama runs as a systemd service. Standard service commands apply:

# Check service status
sudo systemctl status ollama

# Stop Ollama
sudo systemctl stop ollama

# Restart after config changes
sudo systemctl restart ollama

# View logs
journalctl -u ollama -f

Install Ollama on Arch Linux

On Arch Linux and Arch-based distributions (Manjaro, EndeavourOS), Ollama is available in the AUR:

# Using yay
yay -S ollama

# Or using paru
paru -S ollama

After installation, enable and start the service:

sudo systemctl enable --now ollama

Install Ollama with Docker

If you prefer containers or want to isolate Ollama from your system, Docker is the cleanest option:

# CPU only
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# With NVIDIA GPU
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Models are stored in a named volume (ollama), so they persist across container restarts. Pull and run models the same way — just exec into the container or use the API on port 11434.

Useful Ollama Commands

# List downloaded models
ollama list

# Remove a model
ollama rm llama3.2:3b

# Pull a specific version
ollama pull mistral:7b

# Show model details
ollama show llama3.1:8b

# Copy a model under a new name
ollama cp llama3.1:8b my-custom-model

Troubleshooting

Error: could not connect to ollama app, is it running?

The Ollama service is not running. Start it:

sudo systemctl start ollama

Error: model not found

Pull the model first before running it:

ollama pull llama3.2:3b
ollama run llama3.2:3b

GPU not detected after installation

For NVIDIA cards, install or reinstall the CUDA drivers and reboot. For Ollama installed via the script, CUDA libraries are bundled — no separate CUDA toolkit installation is required. If GPU is still missing from ollama ps, check the service logs:

journalctl -u ollama --no-pager | grep -i "gpu\|cuda\|error"

Model runs extremely slowly (CPU instead of GPU)

This happens when GPU memory (VRAM) is not enough to load the full model. Ollama falls back to CPU for the layers that don't fit. Solutions: use a smaller model (3B instead of 8B), or use a quantized version (ollama pull llama3.1:8b-instruct-q4_0).

Port 11434 already in use

Another process is using the port. Find it and stop it, or change Ollama's port by editing the systemd service file and adding Environment="OLLAMA_HOST=0.0.0.0:11435".

What to Do Next

Ollama running locally is useful on its own, but it becomes significantly more powerful with a web interface. The next step is installing Open WebUI — a full ChatGPT-like interface that connects to your local Ollama instance and runs entirely in your browser. No data leaves your machine.

Go up