Setting Up Ollama with Open-WebUI: A Docker Compose Guide

In this blog post, we'll dive into setting up a powerful AI development environment using Docker Compose. The setup includes running the Ollama language model server and its corresponding web interface, Open-WebUI, both containerized for ease of use.

Ollama is an open-source platform designed for running artificial intelligence models locally, leveraging GPU acceleration to enhance performance. It serves as a framework that allows users to deploy and manage AI models efficiently, particularly those requiring significant computational resources.

Ollama
Get up and running with large language models.

Open-WebUI is a separate project that provides a user-friendly graphical interface to Ollama, accessible via a web browser. It lets you interact with the AI models hosted by Ollama without requiring technical expertise or direct command-line access. The WebUI itself is lightweight and does not need a GPU; the computationally intensive work happens in the Ollama container, which is where GPU acceleration matters.

Open WebUI
Open WebUI is an extensible, self-hosted interface for AI that adapts to your workflow, all while operating entirely offline. Supported LLM runners include Ollama and OpenAI-compatible APIs.

The provided Docker Compose file sets up two services:

  1. ollama: An instance of the Ollama language model server.
  2. open-webui: A web-based interface that interacts with the Ollama server.

This setup allows you to run AI models locally and access them through a browser, all while leveraging Docker's containerization for easy management.

Let's Break Down the Compose File

1. The ollama Service

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
    restart: unless-stopped

Key Configurations

  • Image: Uses the official ollama/ollama image in its latest version.
  • Container Name: The container will be named ollama.
  • Port Mapping: Maps port 11434 on the host to port 11434 in the container. This is necessary because Ollama serves its API on this port (you can check it with the commands after this list).
  • Volumes: Persists data using a Docker volume named ollama_data, mounted at /root/.ollama. This ensures that any downloaded models or configurations are retained even if the container stops or restarts.
  • GPU Resources: The deploy.resources.reservations.devices section configures NVIDIA GPU usage. Specifically, it requests two GPUs with gpu capabilities. This is essential for running GPU-accelerated AI models.
  • Environment Variable: Sets OLLAMA_HOST=0.0.0.0:11434, which tells Ollama to listen on all interfaces (i.e., not just localhost) and the specified port.
  • Restart Policy: The container will restart unless explicitly stopped (restart: unless-stopped).
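
Once both services are running (see Getting Started below), a few quick checks confirm that the Ollama container can see the GPUs and is serving its API. This is only a sketch; swap llama3.2 for whichever model you actually want to pull.

# Confirm the GPUs are visible inside the container
docker exec -it ollama nvidia-smi

# Pull a model and list what is installed (llama3.2 is just an example)
docker exec -it ollama ollama pull llama3.2
docker exec -it ollama ollama list

# The API should answer on the mapped port
curl http://localhost:11434/api/tags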

2. The open-webui Service

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"
    volumes:
      - openwebui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: always

Key Configurations

  • Image: Uses the ghcr.io/open-webui/open-webui image from the GitHub Container Registry.
  • Container Name: The container will be named open-webui.
  • Port Mapping: Maps port 3000 on the host to port 8080 in the container. This is where the web interface will be accessible.
  • Volumes: Persists data using a Docker volume named openwebui_data, mounted at /app/backend/data. This allows for persistent storage of user configurations and generated content.
  • Environment Variable: Sets OLLAMA_BASE_URL=http://ollama:11434, which configures the web interface to communicate with the Ollama server running in its own container.
  • Extra Hosts: Adds an entry to /etc/hosts inside the container, mapping host.docker.internal to the Docker host's IP address. This is useful for reaching services on the host machine from within the container (see the variant after this list).
  • Restart Policy: The container will always be restarted if it stops, whatever the exit status (restart: always).
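
The extra_hosts entry mostly matters if you ever run Ollama directly on the host instead of in this Compose stack. In that case (a variant, not part of the setup above) the WebUI would point at the host gateway rather than the ollama service name:

    environment:
      - OLLAMA_BASE_URL=http://host.docker.internal:11434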

3. Volumes

volumes:
  ollama_data:
  openwebui_data:

This section defines two named volumes:

  • ollama_data: Stores data for the Ollama service.
  • openwebui_data: Stores data for the Open-WebUI service.

Named volumes are recommended as they provide better control over data persistence and separation compared to bind mounts.
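
You can see the volumes Docker created for this stack, and where they live on disk, with a couple of standard commands:

docker volume ls
docker volume inspect ollama_data
docker volume inspect openwebui_data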

How It All Works

  1. Ollama Service:
    • Starts with GPU support, allowing it to run compute-intensive AI models.
    • Listens on port 11434 for incoming API requests.
    • Data is persisted in the ollama_data volume.
  2. Open-WebUI Service:
    • Provides a web interface accessible at http://localhost:3000.
    • Communicates with the Ollama server via the specified base URL (http://ollama:11434).
    • Data is persisted in the openwebui_data volume.
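
For convenience, here is the complete docker-compose.yml assembled from the fragments above:

services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
    environment:
      - OLLAMA_HOST=0.0.0.0:11434
    restart: unless-stopped

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    ports:
      - "3000:8080"
    volumes:
      - openwebui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    extra_hosts:
      - "host.docker.internal:host-gateway"
    restart: always

volumes:
  ollama_data:
  openwebui_data: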

Getting Started

To use this setup:

  1. Install Docker: Ensure Docker and Docker Compose are installed on your system.
  2. NVIDIA Drivers: Make sure you have the NVIDIA drivers and the NVIDIA Container Toolkit installed so Docker can use the GPUs.
  3. Docker Permissions: Grant Docker permission to access your GPUs (see the example commands after this list).
  4. Run the Setup:
    • Save the provided compose file as docker-compose.yml.
    • Run docker compose up -d to start both services in the background.
  5. Access the Web Interface:
    • Open a browser and navigate to http://localhost:3000.
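
On a Debian/Ubuntu-style host, the GPU prerequisites and startup look roughly like this. Treat it as a sketch: package and repository setup differ per distribution, and the CUDA image tag is only an example.

# Install the NVIDIA Container Toolkit (assumes the NVIDIA apt repository is already configured)
sudo apt-get install -y nvidia-container-toolkit

# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify that containers can access the GPUs (image tag is an example)
docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi

# Start the stack in the background
docker compose up -d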

Key Considerations

  • GPU Usage: The current configuration requests two GPUs. Adjust the count (or use count: all) based on your system's capabilities.
  • Ports: Ensure that ports 11434 and 3000 are not being used by other services.
  • Volumes: Named volumes can be managed using Docker commands, allowing you to back up or restore data as needed (see the sketch after this list).
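
As an illustration (not part of the Compose file itself), a named volume can be archived and restored with a throwaway container:

# Back up the Ollama models and configuration to a tarball in the current directory
docker run --rm -v ollama_data:/data -v "$(pwd)":/backup alpine \
  tar czf /backup/ollama_data.tar.gz -C /data .

# Restore the tarball into the volume later
docker run --rm -v ollama_data:/data -v "$(pwd)":/backup alpine \
  tar xzf /backup/ollama_data.tar.gz -C /data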

This setup provides a robust environment for working with AI models locally. By leveraging Docker Compose, you can manage both the Ollama server and its web interface efficiently. Whether you're developing new AI applications or experimenting with existing ones, this configuration offers flexibility and scalability, especially when combined with GPU acceleration.

Let me know if you have any questions or need further clarification. I'll explain why I set up this instance in a future blog post!