# How to Run Your Own Local AI with Ollama on Windows

Running AI models locally gives you full control over your data and eliminates dependency on cloud services. Ollama is a lightweight tool for serving and running inference on AI models locally, with very good performance. In this guide, we'll set up Ollama with Open WebUI using Docker on Windows with WSL2.
## Prerequisites
Before we begin, you'll need:
- Windows 10/11 with WSL2 support
- NVIDIA GPU (recommended for better performance)
- Basic familiarity with command line
## Hardware Requirements
When choosing which model to run, consider your hardware:
| Model Size | VRAM Required | Recommended GPU |
|---|---|---|
| 7B | 8-16 GB | NVIDIA RTX 3060 |
| 14B | 24 GB | NVIDIA RTX 3090/4090 |
| 32B | 48+ GB | Multi-GPU setups |
> **Note**
> Models can also run on CPU with sufficient RAM (16-32 GB for 7B models), but performance will be significantly slower.
## Step 1: Install WSL2 and Ubuntu

First, install WSL2 with Ubuntu as the distribution:

```shell
wsl --install -d Ubuntu
```

Start Ubuntu from a terminal:

```shell
wsl -d Ubuntu
```
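Before continuing, it can be worth confirming that Ubuntu is actually running under WSL 2 rather than WSL 1:

```shell
# List installed distributions and the WSL version each one uses
wsl -l -v

# If Ubuntu reports VERSION 1, convert it to WSL 2
wsl --set-version Ubuntu 2
```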
## Step 2: Update Ubuntu

Update your Ubuntu packages (as a regular user, you'll need `sudo`):

```shell
sudo apt-get update
sudo apt-get dist-upgrade
```
## Step 3: Install Docker Engine

Install Docker Engine on Ubuntu. Note that the `docker compose` subcommand used later in this guide comes from the `docker-compose-plugin` package:

```shell
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release

# Add Docker's official GPG key and apt repository
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Verify installation
docker --version
```
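Optionally, you can add your user to the `docker` group so you don't have to prefix every command with `sudo`, and run Docker's standard `hello-world` image as a sanity check:

```shell
# Allow the current user to run docker without sudo (takes effect on next login)
sudo usermod -aG docker $USER

# Sanity check: the daemon pulls and runs a minimal test container
sudo docker run --rm hello-world
```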
## Step 4: Install NVIDIA Container Toolkit

The NVIDIA Container Toolkit is essential for enabling GPU acceleration within Docker containers, allowing your AI models to run at peak performance.

```shell
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker to use the NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
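Before moving on, it's worth verifying that containers can actually see the GPU. One quick check (the CUDA image tag below is just an example; any recent `nvidia/cuda` base tag works):

```shell
# Run nvidia-smi inside a throwaway container with GPU access;
# you should see the same GPU table as on the host
sudo docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi
```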
## Step 5: Create Docker Compose File

Create a `docker-compose.yml` file (e.g., at `C:\Users\<username>\ollama\docker-compose.yml`):
```yaml
services:
  ollama:
    volumes:
      - ./ollama/ollama:/root/.ollama
    container_name: ollama
    pull_policy: always
    tty: true
    restart: unless-stopped
    image: ollama/ollama:latest
    ports:
      - 7869:11434
    environment:
      - OLLAMA_KEEP_ALIVE=24h
    networks:
      - ollama-docker
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  ollama-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: ollama-webui
    volumes:
      - ./ollama/ollama-webui:/app/backend/data
      - ./ollama/ollama-webui-docs:/app/backend/data/docs
    depends_on:
      - ollama
    ports:
      - 8080:8080
    environment:
      - OLLAMA_BASE_URLS=http://host.docker.internal:7869
      - ENV=dev
      - WEBUI_AUTH=False
      - WEBUI_NAME=Local AI
      - WEBUI_URL=http://localhost:8080
      - WEBUI_SECRET_KEY=your-secret-key
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped
    networks:
      - ollama-docker

networks:
  ollama-docker:
    external: false
```
## Step 6: Start the Services

Navigate to your compose file directory and start the services:

```shell
docker compose up -d
```

To stop the services:

```shell
docker compose down
```

Verify the containers are running:

```shell
docker ps
```
## Step 7: Pull and Run Models

Once Ollama is running, you can pull and run models. Access the Ollama container:

```shell
docker exec -it ollama bash
```

Pull a model (e.g., Qwen 2.5 Coder):

```shell
ollama pull qwen2.5-coder:7b
```

Or pull other models:

```shell
ollama pull phi4
ollama pull qwen2.5-coder:14b
ollama pull qwen2.5:14b-instruct
```
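Still inside the container, you can list what has been downloaded and chat with a model interactively:

```shell
# Show locally available models and their sizes
ollama list

# Start an interactive chat session with a model (exit with /bye)
ollama run qwen2.5-coder:7b
```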
> **Recommended Models**
> The Qwen 2.5 Coder series offers significant improvements in code generation, code reasoning, and code fixing. Check the BigCode Models Leaderboard for more options.
> **Semantic Kernel & Function Calling**
> If you plan to use Semantic Kernel with function calling or tool calling capabilities, you must use models with instruct support (e.g., qwen2.5:14b-instruct). Standard or coder models may not properly support the structured function/tool calling that Semantic Kernel requires for AI agents and plugins.
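To see why instruct models matter here, this is roughly what a tool-calling request to Ollama's `/api/chat` endpoint looks like. The `get_weather` function is a made-up example, and port 7869 matches the host mapping from the compose file; a capable instruct model should respond with a structured `tool_calls` entry rather than free text:

```shell
curl http://localhost:7869/api/chat -d '{
  "model": "qwen2.5:14b-instruct",
  "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
  "stream": false,
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}'
```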
## Accessing the Services
After setup, you can access:
- Open WebUI: http://localhost:8080
- Ollama API: http://localhost:7869
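You can also query the Ollama API directly, without going through the WebUI. A minimal non-streaming request (assuming the 7B coder model from Step 7 has been pulled):

```shell
# Host port 7869 is mapped to the container's Ollama port 11434
curl http://localhost:7869/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a hello world program in Python",
  "stream": false
}'
```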
## Monitoring GPU Usage

To check GPU usage, open a terminal and run:

```shell
nvidia-smi
```

For continuous monitoring:

```shell
# Refresh every 1 second
nvidia-smi -l 1

# On Linux, use watch for cleaner output
watch -n 1 nvidia-smi
```
## Useful Docker Commands

View container logs (the WebUI container is named `ollama-webui` in the compose file above):

```shell
docker logs ollama-webui
```

Stop and remove a container:

```shell
docker stop <container_id>
docker rm <container_id>
```

Remove a Docker image:

```shell
docker rmi <image_id>
```
## Optional: Google Drive Integration

Open WebUI supports Google Drive integration for uploading documents as context to your chat. To enable it:

- Create a Google Cloud project with the Google Picker API and Google Drive API enabled
- Set the `GOOGLE_DRIVE_API_KEY` and `GOOGLE_DRIVE_CLIENT_ID` environment variables
- Enable the integration in Admin Panel > Settings > Documents
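In the compose file from Step 5, that translates to extra `environment` entries on the `ollama-webui` service. The key and client ID values below are placeholders, and Open WebUI's environment configuration also documents an `ENABLE_GOOGLE_DRIVE_INTEGRATION` toggle:

```yaml
    environment:
      - ENABLE_GOOGLE_DRIVE_INTEGRATION=True
      - GOOGLE_DRIVE_API_KEY=your-api-key
      - GOOGLE_DRIVE_CLIENT_ID=your-client-id.apps.googleusercontent.com
```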