# How to Run Your Own Local AI with Ollama on Windows

Running AI models locally gives you full control over your data and eliminates dependency on cloud services. Ollama is a lightweight tool for serving and running inference on AI models locally, with very good performance. In this guide, we'll set up Ollama with Open WebUI using Docker on Windows with WSL2.
## Prerequisites
Before we begin, you'll need:
- Windows 10/11 with WSL2 support
- NVIDIA GPU (recommended for better performance)
- Basic familiarity with command line
## Hardware Requirements
When choosing which model to run, consider your hardware:
| Model Size | VRAM Required | Recommended GPU |
|---|---|---|
| 7B | 8-16 GB | NVIDIA RTX 3060 |
| 14B | 24 GB | NVIDIA RTX 3090/4090 |
| 32B | 48+ GB | Multi-GPU setups |
> **Note**
> Models can also run on CPU with sufficient RAM (16-32 GB for 7B models), but performance will be significantly slower.
## Step 1: Install WSL2 and Ubuntu

First, install WSL2 with Ubuntu as the distribution:

```shell
wsl --install -d Ubuntu
```

Start Ubuntu from a terminal:

```shell
wsl -d Ubuntu
```
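Before continuing, it can be worth confirming that Ubuntu is actually running under WSL 2 rather than WSL 1:

```shell
# List installed distributions and the WSL version each one uses
wsl -l -v

# If Ubuntu reports VERSION 1, convert it to WSL 2
wsl --set-version Ubuntu 2
```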
## Step 2: Update Ubuntu

Update your Ubuntu packages (as a regular user, you'll need `sudo`):

```shell
sudo apt-get update
sudo apt-get dist-upgrade
```
## Step 3: Install Docker Engine

Install Docker Engine on Ubuntu. Note that the `docker compose` subcommand used later in this guide comes from the `docker-compose-plugin` package:

```shell
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl gnupg lsb-release

# Add Docker's official GPG key and apt repository
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin

# Verify installation
docker --version
```
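Optionally, you can add your user to the `docker` group so you don't have to prefix every command with `sudo`, and run Docker's standard `hello-world` image as a sanity check:

```shell
# Allow the current user to run docker without sudo (takes effect on next login)
sudo usermod -aG docker $USER

# Sanity check: the daemon pulls and runs a minimal test container
sudo docker run --rm hello-world
```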
## Step 4: Install NVIDIA Container Toolkit

The NVIDIA Container Toolkit is essential for enabling GPU acceleration within Docker containers, allowing your AI models to run at peak performance.

```shell
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure Docker to use the NVIDIA runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```
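Before moving on, it's worth verifying that containers can actually see the GPU. One quick check (the CUDA image tag below is just an example; any recent `nvidia/cuda` base tag works):

```shell
# Run nvidia-smi inside a throwaway container with GPU access;
# you should see the same GPU table as on the host
sudo docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi
```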
## Step 5: Create Docker Compose File

Create a `docker-compose.yml` file (e.g., at `C:\Users\<username>\ollama\docker-compose.yml`):
```yaml
services:
  ollama:
    volumes:
      - ./ollama/ollama:/root/.ollama
    container_name: ollama
    pull_policy: always
    tty: true
    restart: unless-stopped
    image: ollama/ollama:latest
    ports:
      - 7869:11434
    environment:
      - OLLAMA_KEEP_ALIVE=24h
    networks:
      - ollama-docker
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

  ollama-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: ollama-webui
    volumes:
      - ./ollama/ollama-webui:/app/backend/data
      - ./ollama/ollama-webui-docs:/app/backend/data/docs
    depends_on:
      - ollama
    ports:
      - 8080:8080
    environment:
      - OLLAMA_BASE_URLS=http://host.docker.internal:7869
      - ENV=dev
      - WEBUI_AUTH=False
      - WEBUI_NAME=Local AI
      - WEBUI_URL=http://localhost:8080
      - WEBUI_SECRET_KEY=your-secret-key
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped
    networks:
      - ollama-docker

networks:
  ollama-docker:
    external: false
```
## Step 6: Start the Services

Navigate to your compose file directory and start the services:

```shell
docker compose up -d
```

To stop the services:

```shell
docker compose down
```

Verify the containers are running:

```shell
docker ps
```
## Step 7: Pull and Run Models

Once Ollama is running, you can pull and run models. Access the Ollama container:

```shell
docker exec -it ollama bash
```

Pull a model (e.g., Qwen 2.5 Coder):

```shell
ollama pull qwen2.5-coder:7b
```

Or pull other models:

```shell
ollama pull phi4
ollama pull qwen2.5-coder:14b
ollama pull qwen2.5:14b-instruct
```
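Still inside the container, you can list what has been downloaded and chat with a model interactively:

```shell
# Show locally available models and their sizes
ollama list

# Start an interactive chat session with a model (exit with /bye)
ollama run qwen2.5-coder:7b
```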
> **Recommended Models**
> The Qwen 2.5 Coder series offers significant improvements in code generation, code reasoning, and code fixing. Check the BigCode Models Leaderboard for more options.
> **Semantic Kernel & Function Calling**
> If you plan to use Semantic Kernel with function calling or tool calling capabilities, you must use models with instruct support (e.g., qwen2.5:14b-instruct). Standard or coder models may not properly support the structured function/tool calling that Semantic Kernel requires for AI agents and plugins.
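To see why instruct models matter here, this is roughly what a tool-calling request to Ollama's `/api/chat` endpoint looks like. The `get_weather` function is a made-up example, and port 7869 matches the host mapping from the compose file; a capable instruct model should respond with a structured `tool_calls` entry rather than free text:

```shell
curl http://localhost:7869/api/chat -d '{
  "model": "qwen2.5:14b-instruct",
  "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
  "stream": false,
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }]
}'
```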
## Accessing the Services
After setup, you can access:
- Open WebUI: http://localhost:8080
- Ollama API: http://localhost:7869
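You can also query the Ollama API directly, without going through the WebUI. A minimal non-streaming request (assuming the 7B coder model from Step 7 has been pulled):

```shell
# Host port 7869 is mapped to the container's Ollama port 11434
curl http://localhost:7869/api/generate -d '{
  "model": "qwen2.5-coder:7b",
  "prompt": "Write a hello world program in Python",
  "stream": false
}'
```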
## Monitoring GPU Usage

To check GPU usage, open a terminal and run:

```shell
nvidia-smi
```

For continuous monitoring:

```shell
# Refresh every 1 second
nvidia-smi -l 1

# On Linux, use watch for cleaner output
watch -n 1 nvidia-smi
```
## Useful Docker Commands

View container logs (the WebUI container is named `ollama-webui` in the compose file above):

```shell
docker logs ollama-webui
```

Stop and remove a container:

```shell
docker stop <container_id>
docker rm <container_id>
```

Remove a Docker image:

```shell
docker rmi <image_id>
```
## Optional: Google Drive Integration

Open WebUI supports Google Drive integration for uploading documents as context to your chat. To enable it:

- Create a Google Cloud project with the Google Picker API and Google Drive API enabled
- Set the `GOOGLE_DRIVE_API_KEY` and `GOOGLE_DRIVE_CLIENT_ID` environment variables
- Enable the integration in Admin Panel > Settings > Documents
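In the compose file from Step 5, that translates to extra `environment` entries on the `ollama-webui` service. The key and client ID values below are placeholders, and Open WebUI's environment configuration also documents an `ENABLE_GOOGLE_DRIVE_INTEGRATION` toggle:

```yaml
    environment:
      - ENABLE_GOOGLE_DRIVE_INTEGRATION=True
      - GOOGLE_DRIVE_API_KEY=your-api-key
      - GOOGLE_DRIVE_CLIENT_ID=your-client-id.apps.googleusercontent.com
```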