"A tunnel of light connecting two distant computers"

Generated with flux-latentpop and seedance-1-pro-fast.


Running Ollama on your desktop GPU from anywhere with Cloudflare Tunnel

Overview
A complete guide to exposing Ollama running on a Windows PC with a 4090 GPU so you can use it remotely with opencode via a secure Cloudflare Tunnel.
Last Updated
12/01/2026
Tags
ollama, cloudflare, ai

At some point in the last few years I thought it was monumentally important that I build a gaming PC. I don't use it as often as I'd hoped, but it turns out the 4090 is great for running local LLMs. I wanted to use it from my laptop when I'm traveling, because tokens are expensive and little projects are great. The solution turned out to be Cloudflare Tunnel - a secure way to expose Ollama without opening ports or dealing with VPNs.

This is how I set it up to work with opencode, the open source AI coding assistant that makes it easy to use various LLM providers, including your own.

The architecture

The setup looks like this:

[Laptop] → opencode → HTTPS → [Cloudflare Access] → [Cloudflare Tunnel] → [Windows PC:11434] → Ollama → 4090 GPU

Your laptop talks to Cloudflare over HTTPS, Cloudflare authenticates the request, then forwards it through an encrypted tunnel to your Windows PC (or whatever machine is running the service you want to expose). From opencode's perspective, it's just hitting an HTTPS endpoint. From your PC's perspective, it's just receiving local requests.

No port forwarding, no dynamic DNS, no exposing your home network to the internet. Easy as.

Why this is useful

Running LLMs locally is nice. You get full control over the model, no API costs, and reasonable speed if you have decent hardware. But "local" usually means you're sitting at that specific machine.

Cloudflare Tunnel lets you access your local Ollama instance from anywhere - a coffee shop, your phone - and even share it with others. All without the usual networking hassles.

Part 1: Windows PC setup

Let's start by getting Ollama running properly on the Windows machine. The same approach works for any other service you want to expose.

Installing Ollama

Download the Windows installer from ollama.com and run it. It'll install natively and automatically use CUDA if you have an NVIDIA GPU.

Choosing a model

For coding with tool calling support, Qwen3 works well. The 30B MoE version fits comfortably in 24GB VRAM:

ollama pull qwen3-coder:30b

NOTE

A note on model choice: I initially tried Qwen 2.5 Coder but ran into issues with tool calling in Ollama. Qwen3 handles it much better. Also avoid the VL (vision-language) variants if you're tight on memory.
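
Before involving the tunnel, it's worth a quick local smoke test on the Windows PC to confirm the model loads and responds (the first run takes a while as the weights load into VRAM):

ollama run qwen3-coder:30b "Say hello"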

Allowing network access

By default, Ollama only listens on localhost. To accept connections from the tunnel, set an environment variable.

NOTE

Ollama does have a setting to expose itself to the network, so you can also try that. I probably should have, on reflection.

In PowerShell (as administrator):

[Environment]::SetEnvironmentVariable("OLLAMA_HOST", "0.0.0.0:11434", "Machine")

Restart the Ollama service after setting this. You can do this by finding the Ollama service in Task Manager or among your running applications.
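
To confirm the variable actually stuck at machine scope, you can read it back in PowerShell:

[Environment]::GetEnvironmentVariable("OLLAMA_HOST", "Machine")

It should print 0.0.0.0:11434.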

Installing cloudflared

The tunnel software is called cloudflared. Install it with winget:

winget install --id Cloudflare.cloudflared

If you're on a Mac, you can use brew:

brew install cloudflared

Or you can download it directly from the Cloudflare downloads page.

Authenticating cloudflared

Run this to authenticate with your Cloudflare account:

cloudflared tunnel login

This opens your browser. Select the domain you want to use for the tunnel (you'll need a domain in your Cloudflare account).

Creating the tunnel

Create a named tunnel:

cloudflared tunnel create ollama-tunnel

Note the UUID that gets printed out. You'll need it in the config file.
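
If you lose track of it, you can list your tunnels and their UUIDs at any time:

cloudflared tunnel list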

Configuring the tunnel

Create a config file at C:\Users\<username>\.cloudflared\config.yml:

tunnel: <TUNNEL_UUID>
credentials-file: C:\Users\<username>\.cloudflared\<TUNNEL_UUID>.json

ingress:
  - hostname: ollama.yourdomain.com
    service: http://localhost:11434
  - service: http_status:404

Replace <TUNNEL_UUID> with the UUID from the previous step, and ollama.yourdomain.com with your actual domain.

The ingress rules tell cloudflared where to route requests. Anything hitting ollama.yourdomain.com gets forwarded to localhost:11434 (where Ollama is listening).
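
cloudflared can sanity-check the ingress rules for you before you start anything - it reads the same config.yml:

cloudflared tunnel ingress validate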

Adding DNS

This creates the DNS record that points your domain to the tunnel:

cloudflared tunnel route dns ollama-tunnel ollama.yourdomain.com

Running the tunnel

Test it first:

cloudflared tunnel run ollama-tunnel

You can also install it as a Windows service so it starts automatically.
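
On Windows that looks roughly like this, from an elevated PowerShell (depending on where your config.yml lives, you may need to pass an explicit --config path):

cloudflared service install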

There you go, your very own tunnel to the outside world.

Part 2: Cloudflare Access setup

Cloudflare Tunnel creates a secure connection between Cloudflare's edge and your PC, but by default anyone who knows the URL can use it. Cloudflare Access adds authentication.

Generating a service token

Since opencode uses API calls, we need a token that can authenticate without a browser.

Go to Access → Service Auth → Service Tokens, then click Create Service Token.

Name it something like opencode-laptop.

You'll get two values: CF-Access-Client-Id and CF-Access-Client-Secret. Copy both and keep them somewhere safe. You'll need them on your laptop.

These tokens provide full API access, so treat them like passwords. If they leak, rotate them immediately.

Creating an Access application

Go to your Cloudflare Dashboard → Zero Trust → Access controls → Applications, then click "Add an application" and choose "Self-hosted."

Application name:    Ollama API
Session duration:    24 hours (or whatever you prefer)
Application domain:  ollama.yourdomain.com

Creating the right policy

Next we need to define who can access our tunnel.

When you create an access policy, there are different actions you can choose. For browser-based access, you'd use "Allow" and authenticate with email or SSO.

Policy name:         Service Token Access
Action:              Service Auth (not "Allow")
Include rule:
 - Selector:         Service Token
 - Value:            Select your service token (the one you made earlier)

Without the Service Auth action, API requests get redirected to a browser login page and fail. This took me longer to figure out than I'd like to admit.

Part 3: Laptop setup

Now we configure opencode to use the remote Ollama instance.

Installing opencode

On macOS:

brew install anomalyco/tap/opencode

Configuring opencode

Create or edit ~/.config/opencode/opencode.json:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "ollama-remote": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Ollama (Remote)",
      "options": {
        "baseURL": "https://ollama.yourdomain.com/v1",
        "headers": {
          "CF-Access-Client-Id": "<YOUR_CLIENT_ID>",
          "CF-Access-Client-Secret": "<YOUR_CLIENT_SECRET>"
        }
      },
      "models": {
        "qwen3-coder": {
          "name": "Qwen 3 Coder 30B"
        }
      }
    }
  }
}

Replace the placeholder values with your actual domain and service token credentials.

The baseURL uses /v1 because Ollama exposes an OpenAI-compatible API at that path. The headers include your Cloudflare Access credentials, which get sent with every request.

Testing it

First, test the API directly:

curl -X POST https://ollama.yourdomain.com/api/generate \
  -H "CF-Access-Client-Id: <YOUR_CLIENT_ID>" \
  -H "CF-Access-Client-Secret: <YOUR_CLIENT_SECRET>" \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3-coder", "prompt": "Say hello", "stream": false}'

If you get a JSON response with generated text, it's working.
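
You can also hit the OpenAI-compatible path that opencode actually uses - same headers, different endpoint:

curl -X POST https://ollama.yourdomain.com/v1/chat/completions \
  -H "CF-Access-Client-Id: <YOUR_CLIENT_ID>" \
  -H "CF-Access-Client-Secret: <YOUR_CLIENT_SECRET>" \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3-coder", "messages": [{"role": "user", "content": "Say hello"}]}'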

Then try opencode:

opencode

You should see your remote model available as an option.

Troubleshooting

Here are the issues I ran into and how I fixed them.

Tool calls failing

If the model generates responses but tool calls don't work, check your context window. Tool calling needs at least 16K context. Make sure your Modelfile has PARAMETER num_ctx 16384 or higher.
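
A minimal sketch of what that looks like, assuming the qwen3-coder:30b model from earlier (the -16k tag is just an example name):

# Modelfile
FROM qwen3-coder:30b
PARAMETER num_ctx 16384

ollama create qwen3-coder-16k -f Modelfile

If you go this route, point your opencode config at the new tag.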

Also double-check you're using Qwen3, not Qwen 2.5 Coder. The latter has known issues with tool calling in Ollama.

Connection refused

If you can't connect at all, verify the OLLAMA_HOST environment variable is set to 0.0.0.0:11434. Then restart the Ollama service.
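
It's also worth confirming Ollama answers locally before blaming the tunnel. On the Windows PC:

curl http://localhost:11434/api/tags

If that returns your model list but the remote URL doesn't, the problem is on the Cloudflare side.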

Check the tunnel is running:

cloudflared tunnel info ollama-tunnel

401/403 errors or redirects

If you're getting authentication errors, verify your service token headers are correct in the opencode config.

Also double-check that your Access policy uses "Service Auth" action, not "Allow." This was my mistake - I had created an email-based policy initially, which doesn't work for API requests.

Model crashes with memory errors

If you see "memory layout cannot be allocated" errors, reduce the context size in your Modelfile. Try 8192 instead of 16384.

You can also check VRAM usage with:

nvidia-smi

If you're maxing out, consider using a smaller model or a more aggressive quantization.
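
Ollama can also tell you what's loaded and how it's split between GPU and CPU:

ollama ps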

Slow responses

This is normal for large models. The 30B model takes a few seconds to start generating, especially on the first request after loading the model into VRAM.

I typically see around 70 tokens per second with the Q4 quantization on a 4090, which is fast enough for coding.

Security considerations

A few things to keep in mind.

Service tokens bypass browser authentication. If someone gets your tokens, they have full access to your Ollama instance. Don't commit them to git, don't share screenshots with them visible, and rotate them if you suspect they've leaked.

The tunnel credentials (the JSON file in .cloudflared) should also stay private. They're what allow cloudflared to connect to your specific tunnel.

If your laptop has a static IP, you can add IP restrictions in Cloudflare Access for an extra layer of security.

What's next

This setup has been surprisingly reliable. The tunnel can reconnect automatically if my home internet drops, and Cloudflare's edge handles the authentication before anything reaches my PC.

Latency is good - usually under 100ms - which is barely noticeable compared to the time the model takes to generate responses.

The same pattern works for other services too. I've used Cloudflare Tunnel to expose Jupyter notebooks, local web servers, and various development tools. It's become my default way of accessing home-hosted services remotely.

Now go put that gaming GPU to work.
