At some point in the last few years I thought it was monumentally important that I build a gaming PC. I don't use it as often as I'd hoped, but it turns out the 4090 is great for running local LLMs. I wanted to use it from my laptop when I'm traveling, because tokens are expensive and little projects are great. The solution turned out to be Cloudflare Tunnel - a secure way to expose Ollama without opening ports or dealing with VPNs.
This is how I set it up to work with opencode, the open source AI coding assistant that makes it easy to use various LLM providers, including your own.
The architecture
The setup looks like this:
[Laptop] → opencode → HTTPS → [Cloudflare Access] → [Cloudflare Tunnel] → [Windows PC:11434] → Ollama → 4090 GPU

Your laptop talks to Cloudflare over HTTPS, Cloudflare authenticates the request, then forwards it through an encrypted tunnel to your Windows PC (or whatever machine is running the service). From opencode's perspective, it's just hitting an HTTPS endpoint. From your PC's perspective, it's just receiving local requests.
No port forwarding, no dynamic DNS, no exposing your home network to the internet. Easy as.
Why this is useful
Running LLMs locally is nice. You get full control over the model, no API costs, and reasonable speed if you have decent hardware. But "local" usually means you're sitting at that specific machine.
Cloudflare Tunnel lets you reach your local Ollama instance from anywhere - a coffee shop, your phone - or share it with others. All without the usual networking hassles.
Part 1: Windows PC setup
Let's start by getting Ollama running properly on the Windows machine. These instructions should work for any other service you want to expose, too.
Installing Ollama
Download the Windows installer from ollama.com and run it. It'll install natively and automatically use CUDA if you have an NVIDIA GPU.
Choosing a model
For coding with tool calling support, Qwen3 works well. The 30B MoE version fits comfortably in 24GB VRAM:
ollama pull qwen3-coder:30b

NOTE
A note on model choice: I initially tried Qwen 2.5 Coder but ran into issues with tool calling in Ollama. Qwen3 handles it much better. Also avoid the VL (vision-language) variants if you're tight on memory.
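Before the tunnel gets involved, a quick local smoke test is worth doing. This assumes the 30B tag pulled above; ollama ps reports whether the loaded model actually landed on the GPU.

```
# One-off prompt to confirm the model runs, then check where it loaded
ollama run qwen3-coder:30b "Write a hello world in Python"
ollama ps
```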
Allowing network access
By default, Ollama only listens on localhost. To accept connections from the tunnel, set an environment variable.
NOTE
Ollama does have a setting to expose itself to the network, so you can also try that. I probably should have, on reflection.
In PowerShell (as administrator):
[Environment]::SetEnvironmentVariable("OLLAMA_HOST", "0.0.0.0:11434", "Machine")

Restart the Ollama service after setting this. You can do this by finding the Ollama service in Task Manager or in your running applications.
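To confirm the new binding took effect after the restart, check what address Ollama is listening on. A quick check in PowerShell, assuming the default port:

```powershell
# LocalAddress should show 0.0.0.0 rather than 127.0.0.1
Get-NetTCPConnection -LocalPort 11434 -State Listen | Select-Object LocalAddress, LocalPort
```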
Installing cloudflared
The tunnel software is called cloudflared. Install it with winget:
winget install --id Cloudflare.cloudflared

If you're on a Mac, you can use brew:
brew install cloudflared

Or you can download it from the Cloudflare downloads page.
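Whichever route you take, confirm the install before moving on:

```
cloudflared --version
```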
Authenticating cloudflared
Run this to authenticate with your Cloudflare account:
cloudflared tunnel login

This opens your browser. Select the domain you want to use for the tunnel (you'll need a domain in your Cloudflare account).
Creating the tunnel
Create a named tunnel:
cloudflared tunnel create ollama-tunnel

Note the UUID that gets printed out. You'll need it in the config file.
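If you lose track of the UUID, you can list it again at any time:

```
cloudflared tunnel list
```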
Configuring the tunnel
Create a config file at C:\Users\<username>\.cloudflared\config.yml:
tunnel: <TUNNEL_UUID>
credentials-file: C:\Users\<username>\.cloudflared\<TUNNEL_UUID>.json
ingress:
  - hostname: ollama.yourdomain.com
    service: http://localhost:11434
  - service: http_status:404

Replace <TUNNEL_UUID> with the UUID from the previous step, and ollama.yourdomain.com with your actual domain.
The ingress rules tell cloudflared where to route requests. Anything hitting ollama.yourdomain.com gets forwarded to localhost:11434 (where Ollama is listening).
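Before running anything, cloudflared can sanity-check this routing for you. Both commands read the config file above (pass --config with the path if yours lives somewhere else):

```
# Validate the ingress block, then check which rule a request to your hostname would hit
cloudflared tunnel ingress validate
cloudflared tunnel ingress rule https://ollama.yourdomain.com
```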
Adding DNS
This creates the DNS record that points your domain to the tunnel:
cloudflared tunnel route dns ollama-tunnel ollama.yourdomain.com

Running the tunnel
Test it first:
cloudflared tunnel run ollama-tunnel

You can also install it as a Windows service so it starts automatically.
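A sketch of the service route, assuming the locally managed tunnel and config above. Run it from an elevated PowerShell; depending on your cloudflared version you may need to copy config.yml and the credentials JSON somewhere the service account can read them, so check the cloudflared docs if the service starts but the tunnel doesn't come up.

```powershell
# Register cloudflared as a Windows service, start it, and confirm it's running
cloudflared service install
Start-Service cloudflared
Get-Service cloudflared
```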
There you go, your very own tunnel to the outside world.
Part 2: Cloudflare Access setup
Cloudflare Tunnel creates a secure connection between Cloudflare's edge and your PC, but by default anyone who knows the URL can use it. Cloudflare Access adds authentication.
Generating a service token
Since opencode uses API calls, we need a token that can authenticate without a browser.
Go to Access → Service Auth → Service Tokens, then click Create Service Token.
Name it something like opencode-laptop.
You'll get two values: CF-Access-Client-Id and CF-Access-Client-Secret. Copy both and keep them somewhere safe. You'll need them on your laptop.
These tokens provide full API access, so treat them like passwords. If they leak, rotate them immediately.
Creating an Access application
Go to your Cloudflare Dashboard → Zero Trust → Access controls → Applications, then click "Add an application" and choose "Self-hosted."
Application name: Ollama API
Session duration: 24 hours (or whatever you prefer)
Application domain: ollama.yourdomain.com

Creating the right policy
Next we need to define who can access our tunnel.
When you create an access policy, there are different actions you can choose. For browser-based access, you'd use "Allow" and authenticate with email or SSO.
Policy name: Service Token Access
Action: Service Auth (not "Allow")
Include rule:
- Selector: Service Token
- Value: Select your service token (the one you made earlier)

Without the Service Auth action, API requests get redirected to a browser login page and fail. This took me longer to figure out than I'd like to admit.
Part 3: Laptop setup
Now we configure opencode to use the remote Ollama instance.
Installing opencode
On macOS:
brew install anomalyco/tap/opencode

Configuring opencode
Create or edit ~/.config/opencode/opencode.json:
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"ollama-remote": {
"npm": "@ai-sdk/openai-compatible",
"name": "Ollama (Remote)",
"options": {
"baseURL": "https://ollama.yourdomain.com/v1",
"headers": {
"CF-Access-Client-Id": "<YOUR_CLIENT_ID>",
"CF-Access-Client-Secret": "<YOUR_CLIENT_SECRET>"
}
},
"models": {
"qwen3-coder": {
"name": "Qwen 3 Coder 30B"
}
}
}
}
}

Replace the placeholder values with your actual domain and service token credentials.
The baseURL uses /v1 because Ollama exposes an OpenAI-compatible API at that path. The headers include your Cloudflare Access credentials, which get sent with every request.
Testing it
First, test the API directly:
curl -X POST https://ollama.yourdomain.com/api/generate \
-H "CF-Access-Client-Id: <YOUR_CLIENT_ID>" \
-H "CF-Access-Client-Secret: <YOUR_CLIENT_SECRET>" \
-H "Content-Type: application/json" \
-d '{"model": "qwen3-coder", "prompt": "Say hello", "stream": false}'

If you get a JSON response with generated text, it's working.
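It's also worth hitting the OpenAI-compatible path that opencode will actually use - same credentials, different endpoint and payload shape:

```bash
curl https://ollama.yourdomain.com/v1/chat/completions \
  -H "CF-Access-Client-Id: <YOUR_CLIENT_ID>" \
  -H "CF-Access-Client-Secret: <YOUR_CLIENT_SECRET>" \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3-coder", "messages": [{"role": "user", "content": "Say hello"}]}'
```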
Then try opencode:
opencode

You should see your remote model available as an option.
Troubleshooting
Here are the issues I ran into and how I fixed them.
Tool calls failing
If the model generates responses but tool calls don't work, check your context window. Tool calling needs at least 16K context. Make sure your Modelfile has PARAMETER num_ctx 16384 or higher.
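If you haven't created one, a minimal Modelfile that bakes in the larger context looks like this (the qwen3-coder name is just what the rest of this post assumes; use whatever you like):

```
FROM qwen3-coder:30b
PARAMETER num_ctx 16384
```

Build it with ollama create qwen3-coder -f Modelfile and it shows up in ollama list under that name.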
Also double-check you're using Qwen3, not Qwen 2.5 Coder. The latter has known issues with tool calling in Ollama.
Connection refused
If you can't connect at all, verify the OLLAMA_HOST environment variable is set to 0.0.0.0:11434. Then restart the Ollama service.
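Two quick checks on the Windows PC, assuming the defaults from earlier in this post:

```powershell
# Should print 0.0.0.0:11434
[Environment]::GetEnvironmentVariable("OLLAMA_HOST", "Machine")

# Should return version info if Ollama is up locally
Invoke-RestMethod http://localhost:11434/api/version
```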
Check the tunnel is running:
cloudflared tunnel info ollama-tunnel

401/403 errors or redirects
If you're getting authentication errors, verify your service token headers are correct in the opencode config.
Also double-check that your Access policy uses "Service Auth" action, not "Allow." This was my mistake - I had created an email-based policy initially, which doesn't work for API requests.
Model crashes with memory errors
If you see "memory layout cannot be allocated" errors, reduce the context size in your Modelfile. Try 8192 instead of 16384.
You can also check VRAM usage with:
nvidia-smi

If you're maxing out, consider using a smaller model or a more aggressive quantization.
Slow responses
This is normal for large models. The 30B model takes a few seconds to start generating, especially on the first request after loading the model into VRAM.
I typically see around 70 tokens per second with the Q4 quantization on a 4090, which is fast enough for coding.
Security considerations
A few things to keep in mind.
Service tokens bypass browser authentication. If someone gets your tokens, they have full access to your Ollama instance. Don't commit them to git, don't share screenshots with them visible, and rotate them if you suspect they've leaked.
The tunnel credentials (the JSON file in .cloudflared) should also stay private. They're what allow cloudflared to connect to your specific tunnel.
If your laptop has a static IP, you can add IP restrictions in Cloudflare Access for an extra layer of security.
What's next
This setup has been surprisingly reliable. The tunnel can reconnect automatically if my home internet drops, and Cloudflare's edge handles the authentication before anything reaches my PC.
Latency is good - usually under 100ms - which is barely noticeable compared to the time the model takes to generate responses.
The same pattern works for other services too. I've used Cloudflare Tunnel to expose Jupyter notebooks, local web servers, and various development tools. It's become my default way of accessing home-hosted services remotely.
Now go put that gaming GPU to work.