I wanted to try pointing the Claude CLI at a local LLM to see how well it would work. My machine runs EndeavourOS Linux and has the following specs:

Monitor     3840x2160 in 27", 60 Hz [External]
Monitor     3840x2160 in 32", 60 Hz [External]
CPU         12th Gen Intel(R) Core(TM) i9-12900KF @5.20 GHz 52.0°C
GPU         AMD Radeon RX 9070 - 48.0°C [Discrete]
G Driver    amdgpu
Vulkan      1.4.335 - radv [Mesa 26.0.5-arch1.1]
Motherboard PRIME Z690M-PLUS D4 (Rev 1.xx)
Bios        3811 (38.11)
RAM         ●●●● 23.94 GiB / 62.60 GiB (38%)

I use llama.cpp to run the model locally. During this experiment I kept hitting the context limit, so I bumped it pretty high. I have plenty of memory, so that was not a problem.

llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-IQ2_M \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.00 \
  --port 11434 \
  -c 256000
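
Before pointing Claude at the server, it can help to confirm it is actually up. The following is a minimal sketch, not something from my original run: it assumes the server is listening on localhost:11434 as in the command above, and BASE_URL is just an illustrative variable name. It uses the /health endpoint that llama-server exposes.

```shell
# Assumption: llama-server is listening on localhost:11434 (the --port used above).
# BASE_URL is an illustrative variable name, not something llama-server reads.
BASE_URL="${BASE_URL:-http://localhost:11434}"

# /health succeeds once the model has finished loading; -f makes curl fail on HTTP errors.
if curl -sf --max-time 5 "$BASE_URL/health" >/dev/null 2>&1; then
  echo "llama-server is up at $BASE_URL"
else
  echo "llama-server is not reachable at $BASE_URL (still loading, or not started?)"
fi
```

With a large model the server can take a while to load, so a "not reachable" result right after startup may just mean it is not ready yet.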

I had to edit ~/.claude/settings.json:

{
  "promptSuggestionEnabled": false,
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "0",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    "CLAUDE_CODE_ATTRIBUTION_HEADER": "0"
  },
  "attribution": {
    "commit": "",
    "pr": "

Then I ran claude in an empty directory:

export ANTHROPIC_API_KEY=sk-no-key-required
export ANTHROPIC_BASE_URL=http://localhost:11434
claude
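
If you would rather not leave those variables exported in your shell session, one option is to wrap them in a small function. This is only a sketch under the same assumptions as above (same port, same dummy key); claude_local is a hypothetical name I made up for illustration, not part of the Claude CLI.

```shell
# Hypothetical helper: runs claude against the local server without
# leaking the variables into the rest of the shell session.
claude_local() {
  (
    # The subshell keeps these exports scoped to this single invocation.
    export ANTHROPIC_API_KEY=sk-no-key-required
    export ANTHROPIC_BASE_URL=http://localhost:11434
    claude "$@"
  )
}
```

After defining it (for example in ~/.bashrc), running claude_local in the project directory behaves like the three commands above, and a plain claude still uses your normal configuration.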

I ran /init, followed by /plan, and asked:

| create a missile command game using rust

After it created a plan, it took roughly 8 hours of back and forth, either moving on to the next task or fixing some issue.

Watch the video

Gitlab.com/geoffcorey/missile-command