I wanted to try pointing Claude CLI at a local LLM to see how well it would work. My machine runs EndeavourOS (Linux) and has the following specs:
Monitor 3840x2160 in 27", 60 Hz [External]
Monitor 3840x2160 in 32", 60 Hz [External]
CPU 12th Gen Intel(R) Core(TM) i9-12900KF @5.20 GHz 52.0°C
GPU AMD Radeon RX 9070 - 48.0°C [Discrete]
GPU Driver amdgpu
Vulkan 1.4.335 - radv [Mesa 26.0.5-arch1.1]
Motherboard PRIME Z690M-PLUS D4 (Rev 1.xx)
BIOS 3811 (38.11)
RAM 23.94 GiB / 62.60 GiB (38%)
I use llama.cpp to run the model locally. During this experiment I kept hitting the context limit, so I bumped it pretty high; then again, I have plenty of memory.
llama-server -hf unsloth/Qwen3.6-35B-A3B-GGUF:UD-IQ2_M \
--temp 0.6 \
--top-p 0.95 \
--top-k 20 \
--min-p 0.00 \
--port 11434 \
-c 256000
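Most of what a large `-c` costs is KV-cache memory, which grows linearly with the context size. A back-of-the-envelope sketch; the layer/head numbers in the example are illustrative placeholders, not this model's actual architecture:

```python
def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, bytes_per_elt=2):
    """Rough KV-cache size: one K and one V tensor per layer, each
    n_tokens x n_kv_heads x head_dim elements (fp16 = 2 bytes each)."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elt * n_tokens

# Example with made-up dimensions: 48 layers, 4 KV heads, head size 128.
if __name__ == "__main__":
    gib = kv_cache_bytes(256_000, 48, 4, 128) / 2**30
    print(f"{gib:.1f} GiB")  # prints 23.4 GiB for a 256k context at fp16
```

llama.cpp can also quantize the KV cache, which shrinks this further, so a 256k context is plausible on a 64 GiB machine.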
I had to edit ~/.claude/settings.json:
{
  "promptSuggestionEnabled": false,
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "0",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
    "CLAUDE_CODE_ATTRIBUTION_HEADER": "0"
  },
  "attribution": {
    "commit": "",
    "pr": ""
  }
}
Run claude in an empty directory:
export ANTHROPIC_API_KEY=sk-no-key-required
export ANTHROPIC_BASE_URL=http://localhost:11434
claude
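A quick way to confirm the endpoint Claude will talk to is actually answering is to hit llama-server's OpenAI-compatible chat endpoint as a liveness check (Claude itself speaks Anthropic's API, but if this endpoint answers, the server is up). A minimal sketch; the prompt and the "local" model name are placeholders, since llama-server just serves whatever model it loaded:

```python
import json
import urllib.request

def build_request(prompt, base_url="http://localhost:11434"):
    """Build a chat-completions request for the local llama-server."""
    payload = {
        "model": "local",  # placeholder; llama-server ignores unknown names
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    with urllib.request.urlopen(build_request("Reply with one word.")) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

If this prints a reply, the base URL is good and claude should connect.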
I executed /init followed by /plan and asked:
| create a missile command game using rust
After it created a plan, it took roughly 8 hours of back and forth: telling it to move on to the next task or to fix some issue.