// Posts tagged: llm

GLM Is the New Hotness, So Let's Test It On the Homelab

2026.06.30·14 min read

GLM is suddenly everywhere in developer conversations. Before we run the bakeoff, we need to answer two questions: what is GLM, and is it suitable for a single RTX 5090 homelab?

model-showdown benchmark ai llm homelab building-in-public

Model Showdown Round 7: Five Local Models vs. One Cloud Model on a Real Coding Task

2026.06.17·13 min read

I gave five local LLMs and one frontier cloud model the same coding task on my homelab: build a tag manager for the blog's admin panel. Only two shipped anything. Here's what happened.

model-showdown benchmark ai llm homelab building-in-public coder

Frontier Bakeoff: We Benchmarked Fable 5 Hours Before the Shutdown

2026.06.13·8 min read

Four frontier models, ten tasks, one government shutdown. We ran Claude Fable 5 through the homelab benchmark harness three hours before Anthropic pulled the plug — and it came in second. Here's the full bakeoff.

model-showdown benchmark ai llm building-in-public

Homelab Bakeoff: OpenClaw Outperforms Hermes… With Hermes Models

2026.06.11·15 min read

Two Discord bots, one 14B model, five fitness-tracker tasks. Both agents failed on the first try. Getting them working required debugging context overflow, silent tool parameter drops, and a chat template flag that changes everything. The results reveal as much about the state of local AI agents as they do about which framework won.

agents llm homelab building-in-public openclaw opinion

Friday Fixes: Housekeeping the Homelab and Hub

2026.06.05·11 min read

A model refresh on the homelab (Qwen 3.6, new embeddings, 469 llama.cpp builds), a feature sprint on the vacation planning site (calendar sync, expense tracking, and three bugs that taught us more than the features did), and automating Substack syndication after discovering two more undocumented quirks. Three unrelated workstreams, one theme: maintenance is where the real learning happens.

meta building-in-public agents llm next-js substack

Hermes Agent: First Contact

2026.06.02·7 min read

I've been running OpenClaw on the homelab for a month. A recommendation sent me down the Hermes Agent rabbit hole — and the research before the first real test revealed my daily driver model was broken for tool calling all along.

agents llm building-in-public meta

Thursday Thoughts: The Models We Can't Run

2026.05.14·7 min read

DeepSeek V4-Pro, V4-Flash, and Zyphra ZAYA1 are three of the most exciting new models in local AI. None of them run on our RTX 5090 homelab — for completely different reasons. Here's the research, the math, and what it means for anyone building a local inference rig.

agents ai llm homelab meta building-in-public

Model Showdown Round 4: Opus vs Qwen — Writers, Not Coders

2026.05.11·13 min read

Two AI models got the same prompt: review the blog fodder, check for redundancy, and draft a post. Opus chose a debugging war story. Qwen chose a data-driven redesign. They did not pick the same fodder. Here's what the difference reveals about how models think about content.

ai llm benchmark agents building-in-public

Model Showdown Round 3: Ditching Ollama in Favor of llama.cpp

2026.05.10·17 min read

We ripped out Ollama, migrated to llama.cpp, and benchmarked five local models across 12 tasks on an RTX 5090. The results surprised us — and the winner wasn't who we expected.

ai llm benchmark homelab

Slaying the Gemma Beast: How We Fixed Local AI and Shipped Search

2026.05.04·17 min read

Gemma 4 failed to build a single feature in our last test. This time we diagnosed the problem, switched from Ollama to llama.cpp, tuned the inference settings, and Gemma shipped a working search feature to production. Then Opus reviewed the code and made it better. Here's what we learned about making local models actually work.

ai llm benchmark homelab gemma agents

The Agentic Gap: Claude Oneshots, Gemma Fails

2026.04.29·12 min read

We pitted Gemma 4 against Opus 4.6 on a real feature build for vibescoder.dev. Gemma is the fastest model in our benchmark. It also couldn't finish the job. Here's what happened when we stopped testing toy apps and started building production code.

ai llm benchmark homelab gemma agents

Model Showdown Round 2: Adding Gemma, Kimi, and 579 GB of Stubborn Optimism

2026.04.26·15 min read

We added Google's Gemma 4 and Moonshot's 1-trillion-parameter Kimi K2 to the local model benchmark. Five out of six models scored perfect. Gemma 4 is the new speed king. And yes, we ran a 579 GB model off an NVMe drive — at 0.6 tokens per second.

ai llm benchmark homelab gemma

Model Showdown: Benchmarking Local vs Cloud LLMs on a Real Coding Task

2026.04.22·18 min read

We gave six LLMs the exact same coding prompt and measured everything: speed, tokens, and whether the code actually works. Three models scored perfect. Two built the wrong kind of app. One ran out of tokens mid-line.

ai llm benchmark homelab

Putting the GPU to Work: Running Local LLMs on a Home Lab

2026.04.22·12 min read

Installing Ollama, pulling five purpose-built models, wiring local inference into Coder Agents, and running agentic coding on an RTX 5090 workstation. 44 GB of models, zero cloud API calls, fully self-hosted.

ai homelab llm