// Posts tagged: model-showdown

GLM Is the New Hotness, So Let's Test It On the Homelab

2026.06.30·14 min read

GLM is suddenly everywhere in developer conversations. Before we run the bakeoff, we need to answer two questions: what is GLM, and is it suitable for a single RTX 5090 homelab?

model-showdown benchmark ai llm homelab building-in-public

Model Showdown Round 7: Five Local Models vs. One Cloud Model on a Real Coding Task

2026.06.17·13 min read

I gave five local LLMs and one frontier cloud model the same coding task on my homelab: build a tag manager for the blog's admin panel. Only two shipped anything. Here's what happened.

model-showdown benchmark ai llm homelab building-in-public coder

Frontier Bakeoff: We Benchmarked Fable 5 Hours Before the Shutdown

2026.06.13·8 min read

Four frontier models, ten tasks, one government shutdown. We ran Claude Fable 5 through the homelab benchmark harness three hours before Anthropic pulled the plug — and it came in second. Here's the full bakeoff.

model-showdown benchmark ai llm building-in-public

Showdown Thoughts: The Three-Pass Pattern

2026.05.19·6 min read

The Round 5 bakeoff produced four implementations. None of them shipped. What shipped was a merge of the best pieces from all four, then a polish pass against real data. Bakeoff → Merge → Polish is a generalizable pattern for any feature where the design space is genuinely unclear.

agents vibe-coding model-showdown building-in-public

Model Showdown Round 5: Four Agents Build the Same Feature

2026.05.17·19 min read

Four LLMs built the same admin feature in isolated Coder Agents sessions. I judged them blind. The headline result: Sonnet 4.6 beat Opus 4.6 on a coding task. The deeper story is what each model did with the same prompt, and what it took to make the bakeoff fair in the first place.

model-showdown agents vibe-coding