vibescoder

// Posts tagged: model-showdown

GLM Is the New Hotness, So Let's Test It On the Homelab

·14 min read

GLM is suddenly everywhere in developer conversations. Before we run the bakeoff, we need to answer two questions: what is GLM, and is it suitable for a single RTX 5090 homelab?

Model Showdown Round 7: Five Local Models vs. One Cloud Model on a Real Coding Task

·13 min read

I gave five local LLMs and one frontier cloud model the same coding task on my homelab: build a tag manager for the blog's admin panel. Only two shipped anything. Here's what happened.

Frontier Bakeoff: We Benchmarked Fable 5 Hours Before the Shutdown

·8 min read

Four frontier models, ten tasks, one government shutdown. We ran Claude Fable 5 through the homelab benchmark harness three hours before Anthropic pulled the plug — and it came in second. Here's the full bakeoff.

Showdown Thoughts: The Three-Pass Pattern

·6 min read

The Round 5 bakeoff produced four implementations. None of them shipped. What shipped was a merge of the best pieces from all four, then a polish pass against real data. Bakeoff → Merge → Polish is a generalizable pattern for any feature where the design space is genuinely unclear.

Model Showdown Round 5: Four Agents Build the Same Feature

·19 min read

Four LLMs built the same admin feature in isolated Coder Agents sessions. I judged them blind. The headline result: Sonnet 4.6 beat Opus 4.6 on a coding task. The deeper story is what each model did with the same prompt, and what it took to make the bakeoff fair in the first place.