GLM Is the New Hotness, So Let's Test It On the Homelab
GLM is suddenly everywhere in developer conversations. Before we run the bakeoff, we need to answer two questions: what is GLM, and is it suitable for a single RTX 5090 homelab?
A finance intern is spending her summer observing business processes and vibe coding automation tools. Not a CS major. Not shadowing someone. Building something real. It is a small example that says something big about how AI is reshaping internships, careers, and what the word "developer" actually means.
Vibe coding has moved from hobbyist curiosity to enterprise-wide rollouts for knowledge workers, and the next wave of AI adoption will be defined by governance and token economics.
A CEO panel at an AI event sparked a simple but powerful question every startup founder should ask themselves: does your business get better as AI models improve, or does it get worse?
I gave five local LLMs and one frontier cloud model the same coding task on my homelab: build a tag manager for the blog's admin panel. Only two shipped anything. Here's what happened.
Four frontier models, ten tasks, one government shutdown. We ran Claude Fable 5 through the homelab benchmark harness three hours before Anthropic pulled the plug — and it came in second. Here's the full bakeoff.
DeepSeek V4-Pro, V4-Flash, and Zyphra ZAYA1 are three of the most exciting new models in local AI. None of them run on our RTX 5090 homelab — for completely different reasons. Here's the research, the math, and what it means for anyone building a local inference rig.
Two AI models got the same prompt: review the blog fodder, check for redundancy, and draft a post. Opus chose a debugging war story. Qwen chose a data-driven redesign. They didn't even pick the same fodder. Here's what the difference reveals about how models think about content.
We ripped out Ollama, migrated to llama.cpp, and benchmarked five local models across 12 tasks on an RTX 5090. The results surprised us — and the winner wasn't who we expected.
Gemma 4 failed to build a single feature in our last test. This time we diagnosed the problem, switched from Ollama to llama.cpp, tuned the inference settings, and Gemma shipped a working search feature to production. Then Opus reviewed the code and made it better. Here's what we learned about making local models actually work.
We pitted Gemma 4 against Opus 4.6 on a real feature build for vibescoder.dev. Gemma is the fastest model in our benchmark. It also couldn't finish the job. Here's what happened when we stopped testing toy apps and started building production code.
We added Google's Gemma 4 and Moonshot's 1-trillion-parameter Kimi K2 to the local model benchmark. Five out of six models earned perfect scores. Gemma 4 is the new speed king. And yes, we ran a 579 GB model off an NVMe drive, at 0.6 tokens per second.
We gave six LLMs the exact same coding prompt and measured everything: speed, tokens, and whether the code actually works. Three models earned perfect scores. Two built the wrong kind of app. One ran out of tokens mid-line.