Friday Fixes: The Agent Was Flying Blind

Last Friday's post covered nine small improvements — CSS fixes, social cards, Slack integrations. This week's fixes are different. These aren't cosmetic. I discovered that the AI agent powering this entire blog had been silently broken since day one, compensating on its own without telling me. The startup script, the MCP config, the skill files — none of it was being delivered to workspaces. Every session started from scratch.

Here's how I found it, fixed it, and then kept pulling the thread until the whole developer experience was rebuilt.

1. The CRLF Bug That Broke Everything

The problem: I asked the agent to publish a blog post. Simple task — flip published: false to true, push, done. Instead, the agent spent several minutes exploring two repos, trying to figure out where posts live, how deploys work, and which repo to push to. It had zero context.

Root cause: The workspace template's startup script had Windows-style CRLF line endings (\r\n). Bash choked on line 1:

/bin/bash: line 1: set: -\r: invalid option

That meant .mcp.json and .agents/skills/vibescoder-blog/SKILL.md were never created in any workspace. Every agent session started completely blind. The agent compensated by installing tools itself and exploring repos manually — per the system instructions — so nothing visibly broke. But the efficiency gains I'd described in earlier posts? Aspirational, not operational.

The fix: Rewrote the startup script with LF line endings. But that was just the beginning.
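
A guard along these lines can catch stray carriage returns before a template is pushed. It's a minimal sketch, not the check used for this blog: the helper names has_crlf and strip_crlf and the startup.sh path are hypothetical.

```shell
# Sketch: detect and remove Windows-style CRLF line endings in a script.
# has_crlf and strip_crlf are hypothetical helper names.

# Succeeds (exit 0) if the file contains at least one carriage return.
has_crlf() {
  grep -q "$(printf '\r')" "$1"
}

# Rewrites the file in place with every carriage return removed.
# Writing back with cat (not mv) keeps the file's original permissions.
strip_crlf() {
  crlf_tmp="$(mktemp)"
  tr -d '\r' < "$1" > "$crlf_tmp" && cat "$crlf_tmp" > "$1"
  rm -f "$crlf_tmp"
}

# Example usage (startup.sh is a placeholder path):
#   has_crlf startup.sh && strip_crlf startup.sh
```

Running has_crlf in CI or a pre-commit hook would turn a silent failure into a loud one.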

The uncomfortable part: Two earlier published posts — "From Idea to Infrastructure" and "Downtime Is a Feature" — describe the startup script toolchain and MCP setup as working. They accurately describe what was configured, but the CRLF bug was already present. None of it was actually delivered until this fix. Worth noting for honesty's sake.

2. Teaching the Agent to Remember

The problem: Even after fixing the CRLF bug, the agent still needed to be told everything about the blog's architecture every session. Two repos, a deploy pipeline, frontmatter schema, security rules, writing conventions — all living in my head instead of the workspace.

The fix: Created the vibescoder-blog agent skill — a 4.6 KB markdown file at .agents/skills/vibescoder-blog/SKILL.md that documents everything the agent needs:

Both repos and their roles (engine vs. content)
The deploy pipeline (push content → GitHub Action → Vercel deploy hook)
Step-by-step publishing instructions
Post frontmatter schema
Content repo directory layout
Blog fodder format conventions
Writing style guidelines
Security redaction rules

The key line: "You do NOT need to touch the engine repo to publish content. Just push to the content repo."

With a user instruction ("When I mention the blog, vibescoder, or content work, read the skill 'vibescoder-blog'"), the agent lazy-loads this context on first reference. A 30-second publishing task is now actually a 30-second task.

3. Templates in Git

The problem: The Coder workspace template was edited through the web UI. No version control, no way for the agent to propose template fixes, and CRLF issues could creep in from browser-based editing. The CRLF bug that broke everything? Probably introduced during a UI edit.

The fix: Created a carryologist/coder-templates repo with the full Terraform source:

coder-templates/
├── docker/
│   ├── build/
│   │   └── Dockerfile
│   └── main.tf
└── README.md

The workflow: edit main.tf in the repo → push → SSH into workstation → coder templates push docker --yes. Optional GitHub Actions CI for auto-push on merge. Template changes are now reviewable, diffable, and blame-able.

4. The Nested Heredoc That Wouldn't Die

The problem: With the startup script in Git and CRLF fixed, the next step was embedding the MCP config and skill file directly in the Terraform template. Nested shell heredocs inside Terraform's <<-EOT seemed like the obvious approach.

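Root cause: Terraform's <<-EOT trims the indentation shared by every line of the script, and the nested heredocs' closing delimiters get shifted along with everything else. After rendering, the MCP and SKILL terminators weren't where bash expected them (a plain << heredoc closes only on a line containing nothing but the delimiter), so the heredoc never closed: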

syntax error: unexpected end of file

The fix: Base64 encoding. Encoded both files as base64 strings and decoded at runtime:

echo '<base64-string>' | base64 -d > /home/coder/.mcp.json
echo '<base64-string>' | base64 -d > /home/coder/.agents/skills/vibescoder-blog/SKILL.md

No heredocs, no whitespace sensitivity, no Terraform interpolation issues. Ugly but bulletproof.
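
For completeness, here is one way the base64 strings could be produced and checked before pasting them into the template. This is a sketch under assumptions: encode_file and decode_to are hypothetical helpers, and `base64 -d` is the same decode flag used in the commands above.

```shell
# Sketch: produce a single-line base64 string for a file and decode it again.
# encode_file and decode_to are hypothetical helper names.

# Prints the file's base64 encoding on one line (wrapping newlines removed),
# so the result can be pasted inside single quotes in the startup script.
encode_file() {
  base64 < "$1" | tr -d '\n'
}

# Decodes a base64 string into the target path, creating parent directories
# first so a nested path such as .agents/skills/... does not fail.
decode_to() {
  mkdir -p "$(dirname "$2")"
  printf '%s' "$1" | base64 -d > "$2"
}

# Example usage (paths are placeholders):
#   encoded="$(encode_file SKILL.md)"
#   decode_to "$encoded" /tmp/check/SKILL.md && cmp SKILL.md /tmp/check/SKILL.md
```

Comparing the decoded copy against the source with cmp is a cheap way to confirm nothing was mangled in the round trip.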

The full iteration log:

Attempt	Error	Fix
1	`set: -\r: invalid option`	CRLF → LF
2	`syntax error: unexpected end of file`	Nested heredocs → base64
3	Same error	Cached template version — re-cloned and pushed again
4	`Module "nvm" cannot be found`	Removed phantom nvm module
5	Success	Clean boot, skill + MCP config present

Five attempts. Each one taught something different about how Terraform, bash, and Coder templates interact.

5. Workspace Boot: 91 Seconds to 5

The problem: Coder workspaces took 91 seconds to start. Every single boot, not just the first one. The agent logs had been telling me the whole time:

2026-04-28 20:58:09.394 [info]  running agent script...
2026-04-28 20:59:40.948 [info]  script completed  execution_time=1m31.55332s  exit_code=0

Root cause: The default Docker template pattern — count = data.coder_workspace.me.start_count — recreates the container on every start/stop cycle. Only /home/coder persists via a Docker volume. Everything installed to /usr is gone. The startup script was running three apt-get update calls (37.7 MB of metadata each), reinstalling gh, nodejs, npm, sqlite3, redis-tools, and running npm install -g vercel (30+ seconds alone) on every boot.

The fix: Built a custom Docker image that bakes everything in:

FROM codercom/enterprise-base:ubuntu
USER root
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
      curl git zip unzip sqlite3 redis-tools nodejs npm \
    && rm -rf /var/lib/apt/lists/*
RUN curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg \
      | dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg 2>/dev/null && \
    echo "deb [arch=...] https://cli.github.com/packages stable main" \
      > /etc/apt/sources.list.d/github-cli.list && \
    apt-get update && apt-get install -y --no-install-recommends gh && \
    rm -rf /var/lib/apt/lists/*
RUN npm install -g vercel
RUN curl -LsSf https://astral.sh/uv/install.sh | sh && \
    mv /root/.local/bin/uv /usr/local/bin/uv
USER coder

Updated main.tf with a docker_image resource whose triggers block hashes the Dockerfile with filemd5, so Terraform rebuilds the image only when the Dockerfile actually changes:

resource "docker_image" "workspace" {
  name = "coder-workspace:latest"
  build { context = "./build" }
  triggers = {
    dockerfile_hash = filemd5("./build/Dockerfile")
  }
}

Stripped the startup script to auth + config only. No more install blocks.

Metric	Before	After
Startup time	91s	~5s
`apt-get update` calls per boot	3	0
Packages downloaded per boot	~52 MB	0 MB
First image build	N/A	~2 min (one-time)

18x faster. The answer was in the agent logs the whole time.
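
The arithmetic is easy to reproduce from the two log timestamps. The snippet below is a small sketch; to_seconds is a hypothetical helper, and it assumes both timestamps fall on the same day.

```shell
# Sketch: compute boot time from the two agent log timestamps, then the
# slowdown relative to the roughly 5-second boot after the fix.

# Converts an HH:MM:SS.mmm clock time to seconds since midnight.
to_seconds() {
  echo "$1" | awk -F: '{ printf "%.3f", $1 * 3600 + $2 * 60 + $3 }'
}

start="$(to_seconds 20:58:09.394)"   # "running agent script..."
end="$(to_seconds 20:59:40.948)"     # "script completed"

# Elapsed seconds between the two log lines, to one decimal place.
elapsed="$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.1f", e - s }')"

# Ratio against the approximate 5-second boot.
speedup="$(awk -v t="$elapsed" 'BEGIN { printf "%.1f", t / 5 }')"

echo "boot took ${elapsed}s, about ${speedup}x slower than 5s"
```

Run as written, this prints `boot took 91.6s, about 18.3x slower than 5s`, which lines up with the execution_time=1m31.55332s in the log.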

6. GitHub Auth Fix

The problem: Discovered while debugging the CRLF fallout — $GITHUB_TOKEN was empty in agent sessions. The agent was silently failing on GitHub operations and working around it by using gh auth login interactively or skipping auth-dependent steps entirely.

Root cause: The startup script was supposed to run coder external-auth access-token github and export the token, but the CRLF bug meant that line never executed. Even after the CRLF fix, the auth block needed to be in the right place in the stripped-down startup script.

The fix: Added the auth sequence to the new minimal startup script:

GITHUB_TOKEN=$(coder external-auth access-token github)
export GITHUB_TOKEN
echo "$GITHUB_TOKEN" | gh auth login --with-token
git config --global credential.helper \
  '!f() { echo "password=$GITHUB_TOKEN"; }; f'

Token from Coder's external auth → exported to env → piped into gh CLI → wired into git credential helper. Four commands, zero interactive prompts.
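
Since the original symptom was an empty token that nobody noticed, a guard like the one below could make that failure loud. It's a sketch, not part of the actual startup script: require_nonempty is a hypothetical helper, not something the Coder or gh CLIs provide.

```shell
# Sketch: fail loudly when a credential is empty instead of continuing blind.
# require_nonempty is a hypothetical helper name.

# Returns 0 if the value is non-empty; otherwise prints an error to stderr
# and returns 1 so the caller can abort.
require_nonempty() {
  cred_name="$1"
  cred_value="$2"
  if [ -z "$cred_value" ]; then
    echo "error: $cred_name is empty; aborting startup" >&2
    return 1
  fi
}

# Intended placement in the startup script (not run here):
#   GITHUB_TOKEN=$(coder external-auth access-token github)
#   require_nonempty GITHUB_TOKEN "$GITHUB_TOKEN" || exit 1
```

The design choice is simply to prefer a failed boot with a clear message over a session that quietly works around missing auth.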

7. Automated Screenshot Pipeline

The problem: Screenshots from homelab sessions were piling up with no workflow. The routine was to take a screenshot on the workstation, manually commit it to the content repo, and then, during blog sessions, analyze dozens of images one by one. We'd just done this for the Cloudflare and Round 2 posts, and it took real time.

The fix: Built an automated screenshot inbox using inotifywait and a systemd user service:

Take a screenshot on the workstation (Print Screen, gnome-screenshot, etc.)
The sync-screenshots service detects the new file in ~/Pictures/Screenshots/
Auto-commits and pushes to blog-drafts/screenshots/ in the content repo
During blog sessions, agents git pull and find screenshots waiting in the inbox
Agents analyze, select, rename, and place — the editorial process stays human/agent-directed

The whole thing is an idempotent setup script: verify gh CLI → clone repo if needed → install inotify-tools → write watcher script → write systemd service → enable + start → enable lingering (survives reboots without a desktop session).
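
The watcher itself can be quite small. The following is a sketch in the spirit of sync-screenshots.sh, not its actual contents: the function names, the REPO_DIR location, and the commit message format are all assumptions, and it expects inotifywait (from inotify-tools) and git to be installed.

```shell
# Sketch of an inotifywait-based screenshot watcher. All names and paths
# below are illustrative assumptions.

WATCH_DIR="${WATCH_DIR:-$HOME/Pictures/Screenshots}"
REPO_DIR="${REPO_DIR:-$HOME/content-repo}"   # hypothetical clone location

# Copies one screenshot into the inbox directory and prints the new path.
stage_screenshot() {
  mkdir -p "$2"
  cp "$1" "$2/" && echo "$2/$(basename "$1")"
}

# Commits and pushes a single staged file from the content repo.
publish_screenshot() {
  git -C "$REPO_DIR" add "$1" &&
  git -C "$REPO_DIR" commit -m "screenshot: $(basename "$1")" &&
  git -C "$REPO_DIR" push
}

# Blocks forever, handling each file once it has been fully written
# (close_write) or moved into the watched directory (moved_to).
watch_screenshots() {
  inotifywait -m -q -e close_write -e moved_to --format '%f' "$WATCH_DIR" |
  while read -r shot_name; do
    staged="$(stage_screenshot "$WATCH_DIR/$shot_name" \
      "$REPO_DIR/blog-drafts/screenshots")" && publish_screenshot "$staged"
  done
}

# Start watching only when asked, so sourcing this file has no side effects.
if [ "${1:-}" = "--watch" ]; then
  watch_screenshots
fi
```

A systemd user service would then run the script with --watch and restart it on failure.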

Meta moment: The first screenshot the pipeline synced was a screenshot of the setup script's own "Setup complete!" output.

8. Blog Post Style Consistency

The problem: The Round 2 Model Showdown draft was missing the ending structure every other post follows. Analysis of all 12 published posts revealed a consistent template: "What I Learned" → "What's Next" → "By the Numbers." The Round 2 draft had the first but was missing "What's Next" entirely, and "By the Numbers" used plain text instead of bold metrics.

The fix: Added a "What's Next" section teasing the Gemma vs Opus head-to-head on a real production task. Reformatted "By the Numbers" from - 6 local models benchmarked (parenthetical) to - **6** local models benchmarked — context with em dash. Matches every other post on the site.

Small, but consistency is the difference between a blog and a collection of posts.

9. Image Curation and Security Review

The problem: 21 raw screenshots from the April 25–26 sessions sitting in blog-drafts/ with timestamp filenames. Two unpublished posts (Cloudflare/MCP and Round 2 Showdown) had zero images despite covering highly visual topics.

The fix: Analyzed all 21 screenshots via OCR. Selected 9, rejected 12.

Selected: 6 for the Cloudflare post (DNS setup, tunnel success, SSL config, the AI bot toggle screenshot every content creator needs to see) and 3 for Round 2 (the 579 GB download progress bar, the conversation mode bug, raw terminal benchmark output).

Rejected: 5 redundant Cloudflare UI pages, 1 full desktop with email visible in browser tabs, 1 Coder settings page with "API Keys" sidebar visible (no keys shown but bad optics), 1 low-res terminal, 2 workspace debugging screenshots, 1 image with unrelated content, 1 benchmark command list.

Every selected image passed a security review for API keys, tokens, email addresses, internal URLs, and passwords before inclusion.
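
Part of that review can be automated once OCR text exists for each image. The sketch below assumes the OCR output has already been saved as one text file per screenshot; looks_sensitive is a hypothetical helper, and the patterns are illustrative assumptions that are far from exhaustive. A human pass is still needed.

```shell
# Sketch: flag OCR text that looks like it contains sensitive material.
# The patterns cover email addresses, GitHub-style token prefixes,
# key/password assignments, and a few private-network URLs.

# Succeeds (exit 0) if the text file matches any pattern and should be held
# back for manual review; fails if nothing suspicious was found.
looks_sensitive() {
  grep -Eiq \
    -e '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' \
    -e 'gh[pousr]_[A-Za-z0-9]{20,}' \
    -e '(api[_-]?key|token|password|secret)[[:space:]]*[:=]' \
    -e 'https?://(localhost|10\.|192\.168\.)' \
    "$1"
}

# Example usage (ocr/ is a placeholder directory of OCR text files):
#   for f in ocr/*.txt; do looks_sensitive "$f" && echo "REVIEW: $f"; done
```

A match only means "look closer"; screenshots like the "API Keys" sidebar, where nothing secret is shown, still need a judgment call.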

10. Scheduling This Post (While Writing This Post)

This one happened in real time. I was reviewing the draft of this very post with the agent and realized: I'm going to want this to go live at 7:00 AM on Friday, not whenever I happen to remember to flip a flag. Thursday Thoughts on Thursdays, Friday Fixes on Fridays — if the blog has a recurring content calendar, it needs scheduled publishing.

The problem: Publishing a post meant manually flipping published: false to true and pushing. No way to write a post on Wednesday night and have it go live Friday morning.

The fix: A new publishAt frontmatter field and a GitHub Action that runs every 15 minutes:

published: false
publishAt: '2026-05-02T07:00:00-05:00'

The scheduled-publish.yml workflow scans all .mdx files for the combination of published: false and a publishAt timestamp in the past. When it finds one, it flips the flag, removes the publishAt line (so frontmatter stays clean), commits as scheduled-publish[bot], and pushes. The existing deploy trigger fires on that push — Vercel rebuilds, post goes live.

The publishAt field accepts any ISO 8601 timestamp with a timezone offset, so 07:00:00-05:00 means 7:00 AM at UTC-5 (Central Daylight Time in May) regardless of where the GitHub runner is.
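
The per-file check described above could look roughly like this. It is a sketch, not the contents of scheduled-publish.yml: the function names are hypothetical, `date -d` assumes GNU date (as on Ubuntu runners), and for simplicity it matches lines anywhere in the file rather than only inside the frontmatter block.

```shell
# Sketch of the per-file logic a scheduled publisher might run.
# publish_at, is_due, and publish_now are hypothetical helper names.

# Prints the publishAt value from a file, with surrounding quotes removed.
publish_at() {
  sed -n 's/^publishAt:[[:space:]]*//p' "$1" | head -n 1 | tr -d "'\""
}

# Succeeds if the file is unpublished and its publishAt time has passed.
# Both times are compared as Unix epoch seconds, so the offset in the
# timestamp is honored wherever the script runs.
is_due() {
  grep -q '^published:[[:space:]]*false' "$1" || return 1
  due_when="$(publish_at "$1")"
  [ -n "$due_when" ] || return 1
  [ "$(date -d "$due_when" +%s)" -le "$(date +%s)" ]
}

# Flips the flag and drops the publishAt line, writing via a temp file.
publish_now() {
  pub_tmp="$(mktemp)"
  sed -e 's/^published:[[:space:]]*false/published: true/' \
      -e '/^publishAt:/d' "$1" > "$pub_tmp" && cat "$pub_tmp" > "$1"
  rm -f "$pub_tmp"
}

# Example usage (content/posts is taken from the file list in this post):
#   for f in content/posts/*.mdx; do is_due "$f" && publish_now "$f"; done
```

The workflow would follow the loop with a commit and push, which is what fires the existing deploy trigger.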

The meta moment: The first post to use scheduled publishing is this one. The publishAt in the frontmatter above was added during the same agent session that wrote the workflow. We built the feature and immediately dogfooded it — the agent shipped the infrastructure and then used it on itself.

What I Learned

Invisible bugs are the most expensive. The CRLF bug didn't crash anything. The agent silently compensated — installing tools itself, exploring repos manually — so nothing visibly broke. But every session paid a tax: minutes of unnecessary exploration, 91 seconds of unnecessary boot time, zero institutional memory. The "working" system was burning time on every interaction.

Skills are the agent's long-term memory. Without the skill file, every session started with the agent rediscovering the blog's architecture from scratch. With it, the agent knows both repos, the deploy pipeline, the frontmatter schema, and the security rules before it writes a single line. The difference between a capable assistant and an amnesiac one is a 4.6 KB markdown file.

Bake what you know, script what changes. The startup script pattern — install tools on every boot — works for prototyping. Once you know your toolchain, put it in a Docker image and strip the script to auth and config. 91 seconds to 5 seconds, and the only cost is a two-minute one-time build.

Five attempts is normal. The template push cycle (CRLF, heredocs, caching, phantom module, success) felt frustrating in the moment. But each failure was a different class of bug (encoding, Terraform semantics, caching behavior, dependency resolution). Five attempts across four different failure modes isn't thrashing. It's debugging.

Files Changed

docker/main.tf — startup script CRLF fix, base64 skill/MCP injection, boot optimization, auth fix
docker/build/Dockerfile — new, custom workspace image with all tools baked in
.agents/skills/vibescoder-blog/SKILL.md — new, agent skill for blog operations (delivered via base64 in template)
.mcp.json — new, MCP server config (delivered via base64 in template)
scripts/setup-screenshot-sync.sh — new, idempotent screenshot pipeline installer
~/.local/bin/sync-screenshots.sh — new, inotifywait-based file watcher
~/.config/systemd/user/sync-screenshots.service — new, systemd user service
blog-drafts/screenshots/README.md — new, screenshot inbox conventions
content/posts/model-showdown-round-2-*.mdx — added "What's Next," reformatted "By the Numbers"
.github/workflows/scheduled-publish.yml — new, cron-based auto-publisher

What's Next

Gemma 4 isn't done. The Model Showdown Round 2 results were disappointing — Gemma stopped generating mid-response and scored a zero on the website search task. But the research since then uncovered the real problem: invisible thinking tokens eating the num_predict budget, meaning Gemma was silently using its output quota on reasoning before it ever started writing code.

The fix is straightforward — bump num_ctx and num_predict to 32768, giving both the thinking process and the actual output room to breathe. The VRAM math works on the RTX 5090 with Q4_K_M quantization. We're going to get local models performing better and rerun the exact same website search task. Same prompt, same evaluation criteria, updated config. If Gemma can actually finish the task, the local-vs-cloud story gets a lot more interesting.

By the Numbers

10 fixes shipped in one week
5 template push attempts before clean success
91s → 5s workspace boot time (18x faster)
4.6 KB agent skill file replacing minutes of exploration per session
3 apt-get update calls eliminated per boot
52 MB → 0 MB downloaded per boot
21 screenshots analyzed, 9 selected, 12 rejected
1 CRLF bug silently breaking every workspace since template creation
1 meta screenshot — the pipeline capturing its own setup
~2 hours from "publish a blog post" to fully working template with skills
~30 seconds estimated time for the same task going forward
1 post scheduled to publish itself using the feature it describes