The Fix That Was Fixed Four Times

My wife started using the homelab Coder instance this week. She's a fellow vibe coder, she has her own GitHub account, and she wanted to push code from her workspace. The agent told her GitHub was scoped to read-only.

That's how a Sunday afternoon turned into a five-problem debugging cascade, a forced migration cleanup, an accidental outage of the very AI assistant helping me debug, and an archaeological dig through my own blog fodder. The dig revealed that the same bug had been discovered and "fixed" four times in ten days, without the correct fix ever being deployed.

1. The Config That Wasn't

The first thing I checked was Admin Settings → External Authentication. The page showed exactly one thing: "No providers have been configured!"

This was confusing, because I distinctly remembered setting up a GitHub OAuth App weeks ago. The Client ID was in my GitHub Developer Settings. The env file at /etc/coder.d/coder.env had all the right variables:

CODER_EXTERNAL_AUTH_0_TYPE=github
CODER_EXTERNAL_AUTH_0_CLIENT_ID=Ov23li<redacted>
CODER_EXTERNAL_AUTH_0_CLIENT_SECRET=<redacted>
CODER_EXTERNAL_AUTH_0_MCP_URL=https://api.githubcopilot.com/mcp/
CODER_EXPERIMENTS=oauth2,mcp-server-http

Everything looked right. But "looks right" and "is loaded" are different things.

The agent suggested checking whether the running Coder process actually had these variables:

# -o picks the oldest match, so the path stays valid if several processes match
sudo cat /proc/$(pgrep -o -f 'coder server')/environ \
  | tr '\0' '\n' \
  | grep CODER_EXTERNAL_AUTH

Nothing. The process had zero external auth variables. The env file existed. Coder was running. But the two had never met.

Root cause: The systemd service file had no EnvironmentFile= directive. The env file was sitting there, perfectly formatted, completely ignored. The service file looked like this:

[Service]
Type=simple
ExecStart=/usr/bin/coder server --http-address 0.0.0.0:3000
Restart=always
RestartSec=5
User=youruser
Environment=HOME=/home/youruser
Environment=CODER_EXPERIMENTS=agents

No EnvironmentFile=/etc/coder.d/coder.env. One line missing, entire feature broken.

The fix: Add EnvironmentFile=/etc/coder.d/coder.env to the [Service] section. Then daemon-reload and restart.
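
A quick way to catch this class of problem before restarting is to check the unit text itself. The sketch below is a minimal check, not a full unit-file parser: the function name unit_loads_env_file is made up for illustration, and it only looks for an uncommented EnvironmentFile= line that mentions the given path.

```shell
# Reads unit-file text on stdin and succeeds only if an uncommented
# EnvironmentFile= directive mentions the env file passed as $1.
# (Hypothetical helper name; a plain text match, not a systemd parser.)
unit_loads_env_file() {
  env_file=$1
  # Drop comment lines (# or ;) first, then look for the directive literally.
  grep -v '^[[:space:]]*[#;]' | grep -Fq "EnvironmentFile=$env_file"
}
```

Assuming the unit is named coder (the post never states the unit name), usage would be along the lines of `systemctl cat coder | unit_loads_env_file /etc/coder.d/coder.env && echo loaded`. This would be followed by `sudo systemctl daemon-reload` and `sudo systemctl restart coder` once the line is in place. Piping from systemctl cat has the advantage of including drop-in overrides, not just the main unit file.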

2. The Flag I Killed While Fixing the Flag

The service file also had a hardcoded Environment=CODER_EXPERIMENTS=agents line. The agent told me to remove it since the env file already had CODER_EXPERIMENTS defined. Made sense — don't duplicate config.

After restarting, the External Auth page showed the GitHub provider. Progress. But the Agents tab was gone.

The env file had CODER_EXPERIMENTS=oauth2,mcp-server-http. The hardcoded line I had just removed was the only thing enabling agents, and the agents flag had never been added to the env file.

The fix: Update the env file to CODER_EXPERIMENTS=oauth2,mcp-server-http,agents.
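
When the same variable is defined in two places, merging the values before deleting either copy avoids silently dropping a flag. The helper below is a small sketch of that idea; merge_flags is a hypothetical name, and it treats each argument as a comma-separated list.

```shell
# Merge comma-separated flag lists into one list, dropping duplicates
# and empty entries while keeping first-seen order.
# (Hypothetical helper, not part of the Coder CLI.)
merge_flags() {
  printf '%s\n' "$@" | tr ',' '\n' | awk 'NF && !seen[$0]++' | paste -sd, -
}
```

Called as `merge_flags "oauth2,mcp-server-http" "agents"`, it combines the env file's value with the hardcoded one, and the result can be written back to the env file as the single source of truth.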

The meta-moment: I couldn't use my homelab agent to fix this because the agents feature was the thing I'd just broken. I had to open a separate Coder session on my work instance and troubleshoot from there. Debugging your AI coding assistant with your AI coding assistant — when the first one is broken.

3. The Domain Migration Aftershock

With agents back and external auth configured, my wife tried to link her GitHub account. She clicked "Click to Login" in her user settings and got a redirect URI mismatch error from GitHub.

A few weeks ago, I migrated the Coder instance from a *.pit-1.try.coder.app tunnel URL to a Cloudflare-backed custom domain at coder.vibescoder.dev. The CODER_ACCESS_URL was updated. DNS was working. The UI loaded fine. But the GitHub OAuth App still had the old URLs:

Homepage URL: https://xxxxxxxxx.pit-1.try.coder.app
Callback URL: https://xxxxxxxxx.pit-1.try.coder.app/external-auth/github/callback

Updated both to the new domain. No Coder restart needed — this is GitHub-side config.

Gotcha on the callback path: The callback URL uses the provider ID, not a numeric index. Since I didn't set CODER_EXTERNAL_AUTH_0_ID explicitly, Coder defaults to using the TYPE value as the ID. The correct path is /external-auth/github/callback, not /external-auth/0/callback. The first attempt with /0/ failed silently.
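
To make the path rule concrete, here is a small sketch that builds the callback URL from the access URL and the provider ID. The function name is hypothetical, and the fallback to "github" mirrors the default described above for this particular setup rather than any general rule.

```shell
# Build the callback URL that must be registered in the GitHub OAuth App.
# $1 = Coder access URL, $2 = provider ID (falls back to "github" here,
# matching the TYPE used in this post). Hypothetical helper name.
external_auth_callback_url() {
  access_url=${1%/}            # drop one trailing slash, if present
  provider_id=${2:-github}
  printf '%s/external-auth/%s/callback\n' "$access_url" "$provider_id"
}
```

Running `external_auth_callback_url https://coder.vibescoder.dev` (assuming the new domain is served over https) produces the value to paste into the Callback URL field.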

After the fix, my wife authorized the OAuth App. GitHub showed "Authorize carryologist" — which briefly confused us, since that's my handle, not hers. But that's standard OAuth: the app is owned by me, and she's granting it permission to act on her behalf. App owner ≠ authorizing user.

4. The Agent That Still Couldn't Push

External auth: configured. OAuth app: linked. Second user: authenticated. Everything should work.

I went back to my Coder Agent workspace to push the blog fodder file I'd been writing about this whole saga. The agent couldn't clone my private repo. "Invalid username or token."

The external auth token existed — coder external-auth access-token github returned a valid token. But git operations failed because the credential helper was reading from an empty environment variable:

git config --global credential.helper
# → !f() { echo "password=$GITHUB_TOKEN"; echo "username=x-access-token"; }; f

The helper sends $GITHUB_TOKEN as the password. But $GITHUB_TOKEN was empty. It was being set in .bashrc:

export GITHUB_TOKEN=$(coder external-auth access-token github 2>/dev/null)
export GH_TOKEN="$GITHUB_TOKEN"

The problem: .bashrc only runs in interactive shells. The Coder Agent runs git operations in a non-interactive context. No .bashrc sourcing, no $GITHUB_TOKEN, no authentication. The credential helper faithfully sent a blank password on every request.
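
The behavior is easy to reproduce in isolation. The sketch below uses a throwaway HOME and a made-up variable named DEMO_TOKEN (neither comes from the real setup) to show that a non-interactive bash never sees an export that lives only in .bashrc.

```shell
# Throwaway HOME with a .bashrc that exports a demo variable.
# DEMO_TOKEN and the temp directory exist only for this illustration.
demo_home=$(mktemp -d)
printf 'export DEMO_TOKEN=from-bashrc\n' > "$demo_home/.bashrc"

# Start bash non-interactively with a clean environment. stdin is
# redirected so bash cannot mistake this for a remote-shell session,
# which is the one case where it reads .bashrc anyway.
noninteractive=$(env -i HOME="$demo_home" PATH="$PATH" \
  bash -c 'echo "${DEMO_TOKEN:-unset}"' </dev/null)
echo "non-interactive shell sees: $noninteractive"

rm -rf "$demo_home"
```

The variable comes back unset because nothing in that process ever sourced the file, which is the same situation the credential helper was in.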

The fix: Change the credential helper to fetch the token inline instead of reading an environment variable:

# Before (broken for agents):
git config --global credential.helper \
  '!f() { echo "password=$GITHUB_TOKEN"; echo "username=x-access-token"; }; f'
 
# After (works everywhere):
git config --global credential.helper \
  '!f() { echo "password=$(coder external-auth access-token github 2>/dev/null)"; echo "username=x-access-token"; }; f'

Updated the template's main.tf, pushed it with coder templates push, then ran coder update blog-fodder to rebuild the workspace.
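
One way to confirm what git will actually send, without attempting a clone, is to ask git's credential machinery directly. This is a sketch under the assumption that the helper is configured globally as shown above; show_git_credential and mask_password are hypothetical names, and the password is masked so the token never lands in a terminal log.

```shell
# Replace the password value with a marker so the secret is never printed.
mask_password() {
  sed -E 's/^(password=).+$/\1<set>/; s/^password=$/password=<EMPTY>/'
}

# Ask git which credentials its configured helpers return for a host.
# $1 = host name (defaults to github.com). Hypothetical helper name.
show_git_credential() {
  host=${1:-github.com}
  printf 'protocol=https\nhost=%s\n\n' "$host" |
    GIT_TERMINAL_PROMPT=0 git credential fill |
    mask_password
}
```

Running `show_git_credential` in both an interactive terminal and the agent's context should show whether the password is populated in each. A blank value points at the helper, not at the OAuth setup.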

Another gotcha: coder restart does not pick up template changes. It restarts using the same build. You need coder update <workspace> to rebuild with the latest template version. This distinction will trip up anyone who doesn't know to look for it.
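
Wrapping the two steps together makes it harder to do one and forget the other. The function below is a minimal sketch; push_and_update is a hypothetical name, and the template name and directory passed to it are placeholders, since the post does not say what the template is called.

```shell
# Push a template directory, then rebuild a workspace on the new version.
# $1 = template name, $2 = template directory, $3 = workspace name.
# (Hypothetical wrapper; the arguments are supplied by the caller.)
push_and_update() {
  coder templates push "$1" --directory "$2" --yes &&
    coder update "$3"
}
```

For the workspace in this post, the call would look something like `push_and_update <template-name> ~/coder-templates/<template-dir> blog-fodder`, with the first two arguments filled in for the actual template.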

5. The Archaeology

With everything finally working, I asked the agent to search through all my previous blog fodder and published posts to find when this credential helper pattern was introduced. What it found was worse than I expected.

The same bug had been discovered and "fixed" four times in ten days:

| Date | Session | What Happened | Did It Stick? |
| --- | --- | --- | --- |
| ~Apr 24 | Gemma research | Agent silently failing auth. Applied `.bashrc` export fix. Pushed via `coder templates push`. | Overwrote whatever was live with a weaker fix |
| Apr 28 | Housekeeping | Recognized `.bashrc` doesn't work for agents. | No fix: documented workaround in a skill file |
| Apr 30 | Deploy Day | Full 3-layer diagnosis. Applied the correct fix: Terraform `env {}` block + inline credential helper + `.bashrc` cleanup. Committed to `coder-templates` git repo. | Never pushed: agent lacked `coder templates push` permissions |
| May 3 | This session | Second user can't push. Same broken pattern. | Fixed, finally |

The April 30 fix was the right one. It used the Coder Terraform provider's coder_external_auth data source to inject GITHUB_TOKEN and GH_TOKEN at the process level, with no shell init files needed and no dependence on variables exported by a shell. It even cleaned up stale .bashrc entries. It's documented in my own published post, "Invisible Failures."

But the agent that wrote it didn't have template admin permissions. It committed the fix to the coder-templates git repo and moved on. Nobody ran coder templates push. The fix sat in version control, correct and complete, for three days.

Meanwhile, the workstation's local copy of ~/coder-templates was seven commits behind the git repo. When I tried to git pull today, it had a merge conflict with the manual credential helper edit I'd just made. After stashing and pulling, the full April 30 fix — including the Terraform env {} block — was finally pushed to the live Coder server.
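
Checking how far a local checkout has drifted takes one fetch and one count. The sketch below assumes the checked-out branch tracks an upstream branch; commits_behind is a hypothetical helper name.

```shell
# Print how many commits the current branch of a checkout is behind
# its upstream. $1 = path to the repository. (Hypothetical helper;
# assumes the branch has an upstream configured.)
commits_behind() {
  repo=$1
  git -C "$repo" fetch --quiet &&
    git -C "$repo" rev-list --count 'HEAD..@{upstream}'
}
```

Running `commits_behind ~/coder-templates` before pushing a template would have surfaced the gap between the local copy and the repo before any manual edits were layered on top.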

Three days late. Four discoveries. Zero deployments of the correct fix until today.

What I Learned

A fix that can't be deployed isn't a fix. The agent committed a comprehensive solution to version control and moved on. But the agent didn't have permission to run coder templates push, and nobody flagged that as a follow-up. When an AI assistant tells you it committed a fix, you need to ask: "Is it deployed?" If the answer involves "you'll need to manually..." — that's not done, that's a TODO.

The same bug will keep hiding if the symptom is silent. The credential helper sent blank passwords. Git returned "authentication failed." The agent worked around it. Nobody crashed, nobody alerted, nobody noticed — until a second user showed up and didn't have the workarounds baked into her muscle memory. Adding a second user to any system is the fastest way to find configuration debt.

Shell init files are a liability for non-interactive contexts. Bash reads .bashrc only for interactive shells, and many distributions add an interactive guard at the top of the file as well. .profile runs only for login shells. Neither is guaranteed in an agent's execute() call. If a token needs to be available everywhere, inject it at the process level: Terraform env {} blocks, systemd Environment= directives, container env vars. Anything that doesn't depend on which shell sourced which file.

coder update vs coder restart is a critical distinction. Restart reuses the existing build. Update rebuilds with the latest template. If you push a template change and restart, nothing changes. This will quietly waste an hour of anyone's time the first time they hit it.

Domain migrations have a long tail. I updated CODER_ACCESS_URL and DNS weeks ago. Everything seemed fine. But the GitHub OAuth App still had the old callback URL, silently waiting to break the first time someone tried to authenticate. Migration checklists need to include every external service that has a callback or webhook pointing at the old URL.

By the Numbers

5 chained problems from one "can't push to GitHub" symptom
3 Coder sessions needed to debug — homelab agent, work agent, homelab terminal
4 times the same credential helper bug was discovered and "fixed" in 10 days
3 days the correct fix sat in a git repo, never deployed
7 commits behind — the gap between the workstation's local template and the git repo
1 missing line (EnvironmentFile=) caused the first two problems
1 stale callback URL from a domain migration broke OAuth for every new user
0 crashes, 0 alerts — every failure was silent
3 experiment flags that all need to coexist: oauth2, mcp-server-http, agents
~90 minutes from "External Auth shows nothing" to fully working multi-user push access
1 wife who just wanted to push some code