vibescoder

Friday Fixes #1: Two Bugs, One Workflow

·6 min read

The scheduled-publish.yml workflow runs every 15 minutes. It scans every .mdx file, finds posts where published: false and publishAt is in the past, flips the flag, commits, and pushes. Vercel picks up the push. Post goes live. Simple.

It broke twice in nine days. The second break was caused by the fix for the first.

Bug 1: The Grep That Read the Whole File (May 3)

The workflow's detection logic was one line:

if grep -q 'published: false' "$file"; then

That scans the entire file — frontmatter and body text both. On May 3 a scheduled draft failed to publish, and the workflow log showed it dying in 7 seconds. The culprit: "Friday Fixes: The Agent Was Flying Blind."

That post was already live. It had published: true in its frontmatter. But it also had published: false in its body — in the section explaining how the publishAt field works, where I'd written out example frontmatter:

published: true

Grep matched the example in the prose. The workflow entered the processing block, tried to flip a flag that wasn't there in the frontmatter, and failed. Seven seconds, start to crash.

The self-referential shape is hard to miss. The post that introduced scheduled publishing was the first thing the feature's own bug tripped over.

The fix: extract frontmatter first, then grep and parse from that.

FRONTMATTER=$(sed -n '2,/^---$/p' "$file")
 
if echo "$FRONTMATTER" | grep -q 'published: false'; then
  PUBLISH_AT=$(echo "$FRONTMATTER" | grep '^publishAt:' | sed "s/publishAt: '//;s/'//")
  # ... rest of processing
fi

After the fix I ran the workflow manually. It correctly detected 5 real scheduled drafts, published them all, and left the already-published post alone. I also noticed the admin link was missing from the desktop nav — Header.tsx had it in the hamburger menu but not in the top bar. Added it while I was in there.

Bug 2: The Dead-Code Line That Wasn't Harmless (May 12)

The May 4 commit that introduced frontmatter extraction also included a verification line — something I'd written to sanity-check the sed pattern during development and then left in:

sed -n '1,/^---$/!{/^---$/,/^---$/p}' "$file" | head -1 > /dev/null 2>&1

This line discards everything. Stdout to /dev/null, stderr to /dev/null, exit code gone — except it wasn't, because set -euo pipefail was active.

Here's what happens. head -1 reads one line and exits, closing the read end of the pipe. sed writes to a closed pipe and receives SIGPIPE. Under normal circumstances that's fine — sed exits 141, everyone moves on. Under pipefail, the non-zero exit from the left side of the pipe propagates. Under set -e, the script dies. The > /dev/null 2>&1 redirect silences output; it does nothing about the exit code of the pipeline.

Why it took nine days to notice: the race is probabilistic. If sed finishes writing before head closes the pipe — because frontmatter is short and the file is small — sed exits cleanly. With one or two drafts, sed almost always won. As drafts accumulated, the probability of losing the race on at least one file per run climbed toward 100%.

Timeline:

  • May 4: bug introduced, 1–2 drafts in repo, sed almost always finished before head closed the pipe
  • May 5–8: intermittent — 58 successful runs, ~10 losses, looked like runner noise
  • May 9, 17:11 UTC: last successful run
  • May 9–12: 42 consecutive failures, zero notifications
  • May 12, ~12:00 UTC: the-fix-that-was-fixed-four-times misses its slot; I notice two hours later when the post isn't live

GitHub emails you when a workflow transitions from passing to failing. Keep failing and you get nothing. By the time I had 42 consecutive failures, the notification had fired once — probably on May 9 — and been absorbed into some digest I'd dismissed. The ongoing silence was indistinguishable from the workflow running cleanly.

The right health metric for a cron job isn't "did it fail" — it's "when did it last succeed." I had no visibility into the latter.

The fix: delete the line.

-sed -n '1,/^---$/!{/^---$/p}' "$file" | head -1 > /dev/null 2>&1

Three lines removed (command, blank line, and the comment above it). The real frontmatter extraction on the next line — FRONTMATTER=$(sed -n '2,/^---$/p' "$file") — had been working the entire time. The verification line was never doing anything useful.

After the delete and push, I ran gh workflow run scheduled-publish.yml manually to recover the missed slot. The post published within a minute.

What Connects Them

Both bugs are about code that looks inert but isn't.

In Bug 1, the grep line looked like a safe filter. The assumption that published: false would only appear in frontmatter was invisible — there was no code encoding that assumption, just the pattern itself. Body text violated it immediately.

In Bug 2, the dead-code line looked like it was doing nothing — output to /dev/null, stderr to /dev/null, result irrelevant. But it was creating a pipeline under a shell mode where broken pipelines are fatal. The > /dev/null made it look inert. The SIGPIPE made it a probabilistic kill switch.

Dead code with side effects is worse than dead code without. Under set -euo pipefail, any pipeline where the right side terminates early (head, grep -m 1, awk 'NR==1{exit}') can kill the script if the left side is still writing. If you want the first line of a file, read the first line — don't pipe the whole file through head.

This class of race condition doesn't fail cleanly. It produces a noise floor that rises asymptotically to 100%: indistinguishable from background noise until the crossover point, then complete silence — which looks identical to everything working.

If the unquoted-date YAML bug in Friday Fixes #2 feels related — it is. Same week, same content pipeline, different failure mode. That one hid in a draft post and only surfaced on a route that touches unpublished content. Same pattern of a bug that's invisible to the main code path until a specific condition exposes it.

By the Numbers

  • 7 seconds — time to failure for Bug 1
  • 1 self-referential bug — the post about scheduling broke scheduling
  • 5 real scheduled drafts correctly detected after the Bug 1 fix
  • 42 consecutive workflow failures during the Bug 2 window (May 9–12)
  • 0 email notifications during those 42 failures
  • 3 lines removed to fix Bug 2
  • 8 days between introducing Bug 2 and fixing it
  • ~2 hours between "post should have published" and "I noticed it didn't"
  • 1 manual workflow_dispatch to recover the missed slot

Comments