Friday Fixes #1: Two Bugs, One Workflow
The scheduled-publish.yml workflow runs every 15 minutes. It scans every
.mdx file, finds posts where published: false and publishAt is in the
past, flips the flag, commits, and pushes. Vercel picks up the push. Post
goes live. Simple.
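A minimal sketch of the flip step, not the workflow's actual code. It assumes GNU sed (-i with no backup suffix) and frontmatter that opens with --- on line 1. publish_in_place is a hypothetical name. The address range 2,/^---$/ confines the substitution to the frontmatter, so a published: false line in the body text is never rewritten.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the flip step, not the workflow's actual code.
# Assumes GNU sed (-i with no backup suffix) and frontmatter that opens
# with '---' on line 1.
set -euo pipefail

# publish_in_place FILE
# The address range 2,/^---$/ runs from line 2 to the closing '---', so
# only a frontmatter line is rewritten; body text is never touched.
publish_in_place() {
  sed -i '2,/^---$/ s/^published: false$/published: true/' "$1"
}
```

Deciding whether a file is due is a separate step; this function only rewrites a line that is already there.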
It broke twice in nine days. The second break was caused by the fix for the first.
Bug 1: The Grep That Read the Whole File (May 3)
The workflow's detection logic was one line:
```bash
if grep -q 'published: false' "$file"; then
```
That scans the entire file, frontmatter and body text both. On May 3 a scheduled draft failed to publish, and the workflow log showed it dying in 7 seconds. The culprit: "Friday Fixes: The Agent Was Flying Blind."
That post was already live. It had published: true in its frontmatter.
But it also had published: false in its body — in the section explaining
how the publishAt field works, where I'd written out example frontmatter:
```yaml
published: false
```
Grep matched the example in the prose. The workflow entered the processing block, tried to flip a flag that wasn't in the frontmatter, and failed. Seven seconds, start to crash.
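The failure reproduces with a made-up post of the same shape: live frontmatter, plus a frontmatter example in the body. In this sketch the post content is hypothetical, not the real post. The whole-file grep prints the draft message even though the frontmatter says published: true.

```bash
#!/usr/bin/env bash
# Reproduction of Bug 1 with a made-up post: live frontmatter plus a
# 'published: false' example in the body text.
set -euo pipefail

post=$(mktemp)
trap 'rm -f "$post"' EXIT

cat > "$post" <<'EOF'
---
title: 'How publishAt works'
published: true
---
To schedule a post, write frontmatter like this:

published: false
EOF

# The original whole-file check matches the example in the body.
if grep -q 'published: false' "$post"; then
  echo "whole-file grep: treated as a scheduled draft"
fi
```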
The self-referential shape is hard to miss. The post that introduced scheduled publishing was the first thing the feature's own bug tripped over.
The fix: extract frontmatter first, then grep and parse from that.
```bash
FRONTMATTER=$(sed -n '2,/^---$/p' "$file")
if echo "$FRONTMATTER" | grep -q 'published: false'; then
  PUBLISH_AT=$(echo "$FRONTMATTER" | grep '^publishAt:' | sed "s/publishAt: '//;s/'//")
  # ... rest of processing
fi
```
After the fix I ran the workflow manually. It correctly detected 5 real
scheduled drafts, published them all, and left the already-published post
alone. I also noticed the admin link was missing from the desktop nav —
Header.tsx had it in the hamburger menu but not in the top bar. Added it
while I was in there.
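The same extraction extends to the publishAt comparison. This is a sketch under stated assumptions, not the committed fix: is_due is a hypothetical helper, it relies on GNU date -d, and it assumes publishAt is a single-quoted timestamp that date can parse, such as '2025-05-12T12:00:00Z'. It also departs from the fix in two deliberate ways. It feeds grep through here-strings instead of echo | grep pipes, so grep -q exiting early has no upstream writer to kill. It also matches with grep -x, so only a whole published: false line counts.

```bash
#!/usr/bin/env bash
# Sketch only: is_due is a hypothetical helper. Assumes GNU date (-d) and a
# single-quoted publishAt timestamp that date can parse.
set -euo pipefail

# is_due FILE [NOW_EPOCH]
# Succeeds when the frontmatter has 'published: false' and a publishAt at or
# before NOW_EPOCH (default: now). Body text is never consulted.
is_due() {
  local file=$1 now=${2:-$(date +%s)} fm publish_at due
  fm=$(sed -n '2,/^---$/p' "$file")
  # <<< feeds grep from a string, so there is no upstream writer for an
  # early exit to kill. -x demands a whole-line match.
  grep -qx 'published: false' <<< "$fm" || return 1
  # Pull the value out of  publishAt: '...'  (empty if the line is absent).
  publish_at=$(sed -n "s/^publishAt: '\(.*\)'\$/\1/p" <<< "$fm")
  [[ -n $publish_at ]] || return 1
  due=$(date -d "$publish_at" +%s) || return 1
  (( due <= now ))
}
```

A loop over the content directory would call is_due on each .mdx file and hand the ones that succeed to the flip-and-commit step.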
Bug 2: The Dead-Code Line That Wasn't Harmless (May 12)
The May 4 commit that introduced frontmatter extraction also included a verification line — something I'd written to sanity-check the sed pattern during development and then left in:
```bash
sed -n '1,/^---$/!{/^---$/,/^---$/p}' "$file" | head -1 > /dev/null 2>&1
```
This line discards everything. Stdout to /dev/null, stderr to /dev/null,
exit code gone. Except it wasn't, because set -euo pipefail was active.
Here's what happens. head -1 reads one line and exits, closing the read
end of the pipe. sed's next write goes to a pipe with no reader, and the
kernel sends it SIGPIPE. Under normal circumstances that's fine: sed dies,
the shell records its status as 141 (128 + 13, SIGPIPE's signal number),
and the pipeline still reports head's 0. Under pipefail, the pipeline
instead reports the rightmost non-zero status, which is sed's 141. Under
set -e, that non-zero status kills the script. The > /dev/null 2>&1
redirect applies only to head and only silences output; it does nothing
about the exit status of the pipeline.
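The mechanism reproduces in a few lines. In this sketch, seq stands in for the original sed. That substitution is an assumption: any writer that outlives head should behave the same way. Each case runs in a subshell so one failure doesn't end the demo, which is also why the script itself doesn't set -e. The non-zero status is typically 141. If the parent process ignores SIGPIPE, seq gets a write error instead and exits with a plain failure status.

```bash
#!/usr/bin/env bash
# Sketch of the Bug 2 mechanism with seq standing in for the original sed.
# No set -e at the top on purpose: each case runs in its own ( subshell ).

big_writer()   { seq 1 1000000; }  # several MB, far more than a pipe buffer
small_writer() { seq 1 5; }        # a few bytes, written in one go

# 1. No pipefail: the pipeline reports head's status, which is 0.
( big_writer | head -1 > /dev/null 2>&1 )
no_pipefail=$?

# 2. pipefail: the writer was killed by SIGPIPE, and its non-zero status
#    becomes the pipeline's. The redirect binds to head alone.
( set -o pipefail; big_writer | head -1 > /dev/null 2>&1 )
with_pipefail=$?

# 3. set -e + pipefail: the subshell exits at the pipeline; echo never runs.
( set -eo pipefail; big_writer | head -1 > /dev/null 2>&1; echo "unreachable" )
with_errexit=$?

# 4. Small input: the writer finishes before head closes the pipe, so it passes.
( set -eo pipefail; small_writer | head -1 > /dev/null 2>&1 )
small_input=$?

echo "no pipefail=$no_pipefail pipefail=$with_pipefail errexit=$with_errexit small=$small_input"
```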
Why it took eight days to notice: the race is probabilistic. If sed
finishes writing before head closes the pipe — because frontmatter is
short and the file is small — sed exits cleanly. With one or two drafts,
sed almost always won. As drafts accumulated, the probability of losing
the race on at least one file per run climbed toward 100%.
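The climb toward 100% follows from treating each draft as an independent trial. Suppose each draft loses the race with some small probability p per run. Here p is hypothetical; the real value depends on file sizes and runner load. A run with n drafts then fails if any one of them loses. With p = 0.05, for example:

$$
\begin{aligned}
P(\text{run fails}) &= 1 - (1 - p)^{n} \\
n = 2:\quad 1 - 0.95^{2} &= 0.0975 \\
n = 20:\quad 1 - 0.95^{20} &\approx 0.64
\end{aligned}
$$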
Timeline:
- May 4: bug introduced, 1–2 drafts in repo, sed almost always finished before head closed the pipe
- May 5–8: intermittent (58 successful runs, ~10 losses), looked like runner noise
- May 9, 17:11 UTC: last successful run
- May 9–12: 42 consecutive failures, zero notifications
- May 12, ~12:00 UTC: the-fix-that-was-fixed-four-times misses its slot; I notice two hours later when the post isn't live
GitHub emails you when a workflow transitions from passing to failing. Keep failing and you get nothing. By the time I had 42 consecutive failures, the notification had fired once — probably on May 9 — and been absorbed into some digest I'd dismissed. The ongoing silence was indistinguishable from the workflow running cleanly.
The right health metric for a cron job isn't "did it fail" — it's "when did it last succeed." I had no visibility into the latter.
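One way to track "when did it last succeed" is to compare the latest successful run's timestamp against a threshold. This is a sketch, not something the post's workflow does. last_success_is_stale is a hypothetical helper, and the one-hour threshold (four missed 15-minute slots) is an assumption. The commented lines that fetch real data assume the GitHub CLI's gh run list filters and GNU date.

```bash
#!/usr/bin/env bash
# Sketch of a "when did it last succeed" check. The helper name and the
# one-hour threshold are assumptions, not part of the original workflow.
set -euo pipefail

MAX_AGE_SECONDS=$((60 * 60))  # four missed 15-minute slots

# last_success_is_stale LAST_EPOCH NOW_EPOCH MAX_AGE
# Succeeds (exit 0) when the last success is more than MAX_AGE seconds old.
last_success_is_stale() {
  local last=$1 now=$2 max_age=$3
  (( now - last > max_age ))
}

# Feeding it real data, assuming the GitHub CLI and GNU date are available
# (with no successful runs at all, jq yields "null" and date fails):
#   last_iso=$(gh run list --workflow scheduled-publish.yml --status success \
#                --limit 1 --json updatedAt --jq '.[0].updatedAt')
#   last=$(date -d "$last_iso" +%s)
#   if last_success_is_stale "$last" "$(date +%s)" "$MAX_AGE_SECONDS"; then
#     echo "scheduled-publish.yml has not succeeded in over an hour" >&2
#     exit 1
#   fi
```

The check has to run outside the workflow it watches, on a separate schedule or an external monitor. Inside scheduled-publish.yml, it would die with everything else.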
The fix: delete the line.
```diff
-sed -n '1,/^---$/!{/^---$/p}' "$file" | head -1 > /dev/null 2>&1
```
Three lines removed (command, blank line, and the comment above it). The
real frontmatter extraction on the next line —
FRONTMATTER=$(sed -n '2,/^---$/p' "$file") — had been working the
entire time. The verification line was never doing anything useful.
After the delete and push, I ran gh workflow run scheduled-publish.yml
manually to recover the missed slot. The post published within a minute.
What Connects Them
Both bugs are about code that looks inert but isn't.
In Bug 1, the grep line looked like a safe filter. The assumption that
published: false would only appear in frontmatter was invisible — there
was no code encoding that assumption, just the pattern itself. Body text
violated it immediately.
In Bug 2, the dead-code line looked like it was doing nothing — output to
/dev/null, stderr to /dev/null, result irrelevant. But it was creating
a pipeline under a shell mode where a broken pipe is fatal. The
> /dev/null made it look inert. The SIGPIPE made it a probabilistic
kill switch.
Dead code with side effects is worse than dead code without. Under
set -euo pipefail, any pipeline where the right side terminates early
(head, grep -m 1, awk 'NR==1{exit}') can kill the script if the left
side is still writing. If you want the first line of a file, read the first
line — don't pipe the whole file through head.
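Reading the first line without a pipe takes one call to the read builtin. In this sketch, first_line is a hypothetical helper name. No upstream process exists, so nothing can be killed by SIGPIPE.

```bash
#!/usr/bin/env bash
# Sketch: read line 1 of a file with no pipe, so there is no upstream
# writer for SIGPIPE to kill. first_line is a hypothetical helper name.
set -euo pipefail

# first_line FILE: print the first line of FILE.
first_line() {
  local line=""
  [[ -r $1 ]] || return 1
  # read returns non-zero at EOF without a trailing newline (or on an
  # empty file) even when it filled $line, so '|| true' keeps set -e quiet.
  IFS= read -r line < "$1" || true
  printf '%s\n' "$line"
}

# Also pipe-free: the tool opens the file itself, so quitting early is safe.
#   head -n 1 "$file"
#   awk 'NR == 1 { print; exit }' "$file"
```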
This class of race condition doesn't fail cleanly. It produces a noise floor that rises asymptotically to 100%: indistinguishable from background noise until the crossover point, then complete silence — which looks identical to everything working.
If the unquoted-date YAML bug in Friday Fixes #2 feels related — it is. Same week, same content pipeline, different failure mode. That one hid in a draft post and only surfaced on a route that touches unpublished content. Same pattern of a bug that's invisible to the main code path until a specific condition exposes it.
By the Numbers
- 7 seconds — time to failure for Bug 1
- 1 self-referential bug — the post about scheduling broke scheduling
- 5 real scheduled drafts correctly detected after the Bug 1 fix
- 42 consecutive workflow failures during the Bug 2 window (May 9–12)
- 0 email notifications during those 42 failures
- 3 lines removed to fix Bug 2
- 8 days between introducing Bug 2 and fixing it
- ~2 hours between "post should have published" and "I noticed it didn't"
- 1 manual workflow_dispatch to recover the missed slot