Syndicating to Substack: The Undocumented Path

Substack's onboarding screen lists nine platforms it "seamlessly imports from": Medium, Ghost, WordPress, Mailchimp, Beehiiv, SeekingAlpha, Tumblr, TinyLetter, Blogspot. Underneath, in smaller text: "...or website with an RSS feed."

That second sentence is the one that matters for any blog not on the list. It's also the path Substack documents least, and, based on the four hours this took us, the one with the most quirky failure modes.

This is the writeup of getting vibescoder.dev (a custom Next.js blog with MDX content) onto Substack as a curated 13-post syndication. Every error we hit. Every dead end we ran down. Every workaround that ended up shipping.

The starting position

Vibescoder.dev has lived on its own domain since April. The stack is Next.js 16 on Vercel, MDX content in a private GitHub repo, deploy on push. The blog already had a working RSS feed at /feed.xml with all 34 published posts, full content in <content:encoded>. Total feed size: 544 KB.

The goal: get the curated subset (essays and frameworks, not build logs and homelab posts) onto Substack as a one-time bulk import. Treat Substack as a distribution channel, keep vibescoder.dev as the canonical home, use rel="canonical" to consolidate link equity back to the domain.

Simple, on paper.

Failure 1: Onboarding flow rejects unknown domains

First attempt: paste https://vibescoder.dev/feed.xml into substack.com/signup/import.

Unable to fetch any posts from this URL.

The onboarding importer at /signup/import appears to be gated to known platforms. As far as we can tell, it pattern-matches the URL against domains it recognizes (*.medium.com, *.ghost.io, *.wordpress.com, etc.) and rejects anything else without trying very hard to parse it. The error wording suggests it tried to fetch your URL; in practice it barely seems to try.

The fix: create the publication first (skip the import step on signup), then use the in-dashboard importer at Settings → Import → Import posts. That importer is more permissive and actually tries to parse arbitrary RSS.

Failure 2: In-dashboard importer also rejects the feed

The in-dashboard importer at least tries. But it also returned:

Unable to fetch any posts from this URL.

Time to actually diagnose instead of guessing. The Substack JS bundle reveals the API endpoint: POST /api/v1/import/posts with {url: "..."} in the body. Hitting it directly:

curl -sX POST "https://substack.com/api/v1/import/posts" \
  -H "Content-Type: application/json" \
  -H "Origin: https://substack.com" \
  -H "Referer: https://substack.com/signup/import" \
  --data '{"url":"https://vibescoder.dev/feed.xml"}'

{"error":"Unable to fetch any posts from this URL.","type":"single"}

Same error, but now scriptable. This is what unlocks a fast diagnostic loop: every change from here on is testable in two seconds.

What does work

To narrow the search, we pointed the same endpoint at five well-known blogs' feeds and compared the results with ours:

Feed	Size	`<content:encoded>`	Substack import
overreacted.io/rss.xml	23 KB	No	✅ 57 posts
stratechery.com/feed	47 KB	Yes	✅ 10 posts
kentcdodds.com/blog/rss.xml	97 KB	No	✅ 211 posts
joshwcomeau.com/rss.xml	114 KB	Yes	✅ 86 posts
leerob.com/feed.xml	(404 ATM)	—	❌
vibescoder.dev/feed.xml	544 KB	Yes	❌
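The same comparison can be scripted. Below is a minimal TypeScript sketch. It assumes Node 18+ for the global `fetch`, and it assumes the undocumented endpoint keeps the request and response shapes shown in this post. `probe` and `summarize` are hypothetical helper names, not part of any Substack API.

```typescript
// Response shapes as observed in this post; the endpoint is undocumented,
// so these types are assumptions, not a published contract.
type ImportResponse =
  | { error: string; type?: string }
  | { import_id: string; importer_name: string; num_posts: number };

// Turn one parsed response body into a row for the comparison table.
function summarize(feedUrl: string, body: ImportResponse): string {
  if ("error" in body) {
    return `${feedUrl}\t❌ ${body.error}`;
  }
  return `${feedUrl}\t✅ ${body.num_posts} posts (${body.importer_name})`;
}

// Hypothetical probe: one POST per feed, same headers as the curl call above.
async function probe(feedUrl: string): Promise<string> {
  const res = await fetch("https://substack.com/api/v1/import/posts", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Origin: "https://substack.com",
      Referer: "https://substack.com/signup/import",
    },
    body: JSON.stringify({ url: feedUrl }),
  });
  return summarize(feedUrl, (await res.json()) as ImportResponse);
}
```

Call `probe` in a loop over the table's URLs and you get one row per feed. The pure `summarize` half is split out so the formatting can be checked without network access.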

Pattern visible: the working feeds top out around 114 KB. Ours was almost 5× that. Hypothesis: Substack rejects feeds above some size threshold.

The first rebuild: structural alignment

Before doing anything about size, we rebuilt the feed to mirror Ghost's structure (the platform Substack imports most cleanly from). Changes:

Switched <author>email (Name)</author> to <dc:creator>Name</dc:creator>. RSS 2.0 requires emails in <author>, which exposes the writer's address and trips some importers' privacy filters.
Added xmlns:dc to the <rss> root.
Added <generator>, <image>, and <ttl> to the channel.
Set <guid isPermaLink="false"> so parsers don't try to validate the guid as a URL.
Changed Content-Type from application/xml to application/rss+xml.
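For reference, here is roughly what one item looks like with those changes applied. This is a sketch, not our actual route. The `Post` shape and `renderItem` are hypothetical, and body handling (`<content:encoded>`) is left out. The `dc:` prefix only resolves if the `<rss>` root declares `xmlns:dc="http://purl.org/dc/elements/1.1/"`.

```typescript
// Hypothetical post shape; adapt to your MDX frontmatter.
interface Post {
  title: string;
  slug: string;
  author: string;
  date: Date;
  description: string;
}

// Base URL for <link> and <guid> (illustrative constant).
const SITE_URL = "https://vibescoder.dev";

// Escape the five XML special characters for element text.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Render one Ghost-style <item>; <content:encoded> is omitted in this sketch.
function renderItem(post: Post): string {
  const url = `${SITE_URL}/posts/${post.slug}`;
  return [
    "<item>",
    `  <title>${escapeXml(post.title)}</title>`,
    `  <link>${url}</link>`,
    // isPermaLink="false": the guid is an opaque ID, not a URL to validate
    `  <guid isPermaLink="false">${url}</guid>`,
    // dc:creator carries a display name; RSS 2.0 <author> expects an email
    `  <dc:creator>${escapeXml(post.author)}</dc:creator>`,
    `  <pubDate>${post.date.toUTCString()}</pubDate>`,
    `  <description>${escapeXml(post.description)}</description>`,
    "</item>",
  ].join("\n");
}
```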

Result: still rejected.

Testing the size theory

To test the size hypothesis without breaking the main feed, we added a second route: /syndicate.xml. Same Ghost-style structure, but only 13 posts (filtered via a new syndicate: true frontmatter flag), and with <content:encoded> omitted entirely. The thinking: Substack would follow each <link> URL and parse the article from the HTML page, the same way they do for Medium imports.

Result: feed dropped to 9.7 KB. Still rejected.

So size isn't the only thing. Or it isn't the thing at all.
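The filtering behind /syndicate.xml is small. A minimal sketch follows, assuming posts arrive as parsed frontmatter objects. The `PostMeta` type and both helpers are hypothetical.

```typescript
// Hypothetical frontmatter shape after parsing the MDX files.
interface PostMeta {
  slug: string;
  syndicate?: boolean;
}

// Keep only posts explicitly flagged `syndicate: true`.
function syndicated(posts: PostMeta[]): PostMeta[] {
  return posts.filter((p) => p.syndicate === true);
}

// Serialized feed size in kilobytes, counting UTF-8 bytes rather than characters.
function feedSizeKB(xml: string): number {
  return new TextEncoder().encode(xml).length / 1024;
}
```

The strict `=== true` check excludes a quoted `syndicate: "true"`, which YAML parses as a string, rather than including it by accident. `feedSizeKB` measures bytes because that is what goes over the wire.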

The smoking gun

The breakthrough came from running an experiment we should have tried two hours earlier: serve the exact same XML bytes from a different host.

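# Copy our feed into a temp GitHub repo as feed.xml, served via raw.githubusercontent.com
mkdir feedtest && cd feedtest
curl -s https://vibescoder.dev/syndicate.xml -o feed.xml
git init -b master && git add feed.xml && git commit -m test
gh repo create carryologist/feedtest --public --source=. --push

# Try the importer (without a Content-Type header, curl sends --data as form-encoded)
curl -sX POST "https://substack.com/api/v1/import/posts" \
  -H "Content-Type: application/json" \
  --data '{"url":"https://raw.githubusercontent.com/carryologist/feedtest/master/feed.xml"}'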

{
  "import_id": "f0b7b849-ffff-4070-b358-490cb694f38b",
  "importer_name": "RSSPostImporter",
  "pub": {"name": "Vibes Coder"},
  "num_posts": 13
}

200. 13 posts. Same exact bytes.

The feed was fine all along. Substack's importer rejects it only when it's served from vibescoder.dev. We confirmed this by also mirroring the same bytes through webhook.site, with the same result: the feed imports from every other host we tried, just not from our origin.

Why does Substack reject our origin specifically?

We never fully solved this. The candidates we ruled out:

Not Cloudflare blocking — Substack's fetcher reaches our origin (we caught their requests via a webhook.site honeypot, traced them to AWS EC2 us-east-1, no User-Agent header).
Not the Content-Type — tried application/xml, application/rss+xml, text/xml; same rejection on each.
Not feed size — failed at 9 KB just as much as at 544 KB.
Not feed structure — bytes that work elsewhere fail at our origin.

Most plausible theory: domain reputation. The .dev TLD is relatively new, vibescoder.dev is two months old, and Substack likely has a domain-reputation check baked into their fetcher that silently 400s for unknown domains. Their fetcher running with no User-Agent reinforces this — it looks like a bot that's been hardened against scraping, and bots like that often have allowlists.

This is also consistent with Substack's incentive: they want to ingest from known blog platforms, not from arbitrary domains that might be content-spammers. False negatives (rejecting legitimate blogs) are cheaper for them than false positives (importing junk).

The workaround that shipped

Skip the fight. Mirror the feed via GitHub. The whole flow:

vibescoder.dev/syndicate.xml
  ↓  (manually re-publish to mirror repo)
github.com/carryologist/vibescoder-syndicate/blob/main/syndicate.xml
  ↓
raw.githubusercontent.com/carryologist/vibescoder-syndicate/main/syndicate.xml
  ↓  (paste into Substack importer)
13 imported posts on vibescoder.substack.com

This is dumb. It's also reliable, free, and took ~5 minutes to set up. For a one-time bulk import, mirroring is the right answer.

For ongoing syndication (where you want every new post to flow automatically), a GitHub Action on push that copies syndicate.xml from your blog to the mirror repo turns this into a 30-second sync. We haven't built that yet; it's earned its place on the TODO list.
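Such an Action mainly needs to decide one thing: whether the fresh feed differs from what's already mirrored. The sketch below illustrates that decision under heavy assumptions and is not something we've shipped. In particular, it assumes the feed regenerates `<lastBuildDate>` on every build, which would otherwise produce a mirror commit on every deploy. `normalizeFeed` and `mirrorNeedsUpdate` are hypothetical names.

```typescript
// Strip fields that change on every build (assumed: <lastBuildDate>) so that
// only real content changes trigger a mirror commit.
function normalizeFeed(xml: string): string {
  return xml.replace(/<lastBuildDate>[^<]*<\/lastBuildDate>/g, "").trim();
}

// `mirrored` is null when the mirror repo has no syndicate.xml yet.
function mirrorNeedsUpdate(mirrored: string | null, fresh: string): boolean {
  if (mirrored === null) return true;
  return normalizeFeed(mirrored) !== normalizeFeed(fresh);
}
```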

Failure 3: The first import succeeded but produced shells

Substack accepted the GitHub-hosted URL. 13 posts imported. We celebrated.

Then we opened one and saw three sentences of body text. Every imported post contained only the description field — no actual article.

Mistake on our part: we'd designed /syndicate.xml to omit <content:encoded>, on the theory that Substack would follow the <link> and parse the article from the page. That's not what Substack's importer does. It reads the body from <content:encoded> only. If the field is missing, the import is the description — a one-paragraph summary.

Fix: put <content:encoded> back. Same MDX→HTML pipeline we use for the main feed, scoped to the 13 syndicated posts. Total feed size with bodies included: 202 KB. That's well above the 114 KB of the largest working feed we tested, yet the importer accepted it from the GitHub mirror: one more sign that size was never the problem.
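One detail is worth getting right when putting bodies back. The contents of `<content:encoded>` are usually wrapped in CDATA, and any literal `]]>` in the HTML (a code sample about XML, say) would end the CDATA section early. The sketch below shows the standard workaround: split that sequence across two CDATA sections. The helper names are hypothetical. The `content:` prefix assumes the root declares `xmlns:content="http://purl.org/rss/1.0/modules/content/"`.

```typescript
// Wrap HTML in CDATA, splitting any "]]>" so it can't close the section early:
// "a]]>b" becomes <![CDATA[a]]]]><![CDATA[>b]]>, which parses back to "a]]>b".
function cdata(html: string): string {
  return `<![CDATA[${html.split("]]>").join("]]]]><![CDATA[>")}]]>`;
}

// The element the importer reads the post body from.
function contentEncoded(html: string): string {
  return `<content:encoded>${cdata(html)}</content:encoded>`;
}
```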

Failure 4: The re-import dedup gotcha

We deleted the 13 truncated posts from Substack, refreshed the mirror with the body-bearing feed, and re-ran the import. The API returned 200 with num_posts: 13.

But spot-checking the posts showed that only 11 had full bodies; the other two were still truncated at about 50 words each.
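Spot-checking by eye works for 13 posts, but the comparison is easy to script. A minimal sketch follows, assuming you already have each post's HTML from both sides, keyed by slug. How you collect the imported HTML is left open. `wordCount`, `findTruncated`, and the 50% threshold are hypothetical choices, and the tag-stripping regex is a rough approximation, not an HTML parser.

```typescript
// Rough word count: replace tags with spaces, then split on whitespace.
function wordCount(html: string): number {
  const text = html.replace(/<[^>]*>/g, " ").trim();
  return text === "" ? 0 : text.split(/\s+/).length;
}

// Slugs whose imported body is missing or shorter than minRatio of the source.
function findTruncated(
  source: Record<string, string>,
  imported: Record<string, string>,
  minRatio = 0.5,
): string[] {
  return Object.keys(source).filter((slug) => {
    const body = imported[slug];
    // A post missing from the import needs a re-import too.
    if (body === undefined) return true;
    return wordCount(body) < wordCount(source[slug]) * minRatio;
  });
}
```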

The cause is subtle. Our best explanation: Substack's importer deduplicates against publication history, not just live posts. When you delete a post from your archive, its GUID stays in the importer's memory, so a re-imported item carrying the same <guid> doesn't get its new content, even though the post no longer exists.

Of our 13 deletes, 11 cleared the dedup cache (we never figured out exactly why those and not the others). Two, closing-the-loop-from-audit-to-ten-commits and thursday-thoughts-the-models-we-cant-run, were "remembered" by the importer and came back truncated on re-import.

The fix is mechanical: change the <guid> for just those posts. We added a #reimport-v2 fragment to their GUIDs in the mirror. <link> stays the real URL (so canonicals work), and <guid> becomes a value Substack has never seen:

-<guid>https://vibescoder.dev/posts/closing-the-loop-from-audit-to-ten-commits</guid>
+<guid>https://vibescoder.dev/posts/closing-the-loop-from-audit-to-ten-commits#reimport-v2</guid>
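Editing GUIDs by hand is fine for two posts. For more, a small rewrite over the mirrored XML does the same thing. The sketch below assumes GUIDs are the bare post URLs, as in the diff above. `bustGuids` is a hypothetical helper, and the base URL is hard-coded for illustration.

```typescript
// Escape regex metacharacters so a URL can be matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Append #fragment to the <guid> of each listed slug; <link> is left untouched.
function bustGuids(xml: string, slugs: string[], fragment = "reimport-v2"): string {
  let out = xml;
  for (const slug of slugs) {
    const url = `https://vibescoder.dev/posts/${slug}`;
    // Match a <guid> (with or without attributes) whose text is exactly this URL.
    const pattern = new RegExp(`(<guid[^>]*>)${escapeRegExp(url)}(</guid>)`, "g");
    out = out.replace(pattern, `$1${url}#${fragment}$2`);
  }
  return out;
}
```

The pattern requires `</guid>` immediately after the URL, so running it twice doesn't append a second fragment.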

Delete the two posts again and re-run the import; this time Substack treats them as new content. Full bodies. Done.

Failure 5: The "you can't index us yet" wall

13 posts imported with full bodies. Open one on vibescoder.substack.com and the HTML head contains:

<meta name="robots" content="noindex" />
<link rel="canonical" href="https://vibescoder.substack.com/p/..." />

Substack auto-noindexes any publication that consists entirely of imported posts until the author has written at least one original post in the Substack editor. A UI string in their JS bundle says as much:

"This publication is temporarily not available to search engines because the author needs to create a new post other than..."

This is an anti-spam policy — they don't want syndication-farm publications appearing in Google. Reasonable.

Fix: write a short native post in the Substack editor. 600 words is plenty. Anything that demonstrates you'll actually compose on the platform, not just pipe imports through. Once that ships, the publication-level noindex gets lifted on Substack's next moderation pass (24h-1wk, by reports).
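A check along these lines, run on any fetched page, confirms when the noindex lifts without reading page source each time. It's a regex sketch, not a parser. It assumes `name` precedes `content` as in the snippet above. It also ignores the `X-Robots-Tag` response header, which can carry `noindex` too. The helper names are hypothetical.

```typescript
// Collect lowercase directives from every <meta name="robots" content="..."> tag.
function robotsDirectives(html: string): string[] {
  const re = /<meta\s+name=["']robots["']\s+content=["']([^"']*)["']/gi;
  const directives: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    for (const d of m[1].split(",")) directives.push(d.trim().toLowerCase());
  }
  return directives;
}

// True while the page still tells search engines not to index it.
function isNoindexed(html: string): boolean {
  return robotsDirectives(html).indexOf("noindex") !== -1;
}
```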

Failure 6 (in progress): Canonical URLs not user-settable

The whole point of the canonical-URL strategy is to tell Google: "yes, this content also lives at vibescoder.substack.com/p/..., but the authoritative version is at vibescoder.dev/posts/.... Consolidate signals there."

Substack's data model has a canonical_url field on each post (we confirmed it in the JS bundle). The HTML template renders it as <link rel="canonical"> when set. But the editor UI does not expose an input control for it on every account.
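If and when the field does appear, the same kind of check confirms that a post's canonical points home. A minimal sketch follows, assuming `rel` precedes `href` as in the head snippet earlier in this post. `canonicalHref` and `pointsHome` are hypothetical helpers.

```typescript
// Extract the href of the first <link rel="canonical"> tag, if any.
function canonicalHref(html: string): string | null {
  const m = /<link\s+rel=["']canonical["']\s+href=["']([^"']+)["']/i.exec(html);
  return m ? m[1] : null;
}

// True when the canonical URL's hostname is the blog's own domain.
function pointsHome(html: string, homeHost = "vibescoder.dev"): boolean {
  const href = canonicalHref(html);
  if (href === null) return false;
  try {
    return new URL(href).hostname === homeHost;
  } catch {
    return false; // relative or malformed href
  }
}
```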

Specifically, our publication's editor SEO panel shows:

SEO title
SEO description
Post URL (slug)
...and that's it.

No Canonical URL field. Either it's a gradual rollout that hasn't reached our publication yet, or it's account-tier-dependent (paid publications get it first?).

The available workarounds:

Wait. Substack rolls out UI changes gradually. The field may appear within weeks.
Email hello@substack.com. They've manually enabled canonical URLs for users in similar situations.
Accept the noindex. While the publication remains noindex, the missing canonical URL doesn't matter: Google keeps noindexed pages out of its index, so there's no duplicate for a canonical to consolidate.

We chose option 3 for now. If/when Substack lifts the noindex, we'll revisit options 1 and 2.

What the final flow looks like

Putting it all together for the next time someone needs to do this:

Build a curated feed. Add a syndicate: true frontmatter flag. Add a route (/syndicate.xml) that filters to flagged posts and emits Ghost-style RSS with <content:encoded> containing full HTML bodies. Mirror Ghost's feed structure closely for safety.
Mirror through GitHub. Copy syndicate.xml to a public repo. Substack's importer accepts the raw.githubusercontent.com URL.
Create the Substack publication. Skip the onboarding import step.
Write one short original post in the Substack editor. This is the gate that lifts publication-level noindex.
Import. Settings → Import → Import posts → paste the raw.githubusercontent.com URL. Confirm.
Spot-check word counts on a few posts. If any are truncated, delete them, bust their <guid> values in the mirror (add a #v2 suffix), and re-import.
Set canonical URLs on each imported post — if and only if the field appears in your editor's SEO panel. Otherwise email Substack support or wait.
Wait 1-7 days for Substack to lift the publication-level noindex. Confirm by checking the meta robots tag on any post.

For ongoing syndication (not just one-time import), add a GitHub Action that re-syncs the mirror on every push to your content repo. Then any post with syndicate: true in frontmatter flows to Substack automatically. Pair that with a python-substack worker if you want the canonical URL set programmatically going forward.

What I'd do differently

Three things, in order of how much time they would have saved:

Start with the API, not the UI. Hitting POST /api/v1/import/posts directly turned a 5-minute-per-attempt UI loop into a 2-second curl loop. Should have done this in the first 15 minutes, not after an hour.
Test against known-working feeds early. Comparing our feed against Overreacted, Stratechery, etc. via the same API surfaced the size and structural diffs in one experiment. We did this, just two hours later than we should have.
Test the origin in isolation. The breakthrough was "is it our content or our domain?" — answerable in five minutes by serving the same bytes from a temp GitHub repo. We should have run that experiment the moment the structural fixes weren't working.

The meta-lesson: when a black-box system rejects you, the productive direction isn't trying more variations of what you're sending — it's narrowing down what specifically the system objects to. Every minute spent on the format theory was a minute not spent isolating that the format was fine.

By the Numbers

Time spent on initial diagnosis (wrong direction): ~2 hours
Time to fix once the actual problem was identified: ~30 minutes
Total commits to the engine: 6 (HTML rendering, Ghost-shape, /import.xml [reverted], /syndicate.xml, content:encoded re-add, content-type fix)
GitHub mirror repos created: 2 (one test, one production)
Substack imports attempted: 4 (failed, succeeded-but-truncated, partial, complete)
Posts in the final Substack archive: 13 (essays, frameworks, Thursday Thoughts, Showdown Thoughts)
Posts still blog-only: 21 (build logs, Friday Fixes, homelab posts, infrastructure writeups)
Substack subscribers: 0, at time of writing — which is exactly the right number to start with