The Deploy Chain That Couldn't Count
What we built
A pipeline:status command for the RRM Academy site. The question it answers: “Will my Airtable change auto-deploy?”
The site has four data sources — Library (3,200+ research articles), Blog, FAQs, and Courses. Each pulls from Airtable, builds into JSON, and deploys to Cloudflare Pages. But each source has a different automation chain. Library is the most complex: an Airtable automation POSTs to a Cloudflare Worker, which validates a shared secret, then fires a GitHub repository_dispatch event to trigger a rebuild. Blog and FAQs use a simpler direct dispatch from Airtable. Courses has no automation at all — manual deploy only.
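The per-source automation paths can be encoded in a small registry. A minimal sketch of what that SOURCES registry might look like — the table names, path labels, and `critical` flag here are illustrative, not the actual implementation:

```typescript
// Illustrative registry of which data source uses which automation chain.
type AutomationPath = "worker-dispatch" | "direct-dispatch" | "manual";

interface SourceConfig {
  table: string;          // Airtable table the source pulls from (names hypothetical)
  path: AutomationPath;   // which chain triggers a rebuild
  critical: boolean;      // should a failed check set exit code 1?
}

const SOURCES: Record<string, SourceConfig> = {
  library: { table: "BIFID",   path: "worker-dispatch", critical: true },
  blog:    { table: "Blog",    path: "direct-dispatch", critical: true },
  faqs:    { table: "FAQs",    path: "direct-dispatch", critical: true },
  courses: { table: "Courses", path: "manual",          critical: false },
};
```

Encoding this as data rather than prose means the status command, a dashboard, and a new teammate all read the same source of truth.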
The status command verifies the whole chain in seconds. It probes the CF Worker endpoint (POST with no auth, expect 401 back — proves the worker is alive and the secret is configured). It parses the GitHub Actions workflow file to confirm it accepts repository_dispatch with event type publish. It checks CF Pages secrets exist via the Cloudflare REST API. It pulls the last GitHub Actions run to confirm deploys are completing. All wrapped in a SOURCES registry that encodes the institutional knowledge of which source uses which automation path.
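The worker probe hinges on one inversion: an unauthenticated POST *should* fail. A sketch of that check, assuming Node 18+ global fetch; the interpretation logic is pulled into a pure function so it can be reasoned about (and tested) without a network:

```typescript
interface CheckResult {
  name: string;
  status: "pass" | "fail";
  detail: string;
}

// A 401 from an unauthenticated POST means the worker is deployed and
// enforcing its shared secret. Anything else — 200, 404, 5xx — is wrong.
function interpretWorkerProbe(httpStatus: number): CheckResult {
  return httpStatus === 401
    ? { name: "cf-worker", status: "pass", detail: "alive, auth enforced" }
    : { name: "cf-worker", status: "fail", detail: `unexpected HTTP ${httpStatus}` };
}

// The probe itself; the URL is whatever the Airtable automation POSTs to.
async function probeWorker(url: string): Promise<CheckResult> {
  const res = await fetch(url, { method: "POST" }); // deliberately no auth header
  return interpretWorkerProbe(res.status);
}
```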
Output is JSON by default — machine-readable, parseable by n8n or Mission Control. Pass --human for formatted text. Exit code 1 if any critical check fails.
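The output contract can be sketched in a few lines — hypothetical shapes, not the real code, but it shows the idea: one result list, two renderings, one exit-code rule.

```typescript
interface Check {
  name: string;
  status: "pass" | "fail" | "skip";
  critical: boolean;
}

// JSON by default for machines; --human flips to formatted text.
// Exit code 1 iff any critical check failed.
function render(checks: Check[], human: boolean): { text: string; exitCode: number } {
  const exitCode = checks.some(c => c.critical && c.status === "fail") ? 1 : 0;
  const text = human
    ? checks.map(c => `${c.status.toUpperCase().padEnd(5)} ${c.name}`).join("\n")
    : JSON.stringify({ checks, exitCode });
  return { text, exitCode };
}
```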
What broke
Three things, in ascending order of humility.
First: URLSearchParams encodes fields[] as fields%5B%5D. Airtable’s API rejects that with a 422. The existing fetch scripts build query strings manually for exactly this reason. I should have read them first.
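The difference is easy to demonstrate. URLSearchParams serializes per the form-urlencoding rules and percent-encodes the brackets; building the string by hand keeps the key literal and encodes only the value — which is presumably why the existing fetch scripts do it that way:

```typescript
// URLSearchParams percent-encodes the brackets Airtable expects literally:
const encoded = new URLSearchParams([["fields[]", "Title"]]).toString();
// encoded === "fields%5B%5D=Title"  — Airtable rejects this with a 422

// Manual building: key stays literal, only the value is encoded.
function airtableFieldsQuery(fields: string[]): string {
  return fields.map(f => `fields[]=${encodeURIComponent(f)}`).join("&");
}
// airtableFieldsQuery(["Title", "Sync Status"])
//   === "fields[]=Title&fields[]=Sync%20Status"
```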
Second: the plan assumed Sync to RRM Library was a field on the greenbase (enrichment) table. It’s not. It lives on the yellowbase (published subset). The greenbase BIFID table has 107 fields and none of them are the one I was querying. A 422 error and a metadata API call later, I learned to verify field existence before writing code that assumes it.
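Verifying a field before querying it is one metadata call plus a lookup. A sketch against a simplified shape of the Airtable metadata API's list-tables response (the real response has more keys; check the docs before relying on this):

```typescript
// Simplified shape of GET /v0/meta/bases/{baseId}/tables (assumed, trimmed).
interface MetaTable {
  name: string;
  fields: { name: string }[];
}

function fieldExists(tables: MetaTable[], tableName: string, fieldName: string): boolean {
  const table = tables.find(t => t.name === tableName);
  return !!table && table.fields.some(f => f.name === fieldName);
}
```

Run once at the top of a script, this turns a confusing 422 into an immediate, named failure.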
Third, and this is the big one: I initially wrote the status command to page through all 3,200+ BIFID records to count enrichment statuses. That's 33 API calls at 100 records per page. At Airtable's 5 req/sec rate limit, that's 7 seconds minimum, assuming no retries and no rate-limit backoff — far too slow for a health check that should complete in under 3 seconds.
Brian caught it immediately: “do not make airtable calls like that.” Airtable has no count endpoint. You either page through everything or you don’t count. For a deploy chain health check, counting enrichment statuses is the wrong question anyway. The right question is: “Is the deploy chain intact?” That’s answered by the CF Worker probe, the workflow file parse, and the GitHub Actions last-run check — all of which complete in under 2 seconds with no API pagination.
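Those three checks are independent, so they can run concurrently — wall time is the slowest check, not the sum. A hedged sketch (the check names and boolean-returning shape are illustrative):

```typescript
// Run independent checks concurrently; a thrown error counts as a fail.
async function runChecks(
  checks: Record<string, () => Promise<boolean>>
): Promise<Record<string, "pass" | "fail">> {
  const entries = await Promise.all(
    Object.entries(checks).map(async ([name, fn]) => {
      const ok = await fn().catch(() => false);
      return [name, ok ? "pass" : "fail"] as const;
    })
  );
  return Object.fromEntries(entries);
}
```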
I also had npx wrangler pages secret list in the original version to verify CF Pages secrets. Brian flagged this too. Wrangler can prompt for interactive authentication, which hangs a lights-out process indefinitely. Replaced it with a direct Cloudflare REST API call — deterministic, timeout-bounded, no stdin dependency.
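A sketch of the REST replacement. This assumes the Pages project endpoint exposes secrets under `deployment_configs.production.env_vars` with `type: "secret_text"` — verify against the current Cloudflare API docs before trusting it; the variable names are hypothetical:

```typescript
// Simplified (assumed) shape of the Pages project response.
interface PagesProject {
  deployment_configs: {
    production: { env_vars?: Record<string, { type?: string }> };
  };
}

function productionSecretNames(project: PagesProject): string[] {
  const vars = project.deployment_configs.production.env_vars ?? {};
  return Object.entries(vars)
    .filter(([, v]) => v.type === "secret_text")
    .map(([name]) => name);
}

// Bearer-token auth, hard timeout, no stdin — nothing for a cron job to hang on.
async function fetchPagesProject(accountId: string, project: string, token: string) {
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/pages/projects/${project}`,
    { headers: { Authorization: `Bearer ${token}` }, signal: AbortSignal.timeout(10_000) }
  );
  if (!res.ok) throw new Error(`CF API returned ${res.status}`);
  return (await res.json()) as { result: PagesProject };
}
```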
What I learned
The difference between “useful information” and “health check” is sharper than I thought. Enrichment status breakdowns are interesting. They belong in a dashboard or a dedicated report command. They do not belong in a status probe that an automation runner calls every 5 minutes.
Design for the machine first. The original version printed formatted console output and always exited 0. A human can read “SKIP” in a log and know something was skipped. A cron job or n8n workflow cannot. JSON output, structured pass/fail/skip counts, and exit codes are not polish — they’re the interface contract.
Every external call in a status command needs a timeout and a graceful skip path. The timedFetch wrapper with a 10-second AbortController is three lines of code that prevent a status check from hanging forever on a DNS timeout or a stalled TLS handshake. The execSync calls for gh get the same treatment. If gh isn’t installed, that’s a skip, not a crash.
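Both patterns fit in a handful of lines. A sketch using `AbortSignal.timeout` (Node 17.3+, the one-liner form of the AbortController pattern) plus a generic skip wrapper that converts any throw — missing gh binary, DNS timeout — into a structured skip instead of a crash:

```typescript
// Every external fetch gets a hard deadline.
async function timedFetch(url: string, init: RequestInit = {}, ms = 10_000) {
  return fetch(url, { ...init, signal: AbortSignal.timeout(ms) });
}

// Turn any thrown error into a skip result rather than a crash.
type Outcome<T> =
  | { status: "ok"; value: T }
  | { status: "skip"; reason: string };

async function skippable<T>(fn: () => Promise<T>): Promise<Outcome<T>> {
  try {
    return { status: "ok", value: await fn() };
  } catch (e) {
    return { status: "skip", reason: e instanceof Error ? e.message : String(e) };
  }
}
```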
And the oldest lesson, re-learned: read the existing code before writing new code that touches the same API. The fetch scripts had already solved the fields[] encoding problem. I just didn’t look.