Editor's note (May 2026): this is a retrospective from when the studio was just the two founders, alone. Stackpulse is now founder-led with a small team behind us, shipping work for clients in five timezones, from London to Auckland. The lessons below are what got us to that point; the discipline points still hold.

For most of FeedPulse's first two years, we were running it badly. Not catastrophically badly — no months-long outages, no customer data leaks, nothing that ended up in a Hacker News thread about why early-stage SaaS shouldn't bootstrap. Just the slow, steady kind of badly that comes from running a multi-tenant system with two engineers, no on-call rota, and an unfashionable desire to keep operating costs under £200 a month for as long as we possibly could.

We crossed 14 active agencies and 71,000 synced products on the platform in May this year. Looking back at what that growth looked like operationally, there are maybe four decisions that mattered and a much longer list of decisions that didn't.

The boring monolith decision

The first non-decision we made was to keep FeedPulse a Laravel monolith. That sounds like a deliberate engineering call now; looking back, it was mostly that we couldn't afford the time to split it up, and it wasn't breaking. The pressure to fragment came almost entirely from outside — a couple of advisors, a contracted engineer who'd come from a microservices shop, one early enterprise prospect who asked "how do you scale this?" in a way that was clearly fishing for the right answer.

We never gave them the right answer. Two years later, the same monolith handles every channel adapter, every queue worker, every tenant's feed-transformation job, the customer dashboard, and the admin console. The Postgres instance has grown from a db.t3.small to a db.r6g.large; the Redis cache from 256MB to 1.5GB. That's the entire infrastructure timeline.

The boring monolith is the single decision that made running this tractable for two engineers. Every other operational quality we've built — observability, deployment cadence, anomaly detection — has been cheaper to add because there's only one place to add it.

The on-call rota that didn't exist

For the first eighteen months, "on-call" meant "Rikky has notifications on overnight". There was no rotation. No formal escalation. No runbooks worth the name. The alerts that fired came from Sentry and a custom Slack channel, and the ones that mattered also arrived as customer emails to hello@.

This was bad. It was also fine, because the alert volume was low enough that the cost to one person's sleep was bearable for a while, and the system was simple enough that recovery actions were obvious to whoever was awake.

We added a real two-person rota in month 19, after the third 3am page in a single fortnight (one of which is the subject of another post on this journal). The rota didn't change the fundamental quality of the system — it changed who got broken by it. Rikky stopped being a single point of failure for the platform's recoverability.

The microservices pitch we declined

Around month 14, with five active agencies and our first push into ad-channel territory, we got pitched a microservices rebuild. The pitch was good. Channel adapters as separate services, dedicated workers per tenant tier, a service mesh, a message bus, the whole shape. It would have cost us roughly six months of engineering capacity and forced us to operate at least eight production components instead of three.

We declined for one reason: a two-engineer studio cannot competently operate eight production components. Even if the architecture is theoretically more scalable, you can't scale past an operational ceiling that's set by your headcount.

The right architecture for a two-engineer SaaS is the smallest architecture you can fit your business into. We fit FeedPulse into one Laravel app, one Postgres, one Redis, one queue, one observability stack. Five things instead of fifteen.

What the numbers actually say

Two years in, the platform's sat at 99.97% uptime on a trailing 90-day window. We've had three measurable customer-facing incidents in that period. One was the GMC mass-disapproval wave; one was a Cloudflare Workers regional outage we had nothing to do with but had to communicate around; one was a deployment that introduced a regression in feed transformation for tenants using a specific edge-case rule combination, caught in 28 minutes by the affected tenant's internal QA.
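For a sense of scale, the arithmetic behind that uptime figure can be worked through in a few lines. This is a back-of-envelope sketch rather than how we measure uptime: the function name is made up for illustration, and it assumes uptime is counted in minutes across the whole window.

```python
def downtime_budget_minutes(uptime_pct: float, window_days: int) -> float:
    """Minutes of downtime implied by an uptime percentage over a window."""
    total_minutes = window_days * 24 * 60  # 90 days -> 129,600 minutes
    return total_minutes * (1 - uptime_pct / 100)


budget = downtime_budget_minutes(99.97, 90)
print(f"{budget:.1f} minutes")  # prints "38.9 minutes"
```

In other words, 99.97% over a trailing 90 days leaves room for a little under 39 minutes of downtime in total.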

We deploy roughly twice a week. There's no formal release cadence; we ship when something's ready and reverts are rare. Reverts are rare not because we're great, but because the surface area we deploy to is small enough that the test pass we run before deploys catches the obvious things.

What we'd do differently

The list is shorter than you'd expect. The big architectural decisions hold up. What I'd change, in order of how much it would have helped:

Build the on-call rota at month one, not month nineteen. Even if the alert volume is zero. The cultural cost of having no rota outpaces the operational cost.
Invest in observability before you need it. The 3am page caught us flat-footed because we hadn't set up the disapproval-rate anomaly detection until after the incident. Most monitoring you wish you had costs nothing in advance.
Write the runbooks earlier. Not because you'll need them — most days you won't — but because writing them forces you to notice the assumptions embedded in your architecture. We found two latent issues just from drafting the post-incident runbook for the GMC problem.
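The disapproval-rate anomaly detection mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the detector we run: the function name, the hourly sample shape, the three-sigma threshold, and the spread floor are all hypothetical choices made for the example.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], current: float,
                 sigmas: float = 3.0, min_samples: int = 12,
                 min_spread: float = 0.005) -> bool:
    """Flag `current` if it sits well above the recent baseline.

    history: recent disapproval rates (disapproved / submitted), oldest first.
    """
    if len(history) < min_samples:
        return False  # not enough baseline to judge yet
    baseline = mean(history)
    # Floor the spread so a very flat baseline doesn't turn
    # every small wobble into a page.
    spread = max(stdev(history), min_spread)
    return current > baseline + sigmas * spread


# Hypothetical hourly disapproval rates for one tenant.
history = [0.010, 0.012, 0.011, 0.009, 0.013, 0.010,
           0.011, 0.012, 0.010, 0.009, 0.011, 0.012]

print(is_anomalous(history, 0.013))  # False: an ordinary hour
print(is_anomalous(history, 0.40))   # True: a mass-disapproval wave
```

A simple threshold like this would likely need tuning per tenant, but it is the kind of check that is cheap to add to a monolith because there is only one place to put it.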

And what I wouldn't change: the monolith, the cost discipline, the "stay two engineers" constraint. The unfashionable choices have all paid back.

