My SQLite in production epiphany

My work environment email alerts service – 666a – is a Rails app using SQLite for its production database. I'm super happy with this stack, but it wasn't an obvious choice from the start, and I know a lot of folks still haven't heard the growing hype about using SQLite in production in the Rails community. Here's how I ended up shipping a production app with a SQLite database, and how it helped me rediscover some joy in full-stack work.

Back on Rails

I built the very first prototype version of 666a in Rails. Nothing beats Active Record when you need to iterate really quick on a concept for a database schema. It ran on our old MacBook Air where our Plex server lives. Then, in late August, I got sidetracked by the idea that for the "real" version I needed to redo it in a more "modern" web stack like Next.js.

What a waste of a month this turned out to be. You have to jump through so many hoops for this new edge computing paradigm, but none of the benefits are applicable to an app like 666a with relatively stable load whose users are concentrated in one geographic location. Plus, the core things 666a actually needs – mundane stuff like an ORM or a nice mature localisation framework – are all third party plugins in Next.js. Afterthoughts.

I churned through so many libraries and platforms trying to make this mismatch work. The ones I can remember include Kysely, Prisma, PlanetScale, Vercel Postgres, and Inngest. It seemed like no matter what I picked, it was going to get expensive fast, and yet at the same time nothing was ever fully working right, because the whole edge functions thing is still so bleeding edge.

By the end of September I was fed up. I deleted my Next.js app and ran rails new with some considerable relief.

But… SQLite? In production?

As I was making that decision to jump back to Rails, I was aware of a growing wave of hype about using SQLite in production. I eagerly followed Stephen Margheim's series of SQLite-related blog posts, AppSignal's series on LiteStack, as well as anything Fly.io published about it.

This used to be unthinkable. In fact it's only in the past month that the Rails core team removed the warning message about running SQLite in production. I reckon it's a combination of factors: it's enabled by technological advances, and driven by a reduced focus on exponential growth due to high interest rates.

Whatever the underlying reason for the mindset shift, it doesn't seem to be news to the folks behind SQLite itself. The quote below has been on their site since at least 2005.

SQLite works great as the database engine for most low to medium traffic websites (which is to say, most websites). Generally speaking, any site that gets fewer than 100,000 hits/day should work fine with SQLite.

666a is right in the centre of the sweet spot for a SQLite-in-production use-case. It's a service targeting a niche interest group in a tiny country of 10 million people. I reckon the entire Swedish labour movement could sign up before I'd even need to start googling for performance tips.

The experience so far

It's rare for me to find this much joy in the back-end & infrastructure side of a project. Normally it's the front-end work that gives me these feelings, but there's something truly special about this tech stack. And the fact that it's cheaper – 666a currently fits within the Fly.io free tier – is only part of it.

The sensation of being able to ship an entire full-stack app as a single Docker image and not even have a docker-compose.yml at all is difficult to describe. There's something weirdly cosy about it. It also feels like it shouldn't even be possible, and brings back nostalgic memories of the 2000s-era trend for PHP apps to use a "flat file" for storage instead of a database.

Four months in, there have been no downsides to speak of. It's literally just… fantastic. My database is around 350MB, doing something on the order of 1000ish writes and (really guessing now) 10,000ish reads per day. It's got indexes, I'm doing joins and group by statements and all that normal stuff, and I'm never being confronted with unexpected barriers resulting from the choice of SQLite.

In fact I'm even using ULID primary keys cos I didn't want the infosec headaches of autoincrementing integers (any labour movement-related software always deserves a bit of paranoia) or the awkward ergonomics of regular UUIDs. Setting that up was a piece of piss too.
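To give a feel for why ULIDs have nicer ergonomics than UUIDs, here's a minimal pure-Ruby sketch of how one is constructed: a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford base32 characters, so IDs sort chronologically as plain strings. (This is an illustration of the format, not the actual gem or setup 666a uses; in a real app you'd reach for a library such as the `ulid` gem.)

```ruby
require "securerandom"

# Crockford base32 alphabet used by ULIDs (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ".freeze

# Pack a 48-bit millisecond timestamp and 80 random bits into
# 26 Crockford base32 characters (130 bits of output space).
def generate_ulid(time: Time.now)
  value = ((time.to_f * 1000).to_i << 80) | SecureRandom.random_number(2**80)
  (0...26).map { |i| CROCKFORD[(value >> (125 - 5 * i)) & 0x1F] }.join
end
```

Because the timestamp sits in the most significant bits, two ULIDs generated a millisecond apart compare in creation order, which keeps B-tree inserts mostly append-only and avoids leaking row counts the way autoincrementing integers do.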

The contrast I find the most striking in all this is that in Next.js I was just setting up a normal app the normal way, yet somehow still it felt like swimming against the tide, whereas with Rails, even configuring a quirky primary key format on a non-standard database was easy and actually kind of fun.

When you consider the fact that I have zero professional Rails experience, and that I've spent the better part of a decade working mostly with React, it does seem a bit weird. I mean I even really like React for fuck's sake. Seems silly that it's somehow not the easiest way for me to ship a simple app like this one.

My gut feeling is that it's partly an incentives thing. Vercel's in the PaaS business, so obviously the product strategy for Next.js is wrapped up in that and it makes sense to prioritise promoting the serverless paradigm they've bet their entire company on over e.g. integrating a production-ready i18n solution. By contrast, nothing about 37signals' SaaS business incentivises Rails to double as the top of a freemium conversion funnel.

Nothing showcases that difference in incentives better than production SQLite. We use Next.js at my current job, where it's a great fit, and I think I kinda get it: this new serverless stuff is way more worth the trouble when you're hoping to scale up as fast and as big as possible. For more modest projects with smaller troubleshooting time budgets, maybe it's natural that a tool built by and for the VC ecosystem doesn't fit so great. It feels great that this production SQLite adventure has helped me understand all this a bit better, and after a decade of React I'm stoked about the prospect of having a new go-to stack with more batteries included.

Operating 666a

In the final quarter of 2023 I launched my little side project 666a, which is a service providing work environment email notifications for union reps and people like that. It took a few months of work to build something shippable, and I've been refining it ever since to increase its reliability. I thought it'd be nice to share a glimpse of what that refinement work looks like. This is gonna be a technical one, so apologies to anyone who preferred the union propaganda.

How 666a works

The beating heart of 666a is a daily cron job that checks for new public filings on the Work Environment Authority's website and sends email notifications whenever it finds one matching a user's subscription criteria.

Filings are published with a one-day delay, which means Monday's filings become available on Tuesday, and so on. So each day, 666a scans the previous day's filings.
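The date selection for each daily run is therefore just "yesterday". As a toy sketch (hypothetical helper name, not 666a's actual code):

```ruby
require "date"

# Filings are published with a one-day delay, so each
# daily run scans the previous day's filing date.
def date_to_scan(today: Date.today)
  (today - 1).iso8601
end
```

A run on 2023-11-05 would therefore scan the filings dated 2023-11-04.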

Backdated filings and "document lag"

The complicating factor here is that not all of Monday's filings are immediately available on Tuesday. In fact, they might not even be filed at all until the next Friday. This is a phenomenon I've been referring to as "document lag". The Work Environment Authority isn't an API: it's an organisation powered by people, interactions, emails, and even good old fashioned physical mail. This makes for a much more socially constructed model of time than I'm used to in my everyday tech work, where machine-generated timestamps with millisecond accuracy are the norm.

I didn't spot this nuance until after launch. I take the responsibility of operating this service really really seriously, and I'm not comfortable with the idea of anyone missing out on important information because of a detail like that. So I've been slowly tweaking the design of the system to meet this requirement.

Manually triggered backfill

My first move once I understood the system was missing some of the data was to start manually triggering the scan job from time to time. I was very deliberate about designing the actual jobs around this kind of versatility, so a bit of manual intervention was easy and didn't require any code changes. This is more or less what I'd run in the rails console from time to time.

WorkEnvironment::DayJob.perform_now("2023-11-04")

Doing this every day was a labour-intensive approach, but it was a nice way to feel out the scope of the problem. Took about a week for this to give me a nice clear sense of how far back stuff was typically getting backdated.

Scheduled backfill

Based on that informal feel for the typical lag between document filing dates and their actual first appearance, I began scheduling a bunch of backfill jobs like these.

scheduler.cron("0 14 * * *") do
  WorkEnvironment::DayJob.perform_later(
    2.days.ago.strftime("%Y-%m-%d")
  )
end

These have been running nicely since early December. For most days, the average document lag is only about a day or two. So except for the occasional rogue document filed 30 days after its official filing date, this was mission accomplished.

Christmas

The arrival of Christmas unlocked a new level of this puzzle. Suddenly, my daily document count graph flatlined. Everybody had buggered off to watch Donald Duck, and even the steady everyday drumbeat of the asbestos removal company filings fell silent.


[Bar chart: documents per day across December 2023, y axis up to 1500. The first three weeks of the month average between 500 and 1000 documents per weekday; the final week averages zero.]
Documents per day in December 2023, as of New Year's Eve

This would be no big deal, except by this point I'd also backfilled the data well into 2022. And the December 2022 graph's final week looked nothing like this.


[Bar chart: documents per day across December 2022, y axis up to 1000. The first three weeks of the month average between 500 and 1000 documents per weekday; the final week averages 250.]
Documents per day in December 2022

I immediately suspected that this was another case of document lag. In other words everyone's off for Christmas, and when they get back they'll backdate all the filings that were missed during the holidays.

Quantifying document lag

The only way to find out if my document lag theory was the right explanation was to ignore the problem, enjoy Christmas, and wait and see. It probably didn't matter anyway: I don't think many world-changing public filings are dropping during the Christmas season.

I did want some way to visualise the issue when the backdated filings began to trickle in though. So I created a new chart displaying the document lag for each day since launch. The bar height here represents the average difference, in days, between each date and the true chronological first appearance of the documents officially filed on that date.
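The averaging itself boils down to grouping documents by their official filing date and averaging the gap to when each one actually first appeared. A plain-Ruby sketch with hypothetical field names (the real thing would be an Active Record or SQL query):

```ruby
require "date"

# Hypothetical shape: each document records its official filing
# date and the date it first appeared on the authority's website.
docs = [
  { filed_on: Date.new(2023, 12, 27), first_seen_on: Date.new(2024, 1, 3) },
  { filed_on: Date.new(2023, 12, 27), first_seen_on: Date.new(2024, 1, 4) },
  { filed_on: Date.new(2023, 12, 28), first_seen_on: Date.new(2023, 12, 29) },
]

# Average lag in days, keyed by official filing date.
lag_by_date = docs.group_by { |d| d[:filed_on] }.transform_values do |group|
  group.sum { |d| (d[:first_seen_on] - d[:filed_on]).to_i }.fdiv(group.size)
end
```

With the sample data above, December 27th gets an average lag of 7.5 days, matching the kind of holiday spike the chart shows.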


[Bar chart: average document lag per day since launch, y axis up to 10. Until late December the values average one or two days with occasional blips; after December, averages above 5 are more typical.]
Average document lag per day since launch day

The Christmas spike represents the fact that the documents officially filed on e.g. December 27th were in fact not filed until January 3rd when people began returning to work.

Next steps

I think the scale of this phenomenon will probably more or less double for Midsummer, so that's what I want to be prepared for next. Once the dust settles from the Christmas period, I'm thinking I'll figure out the maximum document lag measured in that week, double it, and then wire up the scheduler to backfill that far back every single day. I want Midsummer to run smoothly even if I'm too busy to do any manual interventions in the moment.
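That plan, sketched out (the window size and helper are hypothetical, and the real dates would be handed to WorkEnvironment::DayJob.perform_later inside the daily cron block):

```ruby
require "date"

# Hypothetical: double the worst document lag observed over the
# Christmas week (say 7 days), giving a 14-day backfill window.
BACKFILL_DAYS = 14

# Every daily run would re-scan each date in the window,
# picking up any filings backdated into it since the last run.
def dates_to_backfill(today: Date.today, days: BACKFILL_DAYS)
  (1..days).map { |n| (today - n).iso8601 }
end
```

Re-scanning the whole window every day is redundant most of the time, but it makes holiday periods self-healing with no manual intervention.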

By the way, if you're a tech worker in Sweden and the events of the past year or so have got you thinking some new thoughts about labour relations in our industry, you might wanna check out 666a more closely. You can grab your employer's organisationsnummer from Bolagsverket's company search tool and then subscribe to get notified (for free) about their correspondence with the Work Environment Authority. I'm also hosting web versions of the Swedish government's unofficial English translations of the Work Environment Act, the Co-Determination Act and the Employment Protection Act, which can be useful resources to know about depending on how 2024 unfolds at your job. Check it out!