I have been spending a lot of time recently in a specific kind of conversation. Someone shows me their data application, or describes one they want to build, and asks whether they should use Snowflake or BigQuery or Databricks or some other managed platform — or whether they should build the thing themselves on cheaper components.
The answer I find myself giving has stabilized into a metaphor: cloud platforms are luxury cars. You pay extra to skip the assembly. Whether that's worth it depends on whether you have a garage and a wrench, and how you value your time.
This is the essay-length version of an argument that runs through several of my recent posts on data infrastructure: the hub piece "Why Snowflake's Bet on Streamlit Just Works" is the strategic version, "Cortex Search vs Hybrid SQLite RAG" is the numbers version, and "Cloudflare Tunnel as the Indie Developer's Public IP" is the personal-infrastructure version. This one is about the underlying philosophy and when each philosophy is right.
The two coherent positions
Let me be specific about what I mean by the two paths, because the conversation often gets muddied by people defending the wrong middle ground.
The managed-platform path. You put your data in Snowflake (or BigQuery, or Databricks, or your cloud's equivalent). You build your applications on top using their tooling: Streamlit in Snowflake, BigQuery's built-in BI tools, Databricks notebooks. Retrieval is Cortex Search or a vendor equivalent. Authentication is the platform's identity layer. Governance, audit, compliance, and lineage are handled by the platform. You pay for compute by the second, storage by the gigabyte-month, and, under some pricing models, queries by the volume of data they scan.
The self-hosted path. You put your data in SQLite (or Postgres, or DuckDB, or Parquet files on a disk). You build your applications on a laptop or a small server. Retrieval is sqlite-vec plus FTS5 with a fusion ranker you wrote. Authentication is OAuth proxied through Cloudflare Access or your own JWT layer. Governance, audit, compliance — those are your job. You pay for electricity and a domain name.
Both paths are coherent. Both work. The mistake people make is trying to live in the middle — a managed warehouse with a custom application layer on a different cloud, or a self-hosted database with a managed BI tool on top. The middle is where you pay full price for the platform and take on most of the work of assembling things yourself.
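To make "a fusion ranker you wrote" less abstract, here is a minimal sketch of one common choice, reciprocal rank fusion. This is an assumption on my part about which fusion method to illustrate, not a claim that it is the only option; the function name, the document ids, and the two input lists (standing in for the results of an FTS5 query and a sqlite-vec query) are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by document id so output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical inputs: one ranking from keyword search, one from vector search.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents that both retrievers agree on rise to the top, which is the whole point of fusing the two signals. The ranker only needs ranks, not comparable scores, so it does not care that keyword relevance and vector distance live on different scales.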
What cloud actually charges for
The intuitive complaint about managed platforms is that they're expensive. This is true, but the framing is incomplete. They're not expensive because they mark up compute and storage — those costs are real but not the bulk of what you pay. They're expensive because you're buying engineering hours you don't have to hire.
Consider what a competent Snowflake setup at a mid-sized company actually provides:
- Encryption at rest, encryption in transit, key rotation. Done.
- Role-based access control with row-level and column-level security. Done.
- Audit logs that satisfy SOC 2 and similar compliance frameworks. Done.
- Automatic scaling of compute up and down with query load. Done.
- Time-travel queries that let you reconstruct any table at any point in the recent past. Done.
- Disaster recovery across regions. Done.
- Integrations with every common BI tool, ETL system, and identity provider. Done.
For a regulated enterprise, building any one of those properly is a six-month project with at least one full-time engineer. Building all of them is a team of five working for a year. Paying Snowflake $200K/year to have all of it handled, with SLAs and a support contract, is straightforwardly cheaper than the headcount.
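The arithmetic behind that comparison is short enough to write down. The sketch below uses the $200K/year platform figure and the team of five from the paragraph above; the loaded cost per engineer is a placeholder I made up for illustration, so substitute your own number.

```python
def build_minus_buy(platform_cost_per_year, engineers, loaded_cost_per_engineer):
    """First-year cost of building in-house minus the platform bill.

    A positive result means the platform is the cheaper option.
    """
    build_cost = engineers * loaded_cost_per_engineer
    return build_cost - platform_cost_per_year


# From the text: $200K/year platform, a team of five for a year.
# ASSUMPTION: $180K loaded cost per engineer is an illustrative placeholder.
difference = build_minus_buy(
    platform_cost_per_year=200_000,
    engineers=5,
    loaded_cost_per_engineer=180_000,
)
print(difference)
```

This only covers the first year and ignores ongoing maintenance on the build side, which would widen the gap further.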
The luxury-car metaphor is exact. A BMW is mechanically not that different from a Toyota — both have engines, transmissions, suspensions. You pay for the BMW because someone tuned the suspension, picked nicer materials, made the dashboard cohesive, and put their reputation behind it. If you have the time and the tools, you can absolutely build something equivalent. If you don't, the BMW gets you to work on Monday morning.
What self-hosted actually charges for
The mirror-image complaint about self-hosted setups is that they're complicated. This is also true, but also incomplete. They're not complicated because the components are bad: SQLite, Postgres, Cloudflare Tunnel, uv, and modern Linux are all extraordinarily good software. They're complicated because you're paying with your own attention for what the managed platform charges money for.
Concretely, the things you take on:
- Patching the OS, the database, the runtime, the application dependencies.
- Backups, and (more importantly) restoring from backups — which is the only test that matters.
- Monitoring. Logs. Alerts. Knowing when something is broken before your users tell you.
- Security: keeping up with CVEs, understanding what's exposed and what isn't, rotating credentials.
- Capacity planning: when do you need to upgrade the disk, add a server, partition the data?
- Documentation that survives you forgetting what you built.
These are not exotic. A solo developer can do all of them. But they take real ongoing time, and if you don't enjoy infrastructure work, that time feels like a tax. Cloud platforms exist for the people who want to spend their time on the business logic and let someone else handle the rest.
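As one example of what that ongoing work looks like, here is a minimal sketch of the backup item from the list: copy a SQLite database with Python's built-in `sqlite3` backup API, then reopen the copy and check it. The function name, the file names, and the choice of checks (an integrity check plus matching row counts) are my assumptions about what a reasonable restore test might include, not a complete backup strategy.

```python
import os
import sqlite3
import tempfile


def backup_and_verify(src_path, backup_path):
    """Copy a SQLite database, then confirm the copy opens and matches."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(backup_path)
    try:
        src.backup(dst)  # online backup; works while src is in use
    finally:
        dst.close()

    # The restore test: open the backup fresh and inspect it.
    restored = sqlite3.connect(backup_path)
    try:
        intact = restored.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
        tables = [
            row[0]
            for row in src.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        ]
        counts_match = all(
            src.execute(f'SELECT count(*) FROM "{t}"').fetchone()
            == restored.execute(f'SELECT count(*) FROM "{t}"').fetchone()
            for t in tables
        )
    finally:
        restored.close()
        src.close()
    return intact and counts_match


# Demo against a throwaway database with a hypothetical "notes" table.
with tempfile.TemporaryDirectory() as tmp:
    live = os.path.join(tmp, "live.db")
    conn = sqlite3.connect(live)
    conn.execute("CREATE TABLE notes (body TEXT)")
    conn.executemany("INSERT INTO notes VALUES (?)", [("a",), ("b",), ("c",)])
    conn.commit()
    conn.close()
    verified = backup_and_verify(live, os.path.join(tmp, "backup.db"))

print(verified)
```

A real setup would also ship the backup off the machine and run the check on a schedule, but the shape is the same: a backup counts only once something has opened it and read from it.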
The honest version of the self-hosted argument is: "I find infrastructure work interesting, I'm good at it, I want fine-grained control over my stack, and the cost savings are real enough to matter." All four parts have to be true for self-hosting to be the right call. If any one of them isn't, you're better off paying the platform.
The crossover point
Different people will pick different paths even with the same data. The crossover point is mostly about three variables.
Headcount. If you have engineers and they have time, you can self-host. If you have one engineer (you) and you also need to write the actual application, you cannot afford to also operate the database, the search index, the network layer, and the deployment pipeline. Buy the platform.
Regulatory exposure. If your data is subject to HIPAA, SOX, PCI DSS, GDPR with strict residency requirements, or any equivalent, the managed platforms have spent millions of dollars on compliance work that you would have to replicate. This is true even if you technically could build a compliant self-hosted system. The audit cost is the real expense, and the platforms bundle it.
Predictability of load. Managed platforms are good at "we don't know how much traffic we'll get." They scale up and you pay for what you used. Self-hosted is good at "we know roughly how much traffic we'll have." You provision for that and pay a fixed cost. If your load is genuinely unpredictable and could 10x overnight, you probably want managed. If it's roughly known and bounded, self-hosted is cheaper.
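A toy model makes the trade-off concrete. Every number below (the load units, the per-unit price, the fixed monthly cost, the provisioned capacity) is invented for illustration; only the shape of the comparison comes from the paragraph above.

```python
def managed_cost(monthly_units, price_per_unit):
    """Pay-per-use: the bill follows the load."""
    return sum(units * price_per_unit for units in monthly_units)


def self_hosted_cost(monthly_units, fixed_cost_per_month, capacity_units):
    """Fixed cost per month; None if any month exceeds provisioned capacity."""
    if max(monthly_units) > capacity_units:
        return None  # the box falls over, so the cost comparison is moot
    return fixed_cost_per_month * len(monthly_units)


# ASSUMPTION: all figures are hypothetical.
steady = [100, 110, 95, 105]   # load is known and bounded
spiky = [100, 110, 1000, 105]  # one month jumps roughly 10x

print(managed_cost(steady, 2.0), self_hosted_cost(steady, 50, 150))
print(managed_cost(spiky, 2.0), self_hosted_cost(spiky, 50, 150))
```

With steady load the fixed-cost box wins comfortably. With the spike, the managed bill grows but the service stays up, while the self-hosted box was never provisioned for that month at all.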
For a regulated enterprise with unpredictable load and a thin engineering team, the managed platform is the only correct answer. For a solo builder with steady personal traffic and a hobby relationship with their stack, self-hosting is the only correct answer. Most actual situations are somewhere between these poles, and the right call is some mixture.
The mixed strategy nobody describes
The vendor case-studies and the indie-developer blog posts both present pure positions, but the actual right answer for many organizations is hybrid in a specific way.
The pattern that keeps working in practice: regulated and sensitive data lives in the managed platform. Bulk, non-sensitive, easily-replaceable data lives in the self-hosted layer. A thin orchestration tier routes queries based on what each store contains.
A concrete example. A mid-sized financial-services company has customer transaction data (regulated), customer support ticket text (sensitive but lower exposure), and a knowledge base of public documents like manuals and FAQs (not sensitive at all). The managed platform handles the transactions. A self-hosted RAG handles the public knowledge base, because there's no reason to pay Snowflake to search PDFs that are already on the public internet. The support tickets sit in a private encrypted store with selective replication to the managed platform for queries that need to join across both.
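The "thin orchestration tier" in that example can be very thin indeed. Here is a minimal sketch of the routing decision, assuming each query declares which datasets it touches. The dataset names, the `Store` enum, and the rule for rejecting unsupported combinations are hypothetical; the one cross-store rule follows the example above, where support tickets are selectively replicated to the managed platform for joins against transactions.

```python
from enum import Enum


class Store(Enum):
    MANAGED = "managed_platform"
    PRIVATE = "private_encrypted_store"
    SELF_HOSTED = "self_hosted_rag"


# Hypothetical catalog: which store owns which dataset.
DATASET_STORE = {
    "transactions": Store.MANAGED,
    "support_tickets": Store.PRIVATE,
    "knowledge_base": Store.SELF_HOSTED,
}


def route(datasets):
    """Pick the store that should run a query touching these datasets."""
    stores = {DATASET_STORE[name] for name in datasets}  # KeyError if unknown
    if len(stores) == 1:
        return stores.pop()
    # Tickets are replicated to the managed platform, so joins against
    # transactions run there.
    if stores == {Store.MANAGED, Store.PRIVATE}:
        return Store.MANAGED
    # ASSUMPTION: any other combination is refused rather than guessed at.
    raise ValueError(f"no single store can serve {sorted(datasets)}")


print(route(["knowledge_base"]))
print(route(["transactions", "support_tickets"]))
```

Refusing unknown combinations is a deliberate choice in this sketch: a router that silently sends regulated data to the cheap store would undo the compliance posture the split is meant to preserve.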
The total cost is significantly lower than putting everything in the managed platform. The compliance posture is the same, because the regulated data is still in the place that's certified to hold it. The engineering effort is real but bounded, because most of the operational work is concentrated on the support-ticket store, and the public RAG is the kind of system a single developer can build in a week.
Most organizations don't do this because it requires a team that's comfortable on both sides of the philosophical divide. Pure-cloud teams don't think to spin up SQLite for the non-sensitive corpus. Pure-self-hosted teams resist using the managed platform even where it's clearly the right tool. The hybrid requires someone who has actually worked both ways and doesn't have an emotional preference.
Why the philosophical framing matters
The reason "luxury car vs. assembled car" is useful as a metaphor — instead of, say, "expensive vs. cheap" or "convenient vs. flexible" — is that it captures the moral content of the choice.
If you buy a BMW because you can afford it and you don't want to spend Saturdays under a car, that's a legitimate, defensible position. Nobody is going to write a blog post about how you should have built your own car. If you spend Saturdays building a track-day Miata because you enjoy the work and the result is better-suited to your specific use case, that's also legitimate and defensible.
What is not defensible is judging people in the other camp for their choice. The BMW driver who looks down on the Miata builder is being a snob; the Miata builder who looks down on the BMW driver is being a different kind of snob. Both are correct about their own car and wrong about the other person's.
The same applies in technology. The instinct in indie-developer circles to dunk on Snowflake-style platforms misses that those platforms are correctly serving enterprises with constraints that indie developers don't share. The instinct in enterprise architecture circles to dismiss self-hosted stacks as "unprofessional" misses that those stacks are correctly serving developers with constraints that enterprises don't share.
Both philosophies are right. The interesting question is which constraints you actually have, and being honest about that is the harder part of the conversation.
What I actually do
For full disclosure: my current stack is heavily on the self-hosted side. I run SQLite for almost everything, a few Streamlit apps on Community Cloud and on my own machines via Cloudflare Tunnel, uv for environment management, local models for embeddings where possible. The variable cost of my infrastructure is close to zero.
This is correct for me because I enjoy infrastructure work, I have the time, my data is not subject to anyone's compliance regime, and my traffic is predictable. None of those would be true if I were running an enterprise's analytics infrastructure, and I would make different choices in that role.
The conclusion is not "self-hosted is better." It's that the question "should I use cloud or self-host?" is almost never a technical question. It's a question about you — your time, your skills, your constraints, your relationship with infrastructure work. Get honest about those, and the right architecture mostly chooses itself.
---
For the practical components of the self-hosted side, see "Building a Hybrid RAG in 200 Lines", "The uv Era", and "Cloudflare Tunnel as the Indie Developer's Public IP". For the strategic case for the managed side, see "Why Snowflake's Bet on Streamlit Just Works" and "Cortex Search vs Hybrid SQLite RAG".