Cortex Search vs Hybrid SQLite RAG — A Cost and Latency Teardown

There are two competent ways to build a retrieval system in 2026, and they sit at opposite ends of the build-versus-buy spectrum.

On one end, Snowflake's Cortex Search wraps the entire RAG pipeline — chunking, embedding, indexing, incremental sync — into a single SQL statement. On the other, a 200-line Python script using SQLite's FTS5 module and the sqlite-vec extension can deliver hybrid keyword-plus-semantic retrieval on a laptop, with no servers and no monthly bill.

The marketing case for Cortex is well-rehearsed. The marketing case for SQLite is almost nonexistent because nobody is paid to make it. This post is the teardown — what each one charges for, where the latency actually lives, and the decision criteria for picking one over the other.

For the broader strategic context on why Snowflake's RAG offering exists at all, see the hub piece Why Snowflake's Bet on Streamlit Just Works. This article is the head-to-head.

What Cortex Search actually charges for

The headline pitch is that you create a search service in one statement:

CREATE OR REPLACE CORTEX SEARCH SERVICE my_rag_service
  ON search_text_column
  ATTRIBUTES product_category
  WAREHOUSE = my_warehouse
  TARGET_LAG = '1 hour'
  AS SELECT * FROM my_table;

That is genuinely the entire pipeline. But "no pipeline" does not mean "no cost." Cortex Search bills along several axes that are worth understanding before you commit:

Indexing compute. When you create the service, Snowflake reads the source table, chunks the text, generates embeddings, and builds the index. That work runs on the warehouse you specified. For a corpus of a few hundred thousand documents, this is a one-time charge in the single-digit-credits range. For tens of millions of documents, it is meaningfully more.

Refresh compute, driven by TARGET_LAG. This is the parameter most people underestimate. TARGET_LAG = '1 hour' means Snowflake will wake the warehouse at least once an hour to check for new or changed rows and update the index. Set it to '1 minute' and you are paying for sixty refreshes an hour, even if no data changed. The warehouse auto-suspends, but the "wake, check, sleep" pattern adds up on chatty data.

Serving compute. Each query against the service consumes warehouse seconds. The warehouse needs to be running (or to wake on demand, which adds latency) to serve queries.

Storage. The embeddings and index live on Snowflake storage. For a 500 GB document corpus, the index can add another 50–100 GB depending on chunking and embedding dimensionality.

Egress, if you serve from outside Snowflake. Calling the service from a Streamlit app running in Snowflake is free. Calling it from a FastAPI service running in a different cloud means egress fees on the responses.

None of these are surprises if you read the docs. But the practical effect is that "build a RAG over your warehouse" is a real monthly bill, and that bill scales with how fresh you want the index and how often you query it.

What the SQLite stack charges for

Zero, structurally.

The components — SQLite, the FTS5 module, sqlite-vec, Python — are all free and open-source. They run on whatever hardware you already have. A laptop is fine. A $5 VPS is fine. A Raspberry Pi handles a million-row corpus comfortably.

What you pay instead is:

Engineering time, up front. Setting up the schema, writing the chunking logic, picking an embedding model, building the indexing script, writing the hybrid query, tuning the BM25 parameters. This is a one-time cost measured in days, not months.

Engineering time, ongoing. When the corpus grows, when you change embedding models, when you want to add metadata filters, you write the code yourself. Snowflake does this for you.

Embedding API calls, if you don't self-host the model. OpenAI's text-embedding-3-small runs about $0.02 per million tokens. A million 512-token documents embeds for around $10 one-time, plus pennies per month on queries. Self-host a small embedding model and even that goes to zero.

The economic shape is the inverse of Cortex's. Cortex is "low fixed cost, real variable cost." Self-host is "real fixed cost, near-zero variable cost." Where the lines cross depends entirely on how much traffic you have and how much engineering bandwidth you can spare.

Latency, measured honestly

For a corpus of about a million chunks, here is what each stack looks like on the wire:

| Layer | Cortex Search | SQLite + FTS5 + sqlite-vec |
| --- | --- | --- |
| Network hop to retrieval | 30–80 ms (cloud round-trip) | 0 ms (in-process) |
| Warehouse wake (if cold) | 1–5 seconds | n/a |
| Keyword retrieval | ~50 ms | 1–5 ms |
| Vector retrieval | ~50 ms | 5–20 ms |
| Rerank / fusion | bundled | 1–2 ms (RRF in SQL) |
| Warm-path total | ~100–150 ms | ~10–30 ms |
| Cold-path total | ~1–5 seconds | unchanged |

The warm-path comparison is closer than people assume. Cortex is genuinely fast once the warehouse is hot. The cold-path comparison is where it bites — if your traffic is sparse enough that the warehouse keeps suspending, your users see multi-second waits on the first query of a session. You can pay to keep the warehouse warm, which puts you back in the variable-cost discussion.

The SQLite path has no cold start because there is no process to wake — the database is just a file, opened on demand. For low-traffic or latency-sensitive applications, this is a real advantage and not a marginal one.

Retrieval quality, measured carefully

This is where the conversation usually goes sideways. Vendor RAGs are assumed to be more accurate than handcrafted ones, mostly because of brand effect. The reality is more interesting.

Cortex Search uses a hybrid retrieval approach internally — BM25-style keyword search combined with vector similarity, with a reranker on top. The reranker is the part you cannot replicate trivially at home; it is a learned model and Snowflake does not expose its weights.

A well-built SQLite hybrid (BM25 trigram via FTS5 + dense embeddings via sqlite-vec, combined with Reciprocal Rank Fusion at k=60) reaches around 90–95% of Cortex-style quality on typical retrieval benchmarks. The gap is the reranker. For a lot of use cases — document Q&A, internal search, even most customer-facing applications — that gap is not the bottleneck. The LLM generating the final answer dominates the quality signal.

If you genuinely need reranker-level precision, you can bolt on a cross-encoder reranker yourself (bge-reranker-v2-m3, runs locally on CPU, free) and close most of the remaining gap. It costs you 50 ms of latency per query.

Governance is where Cortex earns its keep

The argument for Cortex that does not dissolve under engineering scrutiny is governance.

If your corpus is medical records, financial filings, or anything else subject to data residency law, the question "can your retrieval system promise that no document text or embedding ever leaves the warehouse boundary" has a one-word answer in Cortex (yes) and a long, careful answer in any self-hosted setup. Snowflake's RBAC, masking policies, and audit logs apply to Cortex retrieval automatically. The data does not move; the search lives next to the storage.

Replicating that property in a self-hosted stack is possible — you can run SQLite plus your embedding model entirely inside your own private network — but you are now the auditor, the access-control implementer, and the compliance team. For a regulated enterprise, that is not engineering effort, it is regulatory risk. For a solo builder or a non-regulated startup, it is just the normal cost of running your own infrastructure.

When to pick which

The decision is more about your constraints than about technical superiority.

Pick Cortex Search when: your data already lives in Snowflake; your compliance posture requires data residency; you have warehouse credits to spend and not enough engineers to spend; query volume is steady enough that warehouse cost is predictable; you want SQL to be the only language anyone touches.

Pick the SQLite hybrid when: you own the hardware and the data; your traffic is bursty or low; latency matters more than absolute quality; you have at least one engineer who is comfortable with the stack; the marginal cost of a query needs to be zero.

Pick a mix when: the bulk of your corpus is non-sensitive and goes on SQLite for cost; a smaller regulated subset goes in Snowflake for governance; a thin orchestration layer routes queries based on what each corpus contains. This is the actual answer for most mid-sized companies once they look closely.

Where the conversation usually ends

Most vendor-versus-self-host comparisons end with "it depends," which is true but useless. The more honest version is that Cortex Search is a specific, well-designed product solving a specific, expensive problem (the last-mile RAG for an enterprise warehouse), and the SQLite hybrid is a specific, well-tested set of components solving a specific, different problem (cheap retrieval over data you already own).

They are not really competitors. They are answers to different questions, and the cost-and-latency numbers above are mostly useful for figuring out which question you are actually asking.

---

For the implementation walkthrough of the SQLite stack itself, see Building a Hybrid RAG in 200 Lines. For the strategic context on why Cortex exists at all, see Why Snowflake's Bet on Streamlit Just Works.