Inside Streamlit's Re-Run Model — Why Hot Reload Feels Instant

The first time you save a Streamlit file in your editor and watch the browser update before your hand leaves the keyboard, you assume it is some clever diffing magic. It is not. The mechanism underneath is closer to a confession than an algorithm: Streamlit just re-runs your entire Python script, top to bottom, every time anything changes.

Once you understand that, the whole framework stops feeling magical and starts feeling honest. This post is about why that decision was the right one, what makes it fast, and the small handful of concepts you actually need to internalize to use it well.

If you want the strategic context for why this matters — Snowflake's acquisition, Cortex Search, the Community Cloud economics — that's in the companion piece Why Snowflake's Bet on Streamlit Just Works. This article is the engineering deep-dive.

What "hot reload" usually costs

In a normal Python web stack — FastAPI plus uvicorn, Flask plus gunicorn, Django plus anything — the server process is long-running. It holds the routes, the app state, the database connection pool, the imported modules. When you change a file, the dev server has to:

1. Detect the file change.
2. Tear down the existing process (or at least invalidate its module cache).
3. Re-import everything from scratch.
4. Rebind routes and middleware.
5. Open new sockets and resume accepting connections.

The fastest dev servers in this style (uvicorn with --reload, Flask's reloader, Django's runserver) manage this in a second or two on a small app and noticeably longer on a real one. You save a file, you tab over to the browser, you refresh, you wait. A second or two per iteration sounds fine on paper, but repeated across dozens of small adjustments it turns "tweak the padding" into a five-minute task.

The cost is structural. As long as there is a server process to restart, restart time has a floor.

What Streamlit does instead

Streamlit's mental model removes the server-process-as-stateful-thing entirely. The server is still there — it accepts HTTP, it serves WebSocket frames — but your app code is not living inside it across requests. Your script is a script. It runs from the first line to the last line, draws the UI as a side effect, and exits.

When something changes — you saved the file, the user clicked a button, a slider moved — the runner just runs your script again. From scratch. Top to bottom. As if you had typed python app.py at a terminal.

import streamlit as st

st.title("A small demo")
n = st.slider("Pick a number", 1, 100, 50)
st.write(f"You picked {n}.")

When the slider moves, the entire file executes again. st.title runs again. st.slider runs again (and returns the new value). st.write runs again. The browser sees the new state.

The reason this is fast is that there is no restart cost. The server process stays up and imported modules stay cached in sys.modules, so a re-run only pays for executing your own code. A script of a few hundred lines typically finishes in milliseconds if you've kept heavy work out of the top level. The runner is essentially executing your script in a loop and shipping the resulting UI elements over a socket.
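To make the loop concrete, here is a toy runner for the demo above. It is a sketch only: FakeSt is a hypothetical stand-in that records elements instead of drawing them, and rerun is not Streamlit's actual runner. What it shows is that each run starts from a fresh namespace, and the only thing that differs between runs is the widget value handed in.

```python
# The same three calls as the demo script, kept as source text so the
# toy runner can execute it repeatedly.
APP_SOURCE = '''
st.title("A small demo")
n = st.slider("Pick a number", 1, 100, 50)
st.write(f"You picked {n}.")
'''


class FakeSt:
    """Hypothetical stand-in for the streamlit module: records elements."""

    def __init__(self, widget_values):
        self.widget_values = widget_values  # what the "browser" last reported
        self.elements = []                  # UI produced by this run

    def title(self, text):
        self.elements.append(("title", text))

    def slider(self, label, min_value, max_value, value):
        # Return the user's current value if there is one, else the default.
        current = self.widget_values.get(label, value)
        self.elements.append(("slider", label, current))
        return current

    def write(self, text):
        self.elements.append(("text", text))


def rerun(source, widget_values):
    """Execute the whole script top to bottom in a fresh namespace."""
    st = FakeSt(widget_values)
    exec(compile(source, "<app>", "exec"), {"st": st})
    return st.elements


first = rerun(APP_SOURCE, {})                      # initial page load
second = rerun(APP_SOURCE, {"Pick a number": 72})  # user moved the slider
print(first[-1], second[-1])
```

Nothing from the first run carries into the second except what the caller passes in, which is why the escape hatches described later matter.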

The WebSocket pipe

The other half of the trick is on the wire. A traditional web app communicates with the browser through HTTP request/response cycles — the browser asks for a page, the server returns one, repeat. Hot reload in that world means the browser has to either poll or re-request.

Streamlit holds a persistent WebSocket connection between the browser tab and the server for the entire session. As your script runs, each st. call is sent down that socket as a small delta message describing one element; the frontend applies those deltas to the page it is already showing and drops any elements the new run did not produce. No page refresh, no F5, no re-fetch.

This is what closes the loop between "I saved a file" and "the screen updated." A file system watcher inside the Streamlit dev server picks up the change and triggers a re-run of your script (automatically if server.runOnSave is enabled or you chose "Always rerun", otherwise after you click Rerun), and the resulting deltas land in the browser through the open socket, all within the time it takes you to glance at the browser window.
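One way to picture the delta traffic: each message names a position on the page and the element that belongs there, and the state held in the browser only changes where it has to. The sketch below is a toy model under that assumption. The message shape and the apply_run helper are hypothetical, not Streamlit's actual protocol.

```python
def apply_run(shown, deltas):
    """Apply one script run's deltas to the elements currently on screen.

    `deltas` is a list of (position, element) pairs, assumed to arrive in
    order starting at position 0. Returns the new element list plus the
    positions whose content actually changed.
    """
    updated = list(shown)
    changed = []
    highest = -1
    for position, element in deltas:
        if position < len(updated):
            if updated[position] != element:
                updated[position] = element
                changed.append(position)
        else:
            updated.append(element)
            changed.append(position)
        highest = max(highest, position)
    # Anything past the last position this run produced is stale: drop it.
    return updated[: highest + 1], changed


shown = [
    ("title", "A small demo"),
    ("slider", "Pick a number", 50),
    ("text", "You picked 50."),
    ("text", "left over from an earlier run"),
]
deltas = [
    (0, ("title", "A small demo")),
    (1, ("slider", "Pick a number", 72)),
    (2, ("text", "You picked 72.")),
]
page, changed = apply_run(shown, deltas)
print(changed)
```

The title is untouched, two elements are replaced, and the leftover element disappears, all without reloading the page.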

In production on Streamlit Community Cloud or Streamlit in Snowflake, the file watcher is gone (the code isn't changing), but the rest of the machinery is identical. Every user interaction triggers a script re-run, and the WebSocket pushes the diff back.

The three concepts you actually have to learn

The re-run model has one obvious problem. If your script runs from scratch every time, how do you keep anything across reruns? How do you avoid re-loading a 4 GB model on every click?

Streamlit's answer is three explicit escape hatches. That is the entire API surface for state and persistence. Learn these three, and you can build almost anything.

1. `st.session_state` for per-session memory

A dictionary scoped to the current browser session. Survives reruns. Does not survive the user closing or refreshing the tab, or the app sleeping.

import streamlit as st

if "count" not in st.session_state:
    st.session_state.count = 0

if st.button("Increment"):
    st.session_state.count += 1

st.write(f"Clicked {st.session_state.count} times.")

Without session_state, that counter would reset to zero on every click, because the script re-runs from scratch and count = 0 would execute again.
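The scoping can be modelled in plain Python. This is a rough sketch, assuming a server that keeps one dictionary per session id; SESSIONS and run_script are hypothetical names for illustration, not Streamlit internals.

```python
SESSIONS = {}  # session id -> that session's state dictionary


def run_script(session_id, clicked):
    """One 're-run' of the counter script for a given browser session."""
    state = SESSIONS.setdefault(session_id, {})

    count = 0                    # plain local: starts over on every run
    if "count" not in state:
        state["count"] = 0       # session state: initialised once per session

    if clicked:
        count += 1
        state["count"] += 1

    return count, state["count"]


# Three clicks in one tab, then a different user's first page load.
for _ in range(3):
    alice = run_script("alice", clicked=True)
bob = run_script("bob", clicked=False)
print(alice, bob)
```

The local variable never gets past 1, the per-session value reaches 3, and the second session starts from zero because it has its own dictionary.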

2. `@st.cache_data` for expensive data

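Decorator for pure-ish functions that return data. Streamlit hashes the arguments (and the function's source), executes the function once, and on later calls with the same inputs returns a copy of the stored result, so mutating what you get back cannot corrupt the cache. The cache survives reruns and is shared across sessions; with persist="disk" it can also survive restarts.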

import streamlit as st
import pandas as pd

@st.cache_data(ttl=3600)  # cache for an hour
def load_sales():
    return pd.read_parquet("sales.parquet")

df = load_sales()
st.dataframe(df)

Without the decorator, that Parquet file would be read from disk on every script re-run — every slider move, every button click.
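The behaviour is easier to reason about with a simplified model in hand. The decorator below is a sketch, not Streamlit's implementation: cache_data_sketch is a hypothetical name, the key is just the pickled arguments, and the result is stored serialized so every caller gets an independent copy.

```python
import pickle
import time


def cache_data_sketch(ttl):
    """Toy memoizer: keyed on arguments, expires after `ttl` seconds."""
    def decorator(func):
        store = {}  # key -> (stored_at, pickled_result)

        def wrapper(*args, **kwargs):
            key = pickle.dumps((args, sorted(kwargs.items())))
            now = time.monotonic()
            entry = store.get(key)
            if entry is None or now - entry[0] > ttl:
                # Miss or expired: run the function and store a serialized copy.
                entry = (now, pickle.dumps(func(*args, **kwargs)))
                store[key] = entry
            # Deserialize on every hit so callers never share one object.
            return pickle.loads(entry[1])

        return wrapper
    return decorator


calls = []


@cache_data_sketch(ttl=3600)
def load_rows(region):
    calls.append(region)  # record each real execution
    return [{"region": region, "total": 100}]


a = load_rows("emea")
a[0]["total"] = -1        # mutate the returned copy
b = load_rows("emea")     # cache hit: the function does not run again
print(len(calls), b[0]["total"])
```

The copy-on-read step is the design choice worth noticing: it costs a deserialization per call, and in exchange one re-run cannot poison the data another re-run sees.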

3. `@st.cache_resource` for expensive objects

Same idea as cache_data, but for things you do not want serialized — database connections, ML models, anything where the object identity matters or where pickling would be wasteful.

import streamlit as st
from sentence_transformers import SentenceTransformer

@st.cache_resource
def get_model():
    return SentenceTransformer("intfloat/multilingual-e5-base")

model = get_model()

Without this, the embedding model would be re-loaded into memory (or VRAM, on a GPU) on every interaction. With it, the model lives for the lifetime of the server process and is shared across all sessions, which also means anything you do to it has to be safe under concurrent use.
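The contrast with cache_data is object identity. As a rough analogy only, functools.lru_cache from the standard library behaves similarly: the function body runs once and every caller receives the very same object. FakeModel is a hypothetical placeholder for an expensive-to-load resource.

```python
import functools


class FakeModel:
    """Hypothetical stand-in for a model that is expensive to load."""

    loads = 0  # how many times a model has been constructed

    def __init__(self, name):
        FakeModel.loads += 1
        self.name = name


@functools.lru_cache(maxsize=None)
def get_model(name="intfloat/multilingual-e5-base"):
    return FakeModel(name)


session_a = get_model()  # first caller pays the load cost
session_b = get_model()  # later callers get the same instance, not a copy
print(session_a is session_b, FakeModel.loads)
```

Because the instance is shared rather than copied, a mutation made on behalf of one user is visible to every other user of the same process.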

That is the entire mental model. session_state for "remember this for this user." cache_data for "remember this value." cache_resource for "remember this object." Compared to learning the React component lifecycle or FastAPI's dependency injection system, this is genuinely a few hours of reading.

Where the re-run model gets in your way

It is not free, and pretending otherwise would be dishonest.

Side effects at the top level of your script are dangerous. If you write requests.get(...) there, that HTTP call fires on every re-run. Wrap any I/O in an @st.cache_data function or in a function that is only called conditionally.

Long-running operations block the UI. A re-run is synchronous from the user's point of view. If a click triggers a function that takes ten seconds, the UI freezes for ten seconds. Stream output, show progress with st.status, or push the work to a background process.
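For the background-work option, one possible shape is to submit the job once, keep its handle in per-session state, and have each re-run merely poll it. The sketch below uses a thread pool from the standard library and a plain dict standing in for session_state; start_job, poll_job, and slow_report are hypothetical names, and in a real app the executor would itself need to be created once (for example behind cache_resource) rather than at the top of the script.

```python
import time
from concurrent.futures import ThreadPoolExecutor

EXECUTOR = ThreadPoolExecutor(max_workers=2)


def slow_report(n):
    """Placeholder for a slow computation."""
    time.sleep(0.2)
    return sum(range(n))


def start_job(state, n):
    """Submit the work once; later re-runs see the stored handle and skip this."""
    if "job" not in state:
        state["job"] = EXECUTOR.submit(slow_report, n)


def poll_job(state):
    """Cheap check a re-run can make without blocking."""
    job = state.get("job")
    if job is None:
        return "idle", None
    if not job.done():
        return "running", None
    return "done", job.result()


state = {}                        # stand-in for st.session_state
start_job(state, 1000)
start_job(state, 1000)            # a second re-run does not resubmit
state["job"].result(timeout=5)    # here we simply wait; a UI would poll instead
print(poll_job(state))
```

Each re-run stays fast because it only asks whether the job has finished, and the heavy function runs on a worker thread.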

Mutable globals do not behave the way you expect. A list defined at the top level of your script is rebuilt on every re-run, so mutations to it are lost. A list that lives in an imported module is the opposite: the module stays cached in the server process, so mutations persist across re-runs and leak across every user's session. Use session_state for anything that needs to mutate.
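The difference is plain Python behaviour and can be reproduced without Streamlit. In this sketch, shared_state is a hypothetical helper module registered by hand, and exec in a fresh namespace stands in for a re-run.

```python
import sys
import types

# Register a hypothetical helper module, as if the app had imported it.
shared = types.ModuleType("shared_state")
shared.items = []
sys.modules["shared_state"] = shared

SCRIPT = """
import shared_state                # served from sys.modules, not re-executed
local_items = []                   # rebuilt on every run
local_items.append("x")
shared_state.items.append("x")     # mutates the one cached module object
result = (len(local_items), len(shared_state.items))
"""


def rerun():
    """Run the script top to bottom in a fresh namespace."""
    namespace = {}
    exec(SCRIPT, namespace)
    return namespace["result"]


first = rerun()
second = rerun()
print(first, second)
```

The script-level list has one item after every run, while the list on the imported module keeps growing.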

Forms exist for a reason. Without st.form, every widget interaction triggers an immediate re-run. For multi-field inputs where you want one submission, wrap them in a form so the re-run fires only on submit.

None of these are deal-breakers, but they are real, and they reward writing your Streamlit code more like a pure function than like a stateful class.

Why the design holds up

The re-run model is the architectural decision that most defines Streamlit. It is also the one most likely to make a senior backend engineer wince on first contact. "You re-run the whole script every time? That's absurd."

It works because two things are true simultaneously. Python is fast enough at executing a few hundred lines that re-running in milliseconds is achievable. And the three escape hatches — session, data cache, resource cache — give you the exit valves you actually need without inventing a state-management framework.

The result is that the simple case is genuinely simple (five lines of Python and you have a working app) and the complex case is still tractable; you just have to be honest about where your state lives.

For a UI library aimed at data and ML practitioners who do not want to learn web frameworks, this is the right trade. The fact that it also produces one of the fastest "save file, see change" loops in Python is a free side effect of the architecture, and it is the thing that keeps the framework feeling lightweight even as the apps you build on it grow.

---

If you want to see this architecture stitched together with Snowflake's strategic bet and Community Cloud's deployment story, the hub article is Why Snowflake's Bet on Streamlit Just Works — And Where Solo Builders Still Win.