Tuesday · June 2, 2026 · Singapore
Asia edition · No. 412

Cold Start, Bundle Size, Cost: The Three Levers That Decide Your Serverless Bill

Cold start latency, bundle size, and serverless cost are tightly coupled. Here's how the three levers interact and where the real trade-offs sit.

dailytechwire
Published June 2, 2026 · 4 min read

Serverless promised that you stop paying for idle compute. What it did not advertise is that the same model makes two engineering metrics, cold start latency and bundle size, directly visible on your invoice. The three are linked, and the link is the part most teams miss until a traffic pattern shifts and the bill moves with it.

This piece walks through how the three interact, where the trade-offs actually sit, and what the numbers tend to look like in practice. There are no fabricated benchmarks here: any figures below illustrate the mechanics and are not measured claims about a specific provider.

Cold start is a tax on the first request, not a constant

A cold start happens when the runtime has to provision a fresh execution environment: pull the deployment artifact, initialize the runtime, run your top-level module code, then invoke the handler. On warm invocations none of that runs again, so the cold start cost is amortized across however many requests hit the same environment before it gets recycled.
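The amortization is easy to see with a few lines of arithmetic. The sketch below is illustrative only: the phase names mirror the steps described above, and every millisecond value is a made-up placeholder, not a measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColdStartPhases:
    """One-time work done before the first handler call (hypothetical values)."""
    fetch_artifact_ms: float
    init_runtime_ms: float
    run_module_code_ms: float

    @property
    def total_ms(self) -> float:
        return self.fetch_artifact_ms + self.init_runtime_ms + self.run_module_code_ms


def mean_latency_ms(phases: ColdStartPhases, handler_ms: float,
                    requests_per_environment: int) -> float:
    """Average latency when one cold start is shared by N requests."""
    if requests_per_environment < 1:
        raise ValueError("an environment serves at least one request")
    return handler_ms + phases.total_ms / requests_per_environment


# Placeholder numbers chosen only to show the shape of the curve.
phases = ColdStartPhases(fetch_artifact_ms=300, init_runtime_ms=150, run_module_code_ms=350)
for n in (1, 10, 100):
    print(n, mean_latency_ms(phases, handler_ms=40, requests_per_environment=n))
```

The average flattens quickly as an environment serves more requests, which is why the mean is the wrong place to look for cold start pain.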

The practical consequence: cold start pain scales inversely with traffic. A function getting steady concurrency rarely shows cold starts in p50. The same function shows them in p99 the moment a burst forces the platform to spin up new environments in parallel. If your SLO is written against p99, cold start is your problem regardless of average load.
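A small sketch of why p50 hides what p99 shows. The latencies and cold start counts are invented for illustration, and the percentile uses the simple nearest-rank definition, which may differ slightly from what a given monitoring tool reports.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the value at rank ceil(pct% of n) in sorted order."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


WARM_MS = 40    # illustrative warm latency
COLD_MS = 900   # illustrative cold latency


def latencies(total, cold):
    """Build a sample where `cold` of `total` requests hit a cold environment."""
    return [COLD_MS] * cold + [WARM_MS] * (total - cold)


steady = latencies(1000, 2)    # 0.2% of requests are cold
bursty = latencies(1000, 30)   # 3% of requests are cold

for name, sample in (("steady", steady), ("bursty", bursty)):
    print(name, "p50:", percentile(sample, 50), "p99:", percentile(sample, 99))
```

In both samples p50 stays at the warm latency. Only the bursty sample pushes p99 up to the cold latency, because more than 1% of its requests landed on a fresh environment.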

Runtime choice matters more than most language flame wars suggest. Interpreted runtimes with heavy initialization (large dependency trees, ORM bootstrapping, connection pool setup at module load) pay the cost on every cold start. AOT-compiled binaries and minimal runtimes shift work to build time, trading a longer CI step for a shorter startup path.
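To make the module-load cost concrete, here is a minimal sketch of one common mitigation in an interpreted runtime: deferring heavy setup until a request actually needs it. `HeavyClient` is a hypothetical stand-in for an ORM session or SDK client, and the handler signature is an assumption rather than any specific platform's API.

```python
from functools import lru_cache


class HeavyClient:
    """Hypothetical stand-in for a client that is slow to construct."""
    instances = 0

    def __init__(self):
        HeavyClient.instances += 1  # count constructions for the demo

    def fetch(self, key):
        return f"value-for-{key}"


@lru_cache(maxsize=1)
def get_client():
    """Build the client on first use, then reuse it for the environment's lifetime."""
    return HeavyClient()


def handler(event, context=None):
    # Cheap paths never touch the heavy client, so they skip its setup cost.
    if event.get("path") == "/health":
        return {"status": "ok"}
    return {"status": "ok", "data": get_client().fetch(event["key"])}
```

This moves the cost rather than removing it: the first request that needs the client still pays for construction. Whether that is an improvement depends on how many cold requests take the cheap path.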

Bundle size is the lever you actually control

You cannot directly set cold start latency. You can set what the platform has to load before your handler runs, and that is largely a function of bundle size.

The artifact has to be fetched and unpacked into the execution environment. A 5 MB bundle and a 150 MB bundle do not initialize at the same speed. The gap is not linear with size, but it is real and it lands squarely in the cold start window.

Three things that move the number:

  • Tree-shaking and dead code elimination. Bundling with a tool that does proper tree-shaking strips unused exports. The catch is that dynamic imports and side-effectful modules defeat it, so the win depends on how your dependencies are authored, not just your config.
  • Dependency discipline. Pulling in a full SDK to call one endpoint is the classic offender. Modular SDK packages, or hand-rolling a thin client for a single API, can cut tens of megabytes.
  • Layers vs. inlining. Shared dependencies in a layer keep your function artifact small, but the layer still has to load. It changes where the weight sits, not whether it exists.
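Before trimming anything, it helps to see where the weight sits. The following is a rough sketch, using only the standard library, that totals the bytes under each top-level entry of an unpacked deployment directory. The function name is made up for this example, and on-disk size is only a proxy for what the platform actually transfers, since artifacts are typically compressed.

```python
import os


def size_by_top_level(root):
    """Return (name, bytes) for each top-level entry under root, largest first."""
    totals = {}
    for entry in os.scandir(root):
        if entry.is_file(follow_symlinks=False):
            totals[entry.name] = entry.stat(follow_symlinks=False).st_size
        elif entry.is_dir(follow_symlinks=False):
            total = 0
            # os.walk does not descend into symlinked directories by default.
            for dirpath, _dirnames, filenames in os.walk(entry.path):
                for name in filenames:
                    total += os.lstat(os.path.join(dirpath, name)).st_size
            totals[entry.name] = total
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Pointed at the directory that gets zipped for deployment, the first few rows usually show which dependency to question first.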

The deployment package limit is a hard ceiling, but the soft ceiling, the point where cold start latency starts hurting your tail percentiles, arrives well before it.

Where the cost shows up

Billing on most function platforms is a product of allocated memory, execution duration, and invocation count. Cold start feeds the duration term directly: the initialization time is wall-clock time the function is alive, and on several pricing models you pay for it.
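A sketch of how initialization time leaks into the duration term. It assumes a pricing model where init time is billed like execution time, which is not true everywhere, and the per-GB-second rate is a round illustrative number, not any provider's price. Per-request fees are left out to keep the focus on duration.

```python
PRICE_PER_GB_SECOND = 0.00002  # illustrative rate, not a real provider's price


def compute_cost_per_million(memory_mb, exec_ms, init_ms, cold_fraction):
    """Duration cost for 1M invocations when a fraction of them also bill init time."""
    avg_billed_ms = exec_ms + cold_fraction * init_ms
    gb_seconds = (memory_mb / 1024) * (avg_billed_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND * 1_000_000


# Hypothetical function: 512 MB, 100 ms of work, 800 ms of initialization.
all_warm = compute_cost_per_million(512, exec_ms=100, init_ms=800, cold_fraction=0.0)
some_cold = compute_cost_per_million(512, exec_ms=100, init_ms=800, cold_fraction=0.05)
print(round(all_warm, 2), round(some_cold, 2))
```

With these placeholder inputs, a 5% cold start rate raises the duration cost by 40%, because each cold request bills nine times as long as a warm one.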

That creates a feedback loop worth naming explicitly. Bumping allocated memory often raises the CPU allocation proportionally, which can shorten both initialization and execution. So a function that looks expensive at low memory can be cheaper at higher memory because it finishes faster. The only way to know is to measure cost per invocation across memory tiers, not to assume more memory means a bigger bill.
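The comparison can be sketched in a few lines. The durations below are hypothetical stand-ins for numbers you would measure yourself at each tier, and the rate is again an illustrative placeholder.

```python
def cost_by_tier(measured_ms_by_memory_mb, price_per_gb_second=0.00002):
    """Map each memory tier to its duration cost per invocation."""
    return {
        memory_mb: (memory_mb / 1024) * (billed_ms / 1000) * price_per_gb_second
        for memory_mb, billed_ms in measured_ms_by_memory_mb.items()
    }


# Hypothetical measurements: average billed duration at each memory setting.
measured = {512: 1200, 1024: 500, 2048: 300}

costs = cost_by_tier(measured)
cheapest = min(costs, key=costs.get)
for memory_mb in sorted(costs):
    print(memory_mb, f"{costs[memory_mb]:.8f}")
print("cheapest tier:", cheapest)
```

In this made-up data the middle tier wins: doubling memory from 512 MB more than halves the duration, while doubling again does not. Real functions may show a different shape entirely, which is the reason to measure.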

The other cost surface is provisioned concurrency, the mechanism most platforms offer to keep environments warm. It eliminates cold starts for the provisioned count, but you pay for that capacity whether or not it serves traffic. That is the idle compute serverless was supposed to remove, bought back deliberately to protect tail latency. Whether it is worth it is an SLO question, not an architecture one.
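To put a number on what is being bought back, the sketch below estimates the fixed cost of keeping environments warm and the share of that capacity sitting idle. The rate, the 730-hour month, and the utilization figure are all assumptions for illustration. Real pricing for provisioned capacity varies by platform.

```python
def provisioned_cost(environments, memory_mb, hours, rate_per_gb_second):
    """Fixed cost of keeping environments warm, regardless of traffic served."""
    return environments * (memory_mb / 1024) * hours * 3600 * rate_per_gb_second


def idle_fraction(environments, avg_busy_environments):
    """Share of provisioned capacity that is paid for but not serving requests."""
    return max(0.0, 1 - avg_busy_environments / environments)


# Hypothetical setup: 10 warm environments of 1024 MB for a 730-hour month.
monthly = provisioned_cost(10, 1024, hours=730, rate_per_gb_second=0.000004)
idle = idle_fraction(10, avg_busy_environments=2.5)
print(round(monthly, 2), idle)
```

If three quarters of the provisioned capacity is idle on average, that portion of the monthly figure is the price of the tail latency guarantee and nothing else.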

The trade-off, stated plainly

There is no setting that minimizes all three at once. Cutting bundle size helps cold start and can lower duration cost, but the engineering effort to slim dependencies has its own price. Provisioned concurrency kills cold start latency and raises baseline cost. Higher memory raises the per-millisecond rate but can shorten duration enough to lower the total.

The teams that handle this well treat it as a measurement problem. They track cold start frequency against traffic shape, watch p95 and p99 rather than averages, and re-run the cost-per-invocation math when memory or dependency changes ship. The teams that get surprised are the ones who set memory once, never look at the bundle after the first deploy, and find out about the coupling when a launch-day traffic spike multiplies their cold start count.

None of these levers is hidden. The coupling between them is what tends to go unmodeled until it shows up on the bill.
