Caching Is Not Free

Jul 3, 2026 · 10 min read

Cache is one of the most misunderstood components in software architecture.

We all know this story.

You have a critical web application and you notice that the latencies of a few endpoints have gone up in the last few weeks.

You sit down to check what’s happening and you find out that there’s actually nothing wrong. The load has increased organically over time and it now threatens your latency SLAs.

So you schedule a meeting, and you get into a room with your team to see what you need to do.

What’s the first instinct?

Easy! Let’s throw a cache in front. Add Redis in front of your service and all of your problems are solved.

Well, not quite!

Unfortunately, all too often the instinct becomes the decision.

It’s not free performance - it’s a tradeoff

We often think of cache as free performance when it’s really just another fundamental design decision.

And like every design decision in distributed systems - guess what - it’s a tradeoff!

When you choose to use a cache, you’re buying low latency, but you’re paying with data freshness and added complexity.

And we’re not talking about trivial complexity. Remember the famous cliché about the two hard things in computer science?

Cache invalidation is just hard.

On top of that, you now have synchronization edge cases to solve and, most important of all, a whole new piece of infrastructure to maintain.

Where it bites

The thing with cache is that it’s very easy to add to your architecture, and doing so often introduces side effects that can harm your system in the long term instead of helping it.

The Masking Problem

Many times we unconsciously use the cache as a bandaid fix when the database performance degrades. Been there, done that.

The reality though is that many times the problem stems from poor engineering practices.

I have seen many cases of N+1 queries, missing indices, or simply badly written code where a straightforward optimization would have solved the latency issue without any further action.
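To make the N+1 case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The authors/posts schema, the index name, and both function names are made up for illustration. It counts how many queries each approach issues for the same result.

```python
import sqlite3

# Hypothetical schema for illustration: 50 authors with 3 posts each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE INDEX idx_posts_author ON posts(author_id);
""")
conn.executemany(
    "INSERT INTO authors VALUES (?, ?)",
    [(i, f"author-{i}") for i in range(1, 51)],
)
conn.executemany(
    "INSERT INTO posts (author_id, title) VALUES (?, ?)",
    [(a, f"post-{a}-{n}") for a in range(1, 51) for n in range(3)],
)


def titles_n_plus_one(conn):
    """One query for the authors, then one extra query per author (N+1)."""
    queries = 1
    authors = conn.execute("SELECT id, name FROM authors").fetchall()
    result = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id",
            (author_id,),
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_join(conn):
    """The same data fetched with a single JOIN."""
    rows = conn.execute("""
        SELECT a.name, p.title
        FROM authors a JOIN posts p ON p.author_id = a.id
        ORDER BY a.id, p.id
    """)
    result = {}
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result, 1
```

With 50 authors, the first function issues 51 queries and the second issues 1. Against a real networked database each of those extra queries is a round trip, which is exactly the kind of latency a cache ends up hiding.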

Note here that the issue I’m focusing on is not the cover-up of the real problem per se. It’s covering it up without realizing it.

As long as it’s a conscious decision, the bandaid can be the correct solution. Time pressure or a complex fix can make a good case for it. Remember, it’s all tradeoffs.

If it’s just another tech debt decision, all good! Think about it, document it, and tackle it later.

By adding a cache to cover up these inefficiencies without a proper analysis, you’re not fixing the root cause - you’re just masking it.

It’s not only a waste of resources and an increase in maintenance and complexity. These issues stack up, and given enough time and evolution of the service, they will come back and knock on your door - hopefully not at 2am.

High cardinality data & the read-heavy thinking trap

We often think that cache pretty much works for all read flows.

But if your service has highly variable data without a hot path and the requests are not similar, then caching just makes things worse.

Hit rates drop, evictions become constant, and you end up in an endless miss → load → evict cycle. Every miss pays for the cache lookup on top of the database read, so the cycle increases latency and consumes more resources.
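Here is a small simulation of that effect, as a sketch rather than a benchmark. It replays two synthetic workloads against a plain LRU cache. The key space, cache capacity, request count, and the 90/500 hot-key split are all numbers I picked for illustration.

```python
import random
from collections import OrderedDict


def simulate_hit_rate(keys, capacity):
    """Replay a sequence of key lookups against an LRU cache; return the hit rate."""
    cache = OrderedDict()
    hits = 0
    for key in keys:
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            cache[key] = True  # miss: load and store
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(keys)


rng = random.Random(42)  # fixed seed so runs are repeatable
KEYSPACE = 100_000
CAPACITY = 1_000
REQUESTS = 50_000

# No hot path: every key is equally likely to be requested.
uniform = [rng.randrange(KEYSPACE) for _ in range(REQUESTS)]

# Hot path: 90% of the requests go to the same 500 keys.
skewed = [
    rng.randrange(500) if rng.random() < 0.9 else rng.randrange(KEYSPACE)
    for _ in range(REQUESTS)
]

uniform_rate = simulate_hit_rate(uniform, CAPACITY)
skewed_rate = simulate_hit_rate(skewed, CAPACITY)
print(f"uniform: {uniform_rate:.1%}, skewed: {skewed_rate:.1%}")
```

Same cache, same capacity, same number of requests. The uniform workload should land around the capacity-to-keyspace ratio (about 1% here), while the skewed one should land close to the share of traffic that goes to hot keys. The access pattern decides whether the cache earns its keep, not the cache itself.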

High-cardinality data is one way of dropping your hit rates, but I’ve also run into another one.

A long, long time ago, I was assigned to design a user preferences service that had a non-negotiable: return the requested user preferences fast.

The service was simple. A user logged into the app, and if they had already stored preferences, those were loaded to customize their experience. If they hadn’t, they could save their preferences and these would be available in their next interactions.

How did I ensure that my service had fast reads? Cache of course!

I created the backend service that stored the data in a SQL database and threw a Redis in front to return the data blazingly fast. I even confirmed the promise of caching. Retrieval from the main SQL storage required ~100ms while Redis returned it in 2ms. 2!

Guess what! The hit/miss ratio was ridiculously low.

Why? A re-read wasn’t needed in the normal flow. A read was only needed on log-in and after storing new preferences. The service was read-heavy overall (if I had to estimate, the read/write ratio was ~10000:1), but there was rarely a second read of the same data for the cache to serve.

Interestingly the project was a huge success but this was not because of the caching. It was successful despite adding the cache.

The takeaway? Even if your app is read-heavy, and even if you don’t have high-cardinality data, if your access patterns don’t have hot paths, a cache is useless.

When a latency cache quietly becomes a capacity cache

One important distinction we often neglect is that not all caching is the same. Caches come in different types¹. The main ones are:

Latency - improves the response time of your application.
Capacity - increases your load capacity. Without it, you can’t support the load.

There’s a framing I really like from The Coder Cafe² about the latency and capacity access patterns.

The unsettling part is that the code is identical in both cases. The difference only becomes visible at failure time. — The Coder Cafe

And that’s exactly where I’ve seen many teams get into trouble.

You introduce caching to improve your latency metrics. Over time, the service logic expands and the data evolves. Then one day, your cache is down, and all your requests are now hitting the database. The database can’t keep up and your system just can’t serve anything.

What was once a nice-to-have feature is now a critical component of your system and a single point of failure.

In my experience, this is a pretty common theme. I’d even say that given enough time and growing complexity, a latency cache tends to turn into a capacity cache.

What causes this

The problem here is that there isn’t always a single, specific cause for this transition.

The typical cause is the gradual growth of traffic. Another common one is gradual data growth, which makes queries slower and more resource-hungry: the database tables grow, the queries become less efficient and, boom, now it’s a capacity cache!
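A back-of-the-envelope way to check which side of the line you’re on. This is a rough sketch: it assumes every request that misses the cache costs exactly one database query, and all the traffic and capacity numbers below are invented.

```python
def db_load(peak_rps: float, hit_rate: float) -> float:
    """Requests per second that reach the database for a given cache hit rate."""
    return peak_rps * (1.0 - hit_rate)


def cache_role(peak_rps: float, db_capacity_rps: float) -> str:
    """Classify the cache by what happens when it is cold or down (hit rate 0)."""
    load_without_cache = db_load(peak_rps, hit_rate=0.0)
    return "latency" if load_without_cache <= db_capacity_rps else "capacity"


# At launch: the database can absorb all the traffic on its own.
print(cache_role(peak_rps=200, db_capacity_rps=1000))

# Traffic growth: the same database can no longer take the full load.
print(cache_role(peak_rps=3000, db_capacity_rps=1000))

# Data growth: traffic is flat, but slower queries shrink what the database can serve.
print(cache_role(peak_rps=200, db_capacity_rps=150))
```

Nothing in the caching code changes between the three cases. Only the inputs drift, which is why the transition is so easy to miss.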

A tricky experience with data structure drift

The worst cause of latency-to-capacity drift I’ve experienced is a combination of the cached data structure evolving and the cached items growing in size.

I once worked on a system with this exact shape: a course listing service. Domain details changed, but the architectural pattern was the same. It served one of the most critical paths of the application: return the available courses to users so they can enroll.

Low latency was a must from the beginning, not only to protect the SLA but to serve the information as soon as possible. So naturally, caching was baked into the design spec of the service. It wasn’t an afterthought.

In the first version, that service served small training providers. The cache was designed to hold the full list of courses with their details for each provider, so that it could be returned as-is on every request.

Over time, however, the business expanded to support large universities. Instead of hundreds of courses, we now had thousands for a single provider. The first universities we onboarded worked fine. No hiccups. Slightly elevated read queries, but nothing catastrophic.

Gradually, the number of universities increased. On top of that, each course now needed more attributes in the response. The cache entries had expanded in both the number of courses and in the amount of data stored per course. Suddenly the cache size mattered more. The evictions became more frequent. The memory of the service started becoming an issue.

And one day, we realized that these few seemingly innocent spikes in 5xx errors had a pattern: they always appeared right after deployments, when the cache was cold, and they were becoming more and more frequent. We had a problem. The service could no longer serve requests without a cache. The side effects and the nature of the data had drifted so much that it changed the whole system - we had a capacity cache.

This particular case is an example of multiple small decisions that slowly made the caching more and more needed. It is a much harder problem to solve and requires decoupling and big refactorings to deal with if you want to return to a latency cache.

When and how do you deal with cache

I think it’s now clear that cache is not free. So, how do you approach the tradeoff and how do you manage cache?

When to use a cache

Do you really need a cache? Before you even start looking into the cache strategy and tuning, you need to deeply understand the nature of the problem.

The first thing is to look for underlying performance inhibitors. Are your queries optimized? Do you have the correct indices? Do you have any unnecessary weird computation loops? Check these first.

Once you’ve gone past the first step and your latency is still high, you have two prerequisites:

You’re dealing with read-heavy hot paths.
- We’re talking about low-cardinality data that really is read often.
You have an explicit, acceptable tolerance for stale data.
- If your business demands absolute consistency without any delays, caching is just unacceptable. The tolerance? It depends on your use case and it’s business-driven.

Introducing a cache

Ok, you’ve decided that caching is the solution for you. What do you need to have in place before you deploy a cache?

Simple, but not easy. You need:

A solid invalidation strategy that includes an explicit TTL, an eviction policy, and custom triggers
Monitoring.
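To show how the three invalidation pieces fit together, here is a minimal in-process sketch. The class name and its parameters are hypothetical, LRU is just one possible eviction policy, and a real deployment would more likely lean on the TTL and eviction settings of the cache server itself.

```python
import time
from collections import OrderedDict


class TTLCache:
    """In-process cache with an explicit TTL, LRU eviction, and manual invalidation.

    A miss is reported as None, so None itself cannot be cached in this sketch.
    """

    def __init__(self, max_entries, ttl_seconds, clock=time.monotonic):
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Explicit TTL: expired entries are treated as misses.
            del self._entries[key]
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)
        self._entries.move_to_end(key)
        # Eviction policy: drop the least recently used entries when full.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)

    def invalidate(self, key):
        """Custom trigger: call this from the write path when the source changes."""
        self._entries.pop(key, None)
```

The custom trigger is the part that lives in your own code: after a write to the database succeeds, the write path calls invalidate for the affected key, so the next read reloads fresh data instead of waiting for the TTL to expire.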

I’ve seen two main cases making invalidation highly complex and difficult to manage: unexpected side effects and unintentionally frequent invalidations.

Requests that are already in flight may be holding cached items that have since become stale. If they touch critical flows, your once-robust use case suddenly needs to be aware of this and introduce concurrency guardrails. Another problem is that aggressive invalidation can end up forcing expensive reloads.

These issues can be caught by good monitoring. Hit and miss rates are an absolute must for visibility into how the cache is behaving, so you can fine-tune it accordingly.

The thing I always keep in mind when introducing a cache is that you’ll never get it right the first time. Start conservatively, favoring freshness over performance, then observe and fine-tune.
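Tracking hit and miss rates doesn’t need much. Here is a bare-bones sketch with a plain dict standing in for the cache. The CacheStats and cached_lookup names are hypothetical, and in a real service you would export these counters to your metrics system rather than keep them in memory.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    hits: int = 0
    misses: int = 0

    def record(self, hit: bool) -> None:
        if hit:
            self.hits += 1
        else:
            self.misses += 1

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def cached_lookup(cache: dict, stats: CacheStats, key, load):
    """Cache-aside read that records every hit and miss."""
    if key in cache:
        stats.record(hit=True)
        return cache[key]
    stats.record(hit=False)
    value = load(key)  # fall through to the source of truth
    cache[key] = value
    return value
```

Watching hit_rate over time, ideally per endpoint or per key prefix, is what tells you whether a low ratio comes from the access pattern or from invalidating too aggressively.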

Operating a cache

And finally you have your cache up and running. How do you operate a cache?

The major thing that you need to be aware of is what type of cache you’re dealing with and act accordingly.

Do you have a latency booster? Do regular load tests to ensure it’s still a latency booster and not a capacity cache. You need to handle failures of your cache gracefully in this case. It’s ok if your service is slower as long as it survives.
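Handling cache failures gracefully can look roughly like this. It’s a sketch: CacheUnavailable is a stand-in for whatever connection or timeout error your cache client actually raises, and the three callables are placeholders for your own cache and database access code.

```python
import logging

logger = logging.getLogger("cache")


class CacheUnavailable(Exception):
    """Hypothetical error type standing in for your cache client's failures."""


def read_through(key, cache_get, cache_set, load_from_db):
    """Serve from the cache when possible; fall back to the database if it fails."""
    try:
        cached = cache_get(key)
        if cached is not None:
            return cached
    except CacheUnavailable:
        logger.warning("cache unavailable on read, falling back to database")

    value = load_from_db(key)

    try:
        cache_set(key, value)
    except CacheUnavailable:
        # Slower is fine, failing is not: serve the value even if we can't cache it.
        logger.warning("cache unavailable on write, serving uncached value")
    return value
```

This fallback only helps while the database can actually absorb the full load. With a capacity cache, the same code just moves the outage from the cache to the database, which is why the load tests matter.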

If you have a capacity cache, you need to treat it as critical infrastructure. Work on SLAs, alerts, safety nets and everything else needed to ensure that the cache stays up and keeps your system running.

Conclusion

Caching is a powerful architecture tool, but it’s not free. It’s a tradeoff, and you have to understand what you’re paying with.

Next time you’re in a room with your team tackling high latency, ask yourself:

Do I really need a cache?
What’s my invalidation strategy?
Is my monitoring in place?
Is it a latency or a capacity cache?

Treat it accordingly.

Tags: Software Architecture