
Fermi estimates are a terrific technique, and indeed in this case we can see that "tens of thousands" x "five for each user" x 20B saved = 10MB, so that worked out nicely. So yeah, that would have been smart. BUT...
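Spelling out that multiplication (taking the high end of "tens of thousands" as ~100k users, which is my assumption; the other two factors are from the comment):

```python
# Napkin math from the comment, as a sanity check.
users = 100_000            # "tens of thousands" -- high end, assumed
keys_per_user = 5          # "five for each user"
bytes_saved_per_key = 20   # "20B saved"

total_saved = users * keys_per_user * bytes_saved_per_key
print(total_saved)  # 10_000_000 bytes, i.e. ~10MB
```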

When Fermi estimates fail, they fail in one of two ways. Sometimes they fail by a constant factor, as expected, and that's fine. But sometimes they fail because you forgot one or more important factors entirely, and in that case, they can be arbitrarily wrong. (In fact, it seems pretty likely that the so-called Fermi paradox is an example of this). It's not just 8MB vs 15MB vs 10MB.... it's 10MB vs some chance of totally utterly wrong, like maybe 1GB or something.

So when the test for real data is cheap (as OP says it is in this case, not that I know squat about Redis), it's "probably good to get into the habit" of doing it right, instead of getting a perfectly good napkin dirty.



I know what you're saying, but even if you're off by an order of magnitude, at least it's some sort of starting point. Estimating that your key-space in Redis is between 5MB and 500MB isn't particularly useful, but it is a big improvement over having no idea. I don't disagree with you that in most cases there is no substitute for real-world data, especially if you can come by it without much pain, but I maintain you should always start with some sort of estimate, wrong though it may be. It's good to take the time to reason out a system or problem, and then see if your mental model matches real-world reality.

In this particular example the napkin analysis is so trivial and so accurate, and the "real world" data is so expensive (a linear lookup of the entire key-space, followed by data destruction), that it immediately jumps out at me.
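For what it's worth, that napkin analysis can be sketched in a couple of lines. The per-key overhead constant below is my guess at Redis's bookkeeping cost (dict entry, object header, etc.), not a measured number:

```python
def estimate_keyspace_bytes(num_keys, avg_key_len, avg_value_len, overhead_per_key=50):
    """Napkin estimate of Redis key-space size.

    overhead_per_key is an assumed constant for per-entry
    bookkeeping; treat it as a fudge factor, not a fact.
    """
    return num_keys * (avg_key_len + avg_value_len + overhead_per_key)

# e.g. 500k keys, 40-byte keys, 100-byte values
print(estimate_keyspace_bytes(500_000, 40, 100))  # 95_000_000, ~95MB
```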


When you forget even one factor, you can be off by many orders of magnitude.

And you're measuring the cost of the real data in totally useless terms. Who gives a shit about linear scans and "data destruction", lolz. What matters is how long it took the OP to figure out what command to write, and write it, and write the cleanup commands. If he already knew how to do it, then it was basically free. If he didn't already know how to do it, then he learned something, which pays for most of the cost of doing it.

And the napkin analysis is so trivial, but you don't EVER know that it's accurate, because there's no error bound on "oh I forgot the important part".


I don't think you know what you are actually arguing. I'm pretty sure I'm not arguing against performing real-world tests to diagnose a problem. I'm pretty sure I'm not doing that because I'm not insane. I'm also fairly sure I didn't argue that you have to do one or the other but not both. Though I didn't say it outright, I'm pretty sure I implied, and I stand by it, that it is good practice to at least do a fast mental (or napkin) estimate when it is warranted. Debugging is a hard process, but it shouldn't be a random one. You should have a mental model of your system, and it is good practice to go through the exercise of defining said mental model and, yes, checking your assumptions when it is warranted.

I'm clearly not a fan of the direction the OP took to verify his assumption that his key sizes have an impact on his Redis DB size. I think there's a better, more accurate way of checking his hypothesis (napkin math). His approach wasn't great and doesn't work for anything other than toy or test deployments, and the results aren't as clear as you may think (see below). I stand by that.

>And the napkin analysis is so trivial, but you don't EVER know that it's accurate, because there's no error bound on "oh I forgot the important part".

We're still talking about the same situation, correct? I suppose if it's pedantry you want, pedantry I can give. Tell me, how does the OP know that his "real world analysis" is correct? Because at some point in time, for some sort of input, the system gave him one kind of result? Apparently a simple multiplication (# of keys x avg. key size) is fraught with errors, but issuing a command (RENAME) against a black-box datastore the OP probably doesn't fully understand provides a clear, unambiguous result? What if Redis caches (in memory or to disk) all original values before performing a key rename and then clears them out over a period of time, or better yet, doesn't clear them out until it needs the space? So you run your script, check memory usage, and see no difference... so of course, because we trust the "real-world result", we live happily ever after... yes?
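A cheaper middle ground between pure napkin math and rewriting the whole key-space is random sampling. The extrapolation itself is trivial; the commented-out gathering step is hypothetical (it assumes redis-py and a live server, though `memory_usage`, `randomkey`, and `dbsize` are real commands/client methods):

```python
import statistics

def extrapolate_total(sampled_sizes, total_keys):
    """Extrapolate total key-space memory from a random sample
    of per-key byte sizes."""
    return statistics.mean(sampled_sizes) * total_keys

# Gathering the sample might look like this (hypothetical, untested):
#   r = redis.Redis()
#   sizes = [r.memory_usage(r.randomkey()) for _ in range(1000)]
#   print(extrapolate_total(sizes, r.dbsize()))

print(extrapolate_total([90, 100, 110], 500_000))  # 50_000_000
```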

Obviously this is a contrived example and OP most likely got the right result, but I think I made my point. "Real world results" are full of gotchas and ambiguities (and sometimes require a great deal of background knowledge to interpret properly), and in such a case it would be nice to have a mental model of what the expected result is, so that it can be either verified or proven wrong (and thereby provide a direction for further investigation).


This isn't a Fermi problem. It's analysis. Math.

How many keys do you have? Not something you should guess.

How many bytes per key are you saving? Not something you should guess.


The 10MB difference can be attributed to fragmentation, so the savings may in fact be zero. In such systems mathematical estimates don't always reflect reality so real testing would be more expedient and appropriate.
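To make the fragmentation point concrete: Redis reports this as `mem_fragmentation_ratio` under `INFO memory`, the ratio of RSS (what the OS sees) to Redis's own allocation count. The numbers below are illustrative, not from the OP's system:

```python
def fragmentation_ratio(used_memory_rss, used_memory):
    """The ratio Redis reports as mem_fragmentation_ratio in INFO memory:
    resident set size divided by Redis's internally tracked allocation."""
    return used_memory_rss / used_memory

# A ratio well above 1.0 means RSS may not drop even when
# logical usage shrinks -- the "savings" can vanish into fragmentation.
print(fragmentation_ratio(1_500_000_000, 1_000_000_000))  # 1.5
```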





