
The final pro/cons list: https://github.com/npm/rfcs/pull/595#issuecomment-1200480148

I don't find the cons all that compelling to be honest, or at least I think they warrant further discussion to see if there are workarounds (e.g. a choice of compression scheme for a library like typescript, if they would prefer faster publishes).

It would have been interesting to see what eventually played out if the author hadn't closed the RFC themselves. It could have been the sort of thing that eventually happens after 2 years, but then quietly makes everybody's lives better.



"I don't find the cons all that compelling to be honest"

This is a solid example of how things change at scale. Concerns I wouldn't even think about for my personal website become things I need to think about for a download site being hit by 50,000 of my customers, and become big deals when operating at the scale of npm.

You'll find those arguments the pointless nitpicking of entrenched interests who just don't want to make any changes, until you experience your very own "oh man, I really thought this change was perfectly safe and now my entire customer base is trashed" moment, and then suddenly things like "hey, we need to consider how this affects old signatures and the speed of decompression and just generally whether this is worth the non-zero risks for what are in the end not really that substantial benefits" start to sound a lot more reasonable.

I do not say this as the wise Zen guru sitting cross-legged and meditating from a position of being above it all; I say it looking at my own battle scars from the Perfectly Safe things I've pushed out to my customer base, only to discover some tiny little nit caused me trouble. Fortunately I haven't caused any true catastrophes, but that's as much luck as skill.

Attaining the proper balance between moving forward even though it incurs risk and just not changing things that are working is the hardest part of being a software maintainer, because both extremes are definitely bad. Everyone tends to start out in the former situation, but then when they are inevitably bitten it is important not to overcorrect into terrified fear of ever changing anything.


> This is a solid example of how things change at scale.

5% is 5% at any scale.


Yes and no. If I'm paying $5 a month for storage, I probably don't care about saving 5% of my storage costs. If I'm paying $50,000/month in storage costs, a 5% savings is a lot more worthwhile to pursue.


Doesn't npm belong to Microsoft? It must be hosted in Azure which they own so they must be paying a rock bottom rate for storage, bandwidth, everything.


It's probably less about MS and more about the people downloading the packages


For them it is 5% of something tiny.


Maybe, maybe not. If you are on a bandwidth-limited connection and you have a bunch of NPM packages to install, 5% of an hour is a few minutes saved. It's likely more than that because long transfers often need to be restarted.


A properly working cache and download manager that supports resume goes a long way.

I could never get Docker to work on my ADSL when it was 2 Mbps (FTTN got it up to 20) though it was fine in the Montreal office which had gigabit.


The amount of modules my docker hosts download from npm is anything but tiny.


5% off your next lunch and 5% off your next car are very much not the same thing.


Those lunches could add up to something significant over time. If you're paying $10 per lunch for 10 years, that's $36,500 which is pretty comparable to the cost of a car.


Which, then, supports the point that scale matters, doesn't it?

Here the scale of time is larger and does make the 5% significant, while it isn't at the scale of a few days.


So what, instead of 50k for a car you spend 47.5k?

If that moves the needle on your ability to purchase the car, you probably shouldn't be buying it.

5% is 5%.


If it takes 1 hour of effort to save 5%:

- Doing 1 hour of effort to save 5% on your $20 lunch is foolhardy for most people. $1/hr is well below US minimum wage.
- Doing 1 hour of effort to save 5% on your $50k car is wise. $2500/hr is well above what most people are making at work.

It's not about whether the $2500 affects my ability to buy the car. It's about whether the time it takes me to save that 5% ends up being worthwhile to me given the actual amount saved.

The question is really "given the person-hours it takes to apply the savings, and the real value of the savings, is the savings worth the person-hours spent?"
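The comparison above boils down to dollars saved per hour of effort. A minimal sketch of that arithmetic, using the lunch and car prices from the comment (the function name is made up for illustration):

```python
def savings_per_hour(price: float, discount: float, hours_spent: float) -> float:
    """Dollars saved per hour of effort spent chasing the discount."""
    return price * discount / hours_spent


# Figures from the comment: a $20 lunch and a $50k car, 5% off, 1 hour of effort each.
lunch = savings_per_hour(price=20, discount=0.05, hours_spent=1)
car = savings_per_hour(price=50_000, discount=0.05, hours_spent=1)

print(f"lunch: ${lunch:.2f}/hr, car: ${car:.2f}/hr")
```

Same percentage, same effort, but the effective hourly rate differs by a factor of 2,500.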


This is something we often do in our house. We talk about things in terms of hours worked rather than price. I think more people should do it.


By that logic I waste time reading books instead of paying someone else to read them for me.


Paying somebody else to read the book means you don't get the benefit of the book.

Also, this is exactly what your company is doing: paying you to "read the book" so they don't have to.


If you can get the exact same result for less cost (time and money), why not? Things like enjoyment don't factor in since they can't be directly converted into money.


Why do so many people take illustrative examples literally?

I'm sure you can use your imagination to substitute "lunch" and "car" with other examples where the absolute change makes a difference despite the percent change being the same.

Even taking it literally... The 5% might not tip the scale of whether or not I can purchase the car, but I'll spend a few hours of my time comparing prices at different dealers to save $2500. Most people would consider it dumb if you didn't shop around when making a large purchase.

On the other hand, I'm not going to spend a few hours of my time at lunch so that I can save an extra $1 on a meal.


I wouldn't pick 5¢ up off the ground but I would certainly pick up $2500.


You'd keep 5c. A significant number of people who find sums up around $2500 give it back unconditionally, with no expectation of reward. Whoever lost $2500 is having a really bad day.


5% of newly published packages, with a potentially serious degradation to package publish times for those who have to do that step.

Given his numbers, let's say he saves 100 TB of bandwidth over a year. At AWS egress pricing... that's $5,000 total saved.

And arguably - NPM is getting at least some of that savings by adding CPU costs to publishers at package time.

Feels like... not enough to warrant a risky ecosystem change to me.


https://www.reddit.com/r/webdev/comments/1ff3ps5/these_5000_...

NPM uses at least 5 petabytes per week. 5% of that is 250 terabytes.

So $15,000 a week, or $780,000 a year in savings could’ve been gained.
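For anyone checking the arithmetic, here is a rough sketch. The per-GB price is not stated in the comment; $0.06/GB is an assumption, chosen because it is the rate that reproduces the $15,000 figure from 250 TB. Decimal units (1 TB = 1,000 GB) are also assumed.

```python
def weekly_savings_usd(weekly_traffic_tb: float, reduction: float, usd_per_gb: float) -> float:
    """Estimated weekly bandwidth savings in USD, using decimal units (1 TB = 1000 GB)."""
    saved_gb = weekly_traffic_tb * 1000 * reduction
    return saved_gb * usd_per_gb


# 5 PB/week and 5% come from the comment; $0.06/GB is an assumed rate.
weekly = weekly_savings_usd(weekly_traffic_tb=5000, reduction=0.05, usd_per_gb=0.06)
annual = weekly * 52

print(f"weekly: ${weekly:,.0f}, annual: ${annual:,.0f}")
```

The result scales linearly with the assumed price, so a bulk or internal rate well below list price shrinks the savings proportionally.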


In a great example of the Pareto Principle (80/20), or actually even more extreme, let's only apply this Zopfli optimization if the package download total is equal to or more than 1GiB (from the Weekly Traffic in GiB column of the Top 5000 Weekly by Traffic tab of the Google Sheets file from the reddit post).

For reference, total bandwidth used by all 5000 packages is 4_752_397 GiB.

Packages >= 1GiB bandwidth/week - That turns out to be 437 packages (there's a header row, so it's rows 2-438) which uses 4_205_510 GiB.

So 88% of the top 5000 bandwidth is consumed by downloading the top 8.7% (437) packages.

5% is about 210 TiB.

Limiting to the top 100 packages by bandwidth results in 3_217_584 GiB, which is 68% of total bandwidth used by 2% of the total packages.

5% is about 161 TiB.


Packages with >= 20GiB bandwidth == 47 packages totaling 2,536,902.81 GiB/week.

Less than 1% of top 5000 packages took 53% of the bandwidth.

5% would be about 127 TiB (rounded up).
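The shares quoted above can be recomputed from the totals given in these comments. This is only a sketch over the numbers as quoted, not over the spreadsheet itself; the tier labels are copied from the comments. Note that the "TiB" figures quoted match dividing GiB by 1000; dividing by 1024 gives slightly smaller values (roughly 205, 157 and 124).

```python
TOTAL_GIB = 4_752_397  # weekly traffic of all top-5000 packages, as quoted
TOP_N = 5000

# (package count, weekly GiB) per tier, as quoted in the comments
tiers = {
    ">= 1GiB/week": (437, 4_205_510),
    "top 100": (100, 3_217_584),
    ">= 20GiB/week": (47, 2_536_902.81),
}


def summarize(count: int, gib: float, reduction: float = 0.05) -> dict:
    """Share of packages, share of traffic, and GiB saved at the given reduction."""
    return {
        "package_share": count / TOP_N,
        "traffic_share": gib / TOTAL_GIB,
        "saved_gib": gib * reduction,
    }


for label, (count, gib) in tiers.items():
    s = summarize(count, gib)
    print(
        f"{label}: {s['package_share']:.1%} of packages, "
        f"{s['traffic_share']:.0%} of traffic, "
        f"5% = {s['saved_gib'] / 1000:.0f} (GiB/1000) "
        f"or {s['saved_gib'] / 1024:.0f} TiB (GiB/1024)"
    )
```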


How often are individuals publishing to NPM? Once a day at most, more typically once a week or month? A few dozen seconds of one person's day every month isn't a terrible trade-off.

Even that's addressable though if there's motivation, since something like transcoding server side during publication just for popular packages would probably get 80% of the benefit with no client-side increase in publication time.


In some scenarios the equation flips, and the enterprise is looking for _more_ scale.

The more bandwidth that Cloudflare needs, the more leverage they have at the peering table. As GitHub's largest repo (the @types / DefinitelyTyped repo owned by Microsoft) gets larger, the more experience the owner of GitHub (also Microsoft) gets in hosting the world's largest git repos.

I would say this qualifies as one of those cases, as npmjs is hosted on Azure. The more resources that NPM needs, the more Microsoft can build towards parity with AWS's footprint.


That's right, and 5% of a very small number is a very small number. 5% of a very big number is a big number.


Do you even know how absolute numbers work vis-à-vis percentages?


I agree with everything you said, but it doesn’t contradict my point


I'm saying you probably don't find them compelling because from your point of view, the problems don't look important to you. They don't from my point of view either. But my point of view is the wrong point of view. From their point of view this would be plenty to make me think twice, and several times over past that, before changing something so deeply fundamental to the system for what is a benefit that nobody who is actually paying the price for the package size seems to be particularly enthusiastic about. If the people paying the bandwidth bill aren't even that excited about a 5% reduction, then the cost/benefits analysis tips over into essentially "zero benefit, non-zero cost", and that's not very compelling.


The problems look important but underexplored


Or you're not understanding how he meant it: there are countless ways to roll out such changes, a hard change is likely a very bad idea as you've correctly pointed out.

But it is possible to do it more gradually, i.e. by sneaking it in with a new API that's used by a new npm version or similar.

But it was his choice to make, and it's fine that he didn't feel enough value in pursuing such a tiny file size change.


The pros aren't all that compelling either. The npm repo is the only group that this would really be remotely significant for, and there seemed to be no interest. So it doesn't take much of a con to nix a solution to a non-problem.


Every single download, until the end of time is affected: It speeds up the servers, speeds up the updates, saves disk space on the update servers, and saves on bandwidth costs and usage.

Everyone benefits. The only cost is an ultra-microscopic amount of time on the front end and a tiny cost on the client end, and for a very significant number of users, time and money are saved. The examples of compression here...


Plus a few years of a compression expert writing a JS implementation of what was likely some very cursed C. And someone auditing its security. And someone maintaining it.


I feel massively increasing publish time is a valid reason not to push this though considering such small gains and who the gains apply to.


I agree, going from 1 second to 2.5 minutes is a huge negative change, in my opinion. I know publishing a package isn't something you do 10x a day but it's probably a big enough change that, were I doing it, I'd think the publish process is hanging and keep retrying it.


If you’re working on the build process itself, you’ll notice it a lot!


Since it's backwards compatible, individual maintainers could enable it in their own pipeline if they don't have issues with the slowdown. It sounds like it could be a single flag in the publish command.


Probably not worth the added complexity, but in theory, the package could be published immediately with the existing compression and then in the background, replaced with the Zopfli-compressed version.


> Probably not worth the added complexity, but in theory, the package could be published immediately with the existing compression and then in the background, replaced with the Zopfli-compressed version.

Checksum matters aside, wouldn't that turn the 5% bandwidth savings into an almost double bandwidth increase though? IMHO, considering the complexity to even make it a build time option, the author made the right call.


No, it can't because the checksums won't match.


I don't think that's actually a problem, but it would require continuing to host both versions (at distinct URLs) for any users who may have installed the package before the Zopfli-compressed version completed. Although I think you could also get around this by tracking whether the newly-released package was ever served by the API. If not, which is probably the common case, the old gzip-compressed version could be deleted.


Wouldn't that result in a different checksum for package-lock.json?
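A small sketch of why the checksum question comes up. Zopfli isn't in the Python standard library, so gzip levels 1 and 9 stand in here for "default compressor" and "better compressor"; that substitution is an assumption for illustration. The hash is formatted like the `sha512-...` integrity strings in package-lock.json. The point is only that two valid gzip encodings of identical content are different bytes, so a hash taken over the tarball changes even though the unpacked files do not.

```python
import base64
import gzip
import hashlib


def integrity(blob: bytes) -> str:
    """SRI-style string (sha512 + base64) computed over the compressed bytes."""
    digest = hashlib.sha512(blob).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


# Made-up payload standing in for a package's contents.
payload = b"".join(b"export const value%d = %d;\n" % (i, i * i) for i in range(2000))

# Two compression levels as stand-ins for the existing compressor and Zopfli.
# mtime=0 keeps the gzip header timestamp out of the comparison.
fast = gzip.compress(payload, compresslevel=1, mtime=0)
small = gzip.compress(payload, compresslevel=9, mtime=0)

print("same content:  ", gzip.decompress(fast) == gzip.decompress(small))
print("sizes:         ", len(fast), len(small))
print("same integrity:", integrity(fast) == integrity(small))
```

So swapping the tarball in place after publication would invalidate any lockfile that recorded the first hash, which is why the reply above suggests hosting both versions at distinct URLs.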


I felt the same. The proposal wasn't rejected! Also, performance gains go beyond user stories - e.g. they reduce infra costs and environmental impact - so I think the main concerns of the maintainers could have been addressed.


> The proposal wasn't rejected!

They soft-rejected by requiring more validation than was reasonable. I see this all the time. "But did you consider <extremely unlikely issue>? Please go and run more tests."

It's pretty clear that the people making the decision didn't actually care about the bandwidth savings, otherwise they would have put the work in themselves to do this, e.g. by requiring Zopfli for popular packages. I doubt Microsoft cares if it takes an extra 2 minutes to publish Typescript.

Kind of a wild decision considering NPM uses 4.5 PB of traffic per week. 5% of that is 225 TB/week, which according to my brief checks costs around $10k/week!

I guess this is a "not my money" problem fundamentally.


This doesn't seem quite correct to me. They weren't asking for "more validation than was reasonable". They were asking for literally any proof that users would benefit from the proposal. That seems like an entirely reasonable thing to ask before changing the way every single NPM package gets published, ever.

I do agree that 10k/week is non-negligible. Perhaps that means the people responsible for the 10k weren't in the room?


> which according to my brief checks costs around $10k/week

That's the market price though; for Microsoft it's a tiny fraction of that.


Or another way to look at it is it's just (at most!) 5% off an already large bill, and it might cost more than that elsewhere.

And I can buy 225 TB of bandwidth for less than $2k, I assume Microsoft can get better than some HN idiot buying Linode.


> And I can buy 225 TB of bandwidth for less than $2k

Even so, $2k a week is at least one competent FTE.
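Much of the disagreement in this thread comes down to the assumed unit price of bandwidth. A quick sketch that backs out the implied $/GB from each estimate quoted in the comments (decimal units assumed, 1 TB = 1,000 GB; the labels are just shorthand for the comments they come from):

```python
# (label, terabytes, dollars) as quoted in various comments in this thread
estimates = [
    ("AWS egress guess", 100, 5_000),
    ("5 PB/week estimate", 250, 15_000),
    ("4.5 PB/week estimate", 225, 10_000),
    ("budget host upper bound", 225, 2_000),
]


def implied_usd_per_gb(terabytes: float, dollars: float) -> float:
    """Unit price implied by a (volume, cost) pair, with 1 TB = 1000 GB."""
    return dollars / (terabytes * 1000)


for label, tb, usd in estimates:
    print(f"{label}: ${implied_usd_per_gb(tb, usd):.4f}/GB")
```

The implied prices span roughly a factor of seven, which accounts for most of the gap between the "$10k/week" and "less than $2k" readings of the same 225 TB.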


Massively increase the open source GitHub Actions bill, with runners running longer to publish (compute is generally more expensive), for a small decrease in network traffic (bandwidth is cheap at scale)?


> I don't find the cons all that compelling to be honest

I found it reasonable.

The 5% improvement was balanced against the cons of increased CLI complexity, lack of a native JS Zopfli implementation, and slower compression... and 5% just wasn't worth it at the moment, and I agree.

> or at least I think they warrant further discussion

I think that was the final statement.


Yes, but there’s a difference between “this warrants further discussion” and “this warrants further discussion and I’m closing the RFC”. The latter all but guarantees that no further discussion will take place.


No it doesn't. It only does that if you think discussion around future improvements belongs in RFCs.


Where DOES it belong, if not there?


> I don't find the cons all that compelling to be honest, or at least I think they warrant further discussion

It needs a novel JS port of a C compression library, which will be wired into a heavily-used and public-facing toolchain, and is something that will ruin a significant number of people's days if it breaks.

For me, that kind of ask needs a compelling use case from the start.



