While the Java ecosystem tends to get a lot of criticism, this is one thing they got right: fetching a package name from a public repository involves prefixing the name with the reverse-DNS name of the uploader (the "group ID" in Java/maven parlance). Uploaders can only submit packages after they've proven they have control over the domain.
It's really unfortunate that even newer package management ecosystems like Rust's haven't learned that lesson. Yes, it increases initial setup friction, but dependency confusion and name squatting issues more or less just go away when you do that. Sure, you might run into issues where people register domains for common misspellings of popular domain names, but that's already a problem with website resolution, and there are ways to mitigate that.
The problem described in the post isn't the fault of package naming schemes, it's due to `pip install --extra-index-url` apparently ignoring the user-specified registry if the package exists on PyPI, which, if true, is baffling, because it throws out the concept of private registries altogether in favor of treating PyPI as some sort of universal namespace.
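A toy sketch of what that resolution behavior means in practice. The function, version numbers, and internal index URL are all hypothetical (pip's real resolver is far more involved), but the shape of the bug is: candidates from every index get pooled and the highest version wins, regardless of which index it came from.

```python
def pick_candidate(candidates):
    # pip-style pooling: candidates from ALL configured indexes are
    # considered together and the highest version wins, ignoring
    # which index each candidate came from.
    return max(candidates, key=lambda c: c[0])

private = ((1, 0, 0), "https://pypi.internal.example/simple")
public = ((99, 0, 0), "https://pypi.org/simple")  # attacker-published

winner = pick_candidate([private, public])
# The attacker's 99.0.0 from pypi.org shadows the internal 1.0.0.
assert winner == public
```

Publishing an absurdly high version number on the public index is exactly how the classic dependency-confusion attacks won the race.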
We ran into a pretty serious vulnerability around editable installs last year as well. Nearly the same story as here - basically an editable install hit an edge case and went to pypi instead when you wouldn’t expect it to. Python has not thought this problem through nearly as much as the Java people have.
> The problem described in the post isn't the fault of package naming schemes
If PyPI used Java-like scheme with domain ownership verification, Google would presumably use com.google.* namespace for their private stuff and it wouldn’t be possible to introduce malicious package this way.
So the described issue arguably is a consequence of that.
That Github issue is insane. So frustrating that an obviously wrong behavior is continued even after exploits in the wild. Count me as one of those idiot users who was confused by the current behavior.
I challenge you to audit all dependencies and, particularly, the sub-dependencies of those. Most people, in my experience, look at a requirements.txt or package.json and think that’s everything but your dependency tree is going to be easily 3-4x that size.
Node.js is particularly notorious for this. It’s a big challenge for having non-flat dependency management and there’s no great solution for this problem.
Well, I don't use Node.js and I'm careful about cargo dependencies, so I actually can do this with every project I have worked on, and I value the ability to do it. It has bought me a lot more sleep, I suspect, than people with critical projects that rely on npm get. Cargo doesn't have all of the problems that npm has, but it is already fairly time-consuming to audit your dependencies in Rust.
Especially since a few users slurped up all of the interesting package names and refuse to give them up, despite many people complaining on GitHub. It's one of the huge warts of the cargo ecosystem it seems nobody on the rust team is willing to fix.
I think all packages should be namespaced per user. The language's core team could promote widely used packages to a "standard" namespace and give it a good name that way, thereby making it the "canonical" implementation of something.
The lack of namespacing in Rust is so weird. The discussion has been going for years, but it makes zero sense to me that after seeing issues npm and pypi faced during the past decade Rust is still pushing back against a more structured registry structure. Squatting will become more and more problematic the longer this flat registry approach persists.
Not only do they refuse to fix it, every time I've brought it up, they insist it isn't a mistake, and this way is better than, or no worse than, what Java does.
Can you elaborate what is silly about it? The reversing, or the DNS, or both?
A package called com.example.foo would be package foo in the namespace "example", which is under the namespace "com", which is how a lot of languages do namespace nesting.
DNS can be said to be "reversed" from that point of view.
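A tiny illustration of that mapping, with purely illustrative values:

```python
# A Maven group ID is a reverse-DNS prefix; splitting it gives the
# namespace nesting described above, with the artifact innermost.
group_id = "com.example"
artifact = "foo"
namespace_path = group_id.split(".") + [artifact]
assert namespace_path == ["com", "example", "foo"]

# DNS writes the same hierarchy with the outermost label last:
dns_name = ".".join(reversed(namespace_path))
assert dns_name == "foo.example.com"
```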
> While the Java ecosystem tends to get a lot of criticism, this is one thing they got right: fetching a package name from a public repository involves prefixing the name with the reverse-DNS name of the uploader (the "group ID" in Java/maven parlance). Uploaders can only submit packages after they've proven they have control over the domain.
That doesn't really solve the fundamental problem, though. Domains can and do change hands by various means and for various reasons having little to do with Java packaging, meaning that control of those packages can be transferred as well. It's just shifting the problem to a different place is all.
People from outside Google freak out about this because at their company, in 99.9% of companies, running code on an engineer's workstation would immediately be the highest possible level of breach. Said process could silently insert code into repos, corrupt the build environment, replace packages in production, or whatever. If you never worked at a company that took it seriously it is hard to imagine that there are people who do take it seriously, and that it is possible to have technical defenses against committing unreviewed and unapproved code, poisoning the official build toolchain, or surreptitiously changing production software images.
> would immediately be the highest possible level of breach
Obviously not true; in fact, at none of the companies I worked for was that the case. Now, if we're talking about somebody like a network admin, that may be a juicier target, but in a large company a single person rarely has the "highest possible level".
> If you never worked at a company that took it seriously it is hard to imagine that there are people who do take it seriously
This sounds quite insufferably smug, and unwarrantedly so. Yes, we get it, at Google they have access levels. Newsflash: this concept wasn't invented by Google. A lot of companies use it. Running code on an engineer's workstation does not give you the keys to the whole castle. But it does get you inside the first perimeter, after which you can do some recon not available outside the big firewall, launch other things against corporate sites not available to mere mortals, look for "experimental" servers that are poorly secured because they're not production yet and sit behind the corporate firewall (that happened at pretty much every company I've worked with), get your hands on some juicy browser cookies containing authorizations to internal services, steal some code, finally... There are a lot of interesting things a person can do with an engineer's access, and some of them may serve as a stepping stone to the next level of access. If you don't understand why such a thing may be important even without handing somebody the keys to the castle, maybe you don't know enough about how secure systems are built to smugly dismiss other people.
> Obviously not true; in fact, at none of the companies I worked for was that the case.
Selection bias. In some companies access to an engineers single workstation allows access to potentially billions of dollars worth of intellectual property. Imagine China stealing the design for the latest aircraft engine from GE, or accessing the PC of the most senior accounts payable person in a company and getting access to actual money. Just because YOU weren't the most valuable target in a company doesn't mean no one in your role is. My account at my company is constantly under attack because I'm the VP of IT in healthcare. The reality is "my" account doesn't even have any powers, it's just email. My accounts with sensitive access are separate.
> People [..] freak out about this because [..] in 99.9% of companies, running code on an engineer's workstation would immediately be the highest possible level of breach.
So it's not selection bias, it's a counterargument. The poster also said engineer not "VP-level".
It's 100% relevant because there are people even MORE valuable than me in my company. I was pointing out that the importance of people to an attacker is directly proportional to their access, not their rank. If they got into one of our RCM people, we'd be royally screwed, and they make $30/hr.
> Obviously not true; in fact, at none of the companies I worked for was that the case
I once offered a bet to the large security team at a well-known decacorn tech company I worked at: I offered to make a personal, reasonable-sized cash bet with any member of the security team that I would win if I could deploy malicious, unreviewed code to any service or machine of their choice without it being prevented or proactively noticed by them.
The members of the security team all declined my bet. We're talking about a team of probably at least a dozen people, many of whom had been working at the company far longer than I had and who had been shaping and reviewing the company's security design for years.
They knew perfectly well that I would be able to win the bet. Not because their security was unusually bad, but because it was bad in the common, usual ways. Securing the supply chain is hard, and real security is almost impossibly expensive to add to a system late in the game if you didn't design it in from the beginning.
Or maybe they simply didn't want to risk personal money on some bet about the state of security at their job. I wouldn't take the bet even if I thought the security was good.
If you're not even willing to make a bet for a single signed dollar, that doesn't speak highly to your confidence in your work.
It's fine to not be confident, but when professional security teams at large companies are afraid to express confidence that their systems are non-trivial for a random engineer to hack in their free time, that seems at odds with the claim that it's "obvious" that permission escalation is hard.
Making such a bet is not a really professional thing to do. Regardless of the actual risk it introduces. If I was a manager in that company and two of my employees made such a bet I'd be tempted to fire both or, at the very least, have a very serious conversation. I think that's borderline malpractice.
When I worked at Google back in the day, we used to make dollar bets all the time. You'd tape the signed dollars you won to your monitor.
A willingness to take pride in your work and to not take it too seriously when smart, well-intentioned people make mistakes (e.g. blameless postmortems) is part of the culture difference that led to Google's engineering becoming so exceptional and innovative vs the more corporate, don't-rock-the-boat, fear-driven culture that the traditional businesses had at the time.
The second paragraph seems at odds with the first. I'd describe a culture where people make bets on whether or not you can find a bug in someone else's work as the opposite of blameless. I'd consider it quite hostile, to be honest. Especially if it's something that management is actually OK with.
I'm assuming you were at google in late 90s/early 2000s?
>If you're not even willing to make a bet for a single signed dollar, that doesn't speak highly to your confidence in your work.
I've long thought that one should have the attitude (and act to make it so) that one should be willing to bet their job on the quality of their work, but not necessarily actually do so.
And betting anyone (co-worker or not) that they can't compromise the systems (especially, but not limited to production systems) you're tasked with keeping from compromise is a bad bet -- even if you win.
I'd class that sort of behavior as having serious potential to be a "Career Limiting Move" (CLM).
Yah, so they have to pay out on a bet and they become unemployed. That seems really smart. Never gamble in anything that is 100% correlated with your primary source of income.
And yet, almost always, infiltrating large organisations like this for whatever purpose (e.g. state level actors, corporate espionage, whatever) involves multiple chains of vulnerability. The type of vulnerability that the OP reported is the initial entry door, and as such, extremely valuable.
Also, about “code reviews”: here’s a story from last year about how a massive refactoring of some webkit code in 2016 resurrected a massive exploit that was actually fixed in 2013, but went unnoticed. It was only discovered and patched in 2022, as it was being exploited in the wild by, among others, the NSO group.
From my experience, most of these defences take the form of limiting privileges and implementing the four-eyes principle. Consider the fact that he could take over 15 Google developers' systems within a few weeks with little effort. What if there are two people who can approve each other's production deployments, etc.?
Even if you only take over one developer's system, it's a great starting point for pivoting into the network and starting a more sophisticated attack. I'm sure an advanced threat actor would know how to take advantage of the opportunity against Google.
This is an ad hominem dressed up as a comment about company policy: “if only you worked at a company that didn’t suck, like I did, you would have actual security and wouldn’t file stupid bugs like these!”
We both know that Google does better than most at endpoint security. In some cases it’s possible to argue that they are the best. What we definitely don’t need is you to be superior about it: it’s part of the reason why (ex-)Googlers have a poor reputation.
In this case, having technical measures that keep sensitive things from ending up on developer machines is an excellent way to improve your security posture. That said, it definitely doesn’t mean you should be unconcerned about code execution on developer machines. There’s a reason that internal red-team exercises distinguish between external access and already having a foothold on a machine.
In this case it obviously isn’t a specific failure of Google’s security policy that package managers don’t do namespacing, but if I were on the team and I received this report, I would at least think about whether there is something I’d want to do to improve the situation, similar to existing efforts to prevent attacks like paste bracketing or trivial keylogging.
Isn’t it a different thing for a Googler to run some code deliberately, presumably having some idea where it came from and what it does, vs. some code being run on their computer without their consent or knowledge of what it is?
I don't fool myself into believing that when I install some package from CRAN, or whatever your least-favorite insecure software repository is, that I know exactly what it is or does. So to me it seems the same risk either way. Anyway I am only trying to address the question raised in the article about why Google security did not flip out over this report in the way the author expected them to.
The two main protections are requiring multiple signoffs (for landing code) and requiring proof of physical presence (for various things, including any sort of access to prod). Having code exec on a large number of systems means the first is weakened significantly, and the strength of the second really depends to an extent on how much effort and patience the attacker is willing to put in (eg, alias a command that requires gnubby press to something under your control, use the press to do whatever you want, then trigger the real command - user ends up being prompted twice, but how many people would actually flag that as a security concern?). Someone with no understanding of how things work in Google would probably screw up in a way that got them noticed pretty quickly, but a well-skilled adversary who's not looking for a quick win could definitely circumvent many controls if they had code exec on a decent number of systems.
tl;dr - if I were still gLinux security, I might not be freaking out about this, but it would definitely fall into the set of stuff I'd be making space for in next quarter's OKRs.
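The double-prompt trick above can be modeled as a toy confused-deputy sketch. Everything here is hypothetical; `gnubby_press` just stands in for a real hardware-key touch, and the point is only that the user has no way to tell which action each touch authorized:

```python
presses = []

def gnubby_press(reason):
    # Stands in for a real security-key touch; the user only sees
    # "touch your key", not what the touch is authorizing.
    presses.append(reason)

def wrapped_ssh_prod(host):
    # The attacker's alias spends the first touch on their own action,
    # then triggers the legitimate command, prompting a second time.
    gnubby_press("attacker: mint credential")
    gnubby_press(f"user: ssh {host}")

wrapped_ssh_prod("prod-db-1")
# The user was prompted twice and touched twice; only one touch
# authorized the action they intended.
assert len(presses) == 2
```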
> Said process could silently insert code into repos, corrupt the build environment, replace packages in production, or whatever
Haha, I wish this were true. :sigh: The reality is that all those security-hardening measures have already passed the point where they significantly undermine overall productivity... Engineers cannot even do a test run on production data without an explicit review from colleagues.
Doesn’t Google use a monorepo though? Even if there are guards against poisoning builds and commits, the risk of an RCE causing silent exfiltration of confidential trade secrets must be astronomical.
Incredibly sensitive portions of google3 are siloed and only a small part of the company can access them.
Tools like code search and the source checkout process also both check for accessing unusually large portions of the codebase, making it only possible to exfiltrate small portions of the codebase at once.
If accurate, at least one of the hosts in question indicated the user was root. Perhaps it was running in a container. Regardless, this malicious package could be used for data exfiltration or other nefarious things.
But they have already explained (at the end) that in any event, "Googlers can download and run arbitrary code on their machines", with the implication being that Google already thought about and had to deal with this general issue a long time ago. What this exploit does is invert the how of this "arbitrary code" running on google machines, but it makes perfect sense that Google's security protections couldn't care less how the code ended up on some dev's machine since they themselves could have just explicitly pulled it from the net.
This is a product developed by Google that has at least been utilized internally to some extent. It's not perfect, but my previous company used it and it does prevent unexpected unknown code from running in the background.
What it does not do is prevent someone from intentionally downloading and executing a library, unless the upvote process actually demands proper vetting. I found that it quickly devolved into "alert fatigue", where you approve things your coworkers send you so they can get back to work, without properly vetting them.
In a well-designed zero-trust network this makes little difference. The traditional POSIX security model is bogus from the start anyway. There's usually a lot more useful stuff you can exfiltrate as a regular user.
If an attacker has arbitrary code execution as a user with sudo access (ie, basically any developer system), then the user/root distinction is basically meaningless.
That is also explicitly prohibited by the PyPI rules[1]:
“A project published on the Package Index meeting [...] the following is considered invalid and will be removed from the Index: [...] project is name squatting (package has no functionality or is empty);”
although the enforcement of that rule is lax in general, not only in this instance[2].
* There are no namespaces, so all internal packages must be individually protected
* It's easy to misconfigure the various python package tools to open you up to dependency confusion attacks
* The only effective protection mechanism that you can implement across a large enterprise fills the index with spam and is forbidden
It would really improve things if they could introduce namespaces and let legal entities own those.
Yeah, this sucks, and I won’t claim that a sudden crackdown on invalid packages would help the situation, nor that eliminating that rule would. Namespaces could probably help some. However, while I don’t know how the PSF folks feel, if it were up to me, when it came to the “let legal entities own those” step I’d throw my hands up and point people to the DNS namespace, the way Go, Nix, etc. do.
That’s not a perfect solution, but so far DNS is the one namespace with mainstream acceptance and built-in lawyers that a single entity cannot, e.g., just singlehandedly sanction, sue, or simply “reserve the right to refuse” people out of (in practice, for the most part).
PyPI’s (and CPAN’s, CTAN’s, Hackage’s, NPM’s) centralized index was originally a (deliberately) crude solution to the discoverability problem, at least in part. These days, we have adopted a different bad solution—putting everything on a Microsoft-owned hosting service with a crap search function. That is also quite bad, but maybe it’s time we recognize it happened anyway and stop making concessions to the old solutions in our package naming schemes.
Package registries need to address this, as even the newest and "modern" ones keep repeating the same mistakes (yes, I'm looking at you cargo/crates).
Just add a damn namespace that packages have to live under in order to be publicly available. It would have solved the problem yesterday, but instead new registries with global names keep appearing like it's not a problem.
In the case of heavily moderated ones like Debian et al, a global namespace makes sense, but for the ones anyone can upload a package to? Require namespaces already...
Not sure if you will read this, but the nuget team responded to my email earlier today and reserved the prefix I asked for. They were quite responsive; you may want to try your luck again.
To be fair the only alternative is fixing Python, and even then you still would have to wait a good 5 years at least for all the old Python versions to dwindle.
I mean this is a clever hack, but it is not a Google vulnerability per se, it is a vulnerability in a public open source tool chain that they use, i.e. pip.
It is working more or less exactly as it should, the employees in question are just downloading untrusted packages inadvertently. This is something that could happen anywhere, even without private package repos.
I guess pip wouldn’t pay the researcher cash, so they’re not interested in that. This blog seems like a pretty desperate attempt to get google to pay them.
I think my response would be two words and the second one would be off.
> This blog seems like a pretty desperate attempt to get google to pay them.
It's actually deliberately criminal as I read it. "Hey, I trojaned some code and got it downloaded onto your company's systems! Please pay me a bug bounty!" is 100% isomorphic to extortion.
It is not extortion unless a threat is being made, and you cannot manufacture a threat against yourself by claiming that the author might conceivably make a threat on the basis of these facts, particularly when you have also said that these facts raise no concerns.
There is no credible threat here. In addition to the above points, if attacks were made by exploiting these facts, the author, having raised the issue in the first place, would become a person of interest in any investigation.
“Ladies and gentlemen of the jury, my client did not threaten anyone! His accusers are on record as saying he merely told them that they had a, quote, ‘nice little restaurant’, and that it would be a ‘shame if it caught fire’. These are not threats, but simply facts.”
He literally compromised live systems. You're using "credible" in the sense that you take him at his word that he won't do anything bad. That's exactly the wrong party to be playing trust in!
Your response here has failed to address any of the points I made in my original reply. You have not identified any threat being made, without which there is no extortion - and as Google does not regard this situation as being a vulnerability, it is going to be difficult for you to identify a credible threat that could be used to extort something from them.
Notice that you are also taking the author at his word when you say he literally compromised live systems. To turn this into a case of extortion, you would have to go beyond that and invent a number of things that have not been said - and some highly implausible things at that, given the very public way in which this supposed extortion is being conducted.
This is just more non-sequiturs, with which you attempt to distract from the gaping hole in your position. You have still not shown any hint of any plausible threat, without which it is not even close to, as you put it, "100% isomorphic to extortion" - in fact, you have not presented one iota of evidence of extortion, and evidence is a requirement for prosecuting extortion crimes.
And for your information, 'explicit' is not a synonym of 'credible'.
Any submission without an exploit? It's routine to find crash bugs or potentially XSS data or injection opportunities without going all the way to a compromised system.
The issue here is that the submitter actually attacked live systems, instead of just reporting on the possibility of malicious library code.
...which is something everyone already knows about, and thus why he couldn't get paid. You don't get paid for actually hacking systems either!
That’s like saying it’s not a bank vulnerability when the bank uses a gate made of paper instead of steel; it’s a vulnerability of the paper...
It is a google vulnerability for using the tool in ways that are known to be broken. Dependency confusion attacks are well known and have known mitigations. When depending on private packages one must not rely only on extra-index-url, instead point to a full url or use a completely internally hosted index-url.
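A sketch of why a single authoritative `--index-url` closes the hole, assuming the internal registry mirrors any approved public packages. The index contents and names here are made up:

```python
# Hypothetical internal index mapping package name -> pinned version.
# With `--index-url` pointing only here, resolution never consults
# pypi.org at all.
INTERNAL_INDEX = {"corp-internal-lib": "1.0.0", "requests": "2.31.0"}

def resolve(name: str) -> str:
    # One authoritative index: an unknown name fails loudly instead of
    # silently falling through to a public registry.
    if name not in INTERNAL_INDEX:
        raise LookupError(f"{name} not in internal index; refusing")
    return INTERNAL_INDEX[name]

assert resolve("corp-internal-lib") == "1.0.0"
```

The trade-off is operational: somebody has to curate the mirror, which is exactly the friction `--extra-index-url` was trying to avoid.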
I mean, this guy just registered a package without mentioning it to anyone, and then suddenly it started executing on a Google user's machine. No social engineering involved. Note that pip install is not just downloading; it can also run arbitrary code during the installation phase via setup.py.
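A trimmed sketch of what such a setup.py could look like. The package name, version, and beacon below are all hypothetical, and the actual exfiltration step is omitted:

```python
# setup.py - the whole file is ordinary Python that pip executes while
# *installing* an sdist, before the victim ever imports anything.
import os
import socket

# This runs at install time, not at import time:
beacon = f"{os.environ.get('USER', 'unknown')}@{socket.gethostname()}"
# A real attacker would POST `beacon` to their server here (omitted),
# then call setuptools.setup(...) so the install still appears to
# succeed:
# from setuptools import setup
# setup(name="some-internal-name", version="99.0.0")
```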
Sure, Google likely has another 3 layers of defense guarding the truly interesting sauce, but at least he got through the front door.
Call it whatever, but executing arbitrary code on an employee's system should trigger all the alarms in any org.
The two guys responding to the email are basically "doing their job". "Oh, it's not a bug in the software package, no cookie for you." Yeah, it's a much more severe incident, you stupid son of a bitch. Any CISO sees this and throws themselves out of the window.
Can you please edit out name-calling and swipes from your HN posts, and make your substantive points thoughtfully?
You obviously had a good point here but including things like "you stupid son of a bitch" unfortunately flips a higher-order bit. Perhaps you don't owe inadequate vulnerability handlers better, but you owe this community better if you're participating in it.
Google implements a zero-trust security model[0] they call BeyondCorp[1], which basically assumes any device may be compromised at any given moment. I'm not intimately familiar with their implementation but it seems designed to make precisely situations like this less dramatic.
I hate to break this to you, but engineers themselves are capable of running arbitrary malicious code on their own workstations, even if we ignore the dozen other ways it might get there (dependency attacks, social engineering, browser exploits, plugging in a USB key you found on the ground, etc.).
At Google’s scale you need to assume that even some employees will be bad faith actors (e.g. agents of some government, with a goal of surreptitiously adding back doors) and you need far more sophisticated security controls (e.g. multi-party controls, immutable audit trails) than assuming engineers or their systems will never be compromised. The latter is going to be true for some employee nearly 100% of the time even if you don’t have bad actors.
The existence of these controls and general set of security assumptions and architecture are what make this not a big deal, not a lack of care.
Eh. Google has serious insider-risk protection and intends to defend against arbitrarily malicious single employees. "Somebody runs something bad on their workstation" is already a threat model that they've been working to mitigate for like a decade.
Namespacing packages, plus a private (or public) pip patch that hardcodes the link between certain namespaces and certain repos, would be the most obvious one (I spent 5 seconds thinking about it so far, so I'm not saying it's the best ever).
Internal package registry that knows all internal package names and makes pip reject colliding names from other sources would be another possibility.
Explicitly verifying the hash of the internal package against the registry (again) and refusing to install packages that don't match the hash would be another option.
I'm sure if a person smarter than me (Google probably has thousands) spends a day thinking about it, they could think of a dozen better ways.
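For the hash idea above, a minimal sketch of the check. pip actually ships a real version of this as hash-checking mode (`--require-hashes` with `--hash=sha256:...` lines in a requirements file); the artifact bytes here are made up:

```python
import hashlib

def verify_artifact(data: bytes, allowed_sha256: set) -> bool:
    # Refuse to install anything whose digest isn't on the allowlist.
    return hashlib.sha256(data).hexdigest() in allowed_sha256

wheel = b"bytes of the wheel built on the internal registry"
allowed = {hashlib.sha256(wheel).hexdigest()}

assert verify_artifact(wheel, allowed)                  # internal build: ok
assert not verify_artifact(b"attacker wheel", allowed)  # shadow package: no
```

This defeats the shadowing attack even if the resolver picks the wrong index, because the attacker's artifact can't match the pinned digest.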
Are you saying it's impossible to mitigate/eliminate package source confusion attacks? There's no way in Python to pin the source exactly where Google wants it?
It's possible to mitigate, but it's impossible to realistically prevent unless you disconnect your employees from the internet. Which is probably costlier than any losses they are ever likely to end up having due to this.
There is a universe of difference between "employee is allowed to download from the internet" and "something unintended from an unauthorized external party runs automatically"
Any package from pypi can at any time run code that’s malicious. IIRC, even at install time. So installing any package, even in a python virtual environment bears the risk of arbitrary code execution - before you have a chance to inspect it. (Same for php, ruby, rust, …) Basing your security concept on that never happening is like playing whack-a-mole in hardcore mode.
What are you going to do? Block access to any pypi.org package except for whitelisted ones? You can try to mitigate things by blocking access to any PyPI package that's also available internally but hasn't been released to the public. At least that would get rid of this one vector. But it doesn't mean that people won't just "pip install whatever" anyway.
To quote John Oliver: "What can you do?" "Something. You can do, something."
The possible ways to respond to that process are open ended and infinite. You can do anything you want about it. Doing nothing about it is approximately the least defensible.
> Doing nothing about it is approximately the least defensible.
Everything you do has a cost. Increased friction which makes people work around it. Lower productivity. Time spent implementing it that cannot be used to implement something more useful.
Don’t just do something. Do something that improves the situation - and Google does. Their statement indicates that they consider the developer's machine fundamentally untrusted - and I’d consider that a correct assumption. Some of the thousands of machines will be compromised at any given time. Whether it’s via this exploit, or another, or by bribing the engineer doesn’t matter. What matters is that they attempt to contain the issue at that boundary.
I don't know why you interpret it that way, but I sure hope you don't write the autopilot of any planes I might fly in, or anything else important, with such logic.
I did not say or imply or suggest to "do something, anything" without caring if it's sensible or effective.
"anything" simply means there is no limit to the possible suitable things.
There is also no limit to the possible unsuitable things, but so what?
It is beyond stupid to take that starting point and conclude that anyone suggested "Maybe Google should issue Tarot decks to all employees to determine whether they should press enter at the end of every shell command." just because, after all, that is something and included in "anything".
I can't know which of the infinite possible detailed measures make sense within Google's environment. But I don't have to to still know that they exist. The details will depend on internal details only they know.
If I say "wrap the pip command in an internal wrapper that performs various checks" surely there is some reason that is not practical or not effective enough, exactly as stated. Or maybe that would exactly clear it all up. But if not, ok so something else then. Have some imagination. But that does not remotely imply random nonsense.
Essentially it’s impossible. You can make it harder, but if you grant developer machines access to the internet, they can download an untrustworthy package from anywhere and use it as a local dependency.
You can protect production and CI systems by restricting their internet access, but who can reasonably do work under such a restriction?
There's a big difference between a person downloading a package from a random site and running it, and a person adding a known internal package name to requirements.txt. The first is usually a bad idea (unless you know what you're doing and are ready for the consequences), and every SWE worth Google money knows that. The second is a standard practice, and everybody does it. You can't make stupid behavior impossible - but you can make common behavior safe, and that's the basis of good security.
I agree with your general thrust, but I don’t see how your point refutes mine.
First, it’s not required to actually run the package. Installing it is sufficient. All package managers that I worked with so far can run code at install time.
Second, the issue here seems to be a misconfiguration that makes pip look up, on the public repository, a private package that should be retrieved from a private repository. The attacker then just registers a malicious package on the public repo with the same name. Preventing this attack requires that Python is correctly configured on each and every developer machine - something that I’d never rely on as a cornerstone of my security.
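One common mitigation can be sketched in pip's configuration (the index URL below is hypothetical): make the private index the *only* index, and have it proxy PyPI, rather than adding it with `--extra-index-url`, which is merely searched in addition to PyPI:

```ini
# pip.conf (location varies by platform) -- index URL is illustrative
[global]
# index-url REPLACES the default PyPI index; by contrast,
# extra-index-url is searched IN ADDITION to PyPI, which is
# exactly what makes the confusion attack possible.
index-url = https://pypi.internal.example/simple
```

This only helps if the internal index is reachable and proxies the public packages the developers actually need; otherwise people will work around it, which is the friction trade-off discussed above.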
Third: This is one example of smuggling a malicious package onto a developer’s machine. Another vector is that a good package turns into a malicious package with an update. That’s even harder to defend against - pulling in the update with the programming language’s standard tooling may run malicious code. You can certainly first download the package, unpack and inspect it, and only then pull the update - but would you rely on thousands of developers diligently doing that?
Last: this class of error affects almost all programming language package managers out there.
So it’s better to assume that this will happen, treat a local compromise of a developer’s machine as a matter of when, and mitigate what the attacker can do with the capabilities they gain from that compromise.
> Second, the issue here seems to be a misconfiguration that makes pip look up a private package that should be retrieved from a private repository on the public repository
True enough. The problem is that a) it is a common misconfiguration and b) it appears to also affect Google, which is a big juicy target for any computer criminal. We're not talking about solving a theoretical problem in 100% of theoretical cases. We're talking about a very practical vulnerability - which can be practically fixed. Yes, that doesn't fix all other theoretically possible vulnerabilities - so what? That's like arguing that since the halting problem is unsolvable, debuggers and static analyzers are useless - we can't solve 100% of the problem, so why bother solving even 1%?
> Last: this class of error affects almost all programming language package managers out there.
Again, you're replacing a specific issue with a "class". Yes, you can't fix all the problems in the whole class. But you can very well fix this particular one, in many ways.
As I understand the article, the problem is that an employee machine is misconfigured and uses the public package repo for a private package. The package in question seems to be explicitly pinned in requirements.txt - that’s essentially a whitelist. Whitelisting also suffers from the issue that a good public package could at any time turn into a malicious one - so you’d need to pin the version. A task that seems infeasible for developer machines at Google’s scale.
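For what it's worth, pinning can go further than names and versions: pip's hash-checking mode attaches the expected artifact hash to each requirement, so even a same-version package served from the wrong index is rejected. A sketch, with an illustrative package name and a placeholder hash:

```text
# requirements.txt -- name, version, and hash are illustrative;
# the real hash would be the sha256 of the known-good wheel.
internal-lib==1.2.0 \
    --hash=sha256:<placeholder-hash-of-the-known-good-artifact>
```

Once any requirement carries a `--hash`, pip requires hashes for everything in the file, which is precisely the maintenance burden that makes this hard to roll out at scale.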
Sandboxing the development environment could be done, but it would only help against this attack if the sandbox cannot connect to the public internet - which, again, would be painful.
Google’s strategy of accepting that this kind of breach will happen, and focusing instead on mitigating the resulting damage, seems like the better way.
This is odd, because I work at Google, on open source projects, and inadvertently created this exact same vulnerability on npm (due to using a fake package name for an unpublished monorepo utility package).
We fixed* this right away, because even though it's true that this "vulnerability" exists with basically every npm package, the difference is that anyone can immediately pull this off once they find an unpublished package in use - they don't have to take over an existing package or get a package they own to be used.
It's the ease of executing the exploit that makes this one more dangerous. Some bored kid could have just wiped my hard drive or worse, maybe within a few minutes if I'm working.
* An easy fix on npm is to create an org that you use for all internal packages. No one else can publish to that org.
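The org fix maps to npm scopes: a scoped name like `@myorg/pkg` can only be published by members of the `myorg` org, and an `.npmrc` can additionally route the whole scope to a private registry. A sketch with a made-up org name and registry URL:

```ini
# .npmrc -- "myorg" and the registry URL are hypothetical
# Everything under @myorg/* resolves against the private registry;
# unscoped packages still come from the public registry.
@myorg:registry=https://npm.internal.example/
```

Even without a private registry, simply claiming the org on the public registry prevents anyone else from publishing under those names.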
Rationale: Code execution on a Googler machine doesn't directly lead to code execution in a production environment. Googlers can download and run arbitrary code on their machines - we have some mitigations against that, but in general this is not a vulnerability; we are aware of and accepting that risk.
Sure, but you can pivot to various things. If you're running with the right set of privileges, you can disable branch policies and push unreviewed code changes into a repository, or you can exfiltrate code from a repo someone has cloned to their dev box, or hit the internal payroll site and make that employee's direct deposits go to your account.
The whole point of "zero trust" networking is that running on the employee's machine is not much better than running it on any other machine. You can no more easily push unapproved code to Google's repo from an engineer's workstation than you could from wherever you are sitting right now.
The web browser session must be authenticated. You can't just log on to your Googler machine and begin using the internal network with your browser. Gnubby must be satisfied.
I mean, it's likely that a process running as a user who is logged in interactively, fully authenticated, and recently 2FA'd could do something, but that thing isn't going to be very spectacular. What would it do? Try to sneak a malicious change into another changelist? I am not saying this is completely useless, but it would only be step 1 of many that are unlikely to all succeed, and it would need to be laser-targeted at the right engineer with the necessary code ownership.
It's a perfect way for an APT to get a foothold in your environment. Imagine if you could check in a unit test that "downloads updates" from somewhere and executes them. Now you've got 30 devs on that floor running your arbitrary code without any paper trail. And since we (generalizing here, I know) have this culture of thinking test code is junk anyway, is anyone going to scrutinize it or just click "approve" and move onto the stuff they think is important?
That’s a good reason to execute your unit tests in an ephemeral environment without network access.
All I am saying about the issues raised in the article is that some organizations start from the assumption that engineers will run arbitrary code on their devices, and the rest of their security story follows from that. It is not necessarily irrational. If you think that is crazy, it might be because you don’t understand the base assumptions.
Unit tests don’t run on dev machines normally, you need to do something special to make that happen (which I’ve never done myself). Also, they by default run without external network access.
> "If I wanted to, I could replace the current version of the package with something malicious, and it would start running on Google’s employees’ computers/virtual desktops."
Seems like you should? It doesn't even have to be malicious, but you're much more likely to get a response (even if it's just a bug fix) by opening a window on someone's computer that says "haha, you're hacked! send an email to security_team1234@google.com and let them know what happened."
caveat: not in infosec – maybe there is precedent to not do this kind of thing if you're in the business of bug bounty hunting.
And? If someone finds a way to bypass a privacy control of a product, they have broken the CFAA. If you report this bug through a bug bounty you could be charged for doing something illegal, but companies choose not to press charges in the hope of creating an incentive structure where security issues can be discovered before they are abused.
Way too much risk for too little reward. All it takes is a couple of the wrong people to catch wind of what happened for it to spiral into hysteria and a phone call to the authorities, and the next thing you know you're being charged for a crime by people who have no technical understanding of what actually happened.
Part of the reason we have a crisis in computer security is because the good guys have to be extremely careful about the systems they poke. They can only poke companies with responsible disclosure policies in specific ways. It shouldn't be a crime to find and report vulnerabilities in good faith, but that's how it is. I almost got myself in big trouble for doing so on one occasion.
What law does it break if you publish a package that displays a popup and somebody else voluntarily downloads and executes it? There's no maliciousness, no harm done, yet somebody would find a way to get you prosecuted for it because the legislation is vague and 30 years behind.
Meanwhile the actual bad guys are getting away with draining bank accounts and dumping databases with millions of people's personal information.
"This is an automailer to send the CEO of Google a friendly Hello, and politely request changes to the Bug Bounty program."
Which does exactly that, and nothing nefarious beyond that, would probably be okay. It's doing exactly what's advertised.
You want to avoid anything which uses words like "hack" or "compromise." Indeed, you can go out-of-the-way to point out it is explicitly not a "hack" or "compromise" under current Google policies.
I’d think that ship has sailed in this case. The author already publicly stated that they would be able to make the package do “something malicious” within Google if they wanted. So however they change it after the fact, they’d run the risk of being accused of malicious intent.
"Something malicious" would be very different than sending a proof-of-concept email. "Something malicious" might be, for example, snarfing up data, or having one engineer commit malicious code and having another one approve it.
Indeed, the email could walk through malicious use-cases like these, which either leak customer data or damage Google infrastructure.
Whoa. They didn't even block the installs from continuing to happen.
Google must be really confident in their ability to contain threats like this one. In other orgs, this would be a "hair on fire" sev 1 security incident.
There are a LOT of layers at Google, and they're very liberal about what you can do on your own machine. There are a lot of steps between there and the prod environment, and usually a bunch of auditing too. Once you're in prod, your server also basically can't do anything unexpected - for example, if you want to call out of the datacenter, you have to file a ticket, etc. All of this establishes an audit trail, too.
This class of exploit is interesting, but it's totally overshadowed by the fact that every Google engineer is probably downloading dozens of public packages per day, any of which could be compromised at any time. Plus those attacks are even better because you don't need to guess what the original package did to avoid detection.
This. Any package can be anything at any time. Remember when left-pad (10 lines of code) was deleted by its author in 2016?
Build systems all over the world failed so hard, NPM had to reinstate the package against its own rules.
left-pad's author could have done anything with all the dependencies on his package.
The "local" repo is the org's local repo, where they publish their internal packages. The "remote" is the public repo on the Internet.
In Java builds we usually had:
build -> org local repo -> maven central
So the local repo (be it Artifactory or Apache Archiva) works as a proxy. It downloads the artifacts from the internet if they are not present locally. The build does not go directly to Maven Central.
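That proxy arrangement is usually declared in Maven's `settings.xml` as a mirror of all remote repositories (the `id` and URL below are made up):

```xml
<!-- ~/.m2/settings.xml: route every remote repository through the org proxy -->
<settings>
  <mirrors>
    <mirror>
      <id>org-proxy</id>
      <!-- "*" mirrors everything, including Maven Central -->
      <mirrorOf>*</mirrorOf>
      <url>https://artifactory.internal.example/maven</url>
    </mirror>
  </mirrors>
</settings>
```

Combined with Maven's group-ID namespacing, this means builds only ever talk to an endpoint the org controls.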
If the org uses zero trust, this is not a vulnerability at all. The fact that you can run something on an employee's dev machine does not mean a lot. That is the entire point of zero trust.
Surely the dev machine must somehow interact with Google's systems at some point? Even if every interaction requires 2FA, malware with access to the user's Windows/X11 session could intercept user input, modify requests, steal session tokens, etc.
I'm not familiar with Google dev process. But I have extensive experience working in large corporations.
In such environments, it is crucial to maintain a high level of isolation between development machines, personal computers, test farms, production servers, and source code repositories.
Also there are various types of development machines and environments to cater to different needs. Some are highly restricted, only allowing developers to interact with and modify the production source code. These environments provide minimal functionality beyond code editing and submitting changes to test farm. On the other hand, there are more flexible environments, akin to "scratch pads," where developers have greater freedom to experiment and explore.
Google's response said that their software does not depend on the vulnerable packages (this is consistent with the researcher only seeing downloads from dev machines and not from automated builds). Reading between the lines, I'm guessing that there is a big human element in play and it's simply not something that the security team can fix. If engineers insist on manually downloading packages insecurely, there is only so much the security team can do because "Googlers can download and run arbitrary code on their machines". (I think that's what they tried to convey by "social engineering" -- not that the researcher was using social engineering, but that there is a social aspect.)
I feel like this is a major design flaw in the package managers being used. If people have to squat package names to mitigate the chances of a dependency confusion attack, then the package manager/package repository needs to find a better solution.
I think this is something GoLang does a great job at. To use a package, you have to specify the exact URL of the repo. This mitigates the risk of dependency confusion since an attacker would need control over the domain to upload a conflicting package.
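As a sketch (all module paths below are illustrative): a Go module names its dependencies by import path, and the path's host determines where the code comes from. `GOPRIVATE` can additionally keep internal paths away from the public module proxy:

```go
// go.mod -- the import path embeds the origin: only whoever controls
// example.com can serve example.com/team/lib (modulo GOPROXY settings).
module example.com/team/app

go 1.21

require example.com/team/lib v1.2.0
```

Setting `GOPRIVATE=example.com/team/*` in the environment tells the Go tooling not to consult the public proxy or checksum database for those paths, closing the analogous confusion gap.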
I think about this every time I install something through pip or apt. As someone with no knowledge about the package infrastructure but experience from SEO (domains), it seems like a simple task to just register common typos and spread malicious code. I assume there's some kind of vetting?
> I think about this every time I install something through pip or apt. [...] it seems like a simple task to just register common typos and spread malicious code. I assume there's some kind of vetting?
For distribution repositories (apt and so on), yes there's some vetting. To start with, only a limited set of people (the distribution's developers) are allowed to upload packages to a distribution repository, and even then, there's often a second layer of vetting for new package names. For instance, on Debian (https://wiki.debian.org/Teams/FTPMaster):
"When a package is uploaded to the unstable or experimental suite, it falls into one of three categories. If it is a new version of an existing package and adds no new binary packages, it is moved into the package pool automatically. If one or more of the binary packages or the source package itself is not currently in the archive or if a package is moved between the components (main, contrib, non-free), it is NEW and must be examined by an FTP Team member (see NewQueue). [...]"
That is, when a new binary package is added, either because it comes from a new source package, or because a source package was modified to add a new binary package, the FTP masters have to manually approve it, before it becomes available to be installed by apt.
This is different from language repositories (pip and so on), in which anyone can register a developer account, and there's no manual vetting of new package names.
Is it possible that systems/packages hosted outside of Google's ecosystem don't count within the scope of the Google Vulnerability Reward Program? The author did go through a lot of non-technical steps to demo the behavior he showed, and it did seem to involve social engineering of sorts.
So it's understandable that they closed it as a non-vulnerability.
It required tricking a trusted human to work. It's a human (social) vulnerability, not a software or hardware one. Human vulnerabilities can sometimes be mitigated with software or hardware (or other things), but that doesn't change the fact that they're based on human mistakes, not software or hardware mistakes.
That's what I thought. Until I thought a bit more.
Firstly he didn't fool the trusted human into doing anything they weren't already doing anyway. I don't see anyone being tricked.
Secondly, a lot of exploits depend on someone doing, unprompted, something they really ought not to do. I mean, if you rule out as an exploit anything that simply depends on people not taking active measures against an attack they didn't know about, there's not a lot left.
I don't think Google should have paid out, though; there's no vulnerability shown in any Google software, whether a product or an internal tool. It looks to me like a pip vuln that $TRUSTED_HUMAN could and should have evaded.
I'm having a hard time accepting a definition of social engineering that includes exploits where you do not in any way influence the behavior of others.
It's not clear from the article or Google's response what caused the dependency to be downloaded; however, one thing that Google does mention is that they don't believe it has to do with their products or services. Since the purpose of their bug bounty program is to help Google secure their products, it would fall outside its scope, as the fix doesn't sit with product teams. I would imagine, though, that this is something that would be raised with corporate security, which deals with protection of endpoint devices and security awareness training.
So the poster is mad about abusing dependency confusion and expects a big tech company to reward them blindly?
Do they not realize that most big tech companies have moved on to single feeds that are governed by their own security/inventory teams? Using a public registry is an anti-pattern now and has been for a while, well before “dependency confusion”.
Not all package managers have implemented a stopgap to the problem either. I’m a bit disappointed to see this article though. The world runs on trust and we all trust that people won’t abuse known vectors for their own gain.
No idea why the blogger is protecting Google when Google says this isn't a vulnerability.
He should have it install a reverse ssh tunnel, then pass along keylogging and a screenshot every 2 seconds; he'd likely find some way to pivot for a 'vulnerability'.
You weren't banned - users were flagging your posts, correctly, because they were breaking the site guidelines.
I've banned your account just now, however, since you don't seem to be using HN in the intended spirit.
If you don't want to be banned, you're welcome to email hn@ycombinator.com and give us reason to believe that you'll follow the rules in the future. They're here: https://news.ycombinator.com/newsguidelines.html.
No, we are not running out of IP addresses. We have run out of IP addresses, past tense; they are all allocated. Any IP allocations these days are just recycling old addresses given back to RIRs.
But of course Google has a large stockpile of IPs already, so they are not really impacted the same way others might be.
> It relies on tricking a developer into downloading a substituted package, which is indeed social engineering.
I wouldn't call this social engineering. The attacker isn't actively trying to trick anyone of anything. They're just exploiting the fact that the Python package management tools make it really easy for a user to accidentally -- without any prompting or interference from the attacker -- pull packages from pypi.org rather than their internal private repository.
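The failure mode can be sketched in a few lines of Python (this is a simplification for illustration, not pip's actual code): when candidates from `--index-url` and `--extra-index-url` land in one pool, the highest version wins regardless of which index it came from, so an attacker just publishes an absurdly high version on PyPI:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    version: tuple   # e.g. (1, 2, 0)
    index: str

def pick(candidates, name):
    """Simplified stand-in for pip's finder: the highest version across
    ALL configured indexes wins; the index an entry came from is never
    used as a tiebreaker or a filter."""
    matches = [c for c in candidates if c.name == name]
    return max(matches, key=lambda c: c.version)

pool = [
    # Hypothetical internal package on a private index...
    Candidate("internal-lib", (1, 2, 0), "https://pypi.internal.example/simple"),
    # ...shadowed by an attacker's upload to the public index.
    Candidate("internal-lib", (99, 0, 0), "https://pypi.org/simple"),
]

winner = pick(pool, "internal-lib")
print(winner.index)
```

No trickery of the user is needed anywhere in this flow, which is the point being made: the user's configuration and habits are unchanged, and the attacker only interacts with the public registry.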
It might not be “trying to trick”, but it certainly is “trying to trap”. The outcome is the same, though: an attacker actively exploits a misconception of other people for their own (concealed) intents and purposes.