There were outages around the same time last year. Somebody in the HN thread commented back then that the employee evaluation and promotion window ends around December/end of year, so more releases get pushed out then.
> and the evaluation period was over more than two months ago
Excellent time to slack off a bit, make a few mistakes, then come next eval you can point to a marked improvement over the intervening six-to-nine months!
Well, there are various whitepapers that restate the exact truth you've highlighted: issues tend to happen more when changes are made.
Regarding the technical details doc, Google will never state that outright in individual postmortems. And they will definitely not follow it through to the logical conclusion regarding the spiky yearly activity.
> logical conclusion regarding the spiky yearly activity
Why is it logical? There’s tons of changes being deployed at all times, at all large companies.
Across products, verticals, everything - hundreds of changes at any given point. Some of these changes can introduce hard-to-predict bugs in globally distributed systems. Most of the time, external users don’t even notice before they’re fixed.
Like another commenter said, performance reviews do not coincide with the year-end at Google and other companies.
Hypothetically speaking? 11 months a year there's no incentive to cut corners and only do 2 weeks of testing on something that really needs 3. If one month a year rushing things out with a bit less testing is rewarded, I can believe some people would respond to that.
Of course, I wouldn't go as far as to call this a "logical conclusion" as the evidence I've seen is very slim.
It can’t be logical because it is based on bad facts.
We wrote evals in August. Anybody racing to get things launched before perf did so months ago. The timing described in the top post is just false.
We write the next one in February. We are almost exactly halfway between perf cycles. Your hunch that this lines up with the end of the year is false.
There is also a lot of pressure on engineers around Black Friday and Cyber Monday to get things done before any change embargoes come into place. This is coming from someone who just worked 16 hours straight to scale things before a change freeze.
I don't know how or if this would impact Google, but I'm sure someone there has at least thought about those dates.
November 11 is a public holiday in the US, and in recent years it seems to have become a company holiday at many companies too. Maybe that break in a cadence of workdays and holidays established over many years plays a role here - say, by affecting the smoothness of the support handoff between US, EMEA, and APJ.
Is there an objective way to measure code quality?
We barely understand the systems we’re writing code for - I don’t see how you could objectively judge the code to manage those systems without fully understanding the systems first.
You could hand-wave some metrics like number of bugs or test coverage, but I can't think of any (the aforementioned included) that aren't subject to tons of confounding variables.
More likely someone actually will. A blameless postmortem will be written, and the people who fix the bug or systems issue will have something to work on with high visibility and high impact, which tends to translate into good performance ratings.
Comments like this make me wonder if people really expect engineers to be fired because of an outage? I do not work at Google, but none of my workplaces would fire engineers because of a failure. Mistakes happen. As long as they are not repeated, everything is good.
If your company fires people in situations like this, run away and never look back.
Googler here, not speaking on behalf of the company, my opinions are my own
People do absolutely NOT get fired over incidents. Making mistakes is human. An incident will prompt a review of the systems and safeguards in place to prevent such an incident, much like an airline incident investigation -
basically "somebody fat-fingered it" is never the answer, postmortems are always blameless
EDIT: now that I think of it, the opposite thing happens after a major incident - a systemic failure gets identified, and people are hired to fix it :)
> Googler here, not speaking on behalf of the company, my opinions are my own
Why are employees at big tech names (FAANG et al.) so often so cautious as to include this as a foreword everywhere? Twitter bios are full of that, for instance.
It is crazy to me; who would expect anything other than your opinions being your own and nothing more? Who would expect that your word (with all due respect) is worth anything with regards to the company's PR?
Is there an actual risk in the US? Have there been trials or anything that push people to add such statements?
Can't speak for other companies, but this is covered in basic training at Google - if you're not authorized to speak on behalf of the company, you must make it clear when your writings may be mistaken for, or construed as, representing the company.
Basically the company has specially trained people who speak on behalf of the company, and that message should not be muddled by the personal opinions of other employees. For example, during the recent FB outage, there was an employee posting inside information on reddit - media companies just took it at face value and ran with it, reporting it as if it were what FB itself was saying about the outage.
I'm not aware of any actual risks in the US, but then again I'm not in the US. For me this seems a minor point, and I actually enjoy separating my public persona from the company for which I work, be it Google or a small startup.
> During the recent FB outage, there was an employee posting inside information on reddit - media companies just took it at face value and ran with it, reporting it as if it were what FB itself was saying about the outage.
To be fair, I think the media would have done that even with a "speaking only for myself" disclaimer.
Every company says that; the obvious solution is to never name the company when you speak. Why do these people need to say "I'm a Googler" and then immediately "but forget it, I speak only for myself"... obviously there's value in the fact that they're at Google, and it will color their discourse, which is probably already forbidden.
Don't name your company if you intend to speak for yourself.
It's the same at all large companies - it's CYA boilerplate.
Though I almost got to be the official spokesperson for British Telecom responding on the alt.2600 newsgroup about the Met police VMB hack - the press office was cool but internal security was not.
I work at a large tech company, and they do mention in the onboarding materials that we represent the company, so we should be careful in our social media profiles. My solution to this is to not associate my social media profiles with my employer. This is technically not really what we're supposed to do, and I might have to change that approach at some point if I move high enough in the org to start getting attention from people, but this works for me better than disclaimers on all my posts.
> Why are employees at big tech names (FAANG et al.) so often so cautious as to include this as a foreword everywhere?
This isn’t just ‘big tech’ - I work at a relatively small tech company, but I’d never want anything I say about the company to be mistaken as some sort of ‘official statement’ especially if it related to an incident that possibly had a financial impact on external parties, and could conceivably be misused in that context in the future.
I go as far as never writing private emails from my work mail for the same sort of reasons - although that is possibly from an over-abundance of caution.
Part of it is because the company asks us to. Part of it is because I think it's reasonable to tell people your biases, and it can avoid the situation where substantive conversation gets derailed by "gotchas". If I make a comment about how I think Google Meet has the best noise cancellation of any video chat software, even though I don't work on Meet or anything adjacent to it, it's still a bad look if someone can dig through my comment history and pull out a previous comment about how I work for Google.
> It is crazy to me; who would expect anything other than your opinions being your own and nothing more?
Rumor mill journalists will mine social media and forum comments and write entire articles about "so and so FAANG employee gives hint at future merger" when some dev comments how much they've enjoyed using some library recently.
They want to mention their employer to gain authority in the discussion, but since mentioning their employer is a legal/PR risk, they need to follow it up with a disclaimer (this only partially mitigates the risk, but it’s worth it to get the brag in).
In fairness, 'opinion' is a horrendously vague and ill-defined word. It does double duty as (i) 'normative value' and (ii) 'personal understanding of the descriptive facts', which two senses are constantly confused - for example right here.
That's why we constantly get "it's just my opinion" used in reference to type-ii opinions (personal understanding of descriptive fact), when it's only really appropriate to type-i opinions (normative value).
Many conversations would be far clearer if it were abandoned in favour of more precise language, IMPUOTDF.
"Many conversations would be far clearer if it were abandoned in favour of more precise language, IMPUOTDF"... um whats IMPUOTDF?
I did try to google it, but only this post was found.
Sorry, I was joking: 'in my personal understanding of the descriptive facts', referring back to that second definition of 'opinion' earlier in the comment.
Yeah, this 'blameless' ethos has definitely trickled down from FAANG to decently-sized decently-reputed places I've worked at - and certainly to #EngTwitter.
I think it's a bit over-applied in some cases. Does it not commit you to the theorem that every process can be made so perfect as to be completely invulnerable to one human being making a mistake? (At least, in the form exemplified by the common tweets to the effect that "your processes are to blame for $incident, not your interns/engineers/etc".)
Even if you required two-person auth for every single thing, two people will make a mistake now and then, and in reality - due to our being social animals - the two probabilities are not truly independent.
I just don't see how this is feasible in reality. A more realistic principle feels like: "people will infrequently make mistakes, and that's of course natural and human and forgiveable, but far fewer incidents should be vulnerable to human error than currently are".
I of course agree that mistakes are inevitable. That being said, the point of blameless culture is not to make a process invulnerable to mistakes. Instead during a post-mortem, we look at how to prevent that particular incident from happening again.
You're totally right, and the SRE book by Google goes over this - the company's culture does not allow firing people for outages. If you're somewhere this still happens, run away (or you'd better be getting paid more than top-end ICs at Google).
The other comments already explained it, but I'm wondering how you haven't come across this 'saying' before. It's so overused and also cheesy in my opinion.
People are born every day. Every day tens of thousands of people will hear about hacker news, the pyramids, Darth Vader being someone's father, for the first time.
It was not meant with ill will. I was just wondering. Regarding what you said, I think I understand your point. On the other hand, it makes some difference which people you ask - in some places things are much likelier to be common sense (or at least heard of) than in others. Whatever.
I work part time in the Army. In the Army when you go from their equivalent of junior to mid-level they take you out of your job for eight months of dedicated personal development, before you start your first mid-level job. When you go to their equivalent of senior they take you out for a year.
I don't know how much that costs all-in, including the salaries, instructors, facilities, but might be starting to approach a million.
But the army is a cost center! The workers have some money shaved off their salary to pay for an army that allows delinquents and the half-disabled to pew-pew guns in the forest, leaving them in peace. It's not comparable to a productive enterprise that needs to build things for people or perish.
For instance, if Google fails and can't profit, it can't just shoot at its clients until they pay. Your organisation can.
WW2 was 70+ years ago. The USSR fell 30+ years ago. Military budgets are still incredibly high. The army as it is today does not need to be that efficient. And even during WW2 times the US did not face a credible threat of invasion. The last time the US faced an invasion on the mainland was during the war of 1812.
No I mean that for the US army today the downside to the army being inefficient is that money is wasted. Not great, but not a disaster. For a different country that could mean the country gets invaded or the government collapses (like Afghanistan).
Middle and upper management get there via connections and picking things up on their own. It's unsurprising that they don't want to be subjected to competition with lesser people who can "merely" be trained to do their job as well, or better.
If your job is something like a staff officer in a Brigade, you could learn that on the job, the Google way, because exercising is also 'the job', but they don't get you to do that - instead they take the time to fully pull you out of all work commitments for dedicated personal development. These periods of personal development are about personal skills rather than combat training, which you've already done by this point.
Someone who just cost their company millions in revenue is gonna be _extremely_ careful not to make the same mistake again. Hence, million-dollar training.
The GP is a reference to the anecdote about IBM's Thomas Watson not firing an executive who had made an error costing the company a substantial amount of money.
But firing someone doesn't undo an incident. It just introduces other weird incentives. People become afraid to change things for fear of breaking something, or when something does break they try to hide it rather than feeling like they can immediately ask for and receive help to fix it.
The only time someone should be fired for causing an outage is if they're negligent or sloppy or mess things up all the time. This is rare. Almost always, outages in large systems are the combination of many factors - latent bugs, design flaws, abnormal load, etc. - any one or two of which wouldn't take the site down. But when they combine in a perfect storm that nobody foresaw, things fall over.
But now you have someone in your team who will never, ever make that same mistake again and should be your new go-to guy for all X related changes (X being DNS or what-have-you). Firing someone with that type of experience does not lead to success.
100% of all devs make huge mistakes, at least once.
> But now you have someone in your team who will never, ever make that same mistake again and that should be your new go-to guy for all DNS related changes.
I'm not entirely sure that's always true. For example, I've seen people introduce N+1 issues into a codebase, spend evenings fixing them and refactoring code to fix production issues... just to later introduce those very same types of issues.
Sure, you can learn from mistakes, have post-mortems and so on (provided that your org even does those, and that anyone listens and cares about the conclusions from them), but to me it feels like the most foolproof way is to ensure that no one can make these mistakes again, be it with a checklist (which tend to be ignored, honestly), or better yet, an automated CI step or a new test suite.
In my eyes, it's basically the same as with unit tests - everyone agrees that you need them, but people rarely write enough of them. So if you introduce something that forces the issue, e.g. a quality gate within a CI step that disallows a merge once coverage falls below a set threshold, suddenly things are a lot better in the long run.
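To make the quality-gate idea concrete, here is a minimal sketch (Python) of such a CI step; it assumes pytest and coverage.py are installed, and the 85% threshold is just an arbitrary example value:

    import subprocess
    import sys

    THRESHOLD = 85  # hypothetical minimum line-coverage percentage

    def main() -> int:
        # Run the test suite under coverage measurement; a failing test
        # raises and fails the build immediately.
        subprocess.run(["coverage", "run", "-m", "pytest"], check=True)
        # `coverage report --fail-under=N` exits non-zero when total
        # coverage is below N, which fails the CI job.
        return subprocess.run(
            ["coverage", "report", f"--fail-under={THRESHOLD}"]
        ).returncode

    if __name__ == "__main__":
        sys.exit(main())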
N+1 issues aren't nearly as devastating as N^2. Commend them for not bringing your systems to a complete halt, then teach them how to reason about this properly.
Depends on the project, I guess: if you're unlucky enough to be working on a monolith and suddenly a page takes 5,000 SQL queries to load as opposed to 100, because someone thought that initializing data through service/DB calls in a loop is "easier" than writing views in the DB, it might still kill the entire system anyway, depending on the number of users.
And once this data initialization is sufficiently complicated and convoluted for you not to be able to rewrite it, and them not wanting to rewrite it, all while "the business" is breathing down your neck, you might either want to introduce caching (and possibly run into cache invalidation problems down the road), or just freshen up your CV.
I guess I'd also like to expand on the previous suggestion and advise others to consider performance/load testing as well, especially when coupled with APM solutions like Skywalking or even Matomo analytics, both of which can allow you to aggregate historical page load times, CPM, and the overall performance of your applications, to figure out what went wrong and when.
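For anyone unfamiliar with the N+1 pattern being discussed, a minimal sketch using sqlite3 (the table and column names are made up; a real ORM would hide the loop, but the shape of the problem is the same):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
    """)

    def totals_n_plus_one(user_ids):
        # One query per user: 100 users on the page -> 100 extra round trips.
        return {
            uid: conn.execute(
                "SELECT COALESCE(SUM(total), 0) FROM orders WHERE user_id = ?",
                (uid,),
            ).fetchone()[0]
            for uid in user_ids
        }

    def totals_batched(user_ids):
        # One query regardless of how many users are on the page.
        marks = ",".join("?" * len(user_ids))
        rows = conn.execute(
            "SELECT user_id, SUM(total) FROM orders"
            f" WHERE user_id IN ({marks}) GROUP BY user_id",
            list(user_ids),
        ).fetchall()
        return dict(rows)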
Still, that engineer (if it's the engineer's fault) is extremely unlikely to make that mistake again. IMHO, the problem is systemic: why does the system allow such errors (if it's human error) to happen at all? Given Google's scale, I think a lot of the generally known scenarios are covered, and what you see is tens of services interacting in non-obvious ways. Those non-obvious scenarios manifest in situations like this.
It makes no sense at all. After the outage you have not only a review of the causes and appropriate remedies, but also more experienced people who are now more aware of possible consequences of seemingly unrelated actions and will take extra care not to make these things happen in the future.
Also, such cases are rarely the "fault" of a single person. Or, the direct/immediate cause is often not the main one.
As time passes, there are more big cloud providers, and each of them individually gets more complex (more products).
Assuming the chance of an outage is actually static. If there is one (highly reliable and trusted) provider with five products one year, and three providers with ten products the next year, the chance of you seeing an outage has gone way up because of surface area.
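Back-of-the-envelope, assuming independent products and a made-up 2% per-product outage chance per period (the number is purely illustrative):

    p = 0.02  # assumed chance of any single product having an outage in a period

    for n in (5, 10, 30):
        p_any = 1 - (1 - p) ** n
        print(f"{n:>2} products -> {p_any:.1%} chance of witnessing an outage")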
More reason to promote self-hosting and decentralised systems like SMTP, matrix.org, and ActivityPub. Having all the data and servers concentrated in a few players such as Google, Amazon, or the CCP is not reliable for the digital operation of the planet.
The question is: should we trust small, underfunded, hobbyist servers more than large corporations that have a money-driven reason to maintain high quality of services?
Self-hosting is a good option, especially when mixed with multi-cloud offerings.
The big rub is that it's really hard to approach the same level of availability the cloud offerings already give you. Depending on work-load, self-hosting is typically more expensive.
This is why the SRE book talks about availability budget. You can't have 100% uptime, so how much do you want to pay to get close to it?
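For a sense of scale, the arithmetic behind such a budget (just the standard "nines", nothing Google-specific):

    MINUTES_PER_MONTH = 30 * 24 * 60  # ~43,200

    for target in (0.999, 0.9999, 0.99999):
        downtime = (1 - target) * MINUTES_PER_MONTH
        print(f"{target * 100:.3f}% availability -> ~{downtime:.1f} min of downtime/month")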
I also can't think of the last time I saw Twitter's Failwhale, which almost became their default mascot at one point.
(Yes, I know they discontinued it, but I haven't seen the service "over capacity" in some time. I say this as a casual web user of it, not an active account holder or app user).
Everyone is bad at them. Trading desks, international trade offices, space related offices all have a set of clocks on the wall because even really smart people just suck at figuring out time zones. I even use https://everytimezone.com/ in lieu of a set of clocks.
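For what it's worth, a rough equivalent of the wall of clocks is only a few lines with the Python standard library (zone names picked arbitrarily; zoneinfo needs Python 3.9+):

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo

    # One instant, rendered in several zones.
    instant = datetime(2021, 11, 16, 17, 0, tzinfo=timezone.utc)
    for name in ("America/New_York", "Europe/London", "Asia/Tokyo"):
        local = instant.astimezone(ZoneInfo(name))
        print(f"{name:20} {local:%Y-%m-%d %H:%M %Z}")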
What a needless complication. I wish we could all just switch to UTC and stop daylight savings time.
I can't imagine anyone taking kindly to a change as big as eradicating time zones. Get rid of DST, yes, but only if the U.S. goes up an hour (the extra sunlight at 6pm during DST is nice).
I usually check the HN website to see if my internet is down, both on my mobile and on my work computer. I used to just type "test" in the browser address bar hoping for the related Google SERPs but checking for HN seems way faster nowadays.
I don't think it's that so much as the fact that half of the internet services that rely on Google somewhere along the line (but not visibly to the users) are down, at least for me. And yet the Google search page can be reached (again, speaking from my European perspective, sample size of one). To make it even more confusing: these issues are limited to my WiFi. If I use my phone connection as a hotspot, the problems disappear.
So from the perspective of a regular person it is very reasonable to conclude that it is their internet provider that is down.
It's valid criticism for projects that don't need the scale or features that cloud-based solutions offer.
A single machine won't give you the same level of scaling and management features, but also won't have hundreds of distributed moving parts that could break and take your service down as a side-effect.
I feel like you are selling the cloud short for SMBs. The majority of my career has been spent working for shops that have their own datacenters or colo. It's so freaking cheap to run a huge service it's not even funny. And it's a dream to have so much compute and storage floating around that you can "waste" it without even thinking -- it's already paid for!
However, when you're small, the economics flip. You want to run as little as possible, as cheaply as possible. And a load balancer pointing at two ASGs in different availability zones is by far the cheapest way to get something production-ready on the internet. You can go from your garage to a 100M business and probably never need to graduate from this architecture.
Yep, we're hit by that right now. It's not a total outage, we're losing about 10-15% of our calls to bigquery from within cloudfunctions. What we have using a VPC connector is ok. Fortunately areas affected are ancillary (mainly monitoring, ironically), our service is still running.
Or maybe this is unrelated and it's just one of the many cloud outages contradicting their uptime promises. Welcome to the reality of cloud computing, where everything's foggy and the hallucinogenic fumes made us believe downtime was a thing of the past.
I guess the network admin who skipped the classes on Network Fundamentals has moved on from their recent placement at OVH to Facebook, and then to Google.
You think that's how statistics works? Obviously it's possible to keep your own server running with 0 downtime. But you are exposed to a much higher risk of severe downtime - longer than a cloud provider's would likely be. Be it hardware failure, grid, ISP, whatever.
That claim was a direct snippet from your original comment. So unless you meant that you laugh at people who are talking specifically about your personal server, I'd say you are being selectively literal.
Cloud kool-aid drinkers say "The cloud is always better"
I say that in many cases your own server is better. My company runs its own on-prem Confluence; it's been taken down for updates on a regular basis at a known maintenance time. That's far better than losing it because a cloud-based one was hosted on GCP this morning when we actually needed the data.
Obviously in many other cases the cloud is better. You wouldn't want to serve a million customers across the world from a single server in your own basement. But that's not the only model.
I agree completely with that added nuance. Cloud infra is definitely overused and is sometimes seen as the only option these days, even for the simplest of projects. Having your own metal is oftentimes much more convenient and cheaper. Plus it's much more enjoyable to work with.
My only gripe was with your gross over-simplification that read like "hurr durr, my server hasn't gone down this year, so self-hosting has better uptime than Google". It's such an unnecessary and baseless argument.
The vast, vast majority of arguments you hear on HN are that it's impossible to run your own equipment. That's as ridiculous as saying you could run Dropbox off a Raspberry Pi, but it tends to get pushed to the top, and the schadenfreude when these events inevitably come along is too good to miss.
Every few months another major outage of another cloud provider hits the headlines, meanwhile millions of small companies have no problem with the uptime of their 'legacy' services.
I was at a farm a few weeks ago; the farmer had a server in a closet. It did break on occasion, when there was a power outage. His desktop and internet broke too, so what would have been the point of his server working?
If it were hosted on Google, it wouldn't have been working this morning, despite his computer being fine.
If you build your business processes around accepting failure, it's not a problem. It's far easier to keep at least one out of 3 machines online for 99.999% of the time than to keep a single service running for the same time.
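A quick sanity check of that claim, assuming independent failures and, say, 99% availability per machine (an arbitrary figure for illustration):

    per_machine = 0.99  # assumed availability of each machine

    p_all_down = (1 - per_machine) ** 3
    print(f"all three down at once: {p_all_down:.6%}")
    print(f"at least one machine up: {1 - p_all_down:.6%}")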
To be fair, this is exactly how statistics work. The average downtime of self-hosted servers might be higher (arguably) but many people are under Google's yearly downtime (aka variance is high).
In my experience, it's very easy to achieve high uptime through luck. If I only run a single server, and it has a 10% chance of failing in any given year, I have a 90% chance of achieving 100% uptime in a given year.
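Extending that arithmetic a little: with the same 10% annual failure chance, plenty of individual servers will show a spotless multi-year record purely by luck:

    p_fail = 0.10  # the 10%-per-year failure chance from the comment above

    for years in (1, 3, 5):
        p_spotless = (1 - p_fail) ** years
        print(f"{years} year(s): {p_spotless:.1%} chance of 100% uptime by luck alone")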
In my experience it's also very easy to think you've got all your bases covered when actually you haven't. I'm protected against mains power failures by a UPS and a generator - but a UPS switchgear fault can cut my power even without a mains power outage. My server has dual power supplies and network cards - but that won't help me if a clumsy worker sent to replace the server above mine unplugs mine by mistake. And so on.
If I think I'm doing a better job than a billion-dollar corporation with hundreds of thousands of servers, does that mean I am? Or is it more likely I'm fooling myself?
I also have a personal dedicated server that never crashed or restarted this year. However I am not sure how much it was actually available. I know for a fact that there were multiple network issues at OVH. I also know that had my server been home, it would have been worse (Optimum residential is awful).
The server not failing is not the only outage mode.
Indeed. People do not realize just how advanced the reliability infrastructure of those services is. Things like diesel power generators have been baked into cloud datacenters for what, a decade now? Probably longer. Show me your alternative power source when the power goes out (and power does disappear, everywhere, eventually).
Diesel generators have been baked into my on prem equipment room for at least 40 years
For your average person working in an average office, if the power goes out you're not going to be working anyway, so it doesn't matter if your server is offline too.
And an excellent corollary to this is that when you have lucky 100% uptime there is no incentive to optimize mean time to recovery.
Sure the raspberry pi in your closet has been running fine for years, 100% uptime, but then a component fails. Do you have a replacement on hand? Are you continuously monitoring it to know it went down? The component failed at 3am, did it page you? Did you hop right out of bed to rush to fix it?
Single systems can have really nice uptime until they don’t. Then you are hoping that the people on hand can repair what’s going on after months or years of never having to do that. Mean time to recovery might be a week while you wait for new hardware or a few hours while you google some error message you’ve never encountered.
People can run their own systems if they want to, but they shouldn’t confuse good luck with rigorous engineering.
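For concreteness, the bare minimum "did it page me" setup looks roughly like the sketch below; the URL and the alert hook are placeholders, and a real setup would page through an on-call service rather than print:

    import time
    import urllib.request

    URL = "https://example.com/health"  # hypothetical health endpoint

    def alert(message: str) -> None:
        # Placeholder: wire this up to email, SMS, or a paging service.
        print("ALERT:", message)

    while True:
        try:
            with urllib.request.urlopen(URL, timeout=5) as resp:
                if resp.status != 200:
                    alert(f"unexpected status {resp.status}")
        except Exception as exc:
            alert(f"check failed: {exc}")
        time.sleep(60)  # poll once a minute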
Of course - how else would I know the exact failure times? My service monitoring only polls every few minutes. That doesn't help me know how my service is performing though; I'm not offering a ping service.
More importantly, I do know I had an error loading Twitter at Thu 11 Nov 04:17:01 GMT 2021, while my websites (and Google and Hacker News) were working. At 17:53:01 GMT, Twitter took over 2 seconds to load its first HTTP page, far beyond the normal 400ms. Google took 23 seconds to load this morning; BBC News just 0.071.
On the other hand, I also need to provide services which can't cope even with outages measured in milliseconds. Good luck complaining to a cloud provider that your traffic vanished for 4 seconds. Those services thus have multiple connections on independent hardware and circuits, with no single point of failure.
This is so misinformed by industry propaganda. Modern cloud services are very often unavailable from a region without any apparent reasons. Or the service appears to be available but some specific feature doesn't work. Or it's available but just really slow or dropping packets.
When you have a "simple" (already quite complex) BGP(TCP(HTTP)) tunnel, chances are things just work and it's easy enough to diagnose issues. Between anycast, auto-scaling, and WAFs (among others) you have added so many layers of complexity that the chance of a "random" error somewhere in the stack has dramatically increased and diagnosis is close to impossible.
It used to be simple: web services were either up or down. Now it's not so obvious anymore, and that's definitely not reflected in the five-nines SLAs falsely promised by cloud providers. I can say with confidence that most self-hosted systems I've been close to have much better uptime than modern cloud services and are much cheaper to service and maintain.
I agree with specific dimensions of this sentiment.
Firstly, I think that the GP comment ever so slightly misapplies the localized awareness available in smaller environments to the cloud. Yes, there are tons more engineers to look at stuff, but those engineers are a) already bogged down keeping up with internal infra, politics and bureaucratic machinery, and b) quite some distance further away from actual errors occurring on the ground because everything's aggregated to the hilt so the stats remain comprehensible at the higher scale.
I also agree that the newer stuff that is less mature has an order of magnitude more intrinsic leaky abstractions than old designs, which I do think were more hygienic and espoused more effective separation of concerns than what is used today.
Also, not to nitpick, but the "classical" canonical definition would probably look closer to BGP(TCP(TLSv1.3(HTTP/1.1))), with the modern equivalent being BGP(UDP(HTTP2)) and the future being BGP(UDP(QUIC)). I do agree that the rapid consolidation of HTTP and TLS, without wide-scale awareness and slow, methodical development of general introspection tooling, does make things net worse in general. I suspect infra will still be using HTTP/1.1 for a long time into the future until this materially changes.
My private website on my RPi has been running for 2 years now without a problem, with only minimal downtime from rebooting for new kernels.
It is amazing how much uptime you can achieve with a $5 computer in comparison to a $1,730,000,000,000 (1.73 tera-dollar) company.
Even if you compensate for dynamic content.
Maybe Google's servers are also running without a problem and you can't reach it for a different reason. Self-reported uptime is not a good measure of server availability.
Was your home internet available all of the time? How many times did you reboot your modem?
And yet it doesn't mean his RPi-powered website needs to ever sustain that kind of traffic, so he doesn't have to take the additional risk of running a distributed system.
In contrast, in the cloud you typically take on the risk of the underlying distributed platform (optimized for managing thousands of VMs, etc) even though you only need a single machine for yourself.
FWIW I vaguely recall some kind of website that showed approximate bandwidth use... it was something like a cross between statcounter and alexa except for network stats and the like.
I think it purported Google to be hovering around 15Gbps. That sounded humongously wrong, like I'd expect global traffic <-> Google to at least be a couple terabits, right?
I'd be very curious if there's a way to actually ballpark the number. Or maybe it is in fact possible to implement services that reliably track this sort of thing, and my futile searches earlier just weren't finding them...
Didn't even notice. Work doesn't use any GCloud components, and personally it's been a while since I degoogled myself. No Google Docs, mail, or search. Not even the quad-8 nameservers.
due to their reliability; it always overdelivered during its first decade. If anyone read any of the books from its founders, or the papers and stories related to the company and them, it seemed to be the new "non zero day" infrastructure; however, the longer the company is alive, the more it feels just like any other.
Unrelated: not sure what has been going on on HN for a few months now, but almost all comments I write get automatically downvoted, even little things that no one should really care much about.
https://en.m.wikipedia.org/wiki/Google_services_outages