I don't really understand why this is happening at this scale, it's not like they just became broke and can't afford a proper server... can someone explain?
Agents are shipping code faster all over the world, and in some cases 24 hours a day. Additionally, a significant number of non-developers are now developers, i.e. they are also shipping to GitHub regularly.
This is not limited to pushing code: all the bells and whistles that GitHub added as features, sized under the assumption of some predictable growth, are now seeing load that exceeds the original plans.
I suspect a lot of their existing systems have to be re-architected for unanticipated scale, and it won't happen overnight for sure.
Pretty damning. Would also be interesting to see the number of commits overlaid. The graph tells a great story about the correlation with MS's takeover, but I wonder if at the same time that uptime went to shit, MS was shifting over large numbers of enterprise contracts to GitHub. That would be a more complete story IMO.
None of which excuses this. Can you imagine someone's reaction in 2017 if you told them that github would be below 90% uptime in 2026? It would be unimaginable.
That’s nonsense. GitHub didn’t have 100% uptime before 2020. I remember downtime back then. And Microsoft didn’t make changes that fast. The only thing that changed is the accuracy of their status page.
Also go back and look at the unofficial status page from 3 years ago. It’s regularly above 99% and has been dropping steadily since then. Then in the last 3 months it has dropped to below 85%.
And immediately find that there are numerous incidents that would show up on the modern status board as an issue but are reported with 100.00000% uptime on that graph.
2018-07-16 17:32:53 - We are investigating reports of elevated error rates.
2018-07-16 17:34:27 - We are investigating reports of service unavailability.
2018-07-16 18:04:38 - We've discovered the issue causing connectivity failures and are remediating.
2018-07-16 18:26:48 - We're monitoring the site as systems recover. Some delays are expected as we process backlogged data.
2018-07-16 18:37:26 - We're continuing to monitor and work on further remediation efforts as the site recovers.
2018-07-16 18:54:21 - The site is stable. We are continuing to monitor and work through follow-up remediation efforts.
And there are other incidents with connection failures or elevated error rates during July 2018, but the linked graph shows "average uptime of all components 100.00000%" during July 2018.
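For a rough sense of scale, here is a minimal sketch of what that one July incident alone would do to a monthly figure. It assumes the outage ran from the first status update to the last one quoted above, and that the whole site counts as down for that entire window; both are simplifications, and the function name is just for illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def monthly_uptime_percent(start: str, end: str, days_in_month: int) -> float:
    """Uptime for a month containing a single outage from start to end."""
    outage = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    month_seconds = days_in_month * 24 * 60 * 60
    return 100.0 * (1 - outage.total_seconds() / month_seconds)

# First and last status updates from the 2018-07-16 incident; July has 31 days.
uptime = monthly_uptime_percent("2018-07-16 17:32:53", "2018-07-16 18:54:21", 31)
print(f"{uptime:.5f}%")
```

Under those assumptions the month comes out at roughly 99.82%, not 100.00000%.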
Another from October (which also shows 100.0000% uptime):
2018-10-21 23:09:19 - We are investigating reports of elevated error rates.
2018-10-21 23:13:31 - We are investigating reports of service unavailability.
The faster you move, the more you screw up, and almost no company producing software has figured out how to move fast and not screw up. It's so hard that companies even used to boast about how much they didn't care about screwing up, as long as they moved fast.
Add in new "productivity" tools that help you move even faster, with even less regard for how much you screw up (even though the tool could be used to move at the same speed, but with fewer screw-ups), and an engineering culture which boils down to "Why not?", and you get platforms run by Microsoft that are unable to achieve two nines of reliability.
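To put "two nines" in concrete terms, a minimal sketch that converts an uptime percentage into hours of downtime per year. It assumes a 365-day year and treats uptime as a single site-wide number, which is a simplification; the function name is made up for this example.

```python
def downtime_hours_per_year(uptime_percent: float) -> float:
    """Hours of downtime per 365-day year implied by an uptime percentage."""
    return (1 - uptime_percent / 100) * 365 * 24

# 99% is "two nines"; 90% and 85% are the figures mentioned in this thread.
for pct in (99.0, 90.0, 85.0):
    print(f"{pct}% uptime -> {downtime_hours_per_year(pct):.1f} hours down per year")
```

Two nines already allows about 87.6 hours (over three and a half days) of downtime a year; at 85% that grows to about 1,314 hours, or roughly 55 days.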
That doesn’t track, because GitHub Enterprise Cloud has great uptime. This is all load-based: vibe-coded AI slop shipped in record numbers by users who will never convert to paid. The real question is what are they doing about that?
Many, many years ago, when ~persia came ashore~ Dropbox was announced on HN [0]. The top comment was quick to point out: "For a Linux user, you can already build such a system yourself quite trivially