
The tl;dr:

>we don’t use Elastic Block Storage (EBS), which is the main component that failed last week.



This carries even more weight when you consider that EBS was broken across multiple availability zones, which means that, had they used EBS, their first point would have been invalidated.


More importantly, SmugMug was smart enough, when moving to the cloud, to realize which components were the most failure-prone and to stay away from them.

Not using EBS wasn't luck; it was a conscious decision.


They didn't avoid it because of concerns about availability; they avoided it because of run-time performance concerns. I don't think you can argue that those concerns even imply anything about availability, much less have some kind of causal relationship.

SmugMug got lucky in their choice. If performance had been consistent with EBS, they would have used it and most likely gone down like so many others.


Not true. Our primary decision was based on unpredictable latency, but the fact that we didn't/don't trust EBS played a huge role. EBS mucks up our basic availability scenario - systems are no longer individual, disposable, replaceable units. I'm sorry if that wasn't clear from the blog post - I'll go re-read that part and update.



