Well, this is probably PR, as no system has 100% uptime, nor can one be built. Not to mention that network engineers rarely work with servers :)
Not 100% but 99.9%… IIRC Guild Wars 2 servers had like 1 actual outage in 11 years. They have a pretty amazing structure.
Fun fact, uptime goals are measured in nines – for example, 99.9% is three nines of uptime. If that one outage lasted an entire day, and they were never down at any other time, that would indeed be three nines of uptime.
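For anyone who wants to check the arithmetic, here is a rough sketch in Python. The function names are made up for illustration, and the "one full day down in 11 years" figure is just the hypothetical from the comment above, not a measured number.

```python
import math

def uptime_percent(downtime: float, period: float) -> float:
    """Uptime as a percentage; downtime and period must share a unit."""
    return 100.0 * (1.0 - downtime / period)

def count_nines(downtime: float, period: float) -> int:
    """Number of leading nines in the uptime figure (99.9% gives 3)."""
    if downtime <= 0:
        raise ValueError("zero downtime has no finite number of nines")
    unavailability = downtime / period
    # Small epsilon guards against float rounding at exact powers of ten.
    return math.floor(-math.log10(unavailability) + 1e-9)

# Hypothetical: one full day of outage across 11 years.
days_in_11_years = 11 * 365.25
print(round(uptime_percent(1, days_in_11_years), 4))
print(count_nines(1, days_in_11_years))
```

So a single day-long outage in 11 years lands at roughly 99.975%, which is three nines and not yet four.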
Yeah, my net admin colleagues explained that one to me a while back because the bosses were making similar uninformed demands (“this needs to never go down!” “Sure, here is how much that costs”). It was very enlightening :)
a lot of things are possible if you are lucky enough ;)
Five-nines is entirely possible with enough resources and competent outage-minded engineers.
Hell. Five nines is doable with EKS, a single engineer, and thinking through your changes before pushing them to prod. Ask me how I know…
Operations like this don’t have a single engineer. The more complex the project, the higher the risk of complications and outages. It’s not a matter of “oh, just think harder about your changes”.
Ask me how I know…
If you’ve got a rant, I’m all ears
Agreed, but five nines is not 100% ;) Anyway, this discussion reminds me of Jim Gray's Technical Report 85.7, which might be of interest to some of you.
If you just threaten your employees enough they'll never go down /s
This is a software development business, which is a positively bananas trade no matter what’s getting written. And the smaller the business, the more hats network guys wear. We work with everything from the server app down to the coffee machine fueling the devs. And 100% uptime isn’t the most crazy demand I’ve heard. I’m sure Chujo is busier than a one-armed paper hanger with jock itch.
At least he’s got money to throw at his hosting company. Scaling up would have been much slower in the old days.
I’m not versed in videogame network infrastructures, but wouldn’t it be enough to just have a load balancer and a couple of instances to ensure “100% uptime”? At least until all the instances and the load balancer itself decide to join a suicide pact, but more instances mean less chance of a critical event happening, no?
At a press level, sure, and the same for the average user. Legally speaking these numbers do have significance, though. Amazon Web Services (at least at one time) offered a guarantee of 99.99% uptime for their infrastructure. That 0.01% covers things like the once-a-year outages that make the news. A 10,000th of a year is actually a tangible amount of time, and not even Amazon is confident enough to ignore it.
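To put a number on "tangible", here is a small sketch that turns an uptime percentage into a yearly downtime allowance. The function name is made up, and it assumes a 365.25-day year; real SLAs are usually measured per month with their own exclusions, so treat this as back-of-the-envelope only.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes, assuming a 365.25-day year

def downtime_budget_minutes(uptime_percent: float) -> float:
    """Minutes of allowed downtime per year for a given uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * MINUTES_PER_YEAR

for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {downtime_budget_minutes(target):.1f} minutes per year")
```

Four nines works out to a bit under an hour per year, and five nines to a little over five minutes.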