A reader from the NoR asks the question, "Is it so hard to keep a 2000 person network running?"
"Why is it that the IT people here can never actually get the computers to work for a whole 2 months straight? I'm sitting here again with people breathing down my neck for TPS reports (no, not really) and I can't work for more than 10 minutes without the damn computer locking up on me. The entire department, and from what I hear, sections of other floors in the building have this problem too. I've pretty much given up trying to work and am spending the day picking my ass and surfing. Grrrrrr.....
...
Is keeping a 2000 person network consistently up and running that difficult? I honestly don't know. IT guys, what do you say?"
I'm an IT operations manager for a 2500 person network (450 in my office, 8 offices total), in a 100 million dollar company (27 million for my division). Yeah, it really is that hard.
Computers are stupid and unreliable. So are people (as opposed to persons).
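To figure out how often a system will go down (at least partially), take the MTBF for each critical component and divide it by the number of critical components. That is a rough estimate: it assumes the components are about equally reliable and fail independently of each other.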
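As a rough sketch of that rule of thumb, here is the division in Python. The function name and the example figures (1,000 components at 100,000 hours each) are made up for illustration, and the result only holds under the assumption that the components are identical and fail independently.

```python
def system_mtbf_hours(component_mtbf_hours: float, n_components: int) -> float:
    """Approximate MTBF of a system that goes down when any one of
    n identical, independent components fails (rule-of-thumb estimate)."""
    if n_components <= 0:
        raise ValueError("n_components must be positive")
    return component_mtbf_hours / n_components


# Illustrative numbers only, not measurements from any real network.
estimate = system_mtbf_hours(100_000, 1_000)
print(f"Expect a failure somewhere roughly every {estimate:.0f} hours")
```

With those made-up figures, something breaks about every 100 hours, even though each individual part looks like it should last more than a decade.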
For example, the MTBF for modern SCSI hard drives is staggeringly high (in excess of 10 years)... actually they've started using something called an annualized failure rate (AFR), which is between 0.5% and 1%.
Of course, in an organization with 2400 users and over 20 terabytes of storage arrays, there are something like 5000 hard drives. That means in an average year, at the 1% end of that range, we are going to lose about 50 hard drives.
Actually it's probably more like 100 or so.
And that's just dead drives, never mind major data errors...
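The drive arithmetic can be sketched like this. The 5000-drive count and the 0.5% to 1% failure rates are the figures from this post; the 2% row is simply what "more like 100" dead drives a year would imply, not a published rate. The function name is made up for the example.

```python
def expected_annual_failures(n_drives: int, annual_failure_rate: float) -> float:
    """Expected number of dead drives per year: drive count times the
    annualized failure rate, given as a fraction (0.01 means 1%)."""
    return n_drives * annual_failure_rate


N_DRIVES = 5000  # rough drive count quoted in the post

# 0.5% and 1% are the quoted range; 2% is the rate implied by ~100 losses.
for rate in (0.005, 0.01, 0.02):
    losses = expected_annual_failures(N_DRIVES, rate)
    print(f"AFR {rate:.1%}: about {losses:.0f} drives per year, "
          f"or one every {365 / losses:.1f} days")
```

Even at the optimistic end, that is a dead drive every couple of weeks; at the pessimistic end, it is one or two a week.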
And then there's user error...
The critical MTBF for the operator is harder to determine. My best bet is that for every 100 people, once per week someone will do something stupid that causes a critical system to fail.
This number doubles for healthcare or government workers. It quadruples for programmers, but in most companies they are kept away from the production systems.
Not in my company.
Yes, I'm changing that. I may need a shotgun and a baseball bat to do it, but it is getting done.
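For the curious, the operator rule of thumb above works out as follows. This is only my gut-feel estimate turned into arithmetic, not measured data, and the function name and multiplier table are made up for the sketch.

```python
# Multipliers from the rule of thumb above (a rough estimate, not a statistic).
MULTIPLIERS = {
    "typical": 1,
    "healthcare_or_government": 2,
    "programmers": 4,
}


def critical_incidents_per_week(n_users: int, kind: str = "typical") -> float:
    """One operator-caused critical failure per 100 people per week,
    scaled by the multiplier for the kind of user."""
    return n_users / 100 * MULTIPLIERS[kind]


# 2500 is the network size mentioned earlier in this post.
for kind in MULTIPLIERS:
    print(f"{kind}: {critical_incidents_per_week(2500, kind):.0f} per week")
```

At 2500 ordinary users, that is about 25 self-inflicted critical failures a week before a single piece of hardware dies.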