Wednesday, August 08, 2007

Inside Baseball

That's an interesting expression. When I first heard it I didn't care for it, because it is obscure to anyone not familiar with the American cultural idiom, or with the internal politics of organizations.

...But, for someone who IS familiar with those things, it's a very useful and apt expression.

Wikipedia defines it as: "a colloquial term in American English referring to 'behind-the-scenes' conversations or dialog that the average member of the public would have no way of being privy to."

The expression comes from Bill James, the "inventor" of sabermetrics (the name comes from SABR, the Society for American Baseball Research), a system for objectively analyzing the performance of baseball players and teams using statistical techniques.

"Outside baseball" are those statistical measures that everyone can see, and which are based solely on objective reality. Inside baseball is those things that go on behind the scenes, and that no-one can see or measure who isn't "inside".

Sabermetrics has come to dominate the management of the sport of baseball, as well as the minds of its fans (especially those into fantasy leagues and sports betting), and the results are clear. Teams (and individuals) with a good grasp of the metrics outperform those that manage by the gut by at least 20%, even when normalized for all other known factors.

Over the past 30 years it has become very clear that the principles of sabermetrics apply to a lot more than just baseball; they are a method of complex systems interaction analysis.

It just so happens that what I do is complex systems interaction analysis. I'm a systems and security consultant, but whether I'm working on physical security, information security, IT systems, or most anything else I do, my job is all about understanding complex systems, and how to architect, design, build, manage, maintain, and sometimes destroy them.

There's a problem with systems though: the more inside baseball that goes on, the less one can analyze and predict the actions of systems based on measurable objective factors. Inside baseball breaks the system unless you can account for it; and even if you have knowledge of the non-objective factors, it can be incredibly difficult to compensate for them, because they are just that: not objective.

So, let's talk inside baseball... and let's talk about what it is I do, and have been doing for the past two years for the job I currently hold.

My official title is Sr. Architect. My function is as chief architect for one of the 4 major and 4 minor lines of business, for one of the largest banks in the world.

Each line of business has a chief architect, and I am one of those 8 chief architects. There is no architecture position above ours; each LOB is its own "kingdom" as it were. We do also have an enterprise architecture group, however, which helps to set standards and develop technologies that bridge across all the lines of business. We work closely in concert with this group to ensure that the needs of each LOB are represented and met in these policies, which we co-author, evaluate, and approve.

As chief architect, I am responsible for designing, building, and implementing all of the systems, technologies, policies, and processes that go into the technological and information operations for my line of business.

Even more fun, the LOB I happen to be the chief architect for is not a revenue generating LOB; it's one of the two "headquarters" lines of business, as it were. One of the HQ LOBs does corporate operations like payroll, accounting, HR, etc., and the other, the one I'm chief architect for, manages all of the information management and business services infrastructure for the entire bank. So if there's a system that other lines of business depend on, like enterprise information management, storage, data warehouse, security, etc., or an enterprise wide system for anything like that... yeah, that's what I'm chief architect for.

As of our next re-org (which should be shortly; it's in the final planning stages right now), I report to a group director, who is basically at the senior vice president level. The group director reports to one of the CIOs (there are four), who reports to the COO, who of course reports to the CEO. The re-org last year added one more layer to that, but when I started the reporting chain looked like this as well. Anyway, there are three guys between me and the CEO (or five guys in my chain, including me and the CEO).

I'm not trying to brag, I'm trying to paint a picture of the organization (the "inside baseball" stuff) so that you can understand what's going on.

Now, so far that all seems relatively complex, but clear. It isn't.

In theory there are clear lines of command, responsibility, and priority; and clear lines of incentive. In reality, everything below the CIO level is "matrixed", meaning that everyone works on multiple teams which may have different goals and priorities. You may work for one guy, but do all of your work for another guy who doesn't evaluate you or affect your compensation, and can't order you to do anything.

The system seems almost deliberately designed to AVOID explicit responsibility or accountability, as well as to prevent innovation or dynamism... in fact I honestly believe it was. The "matrix" structure is designed to ensure that no one ever gets blamed for anything, because there are always fingers to point, etc., etc.

So, I've got a very important high level position. I report directly to a sr. vice president level manager, and I have absolute yes or no authority over hundreds of millions of dollars worth of architecture, only overridable by one of the CIOs or the COO.

I don't own projects however. Each project is initiated and owned by the business group that sponsored it and is paying for it. They sign off on all work, pay all invoices, etc. It's their money, they have to sign to spend it, and they have to sign off on the goods and services they receive.

Now, I need to ensure that all the projects we perform meet enterprise, line of business, and datacenter standards; and are aligned with enterprise and LOB architectural policies and direction. As part of this responsibility, I review all projects for the line of business above the $15,000 level, or all projects that have an architectural or enterprise impact of medium or higher.
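For the programmers in the audience, that review trigger boils down to a simple either/or rule. Here's a minimal sketch of it; the `Project` record, the `Impact` scale, and the function name are all made up for illustration (they aren't from any real system we run), and I'm assuming "above the $15,000 level" means strictly more than $15,000.

```python
from dataclasses import dataclass
from enum import IntEnum


class Impact(IntEnum):
    """Hypothetical impact scale; only 'medium or higher' comes from the text."""
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Cost threshold from the text; treated as strictly "above" (an assumption).
REVIEW_COST_THRESHOLD_USD = 15_000


@dataclass
class Project:
    """Hypothetical project record, for illustration only."""
    name: str
    cost_usd: float
    impact: Impact


def needs_architecture_review(project: Project) -> bool:
    """A project is reviewed if it is expensive enough OR impactful enough."""
    over_budget_line = project.cost_usd > REVIEW_COST_THRESHOLD_USD
    significant_impact = project.impact >= Impact.MEDIUM
    return over_budget_line or significant_impact
```

Note that either condition alone is enough: a cheap RAM upgrade with a medium enterprise impact still lands on my desk, and so does an expensive project with no architectural impact at all.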

This review can sometimes be very in depth and comprehensive and take weeks, or it can be done in a few minutes over the phone. Every project is different. I see projects ranging from upgrading the RAM and OS on a few blades, to the implementation of hundreds of clustered servers costing millions of dollars.

Often, the business cases and requirements I receive are unclear, and I work with our partners (that's what we call the business stakeholders) to clarify and document their business need. Very frequently there is a clearly understood business need, but not a clearly understood technological need; and I work with the application owners to help architect, design, and implement appropriate solutions to these business needs.

On the other side of things, the partner must sign off on the designs we create for them; and they must approve the status of the project and the actions to occur in the next steps at each point in the process.

In this process I work with project co-ordinators and engineers from the various departments (software architecture, software engineering, database management, hardware engineering, hardware integration, capacity planning, network engineering, datacenter management, security, disaster recovery and business continuance). Each project will have either the business owner, or a co-ordinator representing the business owners (may be a co-ordinator, may be a project manager, may be an engineer or manager, depending on the needs and size of the project), and a dedicated co-ordinator from the line of business itself, to help manage and represent these groups administratively. My role as architect is to make sure all the technical, procedural, and policy needs are met.

Confused yet? Well I've only covered 1/3 of my job (the other two thirds have to do with developing technologies and systems; and developing policies and procedures); but it's the third that's relevant to our discussion today.

It really is all somewhat unnecessarily complicated. As with the matrix organizations, this structure is something of a jobs and job security program for a lot of the people involved. In many circumstances we could cut the number of people who "handle" a project in half (or in some cases much more).

Anyway, the point of all this is that I have both a strategic responsibility, and the organizational authority to ensure that all projects meet architectural standards and are designed and built appropriately. The partners have the money, and the signature authority over it at each stage.

Now, most of the time this whole thing (miraculously enough) works; slowly, but smoothly and appropriately. If the partner doesn't know what they need, we can steer them in the right direction, and help them build the appropriate solution. If they DO know what they need, we can work it into the architecture or make changes necessary to work it in... or even sometimes change the policies and processes to accommodate the partner when it's appropriate.

Most of the time.

I've got a "problem customer" as it were. I've mentioned him before, and the problem is the same basic issue we've had all along, but it has accelerated and expanded in scope.

Basically the issue is this: 18 months ago, he settled on a particular architecture and a particular type of machine; and signed support and procurement contracts outside of our normal enterprise agreements to meet the needs of this architecture and infrastructure.

Ok... not great, and he did it against recommendation; but it's something we can accommodate.

One thing we did specifically warn him about though, was that the platform he selected had a very specific limitation. It has a very limited maximum I/O throughput. This limit is just high enough that in a "normal" configuration it isn't an issue, but we warned him that if either his systems were to become more heavily loaded, or if he tried to expand the functionality on those systems that he would run into issues.

Well, for the past several months he has been trying to do more with these boxes than they were designed to do; and we HAVE been running into these issues, just as he was warned. Not only that, but he keeps adding more and more demands on hardware that has reached its maximum limit.

We have been telling him since March that the boxes he's building can't support the load and functionality he wants to put on them. We have four hardware projects that were started back then, and in all four cases he asked for more interface cards than could actually fit in the boxes, never mind have the bandwidth for. In each case we informed him of this and scaled back his requests to a more reasonable level.

So far, he has taken each of these projects to just before the build stage, then requested last minute changes, asking for MORE than even the original configuration we rejected. In fact, he's done it to each of these four projects at least twice.

He's justifying this by saying that because he's logically partitioning the boxes, he doesn't have to provision more hardware (which can be true to an extent, but not to the extent that he wants here); and that because he isn't concerned about performance or availability, only about cramming as much stuff onto as few systems as possible, he is willing to accept this tradeoff and build the system as he's asking for.

Only we won't do that. What he's asking for is to split each system 8 ways, when they should only be split in two. It's POSSIBLE to split 8 ways, but it's just a bad idea. He's also trying to fit four different functions on two systems which will mirror each other, saying that because the machines are partitioned this is sufficient. This is simply not acceptable, because we need to maintain a physical separation of at least two of these functions for disaster recovery and business continuation purposes.

Basically it doesn't matter if he says he doesn't need any of that; we simply will not build or support systems that don't meet at least minimum standards.

We have told him that he has three options:

1. Increase the capacity of the hardware he purchases (buy a bigger box that can handle everything he wants)

2. Buy more boxes, and split the load across them (something that is very easy to do)

3. Simply do less with the boxes

Each time I tell him this, he comes back with a design that does just as much with just as little; only rearranging the configuration a bit. Each time I send him back with the same message "these are our options".

He simply will not take the reality of the situation as an answer. He wants me to bend spacetime to fit more stuff onto a medium sized box than can possibly fit. Even if it did, and there were no performance issues, it would be completely against policies, and not aligned with architectural direction.

If I let him do this, not only would I be derelict in my duties as chief architect; but he would be very unhappy, because soon after installation, his boxes would fail spectacularly.

I delivered my final notice to him yesterday. He can do one of the three things I mentioned or I'm simply going to downcheck the projects. Not only that, but I'm not the only one who's told him this. Hardware engineering, hardware integration, and the datacenter guys have all told him the same thing... but he keeps re-arranging the colors on his Visio diagrams as if that would make us suddenly see the light and change our minds.

So, today, he's going to try and go to the CIO and get him to override us; which he very well may do. Generally speaking the CIOs allow the business owners to do so, because it's their money and if they want to be idiots, it's their responsibility.

If they go ahead and proceed with the override, then I'll write it up as not aligned with enterprise architecture or policy, and recommend against implementing it; as will the other teams I mentioned. This triggers a lot of unpleasant things should they go forward.

At that point he's going to have to go outside of normal purchasing channels, conduct the negotiations himself before forwarding them to contracts, get the vendor to build and install it (because our guys won't touch anything I don't approve), pay the datacenter to host it in racks and give it power and cooling, and then pay for a separate support contract from the vendor for the systems.

All in all, it's going to cost him about double to go ahead and do this, and it will be less reliable and less maintainable as a result. Choosing one of the options I presented him would cost only 25% to 35% more.

I, on the other hand, am going to short circuit him today.

I'm having lunch with the vendor rep, a man I have an important relationship with (I get a lot of insider info, and free steaks, from system vendors, because my yes or no means millions of dollars a year to their companies). At that lunch I'm going to tell him that if he sells my business partner (lord I hate calling them that) the boxes, services, and support against our architectural and policy recommendations, he can forget about selling any other boxes to anyone in our line of business for the next year.

And that my friends, is inside baseball.