Knowledge Management Basics Inform System Release Lifecycle
I’ll make a confession: I’m not organizationally gifted.
This was painfully clear recently when I frittered away frustrating moments of a busy day searching for a document that I needed … now! It occurred to me that, had I organized and versioned the document more appropriately, I would surely have what I needed—without delay and without the heart palpitations.
(Incidentally, it also occurred to me that if a horse had a horn it would be a unicorn; some things just are as they are and so they’ll always be; alas, my organizational skills may remain as elusive as a mythical horned beast of childhood lore. In the meantime, I’ll argue that there is a method to my madness. Really.)
It was during this small and mundane moment that I got thinking about the state of enterprise IT today. Most IT shops look an awful lot like my desktop. Of course, I’m only slinging documents. IT systems are a different deal entirely.
For IT, in the haste to meet deadlines, a lot is left to chance. Many best practices are traded for paths of least resistance. Systems are cobbled together using more art than science; what software is used (sources, versions, dependencies) remains unclear. So, when the system finally makes it to production, it immediately falls down, or threatens to do so just as soon as you fiddle with anything. For many organizations, update processes are as unsettling as a surgeon with astigmatism.
A profound lack of transparency ensures that, when change occurs, bad things happen. It’s no wonder that 80% of IT outages are attributable to system updates—it’s the Law of Unintended Consequences enacted, ratified and applied in full force.
You can trace the origin of this Law to the way systems are constructed today:
- Developers grab whichever IT-issued platform is available to build an application
- They throw it over the wall to QA, who recreates the system based on whichever version of the platform they have lying around; they resolve undocumented dependencies, tweak, tune and fiddle until it’s working
- They certify the system and pass it to the production teams who recreate the system, again, based on their latest version of the platform
During each of these handoffs, software versions and system configurations drift. Dependencies are missed, found, resolved, poorly documented and, consequently, missed again. Deployments fail, and the system cycles back to development and test for hot fixes and recertification. Finally, the production team gets the system to run.
Exactly what gets deployed is anyone’s guess.
When it’s time to update the system, the scenario becomes even more gruesome—resulting in one of three certifiably dysfunctional behaviors:
- Updates are deferred (putting compliance and performance at risk)
- Changes are made blindly (leading to outages and job retention issues)
- Changes are tested extensively (slowing IT and adding significant cost)
Of course, there’s a better way. It’s an approach based on the principles of knowledge management, … which brings us back to my document issue.
Just as I need a consistent way to organize and version my documents, IT needs to do the same for all of its system artifacts.
The solution begins with a “definitive software library,” which is a fancy ITIL description of the source of record IT organizations must establish for controlled reuse and management of system artifacts across the operational lifecycle.
From this version-controlled software library:
- App dev gets the latest platform against which they build a new application
- They then check in the complete, dependency resolved system—application, OS and middleware components—which is identified by its own version number
- QA is alerted, pulls down the very same version, and runs its test suite
- Production grabs the certified version and successfully provisions the system
Dev, test and prod are all accessing the same exact version—no drift, no confusion.
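To make the check-in and pull workflow concrete, here is a minimal sketch of a version-controlled library. It is a toy, in-memory illustration, not any particular product’s interface: the `SoftwareLibrary` class, its method names, and the JBoss and billing-app version strings are all assumptions made up for the example.

```python
import hashlib
import json


class SoftwareLibrary:
    """Toy in-memory software library: write-once, versioned system definitions."""

    def __init__(self):
        # Maps (system name, version number) to the frozen definition as JSON text.
        self._store = {}

    def check_in(self, name, components):
        """Store a complete system definition and return its new version number."""
        version = 1 + max((v for (n, v) in self._store if n == name), default=0)
        # Sorted keys mean the same content always serializes to the same text.
        self._store[(name, version)] = json.dumps(components, sort_keys=True)
        return version

    def pull(self, name, version):
        """Return a fresh copy of exactly what was checked in under that version."""
        return json.loads(self._store[(name, version)])

    def fingerprint(self, name, version):
        """Content hash teams can compare to confirm they hold the same build."""
        return hashlib.sha256(self._store[(name, version)].encode()).hexdigest()


library = SoftwareLibrary()

# Development checks in the whole stack: OS, middleware, and application.
# The JBoss and billing-app versions are hypothetical.
v1 = library.check_in("billing", {"rhel": "5.2", "jboss": "4.3", "billing-app": "1.0"})

# QA and production each pull by version number instead of rebuilding by hand.
qa_system = library.pull("billing", v1)
prod_system = library.pull("billing", v1)
```

Because a stored version is never modified after check-in, QA and production cannot drift apart as long as both pull the same version number.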
And once a system is deployed, a version manifest becomes the basis for seamless change. Updates can be quickly matched to system inventories, change impact is clearly documented, and patches and updates can be implemented incrementally.
When outages do occur, restoration is as simple as reverting to the previous version.
It’s a refreshingly simple answer to a complex operational challenge.
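As a rough illustration of matching an update to system inventories, the sketch below scans per-system manifests for a package older than the fixed version. The manifest layout, the system names, the `tls-lib` package, and the assumption that versions are simple dotted numbers are all hypothetical.

```python
def systems_needing_update(manifests, package, fixed_version):
    """Return the systems that run `package` at a version older than the fix."""

    def as_tuple(version):
        # Assumes purely numeric dotted versions such as "1.0.2".
        return tuple(int(part) for part in version.split("."))

    affected = []
    for system, packages in manifests.items():
        installed = packages.get(package)
        if installed is not None and as_tuple(installed) < as_tuple(fixed_version):
            affected.append(system)
    return sorted(affected)


# Hypothetical manifests: one entry per deployed system.
manifests = {
    "web-01": {"tls-lib": "1.0.1", "httpd": "2.2.3"},
    "web-02": {"tls-lib": "1.0.2", "httpd": "2.2.3"},
    "db-01": {"postgresql": "8.4.1"},
}

needs_patch = systems_needing_update(manifests, "tls-lib", "1.0.2")
print(needs_patch)  # ['web-01']
```

Only `web-01` is flagged: `web-02` already has the fix and `db-01` does not carry the package at all, so neither needs to be touched.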
So, with that under control, I suppose it’s time to look at my document issues …
Standardization vs. Flexibility in the Cloud
Recently I discussed systems management with a Fortune 1000 IT architect who is planning a next-generation internal cloud. That IT organization (like most large ones) is extremely diverse with deployments ranging from high-uptime, professionally managed grids down to two-server apps managed directly by developers. For a big shop like that, internal cloud poses some difficult questions:
- What is the right architecture and tool selection for our new internal cloud?
- How do we bridge the gap between today’s wildly heterogeneous existing systems and tomorrow’s clean, standardized cloud environment?
I’m interested in question #2, and I think cloud proponents tend to sidestep it.
There’s a tension behind that question, and it isn’t in the technology stack. It’s this: Can you have too much standardization — is there a balance to strike?
I think of standardization as a spectrum. So let’s make up a completely nonscientific metric for it. I’ll creatively call it “S.”
At S=0%, we have the totally nonstandardized world, where every system is a unique snowflake, with special kernel and application versions, one-off package sets, single-instance configurations, random sets of running services, etc.
At S=100%, we have the perfectly standardized world, where all systems have identical OS builds, application stacks, and configurations; only application data varies from one system to another.
S=100% sounds great, so why is there a tension? Because perfect standardization just doesn’t work for real data center systems. It’s a great model for ISV-published appliances (to support an appliance I sold you, I want your appliance to look exactly like everyone else’s), and for corporate desktops (otherwise the corporate help desk can’t scale). But for the data center, there are many good reasons why small groups of systems need special applications and configurations. Those reasons stem from business requirements: The diversity of systems mirrors the diversity of the business, and changes to the systems mirror the agility of the business.
Of course, S=0% isn’t the right model either because it costs too much, scales too slowly, and exposes you to too much risk. With hand-built/hand-managed servers, a single sysadmin can manage only a few servers safely and effectively.
I believe that enterprises are typically between S=0% (messy shops) and S=30% (well-run shops). They have some portion of their desired policy documented in text files and spreadsheets, and they are managing a portion of their environment with automated tools, but there are far too many arbitrary, unnecessary differences between systems.
They want to get to about S=70%: Everything that can be standardized should be standardized, as long as you don’t give up the flexibility, agility, and freedom to make small-scale, system-specific, and temporary changes to the environment.
So what does this have to do with the cloud? Every cloud architecture I’ve seen is designed for very high standardization, say S=90%. And that’s a problem for real enterprises.
IT needs automation that can address the gap between too little and too much standardization — the gap where real-world systems actually live. Cloud architectures like IaaS and PaaS are great, but if you don’t have process and automation that can handle both customization and standardization, how many real services will actually make it to the cloud?
That’s a product driver for rPath release automation. We designed it to use layered, version-controlled models for every element of the system stack, which makes it possible to leverage the low marginal cost of cloud scalability without giving up flexibility and customizability.
When planning your cloud strategy, I suggest keeping these points in mind:
- Cloud architectures on their own tend to assume an unrealistically high degree of standardization.
- Existing IT process and automation tend to optimize for diverse, under-standardized systems.
- You have valid, lasting reasons to retain system flexibility and customizability.
- Match your move toward intelligent infrastructure with a parallel move toward intelligent automation that can balance flexibility and standardization.
I believe that without that step up in automation, few cloud deployments will deliver on their promise.
Intelligent Change: The Goal of Intelligent System Automation
The following is an excerpt from The 6 Musts of Intelligent System Automation.
Change is a dirty word for enterprise IT. This is because change is destabilizing, leading to outages and escalating IT costs. As a consequence, changes are avoided—and undertaken only when absolutely necessary.
Today’s IT organizations face three bad choices when it comes to change:
• Update blindly and hope for the best—making out-of-band changes and praying to whichever divine spirits look upon IT for a positive outcome.
• Defer updates and hope for the best—again, calling on the same divine spirits for help and hope in allowing continued stability, security and performance.
• Pass changes through extensive QA cycles—fully testing every change to ensure it doesn’t wreak havoc on production systems.
Weekly change review board meetings can run all day and beyond—debating which of the three options to apply for each of hundreds of proposed changes.
And this is why change is such a dirty word; change is expensive!
The reality is that the pace and frequency of change are not letting up. As systems become more complex, drawing on various third-party and in-house components, the sources of change are increasing. Add to this the popularity of Agile development techniques and you get a sense of the plight of enterprise IT.
IT is simply unable to consume change rapidly, economically or predictably.
The answer is intelligently automating change, which begins with the system model. This system model defines the exact profile of a deployed system—the components, versions, dependencies, etc.—providing a basis for delivering surgical, predictable updates to deployed systems.
With a system model in place, patching and updates become far less traumatic. It enables four principles that are essential to intelligent system automation:
• Update only what is necessary—creation of a system model allows you to specifically match patches and other updates to deployed systems. This allows you to update only the systems that require changes—rather than blindly implementing wholesale updates because of unknown system inventories. This also prevents the equally pernicious problem of not updating all systems that need it.
• Update incrementally—a deep and detailed understanding of the system profile allows you to implement more granular patches and updates. Why risk redundantly updating an entire platform or component when only one file changed?
• Don’t repeat yourself—by defining systems hierarchically, where variants are derived from common platform components, you can patch and update the base platform once and cascade the changes across all system variants.
• Optimize system definitions—a rich understanding of dependencies allows you to reduce update volume by excluding unnecessary platform elements when systems are constructed. You may start by shedding the most obviously unnecessary components or optimize for true thin provisioning with JeOS.
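The “don’t repeat yourself” principle can be sketched with layered dictionaries, where each variant is looked up first and the shared base platform second. This is only an illustration using Python’s standard `collections.ChainMap`; the package names and version strings are hypothetical.

```python
from collections import ChainMap

# Shared base platform, defined once.
base_platform = {"kernel": "2.6.18-1", "glibc": "2.5-1"}

# Variants list only what they add or override.
web_variant = {"httpd": "2.2.3-1"}
db_variant = {"postgresql": "8.4-1", "glibc": "2.5-1.custom"}  # deliberate override

# Lookups try the variant layer first, then fall back to the base platform.
web_model = ChainMap(web_variant, base_platform)
db_model = ChainMap(db_variant, base_platform)

# Patch the base platform once ...
base_platform["kernel"] = "2.6.18-2"

# ... and every variant derived from it sees the change.
print(web_model["kernel"])  # 2.6.18-2
print(db_model["kernel"])   # 2.6.18-2
print(db_model["glibc"])    # 2.5-1.custom (the variant's override is preserved)
```

The variants never copy the base definition, so there is exactly one place to apply a platform patch, and a variant’s deliberate differences survive it.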
The addition of system version control provides additional update benefits:
• Rollback when necessary—a complete system version history allows you to easily revert to any previous version for restoration and troubleshooting. Of course the entire practice of intelligent system automation makes system outages less frequent, but when they do occur, system version control provides a reliable safety net. This also reduces testing burdens by ensuring that changes are reversible. Today, many IT organizations test extensively around change cycles because the changes cannot be reversed predictably.
• Test as a unit—when testing is necessary, system version control allows the entire system definition to be tested as a unit. This improves the efficiency of test cycles by eliminating the possibility of configuration drift between dev, test and production phases and virtually eliminating the risk of deployment failures.
• Troubleshoot latent system conflicts—sometimes system conflicts don’t appear until well after the offending update is made, creating a murky chain of causality where conflicting updates are sandwiched between innocuous updates. A versioning foundation allows you to corner the culprit change and run it down like a rabbit by reproducing and testing past system versions until you home in on the offending change. In the source code world, this is referred to as “bisection”; traditionally, it hasn’t been possible in the world of operations.
20 Questions for Fast IT Troubleshooting
I’m thinking of a number between one and a million. Can you guess it? If I tell you whether your guesses are too high or too low, how many guesses do you need?
Thanks to the speed of exponential growth, you only need 20 guesses. A mere 10 guesses more and you get to a billion. Just keep splitting the range of possible answers: 500,000 too high, 250,000 too low, 375,000 too high, … When searching for a needle in a haystack, you make incredibly rapid progress if you can split the problem in half at each step.
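The arithmetic is easy to check. The sketch below, with function names of my own choosing, computes the worst-case number of guesses and plays the game by always guessing the midpoint of the remaining range.

```python
def guesses_needed(n):
    """Worst-case guesses to find one number among n with too-high/too-low hints."""
    # For n >= 1 this equals floor(log2(n)) + 1.
    return n.bit_length()


def guess_number(secret, low=1, high=1_000_000):
    """Play the game by always guessing the midpoint; return the guesses used."""
    guesses = 0
    while True:
        guesses += 1
        mid = (low + high) // 2
        if mid == secret:
            return guesses
        if mid < secret:
            low = mid + 1   # guess was too low: discard the lower half
        else:
            high = mid - 1  # guess was too high: discard the upper half


print(guesses_needed(1_000_000))      # 20
print(guesses_needed(1_000_000_000))  # 30
```

Each guess discards about half of the remaining candidates, which is why multiplying the range by a thousand costs only ten more guesses.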
Programmers use this trick in lots of ways. One great application is in source code version control, where you can use the trick to rapidly chase down bugs.
Let’s say you run a new test and find a latent bug (one that’s been in the code for who knows how long). It happened sometime in the last million changes to the product, and it’s too subtle to find by debugging. What do you do? You pull out the 500,000th version from version control (which never forgets). Bug present? Narrow down to older versions. Bug absent? Narrow down to newer versions.
Repeat up to 19 times and you find the exact change that caused your latent bug.
That technique is called bisection. It’s a standard feature in modern, strong version control systems for source code.
Neat, but how does this affect IT operations? Latent bugs can happen in any complex changing system, not just source code. In particular, they happen all the time on deployed servers. One administrator changes a configuration to close a security hole. Thousands of changes and many months later, a new update for an application fails due to a hidden incompatibility between the app and the old configuration change. How do you find the problem?
With system version control, you can apply bisection to deployed systems — physical, virtual, or cloud-based. System version control can precisely reproduce any previous version of an entire system (as a new image or by incrementally altering an existing system). In short order you can find the exact change that caused the problem, no matter how old — and like a bad time-travel movie, you can fix the present by altering the past.
That’s the power of strong version control applied to systems management.
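Here is a minimal sketch of bisection over a version history. It assumes the oldest version is known to be good, the newest is known to be bad, and the problem persists once introduced. The `is_bad` callback is a stand-in for whatever it takes to reproduce a past version and test it; both function names are hypothetical.

```python
def first_bad_version(versions, is_bad):
    """Return the earliest version for which is_bad(version) is True.

    Assumes versions[0] is good, versions[-1] is bad, and every version
    after the first bad one is also bad.
    """
    low, high = 0, len(versions) - 1  # invariant: versions[low] good, versions[high] bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_bad(versions[mid]):
            high = mid  # culprit is at mid or earlier
        else:
            low = mid   # culprit is after mid
    return versions[high]


# Hypothetical history: 1,000 system versions, with the fault introduced in version 613.
history = list(range(1, 1001))
tested = []


def is_bad(version):
    tested.append(version)  # record each version we had to rebuild and test
    return version >= 613


culprit = first_bad_version(history, is_bad)
print(culprit, len(tested))
```

With a thousand versions in the history, the search lands on version 613 after rebuilding and testing at most ten of them.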
One of my favorite bits in The Visible Ops Handbook — the best guide to intelligent IT operations I’ve seen — is the culture of causality. When something goes wrong, high-performing IT organizations first look at deliberate changes that may have inadvertently caused the problem, because they know that 80% of IT failures are self-inflicted.
Of course, actually finding the culprit can be like finding a needle in a haystack. But with system version control, that’s something we know how to do.
Microsoft’s New Cloud Versioning
From Carl Brooks, we hear that Microsoft quietly released a new feature for their freshly-launched PaaS offering, Microsoft Azure: OS version control. Details are scant, but it appears to let Azure consumers select an underlying OS patch level (a combination of operating system and multiple patches) from a menu provided by Microsoft. Without that feature, applications in Azure run on a particular patch combination that is entirely controlled by Microsoft.
I can understand why the release was stealthy — for production apps, the platform would be useless without it! Platform patch control, even if coarse-grained, is table stakes for the cloud. Service managers know that no platform’s patch stream is safe enough to apply, untested, to operational services without significant risk of downtime. That’s what makes system patching so difficult; patches have unintended consequences that vary by application.
This brings Azure more in line with the standard of flexibility set by Amazon EC2, which for years has offered a menu of approved kernels (AKIs) that you can bundle into your custom images (AMIs).
But here are a couple of things to think about:
- Why for cloud only? Selecting an OS/patch combination from a menu is a powerful paradigm. It simplifies patching rollout and rollback to the simple task of picking a new (or old) patch level from a drop-down menu. That would improve productivity for any admin or service manager — on virtual servers and physical servers as well as in the cloud.
- Why for the OS only? From a business service’s point of view, the service stands on a tower of software (application, app framework, middleware, OS, and virtualization foundation). All those layers need patches and updates for security, stability, and functionality, but any update at any level can topple the entire tower and take down the business service. Selecting a complete, approved, known-good patch level for the entire stack would be a powerful model for controlled business services.
The concept of system version control does exactly that. It applies the idea of OS versioning broadly (across all types of systems — physical, virtual, and cloud) and deeply (from application content all the way down to the kernel). When a complete server is under system version control, rollouts and rollbacks are as simple as selecting a known-good complete stack from a menu. Intelligent automation, not manual labor, should do the heavy lifting of finding the right set of changes to move safely from one version to another.
Microsoft has the right idea for improving Azure, but a long way to go. I’m interested in how fast they make progress over the next few quarters.
How Should Apps Get Into the Cloud?
James Urquhart at CNET kicked off an interesting discussion around app deployment for cloud: Do we need a new standard unit of delivery for cloud apps?
James suggests that IaaS and PaaS standardize on a bundle of the information that a cloud needs in order to run an application, including:
- Metadata
- The actual software
- Deployment and configuration metadata/scripts
- Runtime orchestration and service level metadata/scripts
The benefits of this approach? Customers and ISVs could build a reusable, easily deployable library of apps, and cloud vendors (and open source cloud projects) could improve their adoptability and time-to-value by absorbing standardized apps.
James points out (in reference to a Chris Hoff post) that, essentially, VMs (and their canonical incarnations, OVF files) are just too coarse as a vehicle for app deployment. I totally agree. Traditional operating systems were designed for persistent physical infrastructure, and deploying apps as entire VM images simply begs the question — you can deploy more quickly at first, but you still have your app intertwined with a living, messy operating system, with all the care, feeding, and cost that entails. (And it ignores the torrid pace of change in modern IT; images are perpetually out of date.)
In the application-centric data center, you design infrastructure to serve your apps, not apps around your infrastructure.
But is his proposal the right way to go? Here’s my take on it in the context of rPath’s system version control:
It’s easy to cleave the app from the OS on a paper diagram. But for the massive library of real-world apps, and for many new apps being written right now, the boundary between app and OS is much fuzzier.
How many COTS apps or in-house apps in your environment will work on a vanilla OS build with no extra libraries or system dependencies? At enterprises we talk to, applications with special, mutually incompatible installation requirements are rampant, even in standardized frameworks such as J2EE.
In real-world data centers, it is impossible to consistently abstract away the OS without breaking the majority of applications. (And that’s really by design — modern OSes are designed to be flexible and modular so app developers can be more productive. The API surface between app and OS is growing over time.)
Can a standard app bundling format capture all the possible dependencies for a large fraction of apps? It’s conceivable for some PaaS frameworks, but it would be very difficult for general apps on IaaS.
It’s the classic tension between too much and too little standardization. OVF and, in my opinion, James’s proposal are too rigid for real-world apps (even if you could afford to rebuild existing apps on the new framework). But the status quo, where every system is a unique snowflake, is no good either.
rPath offers a third way, and the best of both worlds. We have a strong version-controlled dependency model that lets you install apps and OS components the way they were designed; nothing needs to be repackaged or refactored into artificial containers. From that you describe simple system blueprints (e.g., on this system I want RHEL 5.2 plus my standard JBoss stack plus my in-house billing app). Then, rPath can:
- Generate whole system images — But purely as deployment accelerators, not persistent containers.
- Generate incremental updates — From the same system model, you can deploy the app and any changes to existing systems.
- Accommodate customization — If the app needs to be deployed slightly differently at a line of business, you can use the layered, hierarchical rPath system model to accommodate the variant without losing the work that went into the base system.
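To illustrate the idea of deriving an incremental update from a blueprint, here is a generic sketch. It is not rPath’s actual interface: `compose`, `plan_update`, the layer contents, and every version string other than RHEL 5.2 are assumptions invented for the example.

```python
def compose(*layers):
    """Merge blueprint layers in order; later layers override earlier ones."""
    blueprint = {}
    for layer in layers:
        blueprint.update(layer)
    return blueprint


def plan_update(installed, blueprint):
    """Compare what a system has with what its blueprint asks for."""
    plan = {"install": {}, "upgrade": {}, "remove": []}
    for name, wanted in blueprint.items():
        if name not in installed:
            plan["install"][name] = wanted
        elif installed[name] != wanted:
            plan["upgrade"][name] = (installed[name], wanted)
    # Anything on the system that the blueprint does not mention gets removed.
    plan["remove"] = sorted(set(installed) - set(blueprint))
    return plan


# Hypothetical layers: OS, standard JBoss stack, in-house billing app.
os_layer = {"rhel": "5.2"}
jboss_stack = {"jboss": "4.2"}
billing_app = {"billing-app": "2.0"}
blueprint = compose(os_layer, jboss_stack, billing_app)

# Hypothetical state of one deployed system.
installed = {"rhel": "5.2", "jboss": "4.2", "billing-app": "1.9", "telnet-server": "0.17"}

print(plan_update(installed, blueprint))
# {'install': {}, 'upgrade': {'billing-app': ('1.9', '2.0')}, 'remove': ['telnet-server']}
```

The same blueprint that could drive a full image build also yields a small, targeted change for a system that is already running.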
So what should be the new standard way to bundle apps for IaaS? We don’t need one. In real-world environments, lack of a standard format isn’t the problem. The problem is lack of intelligent automation.
Attacking the Global Collision of Complexity
Symantec just published their 2010 data center report. They polled 1,780 IT managers and VPs at enterprises of different sizes across all geographic areas. I recommend giving it a read — it’s easy to skim and they made a number of interesting observations.
Some key findings in that report confirm what we’ve been saying at rPath:
- Data centers are becoming more complex and harder to maintain: A third of IT managers say too many applications and too much complexity are a big or “huge” problem.
- Server counts are exploding across all operating systems: I was surprised to see that even the fifth-fastest growing OS is growing at 14% a year.
- Staffing remains tight: IT staffing isn’t growing to keep pace; at most organizations, it is flat or declining. 50% of organizations report being somewhat or extremely understaffed. And lack of budget is the chief culprit for 80% of the understaffing.
So what’s the way out of this collision of complexity? At the data center architect level, here are some must-have strategies (that the big IT software vendors aren’t talking about):
- Deep automation — Doubling or even tripling productivity through image-based provisioning and workflow scripting just isn’t enough. Architects need to go one level deeper and automate the harder problems that are commonly gated on senior IT staff, such as:
  - Automatically determining which systems to patch in a maintenance window
  - Using change history bisection to quickly and accurately run down the root causes of outages
  - Cascading changes down a hierarchical data center model instead of applying changes (manually or automatically) to flat lists of systems
- Heterogeneous models — For many reasons, IT is changing the underlying platform (app server, OS, virtualization environment, internal vs. external IaaS, etc.) faster than ever. That means that any system build that locks in a particular environment is just going to result in re-work for you in the future. Good system models permit late-binding decisions on everything from OS flavor to physical vs. virtual deployment.
And that’s our product strategy in a nutshell. We’re taking automation to a deeper level than any other vendor, with the first version-controlled, target-independent, hierarchical model for systems. If you’re seeing these problems in your data center, we’re here to help.
Dealing with Change
Let’s face it: Nobody likes change.
And nobody likes it less than enterprise IT, which has come to fear change as a malevolent force, the unwelcome houseguest that invariably leads to unintended consequences.
When change arrives, bad things tend to happen.
Of course, IT has good reason to be fearful—change is incredibly disruptive to production environments. And it’s becoming more so with the growing complexity of software systems—more sources of change, faster rates of change and more systems to maintain.
IT has good reason to be afraid.
That’s because—and here’s the dirty little secret—there’s a lot we don’t know.
- We don’t know what software is running: Systems are constructed in a way that is incredibly manual and ad hoc, changes are often made out of band, and, ultimately, what software is running is often anyone’s guess.
- We don’t know what needs to change: Since system inventories are poorly understood, we can’t effectively match updates and patches with the systems that require change. As a result, we end up blindly implementing changes.
- We don’t know the impact of change: Poorly understood system inventories mean poorly understood dependencies. This breeds stultifying conservatism, excessive testing, and often production outages.
- We don’t know what a system should look like: Since there is no consistent blueprint for the “correct” system definition, it is difficult or impossible to keep systems in sync across dev, test and production phases.
- We don’t know how to roll back and restore a system: When outages do occur, troubleshooting and restoration are costly and time consuming because there is no complete version history for the system. Isolating and troubleshooting the root cause becomes an incredibly stressful exercise.
As a result, dealing with change has become a high priority for IT. Process frameworks like ITIL have emerged to provide the tasks, procedures and checklists—the best practices—for dealing with change. This sort of rigor is a step forward for IT, but it has made change cycles slow and bureaucratic.
This is because little has been done to advance the state of IT automation.
This certainly isn’t to argue against the merits of ITIL and other methodologies for dealing with change. In fact, just the contrary—change process must be codified and consistently followed to prevent chaos. The point is that these processes must be intelligently automated to deal with the exploding scale of IT and the pressure to improve IT process velocity and business responsiveness.
Adding bureaucracy to deal with the sort of change problem IT organizations face simply doesn’t scale in the face of the budget pressure and the need for speed.
The key is to automate as much of the change process as possible, but to do so intelligently. Yesterday’s approach to simply scripting manual tasks will only cause the wrong things to happen—faster.
The solution is to improve the change process itself by focusing on two new principles for enterprise IT: The system model and system version control.
The system model is about creating a blueprint for how systems should look and using that as the basis for constructing and maintaining the system over time. The model tells the whole story: Exactly what software is on the system, what policies it must adhere to, the entire dependency chain and the impact of change.
System version control tells that story over time: What is the exact definition of the current versions in dev, test and production? What was the definition of the previous version, before a change was made? What is the difference between the two? Once you have this sort of version history, isolating root cause is simple and rollback and restoration are as easy as reverting to the previous version.
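The three questions above (what is current, what came before, what differs) map onto a very small data structure. The sketch below is a hypothetical `SystemHistory` class with invented package names; rollback is modeled as committing the previous definition again, so the history itself is never rewritten.

```python
class SystemHistory:
    """Toy version history for one system definition (name -> version mapping)."""

    def __init__(self):
        self._versions = []

    def commit(self, definition):
        """Record a new definition and return its 1-based version number."""
        self._versions.append(dict(definition))
        return len(self._versions)

    def get(self, version):
        """Return a copy of the definition recorded under that version number."""
        return dict(self._versions[version - 1])

    def diff(self, old, new):
        """Map each changed component to its (old value, new value) pair."""
        before, after = self.get(old), self.get(new)
        return {
            name: (before.get(name), after.get(name))
            for name in sorted(set(before) | set(after))
            if before.get(name) != after.get(name)
        }

    def rollback(self):
        """Re-commit the previous definition as a new version."""
        if len(self._versions) < 2:
            raise ValueError("no previous version to roll back to")
        return self.commit(self._versions[-2])


history = SystemHistory()
history.commit({"httpd": "2.2.3", "app": "1.0"})                   # version 1
history.commit({"httpd": "2.2.4", "app": "1.0", "mod_x": "0.1"})   # version 2

print(history.diff(1, 2))
# {'httpd': ('2.2.3', '2.2.4'), 'mod_x': (None, '0.1')}

restored = history.rollback()  # version 3 has the same content as version 1
```

Recording the rollback as version 3 rather than deleting version 2 keeps the failed change available for later troubleshooting.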
The question becomes: If IT had this level of transparency and control, wouldn’t change become less daunting? Wouldn’t it put an end to the handwringing, the extensive test cycles, and ponderous change review meetings?
What if IT had a persistent system blueprint, a model that described the deployed system in detail? What if all change was driven through this model?
What if everything—and I mean everything—was version controlled?
Dealing with change in the coming age of complexity requires a change in thinking.
Repeating the same behavior and expecting a different outcome, goes a saying often attributed to Einstein, is the definition of insanity. It’s time for a change in how we deal with change.
It’s time to stop the madness!
Virtual Images in the Real World
Virtual images are the coin of the realm in cloudland.
In this post, Dustin Amrhein (technical evangelist for WebSphere at IBM) nicely lays out the case for OVF (Open Virtualization Format) images in the cloud. He points out the tension in virtual images:
- No matter how complex your software and configuration, once you have it working on a sample system, you can capture it as an image and know it will work. Great — perfect standardization.
- But thanks to configuration differences, every instance of that image needs to be a little different. Perfect standardization means I need a different version of my stack for every system — and I have a lot of systems. Not so great. Sprawl!
Dustin points a way out of that dilemma: OVF configurability. OVF lets you parameterize configuration in virtual images. And WebSphere is using OVF to deliver parameterized virtual appliances.
At rPath, we agree! For ISVs delivering complex software, OVF-parameterized appliances are the way to go. (And BTW, we have a great ISV product to help you produce those appliances.)
But what about the enterprise? Do snapshots and OVF parameters work for enterprise app deployments?
Unfortunately, no. Thanks to the pace of enterprise change — and the variety of sources of that change — the snapshot ’n parameterize approach is unworkable. For ISVs (like WebSphere), there’s a single stream of predictable change (the release roadmap). But for an in-house web app in an enterprise, change is like a game of dodgeball, with requests coming from every direction:
- Content and functionality changes from in-house development
- Custom requirements from different lines of business
- Patch streams from OS and app infrastructure vendors
- Changing requirements from internal and external compliance authorities
Enterprise apps are just too hairy, and parameterized images just aren’t flexible enough. Which is why, despite the massive investment in virtual infrastructure, most enterprises are managing virtual machines the same way they manage physical servers.
So how does an enterprise get the benefits of standardization without creating more chaos?
Here’s step 1: Model your systems completely, and use ephemeral images to accelerate provisioning.
Building a system is one thing. To build it correctly and keep it correct going forward, you need a deep policy model. Precisely what OS, app infrastructure, and app components should be on each server, and how should they be configured?
A great policy model can define complete systems as well as updates for existing systems.
And now we get to the right role for virtual images in the enterprise world: Use images as provisioning accelerators. With the right policy model, you can regenerate an image whenever you want, use it to stamp out some servers, then discard it. That’s the core of enterprise rPath. rPath deeply models (and version-controls) your apps, app infrastructure, operating systems, and complete systems. From that model, rPath can not only generate images (for physical, virtual, and cloud environments) for instant provisioning, but also produce incremental, surgical updates to keep those systems current.
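To make the idea concrete, here is a minimal sketch in Python of a policy model that drives both full images and incremental updates. It is an illustration only, not rPath's actual data model or API: `SystemModel`, `build_image_manifest`, `plan_update`, and every package name and version number are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SystemModel:
    """Hypothetical policy model: what a server should contain."""
    name: str
    packages: dict  # component name -> version
    config: dict = field(default_factory=dict)


def build_image_manifest(model):
    """List everything a throwaway image would be generated from."""
    return sorted(f"{pkg}={ver}" for pkg, ver in model.packages.items())


def plan_update(current, desired):
    """Work out the incremental changes between two model versions."""
    install = {p: v for p, v in desired.packages.items()
               if p not in current.packages}
    upgrade = {p: v for p, v in desired.packages.items()
               if p in current.packages and current.packages[p] != v}
    remove = sorted(p for p in current.packages
                    if p not in desired.packages)
    return {"install": install, "upgrade": upgrade, "remove": remove}


# Two versions of the same (made-up) system definition.
v1 = SystemModel("webapp", {"os-base": "5.4", "httpd": "2.2.3", "legacy-lib": "1.0"})
v2 = SystemModel("webapp", {"os-base": "5.4", "httpd": "2.2.14", "app": "1.0"})

manifest = build_image_manifest(v2)  # input for a fresh image
plan = plan_update(v1, v2)           # changes for servers already running v1
```

The same model feeds both paths: new servers get stamped from an image built to the v2 manifest, while existing servers receive only the differences. The image can be discarded afterward because the model, not the image, is the record of what the system should be.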
Images may be the coin of the realm, but a policy model is the mint that stamps the coins. Our upcoming series of blog posts will sketch out the money-making machine that is IT automation. Stay tuned!
Mass Customization Informs IT Management Practices
For IT, there’s a classic—almost epic—tension between flexibility and control. It’s the idealized hope for diversity reeled in by the practical need for standardization.
You can see this tension play out as the saga between application development and IT operations today. On the “apps” side, you see the passionate idealist—the artist who sees nothing but possibilities. On the “ops” side, you see the pragmatic realist—the analyst who knows the costs of these possibilities all too well.
As the action unfolds, you see application development make its plea:
- Please add the platform components that my application requires
- Please remove the platform components that conflict with my application
- Please don’t force me to upgrade to the latest platform
The conflict builds as IT operations responds:
- Use the standard platform we’ve provided
- Upgrade to the latest version
- IT can no longer afford to support all of these variants to suit your needs
And there lies the heart of this conflict: For IT, the cost of managing diversity is staggeringly high. Provisioning and maintaining software is complex enough in the best of circumstances. But when versions, variants and customizations proliferate, IT is left to care for a complex mess of massive proportions.
There is no antagonist in this story. It’s not about right and wrong, parsimonious and profligate. In fact, there’s wisdom and good intention on both sides. It’s a story of naturally opposing interests finding their equilibrium—quite rationally—in the form of a new model that accommodates both flexibility and control.
You can see the same drama play out in the world of manufacturing. (OK, software is technically a form of manufacturing, but I mean the traditional form of manufacturing where you can more readily drop something on your foot.)
In this world, the demand side of the business (think of it as “apps”) puts pressure on the supply side (think of it as “ops”) to customize products to suit the (faddish, fickle, fleeting) markets they serve. Of course the supply side resists; the cost of retooling, producing and maintaining such diversity is way too high.
So high, in fact, that Henry Ford famously declared the Model T would come in any color the American public desired, so long as it was black. Ford, like others, recognized that the cost of maintaining many versions and variants of complex systems as a matrix (many parts, many versions) is nothing short of crushing.
This recognition gave rise to mass customization, a manufacturing model that allowed great flexibility in product specification without the incremental management overhead. Today, mass customization allows manufacturing organizations to have their cake and eat it too—flexibility and control.
It’s achieved by blending the best of standardization and customization—the “base” components are standardized and the “customized” aspects are layered on top. The lifecycle of each variant is managed rigorously to prevent chaos.
Think of the example of an automobile. The same chassis may support the minivan and SUV, but the functionality and fit-and-finish are quite different.
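The same layering can be sketched for software in a few lines of Python. This is a hypothetical illustration rather than any product's real mechanism: `BASE_PLATFORM`, `compose`, the variant names, and all version numbers are invented. Each variant is recorded only as its differences from the shared base, which covers the three requests from application development above: add a component, remove a conflicting one, or stay on an older version.

```python
# Standardized "chassis" shared by every variant (illustrative versions).
BASE_PLATFORM = {"kernel": "2.6.18", "glibc": "2.5", "httpd": "2.2.14"}


def compose(base, overlay):
    """Layer a variant's customizations on top of the standard base.

    An overlay value of None removes the component; any other value
    adds the component or overrides the base version.
    """
    result = dict(base)  # copy, so the base itself is never modified
    for component, version in overlay.items():
        if version is None:
            result.pop(component, None)
        else:
            result[component] = version
    return result


# Each variant stores only what differs from the base.
VARIANTS = {
    "billing-app": {"postgresql": "8.4"},                # adds a component
    "batch-jobs": {"httpd": None},                       # removes a conflict
    "legacy-portal": {"httpd": "2.2.3", "php": "5.1"},   # pins an old version
}

systems = {name: compose(BASE_PLATFORM, overlay)
           for name, overlay in VARIANTS.items()}
```

Because the base is defined once, a change to it reaches every variant the next time the systems are composed, while each overlay stays small enough to review and version on its own.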
We see the same dynamic playing out in IT. The solution isn’t clamping down on software diversity—that’s too constraining. It’s also not about accepting diversity at any cost. It’s about changing the model by which we manage software—which brings us back to the saga of apps and ops.
Today, IT is under pressure to modify OS and middleware platforms to suit the needs of diverse and ever-changing applications. For IT operations, the inclination is to standardize on a single off-the-shelf platform.
But that’s not practical. Today, IT is forced to provision and maintain multiple platform versions and variants—without a control model for doing so. They’re forced to trick out the Model T when they’re only set up to support one-size-fits-all. IT accommodates because they don’t want to stand in the way of application innovation—but their acquiescence comes at a very high cost.
rPath was designed to tackle this very challenge. It provides a highly scalable management model for diverse software systems, allowing IT operations to take on ever more variability without an ounce of added cost. It’s a model for mass customization of software systems—IT can have its cake and eat it, too.
This week, rPath announced a management solution for Red Hat Enterprise Linux, allowing Red Hat customers to bring this level of flexibility and control to the diverse software systems they’re wrestling with.
You can also attend the webinar we’re hosting on Dec. 10th with Linux Magazine, featuring Lee Thompson, the former chief technologist at eTrade Financial, to learn how you can have your cake and eat it, too.

