Reducing complexity in the enterprise data center -- ADTmag

By Mark Hargrove
July 5, 2006

With the increasingly common use of distributed software components in service-oriented architectures (SOA), it has become essential that the underlying hardware for such applications have a high degree of redundancy. The reason is simple: if a hardware platform hosting a critical service fails, the entire enterprise application may be severely degraded or may fail completely. Currently, the best way to avoid this problem is to deploy at least two redundant servers, each running a separate instance of the software service. Depending on the criticality of the service, more than two instances may need to be running.

Further complicating the situation is the issue of component isolation. Real-world experience with SOA-based applications has shown that while failed hardware will cause components to stop working, the converse is also true: sometimes software components behave badly and effectively render the hardware useless. We have all seen hung networks and pegged CPUs that were eventually traced to a misbehaving software component. When applications are truly business-critical, it is often wise to isolate services on their own hardware.

Finally, transaction volumes and total system loads must be taken into consideration. The larger the expected loads, the more primary and backup hardware is needed. Designing software services to run in an active-active, load-balanced configuration on separate servers lets the “backup” servers help handle larger system loads.
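To make the active-active idea concrete, here is a minimal Python sketch of a round-robin dispatcher over redundant service instances. The class and server names are hypothetical and not taken from any particular load balancer; the sketch only shows that every healthy instance shares the work, and that when one is marked down the survivors absorb its share.

```python
from itertools import cycle


class ActiveActivePool:
    """Hypothetical round-robin dispatcher over redundant service instances."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)  # endless round-robin order

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        self.healthy.add(server)

    def pick(self):
        # Advance around the ring at most once, skipping failed instances.
        for _ in range(len(self.servers)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


pool = ActiveActivePool(["svc-a", "svc-b"])
print([pool.pick() for _ in range(4)])  # ['svc-a', 'svc-b', 'svc-a', 'svc-b']
pool.mark_down("svc-b")
print([pool.pick() for _ in range(3)])  # ['svc-a', 'svc-a', 'svc-a']
```

While both instances are up, the “backup” is doing real work; once one fails, every request lands on the survivor, which is only safe if the survivor can carry the full load.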

There is a hidden danger in this strategy. If the periodic “high-watermark” load presented to the application requires both servers to maintain acceptable performance levels or response times, then we have effectively lost redundancy: if one server fails at peak, the survivor cannot carry the load alone. This leads to the common strategy of using N+1 servers for each component: N servers to handle the high-watermark load, plus one additional server to cover the failure of any server in the N pool.
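The N+1 arithmetic can be sketched in a few lines of Python. The helper names and the sample figures (a peak of 1,000 requests per second, 600 requests per second per server) are illustrative assumptions, not measurements; loads are treated as whole numbers so the ceiling division stays exact.

```python
def servers_needed(peak_load, capacity_per_server, spares=1):
    """Return (n, total) for an N+spares pool (hypothetical helper).

    n is the smallest number of servers whose combined capacity covers the
    high-watermark load; spares are added on top to cover failures.
    Loads and capacities are whole numbers (e.g. requests per second).
    """
    if peak_load <= 0 or capacity_per_server <= 0:
        raise ValueError("load and capacity must be positive")
    n = -(-peak_load // capacity_per_server)  # integer ceiling division
    return n, n + spares


def survives_one_failure(peak_load, capacity_per_server, pool_size):
    """True if the pool still covers the peak with any one server down."""
    return (pool_size - 1) * capacity_per_server >= peak_load


print(survives_one_failure(1000, 600, 2))  # False: the peak needs both servers
print(servers_needed(1000, 600))           # (2, 3)
print(survives_one_failure(1000, 600, 3))  # True
```

With two active-active servers the peak fits only while both are up, which is exactly the hidden loss of redundancy; adding the “+1” server restores it.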

The bottom line: implementing redundant, isolated hardware in sufficient quantity to cover expected system loads can require a lot of hardware. A typical enterprise data center is proof, with rows of 1U, 2U, and 4U servers neatly tucked into 7- or 9-foot racks.

Fortunately, all of these commodity servers are inexpensive—aren’t they?

Individually, commodity servers are reasonably inexpensive. Adding “one more box” to the enterprise data center, though, involves finding rack space, checking the center’s power and cooling capacity, adding copper and/or optical data cabling, and adding a KVM port. It also involves provisioning the server with the appropriate OS and application software and, generally, installing a management agent so the server can be monitored from an enterprise systems management console. The ongoing care and feeding of a few dozen servers can strain a single systems administrator. What happens when the data center houses hundreds or thousands of servers?

Looking for Alternatives

While there is currently no practical way to avoid the need for server redundancy, there are alternatives to the “one software component = one commodity server” model. One not-so-new strategy is to use blade servers instead of commodity “beige box” servers. Blade servers substantially increase computing density per square foot by moving shared infrastructure out of each server and into a common chassis. Individual blades house the CPU, memory, and disks; power, external network connections, video connections, serial ports, and CD/DVD drives are provided by the chassis the blades plug into. Connectivity between blades in the same chassis is handled by the chassis back- or mid-plane. Connectivity between chassis still requires external switching, but reducing cabling by a factor of 10 to 20 is a significant gain.

Another strategy for reducing data center clutter is virtualization. One of the irritating truths about deploying application components across large numbers of physical servers is that while certain components need isolation, they do not consume significant CPU resources. Except during high-watermark loads, CPU utilization across large server farms is often very low. There are plenty of spare CPU cycles, but no obvious way to use them without compromising isolation principles.
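One way to see how much idle capacity consolidation could reclaim is to give each service its own VM and pack those VMs onto as few hosts as possible. The Python sketch below uses first-fit decreasing, a standard bin-packing heuristic (not an optimal solver and not any vendor’s placement algorithm). The service names and peak CPU figures, expressed in percent of one host, are made up, and the 75 percent per-host ceiling is an assumed headroom policy. Packing by peak rather than average is deliberately conservative, since it leaves room for every service’s high-watermark at once.

```python
def first_fit_decreasing(peak_cpu, host_limit=75):
    """Pack services (one VM each) onto hosts with first-fit decreasing.

    peak_cpu maps a service name to its peak CPU demand in percent of one
    host. host_limit is an assumed per-host ceiling that leaves headroom.
    Returns a list of hosts, each a list of service names.
    """
    hosts = []  # each entry is [used_percent, [service names]]
    # Place the largest demands first; this is what makes it "decreasing".
    for name, demand in sorted(peak_cpu.items(), key=lambda kv: kv[1], reverse=True):
        if demand > host_limit:
            raise ValueError(f"{name} exceeds the per-host limit by itself")
        for host in hosts:
            if host[0] + demand <= host_limit:  # first host with room wins
                host[0] += demand
                host[1].append(name)
                break
        else:
            hosts.append([demand, [name]])  # no existing host fits: add one
    return [names for _, names in hosts]


services = {"auth": 20, "billing": 35, "catalog": 10,
            "search": 40, "audit": 5, "notify": 15}
print(first_fit_decreasing(services))
# [['search', 'billing'], ['auth', 'notify', 'catalog', 'audit']]
```

In this made-up example, six services that would each occupy a dedicated server fit on two hosts while each still runs in its own VM. A real placement would also have to account for memory and I/O, and would keep redundant instances of the same service on different hosts.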

Virtualization is actually an old idea: IBM’s System/360 mainframes ran several virtual machines on the same physical hardware roughly 40 years ago. With the advent of relatively inexpensive minicomputers in the 1970s, virtualization faded from the mainstream for several decades. It never disappeared entirely, though, and the raw performance of modern processors now makes virtualization a practical way to consolidate multiple servers onto a single hardware platform.

Virtualization layers are available from storage vendor EMC (through its VMware product line) and from open source efforts such as Xen. VMware and Xen take quite different approaches to virtualization, but the basic result is the same: both allow definition and management of virtual machines (VMs) on a single hardware platform. Each VM runs its own operating system and can be stopped and started independently of all other VMs on the same hardware. This lets you achieve application component isolation while consolidating a significant amount of physical hardware. Still, these software virtualization technologies leave users with open challenges in quality of service, real-time resource allocation, and autonomic system control.

An emerging interconnect-driven server (IDS) architecture attacks the problem from a communications and control perspective. Instead of layering virtualization software on top of legacy server buses and external networking architectures, it rethinks how computing and communications resources are integrated, optimized, and controlled.

The cornerstone of the IDS architecture is an extremely high-performance, fault-tolerant, non-blocking interconnect subsystem that provides a communications path between system components such as CPUs, memory, and I/O modules. This is quite different from a simple interconnecting backplane, because it is designed to scale to hundreds of thousands of components. Memory, processor cycles, and the links between processors and I/O are all under common software control.

An interconnect-driven server architecture is also a significant enabler for virtualization, since quality of service (QoS) is fundamental to the interconnect. Administrators can control, and set guarantees for, the bandwidth between components. Further, since QoS is under software control, service-level guarantees can be as dynamic as needed, shifting bandwidth based on time of day or on any policy that can be expressed algorithmically. Instead of relying on static networks that connect user requests, processors, and storage devices, enterprises can deploy virtualized communications and computing resources through software commands.
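A time-of-day bandwidth policy of the kind described above might be expressed roughly as follows. This Python sketch is purely illustrative: the flow names, the link capacity, the time windows, and the set_guarantee callback are hypothetical stand-ins for whatever control interface a real interconnect exposes, and none of them correspond to an actual product API.

```python
LINK_CAPACITY_GBPS = 10  # hypothetical capacity of one interconnect path

# Hypothetical policy: (start_hour, end_hour, guaranteed Gb/s per flow).
# Windows are half-open [start, end) and together cover the whole day.
POLICY = [
    (0, 8, {"web-app": 1, "app-db": 2, "backup": 7}),    # overnight batch window
    (8, 18, {"web-app": 5, "app-db": 4, "backup": 1}),   # business hours
    (18, 24, {"web-app": 3, "app-db": 3, "backup": 4}),  # evening
]


def guarantees_for(hour):
    """Return the per-flow bandwidth guarantees in effect at the given hour."""
    if not 0 <= hour < 24:
        raise ValueError("hour must be in 0..23")
    for start, end, shares in POLICY:
        if start <= hour < end:
            if sum(shares.values()) > LINK_CAPACITY_GBPS:
                raise ValueError("policy window over-commits the link")
            return dict(shares)
    raise LookupError(f"no policy window covers hour {hour}")


def reconfigure(hour, set_guarantee):
    """Push the guarantees for `hour` through a caller-supplied control hook.

    set_guarantee(flow, gbps) is a hypothetical stand-in for whatever
    software command the interconnect actually accepts.
    """
    for flow, gbps in guarantees_for(hour).items():
        set_guarantee(flow, gbps)


reconfigure(9, lambda flow, gbps: print(f"{flow}: {gbps} Gb/s"))
# web-app: 5 Gb/s
# app-db: 4 Gb/s
# backup: 1 Gb/s
```

The over-commit check reflects the basic constraint on any such policy: guarantees are only meaningful if their sum stays within what the interconnect can actually deliver.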

Businesses need application agility. SOA and related software component architectures are delivering on this need, at the cost of increased complexity and clutter in the enterprise data center. “The network is the computer” is a long-standing industry mantra that is finally being realized through the emergence of interconnect-driven server architectures. They promise to reduce complexity in the enterprise data center and to bring entirely new control and manageability characteristics that are simply not available today.