In-Depth

Clusters a’plenty

Linux’s maturing clustering and failover capabilities have quickly positioned it as an attractive alternative to high-cost Unix systems, whether they are monolithic or distributed. And if their business models would permit it, Unix vendors could also create low-cost clusters. Some might say, then, that the real story here is the cluster. There are plenty of indications that the fabric of computing is changing once again, and app development managers can expect to see some “rocking in their world” in turn.

As more clusters roll out, AD teams are in closer contact than ever with networking and systems people. There is more to come. The gallop to Service-Oriented Architectures (SOAs) and Web services will surely involve clusters.

ADT has endeavored to cover Grid computing (see “Self-healing systems” by Colleen Frye, September 2003, and “Grid passes buzzword stage” by Lana Gates, February 2004), which will likely require some significant changes in the way apps are written. One school of thought holds that Grid computing is still a ways off, except for big firms with big, well-defined problems. As in the days of parallel super-minis (remember Convex, Multiflow, Sequent and Alliant?), the best customer is, for example, the petrochemical concern looking for hidden oil. Meanwhile, clusters are here.

Cluster folks typically suggest that applications can run on clusters without “human intervention” -- in other words, you do not have to change your application. But, said at least one player, you may want to change your app after you get used to the ways of clusters. The emerging metaphor is “virtualization,” and it covers several layers of computing fabric, of which clusters, especially Linux clusters, are the most vivid example.

Attack of the Amazons
At Amazon.com and elsewhere, clusters got a big boost as companies began to expand Web server farms. Tom Killalea, vice president of infrastructure at Amazon.com, said the key steps in his company’s full-scale move to Linux began in 2000 with HTTP servers. In 2001, remaining commercial app servers moved to Linux. Put another way, server load balancing was the first workload to go to Linux, followed by active-standby fault-tolerant clusters and then distributed message queuing systems. Beginning last year and finishing this year, DB servers are getting the Linux treatment.

Ahead are Linux-based data warehouse clusters. This is an “interesting case,” Killalea said. The data warehouse is significant for this mega-site; Killalea said it was “more than 14 terabytes to start.” Throughput is of special interest for Amazon.com, which specializes in personalized site presentation.

While some brew their own warehouse clusters, others will buy. Trusting in that, start-up Metapa Inc., Sherman Oaks, Calif., launched its Linux Cluster DB for business intelligence in May.

At Google, the developers famously divvied up the problem of indexing the Web using a large helping of clusters. Fault tolerance, much of it created by folks able to tweak Linux, is what makes these clusters work. It is their ability to tolerate failure that has let inexpensive PC-based clusters vie with top-of-the-line Unix machines.

“These machines have mean-time between failure of three years,” said Rob Pike, member of the Systems Lab at Google Inc. “We expect one to die every day for every 1,000 machines. You know every day some are sure to fail. PCs are unreliable, but they are cheap and fast.”
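The arithmetic behind that expectation is simple to check. Here is a minimal back-of-the-envelope sketch in Python, assuming independent failures and the three-year MTBF and 1,000-machine fleet from the quote above:

```python
# Back-of-the-envelope failure estimate for a commodity cluster.
# Assumes failures are independent and each machine has an MTBF of roughly 3 years.

MTBF_DAYS = 3 * 365      # mean time between failures for one machine, in days
FLEET_SIZE = 1_000       # machines in the cluster

expected_failures_per_day = FLEET_SIZE / MTBF_DAYS
print(f"Expected failures per day: {expected_failures_per_day:.2f}")
# Roughly 0.91 -- about one dead machine per day per 1,000 machines, which is why
# the fault tolerance has to live in the software rather than the hardware.
```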

Young Linux clusters may not be bought for the traditional reasons -- in other words, high availability may not be as important as performance and cost. Cost cuts have been obtained because Linux is cheaper than Unix, and because Linux efforts are concentrated on the Intel chip architecture.

The area is flush with players. A very tentative list would include Hewlett-Packard; Lakeview Technology Inc., Oakbrook Terrace, Ill.; Platform Computing Inc., Toronto; PolyServe Inc., Beaverton, Ore.; Red Hat Inc., Raleigh, N.C.; and Veritas Software Corp., Mountain View, Calif.

While Linux clustering has been mainly an Intel-architecture phenomenon, Windows clustering -- still in its infancy -- is also threatened by the Linux cluster march. Both platforms look to supplant Unix. Apps that require high server throughput or server consolidation are targets.

Among the prominent start-ups working in clustering software is PolyServe. We spoke not too long ago with Steve Norall about PolyServe’s doings in Linux clustering, and he gave us a peek into some of the firm’s Windows plans, which were displayed in more detail at the recent Microsoft TechEd event. There, PolyServe released a version of its Matrix Server shared-data clustering system for the Windows platform. It also announced an alliance with Microsoft that places PolyServe as a Microsoft Certified Partner.

“Linux is being viewed as the replacement for the status quo Unix installed base that is out there,” said Norall, general manager, PolyServe Linux Solutions. “You have a value proposition that is extremely favorable, and that is to leverage the cost efficiencies of Intel-based software.”

Traditionally, clustering has been for high availability only. But companies like PolyServe have been pushing Linux -- and now Windows -- clustering for storage, performance and scalability. Among the leaders at PolyServe are individuals who were at Sequent in the previous parallel era.

Norall said his software does not require developers to change the way they do things. The infrastructure software, he implied, should take care of the deployment details. But he did suggest that the way you design and program software will change if you know that the app you are building can be “parallelized.” Much of the interest in Web services revolves around this premise.
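As a rough illustration of that design difference, here is a minimal sketch (the function and data names are hypothetical, not anything from PolyServe): work written as independent units can be fanned out across worker processes -- or, with a scheduler or task queue, across cluster nodes -- without touching the per-unit logic.

```python
from multiprocessing import Pool

def score_order(order_id: int) -> int:
    """Hypothetical per-item work; each call is independent of every other call."""
    return order_id * order_id  # stand-in for real computation

if __name__ == "__main__":
    order_ids = list(range(1, 101))

    # Serial version: one worker, one machine.
    serial_results = [score_order(i) for i in order_ids]

    # Parallelized version: the same per-item code, spread across worker processes.
    # On a cluster the pool would be replaced by a job scheduler or task queue,
    # but the design decision -- keep units of work independent -- is the same.
    with Pool(processes=4) as pool:
        parallel_results = pool.map(score_order, order_ids)

    assert serial_results == parallel_results
```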

Virtualization
Clusters have gained ground in storage apps as well, and related technology is arriving from many distinct corners of computing. Storage clusters, Web servers, app servers and more can join to present the appearance of a single machine. “What is interesting right now is the confluence of the streams, but this has been going on for decades,” said Dan Kusnetzky, vice president, system software research, at IDC.

Kusnetzky and associates see “virtualization” as the common thread in various evolving segments. The question is “How do you gather multiple machines and present the illusion of one?” IDC divides the world of Virtual Environment Software into the following categories:

  • Virtual access software -- This includes Microsoft’s Terminal Services software, Tarantella, and software from Citrix. Apps can be unaware of the underlying operating environments and hardware platforms.
  • Virtual application environment software -- This includes app servers with failover and load balancing. Parallel database software resides here as well. Players include BEA, IBM and Oracle.
  • Virtual processing software -- This includes parallel processing, clustering, load balancing, data and app availability software, and virtual machine software. Notable VM offerings came from VMware (acquired last year by EMC) and Connectix (acquired by Microsoft). This technology derives in great part from VM work of the mainframe era. (In July of this year, VM Big Daddy IBM joined their ranks. IBM announced Micro-Partitioning technology in the “IBM Virtualization Engine,” allowing each processor in the new eServer p5 system to support up to 10 “virtual” servers. The goal was to transform the economics of IBM’s Unix systems but, notably, the new IBM eServer p5 systems can simultaneously support AIX or Linux on POWER distributions in separate dynamic partitions.)
  • Virtual storage software -- This software allows applications to be unaware of where and how app and data files are actually stored. This category includes storage replication and file system software. The software supports both storage-area network (SAN) and network-attached storage (NAS) hardware configurations; a short sketch after this list illustrates the idea.
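For the storage category in particular, the transparency shows up as nothing more than a path. A minimal sketch, assuming a hypothetical mount point (/mnt/shared) that the virtual storage layer could back with local disk, a SAN volume or an NAS share without any change to the application code:

```python
from pathlib import Path

# Hypothetical mount point; the virtual storage layer decides whether this path
# resolves to local disk, a SAN volume or an NAS share. The application code is
# identical in all three cases.
DATA_DIR = Path("/mnt/shared/reports")

def write_report(name: str, body: str) -> Path:
    """Write a report without knowing (or caring) where the bytes actually land."""
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    target = DATA_DIR / name
    target.write_text(body)
    return target

if __name__ == "__main__":
    print(write_report("daily.txt", "totals: 42"))
```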

The acquisitions noted above show that vendors are working to fill the holes in their portfolios in this realm. Sometimes it is done via acquisition. Sometimes, as seen for example earlier this year in news that Sonic Software had improved the fault-tolerance of its JMS products, it is done by building. At other times, it is done with deals, such as the pairing earlier this year of BEA Systems’ Tuxedo and WebLogic servers with Veritas’ software for utility computing.

Vendors will continue to fill in the holes in the virtualization spectrum, Kusnetzky noted.

Role of a lifetime
We asked Kusnetzky, can you really get into this virtual world without changing the way you develop?

As a rule, he said, developers develop for the OS environment or application server environment with which they are familiar, without having to know the details of clustering or virtualization. At the storage layer, particularly, this is largely invisible to the developer.

“If you are writing to, say, an app server environment, the intelligence might be written into the app server to do replication and workflow management among the replicas, as well as checking for failures and ceding the last transaction to the surviving partner so that it can be completed on another machine in case of a failure,” noted Kusnetzky.
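A minimal sketch of that idea follows. All of the names are hypothetical -- this is not any particular app server's API -- and real infrastructure hides this plumbing from the application developer:

```python
# Sketch of the failover pattern Kusnetzky describes: keep replicas, detect a dead
# node and cede its last in-flight transaction to a surviving partner so the work
# can be completed on another machine. All names here are hypothetical.

class Node:
    def __init__(self, name: str):
        self.name = name
        self.alive = True
        self.in_flight = None  # last transaction accepted but not yet completed

    def accept(self, txn: str) -> None:
        self.in_flight = txn

    def complete(self) -> str:
        txn, self.in_flight = self.in_flight, None
        return f"{self.name} completed {txn}"

def failover(nodes):
    """Hand the last transaction of any dead node to a survivor and finish it there."""
    survivors = [n for n in nodes if n.alive]
    for node in (n for n in nodes if not n.alive):
        if node.in_flight and survivors:
            survivors[0].accept(node.in_flight)  # replicate state to the survivor
            return survivors[0].complete()       # complete it on another machine
    return "nothing to recover"

if __name__ == "__main__":
    primary, standby = Node("node-a"), Node("node-b")
    primary.accept("txn-1042")
    primary.alive = False                        # simulated failure
    print(failover([primary, standby]))          # txn-1042 completes on node-b
```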

“It is possible for things to be developed at several layers without the developer being very aware of it,” he said. But, he added, that is truer for new apps. Old apps may need refits or workarounds to be virtualized effectively. In some cases, it may not be possible to bring them in at all. Then, Kusnetzky said, the organization must figure out how to virtualize the environment around these blocks of code and data.

One might add intelligent routers to the list of confluent technologies heading toward Grid, utility or on-demand computing in the future. Along the way, more and more IT constituencies will come to be involved, said Damian Reeves, CTO at Zeus Technology, an application-centric traffic management software house in Cambridge, U.K. Some of this role will be familiar, while some of it will be new.

That is because manageability and security are issues, said Reeves. “We have been seeing a lot of people getting involved, and it has been getting more complicated. There tends to be a great topology of dependencies for them to work with,” he noted.

“There are three tribes of people that get involved with products like [Zeus’ Extensible Traffic Manager]: The ’Net people who have all the flashing lights in the data center; the app people who code away in Java or the DB admins; and then there are the security people who sit between the network teams and the app teams. A traffic manager product brings those people together,” Reeves said.

APIs for performance are already of interest to development teams, but in the brave new world now taking shape, such APIs will become more pivotal.

“This affects application people in a couple of ways,” said Angelo Pruscino, vice president and principal architect for the Cluster and Parallel Storage Technology Group at Oracle Corp. “Here, we are devising a set of APIs so an app can call and get monitored automatically from a performance and high-availability point of view, and get restarted if need be. Or if it runs in a Grid or a cluster, we can send a signal back if something happens. If nodes become highly utilized or some resource fails, we can notify the app that is interested in that event so [that it] can take recovery action.”
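Pruscino did not spell out the calls, so the following is only a sketch of what an event-notification interface of that sort might look like from the application side. The names are hypothetical and are not Oracle's published APIs:

```python
# Hypothetical cluster-event notification sketch; not Oracle's actual interfaces.
# The pattern: an app registers interest in events such as high node utilization or
# a resource failure and supplies a callback that takes recovery action when notified.

from typing import Callable, Dict, List

class ClusterMonitor:
    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = {}

    def subscribe(self, event: str, handler: Callable[[dict], None]) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def notify(self, event: str, details: dict) -> None:
        for handler in self._handlers.get(event, []):
            handler(details)

def on_resource_failure(details: dict) -> None:
    # Recovery action belongs to the application: reconnect, restart work, fail over.
    print(f"resource {details['resource']} failed on {details['node']}; reconnecting")

if __name__ == "__main__":
    monitor = ClusterMonitor()
    monitor.subscribe("resource_failure", on_resource_failure)
    # In a real cluster the infrastructure would raise this event; here we simulate it.
    monitor.notify("resource_failure", {"node": "node-3", "resource": "db-listener"})
```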