Enterprise Grid Computing Solutions
Recently I realized I had become a bit fuzzy on the difference between clustered computing and grid computing. I found a good explanation on a page entitled Enterprise Grid Computing Solutions. According to its authors, my conflation of the two is not unusual. Here is how the folks at Avarsys (http://www.avarsys.com) clarify the similarities and differences with a good example, along with another from Matt Michie at the HiveArchive.
First, their side-by-side comparison:
| Aspect | Cluster | Grid |
|---|---|---|
| Homogeneity | Single type of processor/OS | Any number of different types of processor/OS |
| Resource Types | Computation | Computation, I/O, Storage |
| Architectural Volatility | Static: Fixed number of processors and other resources | Dynamic: Resources can come and go |
| Co-Location | Highly localized | Can be anywhere on the LAN/WAN/Internet |
| Scalability | Limited by network latency | Limited by currently-available resources; scales upward easily as more become available |
They offer this example of the advantages of a grid architecture:
Grids offer increased scalability. Physical proximity and network latency limit the ability of clusters to scale out; due to their dynamic nature, grids offer the promise of high scalability.
For example, recently, IBM, United Devices, and multiple life-science partners completed a grid project designed to identify promising drug compounds to treat smallpox. The grid consisted of approximately two million personal computers. Using conventional means, the project most probably would have taken several years — on the grid it took six months. Imagine what could have happened if there had been 20 million PCs on the grid. Taken to the extreme, the smallpox project could have been completed in minutes.
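As a rough sanity check on that scaling claim, here is a back-of-the-envelope sketch. It assumes perfectly linear scaling (the work splits evenly across PCs with no coordination overhead) and a 30-day month. Both are my simplifying assumptions rather than figures from the Avarsys page, and the function names are made up for illustration.

```python
# Back-of-the-envelope scaling estimate for the smallpox grid example.
# Assumptions (mine, not the source's): perfectly linear scaling and 30-day months.

MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in an assumed 30-day month


def total_pc_minutes(base_pcs, base_months):
    """Total work, expressed as PC-minutes, implied by the baseline run."""
    return base_pcs * base_months * MINUTES_PER_MONTH


def ideal_minutes(base_pcs, base_months, new_pcs):
    """Wall-clock minutes if the same work were spread over new_pcs machines."""
    return total_pc_minutes(base_pcs, base_months) / new_pcs


def pcs_needed(base_pcs, base_months, target_minutes):
    """Machines required to finish the same work in target_minutes."""
    return total_pc_minutes(base_pcs, base_months) / target_minutes


# Baseline from the quote: about two million PCs, six months.
minutes_with_20m = ideal_minutes(2_000_000, 6, 20_000_000)
days_with_20m = minutes_with_20m / (24 * 60)
pcs_for_one_hour = pcs_needed(2_000_000, 6, 60)

print(f"20 million PCs: about {days_with_20m:.0f} days")
print(f"PCs needed to finish in one hour: about {pcs_for_one_hour:,.0f}")
```

Under these assumptions, ten times as many PCs brings six months down to about 18 days, and finishing in a single hour would take roughly 8.6 billion PCs. So "minutes" is best read as a thought experiment about the shape of the curve, not a forecast.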
Clusters are better at algorithmic problems that require lots of computational power and frequent, if not continual, communication between processes executing in parallel; a large e-commerce site is one example. The loose coupling of grids makes them better for searching and sorting problems, which is why Google has done so well, as described in a post from the HiveArchive entitled The Coming Battle Over Grid Computing and Internet Services:
Some have estimated that Google’s data centers have well over 100,000 COTS PC’s setup in a distributed grid. Google is running Linux, which can run headless without a video card, or the need to install any GUI package. Linux has been “designed” to be completely scriptable from a command line interface.
Google's grid sounds pretty cluster-like compared to the life-science grid we saw above. As one moves farther from the cluster and closer to an unmistakable grid, one of the key problems to solve is intermittent connectivity, which I described in my 2001 scenario planning paper on application architectures as one of the key issues for computing in the coming decade (now half gone).
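To make the intermittent-connectivity problem concrete, here is a minimal sketch of one common way a grid coordinator can cope with nodes that drop off: hand out each work unit with a time-limited lease and put it back in the queue if the node never reports in. This is an illustration under my own assumptions, not how the smallpox grid or Google actually works; the `LeaseQueue` class and its method names are hypothetical.

```python
from collections import deque


class LeaseQueue:
    """Hypothetical coordinator that leases work units to nodes and
    re-queues any unit whose lease expires before a result arrives."""

    def __init__(self, units, lease_seconds):
        self.pending = deque(units)
        self.lease_seconds = lease_seconds
        self.leased = {}  # unit -> (node, deadline)
        self.done = {}    # unit -> result

    def checkout(self, node, now):
        """Give the node the next unfinished unit, or None if nothing is waiting."""
        self._reclaim_expired(now)
        while self.pending:
            unit = self.pending.popleft()
            if unit in self.done:
                continue  # a late result arrived after the unit was re-queued
            self.leased[unit] = (node, now + self.lease_seconds)
            return unit
        return None

    def report(self, unit, result):
        """Record the first result for a unit; ignore later duplicates."""
        if unit in self.done:
            return False
        self.done[unit] = result
        self.leased.pop(unit, None)
        return True

    def _reclaim_expired(self, now):
        """Move units whose deadline has passed back to the pending queue."""
        expired = [u for u, (_, deadline) in self.leased.items() if deadline <= now]
        for unit in expired:
            del self.leased[unit]
            self.pending.append(unit)


# Usage: node n1 takes unit "a" and then disappears; n3 picks it up later.
q = LeaseQueue(["a", "b"], lease_seconds=10)
first = q.checkout("n1", now=0)      # n1 gets "a"
second = q.checkout("n2", now=1)     # n2 gets "b"
q.report("b", 42)                    # n2 finishes normally
too_early = q.checkout("n3", now=5)  # lease on "a" still valid, nothing to hand out
retry = q.checkout("n3", now=11)     # lease on "a" expired, so n3 gets it
q.report("a", 7)
late = q.report("a", 9)              # n1 resurfaces; its duplicate result is ignored
```

The clock is passed in as `now` rather than read from the system so the behavior is easy to follow and test. A tightly coupled cluster can usually skip this bookkeeping because its nodes are assumed to stay up and reachable.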
We saw a life science example, but is this really key to the future of medical informatics, in particular research informatics? The National Cancer Institute definitely thinks so, as evidenced by its caGrid project, part of its cancer Biomedical Informatics Grid initiative (aka caBIG™):
caGrid 0.5 is providing the necessary infrastructure for caBIG applications to leverage the following grid infrastructure capabilities:
- Indexing and Registry Services
- Metadata Management
- Common Data Elements
- Controlled Vocabulary Semantics
- XML Schema Management
- Security Services
- Discovery and Invocation
- Data Service Toolkit
- Analytical Service Toolkit
It will be fascinating to see how this plays out, especially as the grid architecture becomes enmeshed with the agent architectures I have been watching for the past several years and wrote about recently, in regard to Novo Innovations, in my post entitled Agent Architectures - the next paradigm for software engineering?










