This is another one of those posts that has taken me a while to write - a little over a month, in fact. Today I added a few final sentences and feel it is finally good to go. Whew! I feel the subject was important enough to be worth the effort. I hope you feel that way too.
Back at the end of March, I read an online article by Bill Snyder at the Infoworld site called Multi-core to leave developers in dust?. It concerned the looming challenge of leveraging the parallel processing resources being made possible by the emergence of inexpensive, soon-to-be-utterly-ubiquitous multiple-core CPUs. Snyder feels the IT industry is not ready for the challenge. I agree. I first wrote about the problem last December in a post entitled The Landscape of Parallel Computing Research: the NY Times and Berkeley EECS. In many ways this post is a continuation of that thread.
Here's a bit of what Snyder has to say:
That's not to say that the IT industry is scoffing at the potential benefits of multi-core processing. But the mountain between IT and some future multi-core promised land -- namely, the task of developing parallelized apps that keep pace with continual core advances -- is huge, says David Patterson, the Pardee Professor of Computer Science at UC Berkeley and director of the parallel computing lab. "It's the biggest challenge in 50 years of computing. If we do this, it's a chance to reset the foundation of computing."
In the short run, Patterson says, we can parallelize legacy software and gamble on getting value out of eight cores. But that would be only an interim solution, as such apps would not scale to 32 or 64 cores, he adds.
What is frustrating is that this problem didn't exactly sneak up on the industry. Chip development cycles are very long, and key software developers are well aware of what's moving through the pipeline. Sure, software always lags hardware. Many of us complained that we didn't have software that would take advantage of 500MHz back in the '90s. But what Patterson and others call the multi-core revolution poses problems for developers that are qualitatively different than the problems of the past. Why wait so long to get serious about solving them?
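Patterson's point about legacy apps that are parallelized for eight cores failing to scale to 32 or 64 is, at bottom, Amdahl's law: whatever fraction of the work remains serial puts a hard ceiling on speedup no matter how many cores you add. Here is a minimal sketch of that arithmetic; the serial fractions are hypothetical, chosen only to show the shape of the curve, not measurements from any real application.

```python
# Amdahl's law: speedup on n cores when a fraction s of the work is inherently serial.
# The serial fractions below are hypothetical illustrations, not measured values.

def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Ideal speedup when (1 - serial_fraction) of the work parallelizes perfectly."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for serial_fraction in (0.05, 0.10, 0.25):
    row = ", ".join(f"{n} cores -> {amdahl_speedup(serial_fraction, n):.1f}x"
                    for n in (8, 32, 64))
    print(f"serial fraction {serial_fraction:.0%}: {row}")

# With 10% serial work, for example, 8 cores yield roughly a 4.7x speedup,
# but 64 cores yield only about 8.8x -- an 8x increase in hardware for
# less than a 2x gain in performance.
```

The exact numbers don't matter; the point is that whatever serial residue survives an eight-core retrofit becomes the bottleneck at 32 and 64 cores, which is why Patterson calls that approach an interim solution.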
The answer to Snyder's closing question (why wait so long to get serious?) is, in my opinion, a personality trait common among IT personnel: the always-incorrect belief that one's current technology - architectural model, operating system, hardware platform, programming language, integrated development environment, application framework, or what-have-you - is the be-all and end-all of IT evolution.
Case in point: Not long ago I was indirectly involved in the port of an application from Ingres on a VAX to a Java/J2EE platform on Windows 2003 Server (web container) and JBoss/Oracle on Linux (EJB container and DBMS). A single programmer had worked on the system for almost 20 years, was not far from retirement, and feared the new technologies rather than having any interest in learning them. What was sad about the situation was not the programmer's mindset, but the fact that her managers had allowed things to devolve to that state by the time conversion became unavoidable. If the leaders don't lead, the followers oftentimes don't follow. This was by no means the only time I have encountered this mindset malaise.
The parallel processing paradigm problem is not uniquely associated with multi-core CPU chips. It's characteristic of cloud computing as well. There are many possible cloud computing architectures, but my favorite is one in which complex tasks are divvied up among multiple physical devices, transparent to the end user of the application. Google's ocean of off-the-shelf boxes running Linux is the archetype of such a system.
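To make that architecture concrete, here is a toy sketch of the underlying scatter-gather pattern: a coordinator splits a large job into chunks, farms the chunks out to workers, and merges the partial results, with the caller none the wiser. Local worker processes stand in for the commodity boxes in a real cluster, and the word-count task and function names are my own invention, not Google's.

```python
# Toy scatter-gather sketch: split a big job, farm chunks out to workers,
# merge the partial results. Worker processes stand in for cluster machines.

from collections import Counter
from concurrent.futures import ProcessPoolExecutor

def count_words(chunk: list[str]) -> Counter:
    """Map step: each worker counts the words in its own slice of the documents."""
    counts = Counter()
    for document in chunk:
        counts.update(document.lower().split())
    return counts

def word_count(documents: list[str], workers: int = 4) -> Counter:
    """Coordinator: split, scatter, then merge -- invisible to the caller."""
    chunk_size = max(1, len(documents) // workers)
    chunks = [documents[i:i + chunk_size]
              for i in range(0, len(documents), chunk_size)]
    total = Counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, chunks):
            total.update(partial)  # reduce step: merge partial results
    return total

if __name__ == "__main__":
    docs = ["the cloud is many machines", "the user sees one machine"] * 1000
    print(word_count(docs).most_common(3))
```

The caller of word_count neither knows nor cares how many workers did the job, which is exactly the transparency the end user of such a cloud application should enjoy.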
Another popular manifestation of cloud computing is virtualization, in which end users "see" a virtual machine that may be one of many running on a single physical computer. Virtualization is advantageous to adherents of the status quo because only the data center and technical support staff need know of its existence. Application developers, like end users, don't know or care about virtualization. This has the short-term advantage and long-term disadvantage of eliminating the need to upgrade IT staff skills to those required by a new programming paradigm. Virtualization is an appropriate solution for the deployment of legacy "fat client" data processing applications, but it does not address the emerging need for a parallel programming paradigm brought on by the exponential increase in the amount of data that must be managed and analyzed.
Cloud computing is considered by many pundits to be the Next Big Thing in computer architectures. When the cloud consists of machines powered by multi-core processors, and end users demand the ability to process vast amounts of data quickly and efficiently within the increasingly tight fiscal constraints on enterprise IT, the need for better tools and an upgraded mindset among managers and developers will become critical.
The need for data-intensive computing
How serious is the need for parallelization? Ian Gorton and others put together a special section on data-intensive computing in the April 2008 issue of IEEE Computer magazine. They point out the many domains in which exponential growth of available raw data is occurring, including but not limited to biology, physics, and business informatics. I would add computer gaming to the list, but it hasn't quite yet emerged as a "serious" mainstream computing domain.
In their introduction to the section, entitled "Data-Intensive Computing in the 21st Century", Gorton et al. write:
Fundamentally, data-intensive applications face two major challenges:
- managing and processing exponentially growing data volumes, often arriving in time-sensitive streams from arrays of sensors and instruments, or as the outputs from simulations; and
- significantly reducing data analysis cycles so that researchers can make timely decisions.
There is undoubtedly an overlap between data- and compute-intensive problems...
Purely data-intensive applications process multiterabyte- to petabyte-sized datasets. This data commonly comes in several different formats and is often distributed across multiple locations. Processing these datasets typically takes place in multistep analytical pipelines that include transformation and fusion stages. Processing requirements typically scale near-linearly with data size and are often amenable to straightforward parallelization. Key research issues involve data management, filtering and fusion techniques, and efficient querying and distribution.
Data/compute-intensive problems combine the need to process very large datasets with increased computational complexity. Processing requirements typically scale superlinearly with data size and require complex searches and fusion to produce key insights from the data. Application requirements may also place time bounds on producing useful results. Key research issues include new algorithms, signature generation, and specialized processing platforms such as hardware accelerators.
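The "multistep analytical pipelines" the authors describe map naturally onto exactly this kind of parallel structure in code. Here is a minimal sketch, assuming a made-up sensor-calibration task; the record format, stage names, and statistics are placeholders for real transformation and fusion stages.

```python
# Sketch of a multistep analytical pipeline: transform each record independently
# in parallel, then fuse the per-record results into a summary. The record format
# and stages are invented placeholders, not any particular real-world pipeline.

from concurrent.futures import ProcessPoolExecutor
from statistics import mean

def transform(record: dict) -> dict:
    """Transformation stage: per-record work, embarrassingly parallel."""
    return {"sensor": record["sensor"], "calibrated": record["raw"] * record["gain"]}

def fuse(transformed: list) -> dict:
    """Fusion stage: combine the per-record outputs into a single result."""
    return {"mean_calibrated": mean(r["calibrated"] for r in transformed),
            "records": len(transformed)}

def run_pipeline(records: list, workers: int = 8) -> dict:
    # Records are independent, so the transform stage scales near-linearly with
    # the number of workers; only the small fusion stage at the end is serial.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        transformed = list(pool.map(transform, records, chunksize=256))
    return fuse(transformed)

if __name__ == "__main__":
    data = [{"sensor": i % 16, "raw": float(i), "gain": 1.01} for i in range(10_000)]
    print(run_pipeline(data))
```

This is the near-linear scaling the authors refer to: double the workers and the transform stage takes roughly half as long, at least until data movement rather than computation becomes the limiting factor.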
As I mentioned above, I already wrote a post about a paper by UC Berkeley's EECS folks (Asanovic et al.) entitled The Landscape of Parallel Computing Research. I quoted quite a bit from the paper in that post, and I quote even more in the remainder of this post. It is over a year old but still highly relevant.
Since real world applications are naturally parallel and hardware is naturally parallel, what we need is a programming model, system software, and a supporting architecture that are naturally parallel. Researchers have the rare opportunity to re-invent these cornerstones of computing, provided they simplify the efficient programming of highly parallel systems. [italics added]
The authors go on to describe some of the symptoms of the coming paradigm shift, in the form of a comparison of old versus new axioms of conventional wisdom (references and citations refer to items defined in their paper):
In the following, we capture a number of guiding principles that illustrate precisely how everything is changing in computing. Following the style of Newsweek, they are listed as pairs of outdated conventional wisdoms and their new replacements. We later refer to these pairs as CW #n.
- Old CW: Power is free, but transistors are expensive. New CW is the “Power wall”: Power is expensive, but transistors are “free”. That is, we can put more transistors on a chip than we have the power to turn on.
- Old CW: If you worry about power, the only concern is dynamic power. New CW: For desktops and servers, static power due to leakage can be 40% of total power. (See Section 4.1.)
- Old CW: Monolithic uniprocessors in silicon are reliable internally, with errors occurring only at the pins. New CW: As chips drop below 65 nm feature sizes, they will have high soft and hard error rates. [Borkar 2005] [Mukherjee et al 2005]
- Old CW: By building upon prior successes, we can continue to raise the level of abstraction and hence the size of hardware designs. New CW: Wire delay, noise, cross coupling (capacitive and inductive), manufacturing variability, reliability (see above), clock jitter, design validation, and so on conspire to stretch the development time and cost of large designs at 65 nm or smaller feature sizes. (See Section 4.1.)
- Old CW: Researchers demonstrate new architecture ideas by building chips. New CW: The cost of masks at 65 nm feature size, the cost of Electronic Computer Aided Design software to design such chips, and the cost of design for GHz clock rates means researchers can no longer build believable prototypes. Thus, an alternative approach to evaluating architectures must be developed. (See Section 7.3.)
- Old CW: Performance improvements yield both lower latency and higher bandwidth. New CW: Across many technologies, bandwidth improves by at least the square of the improvement in latency. [Patterson 2004]
- Old CW: Multiply is slow, but load and store is fast. New CW is the “Memory wall” [Wulf and McKee 1995]: Load and store is slow, but multiply is fast. Modern microprocessors can take 200 clocks to access Dynamic Random Access Memory (DRAM), but even floating-point multiplies may take only four clock cycles.
- Old CW: We can reveal more instruction-level parallelism (ILP) via compilers and architecture innovation. Examples from the past include branch prediction, out-of-order execution, speculation, and Very Long Instruction Word systems. New CW is the “ILP wall”: There are diminishing returns on finding more ILP. [Hennessy and Patterson 2007]
- Old CW: Uniprocessor performance doubles every 18 months. New CW is Power Wall + Memory Wall + ILP Wall = Brick Wall. Figure 2 plots processor performance for almost 30 years. In 2006, performance is a factor of three below the traditional doubling every 18 months that we enjoyed between 1986 and 2002. The doubling of uniprocessor performance may now take 5 years.
- Old CW: Don’t bother parallelizing your application, as you can just wait a little while and run it on a much faster sequential computer. New CW: It will be a very long wait for a faster sequential computer (see above).
- Old CW: Increasing clock frequency is the primary method of improving processor performance. New CW: Increasing parallelism is the primary method of improving processor performance. (See Section 4.1.)
- Old CW: Less than linear scaling for a multiprocessor application is failure. New CW: Given the switch to parallel computing, any speedup via parallelism is a success.
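The "Memory wall" entry above deserves a pause, because the arithmetic is so stark. A back-of-the-envelope sketch using the paper's own round numbers (roughly 200 clocks per DRAM access versus about four per floating-point multiply):

```python
# Back-of-the-envelope arithmetic for the "Memory wall", using the round numbers
# quoted above: ~200 clocks per DRAM access vs. ~4 clocks per floating-point multiply.

DRAM_ACCESS_CLOCKS = 200
FP_MULTIPLY_CLOCKS = 4

multiplies_per_access = DRAM_ACCESS_CLOCKS // FP_MULTIPLY_CLOCKS
print(f"One trip to DRAM costs as much as ~{multiplies_per_access} multiplies.")

# A code path that does one multiply per memory access therefore spends about
# 200 / (200 + 4), or roughly 98%, of its time waiting on memory.
stalled = DRAM_ACCESS_CLOCKS / (DRAM_ACCESS_CLOCKS + FP_MULTIPLY_CLOCKS)
print(f"Fraction of time stalled on memory: {stalled:.0%}")
```

In other words, the expensive operation is no longer the arithmetic; it is getting the operands to the arithmetic unit, which is precisely why the old and new conventional wisdoms trade places.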
I have quoted a lot from the paper, but given that it is 56 pages long I hope I haven't violated "Fair Use" restrictions. In any case, my motivation is to get you to look at their paper, not to supplant it. I consider it a must-read for anyone interested in discerning the shape of things to come in the IT world. Many informaticians, myself included, are not also computer engineers, so some of the jargon above and in the document is intimidating; but if nothing else, a skim-through plus a close reading of the introductory and concluding sections will provide a glimpse into the nature and magnitude of the challenges we will face in coming years.
It is vital for enterprise IT executives, managers, and staff to become aware of the coming paradigm shift. As I indicated earlier, this is merely the latest wave to hit us. We humans have an astounding capacity to forget the evanescence of past paradigms and to assume that today's world is the only possible world.