"When we do multiplication on the chip, we
can do it the same way you learned it in grade school," Brad
McCredie, Power6's chief architect, said in an interview.
McCredie also presented Power6 details at the Fall Processor
Forum here Tuesday.
Binary math is the ordinary mode for Power6 and a natural
fit for computers: The two digits can conveniently be
represented by voltage differences and other yes-or-no,
up-or-down, on-or-off differences. But humans, graced with
10 digits, generally opted for base 10, or decimal,
mathematics, and a little more than half of the numeric data
stored in commercial databases is decimal, McCredie said.
But precision problems can crop up when computers translate
numbers into binary to perform a calculation, then translate
back to the decimal system to present answers. For example,
10 percent of $1.50 should be 15 cents, not 14.9999 cents,
he said. Consequently, regulations require that some tax and
government applications perform math using decimal-based
calculations, McCredie said.
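
The rounding problem McCredie describes can be reproduced in software. The following is a minimal sketch using Python's standard decimal module, not Power6's hardware instructions; the variable names are illustrative only. It computes 10 percent of $1.50 once with binary floating point and once with decimal arithmetic.

```python
from decimal import Decimal

# Binary floating point: 0.10 has no exact binary representation,
# so the value actually stored is only a close approximation.
binary_rate = 0.10
binary_tax = 1.50 * binary_rate

# Decimal arithmetic: operands are built from strings, so every digit is exact.
decimal_rate = Decimal("0.10")
decimal_tax = Decimal("1.50") * decimal_rate

print(Decimal(binary_rate))  # the exact value the binary float really holds
print(binary_tax)            # result of the binary multiplication
print(decimal_tax)           # 0.1500
```

The decimal result is exactly 15 cents, while the binary rate is only an approximation of one-tenth. Software libraries like this one are the kind of package McCredie says hardware support is meant to speed up.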
"There are a lot of software packages so people can run
decimal math," he said, but performing the instructions in
hardware speeds up processing by a factor of two to seven.
It's still slower than binary math, though; the
chip can't do as much in a single clock cycle.
Power family competition
Power6, a dual-core chip IBM will begin manufacturing this
year for servers going on sale in mid-2007, is the latest in
a series of server processors that are central to Big Blue's
recovery in the Unix server market. In terms of revenue, IBM
reached the top spot in the market in 2005 over
Hewlett-Packard and Sun Microsystems, though the company has
given back some of those gains in the first half of 2006.
The Power family, which also includes lower-end PowerPC
models, competes chiefly with Itanium chips from Intel,
Sparc from Sun and Fujitsu, and x86 chips from Intel and
Advanced Micro Devices.
The Power and PowerPC lines will grow one step closer
together with Power6, which incorporates the AltiVec
instruction set that speeds up many multimedia tasks.
AltiVec, also known as VMX, increases efficiency by letting
a single processing instruction be applied to multiple data
elements. That's helpful for video and audio tasks on
desktop machines, but servers will benefit as well in, for
example, high-performance computing tasks such as genetic
data processing, McCredie said.
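
The idea of one instruction acting on several data elements can be illustrated with a toy model. This is a conceptual sketch in plain Python, not AltiVec code: it assumes four 32-bit elements per 128-bit vector register, and the function names and the "instruction count" are hypothetical bookkeeping added for illustration.

```python
LANES = 4  # assumption: four 32-bit elements fit in one 128-bit vector register


def scalar_add(a, b):
    """Add element by element: one simulated instruction per element."""
    result = []
    instructions = 0
    for x, y in zip(a, b):
        result.append(x + y)
        instructions += 1
    return result, instructions


def simd_add(a, b):
    """Add LANES elements at a time: one simulated instruction per group."""
    result = []
    instructions = 0
    for i in range(0, len(a), LANES):
        result.extend(x + y for x, y in zip(a[i:i + LANES], b[i:i + LANES]))
        instructions += 1
    return result, instructions


samples = list(range(8))
gains = [10] * 8
print(scalar_add(samples, gains))  # same sums, 8 simulated instructions
print(simd_add(samples, gains))    # same sums, 2 simulated instructions
```

Both functions produce the same sums; the vector version simply issues a quarter as many simulated instructions for the same data.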
Adding AltiVec was a tradeoff, he said. It's a valuable
feature, but electrical current "leakage" problems in
today's chipmaking technology mean that even idle parts of a
chip consume power and produce waste heat.
Power6 will run at speeds of 4GHz to 5GHz, IBM has said. "It
will be closer to 5GHz than it is to 4GHz," McCredie said.
To keep up with the faster clock speeds--about twice the
2.3GHz of the fastest current Power5--IBM increased the
Power6's communication abilities. Where Power5 can transfer
data on and off the chip at a rate of 150 gigabytes per
second, Power6 can do so at 300GBps, McCredie said.
IBM also has moved some higher-end reliability features from
its mainframe line to Power6, he said. The idea is to catch
and fix as many errors as possible before software has to be
interrupted.
At each cycle, the chip records the state of all the data
it's storing; if an error is detected, the chip can revert
to its previous state to retry the processing step, McCredie
said. If the error is more severe, the entire state of the
processor can be moved to a new processor core, an ability
called "CPU hot spare."
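
The retry-then-failover behavior described above can be sketched as a toy software model. This is an illustration of the general checkpoint-and-retry pattern, not IBM's implementation: the Core class, the run_step function, the fault counter and the single-retry limit are all hypothetical.

```python
import copy


class Core:
    """Toy core whose next `faults` steps hit a simulated hardware error."""

    def __init__(self, name, faults=0):
        self.name = name
        self.faults = faults
        self.state = {"acc": 0}

    def execute(self, step):
        step(self.state)  # the state may be modified before the error is detected
        if self.faults > 0:
            self.faults -= 1
            raise RuntimeError("simulated hardware error on " + self.name)


def run_step(core, step, spare, max_retries=1):
    """Run one step with checkpointing; return the core that completed it."""
    checkpoint = copy.deepcopy(core.state)  # record state before the step
    for _ in range(max_retries + 1):
        try:
            core.execute(step)
            return core
        except RuntimeError:
            core.state = copy.deepcopy(checkpoint)  # revert, then retry
    # The error persisted: move the last good state to the spare core.
    spare.state = copy.deepcopy(checkpoint)
    spare.execute(step)
    return spare


def add_five(state):
    state["acc"] += 5


flaky = Core("core0", faults=1)
print(run_step(flaky, add_five, Core("core1")).name)  # core0, after one retry

failing = Core("core2", faults=2)
print(run_step(failing, add_five, Core("core3")).name)  # core3, the spare
```

In both cases the step's result is applied exactly once, because each failed attempt is rolled back to the checkpoint before anything else happens.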
In addition, every data pathway is checked to make sure data
isn't corrupted as it moves within the chip, he said.
Each Power6 chip has dual processing cores, and each core
has 4MB of high-speed level-two cache memory to itself,
compared with a 2MB shared cache in Power5. In addition, the
two cores can share an optional 32MB of level-three cache
separate from the chip, McCredie said.
Each core can simultaneously handle two instruction
sequences, called "threads." The performance of the second
thread is about 55 percent of the first on database
transaction tasks, McCredie said, which is about double the
performance of the second thread on Power5.
To improve virtualization abilities, Power6 can be
subdivided into as many as 1,024 separate partitions, each
with its own operating system. Customers aren't likely to
want slivers that thin, though, he said. "I don't think
we're going to deliver that to the customer. I think we're
going to stop at 200 or so," McCredie said.
A Power6 chip can connect directly to three others in
four-socket groupings using a first-tier communication
fabric. And each of those groupings can connect directly
with seven others over a second-tier communication fabric.
The two-tier fabric keeps all the processors' cache memories
synchronized.
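
As a back-of-the-envelope check, the figures above imply the following totals. This sketch assumes the largest configuration is one fully populated set of groupings, which the figures suggest but McCredie did not state outright; the constant names are illustrative.

```python
CHIPS_PER_GROUP = 1 + 3   # a chip plus the three it connects to directly
GROUPS = 1 + 7            # a grouping plus the seven it connects to directly
CORES_PER_CHIP = 2        # Power6 is a dual-core chip
THREADS_PER_CORE = 2      # each core handles two threads at once

chips = CHIPS_PER_GROUP * GROUPS
cores = chips * CORES_PER_CHIP
threads = cores * THREADS_PER_CORE

print(chips, cores, threads)  # 32 64 128
```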