As processors advance, at what point must air-cooled heat sinks be replaced by liquid-cooled heat sinks?

From a practical standpoint, a transition from air-cooled to liquid-cooled heat sinks is required when power density exceeds approximately 0.64 W/mm². For reference, the latest Intel Xeon processors have an average power density of about 0.40 W/mm². Pinpointing the date when power density will cross this threshold is not a simple task. Chip designers are well aware of the thermal constraints imposed by air-cooled heat sinks and are careful to stay within them. Often this requires dialing down base clock speeds to maintain acceptable junction temperatures, resulting in significantly lower performance than the processors could deliver if they were adequately cooled.
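As a rough illustration of the threshold, the sketch below computes average power density as dissipated power divided by die area and compares it with the 0.64 W/mm² limit. The two constants come from the figures quoted above; the 300 W, 400 mm² chip is a hypothetical example chosen only to show the calculation, not data for any real processor.

```python
# Figures quoted in the text above (W/mm^2).
AIR_COOLING_LIMIT_W_PER_MM2 = 0.64
XEON_AVG_POWER_DENSITY_W_PER_MM2 = 0.40


def power_density(power_w: float, die_area_mm2: float) -> float:
    """Average power density in W/mm^2: dissipated power divided by die area."""
    if die_area_mm2 <= 0:
        raise ValueError("die area must be positive")
    return power_w / die_area_mm2


def needs_liquid_cooling(density_w_per_mm2: float) -> bool:
    """True when the density exceeds the approximate air-cooling limit."""
    return density_w_per_mm2 > AIR_COOLING_LIMIT_W_PER_MM2


# Relative headroom between the quoted Xeon figure and the threshold.
headroom = AIR_COOLING_LIMIT_W_PER_MM2 / XEON_AVG_POWER_DENSITY_W_PER_MM2 - 1
print(f"Headroom before the threshold: {headroom:.0%}")

# Hypothetical chip (illustrative numbers, not taken from the article).
example = power_density(power_w=300.0, die_area_mm2=400.0)
print(f"{example:.2f} W/mm^2, liquid cooling needed: {needs_liquid_cooling(example)}")
```

On these figures, average power density could grow by roughly 60% before reaching the air-cooling limit.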

Ebullient heat sinks can redefine thermal constraints and thereby usher in a new era of processor design and performance that will lead to widespread adoption of liquid cooling.

Moore’s Law

Until recently, the number of transistors per unit area roughly doubled every 18 to 24 months. Gordon Moore first described this trend in 1965, and it became known as Moore’s Law. For about 50 years, Moore’s Law set the pace of semiconductor technology development and was a dependable constant that enabled the computer, Internet, communications, and cloud revolutions.

Moore’s Law is Dead (or at Least Slowing)

Recently, Intel acknowledged that gains in transistor count will slow. Intel pushed back the debut of its first chips with 10-nanometer transistors from the end of 2016 to sometime in 2017. The company has admitted this wasn’t a one-off delay and that it can’t maintain the pace it used to (source).

While Moore’s Law began slowing only recently, advances in overall chip performance began slowing a decade ago due to thermal constraints. Below is a plot showing transistor count, clock speed, power, and performance per clock versus time. Intel made steady progress increasing the number of transistors, but clock speed, power, and performance per clock all stalled in the early 2000s.

As stated in an Executive Report of the International Technology Roadmap for Semiconductors (ITRS), clock speed stalled due to thermal constraints:

Even though the transistor count has kept on increasing now and then at Moore’s Law pace, and transistors are able to operate with each new technology generation at higher frequency than before, it has become practically impossible to keep on conjunctly increasing both of these factors due to physical limitations on power dissipation; one of the two features (i.e., either number of transistors or frequency) had to level off in order to make the ICs capable to operate under practical thermal conditions. Frequency was selected as the sacrificial victim, and it has stalled in the few GHz since the middle of the previous decade.

In short, air-cooled heat sinks are unable to remove processor heat effectively, so manufacturers have restricted clock speed to reduce the risk of overheating. By doing so, they’ve failed to tap into the full potential of the processors they are creating. Instead, they’ve pursued performance improvements through costly development of new technologies. That approach makes little sense when clock speed is low-hanging fruit that can be seized simply by addressing the issue of heat.

More-than-Moore

In hopes of making consistent performance gains despite the slowing of Moore’s Law, Intel has adopted a strategy it calls “More-than-Moore.” This strategy reduces the emphasis on increasing transistor count by augmenting it with performance gains from other technologies and approaches. Alternate approaches include increasing the number of cores, improving power management software, and increasing chip size.

These approaches have one thing in common: they focus on improving performance while dancing around the fundamental issue of junction temperature. We propose addressing junction temperature head-on with a better heat sink.

Junction temperature is a fundamental constraint

The maximum junction temperature of silicon processors is about 100°C. In most processors, software begins throttling performance at about 88°C to bring the temperature down to a safe level. If the temperature continues rising, software will shut down the processor entirely at 92°C to avoid physical damage.
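The throttling behavior described above can be sketched as a simple threshold check. This is a deliberately simplified illustration using the 88°C and 92°C figures quoted in the text. Real firmware typically adds hysteresis and graduated throttling steps, and the function name, the enum, and the choice to treat each threshold as inclusive are assumptions made for this example.

```python
from enum import Enum

# Thresholds quoted in the text above (degrees Celsius).
THROTTLE_START_C = 88.0
SHUTDOWN_C = 92.0


class ThermalAction(Enum):
    NORMAL = "normal"
    THROTTLE = "throttle"
    SHUTDOWN = "shutdown"


def thermal_action(junction_temp_c: float) -> ThermalAction:
    """Map a junction temperature to the protective action described above.

    Assumption: a reading exactly at a threshold triggers that action.
    """
    if junction_temp_c >= SHUTDOWN_C:
        return ThermalAction.SHUTDOWN
    if junction_temp_c >= THROTTLE_START_C:
        return ThermalAction.THROTTLE
    return ThermalAction.NORMAL


for temp in (70.0, 89.5, 93.0):
    print(f"{temp:.1f} C -> {thermal_action(temp).value}")
```

The gap between the throttling and shutdown thresholds is only 4°C, which is why designers leave a wide margin below 88°C.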

When chip designers talk about “thermal constraints,” at the root of their concern is junction temperature. There are two ways to control junction temperature:

  1. Restrict heat generation (by limiting power consumption)
  2. Improve heat removal (by providing a better heat sink)
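Both levers can be seen in the standard steady-state thermal-resistance model, in which junction temperature equals ambient temperature plus power multiplied by the junction-to-ambient thermal resistance. The sketch below is not taken from the article: the model is a common first-order approximation, and the ambient temperature, power, and thermal resistance values are hypothetical numbers chosen only to make the comparison concrete.

```python
def junction_temp_c(power_w: float, r_theta_c_per_w: float, ambient_c: float) -> float:
    """Steady-state junction temperature: T_j = T_ambient + P * R_theta."""
    return ambient_c + power_w * r_theta_c_per_w


def max_power_w(limit_c: float, r_theta_c_per_w: float, ambient_c: float) -> float:
    """Largest power that keeps the junction at or below limit_c."""
    if r_theta_c_per_w <= 0:
        raise ValueError("thermal resistance must be positive")
    return max(0.0, (limit_c - ambient_c) / r_theta_c_per_w)


# Hypothetical operating point (illustrative values only).
AMBIENT_C = 35.0
THROTTLE_START_C = 88.0  # throttling threshold quoted in the article
R_AIR_C_PER_W = 0.25     # assumed air-cooled heat sink
R_BETTER_C_PER_W = 0.15  # assumed improved heat sink
POWER_W = 240.0

baseline = junction_temp_c(POWER_W, R_AIR_C_PER_W, AMBIENT_C)

# Lever 1: restrict heat generation until the junction stays below throttling.
capped_power = max_power_w(THROTTLE_START_C, R_AIR_C_PER_W, AMBIENT_C)

# Lever 2: keep the power and lower the thermal resistance instead.
improved = junction_temp_c(POWER_W, R_BETTER_C_PER_W, AMBIENT_C)

print(f"Baseline junction temperature: {baseline:.0f} C")
print(f"Lever 1, power cap with the same heat sink: {capped_power:.0f} W")
print(f"Lever 2, same power with a better heat sink: {improved:.0f} C")
```

Under these assumed values, the first lever gives up power (and therefore performance) to stay below the throttling threshold, while the second keeps full power and still lowers junction temperature.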

For years, Intel has focused on the first method by developing technologies that improve performance without appreciably increasing heat generation. While many of these technologies are impressive and have delivered performance gains, the recent stalling of Moore’s Law signals that the time has come to shift focus to the second method of controlling junction temperature: improving heat removal with better heat sinks.

Ebullient heat sinks, coupled with increased clock rates, are the most cost-effective way to improve processor performance.

Ebullient heat sinks address junction temperature head-on. They maintain low junction temperatures by providing exceptional heat transfer rates in a compact form factor that can be packaged in any server.

Maintaining low junction temperatures unshackles clock speed from its stalled state of the last decade. As a result, clock speed can be increased to deliver dramatic performance gains.

By way of example, Rave Computer uses Ebullient heat sinks to cool NVIDIA Tesla K80 graphics cards in Cipher Series workstations. Each graphics card has a Thermal Design Power (TDP) of 300 W and a base clock speed of 560 MHz. With Ebullient heat sinks, however, the clock speed of each GPU is safely boosted to 875 MHz indefinitely, providing an impressive 55% improvement in compute performance while maintaining junction temperatures about 20°C below the throttling temperature.
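The size of the gain can be checked against the two clock speeds quoted above. The sketch below computes the relative clock uplift, under the simplifying assumption that compute throughput scales in proportion to clock speed. Real workloads may scale less than linearly, for example when they are limited by memory bandwidth.

```python
# Clock speeds quoted in the text above (MHz).
BASE_CLOCK_MHZ = 560.0
BOOST_CLOCK_MHZ = 875.0


def clock_uplift(base_mhz: float, boosted_mhz: float) -> float:
    """Relative clock increase, e.g. 0.5 means 50% faster than base."""
    if base_mhz <= 0:
        raise ValueError("base clock must be positive")
    return boosted_mhz / base_mhz - 1.0


uplift = clock_uplift(BASE_CLOCK_MHZ, BOOST_CLOCK_MHZ)
print(f"Clock uplift: {uplift:.0%}")
```

The ratio works out to roughly 56%, in line with the improvement of about 55% cited above.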

At low volume, the cost to equip a processor with an Ebullient heat sink is about $300, including facility-side infrastructure. By contrast, other methods of improving chip performance require large R&D investments and capital expenditures with no guarantee of success. A recent Gartner report estimates that “a single system on a chip at the leading edge today can cost more than $300 million in design costs before entering production,” and “R&D costs alone for a new logic process generation are in the neighborhood of $2 billion.” Moore’s second law predicts that the capital cost of a semiconductor fab will increase exponentially over time, so this avenue will likely become even more costly.

We are not aware of any technology that can deliver the performance improvements that Ebullient can at a cost of $300 per processor. Therefore, we believe Ebullient heat sinks, coupled with increased clock rates, are the most cost-effective way to improve processor performance.

Ebullient heat sinks have the potential to rewrite the rulebook governing chip design, resulting in higher power densities and, in turn, widespread adoption of liquid cooling.

 

