Petascaling Commodity onto Exascale: GPUs as Multithreaded Massively-Parallel Vector Processors - the Only Road to Exascale
Abstract: The first commodity x86 cluster, Wiglaf, delivered a paltry tens to hundreds of megaflops in 1994 using 486 processors; since then, performance has grown by orders of magnitude. However, the first petaflop was achieved by the LANL RoadRunner, a Cell-based cluster, and in 2010 we may see the first (GP)GPU-based cluster reach a petaflop. Is this trend of using non-x86 "accelerators" merely a fad that inflates flops superficially, or are accelerators fundamental to scaling? If the latter, what pieces are missing? Based on results from TSUBAME 1.2, the first and still the only GPU-accelerated cluster on the Top500, we show that GPUs achieve not only higher performance but also better scaling, and that their true nature as multithreaded, massively parallel vector processors will be fundamental for exascale. These results are being reflected in the design of TSUBAME2.0 and its successors.
