Evaluating On-Node GPU Interconnects for Deep Learning Workloads

Title: Evaluating On-Node GPU Interconnects for Deep Learning Workloads
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Tallent, NR; Gawande, NA; Siegel, C; Vishnu, A; Hoisie, A
Editors: Jarvis, S; Wright, S; Hammond, S
Conference Name: High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation
Publisher: Springer International Publishing
Conference Location: Cham
ISBN Number: 978-3-319-72971-8

Abstract: Scaling deep learning workloads across multiple GPUs on a single node has become increasingly important in data analytics. A key question is how well a PCIe-based GPU interconnect can perform relative to a custom high-performance interconnect such as NVIDIA's NVLink. This paper evaluates two such on-node interconnects for eight NVIDIA Pascal P100 GPUs: (a) the NVIDIA DGX-1's NVLink 1.0 "hybrid cube mesh"; and (b) the Cirrascale GX8's two-level PCIe tree using dual SR3615 switch risers. To show the effects of a range of neural network workloads, we define a parameterized version of the popular ResNet architecture. We also define a workload intensity metric that characterizes the expected computation/communication ratio, and we locate AlexNet and GoogLeNet within that space. As expected, the DGX-1 typically has superior performance. However, the GX8 is very competitive on all ResNet workloads. With 8 GPUs, the GX8 can outperform the DGX-1 on all-to-all reductions by 10% for medium-sized payloads; and in rare cases, the GX8 slightly outperforms on ResNet.