

# New Site: Paderborn Center for Parallel Computing (PC<sup>2</sup>)

#### **Christian Plessl**

Paderborn Center for Parallel Computing, Paderborn University, Germany



Cray User Group Conference 2019 – Montreal – 11 May 2019

### Paderborn: Germany's Chippewa Falls

- Heinz Nixdorf (1925–86)
  - founder of Nixdorf Computer
  - businessman, sportsman, donor
- Major player in business computing
  - headquarters in Paderborn
  - more than 30'000 employees worldwide
  - more than 20 countries
  - more than 5 billion DM revenue
- Our local Seymour Cray
  - or Steve Jobs
- Remains of Nixdorf Computer seeded the IT industry in our area





Heinz Nixdorf

## **Paderborn University**

#### Founded in 1972

- 20'000 students
- 260 faculty members, ~1200 PhD students and postdocs

#### Departments

- Humanities
- Economics
- Natural Science
- Mechanical Engineering
- Math., CompSci and EE

#### Research focus

- optoelectronics and photonics
- material science
- business informatics
- intelligent technical systems



#### **Paderborn Center for Parallel Computing**

- Scientific institute of Paderborn University
  - established in 1992
  - roots in theoretical computer science
- Service provider and research institution
  - provide HPC infrastructure and services for computational sciences
  - develop new methods and tools for HPC simulation in cooperation with domain scientists
  - perform computing systems research for energy-efficient HPC with emphasis on heterogeneous and accelerated computing with FPGAs and manycores
- Long track record in exploring emerging and off-the-beaten-path technologies



#### **PC<sup>2</sup> History and Innovations**

|      | HPC System                                          | Properties / Innnovation                                                                                                                                                                                                  | Research Topics                                                                                                          |
|------|-----------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------|
| 1991 | Parsytec<br>SC320                                   | <ul> <li>system design in Germany (Aachen), Transputer<br/>processors developed in UK</li> <li>largest parallel computer with freely programmable<br/>network</li> <li>parallel programming with OCCAM</li> </ul>         | <ul> <li>graph partitioning</li> <li>optimal embedding of graphs of degree 4</li> </ul>                                  |
| 1992 | Parsytec<br>GCel<br>#262 of<br>Top500               | <ul> <li>1024 processors, largest parallel computer with<br/>Transputers in Europe</li> <li>Solaris Unix and parallel programming environment<br/>PARIX</li> <li>scalable 2D communication network</li> </ul>             | <ul> <li>general graph<br/>embedding, in particular<br/>in 2D meshes</li> </ul>                                          |
| 1995 | Parsytec<br>GC/PP<br>#118 of<br>Top500              | <ul> <li>transition to standard technologies (CPU, compiler, operating systems)</li> <li>innovation through heterogeneous nodes:<br/>PowerPC (computing) + Transputer<br/>(communication)</li> </ul>                      | <ul> <li>load balancing</li> <li>HIBRIC-MEM streaming cache</li> </ul>                                                   |
| 1999 | Fujitsu-<br>Siemens<br>hpcLine<br>#351 of<br>Top500 | <ul> <li>use of Intel x86 and Solaris/Linux as standard components</li> <li>innovation in networking: Scalable Coherent Interface (SCI), European development</li> <li>first large scale SCI-cluster worldwide</li> </ul> | <ul> <li>message passing</li> <li>fault tolerance</li> <li>start of HPC usage<br/>beyond computer<br/>science</li> </ul> |

#### PC<sup>2</sup> History and Innovations (2)

|      | HPC System                                | Properties / Innovation                                                                                                                                                                                                               | Research Topics                                                                |
|------|-------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------|
| 2003 | Megware<br>FPGA Cluster                   | <ul> <li>combination of standard CPU/OS technologies<br/>with application-specific accelerators (FPGA)</li> <li>used for powering one of the best Chess<br/>computers of the day</li> <li>Myrinet network with low latency</li> </ul> | <ul> <li>distributed game tree search</li> <li>custom computing</li> </ul>     |
| 2003 | HP<br>PLING                               | <ul> <li>step towards 64-bit CPU technology (Intel Itanium)</li> <li>first 64-bit Linux cluster with InfiniBand in Europe</li> </ul>                                                                                                  | <ul> <li>software support for<br/>InfiniBand in 64-bit Linux</li> </ul>        |
| 2004 | Fujitsu/ICT<br>Arminius<br>#213 of Top500 | <ul> <li>Direct water cooling for CPUs</li> <li>integration of GPU nodes in cluster</li> <li>PCIe / InfiniBand</li> <li>x86-64, Linux</li> </ul>                                                                                      | <ul> <li>3D visualization of simulations</li> <li>immersive control</li> </ul> |
| 2007 | Fujitsu<br>Siemens<br>BiSGrid             | <ul> <li>nodes with high compute power, 4 sockets with<br/>AMD processors</li> </ul>                                                                                                                                                  | <ul><li>grid computing</li><li>workflow management</li></ul>                   |
| 2013 | Clustervision<br>OCuLUS<br>#173 of Top500 | <ul> <li>heterogeneous nodes with GPUs, Intel Xeon Phi</li> </ul>                                                                                                                                                                     | <ul><li>virtualization</li><li>multi/many Core</li></ul>                       |
| 2018 | Cray CS500<br>Noctua                      | <ul> <li>16 nodes with FPGA accelerators and dedicated<br/>interconnect between FPGAs</li> </ul>                                                                                                                                      | <ul> <li>HPC acceleration with<br/>FPGAs</li> </ul>                            |

# **Noctua HPC Cluster**

- Cray CS500 cluster system
- 256 CPU nodes
  - each with 2 x 20-core Intel Xeon Gold 6148 (Skylake)
  - 192 GiB RAM / node
  - 100 Gbit/s Omni-Path interconnect
  - 700 TB Lustre parallel file system
- 16 FPGA nodes
  - same configuration as CPU nodes
  - each with 2 x Nallatech 520N FPGA boards
  - Stratix 10 GX2800, 32GB DDR4, 4 memory channels
  - PCIe 3.0 x16
  - 4 QSFP+ ports
- Operational since Sept 2018

At the time of installation, the largest academic installation of FPGAs in an HPC cluster



#### **Noctua Phase 2**



- New data center optimized for HPC
  - best-in-class energy efficiency and flexibility
  - warm water cooling (free cooling)
  - modular design to support concurrent operation and upgrades of multiple generations of HPC systems
  - extensibility (power, cooling, office space)

- Specifications
  - white space: 300 m<sup>2</sup>
  - other technical facilities: 1100 m<sup>2</sup>
  - initial power / cooling capacity: 1.2-2 MW
  - office space for 25+ persons + seminar rooms, labs, ...

## **Workloads and Users**

- Solid state physics and chemistry (in particular DFT codes)
  - CP2K, VASP, Quantum ESPRESSO, Turbomole
- Optoelectronics and photonics
  - CST Microwave Studio
  - in-house codes
- Engineering
  - Fluent, OpenFOAM
- Computer science
- Usage statistics
  - 70 active projects
  - 400 active users



Data: PC<sup>2</sup>, 2016 (OCuLUS cluster)

# Why FPGAs?



[Figure: Stratix V D5 @ 225 MHz for the data formats 16-bit int, 8-bit int, ms-fp9 and ms-fp8, with the annotation "sweet spot for FPGAs"]

- End of Dennard scaling and Moore's law is imminent
- Post-CMOS technologies will not be ready for many years
- Demand for HPC and general data center applications is growing rapidly
- CPUs are fundamentally inefficient due to generality (instructions, caches, out-of-order execution)
- What can we do?
  - scale out by using ever larger and more costly systems
  - specialization of architectures
  - develop new methods that do not require exact computation and/or high precision
  - method/architecture codesign

FPGAs are currently the only viable technology for application-specific computing (when ASICs don't pay off)

# **Approximate Computing**

- Exploit performance/energy vs. accuracy trade-offs in computing architectures
- Suitable if:
  - application is inherently tolerant to inaccuracies
  - inaccuracies can be compensated, e.g. iterative methods
- Target applications
  - molecular dynamics, quantum chemistry
- Architectures
  - CPU/GPU: reduce memory bandwidth
  - FPGA: trade area saved for more computing units



- A massively parallel algorithm for the approximate calculation of inverse p-th roots of large sparse matrices. In Proc. Platform for Advanced Scientific Computing Conference (PASC). ACM, 2018.
- Accurate Sampling with Noisy Forces from Approximate Computing. In preparation.



iterative computation of A<sup>-1/p</sup>: approximation error for custom floating-point formats
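The trade-off between number format and accuracy in an iterative method can be illustrated with a small NumPy experiment. This is a minimal sketch and not the implementation from the cited PASC paper: it uses the textbook Newton-type iteration B ← B((p+1)I − B<sup>p</sup>A)/p for A<sup>-1/p</sup>, emulates a custom floating-point format by rounding every intermediate result to a given number of mantissa bits (the helper `round_mantissa` is hypothetical), and runs on a small, deliberately well-conditioned test matrix, because this simple form of the iteration can become numerically unstable for less well-conditioned inputs.

```python
import numpy as np


def round_mantissa(x, bits):
    """Round each entry of x to `bits` significant binary digits."""
    # x == mantissa * 2**exponent with 0.5 <= |mantissa| < 1
    mantissa, exponent = np.frexp(x)
    scale = 2.0 ** bits
    return np.ldexp(np.round(mantissa * scale) / scale, exponent)


def inverse_pth_root(A, p, bits, iterations=12):
    """Iterate B <- B ((p+1) I - B^p A) / p, rounding every intermediate."""
    def q(M):
        return round_mantissa(M, bits)

    identity = np.eye(A.shape[0])
    A = q(A)
    # Scaled identity as start value: eigenvalues of B^p A are then at most
    # about 1, which is what this iteration needs to converge.
    B = q(np.linalg.norm(A, 2) ** (-1.0 / p) * identity)
    for _step in range(iterations):
        power = B
        for _k in range(p - 1):
            power = q(power @ B)
        B = q(B @ q((p + 1) * identity - q(power @ A)) / p)
    return B


def demo_errors(p=2, n=8, seed=0):
    """Frobenius-norm error of the result for several mantissa widths."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    eigenvalues = np.linspace(1.0, 2.0, n)  # assumption: condition number 2
    A = (Q * eigenvalues) @ Q.T
    reference = (Q * eigenvalues ** (-1.0 / p)) @ Q.T
    return {bits: np.linalg.norm(inverse_pth_root(A, p, bits) - reference)
            for bits in (8, 16, 24, 53)}


if __name__ == "__main__":
    for bits, error in demo_errors().items():
        print(f"{bits:2d} mantissa bits: error {error:.2e}")
```

With 53 mantissa bits the rounding step is a no-op and the result matches plain double precision; narrower mantissas leave a correspondingly larger residual error. Whether such an error is acceptable depends on whether the application tolerates or compensates it.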

### **Capabilities of Today's Top-Of-The-Line FPGAs**

#### Example: Intel Stratix 10 GX2800 (used in Noctua)

- more than 900,000 configurable logic blocks
  - up to 4 Boolean functions of 8 inputs
- 5760 hardened arithmetic units (DSP)
  - fixed point and IEEE 754 SP floating-point
- more than 11,000 independent SRAM blocks
  - width/depth/ports highly configurable
- integrated DDR4-2666 memory controllers
- 96 serial transceivers, up to 28.3 Gbit/s each
- clock frequency typically about 300-600 MHz
- power consumption 50-225 W

**100 TERA-OPS**

10 single-precision TFLOPS

20 TB/s internal SRAM bandwidth (full duplex)

300 GB/s communication bandwidth (full duplex)

up to 80 GFLOPS/W
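The headline figures can be related to the device parameters with some back-of-the-envelope arithmetic. The sketch below is illustrative only and rests on one assumption that is not stated on the slide: each DSP block performs one single-precision multiply-add, counted as 2 FLOP, per clock cycle.

```python
DSP_BLOCKS = 5760                  # hardened arithmetic units, from the slide
FLOP_PER_DSP_PER_CYCLE = 2         # assumption: one SP multiply-add per cycle
TRANSCEIVERS = 96                  # serial transceivers, from the slide
GBIT_PER_S_PER_TRANSCEIVER = 28.3  # maximum rate per transceiver, from the slide


def peak_sp_tflops(clock_mhz):
    """Peak single-precision TFLOPS if all DSP blocks are busy every cycle."""
    return DSP_BLOCKS * FLOP_PER_DSP_PER_CYCLE * clock_mhz * 1e6 / 1e12


def clock_mhz_for(tflops):
    """Clock frequency needed to reach a given peak under the same assumption."""
    return tflops * 1e12 / (DSP_BLOCKS * FLOP_PER_DSP_PER_CYCLE) / 1e6


def transceiver_gbyte_per_s():
    """Aggregate raw transceiver bandwidth per direction in GB/s."""
    return TRANSCEIVERS * GBIT_PER_S_PER_TRANSCEIVER / 8


def implied_watts(tflops, gflops_per_watt):
    """Power draw implied by a throughput and an efficiency figure."""
    return tflops * 1000 / gflops_per_watt


if __name__ == "__main__":
    for mhz in (300, 600):
        print(f"{mhz} MHz: {peak_sp_tflops(mhz):.2f} TFLOPS")
    print(f"10 TFLOPS needs about {clock_mhz_for(10):.0f} MHz")
    print(f"transceivers: {transceiver_gbyte_per_s():.1f} GB/s per direction")
    print(f"10 TFLOPS at 80 GFLOPS/W: {implied_watts(10, 80):.0f} W")
```

Under this assumption, a design clocked at 600 MHz reaches roughly 6.9 TFLOPS, so the 10 TFLOPS headline corresponds to a clock well above the typical range. The 96 transceivers add up to about 340 GB/s per direction, and 10 TFLOPS at 80 GFLOPS/W implies 125 W, which lies inside the quoted power range.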

### How Can FPGAs Compete with CPUs or GPUs?



- Compute-bound applications
  - customization of operations and data formats
  - new methods considering FPGA architecture
- Memory-bound applications
  - unrolling and data flow computing with very deep pipelines
  - application-specific, distributed memory architectures
- Latency-bound applications
  - speculative or redundant execution
- I/O-bound applications
  - on-board network interfaces
  - direct FPGA-to-FPGA communication

HBM: high-bandwidth memory; HSSI: high-speed serial interface, e.g. 100G Ethernet
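A simple roofline estimate shows why memory-bound applications benefit from keeping data in on-chip memory. The sketch below takes the peak and SRAM bandwidth figures quoted for the Stratix 10 GX2800 and the four DDR4-2666 channels of the boards in Noctua. The 64-bit data width per DDR channel and the example arithmetic intensity are assumptions made for illustration.

```python
PEAK_TFLOPS = 10.0              # headline single-precision peak from the slides
SRAM_TBYTE_PER_S = 20.0         # internal SRAM bandwidth from the slides
DDR_CHANNELS = 4                # memory channels per FPGA board
DDR_MEGATRANSFERS_PER_S = 2666  # DDR4-2666
DDR_BYTES_PER_TRANSFER = 8      # assumption: 64-bit data path per channel


def ddr_tbyte_per_s():
    """Peak off-chip DDR4 bandwidth of one board in TB/s."""
    return (DDR_CHANNELS * DDR_MEGATRANSFERS_PER_S * 1e6
            * DDR_BYTES_PER_TRANSFER / 1e12)


def attainable_tflops(flop_per_byte, tbyte_per_s):
    """Roofline model: performance is capped by compute or by memory traffic."""
    return min(PEAK_TFLOPS, flop_per_byte * tbyte_per_s)


if __name__ == "__main__":
    intensity = 0.25  # assumption: a streaming kernel with 1 FLOP per 4 bytes
    print(f"DDR4 bandwidth: {ddr_tbyte_per_s() * 1000:.1f} GB/s")
    print(f"from DDR4: {attainable_tflops(intensity, ddr_tbyte_per_s()):.3f} TFLOPS")
    print(f"from SRAM: {attainable_tflops(intensity, SRAM_TBYTE_PER_S):.3f} TFLOPS")
```

For such a low-intensity kernel the off-chip bound is more than two orders of magnitude below the on-chip bound, which is the motivation for deep pipelines and application-specific, distributed on-chip memory architectures.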

### **Direct Integration of FPGAs in Interconnect**



- peer-to-peer optical links between FPGAs
  - high throughput
  - low latency (< 600 ns)
  - even better for streaming
- building application specific networks
  - circuit switched (optical switch)
  - packet switched (Slingshot!)



### **Ideas for Collaboration within CUG**

- What we can share
  - provisioning of FPGA firmware and tool versions with Slurm
  - applications and libraries with FPGA support (CP2K, DBCSR, FFT)
  - integration of optical switches as secondary networks in cluster
  - access to our FPGA partition for research and development



srun --partition=fpga \
 --constraint=18.0.1

- Possible areas of collaboration
  - integration of FPGAs as network-attached accelerators in Slingshot
  - tools for application analysis to identify suitable functions for offloading
  - numerical methods for approximate computing in linear scaling DFT and molecular dynamics



- We are looking forward to working with the CUG community

#### **Further Information / Feedback**

Christian Plessl, Paderborn University, christian.plessl@uni-paderborn.de

Twitter: @plessl @pc2\_upb

http://pc2.uni-paderborn.de

