Massive Parallel Computing with NUMECA FINE™/Open
Posted Tue August 08, 2017 @10:14AM
Recent developments in NUMECA FINE™/Open have yielded impressive improvements in massively parallel performance. More than just improving the scalability of the solver iteration loop, these developments have affected all phases of a typical simulation: solver startup, iteration, and solution writing.
Taken together, these developments allow efficient performance with over 20,000 processes and have made rapid turnaround of high-resolution, time-accurate simulations a reality.
In recent years there has been a shift in industrial turbomachinery design and analysis to higher fidelity, higher cost, time-accurate simulations in place of traditional steady state analysis. These simulations allow a deeper understanding of blade passing interaction and other time-varying phenomena which can negatively impact design efficiency. However, this type of simulation presents many significant software challenges:
As opposed to steady-state simulations, which require only a single meshed blade passage, unsteady simulations require multiple blade passages or even a full 360-degree mesh. Maintaining sufficient mesh density can therefore increase the mesh size by an order of magnitude or more. To achieve a quick solution turnaround time, a scalable parallel implementation is mandatory.
A second problem appears in the form of poorly scaling solver input/output (I/O). The famous saying attributed to Ken Batcher applies: “A supercomputer is a device for turning compute-bound problems into I/O-bound problems.” At higher levels of parallelism, a large percentage of the run time is spent on traditionally serial I/O. This is most apparent in unsteady simulations, which require frequent solution writes.
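The way serial I/O comes to dominate at scale can be illustrated with Amdahl's law. The sketch below is a generic illustration, not a model of the FINE™/Open solver: the function names are hypothetical and the 1% serial fraction is an assumed value chosen only for demonstration.

```python
def amdahl_speedup(serial_fraction: float, processes: int) -> float:
    """Ideal speedup when a fixed fraction of the work remains serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processes)


def serial_time_share(serial_fraction: float, processes: int) -> float:
    """Share of wall-clock time spent in the serial part at a given process count."""
    parallel_time = (1.0 - serial_fraction) / processes
    return serial_fraction / (serial_fraction + parallel_time)


if __name__ == "__main__":
    assumed_serial_fraction = 0.01  # assumption: 1% of single-process run time is serial I/O
    for p in (1, 100, 1_000, 20_000):
        speedup = amdahl_speedup(assumed_serial_fraction, p)
        share = serial_time_share(assumed_serial_fraction, p)
        print(f"{p:>6} processes: speedup {speedup:8.1f}, serial share {share:6.1%}")
```

Under this assumption the speedup can never exceed 100, however many processes are used, and at 20,000 processes nearly all of the wall-clock time is spent in the serial part.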
Recent developments in the NUMECA FINE™/Open solver have addressed both of these problems. First, optimizations to the MPI-based parallel model have yielded efficient parallel performance with over 20,000 processes and a per-process load of less than 30K cells. Second, a comprehensive parallel I/O solution has been implemented with the CGNS-3 file format, allowing scalable initialization, checkpointing, and shutdown of the FINE™/Open solver at any scale.
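The per-process load quoted above can be checked with simple arithmetic against the 6e8-cell mesh reported for the Titan runs. The sketch below assumes a perfectly even partition, which a real mesh partitioner will only approximate; the function name is hypothetical.

```python
def average_cells_per_process(total_cells: int, processes: int) -> float:
    """Average number of mesh cells per process, assuming an even partition."""
    if processes <= 0:
        raise ValueError("processes must be positive")
    return total_cells / processes


if __name__ == "__main__":
    total_cells = 600_000_000  # 6e8-cell mesh, as reported in the article
    for processes in (5_000, 10_000, 20_000):
        load = average_cells_per_process(total_cells, processes)
        print(f"{processes:>6} processes: {load:,.0f} cells per process")
```

At exactly 20,000 processes the average load is 30,000 cells, so any run on more than 20,000 processes falls below the 30K figure.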
Fig. 1. FINE™/Open parallel performance: 6e8-cell mesh, unsteady DES solved on the OLCF Titan supercomputer.
Taken together, these developments have enabled massive unsteady simulations that would otherwise have been infeasible. In collaboration with Dresser-Rand, high-resolution turbomachinery DES simulations have been performed on the OLCF Titan supercomputer using mesh configurations of between 8e7 and 6e8 cells, the largest of which were solved on over 20,000 compute processes. Using the CPUBooster™ module for rapid convergence, and taking advantage of the efficient parallel performance of the FINE™/Open solver, it is now possible to complete each time step in less than 4 minutes, including the solution write. This allows a complete solution turnaround time on the order of a few days.
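To put the turnaround figure in perspective, the rough estimate below converts a wall-clock budget into a number of time steps. The 4 minutes per step is the upper bound quoted above; the 3-day budget is an assumption standing in for "a few days", and the function name is hypothetical.

```python
def max_time_steps(budget_days: float, minutes_per_step: float) -> int:
    """Number of whole time steps that fit into a wall-clock budget."""
    if minutes_per_step <= 0:
        raise ValueError("minutes_per_step must be positive")
    budget_minutes = budget_days * 24 * 60
    return int(budget_minutes // minutes_per_step)


if __name__ == "__main__":
    assumed_budget_days = 3   # assumption: "a few days" taken as 3 days
    minutes_per_step = 4      # upper bound per time step quoted in the article
    steps = max_time_steps(assumed_budget_days, minutes_per_step)
    print(f"About {steps} time steps fit into {assumed_budget_days} days")
```

With these assumed values, roughly a thousand time steps fit into the budget; the actual number of steps in the reported simulations is not stated in the article.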