NVLINK Bridge support

Audrius_Mikuta · September 20, 2020, 10:33am

Hi, does TFD support NVLink Bridge memory doubling with 2 GPUs?

Paul_Selhi · September 20, 2020, 11:02am

No, not at present.

Jascha_Wetzel · September 21, 2020, 7:14am

There’s a prototype for multi-GPU simulation. I can’t yet estimate when this will be ready for production, though.

FYI: Unfortunately, NVLink alone is not enough to make it work. It speeds up communication between GPUs, but it would still have to be 20-30 times faster to work as a drop-in solution for multi-GPU simulation.
As a result, the simulation has to partition the grid and distribute it among the GPUs. All operations have to exchange results between neighboring partitions/GPUs. Any operation on the entire grid takes as long as its slowest partition/GPU, so the partitioning must be well balanced to keep that slowest runtime as low as possible. Since the runtimes of some operations depend on the data (e.g. the velocity of the flow), the per-GPU runtimes change over time and the partitioning has to be re-balanced continuously.
When it works, it’s pretty cool. But it takes quite a bit of careful tuning to not drop off a cliff performance-wise.
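
To make the re-balancing idea a bit more concrete, here is a minimal sketch in Python. It is not the actual implementation of the prototype, only an illustration under some assumptions: the grid is split into contiguous 1D slabs of slices, every slice inside a slab is assumed to cost the same (slab runtime divided by slab size), and the names `step_time`, `per_slice_costs` and `rebalance` are made up for this example.

```python
def step_time(runtimes):
    """An operation on the whole grid finishes when the slowest GPU does."""
    return max(runtimes)


def per_slice_costs(sizes, runtimes):
    """Estimate a cost per grid slice from measured per-GPU runtimes.

    Assumption: every slice in a slab costs runtime / size.
    All sizes must be positive.
    """
    costs = []
    for size, runtime in zip(sizes, runtimes):
        costs.extend([runtime / size] * size)
    return costs


def rebalance(sizes, runtimes):
    """Return new slab sizes so each GPU gets roughly equal estimated cost."""
    n = len(sizes)
    costs = per_slice_costs(sizes, runtimes)
    target = sum(costs) / n  # ideal cost per GPU

    new_sizes = []
    acc = 0.0   # cumulative cost of all slices seen so far
    count = 0   # slices assigned to the current slab
    for i, cost in enumerate(costs):
        acc += cost
        count += 1
        slices_left = len(costs) - (i + 1)
        slabs_left = n - len(new_sizes) - 1  # slabs after the current one
        reached_target = acc >= target * (len(new_sizes) + 1)
        must_cut = slices_left == slabs_left  # keep one slice per later slab
        if len(new_sizes) < n - 1 and (reached_target or must_cut):
            new_sizes.append(count)
            count = 0
    new_sizes.append(count)  # the last slab takes whatever remains
    return new_sizes


# Two GPUs, 50 slices each; the second half of the grid is 3x as expensive.
old_sizes = [50, 50]
old_runtimes = [10.0, 30.0]
new_sizes = rebalance(old_sizes, old_runtimes)
print(step_time(old_runtimes), new_sizes)
```

In this example the step time before re-balancing is 30.0 (the slower GPU), and the new split moves slices from the expensive slab to the cheap one. In a real simulation the costs would be re-measured every few steps, and moving slices between GPUs has its own transfer cost, which this sketch ignores.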