Mickaël BOICHOT is currently a PhD student at CEA, supervised by Adrien Roussel and Elisabeth Brunet (Télécom SudParis). His thesis director is Patrick Carribault, HDR Engineer Researcher at CEA.
Mickaël’s thesis topic is “Characterization of parallel applications for porting to multi-GPU systems”. The goal of the thesis is to determine whether an application is worth porting to GPUs. Porting to GPUs is long and tedious work, so it is necessary to evaluate beforehand whether it will actually benefit the application.
SIAM CSE 2023 - SIAM Conference on Computational Science and Engineering, 2023
Abstract
Heterogeneous supercomputers with GPUs are among the best candidates for building Exascale machines. However, porting scientific applications with millions of lines of code is challenging. Data transfers, data locality, and the amount of exposed parallelism determine the maximum achievable performance on such systems. Porting therefore forces developers to rewrite parts of the application, which is tedious and time-consuming and does not guarantee performance gains in all cases. Being able to detect which parts can be expected to deliver performance gains on GPUs is thus a major asset for developers. Moreover, the task-parallel programming model is a promising alternative for exposing enough parallelism while allowing asynchronous execution between CPU and GPU. OpenMP 4.5 provides the “target” directive to offload computation to GPUs in a portable way. Target constructs are treated as explicit OpenMP tasks, in the same way as CPU tasks, but are executed on the GPU. In this work, we propose a methodology to detect the most profitable loops of an application to port to GPUs. While we applied the detection part to several mini-applications (LULESH, miniFE, XSBench and Quicksilver), we evaluated the full methodology on LULESH using an MPI+OpenMP task programming model with target directives. The approach relies on runtime modifications to enable overlapping of data transfers and kernel execution through tasks. This work has been integrated into the MPC framework and validated on a distributed heterogeneous system.
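To make the idea of target constructs acting as tasks more concrete, here is a minimal sketch in C using only standard OpenMP 4.5 clauses (nowait and depend on target, target enter data and target exit data). It is not taken from LULESH or from the MPC runtime; the function name scale_pipelined, the chunking scheme and the scaling kernel are assumptions chosen for illustration. Each chunk gets three deferrable target tasks (copy in, compute, copy out) chained by dependences, so a runtime is free to overlap the transfer of one chunk with the kernel of another. Whether overlap actually happens depends on the OpenMP runtime and the device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (illustration only): multiplies x[0..n) by a,
 * processing the array in chunks of at most `chunk` elements. */
void scale_pipelined(double *x, size_t n, size_t chunk, double a)
{
    assert(x != NULL && chunk > 0);

    #pragma omp parallel
    #pragma omp single
    {
        for (size_t start = 0; start < n; start += chunk) {
            size_t len = (n - start < chunk) ? (n - start) : chunk;

            /* Host-to-device copy of this chunk, as a deferrable target task. */
            #pragma omp target enter data map(to: x[start:len]) \
                    depend(out: x[start]) nowait

            /* Kernel on the chunk; the dependence orders it after the copy.
             * The data is already present, so this map triggers no transfer. */
            #pragma omp target teams distribute parallel for \
                    map(tofrom: x[start:len]) depend(inout: x[start]) nowait
            for (size_t i = 0; i < len; ++i)
                x[start + i] *= a;

            /* Device-to-host copy, ordered after the kernel. */
            #pragma omp target exit data map(from: x[start:len]) \
                    depend(in: x[start]) nowait
        }

        /* Wait for every target task generated by this thread. */
        #pragma omp taskwait
    }
}
```

Calling scale_pipelined(v, 5, 2, 2.0) on an array v of five doubles creates three chunks (of 2, 2 and 1 elements) and doubles every element. Built without OpenMP support, or without an available device, the pragmas are ignored or fall back to the host, and the function still computes the same result.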