Mickaël BOICHOT
Mickaël BOICHOT was a PhD student at CEA, supervised by Adrien Roussel and Elisabeth Brunet (Télécom SudParis). His thesis director was Patrick Carribault, Research Engineer (HDR) at CEA.
Mickaël's thesis topic was "Characterization of parallel applications with a view to porting them to multi-GPU systems". Its goal was to make it possible to assess an application's potential for a GPU port. Because porting is a long and tedious process, developers need to be able to assess beforehand whether it will benefit the application, or a part of it.
SIAM CSE 2023 - SIAM Conference on Computational Science and Engineering, 2023
Abstract
Heterogeneous supercomputers with GPUs are among the best candidates for building Exascale machines. However, porting scientific applications with millions of lines of code is challenging. Data transfers, data locality, and the amount of exposed parallelism determine the maximum achievable performance on such systems. Porting therefore requires developers to rewrite parts of the application, which is tedious and time-consuming and does not guarantee performance gains in all cases. Being able to detect which parts can be expected to deliver performance gains on GPUs is thus a major asset for developers. Moreover, the task-parallel programming model is a promising way to expose enough parallelism while allowing asynchronous execution between CPU and GPU. OpenMP's "target" directive offloads computation to GPUs in a portable way, and since OpenMP 4.5 target constructs are treated as explicit OpenMP tasks, in the same way as on the CPU, but executed on the GPU. In this work, we propose a methodology to detect the most profitable loops of an application that can be ported to GPUs. While we applied the detection part to several mini-applications (LULESH, miniFE, XSBench and Quicksilver), we evaluated the full methodology on LULESH using an MPI+OpenMP task programming model with target directives. The methodology relies on runtime modifications to enable overlapping of data transfers and kernel execution through tasks. This work has been integrated into the MPC framework and validated on a distributed heterogeneous system.
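As an illustration of the target-task model the abstract refers to, here is a minimal sketch in C. It is not taken from LULESH or MPC: the function name `scale_chunks`, the chunking scheme, and the scaling kernel are all hypothetical, chosen only to show how a loop can be split into independent offloaded tasks. Whether transfers and kernels from different chunks actually overlap depends on the OpenMP runtime, which is the aspect the abstract says required runtime modifications.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from LULESH or MPC): scales x in place,
 * submitting one offloaded target task per chunk. */
static void scale_chunks(double *x, int n, int chunks, double factor)
{
    /* Assumption made for brevity: all chunks have the same size. */
    assert(chunks > 0 && n % chunks == 0);
    int len = n / chunks;

    #pragma omp parallel
    #pragma omp single
    {
        for (int c = 0; c < chunks; ++c) {
            double *p = x + (size_t)c * len;

            /* map(tofrom:) copies the chunk to the device and back.
             * nowait makes the target region a deferred task, so the
             * submitting thread can go on to create the next chunk's
             * task while this one is transferred and executed. */
            #pragma omp target teams distribute parallel for \
                    map(tofrom: p[0:len]) nowait
            for (int i = 0; i < len; ++i)
                p[i] *= factor;
        }

        /* Wait for every target task created above. */
        #pragma omp taskwait
    }
}
```

Calling `scale_chunks(x, 8, 4, 2.0)` on an array holding 0 to 7 doubles every element. Built without OpenMP support the pragmas are ignored and the loops run serially on the CPU; built with OpenMP but without an available device, the target regions fall back to the host. The result is the same in each case.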