
Marc Pérache


A research engineer in computer science, CEA research director, CEA expert Fellow, and holder of an HDR (habilitation à diriger des recherches) in computer science, Marc Pérache is in charge of coordinating the adaptation of software and programming models to current and future supercomputers.

His main research topic concerns runtime systems providing parallel programming models, in particular MPI in multithreaded contexts. This work targets massively parallel architectures such as the supercomputers of the TOP500.

Marc Pérache has supervised 10 PhD theses (plus 3 in progress) and has co-authored more than 30 articles in conferences and journals.

Generating and Scaling a Multi-Language Test-Suite for MPI
Julien Adam   J.B. Besnard   Paul Canat   Sameer Shende   Hugo Taboada   Adrien Roussel   Marc Pérache   Julien Jaeger  
EuroMPI'23, 2023

Abstract

High-Performance Computing (HPC) is currently facing significant challenges. The hardware pressure has become increasingly difficult to manage due to the lack of parallel abstractions in applications. As a result, parallel programs must undergo drastic evolution to effectively exploit underlying hardware parallelism. Failure to do so results in inefficient code. In this pressing environment, parallel runtimes play a critical role, and their testing becomes crucial. This paper focuses on the MPI interface and leverages the MPI binding tools to develop a multi-language test-suite for MPI. By doing so and building on previous work from the Forum’s document editors, we implement systematic testing of MPI symbols in the context of the Parallel Computing Validation System (PCVS), an HPC validation platform dedicated to running and managing test-suites at scale. We first describe PCVS, then outline the process of generating the MPI API test suite, and finally run these tests at scale. All data sets, code generators, and implementations are made available to the community in open source. We also set up a dedicated website showcasing the results, which self-updates thanks to the Spack package manager.
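
As a rough illustration of what a generated per-symbol test might look like (a minimal sketch, not actual PCVS generator output), here is a C program that exercises one MPI call and checks its return code:

```c
/* Hypothetical sketch of a generated per-symbol MPI test;
 * the actual PCVS generator output may differ. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    /* Exercise the symbols under test and validate their result codes. */
    if (MPI_Comm_rank(MPI_COMM_WORLD, &rank) != MPI_SUCCESS ||
        MPI_Comm_size(MPI_COMM_WORLD, &size) != MPI_SUCCESS) {
        fprintf(stderr, "MPI_Comm_rank/MPI_Comm_size failed\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    printf("rank %d of %d: symbol check passed\n", rank, size);
    MPI_Finalize();
    return 0;
}
```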

MPI Application Binary Interface Standardization
Jeff Hammond   Lisandro Dalcin   Erik Schnetter   Marc Pérache   J.B. Besnard   Jed Brown   Gonzalo Brito Gadeschi   Simon Byrne   Joseph Schuchart   Hui Zhou  
EuroMPI'23, 2023

Abstract

MPI is the most widely used interface for high-performance computing (HPC) workloads. Its success lies in its embrace of libraries and ability to evolve while maintaining backward compatibility for older codes, enabling them to run on new architectures for many years. In this paper, we propose a new level of MPI compatibility: a standard Application Binary Interface (ABI). We review the history of MPI implementation ABIs, identify the constraints from the MPI standard and ISO C, and summarize recent efforts to develop a standard ABI for MPI. We provide the current proposal from the MPI Forum’s ABI working group, which has been prototyped both within MPICH and as an independent abstraction layer called Mukautuva. We also list several use cases that would benefit from the definition of an ABI while outlining the remaining constraints.
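
To illustrate why a standard ABI matters (a simplified sketch, not text from the paper): MPICH-style implementations define MPI handles as integers, while Open MPI-style implementations define them as pointers, so the same source compiles to binary-incompatible code.

```c
/* Simplified view of the handle-type divergence that prevents an MPI
 * binary from running against a different implementation's libmpi. */
#if defined(USE_MPICH_STYLE)
/* MPICH-style: handles are plain integers baked into the binary. */
typedef int MPI_Comm;
#define MPI_COMM_WORLD ((MPI_Comm)0x44000000)
#else
/* Open MPI-style: handles are pointers to implementation structs. */
typedef struct ompi_communicator_t *MPI_Comm;
extern struct ompi_communicator_t ompi_mpi_comm_world;
#define MPI_COMM_WORLD (&ompi_mpi_comm_world)
#endif
/* The same call, MPI_Comm_rank(MPI_COMM_WORLD, &rank), therefore
 * lowers to ABI-incompatible code under the two headers. */
```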

Performance Improvements of Parallel Applications thanks to MPI-4.0 Hints
Maxim Moraru   Adrien Roussel   Hugo Taboada   Christophe Jaillet   Michael Krajecki   Marc Pérache  
Proceedings of SBAC-PAD 2022, IEEE, 2022

Abstract

HPC systems have experienced significant growth over the past years, with modern machines having hundreds of thousands of nodes. Message Passing Interface (MPI) is the de facto standard for distributed computing on these architectures. On the MPI critical path, the message-matching process is one of the most time-consuming operations. In this process, searching for a specific request in a message queue represents a significant part of the communication latency. So far, no miracle algorithm performs well in all cases. This paper explores potential matching specializations thanks to hints introduced in the latest MPI 4.0 standard. We propose a hash-table-based algorithm that performs constant-time message matching for requests without wildcards. This approach is suitable for intensive point-to-point communication phases in many applications (more than 50% of CORAL benchmarks). We demonstrate that our approach can improve the overall execution time of real HPC applications by up to 25%. We also analyze the limitations of our method and propose a strategy for identifying the most suitable algorithm for a given application, applying machine learning techniques to classify applications according to their message pattern characteristics.
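
For context, the MPI 4.0 assertions this kind of specialization relies on are passed as info keys on a communicator; a minimal sketch using the standard key names might look like this (the helper name is ours):

```c
/* Sketch: request constant-time matching eligibility by promising
 * that no wildcard receives are posted on this communicator.
 * The info keys are standard MPI 4.0 communicator assertions. */
#include <mpi.h>

MPI_Comm make_hinted_comm(MPI_Comm base)
{
    MPI_Info info;
    MPI_Comm hinted;
    MPI_Info_create(&info);
    MPI_Info_set(info, "mpi_assert_no_any_source", "true");
    MPI_Info_set(info, "mpi_assert_no_any_tag", "true");
    MPI_Comm_dup_with_info(base, info, &hinted);
    MPI_Info_free(&info);
    return hinted;  /* a runtime may now specialize its match lists */
}
```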

Exploring Space-Time Trade-Off in Backtraces
Jean-Baptiste Besnard   Julien Adam   Allen D. Malony   Sameer Shende   Julien Jaeger   Patrick Carribault   Marc Pérache  
Tools for High Performance Computing 2018/2019, Springer International Publishing, p. 151-168, 2021

Abstract

The backtrace is one of the most common operations performed by profiling and debugging tools. It consists of determining the nesting of functions leading to the current execution state. Frameworks and standard libraries provide facilities enabling this operation; however, it generally incurs both computational and memory costs. Indeed, walking the stack up and then possibly resolving function pointers (to function names) before storing them can lead to non-negligible costs. In this paper, we propose to explore a means of extracting optimized backtraces with O(1) storage size by defining the notion of stack tags. We define a new data structure that we call a hashed-trie, used to encode stack traces at runtime through chained hashing. Our process, called stack-tagging, is implemented in a GCC plugin, enabling its use for C and C++ applications. A library enabling the decoding of stack locators through both static and brute-force analysis is also presented. This work introduces a new manner of capturing execution state which greatly simplifies both extraction and storage, two important issues in parallel profiling.
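
The chained-hashing idea can be summarized in a few lines; the following is a minimal sketch with invented names, not the plugin's actual code:

```c
/* Sketch of stack-tagging via chained hashing: each call site combines
 * its parent's tag with its own identifier, so an entire backtrace is
 * summarized by a single O(1) integer "stack tag". */
#include <stdint.h>

static uint64_t chain_tag(uint64_t parent_tag, uint64_t site_id)
{
    /* Any decent mixing function works; this one is illustrative. */
    uint64_t h = parent_tag ^ (site_id + 0x9e3779b97f4a7c15ULL);
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    return h;
}

/* A compiler plugin would thread the tag through the call chain; a side
 * table mapping tag -> (parent_tag, site_id) then allows decoding. */
```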

Benefits of MPI Sessions for GPU MPI applications
Maxim Moraru   Adrien Roussel   Hugo Taboada   Christophe Jaillet   Michael Krajecki   Marc Pérache  
Proceedings of EuroMPI 2021, 2021

Abstract

Heterogeneous supercomputers are now considered the most promising path to Exascale. Compute nodes are frequently composed of more than one GPU accelerator, and programming such architectures efficiently is challenging. MPI is the de facto standard for distributed computing, and CUDA-aware libraries were introduced to ease inter-node GPU communications. However, they induce some overhead that can degrade overall performance. The MPI 4.0 specification draft introduces the MPI Sessions model, which offers the ability to initialize specific resources for a specific component of the application. In this paper, we present a way to reduce the overhead induced by CUDA-aware libraries with a solution inspired by MPI Sessions. In this way, we minimize the overhead induced by GPUs in an MPI context and improve the efficiency of CPU+GPU programs. We evaluate our approach on various micro-benchmarks and proxy applications such as Lulesh, MiniFE, Quicksilver, and Cloverleaf, and demonstrate that it can provide up to a 7x speedup compared to the standard MPI model.
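
For reference, the Sessions flow introduced in the MPI 4.0 draft looks roughly as follows; the string tag below is illustrative, while "mpi://WORLD" is a standard process-set name:

```c
/* Sketch of per-component initialization with MPI 4.0 Sessions,
 * instead of a single global MPI_Init. */
#include <mpi.h>

int main(void)
{
    MPI_Session session;
    MPI_Group group;
    MPI_Comm comm;

    MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &session);
    /* Derive a communicator only for the process set that this
     * (e.g. GPU) component actually needs. */
    MPI_Group_from_session_pset(session, "mpi://WORLD", &group);
    MPI_Comm_create_from_group(group, "app.gpu-component",
                               MPI_INFO_NULL, MPI_ERRORS_RETURN, &comm);
    /* ... component communication on `comm` ... */
    MPI_Comm_free(&comm);
    MPI_Group_free(&group);
    MPI_Session_finalize(&session);
    return 0;
}
```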

On-the-Fly, Robust Translation of MPI Libraries
Edgar A. Léon   Marc Joos   Nathan Hanford   Adrien Cotte   Tony Delforge   François Diakhaté   Vincent Ducrot   Ian Karlin   Marc Pérache  
Proceedings of Cluster 2021, 2021

Enhancing Load-Balancing of MPI Applications with Workshare
Thomas Dionisi   Stéphane Bouhrour   Julien Jaeger   Patrick Carribault   Marc Pérache  
Proceedings of EuroPar 2021, 2021

Overlapping MPI communications with Intel TBB computation
Cassandra Rocha Barbosa   Pierre Lemarinier   Marc Sergent   Guillaume Papauré   Marc Pérache  
2020 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020, New Orleans, LA, USA, May 18-22, 2020, IEEE, p. 958-966, 2020

Unifying the Analysis of Performance Event Streams at the Consumer Interface Level
Jean-Baptiste Besnard   Allen D. Malony   Sameer Shende   Marc Pérache   Patrick Carribault   Julien Jaeger  
Tools for High Performance Computing 2017, Springer International Publishing, p. 57-71, 2019

Abstract

Several instrumentation interfaces have been developed for parallel programs to make observable the actions that take place during execution and to make accessible information about the program’s behavior and performance. Following in the footsteps of the successful profiling interface for MPI (PMPI), new rich interfaces exposing the internal operation of MPI (MPI_T) and OpenMP (OMPT) runtimes are now in the standards. Taking advantage of these interfaces requires tools to selectively collect events from multiple interfaces by various techniques: function interposition (PMPI), value read (MPI_T), and callbacks (OMPT). In this paper, we present the unified instrumentation pipeline proposed by the MALP infrastructure, which can be used to forward a variety of fine-grained events from multiple interfaces online to multi-threaded analysis processes implemented orthogonally with plugins. In essence, our contribution complements “front-end” instrumentation mechanisms with a generic “back-end” event-consumption interface that allows “consumer” callbacks to generate performance measurements in various formats for analysis and transport. With such support, online and post-mortem cases become similar from an analysis point of view, making it possible to build more unified and consistent analysis frameworks. The paper describes the approach and demonstrates its benefits with several use cases.
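
A hypothetical sketch of such a consumer interface (names invented here, not MALP's actual API) could look like this:

```c
/* Hypothetical "back-end" consumer interface: events from the PMPI,
 * MPI_T, and OMPT front-ends are funneled to pluggable callbacks
 * through one unified record format. */
#include <stdint.h>

typedef enum { EV_PMPI, EV_MPI_T, EV_OMPT } event_source_t;

typedef struct {
    event_source_t source;   /* which instrumentation front-end */
    uint64_t timestamp;
    uint64_t thread_id;
    const char *name;        /* e.g. "MPI_Send", "omp_task_create" */
    const void *payload;     /* source-specific data */
} perf_event_t;

/* A plugin registers one callback; online and post-mortem analyses
 * then look identical from the consumer's point of view. */
typedef void (*event_consumer_fn)(const perf_event_t *ev, void *arg);
```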

Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor
Alexandre Denis   Julien Jaeger   Emmanuel Jeannot   Marc Pérache   Hugo Taboada  
Int. J. High Perform. Comput. Appl., 2019

Checkpoint/restart approaches for a thread-based MPI runtime
Julien Adam   Maxime Kermarquer   Jean-Baptiste Besnard   Leonardo Bautista-Gomez   Marc Pérache   Patrick Carribault   Julien Jaeger   Allen D. Malony   Sameer Shende  
Parallel Comput., p. 204-219, 2019

Detecting Non-sibling Dependencies in OpenMP Task-Based Applications
Ricardo Bispo Vieira   Antoine Capra   Patrick Carribault   Julien Jaeger   Marc Pérache   Adrien Roussel  
OpenMP: Conquering the Full Hardware Spectrum - 15th International Workshop on OpenMP, IWOMP 2019, Auckland, New Zealand, September 11-13, 2019, Proceedings, Springer, p. 231-245, 2019

Abstract

The advent of the multicore era led to the duplication of functional units through an increasing number of cores. To exploit those processors, a shared-memory parallel programming model is one possible direction. OpenMP is thus a good candidate, enabling different paradigms: data parallelism (including loop-based directives) and control parallelism through the notion of tasks with dependencies. But it is the programmer’s responsibility to ensure that data dependencies are complete, so that no data races may happen. It might be complex to guarantee that no issue will occur and that all dependencies have been correctly expressed in the context of nested tasks. This paper proposes an algorithm to detect the data dependencies that might be missing from the OpenMP task clauses between tasks that have been generated by different parents. This approach is implemented inside a tool relying on the OMPT interface.
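
A minimal example of the hazard addressed (per standard OpenMP semantics, `depend` clauses only order sibling tasks, i.e. children of the same parent):

```c
/* The two inner tasks below are created by different parents, so
 * their depend clauses impose no ordering: the read may race with
 * the write despite both accesses being annotated. */
#include <stdio.h>

int main(void)
{
    int x = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task                      /* parent task A */
        {
            #pragma omp task depend(out: x)   /* child of A */
            x = 42;
        }
        #pragma omp task                      /* parent task B */
        {
            #pragma omp task depend(in: x)    /* child of B */
            printf("%d\n", x);                /* unordered w.r.t. the write */
        }
    }
    return 0;
}
```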

Mixing ranks, tasks, progress and nonblocking collectives
Jean-Baptiste Besnard   Julien Jaeger   Allen D. Malony   Sameer Shende   Hugo Taboada   Marc Pérache   Patrick Carribault  
Proceedings of the 26th European MPI Users’ Group Meeting, EuroMPI 2019, Zürich, Switzerland, September 11-13, 2019, ACM, p. 10:1-10:10, 2019

Checkpoint/restart approaches for a thread-based MPI runtime
Julien Adam   Maxime Kermarquer   Jean-Baptiste Besnard   Leonardo Bautista-Gomez   Marc Pérache   Patrick Carribault   Julien Jaeger   Allen D. Malony   Sameer Shende  
CoRR, 2019

Dynamic Placement of Progress Thread for Overlapping MPI Non-blocking Collectives on Manycore Processor
Alexandre Denis   Julien Jaeger   Emmanuel Jeannot   Marc Pérache   Hugo Taboada  
Euro-Par 2018: Parallel Processing - 24th International Conference on Parallel and Distributed Computing, Turin, Italy, August 27-31, 2018, Proceedings, Springer, p. 616-627, 2018

Efficient Communication/Computation Overlap with MPI+OpenMP Runtimes Collaboration
Marc Sergent   Mario Dagrada   Patrick Carribault   Julien Jaeger   Marc Pérache   Guillaume Papauré  
Euro-Par 2018: Parallel Processing - 24th International Conference on Parallel and Distributed Computing, Turin, Italy, August 27-31, 2018, Proceedings, Springer, p. 560-572, 2018

Transparent High-Speed Network Checkpoint/Restart in MPI
Julien Adam   Jean-Baptiste Besnard   Allen D. Malony   Sameer Shende   Marc Pérache   Patrick Carribault   Julien Jaeger  
Proceedings of the 25th European MPI Users’ Group Meeting, Barcelona, Spain, September 23-26, 2018, ACM, p. 12:1-12:11, 2018

Contemporary High Performance Computing
Mickaël Amiet   Patrick Carribault   Elisabeth Charon   Guillaume Colin de Verdière   Philippe Deniel   Gilles Grospellier   Guénolé Harel   François Jollet   Jacques-Charles Lafoucrière   Jacques-Bernard Lekien   Stéphane Mathieu   Marc Pérache   Jean-Christophe Weill   Gilles Wiber  
Chapman & Hall/CRC, p. 45-74, 2017

Towards a Better Expressiveness of the Speedup Metric in MPI Context
Jean-Baptiste Besnard   Allen D. Malony   Sameer Shende   Marc Pérache   Patrick Carribault   Julien Jaeger  
46th International Conference on Parallel Processing Workshops, ICPP Workshops 2017, Bristol, United Kingdom, August 14-17, 2017, IEEE Computer Society, p. 251-260, 2017

User Co-scheduling for MPI+OpenMP Applications Using OpenMP Semantics
Antoine Capra   Patrick Carribault   Jean-Baptiste Besnard   Allen D. Malony   Marc Pérache   Julien Jaeger  
Scaling OpenMP for Exascale Performance and Portability - 13th International Workshop on OpenMP, IWOMP 2017, Stony Brook, NY, USA, September 20-22, 2017, Proceedings, Springer, p. 203-216, 2017

Dynamic Load Balancing of Monte Carlo Particle Transport Applications on HPC Clusters
Thomas Gonçalves   Marc Pérache   Frédéric Desprez   Jean-François Méhaut  
Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing, ParCo 2017, 12-15 September 2017, Bologna, Italy, IOS Press, p. 465-474, 2017

Resource-Management Study in HPC Runtime-Stacking Context
Arthur Loussert   Benoit Welterlen   Patrick Carribault   Julien Jaeger   Marc Pérache   Raymond Namyst  
29th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2017, Campinas, Brazil, October 17-20, 2017, IEEE Computer Society, p. 177-184, 2017

Introducing Task-Containers as an Alternative to Runtime-Stacking
Jean-Baptiste Besnard   Julien Adam   Sameer Shende   Marc Pérache   Patrick Carribault   Julien Jaeger  
Proceedings of the 23rd European MPI Users’ Group Meeting, EuroMPI 2016, Edinburgh, United Kingdom, September 25-28, 2016, ACM, p. 51-63, 2016

A Parallel and Resilient Frontend for High Performance Validation Suites
Julien Adam   Marc Pérache  
High Performance Computing for Computational Science - VECPAR 2016 - 12th International Conference, Porto, Portugal, June 28-30, 2016, Revised Selected Papers, Springer, p. 248-255, 2016

Fine-grain data management directory for OpenMP 4.0 and OpenACC
Julien Jaeger   Patrick Carribault   Marc Pérache  
Concurr. Comput. Pract. Exp., p. 1528-1539, 2015

An MPI Halo-Cell Implementation for Zero-Copy Abstraction
Jean-Baptiste Besnard   Allen D. Malony   Sameer Shende   Marc Pérache   Patrick Carribault   Julien Jaeger  
Proceedings of the 22nd European MPI Users’ Group Meeting, EuroMPI 2015, Bordeaux, France, September 21-23, 2015, ACM, p. 3:1-3:9, 2015

Improving MPI communication overlap with collaborative polling
Sylvain Didelot   Patrick Carribault   Marc Pérache   William Jalby  
Computing, p. 263-278, 2014

Evaluation of OpenMP Task Scheduling Algorithms for Large NUMA Architectures
Jérôme Clet-Ortega   Patrick Carribault   Marc Pérache  
Euro-Par 2014 Parallel Processing - 20th International Conference, Porto, Portugal, August 25-29, 2014. Proceedings, Springer, p. 596-607, 2014

Optimizing Collective Operations in Hybrid Applications
Aurèle Mahéo   Patrick Carribault   Marc Pérache   William Jalby  
21st European MPI Users’ Group Meeting, EuroMPI/ASIA ’14, Kyoto, Japan - September 09 - 12, 2014, ACM, p. 121, 2014

Data-Management Directory for OpenMP 4.0 and OpenACC
Julien Jaeger   Patrick Carribault   Marc Pérache  
Euro-Par 2013: Parallel Processing Workshops - BigDataCloud, DIHC, FedICI, HeteroPar, HiBB, LSDVE, MHPC, OMHI, PADABS, PROPER, Resilience, ROME, and UCHPC 2013, Aachen, Germany, August 26-27, 2013. Revised Selected Papers, Springer, p. 168-177, 2013

Event Streaming for Online Performance Measurements Reduction
Jean-Baptiste Besnard   Marc Pérache   William Jalby  
42nd International Conference on Parallel Processing, ICPP 2013, Lyon, France, October 1-4, 2013, IEEE Computer Society, p. 985-994, 2013

Introducing kernel-level page reuse for high performance computing
Sébastien Valat   Marc Pérache   William Jalby  
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, June, 21, 2013, Seattle, Washington, USA, Co-located with PLDI 2013, ACM, p. 3:1-3:9, 2013

Hierarchical Local Storage: Exploiting Flexible User-Data Sharing Between MPI Tasks
Marc Tchiboukdjian   Patrick Carribault   Marc Pérache  
26th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2012, Shanghai, China, May 21-25, 2012, IEEE Computer Society, p. 366-377, 2012

Adaptive OpenMP for Large NUMA Nodes
Aurèle Mahéo   Souad Koliai   Patrick Carribault   Marc Pérache   William Jalby  
OpenMP in a Heterogeneous World - 8th International Workshop on OpenMP, IWOMP 2012, Rome, Italy, June 11-13, 2012. Proceedings, Springer, p. 254-257, 2012

Improving MPI Communication Overlap with Collaborative Polling
Sylvain Didelot   Patrick Carribault   Marc Pérache   William Jalby  
Recent Advances in the Message Passing Interface - 19th European MPI Users’ Group Meeting, EuroMPI 2012, Vienna, Austria, September 23-26, 2012. Proceedings, Springer, p. 37-46, 2012

Method, computer program and device for managing memory access in a multiprocessor architecture of NUMA type
Zoltan Menyhart   Marc Pérache  
2011

Thread-Local Storage Extension to Support Thread-Based MPI/OpenMP Applications
Patrick Carribault   Marc Pérache   Hervé Jourdren  
OpenMP in the Petascale Era - 7th International Workshop on OpenMP, IWOMP 2011, Chicago, IL, USA, June 13-15, 2011. Proceedings, Springer, p. 80-93, 2011

User level DB: a debugging API for user-level thread libraries
Kevin Pouget   Marc Pérache   Patrick Carribault   Hervé Jourdren  
24th IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010, Atlanta, Georgia, USA, 19-23 April 2010 - Workshop Proceedings, IEEE, p. 1-7, 2010

Enabling Low-Overhead Hybrid MPI/OpenMP Parallelism with MPC
Patrick Carribault   Marc Pérache   Hervé Jourdren  
Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More, 6th Internationan Workshop on OpenMP, IWOMP 2010, Tsukuba, Japan, June 14-16, 2010, Proceedings, Springer, p. 1-14, 2010

MPC-MPI: An MPI Implementation Reducing the Overall Memory Consumption
Marc Pérache   Patrick Carribault   Hervé Jourdren  
Recent Advances in Parallel Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’ Group Meeting, Espoo, Finland, September 7-10, 2009. Proceedings, Springer, p. 94-103, 2009

Efficient Shared Memory Message Passing for Inter-VM Communications
François Diakhaté   Marc Pérache   Raymond Namyst   Hervé Jourdren  
Euro-Par 2008 Workshops - Parallel Processing, VHPC 2008, UNICORE 2008, HPPC 2008, SGS 2008, PROPER 2008, ROIA 2008, and DPA 2008, Las Palmas de Gran Canaria, Spain, August 25-26, 2008, Revised Selected Papers, Springer, p. 53-62, 2008

MPC: A Unified Parallel Runtime for Clusters of NUMA Machines
Marc Pérache   Hervé Jourdren   Raymond Namyst  
Euro-Par 2008 - Parallel Processing, 14th International Euro-Par Conference, Las Palmas de Gran Canaria, Spain, August 26-29, 2008, Proceedings, Springer, p. 78-88, 2008

Fine Tuning Matrix Multiplications on Multicore
Stéphane Zuckerman   Marc Pérache   William Jalby  
High Performance Computing - HiPC 2008, 15th International Conference, Bangalore, India, December 17-20, 2008. Proceedings, Springer, p. 30-41, 2008

Contribution à l’élaboration d’environnements de programmation dédiés au calcul scientifique hautes performances
Marc Pérache  
PhD thesis in computer science, CEA, Université de Bordeaux, 2006