Patrick Carribault is a Project Manager in HPC and Quantum Computing. He is also a CEA Fellow and holds an HDR in Computer Science. His research focuses on the software stack and on co-design between parallel applications and high-performance compute architectures. Participating in various academic and industry collaborations, he studies parallel programming models, compilation, and the optimization of parallel performance targeting current and next-generation supercomputers.
Patrick Carribault has advised more than 10 PhD theses and published more than 40 articles in international conferences and journals.
HPCAsia 2024 Workshops: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region Workshops, 2024
Abstract
The adoption of ARM processor architectures is on the rise in the HPC ecosystem. The Fugaku supercomputer is a homogeneous ARM-based machine and one of the most powerful machines in the world. On the programming side, dependent task-based programming models are gaining traction thanks to their many advantages: dynamic load balancing, implicit expression of communication/computation overlap, early-bird communication posting, and more. MPI and OpenMP are two widespread programming standards that make task-based programming possible at the distributed-memory level. Despite its many advantages, the mixed use of these standards with dependent tasks is still under-evaluated on large-scale machines. In this paper, we provide an overview of mixing the OpenMP dependent tasking model with MPI using state-of-the-art software stacks (GCC 13, Clang 17, MPC-OMP). We report the level of performance to expect when porting applications to such a mix of the standards on the Fugaku supercomputer, using two benchmarks (Cholesky, HPCCG) and a proxy-application (LULESH). We show that the software stack, resource binding and communication progression mechanisms are factors with a significant impact on performance. On distributed applications, performance reaches up to 80% efficiency for task-based applications like HPCCG. We also point out a few areas of improvement in OpenMP runtimes.
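To give a concrete picture of the combination evaluated here, the sketch below (not taken from the paper; buffer names, sizes and the halo-exchange pattern are illustrative assumptions) expresses an MPI exchange and the surrounding computation as OpenMP dependent tasks, so the runtime can post the exchange as soon as its input is packed and overlap it with the independent interior work.

    /* Minimal sketch of mixing MPI with OpenMP dependent tasks (assumptions
       throughout; not code from the paper). */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1024   /* buffer size: an assumption for the demo */

    int main(int argc, char **argv) {
        int provided, rank, size;
        double sendbuf[N], halo[N], interior[N];

        /* MPI_THREAD_MULTIPLE lets tasks running on any thread issue MPI calls. */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        #pragma omp parallel
        #pragma omp single
        {
            /* Pack the data to exchange with the neighbors. */
            #pragma omp task depend(out: sendbuf)
            for (int i = 0; i < N; i++) sendbuf[i] = rank + i;

            /* Communication task: posted as soon as the packed buffer is
               ready ("early-bird" posting). */
            #pragma omp task depend(in: sendbuf) depend(out: halo)
            {
                int left  = (rank + size - 1) % size;
                int right = (rank + 1) % size;
                MPI_Sendrecv(sendbuf, N, MPI_DOUBLE, right, 0,
                             halo,    N, MPI_DOUBLE, left,  0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            }

            /* Interior computation: independent of the halo, so the runtime
               may overlap it with the communication task. */
            #pragma omp task depend(out: interior)
            for (int i = 0; i < N; i++) interior[i] = 2.0 * i;

            /* Boundary update: released once both inputs are available. */
            #pragma omp task depend(in: halo) depend(in: interior)
            printf("rank %d: boundary contribution %f\n",
                   rank, interior[0] + halo[0]);
        }

        MPI_Finalize();
        return 0;
    }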
IEEE International Conference on Quantum Computing and Engineering, 2023
Abstract
Quantum computers exploit the particular behavior of quantum physical systems to solve some problems in a different way than classical computers. We are now approaching the point where quantum computing could provide real advantages over classical methods. The computational capabilities of quantum systems will soon be available in future supercomputer architectures as hardware accelerators called Quantum Processing Units (QPU). From optimizing compilers to task scheduling, the High-Performance Computing (HPC) software stack could benefit from the advantages of quantum computing. We look here at the problem of register allocation, a crucial part of modern optimizing compilers. We propose a simple proof-of-concept hybrid quantum algorithm based on QAOA to solve this problem. We implement the algorithm and integrate it directly into GCC, a well-known modern compiler. The performance of the algorithm is evaluated against the simple Chaitin-Briggs heuristic as well as GCC's register allocator. While our proposed algorithm lags behind GCC's modern heuristics, it is a good first step in the design of useful quantum algorithms for the classical HPC software stack.
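For reference, the classical baseline named in the abstract can be illustrated with a toy Chaitin-Briggs-style allocator: simplify the interference graph by removing low-degree nodes, then pop them back and assign colors (registers). The graph, the value of K and the code below are illustrative assumptions, not the paper's implementation or its quantum algorithm.

    /* Toy Chaitin-Briggs-style coloring of a hypothetical interference graph. */
    #include <stdio.h>

    #define NODES 5
    #define K 3   /* number of physical registers: an assumption for the demo */

    /* adj[u][v] = 1 if u and v are live at the same time (cannot share a register). */
    static const int adj[NODES][NODES] = {
        {0,1,1,0,0},
        {1,0,1,1,0},
        {1,1,0,1,1},
        {0,1,1,0,1},
        {0,0,1,1,0},
    };

    int main(void) {
        int removed[NODES] = {0}, stack[NODES], top = 0, color[NODES];

        /* Simplify: repeatedly remove a node with fewer than K live neighbors. */
        for (int pass = 0; pass < NODES; pass++) {
            for (int v = 0; v < NODES; v++) {
                int deg = 0;
                if (removed[v]) continue;
                for (int u = 0; u < NODES; u++)
                    if (!removed[u] && adj[v][u]) deg++;
                if (deg < K) { removed[v] = 1; stack[top++] = v; break; }
            }
        }

        /* Select: pop nodes and assign the first register not used by an
           already-colored neighbor; a node with no free register is spilled. */
        for (int v = 0; v < NODES; v++) color[v] = -1;
        while (top > 0) {
            int v = stack[--top];
            int used[K] = {0};
            for (int u = 0; u < NODES; u++)
                if (adj[v][u] && color[u] >= 0) used[color[u]] = 1;
            for (int c = 0; c < K; c++)
                if (!used[c]) { color[v] = c; break; }
            if (color[v] < 0) printf("node %d spilled\n", v);
            else              printf("node %d -> register %d\n", v, color[v]);
        }
        return 0;
    }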
52nd International Conference on Parallel Processing (ICPP 2023), 2023
Abstract
The architecture of supercomputers is evolving to expose massive parallelism. MPI and OpenMP are widely used in application codes on the largest supercomputers in the world. The community primarily focused on composing MPI with OpenMP before its version 3.0 introduced task-based programming. Recent advances in the OpenMP task model and its interoperability with MPI have enabled fine model composition and seamless support for asynchrony. Yet, OpenMP tasking overheads limit the gain of task-based applications over their historical loop parallelization (parallel for construct). This paper identifies the speed of OpenMP task dependency graph discovery as a limiting factor in the performance of task-based applications. We study its impact on intra- and inter-node performance with two benchmarks (Cholesky, HPCG) and a proxy-application (LULESH). We evaluate the performance impact of several discovery optimizations and introduce a persistent task dependency graph reducing overheads at run time by a factor of up to 15. We measure a 2x speedup over parallel-for versions weak-scaled to 16K cores, due to improved cache memory use and communication overlap enabled by task refinement and depth-first scheduling.
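The kind of dependent-task code whose graph discovery is at stake can be conveyed with a blocked Cholesky skeleton (a sketch under assumptions: tile counts, sizes and stub kernels are placeholders, and this is not the paper's implementation). Each call re-creates every task, so the runtime re-discovers the same dependency graph; a persistent graph amortizes that discovery across calls.

    /* Blocked Cholesky skeleton expressed with OpenMP dependent tasks. */
    #include <stdio.h>

    #define NT 8    /* tiles per dimension: an assumption for the sketch */
    #define TS 64   /* tile size: an assumption for the sketch */

    static double A[NT][NT][TS * TS];

    /* Stub kernels standing in for the LAPACK/BLAS calls a real code would
       make (dpotrf, dtrsm, dsyrk, dgemm). */
    static void potrf(double *akk) { (void)akk; }
    static void trsm(const double *akk, double *aik) { (void)akk; (void)aik; }
    static void syrk(const double *aik, double *aii) { (void)aik; (void)aii; }
    static void gemm(const double *aik, const double *ajk, double *aij)
    { (void)aik; (void)ajk; (void)aij; }

    static void cholesky(void) {
        #pragma omp parallel
        #pragma omp single
        for (int k = 0; k < NT; k++) {
            #pragma omp task depend(inout: A[k][k])
            potrf(A[k][k]);

            for (int i = k + 1; i < NT; i++) {
                #pragma omp task depend(in: A[k][k]) depend(inout: A[i][k])
                trsm(A[k][k], A[i][k]);
            }
            for (int i = k + 1; i < NT; i++) {
                for (int j = k + 1; j < i; j++) {
                    #pragma omp task depend(in: A[i][k], A[j][k]) \
                                     depend(inout: A[i][j])
                    gemm(A[i][k], A[j][k], A[i][j]);
                }
                #pragma omp task depend(in: A[i][k]) depend(inout: A[i][i])
                syrk(A[i][k], A[i][i]);
            }
        }
    }

    int main(void) {
        cholesky();   /* each call re-creates, hence re-discovers, the task graph */
        printf("factored %d x %d tiles\n", NT, NT);
        return 0;
    }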
IWOMP 2023 - International Workshop on OpenMP, 2023
Abstract
Many-core and heterogeneous architectures now require programmers to compose multiple asynchronous programming models to fully exploit hardware capabilities. As a shared-memory parallel programming model, OpenMP has the responsibility of orchestrating the suspension and progression of asynchronous operations occurring on a compute node, such as MPI communications or CUDA/HIP streams. Yet, the specification only provides the task detach(event) API to suspend tasks until an asynchronous operation completes, which presents a few drawbacks. In this paper, we introduce the design and implementation of an extension of the taskwait construct that suspends a task until an asynchronous event completes. It aims to reduce the runtime costs induced by the current solution and to provide a standard API for portable, automated task suspension. The results show half the overhead of the existing task detach clause.
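For context, the existing mechanism the extension is compared against can be sketched as follows (illustrative assumptions throughout: self-send, single rank, polling by the encountering thread; this is not the paper's code). A detached task stands for a pending MPI receive; its outgoing dependence is released only when omp_fulfill_event() is called after the request completes.

    /* Task suspension with the standard detach(event) clause (OpenMP 5.0). */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided, rank, buf = 0, msg = 42;
        MPI_Request req;
        omp_event_handle_t evt;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel
        #pragma omp single
        {
            /* Post the receive, then a self-send so the demo runs on one rank. */
            MPI_Irecv(&buf, 1, MPI_INT, rank, 0, MPI_COMM_WORLD, &req);
            MPI_Send(&msg, 1, MPI_INT, rank, 0, MPI_COMM_WORLD);

            /* Detached task standing for the pending receive: its completion,
               and hence the release of 'buf', waits for omp_fulfill_event(). */
            #pragma omp task depend(out: buf) detach(evt)
            { }

            /* Consumer task: released only once the event is fulfilled. */
            #pragma omp task depend(in: buf)
            printf("rank %d received %d\n", rank, buf);

            /* The encountering thread polls the request and fulfills the
               event, yielding to other tasks while it waits. */
            int done = 0;
            while (!done) {
                MPI_Test(&req, &done, MPI_STATUS_IGNORE);
                #pragma omp taskyield
            }
            omp_fulfill_event(evt);
        }

        MPI_Finalize();
        return 0;
    }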
IWOMP 2022 - 18th International Workshop on OpenMP, p. 1-14, 2022-09
Abstract
Heterogeneous supercomputers are now widespread among HPC systems, and programming efficient applications on these architectures is a challenge. Task-based programming models are a promising way to tackle this challenge. Since OpenMP 4.0 and 4.5, the target directives make it possible to offload pieces of code to GPUs and to express those offloads as tasks with dependencies. Heterogeneous machines can therefore be programmed with MPI+OpenMP(task+target) to exhibit a very high level of concurrent asynchronous operations, in which data transfers, kernel executions, communications and CPU computations can be overlapped. Hence, it is possible to suspend tasks performing these asynchronous operations on the CPUs and to overlap their completion with the execution of other tasks. Suspended tasks resume opportunistically, at any scheduling point, once the associated asynchronous event has completed. We have integrated this feature into the MPC framework, validated it on an AXPY microbenchmark, and evaluated it on an MPI+OpenMP(tasks) implementation of the LULESH proxy application. The results show that we are able to improve asynchrony and overall performance, allowing applications to benefit from asynchronous execution on heterogeneous machines.
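A minimal sketch of the pattern described above, under assumptions (placeholder sizes and kernels; this is not the MPC implementation): a target nowait region is a deferrable task with dependencies, so host tasks that do not touch its output may overlap with the offload, and a consumer task is released when it completes.

    /* Asynchronous GPU offload expressed as a dependent task (falls back to
       host execution when no device is available). */
    #include <stdio.h>

    #define N 1000000   /* problem size: an assumption for the sketch */

    static double x[N], y[N];

    int main(void) {
        double a = 2.0;
        for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

        #pragma omp parallel
        #pragma omp single
        {
            /* Offloaded AXPY: 'nowait' makes the target region asynchronous
               and 'depend' ties it into the task dependency graph. */
            #pragma omp target teams distribute parallel for \
                    map(to: x) map(tofrom: y) depend(inout: y) nowait
            for (int i = 0; i < N; i++)
                y[i] += a * x[i];

            /* Host task not touching y: free to overlap with the offload. */
            #pragma omp task
            printf("host work overlapping the offload\n");

            /* Consumer task: released once the offload and its transfers complete. */
            #pragma omp task depend(in: y)
            printf("y[0] = %f\n", y[0]);
        }
        return 0;
    }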
Tools for High Performance Computing 2018 / 2019, Springer International Publishing, p. 151-168, 2021
Abstract
The backtrace is one of the most common operations performed by profiling and debugging tools. It consists of determining the nesting of function calls leading to the current execution state. Frameworks and standard libraries provide facilities for this operation; however, it generally incurs both computational and memory costs, since walking up the stack and then possibly resolving function pointers (to function names) before storing them can be expensive. In this paper, we explore a means of extracting optimized backtraces with O(1) storage size by defining the notion of stack tags. We define a new data structure, called a hashed-trie, used to encode stack traces at run time through chained hashing. Our process, called stack-tagging, is implemented as a GCC plugin, enabling its use in C and C++ applications. A library enabling the decoding of stack locators through both static and brute-force analysis is also presented. This work introduces a new manner of capturing execution state which greatly simplifies both extraction and storage, two important issues in parallel profiling.
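The chained-hashing idea can be conveyed with a toy example (illustrative only, not the GCC plugin; the site identifiers and the hash function are assumptions): each function mixes its own call-site identifier into its caller's tag, so the entire call path is summarized by a single integer. A real tool would additionally record (parent tag, site) pairs so that tags can later be decoded back into call paths.

    /* Toy sketch of stack tags built by chained hashing. */
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t current_tag = 0;   /* thread-local in a real tool (assumption) */

    /* Mix a call-site identifier into the caller's tag (FNV-1a-style step). */
    static uint64_t mix(uint64_t tag, uint64_t site) {
        return (tag ^ site) * 1099511628211ULL;
    }

    #define TRACE_ENTER(site_id) \
        uint64_t saved_tag_ = current_tag; current_tag = mix(current_tag, (site_id))
    #define TRACE_EXIT() current_tag = saved_tag_

    static void leaf(void) {
        TRACE_ENTER(3);
        printf("leaf reached with stack tag %016llx\n",
               (unsigned long long)current_tag);
        TRACE_EXIT();
    }

    static void middle(void) {
        TRACE_ENTER(2);
        leaf();
        TRACE_EXIT();
    }

    int main(void) {
        TRACE_ENTER(1);
        middle();   /* tag encodes main -> middle -> leaf */
        leaf();     /* different path, different tag: main -> leaf */
        TRACE_EXIT();
        return 0;
    }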
Tools for High Performance Computing 2017, Springer International Publishing, p. 57-71, 2019
Abstract
Several instrumentation interfaces have been developed for parallel programs to make observable the actions that take place during execution and to make accessible information about a program’s behavior and performance. Following in the footsteps of the successful profiling interface for MPI (PMPI), new rich interfaces exposing the internal operation of MPI (MPI-T) and OpenMP (OMPT) runtimes are now in the standards. Taking advantage of these interfaces requires tools to selectively collect events from multiple interfaces through various techniques: function interposition (PMPI), value read (MPI-T), and callbacks (OMPT). In this paper, we present the unified instrumentation pipeline proposed by the MALP infrastructure, which can be used to forward a variety of fine-grained events from multiple interfaces online to multi-threaded analysis processes implemented orthogonally with plugins. In essence, our contribution complements “front-end” instrumentation mechanisms with a generic “back-end” event consumption interface that allows “consumer” callbacks to generate performance measurements in various formats for analysis and transport. With such support, online and post-mortem cases become similar from an analysis point of view, making it possible to build more unified and consistent analysis frameworks. The paper describes the approach and demonstrates its benefits with several use cases.
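As an example of the kind of “front-end” instrumentation such a pipeline consumes, here is a minimal PMPI interposition wrapper (illustrative; emit_event is a hypothetical sink standing in for the analysis back-end, and this is not MALP's code):

    /* Minimal PMPI interposition: intercept MPI_Send, record a timing event,
       and forward to the real implementation through PMPI_Send. */
    #include <mpi.h>
    #include <stdio.h>

    /* Hypothetical event sink standing in for the analysis back-end. */
    static void emit_event(const char *name, double start, double end) {
        fprintf(stderr, "[tool] %s: %.6f s\n", name, end - start);
    }

    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm) {
        double t0 = MPI_Wtime();
        int rc = PMPI_Send(buf, count, datatype, dest, tag, comm); /* real call */
        emit_event("MPI_Send", t0, MPI_Wtime());
        return rc;
    }

Built as a shared library and preloaded, or linked ahead of the MPI library, this wrapper is invoked in place of MPI_Send, while PMPI_Send reaches the actual implementation.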
Euro-Par 2013: Parallel Processing Workshops - BigDataCloud, DIHC, FedICI, HeteroPar, HiBB, LSDVE, MHPC, OMHI, PADABS, PROPER, Resilience, ROME, and UCHPC 2013, Aachen, Germany, August 26-27, 2013. Revised Selected Papers, Springer, p. 168-177, 2013