
Marc Pérache


A research engineer in computer science, CEA research director, CEA expert Fellow, and holder of an HDR (habilitation à diriger des recherches) in computer science, Marc Pérache is in charge of coordinating the adaptation of software and programming models to current and future supercomputers.

His main research topic concerns runtime systems providing parallel programming models, in particular MPI in multithreaded contexts. This work targets massively parallel architectures such as the supercomputers of the TOP500.

Marc Pérache has supervised 10 PhD theses (plus 3 in progress) and has co-authored more than 30 articles in conferences and journals.

Generating and Scaling a Multi-Language Test-Suite for MPI
Julien Adam   J.B. Besnard   Paul Canat   Sameer Shende   Hugo Taboada   Adrien Roussel   Marc Pérache   Julien Jaeger  
EuroMPI'23, 2023

Abstract

High-Performance Computing (HPC) is currently facing significant challenges. The hardware pressure has become increasingly difficult to manage due to the lack of parallel abstractions in applications. As a result, parallel programs must undergo drastic evolution to effectively exploit underlying hardware parallelism. Failure to do so results in inefficient code. In this pressing environment, parallel runtimes play a critical role, and their testing becomes crucial. This paper focuses on the MPI interface and leverages the MPI binding tools to develop a multi-language test-suite for MPI. By doing so and building on previous work from the Forum’s document editors, we implement systematic testing of MPI symbols in the context of the Parallel Computing Validation System (PCVS), an HPC validation platform dedicated to running and managing test-suites at scale. We first describe PCVS, then outline the process of generating the MPI API test suite, and finally run these tests at scale. All data sets, code generators, and implementations are made available to the community in open source. We also set up a dedicated website showcasing the results, which self-updates thanks to the Spack package manager.
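
As a rough illustration of what a generated per-symbol test might look like (a minimal sketch, not actual PCVS generator output), here is a C program that exercises one MPI call and checks its return code:

```c
/* Hypothetical sketch of a generated per-symbol MPI test;
 * the actual PCVS generator output may differ. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    /* Exercise the symbols under test and validate their result codes. */
    if (MPI_Comm_rank(MPI_COMM_WORLD, &rank) != MPI_SUCCESS ||
        MPI_Comm_size(MPI_COMM_WORLD, &size) != MPI_SUCCESS) {
        fprintf(stderr, "MPI_Comm_rank/MPI_Comm_size failed\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    printf("rank %d of %d: symbol check passed\n", rank, size);
    MPI_Finalize();
    return 0;
}
```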

MPI Application Binary Interface Standardization
Jeff Hammond   Lisandro Dalcin   Erik Schnetter   Marc Pérache   J.B. Besnard   Jed Brown   Gonzalo Brito Gadeschi   Simon Byrne   Joseph Schuchart   Hui Zhou  
EuroMPI'23, 2023

Abstract

MPI is the most widely used interface for high-performance computing (HPC) workloads. Its success lies in its embrace of libraries and ability to evolve while maintaining backward compatibility for older codes, enabling them to run on new architectures for many years. In this paper, we propose a new level of MPI compatibility: a standard Application Binary Interface (ABI). We review the history of MPI implementation ABIs, identify the constraints from the MPI standard and ISO C, and summarize recent efforts to develop a standard ABI for MPI. We provide the current proposal from the MPI Forum’s ABI working group, which has been prototyped both within MPICH and as an independent abstraction layer called Mukautuva. We also list several use cases that would benefit from the definition of an ABI while outlining the remaining constraints.
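
To illustrate why a standard ABI matters (a simplified sketch, not text from the paper): MPICH-style implementations define MPI handles as integers, while Open MPI-style implementations define them as pointers, so the same source compiles to binary-incompatible code.

```c
/* Simplified view of the handle-type divergence that prevents an MPI
 * binary from running against a different implementation's libmpi. */
#if defined(USE_MPICH_STYLE)
/* MPICH-style: handles are plain integers baked into the binary. */
typedef int MPI_Comm;
#define MPI_COMM_WORLD ((MPI_Comm)0x44000000)
#else
/* Open MPI-style: handles are pointers to implementation structs. */
typedef struct ompi_communicator_t *MPI_Comm;
extern struct ompi_communicator_t ompi_mpi_comm_world;
#define MPI_COMM_WORLD (&ompi_mpi_comm_world)
#endif
/* The same call, MPI_Comm_rank(MPI_COMM_WORLD, &rank), therefore
 * lowers to ABI-incompatible code under the two headers. */
```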

Performance Improvements of Parallel Applications thanks to MPI-4.0 Hints
Maxim Moraru   Adrien Roussel   Hugo Taboada   Christophe Jaillet   Michael Krajecki   Marc Pérache  
Proceedings of SBAC-PAD 2022, IEEE, 2022

Abstract

HPC systems have experienced significant growth over the past years, with modern machines having hundreds of thousands of nodes. Message Passing Interface (MPI) is the de facto standard for distributed computing on these architectures. On the MPI critical path, the message-matching process is one of the most time-consuming operations. In this process, searching for a specific request in a message queue represents a significant part of the communication latency. So far, no miracle algorithm performs well in all cases. This paper explores potential matching specializations thanks to hints introduced in the latest MPI 4.0 standard. We propose a hash-table-based algorithm that performs constant-time message matching for requests without wildcards. This approach is suitable for intensive point-to-point communication phases in many applications (more than 50% of CORAL benchmarks). We demonstrate that our approach can improve the overall execution time of real HPC applications by up to 25%. We also analyze the limitations of our method and propose a strategy for identifying the most suitable algorithm for a given application, applying machine learning techniques to classify applications according to their message pattern characteristics.
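
For context, the MPI 4.0 assertions this kind of specialization relies on are passed as info keys on a communicator; a minimal sketch using the standard key names might look like this (the helper name is ours):

```c
/* Sketch: request constant-time matching eligibility by promising
 * that no wildcard receives are posted on this communicator.
 * The info keys are standard MPI 4.0 communicator assertions. */
#include <mpi.h>

MPI_Comm make_hinted_comm(MPI_Comm base)
{
    MPI_Info info;
    MPI_Comm hinted;
    MPI_Info_create(&info);
    MPI_Info_set(info, "mpi_assert_no_any_source", "true");
    MPI_Info_set(info, "mpi_assert_no_any_tag", "true");
    MPI_Comm_dup_with_info(base, info, &hinted);
    MPI_Info_free(&info);
    return hinted;  /* a runtime may now specialize its match lists */
}
```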

Exploring Space-Time Trade-Off in Backtraces
Jean-Baptiste Besnard   Julien Adam   Allen D. Malony   Sameer Shende   Julien Jaeger   Patrick Carribault   Marc Pérache  
Tools for High Performance Computing 2018/2019, Springer International Publishing, p. 151-168, 2021

Abstract

The backtrace is one of the most common operations performed by profiling and debugging tools. It consists of determining the nesting of functions leading to the current execution state. Frameworks and standard libraries provide facilities enabling this operation; however, it generally incurs both computational and memory costs. Indeed, walking the stack up and then possibly resolving function pointers (to function names) before storing them can lead to non-negligible costs. In this paper, we propose to explore a means of extracting optimized backtraces with O(1) storage size by defining the notion of stack tags. We define a new data structure that we call a hashed-trie, used to encode stack traces at runtime through chained hashing. Our process, called stack-tagging, is implemented in a GCC plugin, enabling its use for C and C++ applications. A library enabling the decoding of stack locators through both static and brute-force analysis is also presented. This work introduces a new manner of capturing execution state which greatly simplifies both extraction and storage, two important issues in parallel profiling.
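
The chained-hashing idea can be summarized in a few lines; the following is a minimal sketch with invented names, not the plugin's actual code:

```c
/* Sketch of stack-tagging via chained hashing: each call site combines
 * its parent's tag with its own identifier, so an entire backtrace is
 * summarized by a single O(1) integer "stack tag". */
#include <stdint.h>

static uint64_t chain_tag(uint64_t parent_tag, uint64_t site_id)
{
    /* Any decent mixing function works; this one is illustrative. */
    uint64_t h = parent_tag ^ (site_id + 0x9e3779b97f4a7c15ULL);
    h ^= h >> 33;
    h *= 0xff51afd7ed558ccdULL;
    h ^= h >> 33;
    return h;
}

/* A compiler plugin would thread the tag through the call chain; a side
 * table mapping tag -> (parent_tag, site_id) then allows decoding. */
```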

Benefits of MPI Sessions for GPU MPI applications
Maxim Moraru   Adrien Roussel   Hugo Taboada   Christophe Jaillet   Michael Krajecki   Marc Pérache  
Proceedings of EuroMPI 2021, 2021

Abstract

Heterogeneous supercomputers are now considered the most promising path to Exascale. Compute nodes are frequently composed of more than one GPU accelerator, and programming such architectures efficiently is challenging. MPI is the de facto standard for distributed computing, and CUDA-aware libraries were introduced to ease inter-node GPU communications. However, they induce some overhead that can degrade overall performance. The MPI 4.0 specification draft introduces the MPI Sessions model, which offers the ability to initialize specific resources for a specific component of the application. In this paper, we present a way to reduce the overhead induced by CUDA-aware libraries with a solution inspired by MPI Sessions. In this way, we minimize the overhead induced by GPUs in an MPI context and improve the efficiency of CPU+GPU programs. We evaluate our approach on various micro-benchmarks and proxy applications such as Lulesh, MiniFE, Quicksilver, and Cloverleaf, and demonstrate that it can provide up to a 7x speedup compared to the standard MPI model.
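
For reference, the Sessions flow introduced in the MPI 4.0 draft looks roughly as follows; the string tag below is illustrative, while "mpi://WORLD" is a standard process-set name:

```c
/* Sketch of per-component initialization with MPI 4.0 Sessions,
 * instead of a single global MPI_Init. */
#include <mpi.h>

int main(void)
{
    MPI_Session session;
    MPI_Group group;
    MPI_Comm comm;

    MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &session);
    /* Derive a communicator only for the process set that this
     * (e.g. GPU) component actually needs. */
    MPI_Group_from_session_pset(session, "mpi://WORLD", &group);
    MPI_Comm_create_from_group(group, "app.gpu-component",
                               MPI_INFO_NULL, MPI_ERRORS_RETURN, &comm);
    /* ... component communication on `comm` ... */
    MPI_Comm_free(&comm);
    MPI_Group_free(&group);
    MPI_Session_finalize(&session);
    return 0;
}
```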

On-the-Fly, Robust Translation of MPI Libraries
Edgar A. Léon   Marc Joos   Nathan Hanford   Adrien Cotte   Tony Delforge   François Diakhaté   Vincent Ducrot   Ian Karlin   Marc Pérache  
Proceedings of Cluster 2021, 2021

Enhancing Load-Balancing of MPI Applications with Workshare
Thomas Dionisi   Stéphane Bouhrour   Julien Jaeger   Patrick Carribault   Marc Pérache  
Proceedings of EuroPar 2021, 2021

Overlapping MPI communications with Intel TBB computation
Cassandra Rocha Barbosa   Pierre Lemarinier   Marc Sergent   Guillaume Papauré   Marc Pérache  
2020 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2020, New Orleans, LA, USA, May 18-22, 2020, IEEE, p. 958-966, 2020

Unifying the Analysis of Performance Event Streams at the Consumer Interface Level
Jean-Baptiste Besnard   Allen D. Malony   Sameer Shende   Marc Pérache   Patrick Carribault   Julien Jaeger  
Tools for High Performance Computing 2017, Springer International Publishing, p. 57-71, 2019

Abstract

Several instrumentation interfaces have been developed for parallel programs to make observable the actions that take place during execution and to make accessible information about the program’s behavior and performance. Following in the footsteps of the successful profiling interface for MPI (PMPI), new rich interfaces exposing the internal operation of MPI (MPI_T) and OpenMP (OMPT) runtimes are now in the standards. Taking advantage of these interfaces requires tools to selectively collect events from multiple interfaces by various techniques: function interposition (PMPI), value read (MPI_T), and callbacks (OMPT). In this paper, we present the unified instrumentation pipeline proposed by the MALP infrastructure, which can be used to forward a variety of fine-grained events from multiple interfaces online to multi-threaded analysis processes implemented orthogonally with plugins. In essence, our contribution complements “front-end” instrumentation mechanisms with a generic “back-end” event-consumption interface that allows “consumer” callbacks to generate performance measurements in various formats for analysis and transport. With such support, online and post-mortem cases become similar from an analysis point of view, making it possible to build more unified and consistent analysis frameworks. The paper describes the approach and demonstrates its benefits with several use cases.
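
A hypothetical sketch of such a consumer interface (names invented here, not MALP's actual API) could look like this:

```c
/* Hypothetical "back-end" consumer interface: events from the PMPI,
 * MPI_T, and OMPT front-ends are funneled to pluggable callbacks
 * through one unified record format. */
#include <stdint.h>

typedef enum { EV_PMPI, EV_MPI_T, EV_OMPT } event_source_t;

typedef struct {
    event_source_t source;   /* which instrumentation front-end */
    uint64_t timestamp;
    uint64_t thread_id;
    const char *name;        /* e.g. "MPI_Send", "omp_task_create" */
    const void *payload;     /* source-specific data */
} perf_event_t;

/* A plugin registers one callback; online and post-mortem analyses
 * then look identical from the consumer's point of view. */
typedef void (*event_consumer_fn)(const perf_event_t *ev, void *arg);
```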

Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor
Alexandre Denis   Julien Jaeger   Emmanuel Jeannot   Marc Pérache   Hugo Taboada  
Int. J. High Perform. Comput. Appl., 2019

Checkpoint/restart approaches for a thread-based MPI runtime
Julien Adam   Maxime Kermarquer   Jean-Baptiste Besnard   Leonardo Bautista-Gomez   Marc Pérache   Patrick Carribault   Julien Jaeger   Allen D. Malony   Sameer Shende  
Parallel Comput., p. 204-219, 2019

Detecting Non-sibling Dependencies in OpenMP Task-Based Applications
Ricardo Bispo Vieira   Antoine Capra   Patrick Carribault   Julien Jaeger   Marc Pérache   Adrien Roussel  
OpenMP: Conquering the Full Hardware Spectrum - 15th International Workshop on OpenMP, IWOMP 2019, Auckland, New Zealand, September 11-13, 2019, Proceedings, Springer, p. 231-245, 2019

Abstract

The advent of the multicore era led to the duplication of functional units through an increasing number of cores. To exploit those processors, a shared-memory parallel programming model is one possible direction. OpenMP is thus a good candidate, enabling different paradigms: data parallelism (including loop-based directives) and control parallelism through the notion of tasks with dependencies. But it is the programmer’s responsibility to ensure that data dependencies are complete, so that no data races may happen. It might be complex to guarantee that no issue will occur and that all dependencies have been correctly expressed in the context of nested tasks. This paper proposes an algorithm to detect the data dependencies that might be missing from the OpenMP task clauses between tasks that have been generated by different parents. This approach is implemented inside a tool relying on the OMPT interface.
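
A minimal example of the hazard addressed (per standard OpenMP semantics, `depend` clauses only order sibling tasks, i.e. children of the same parent):

```c
/* The two inner tasks below are created by different parents, so
 * their depend clauses impose no ordering: the read may race with
 * the write despite both accesses being annotated. */
#include <stdio.h>

int main(void)
{
    int x = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task                      /* parent task A */
        {
            #pragma omp task depend(out: x)   /* child of A */
            x = 42;
        }
        #pragma omp task                      /* parent task B */
        {
            #pragma omp task depend(in: x)    /* child of B */
            printf("%d\n", x);                /* unordered w.r.t. the write */
        }
    }
    return 0;
}
```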

Mixing ranks, tasks, progress and nonblocking collectives
Jean-Baptiste Besnard   Julien Jaeger   Allen D. Malony   Sameer Shende   Hugo Taboada   Marc Pérache   Patrick Carribault  
Proceedings of the 26th European MPI Users’ Group Meeting, EuroMPI 2019, Zürich, Switzerland, September 11-13, 2019, ACM, p. 10:1-10:10, 2019

Checkpoint/restart approaches for a thread-based MPI runtime
Julien Adam   Maxime Kermarquer   Jean-Baptiste Besnard   Leonardo Bautista-Gomez   Marc Pérache   Patrick Carribault   Julien Jaeger   Allen D. Malony   Sameer Shende  
CoRR, 2019

Dynamic Placement of Progress Thread for Overlapping MPI Non-blocking Collectives on Manycore Processor
Alexandre Denis   Julien Jaeger   Emmanuel Jeannot   Marc Pérache   Hugo Taboada  
Euro-Par 2018: Parallel Processing - 24th International Conference on Parallel and Distributed Computing, Turin, Italy, August 27-31, 2018, Proceedings, Springer, p. 616-627, 2018

Efficient Communication/Computation Overlap with MPI+OpenMP Runtimes Collaboration
Marc Sergent   Mario Dagrada   Patrick Carribault   Julien Jaeger   Marc Pérache   Guillaume Papauré  
Euro-Par 2018: Parallel Processing - 24th International Conference on Parallel and Distributed Computing, Turin, Italy, August 27-31, 2018, Proceedings, Springer, p. 560-572, 2018

Transparent High-Speed Network Checkpoint/Restart in MPI
Julien Adam   Jean-Baptiste Besnard   Allen D. Malony   Sameer Shende   Marc Pérache   Patrick Carribault   Julien Jaeger  
Proceedings of the 25th European MPI Users’ Group Meeting, Barcelona, Spain, September 23-26, 2018, ACM, p. 12:1-12:11, 2018

Contemporary High Performance Computing
Mickaël Amiet   Patrick Carribault   Elisabeth Charon   Guillaume Colin de Verdière   Philippe Deniel   Gilles Grospellier   Guénolé Harel   François Jollet   Jacques-Charles Lafoucrière   Jacques-Bernard Lekien   Stéphane Mathieu   Marc Pérache   Jean-Christophe Weill   Gilles Wiber  
Chapman & Hall/CRC, p. 45-74, 2017

Towards a Better Expressiveness of the Speedup Metric in MPI Context
Jean-Baptiste Besnard   Allen D. Malony   Sameer Shende   Marc Pérache   Patrick Carribault   Julien Jaeger  
46th International Conference on Parallel Processing Workshops, ICPP Workshops 2017, Bristol, United Kingdom, August 14-17, 2017, IEEE Computer Society, p. 251-260, 2017

User Co-scheduling for MPI+OpenMP Applications Using OpenMP Semantics
Antoine Capra   Patrick Carribault   Jean-Baptiste Besnard   Allen D. Malony   Marc Pérache   Julien Jaeger  
Scaling OpenMP for Exascale Performance and Portability - 13th International Workshop on OpenMP, IWOMP 2017, Stony Brook, NY, USA, September 20-22, 2017, Proceedings, Springer, p. 203-216, 2017

Dynamic Load Balancing of Monte Carlo Particle Transport Applications on HPC Clusters
Thomas Gonçalves   Marc Pérache   Frédéric Desprez   Jean-François Méhaut  
Parallel Computing is Everywhere, Proceedings of the International Conference on Parallel Computing, ParCo 2017, 12-15 September 2017, Bologna, Italy, IOS Press, p. 465-474, 2017

Resource-Management Study in HPC Runtime-Stacking Context
Arthur Loussert   Benoit Welterlen   Patrick Carribault   Julien Jaeger   Marc Pérache   Raymond Namyst  
29th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2017, Campinas, Brazil, October 17-20, 2017, IEEE Computer Society, p. 177-184, 2017

Introducing Task-Containers as an Alternative to Runtime-Stacking
Jean-Baptiste Besnard   Julien Adam   Sameer Shende   Marc Pérache   Patrick Carribault   Julien Jaeger  
Proceedings of the 23rd European MPI Users’ Group Meeting, EuroMPI 2016, Edinburgh, United Kingdom, September 25-28, 2016, ACM, p. 51-63, 2016

A Parallel and Resilient Frontend for High Performance Validation Suites
Julien Adam   Marc Pérache  
High Performance Computing for Computational Science - VECPAR 2016 - 12th International Conference, Porto, Portugal, June 28-30, 2016, Revised Selected Papers, Springer, p. 248-255, 2016

Fine-grain data management directory for OpenMP 4.0 and OpenACC
Julien Jaeger   Patrick Carribault   Marc Pérache  
Concurr. Comput. Pract. Exp., p. 1528-1539, 2015

An MPI Halo-Cell Implementation for Zero-Copy Abstraction
Jean-Baptiste Besnard   Allen D. Malony   Sameer Shende   Marc Pérache   Patrick Carribault   Julien Jaeger  
Proceedings of the 22nd European MPI Users’ Group Meeting, EuroMPI 2015, Bordeaux, France, September 21-23, 2015, ACM, p. 3:1-3:9, 2015

Improving MPI communication overlap with collaborative polling
Sylvain Didelot   Patrick Carribault   Marc Pérache   William Jalby  
Computing, p. 263-278, 2014

Evaluation of OpenMP Task Scheduling Algorithms for Large NUMA Architectures
Jérôme Clet-Ortega   Patrick Carribault   Marc Pérache  
Euro-Par 2014 Parallel Processing - 20th International Conference, Porto, Portugal, August 25-29, 2014. Proceedings, Springer, p. 596-607, 2014

Optimizing Collective Operations in Hybrid Applications
Aurèle Mahéo   Patrick Carribault   Marc Pérache   William Jalby  
21st European MPI Users’ Group Meeting, EuroMPI/ASIA ’14, Kyoto, Japan - September 09 - 12, 2014, ACM, p. 121, 2014

Data-Management Directory for OpenMP 4.0 and OpenACC
Julien Jaeger   Patrick Carribault   Marc Pérache  
Euro-Par 2013: Parallel Processing Workshops - BigDataCloud, DIHC, FedICI, HeteroPar, HiBB, LSDVE, MHPC, OMHI, PADABS, PROPER, Resilience, ROME, and UCHPC 2013, Aachen, Germany, August 26-27, 2013. Revised Selected Papers, Springer, p. 168-177, 2013

Event Streaming for Online Performance Measurements Reduction
Jean-Baptiste Besnard   Marc Pérache   William Jalby  
42nd International Conference on Parallel Processing, ICPP 2013, Lyon, France, October 1-4, 2013, IEEE Computer Society, p. 985-994, 2013

Introducing kernel-level page reuse for high performance computing
Sébastien Valat   Marc Pérache   William Jalby  
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, June, 21, 2013, Seattle, Washington, USA, Co-located with PLDI 2013, ACM, p. 3:1-3:9, 2013

Hierarchical Local Storage: Exploiting Flexible User-Data Sharing Between MPI Tasks
Marc Tchiboukdjian   Patrick Carribault   Marc Pérache  
26th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2012, Shanghai, China, May 21-25, 2012, IEEE Computer Society, p. 366-377, 2012

Adaptive OpenMP for Large NUMA Nodes
Aurèle Mahéo   Souad Koliai   Patrick Carribault   Marc Pérache   William Jalby  
OpenMP in a Heterogeneous World - 8th International Workshop on OpenMP, IWOMP 2012, Rome, Italy, June 11-13, 2012. Proceedings, Springer, p. 254-257, 2012

Improving MPI Communication Overlap with Collaborative Polling
Sylvain Didelot   Patrick Carribault   Marc Pérache   William Jalby  
Recent Advances in the Message Passing Interface - 19th European MPI Users’ Group Meeting, EuroMPI 2012, Vienna, Austria, September 23-26, 2012. Proceedings, Springer, p. 37-46, 2012

Method, computer program and device for managing memory access in a multiprocessor architecture of NUMA type
Zoltan Menyhart   Marc Pérache  
2011

Thread-Local Storage Extension to Support Thread-Based MPI/OpenMP Applications
Patrick Carribault   Marc Pérache   Hervé Jourdren  
OpenMP in the Petascale Era - 7th International Workshop on OpenMP, IWOMP 2011, Chicago, IL, USA, June 13-15, 2011. Proceedings, Springer, p. 80-93, 2011

User level DB: a debugging API for user-level thread libraries
Kevin Pouget   Marc Pérache   Patrick Carribault   Hervé Jourdren  
24th IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2010, Atlanta, Georgia, USA, 19-23 April 2010 - Workshop Proceedings, IEEE, p. 1-7, 2010

Enabling Low-Overhead Hybrid MPI/OpenMP Parallelism with MPC
Patrick Carribault   Marc Pérache   Hervé Jourdren  
Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More, 6th Internationan Workshop on OpenMP, IWOMP 2010, Tsukuba, Japan, June 14-16, 2010, Proceedings, Springer, p. 1-14, 2010

MPC-MPI: An MPI Implementation Reducing the Overall Memory Consumption
Marc Pérache   Patrick Carribault   Hervé Jourdren  
Recent Advances in Parallel Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users’ Group Meeting, Espoo, Finland, September 7-10, 2009. Proceedings, Springer, p. 94-103, 2009

Efficient Shared Memory Message Passing for Inter-VM Communications
François Diakhaté   Marc Pérache   Raymond Namyst   Hervé Jourdren  
Euro-Par 2008 Workshops - Parallel Processing, VHPC 2008, UNICORE 2008, HPPC 2008, SGS 2008, PROPER 2008, ROIA 2008, and DPA 2008, Las Palmas de Gran Canaria, Spain, August 25-26, 2008, Revised Selected Papers, Springer, p. 53-62, 2008

MPC: A Unified Parallel Runtime for Clusters of NUMA Machines
Marc Pérache   Hervé Jourdren   Raymond Namyst  
Euro-Par 2008 - Parallel Processing, 14th International Euro-Par Conference, Las Palmas de Gran Canaria, Spain, August 26-29, 2008, Proceedings, Springer, p. 78-88, 2008

Fine Tuning Matrix Multiplications on Multicore
Stéphane Zuckerman   Marc Pérache   William Jalby  
High Performance Computing - HiPC 2008, 15th International Conference, Bangalore, India, December 17-20, 2008. Proceedings, Springer, p. 30-41, 2008

Contribution à l’élaboration d’environnements de programmation dédiés au calcul scientifique hautes performances
Marc Pérache  
PhD thesis in computer science, CEA, Université de Bordeaux, 2006