Julien JAEGER


Dr. Julien Jaeger is a research scientist at LIHPC. He joined the MPC (Multi-Processor Computing) team at CEA in 2012, after defending his PhD thesis in computer science at the University of Versailles Saint-Quentin-en-Yvelines the same year. Since 2019, he has been leading the MPC effort, focusing on parallel programming models such as MPI and OpenMP, their scheduling, and their interactions on HPC supercomputers. He also actively participates in the MPI Forum, helping to design the next MPI standard.

Generating and Scaling a Multi-Language Test-Suite for MPI
Julien Adam   J.B. Besnard   Paul Canat   Sameer Shende   Hugo Taboada   Adrien Roussel   Marc Pérache   Julien Jaeger  
EuroMPI'23, 2023

Abstract

High-Performance Computing (HPC) is currently facing significant challenges. The hardware pressure has become increasingly difficult to manage due to the lack of parallel abstractions in applications. As a result, parallel programs must undergo drastic evolution to effectively exploit underlying hardware parallelism. Failure to do so results in inefficient code. In this pressing environment, parallel runtimes play a critical role, and their testing becomes crucial. This paper focuses on the MPI interface and leverages the MPI binding tools to develop a multi-language test-suite for MPI. By doing so, and building on previous work from the Forum's document editors, we implement systematic testing of MPI symbols in the context of the Parallel Computing Validation System (PCVS), which is an HPC validation platform dedicated to running and managing test-suites at scale. We first describe PCVS, then outline the process of generating the MPI API test-suite, and finally run these tests at scale. All data sets, code generators, and implementations are made available to the community as open source. We also set up a dedicated website showcasing the results, which self-updates thanks to the Spack package manager.
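The symbol-by-symbol generation idea can be illustrated with a toy generator. This is a hypothetical sketch in that spirit; the symbol list, template, and function names are illustrative and not the actual PCVS or MPI binding tools code:

```python
# Hypothetical sketch: emit one C smoke test per MPI symbol, in the spirit
# of generating a test-suite from the MPI API listing.
SYMBOLS = ["MPI_Send", "MPI_Recv", "MPI_Ibarrier"]

TEMPLATE = """#include <mpi.h>
int main(int argc, char **argv) {{
    MPI_Init(&argc, &argv);
    /* presence check: take the address of {sym} */
    void *p = (void *){sym};
    MPI_Finalize();
    return p == 0; /* fail if the symbol resolved to NULL */
}}
"""

def generate_tests(symbols):
    """Return a mapping from test file name to generated C source."""
    return {f"test_{s.lower()}.c": TEMPLATE.format(sym=s) for s in symbols}

for name in generate_tests(SYMBOLS):
    print(name)
```

A real generator would of course derive the symbol list from the MPI standard's API description and emit semantically meaningful calls, not just presence checks.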

Towards Achieving Transparent Malleability Thanks to MPI Process Virtualization
Hugo Taboada   Romain Pereira   Julien Jaeger   J.B. Besnard  
ISC High Performance 2023: High Performance Computing, p. 28-41, 2023

Abstract

The field of High-Performance Computing is rapidly evolving, driven by the race for computing power and the emergence of new architectures. Despite these changes, the process of launching programs has remained largely unchanged, even with the rise of hybridization and accelerators. However, there is a need to express more complex deployments for parallel applications to enable more efficient use of these machines. In this paper, we propose a transparent way to express malleability within MPI applications. This process relies on MPI process virtualization, facilitated by a dedicated privatizing compiler and a user-level scheduler. With this framework, using the MPC thread-based MPI context, we demonstrate how code can mold its resources without any software changes, opening the door to transparent MPI malleability. After detailing the implementation and associated interface, we present performance results on representative applications.

A methodology for assessing computation/communication overlap of MPI nonblocking collectives
Alexandre Denis   Julien Jaeger   Emmanuel Jeannot   Florian Reynier  
Concurr. Comput. Pract. Exp., 2022

Abstract

By allowing computation/communication overlap, MPI nonblocking collectives (NBC) are supposed to improve application scalability and performance. However, it is known that to actually get overlap, the MPI library has to implement progression mechanisms in software or rely on the network hardware. These mechanisms may be present or not, adequate or perfectible; they may have an impact on communication performance or may interfere with computation by stealing CPU cycles. From a user's point of view, assessing and understanding the behavior of an MPI library concerning computation/communication overlap is difficult. In this article, we propose a methodology to assess the computation/communication overlap of NBC. We propose new metrics to measure how much communication and computation overlap, and to evaluate how they interfere with each other. We integrate these metrics into a complete methodology. We compare our methodology with state-of-the-art metrics and benchmarks, and show that ours provides more meaningful information. We perform experiments on a large panel of MPI implementations and network hardware and show when and why overlap is efficient, nonexistent, or even degrades performance.
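One common way to quantify such overlap, sketched below, compares the overlapped run time against the serial and ideal bounds. This is a minimal formulation of my own for illustration, not necessarily the exact metric defined in the article:

```python
def overlap_ratio(t_comm, t_comp, t_together):
    """
    Overlap ratio from three measured times:
      t_comm:     communication alone
      t_comp:     computation alone
      t_together: nonblocking communication overlapped with computation
    1.0 = perfect overlap (t_together == max(t_comm, t_comp)),
    0.0 = no overlap      (t_together == t_comm + t_comp).
    """
    serial = t_comm + t_comp          # worst case: fully serialized
    ideal = max(t_comm, t_comp)       # best case: fully hidden
    if serial == ideal:               # degenerate: one phase takes no time
        return 1.0
    return (serial - t_together) / (serial - ideal)

print(overlap_ratio(1.0, 1.0, 2.0))  # no overlap -> 0.0
print(overlap_ratio(1.0, 1.0, 1.0))  # full overlap -> 1.0
```

A ratio below zero would indicate that the "overlapped" run is slower than running the two phases back to back, i.e. the progression mechanism actively interferes with computation.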

Towards leveraging collective performance with the support of MPI 4.0 features in MPC
Stéphane Bouhrour   Thibaut Pepin   Julien Jaeger  
Parallel Comput., p. 102860, 2022

MPI detach - Towards automatic asynchronous local completion
Joachim Protze   Marc-André Hermanns   Matthias S. Müller   Van Man Nguyen   Julien Jaeger   Emmanuelle Saillard   Patrick Carribault   Denis Barthou  
Parallel Comput., p. 102859, 2022

One core dedicated to MPI nonblocking communication progression? A model to assess whether it is worth it
Alexandre Denis   Julien Jaeger   Emmanuel Jeannot   Florian Reynier  
22nd IEEE International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022, Taormina, Italy, May 16-19, 2022, IEEE, p. 736-746, 2022

Abstract

Overlapping communications with computation is an efficient way to amortize the cost of communications of an HPC application. To do so, it is possible to utilize MPI nonblocking primitives so that communications run in the background alongside computation. However, these mechanisms rely on communications actually making progress in the background, which may not be true for all MPI libraries. Some MPI libraries leverage a core dedicated to communications to ensure communication progression. However, taking a core away from the application for such a purpose may have a negative impact on the overall execution time, and it may be difficult to know when such a dedicated core is actually helpful. In this paper, we propose a model for the performance of applications using MPI nonblocking primitives running on top of an MPI library with a dedicated core for communications. This model is used to: understand the compromise between the computation slowdown due to the communication core not being available for computation and the communication speed-up thanks to the dedicated core; evaluate whether nonblocking communication is actually obtaining the expected performance in the context of the given application; and predict the performance of a given application if run with a dedicated core. We describe the performance model, evaluate it on different applications, and compare the predictions of the model with actual executions.
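The trade-off such a model captures can be sketched with a toy formula. The assumptions here are mine, not the paper's exact formulation: computation is perfectly parallel over the cores it gets, communication fully progresses in the background with a dedicated core, and is not overlapped at all without one:

```python
def predicted_time(t_comp, t_comm, n_cores, dedicated):
    """Toy model of total run time with/without a progression core."""
    if dedicated:
        # One core is lost to progression, slowing computation down...
        compute = t_comp * n_cores / (n_cores - 1)
        # ...but communication is fully hidden behind it.
        return max(compute, t_comm)
    # Without a dedicated core, communication is serialized after compute.
    return t_comp + t_comm

# Dedicating a core pays off when communication is long relative to the
# compute slowdown caused by giving that core up:
print(predicted_time(10.0, 5.0, 16, dedicated=True))   # ~10.67
print(predicted_time(10.0, 5.0, 16, dedicated=False))  # 15.0
```

Under this toy model, the dedicated core wins whenever `t_comm > t_comp / (n_cores - 1)`, which hints at why the answer depends so strongly on core count and on the application's communication/computation balance.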

Enabling Global MPI Process Addressing in MPI Applications
Jean-Baptiste Besnard   Sameer Shende   Allen D. Malony   Julien Jaeger   Marc Pérache  
EuroMPI/USA'22: 29th European MPI Users' Group Meeting, Chattanooga, TN, USA, September 26 - 28, 2022, ACM, p. 27-36, 2022

Exploring Space-Time Trade-Off in Backtraces
Jean-Baptiste Besnard   Julien Adam   Allen D. Malony   Sameer Shende   Julien Jaeger   Patrick Carribault   Marc Pérache  
Tools for High Performance Computing 2018 / 2019, Springer International Publishing, p. 151-168, 2021

Abstract

The backtrace is one of the most common operations performed by profiling and debugging tools. It consists in determining the nesting of functions leading to the current execution state. Frameworks and standard libraries provide facilities enabling this operation; however, it generally incurs both computational and memory costs. Indeed, walking the stack up and then possibly resolving function pointers (to function names) before storing them can lead to non-negligible costs. In this paper, we propose to explore a means of extracting optimized backtraces with O(1) storage size by defining the notion of stack tags. We define a new data structure that we call a hashed-trie, used to encode stack traces at runtime through chained hashing. Our process, called stack-tagging, is implemented in a GCC plugin, enabling its use with C and C++ applications. A library enabling the decoding of stack locators through both static and brute-force analysis is also presented. This work introduces a new manner of capturing execution state which greatly simplifies both extraction and storage, two important issues in parallel profiling.
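A minimal sketch of the chained-hashing idea follows. It is illustrative only: the actual hashed-trie, its hash function, and the GCC plugin machinery differ, and the side table here merely plays the decoding role the paper assigns to its analysis library:

```python
import hashlib

def chain(parent_tag, call_site):
    """Derive a fixed-size tag from the parent's tag and a new call site."""
    return hashlib.sha256(parent_tag + call_site.encode()).digest()[:8]

class StackTagger:
    """Each function entry chains a hash onto the current tag, so a whole
    backtrace is captured in 8 bytes regardless of stack depth; a side
    table maps tag -> (parent tag, frame) to allow decoding later."""
    ROOT = b"\x00" * 8

    def __init__(self):
        self.table = {}           # tag -> (parent_tag, call_site)
        self.current = self.ROOT

    def push(self, call_site):
        tag = chain(self.current, call_site)
        self.table[tag] = (self.current, call_site)
        self.current = tag
        return tag

    def decode(self, tag):
        """Walk the side table back to the root to recover the backtrace."""
        frames = []
        while tag != self.ROOT:
            tag, site = self.table[tag]
            frames.append(site)
        return list(reversed(frames))

tagger = StackTagger()
tagger.push("main")
tag = tagger.push("solve")
print(tagger.decode(tag))  # ['main', 'solve']
```

The point of the construction is that the hot path (`push`) only hashes a few bytes and stores a constant-size tag, deferring all name resolution to decode time.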

Enhancing Load-Balancing of MPI Applications with Workshare
Thomas Dionisi   Stéphane Bouhrour   Julien Jaeger   Patrick Carribault   Marc Pérache  
Proceedings of EuroPar 2021, 2021

Partitioned Collective Communication
Daniel J. Holmes   Anthony Skjellum   Julien Jaeger   Ryan E. Grant   Purushotham V. Bangalore   Matthew G. F. Dosanjh   Amanda Bienz   Derek Schafer  
Workshop on Exascale MPI, ExaMPI@SC 2021, St. Louis, MO, USA, November 14, 2021, IEEE, p. 9-17, 2021

Preliminary Experience with OpenMP Memory Management Implementation
Adrien Roussel   Patrick Carribault   Julien Jaeger  
OpenMP: Portable Multi-Level Parallelism on Modern Systems - 16th International Workshop on OpenMP, IWOMP 2020, Austin, TX, USA, September 22-24, 2020, Proceedings, Springer, p. 313-327, 2020

Implementation and performance evaluation of MPI persistent collectives in MPC: a case study
Stéphane Bouhrour   Julien Jaeger  
EuroMPI/USA '20: 27th European MPI Users' Group Meeting, Virtual Meeting, Austin, TX, USA, September 21-24, 2020, ACM, p. 51-60, 2020

Application-Driven Requirements for Node Resource Management in Next-Generation Systems
Edgar A. León   Balazs Gerofi   Julien Jaeger   Guillaume Mercier   Rolf Riesen   Masamichi Takagi   Brice Goglin  
2020 IEEE/ACM International Workshop on Runtime and Operating Systems for Supercomputers, ROSS@SC 2020, Atlanta, GA, USA, November 13, 2020, IEEE, p. 1-11, 2020

PARCOACH Extension for Static MPI Nonblocking and Persistent Communication Validation
Van Man Nguyen   Emmanuelle Saillard   Julien Jaeger   Denis Barthou   Patrick Carribault  
4th IEEE/ACM International Workshop on Software Correctness for HPC Applications, Correctness@SC 2020, Atlanta, GA, USA, November 11, 2020, IEEE, p. 31-39, 2020

Automatic Code Motion to Extend MPI Nonblocking Overlap Window
Van Man Nguyen   Emmanuelle Saillard   Julien Jaeger   Denis Barthou   Patrick Carribault  
High Performance Computing - ISC High Performance 2020 International Workshops, Frankfurt, Germany, June 21-25, 2020, Revised Selected Papers, Springer, p. 43-54, 2020

Unifying the Analysis of Performance Event Streams at the Consumer Interface Level
Jean-Baptiste Besnard   Allen D. Malony   Sameer Shende   Marc Pérache   Patrick Carribault   Julien Jaeger  
Tools for High Performance Computing 2017, Springer International Publishing, p. 57-71, 2019

Abstract

Several instrumentation interfaces have been developed for parallel programs to make observable the actions that take place during execution and to make accessible information about the program's behavior and performance. Following in the footsteps of the successful profiling interface for MPI (PMPI), new rich interfaces exposing the internal operation of MPI (MPI-T) and OpenMP (OMPT) runtimes are now in the standards. Taking advantage of these interfaces requires tools to selectively collect events from multiple interfaces by various techniques: function interposition (PMPI), value read (MPI-T), and callbacks (OMPT). In this paper, we present the unified instrumentation pipeline proposed by the MALP infrastructure, which can be used to forward a variety of fine-grained events from multiple interfaces online to multi-threaded analysis processes implemented orthogonally with plugins. In essence, our contribution complements “front-end” instrumentation mechanisms with a generic “back-end” event consumption interface that allows “consumer” callbacks to generate performance measurements in various formats for analysis and transport. With such support, online and post-mortem cases become similar from an analysis point of view, making it possible to build more unified and consistent analysis frameworks. The paper describes the approach and demonstrates its benefits with several use cases.
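The consumer-callback pattern described here can be sketched as follows. The class and method names are illustrative, not MALP's actual API; the point is that events from heterogeneous front-ends reach every consumer through one uniform interface:

```python
class EventPipeline:
    """Toy unified pipeline: front-ends (PMPI, MPI-T, OMPT) push events,
    and registered consumer callbacks process them uniformly, so an
    analysis plugin never needs to know which interface produced an event."""

    def __init__(self):
        self.consumers = []

    def register(self, callback):
        """Attach a consumer; it will see every subsequent event."""
        self.consumers.append(callback)

    def emit(self, source, name, payload):
        """Called by a front-end adapter; fans the event out to consumers."""
        event = {"source": source, "name": name, "payload": payload}
        for cb in self.consumers:
            cb(event)

pipe = EventPipeline()
seen = []
pipe.register(lambda e: seen.append(e["source"]))
pipe.emit("PMPI", "MPI_Send", {"bytes": 1024})   # from interposition
pipe.emit("OMPT", "task_create", {})             # from a runtime callback
print(seen)  # ['PMPI', 'OMPT']
```

Because consumers only depend on the event shape, the same callback can run online against a live program or post-mortem against a recorded event stream, which is the unification the abstract argues for.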

Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor
Alexandre Denis   Julien Jaeger   Emmanuel Jeannot   Marc Pérache   Hugo Taboada  
Int. J. High Perform. Comput. Appl., 2019

Checkpoint/restart approaches for a thread-based MPI runtime
Julien Adam   Maxime Kermarquer   Jean-Baptiste Besnard   Leonardo Bautista-Gomez   Marc Pérache   Patrick Carribault   Julien Jaeger   Allen D. Malony   Sameer Shende  
Parallel Comput., p. 204-219, 2019

Detecting Non-sibling Dependencies in OpenMP Task-Based Applications
Ricardo Bispo Vieira   Antoine Capra   Patrick Carribault   Julien Jaeger   Marc Pérache   Adrien Roussel  
OpenMP: Conquering the Full Hardware Spectrum - 15th International Workshop on OpenMP, IWOMP 2019, Auckland, New Zealand, September 11-13, 2019, Proceedings, Springer, p. 231-245, 2019

Abstract

The advent of the multicore era led to the duplication of functional units through an increasing number of cores. To exploit those processors, a shared-memory parallel programming model is one possible direction. Thus, OpenMP is a good candidate to enable different paradigms: data parallelism (including loop-based directives) and control parallelism, through the notion of tasks with dependencies. But it is the programmer's responsibility to ensure that data dependencies are complete, so that no data races may happen. It might be complex to guarantee that no issue will occur and that all dependencies have been correctly expressed in the context of nested tasks. This paper proposes an algorithm to detect the data dependencies that might be missing from the OpenMP task clauses between tasks that have been generated by different parents. This approach is implemented inside a tool relying on the OMPT interface.
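The core check can be sketched as follows, on a simplified, hypothetical model of per-task read/write sets; the actual tool observes tasks dynamically through OMPT rather than from static sets:

```python
def missing_dependencies(tasks):
    """
    tasks: list of (task_id, parent_id, reads, writes), reads/writes as sets.
    Flags pairs of tasks with *different* parents that touch the same data
    with at least one write: OpenMP 'depend' clauses only order sibling
    tasks, so such non-sibling pairs may race even if every task declared
    its dependencies correctly.
    """
    issues = []
    for i, (id_a, par_a, r_a, w_a) in enumerate(tasks):
        for id_b, par_b, r_b, w_b in tasks[i + 1:]:
            if par_a == par_b:
                continue  # siblings: depend clauses can already order them
            conflict = (w_a & (r_b | w_b)) | (w_b & r_a)
            if conflict:
                issues.append((id_a, id_b, conflict))
    return issues

# Two tasks from different parents both touch 'x', one of them writing:
# flagged as a potential race no depend clause can fix.
print(missing_dependencies([
    ("t1", "p1", set(), {"x"}),
    ("t2", "p2", {"x"}, set()),
]))
```

The sibling check is what makes the problem specific to nested tasking: within one parent, the runtime can enforce the declared ordering, while across parents no such guarantee exists.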

Mixing ranks, tasks, progress and nonblocking collectives
Jean-Baptiste Besnard   Julien Jaeger   Allen D. Malony   Sameer Shende   Hugo Taboada   Marc Pérache   Patrick Carribault  
Proceedings of the 26th European MPI Users’ Group Meeting, EuroMPI 2019, Zürich, Switzerland, September 11-13, 2019, ACM, p. 10:1-10:10, 2019

Checkpoint/restart approaches for a thread-based MPI runtime
Julien Adam   Maxime Kermarquer   Jean-Baptiste Besnard   Leonardo Bautista-Gomez   Marc Pérache   Patrick Carribault   Julien Jaeger   Allen D. Malony   Sameer Shende  
CoRR, 2019

Exposition, clarification, and expansion of MPI semantic terms and conventions: is a nonblocking MPI function permitted to block?
Purushotham V. Bangalore   Rolf Rabenseifner   Daniel J. Holmes   Julien Jaeger   Guillaume Mercier   Claudia Blaas-Schenner   Anthony Skjellum  
Proceedings of the 26th European MPI Users' Group Meeting, EuroMPI 2019, Zürich, Switzerland, September 11-13, 2019, ACM, p. 2:1-2:10, 2019

Dynamic Placement of Progress Thread for Overlapping MPI Non-blocking Collectives on Manycore Processor
Alexandre Denis   Julien Jaeger   Emmanuel Jeannot   Marc Pérache   Hugo Taboada  
Euro-Par 2018: Parallel Processing - 24th International Conference on Parallel and Distributed Computing, Turin, Italy, August 27-31, 2018, Proceedings, Springer, p. 616-627, 2018

Efficient Communication/Computation Overlap with MPI+OpenMP Runtimes Collaboration
Marc Sergent   Mario Dagrada   Patrick Carribault   Julien Jaeger   Marc Pérache   Guillaume Papauré  
Euro-Par 2018: Parallel Processing - 24th International Conference on Parallel and Distributed Computing, Turin, Italy, August 27-31, 2018, Proceedings, Springer, p. 560-572, 2018

Transparent High-Speed Network Checkpoint/Restart in MPI
Julien Adam   Jean-Baptiste Besnard   Allen D. Malony   Sameer Shende   Marc Pérache   Patrick Carribault   Julien Jaeger  
Proceedings of the 25th European MPI Users’ Group Meeting, Barcelona, Spain, September 23-26, 2018, ACM, p. 12:1-12:11, 2018

Progress Thread Placement for Overlapping MPI Non-blocking Collectives Using Simultaneous Multi-threading
Alexandre Denis   Julien Jaeger   Hugo Taboada  
Euro-Par 2018: Parallel Processing Workshops - Euro-Par 2018 International Workshops, Turin, Italy, August 27-28, 2018, Revised Selected Papers, Springer, p. 123-133, 2018

Profile-guided scope-based data allocation method
Hugo Brunie   Julien Jaeger   Patrick Carribault   Denis Barthou  
Proceedings of the International Symposium on Memory Systems, MEMSYS 2018, Old Town Alexandria, VA, USA, October 01-04, 2018, ACM, p. 169-182, 2018

Towards a Better Expressiveness of the Speedup Metric in MPI Context
Jean-Baptiste Besnard   Allen D. Malony   Sameer Shende   Marc Pérache   Patrick Carribault   Julien Jaeger  
46th International Conference on Parallel Processing Workshops, ICPP Workshops 2017, Bristol, United Kingdom, August 14-17, 2017, IEEE Computer Society, p. 251-260, 2017

User Co-scheduling for MPI+OpenMP Applications Using OpenMP Semantics
Antoine Capra   Patrick Carribault   Jean-Baptiste Besnard   Allen D. Malony   Marc Pérache   Julien Jaeger  
Scaling OpenMP for Exascale Performance and Portability - 13th International Workshop on OpenMP, IWOMP 2017, Stony Brook, NY, USA, September 20-22, 2017, Proceedings, Springer, p. 203-216, 2017

Resource-Management Study in HPC Runtime-Stacking Context
Arthur Loussert   Benoit Welterlen   Patrick Carribault   Julien Jaeger   Marc Pérache   Raymond Namyst  
29th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2017, Campinas, Brazil, October 17-20, 2017, IEEE Computer Society, p. 177-184, 2017

Introducing Task-Containers as an Alternative to Runtime-Stacking
Jean-Baptiste Besnard   Julien Adam   Sameer Shende   Marc Pérache   Patrick Carribault   Julien Jaeger  
Proceedings of the 23rd European MPI Users’ Group Meeting, EuroMPI 2016, Edinburgh, United Kingdom, September 25-28, 2016, ACM, p. 51-63, 2016

Fine-grain data management directory for OpenMP 4.0 and OpenACC
Julien Jaeger   Patrick Carribault   Marc Pérache  
Concurr. Comput. Pract. Exp., p. 1528-1539, 2015

An MPI Halo-Cell Implementation for Zero-Copy Abstraction
Jean-Baptiste Besnard   Allen D. Malony   Sameer Shende   Marc Pérache   Patrick Carribault   Julien Jaeger  
Proceedings of the 22nd European MPI Users’ Group Meeting, EuroMPI 2015, Bordeaux, France, September 21-23, 2015, ACM, p. 3:1-3:9, 2015

Correctness Analysis of MPI-3 Non-Blocking Communications in PARCOACH
Julien Jaeger   Emmanuelle Saillard   Patrick Carribault   Denis Barthou  
Proceedings of the 22nd European MPI Users' Group Meeting, EuroMPI 2015, Bordeaux, France, September 21-23, 2015, ACM, p. 16:1-16:2, 2015

Data-Management Directory for OpenMP 4.0 and OpenACC
Julien Jaeger   Patrick Carribault   Marc Pérache  
Euro-Par 2013: Parallel Processing Workshops - BigDataCloud, DIHC, FedICI, HeteroPar, HiBB, LSDVE, MHPC, OMHI, PADABS, PROPER, Resilience, ROME, and UCHPC 2013, Aachen, Germany, August 26-27, 2013. Revised Selected Papers, Springer, p. 168-177, 2013

Binary Instrumentation for Scalable Performance Measurement of OpenMP Applications
Julien Jaeger   Peter Philippen   Eric Petit   Andres Charif Rubial   Christian Rössel   William Jalby   Bernd Mohr  
Parallel Computing: Accelerating Computational Science and Engineering (CSE), Proceedings of the International Conference on Parallel Computing, ParCo 2013, 10-13 September 2013, Garching (near Munich), Germany, IOS Press, p. 783-792, 2013

Transformations source-à-source pour l'optimisation de codes irréguliers et multithreads. (Source-to-source transformations for irregular and multithreaded code optimization)
Julien Jaeger  
Versailles Saint-Quentin-en-Yvelines University, France, 2012

Automatic efficient data layout for multithreaded stencil codes on CPUs and GPUs
Julien Jaeger   Denis Barthou  
19th International Conference on High Performance Computing, HiPC 2012, Pune, India, December 18-22, 2012, IEEE Computer Society, p. 1-10, 2012