Laurent COLOMBET

A research director with a PhD in Computer Science and an accreditation to supervise research (HDR), Laurent Colombet works in an R&D team at CEA that develops new models and parallelization techniques for HPC numerical simulation codes.

Laurent Colombet has (co-)supervised about twenty students, including four PhD theses and one postdoctoral researcher.

Some research topics:

  • Task-graph parallelization models; scheduling and placement of tasks on multicore nodes with GPUs.
  • In situ analysis systems.
  • Architecture of HPC physics codes for exaflop computers.
  • Parallelization and optimization of artificial intelligence (AI) methods in molecular dynamics for multicore nodes with GPUs.

An important goal for me is to propose highly “applied” HPC techniques and models, i.e. ones that can be implemented directly in the production codes used by physicists.
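
To illustrate the first research topic in its simplest form: a task becomes ready once all its predecessors in the graph are scheduled, and it is then placed on the worker that becomes idle first. The sketch below is a textbook greedy list-scheduling heuristic on identical workers, not the scheduler developed in this work; the function name `list_schedule` and the dictionary-based graph format are hypothetical, and GPUs, data transfers and placement constraints are not modeled.

```python
import heapq


def list_schedule(durations, deps, n_workers):
    """Greedy list scheduling of a task DAG on identical workers.

    durations: task -> cost; deps: task -> list of predecessor tasks.
    Returns (makespan, {task: (worker, start, end)}).
    """
    succs = {t: [] for t in durations}
    indeg = {t: 0 for t in durations}
    for task, preds in deps.items():
        for p in preds:
            succs[p].append(task)
            indeg[task] += 1

    earliest = {t: 0.0 for t in durations}          # earliest start allowed by predecessors
    ready = [t for t in durations if indeg[t] == 0]
    workers = [(0.0, w) for w in range(n_workers)]  # (time the worker is free, worker id)
    heapq.heapify(workers)
    schedule = {}

    while ready:
        # choose the ready task that may start first (ties broken by name)
        ready.sort(key=lambda t: (earliest[t], t))
        task = ready.pop(0)
        free_at, w = heapq.heappop(workers)         # worker that becomes idle first
        start = max(free_at, earliest[task])
        end = start + durations[task]
        schedule[task] = (w, start, end)
        heapq.heappush(workers, (end, w))
        for s in succs[task]:                       # release successors
            earliest[s] = max(earliest[s], end)
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)

    makespan = max(finish for _, _, finish in schedule.values())
    return makespan, schedule


# Diamond-shaped graph a -> {b, c} -> d on two workers.
makespan, sched = list_schedule(
    {"a": 2, "b": 3, "c": 1, "d": 2},
    {"b": ["a"], "c": ["a"], "d": ["b", "c"]},
    n_workers=2,
)
print(makespan)  # 7.0, the length of the critical path a -> b -> d
```

Real task-based runtimes refine this basic loop with priorities (e.g. critical-path length), data locality and heterogeneous costs per device.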

Supervised theses in progress

  • Estezr EL KHOURY, Exploring asynchronous programming models based on modern C++ for porting scientific applications to GPUs, Université Paris-Saclay

Supervised theses

  • E. CIEREN, Molecular Dynamics for Exascale Supercomputers, PhD CEA/Univ. Bordeaux, 2015.
  • J.-C. PAPIN, A Scheduling and Partitioning Model for Stencil-based Applications on Many-Core Devices, PhD CEA/ENS-Cachan, Paris Saclay, 2016.
  • E. DIRAND, Integration of High-Performance Task-Based In Situ for Molecular Dynamics on Exascale Computers, PhD CEA/Univ. Joseph-Fourier, Grenoble, 2018.
  • R. PRAT, Dynamic load balancing on exaflop supercomputers applied to molecular dynamics, PhD CEA/Univ. Bordeaux, 2019.

Supervised postdoctoral researcher

  • A. GIARD, Implementation of polymer processing in the molecular dynamics code ExaStamp, CEA, 2017.

Publications

SPAWN: An Iterative, Potentials-Based, Dynamic Scheduling and Partitioning Tool
Jean-Charles Papin   Christophe Denoual   Laurent Colombet   Raymond Namyst  
INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, SPRINGER/PLENUM PUBLISHERS, p. 81-103, 2021

Abstract

Many physics modeling applications use regular meshes on which computations of highly variable cost over time can occur. Distributing the underlying cells over manycore architectures is a critical load balancing step that should be performed as infrequently as possible. Graph partitioning tools are known to be very effective for such problems, but they exhibit scalability problems as the number of cores and the number of cells increase. We introduce a dynamic task scheduling and mesh partitioning approach inspired by physical particle interactions. Our method virtually moves cores over a 2D/3D mesh of tasks and uses a Voronoi domain decomposition to balance the workload. Displacements of cores are the result of force computations using a carefully chosen pair potential. We evaluate our method against graph partitioning tools and existing task schedulers with a representative physical application, and demonstrate the relevance of our approach.
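
The core idea, cores treated as particles whose Voronoi cells define their subdomains, can be illustrated with a toy 2D version. This is a simplified sketch under stated assumptions: each mesh cell is assigned to the nearest core, and each core is pulled toward every more heavily loaded core with a force proportional to their normalised load difference. That force law is a hypothetical stand-in, not the carefully chosen pair potential of the paper, and the names `voronoi_loads` and `relax_step` are illustrative.

```python
import math


def voronoi_loads(cost, cores):
    """Assign every cell (i, j) of a 2D mesh to its nearest core, i.e. a
    discrete Voronoi decomposition, and return the total cost per core."""
    loads = [0.0] * len(cores)
    for (i, j), c in cost.items():
        cx, cy = i + 0.5, j + 0.5  # cell centre
        # nearest core; min() keeps the lowest index on ties
        k = min(range(len(cores)),
                key=lambda n: (cores[n][0] - cx) ** 2 + (cores[n][1] - cy) ** 2)
        loads[k] += c
    return loads


def relax_step(cost, cores, dt=0.1):
    """Move each core under a hypothetical load-difference pair force: a core
    is pulled toward every more heavily loaded core, which shifts their shared
    Voronoi bisector toward the overloaded core and shrinks its domain."""
    loads = voronoi_loads(cost, cores)
    mean = sum(loads) / len(loads)
    moved = []
    for a, (xa, ya) in enumerate(cores):
        fx = fy = 0.0
        for b, (xb, yb) in enumerate(cores):
            if b == a or loads[b] <= loads[a]:
                continue                      # only heavier cores attract
            dx, dy = xb - xa, yb - ya
            r = math.hypot(dx, dy)
            if r == 0.0:
                continue
            f = (loads[b] - loads[a]) / mean  # normalised load excess
            fx += f * dx / r
            fy += f * dy / r
        moved.append((xa + dt * fx, ya + dt * fy))
    return moved


# 4 x 2 mesh whose first column is three times as expensive; two cores.
cost = {(i, j): (3.0 if i == 0 else 1.0) for i in range(4) for j in range(2)}
cores = [(1.0, 1.0), (3.0, 1.0)]
print(voronoi_loads(cost, cores))  # [8.0, 4.0]
for _ in range(30):
    cores = relax_step(cost, cores)
print(voronoi_loads(cost, cores))  # [6.0, 6.0]
```

In this toy case the lighter core drifts toward the heavier one until it captures the second column, at which point the loads are equal and the force vanishes.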

AMR-based molecular dynamics for non-uniform, highly dynamic particle simulations
Raphael Prat   Thierry Carrard   Laurent Soulard   Olivier Durand   Raymond Namyst   Laurent Colombet  
COMPUTER PHYSICS COMMUNICATIONS, ELSEVIER, 2020

Abstract

Accurate simulations of metal under heavy shocks, leading to fragmentation and ejection of particles, cannot be achieved with simple hydrodynamic models and must be performed at the atomic scale using molecular dynamics methods. To cope with billions of particles subject to short-range interactions, such molecular dynamics methods need to be highly optimized for massively parallel supercomputers. In this paper, we propose to leverage Adaptive Mesh Refinement techniques to improve the efficiency of molecular dynamics codes on highly heterogeneous particle configurations. We introduce a series of techniques that optimize the force computation loop using multi-threading and vectorization-friendly data structures. Our design is guided by the need for load balancing and adaptivity raised by highly dynamic particle sets. We analyze performance results on several simulation scenarios, such as the production of an ejecta cloud from shock-loaded metallic surfaces, using a large number of nodes equipped with Intel Xeon Phi Knights Landing processors. Our new molecular dynamics code achieves speedups greater than 1.38 over the state-of-the-art LAMMPS implementation.
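
In most short-range molecular dynamics codes, the force loop relies on a cell decomposition: particles are binned into cells at least one cutoff radius wide, so each particle only has to be tested against particles in its own and adjacent cells. The sketch below shows that uniform linked-cell neighbour search in 3D and checks it against a brute-force search. It does not show the adaptive refinement of cells described in the paper, nor its multi-threading or vectorization-friendly layouts; the function names are hypothetical.

```python
import itertools
import math
import random
from collections import defaultdict


def neighbour_pairs(positions, cutoff):
    """All pairs (i, j), i < j, closer than `cutoff`, found with a uniform
    linked-cell grid whose cells are exactly `cutoff` wide."""
    cells = defaultdict(list)
    for idx, p in enumerate(positions):
        cells[tuple(math.floor(x / cutoff) for x in p)].append(idx)

    pairs = set()
    for key, members in cells.items():
        # a partner within the cutoff can only sit in one of the 27 cells around `key`
        for offset in itertools.product((-1, 0, 1), repeat=3):
            neighbour = tuple(k + o for k, o in zip(key, offset))
            for j in cells.get(neighbour, ()):  # .get() does not create empty cells
                for i in members:
                    if i < j and math.dist(positions[i], positions[j]) < cutoff:
                        pairs.add((i, j))
    return pairs


def brute_force_pairs(positions, cutoff):
    """O(n^2) reference used to check the cell-list result."""
    n = len(positions)
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if math.dist(positions[i], positions[j]) < cutoff}


random.seed(0)
pts = [tuple(random.uniform(0.0, 10.0) for _ in range(3)) for _ in range(300)]
print(neighbour_pairs(pts, 1.5) == brute_force_pairs(pts, 1.5))  # True
```

With uniform cells, the cost per particle grows with the local density; adapting the cell structure to strongly non-uniform densities is precisely the problem the AMR approach addresses.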

Influence of the phase transitions of shock-loaded tin on microjetting and ejecta production using molecular dynamics simulations
O. Durand   L. Soulard   L. Colombet   R. Prat  
JOURNAL OF APPLIED PHYSICS, AMER INST PHYSICS, 2020

Abstract

We perform very large scale molecular dynamics (MD) simulations to investigate the ejection process from shock-loaded tin surfaces in regimes where the metal first undergoes solid-to-solid phase transitions and then melts on release. Under these conditions, a classical two-wave structure propagates within the metal. When it interacts with the surface, our MD simulations reveal very different behaviors depending on the surface geometry. If the surface is perfectly flat or contains nearly flat perturbations of sinusoidal type, a solid cap made of crystallites forms at the free surface over a thickness of a few tens of nanometers. This surface cap melts more slowly than the bulk, and as a result the ejection process is greatly slowed down. If the surface contains V-shaped geometrical perturbations, the oblique interaction of the incident shock wave with the planar interface of the defect leads to a sharp increase in temperature at the bottom of the defect. There, the metal undergoes a solid-to-liquid phase change over the entire length of the groove, which promotes the ejection of matter in the form of sheets of liquid metal. However, this phase change is not spatially uniform, and the sheets retain a memory of this process, exhibiting a non-uniform leading edge and large ripples. These ripples grow over time and eventually cause the sheets to fragment as they develop. In this case, the fragmentation is non-uniform, and it differs from the rather uniform fragmentation process observed when the metal melts directly upon shock loading.

Comparative simulations of microjetting using atomistic and continuous approaches in the presence of viscosity and surface tension
O. Durand   S. Jaouen   L. Soulard   O. Heuze   L. Colombet   E. Cieren  
JOURNAL OF APPLIED PHYSICS, AMER INST PHYSICS, 2017

Abstract

We compare, at similar scales, the processes of microjetting and ejecta production from shocked roughened metal surfaces using atomistic and continuous approaches. The atomistic approach is based on very large scale molecular dynamics (MD) simulations of systems containing up to 700 × 10^6 atoms. The continuous approach is based on Eulerian hydrodynamics simulations with adaptive mesh refinement; the simulations take into account the effects of viscosity and surface tension, and the equation of state is calculated from the MD simulations. The microjetting is generated by shock-loading a three-dimensional tin crystal above its melting point; the crystal has an initial sinusoidal free-surface perturbation and is in contact with a vacuum. Several samples with homothetic defect wavelengths and amplitudes are simulated in order to investigate the influence of the viscosity and surface tension of the metal. The simulations show that the hydrodynamic code reproduces, in very good agreement with the MD simulations, the profiles of ejected mass and velocity along the jet. Both codes also exhibit a similar fragmentation phenomenology for the ejected liquid metal sheets, although the fragmentation seed is different. We show, in particular, that in the continuous approach it depends on the mesh size.

Molecular dynamics simulations of shock compressed heterogeneous materials. I. The porous case
L. Soulard   N. Pineau   J. Clerouin   L. Colombet  
JOURNAL OF APPLIED PHYSICS, AMER INST PHYSICS, 2015

Abstract

The propagation of an incident shock and of the subsequent rarefaction and compression waves in a porous medium is analysed from a set of large-scale molecular dynamics simulations. The porous material is modeled by a collection of spherical pores, empty or filled with dense gaseous argon, enclosed in a copper matrix. We observe that pore collapse induces strong local disorder in the matrix even for shock intensities below the melting point of shocked copper. Various mechanisms are considered, and a detailed analysis of the numerical results shows that the melting around an isolated pore is mainly due to the plastic work induced by the collapse, a result that can be extended to more complicated pore shapes. The systematic study of the influence of the shock intensity, the pore size, and the presence of a filling gas shows that melting is mainly inhibited by the presence of the gas. The final structure strongly depends on the interactions between the waves resulting from the various reflections of the initial shock at the sample boundaries, implying that evaluating the incident shock intensity from post-mortem analyses requires knowledge of the full history of the sample.

Molecular dynamics simulations of shock compressed heterogeneous materials. II. The graphite/diamond transition case for astrophysics applications
N. Pineau   L. Soulard   L. Colombet   T. Carrard   A. Pelle   Ph. Gillet   J. Clerouin  
JOURNAL OF APPLIED PHYSICS, AMER INST PHYSICS, 2015

Abstract

We present a series of molecular dynamics simulations of the shock compression of copper matrices containing a single graphite inclusion. These model systems can be related to some specific carbon-rich rocks which, after a meteoritic impact, are found to contain small fractions of nanodiamonds embedded in graphite in the vicinity of high-impedance minerals. We show that the graphite-to-diamond transformation occurs readily for nanometer-sized graphite inclusions, via a shock accumulation process, provided the pressure threshold of the bulk graphite/diamond transition is exceeded, independently of the shape or size of the inclusion. Although high diamond yields (about 80%) are found after a few picoseconds in all cases, the transition is non-isotropic and depends substantially on the orientation of the graphite stack relative to the shock propagation direction, leading to distinct nucleation processes and size distributions of the diamond grains. Substantial regraphitization occurs upon release, and only inclusions with favorable orientations are likely to preserve a fraction of this diamond phase. These results agree qualitatively with recent experimental observations of meteoritic impact samples.

Speedup and efficiency of large-size applications on heterogeneous networks
L. Colombet   L. Desbat  
Theoretical Computer Science, p. 31-44, 1998

Abstract

Programming environments are now commonly used for parallelism on networks of workstations. There is a need for simple and consistent tools to measure algorithm performance on heterogeneous networks. In this work we propose a generalization to heterogeneous networks of the classical efficiency formula E(N) = S(N)/N, where S(N) is the speedup on N processors.
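
The classical definitions can be written down directly: with T(1) the sequential time and T(N) the time on N processors, S(N) = T(1)/T(N) and E(N) = S(N)/N. The abstract does not give the heterogeneous generalization, so the last function below is only one plausible choice, labeled as an assumption: N is replaced by the aggregate speed of the network expressed in units of the reference machine on which T(1) was measured, so that it reduces to the classical formula when all machines are identical.

```python
def speedup(t_seq, t_par):
    """S(N) = T(1) / T(N)."""
    return t_seq / t_par


def efficiency(t_seq, t_par, n):
    """Classical efficiency E(N) = S(N) / N on N identical processors."""
    return speedup(t_seq, t_par) / n


def efficiency_heterogeneous(t_seq_ref, t_par, speeds, ref_speed):
    """ASSUMED generalisation (not taken from the paper): divide the speedup by
    the aggregate speed of the network in units of the reference machine on
    which t_seq_ref was measured. Equals efficiency() when all speeds match."""
    return speedup(t_seq_ref, t_par) / (sum(speeds) / ref_speed)


print(efficiency(100.0, 25.0, 4))                                   # 1.0
print(efficiency_heterogeneous(100.0, 40.0, [1.0, 1.0, 2.0], 1.0))  # 0.625
```

In the second example the network is worth four reference processors, so a speedup of 2.5 corresponds to an efficiency of 0.625 rather than the misleading 2.5/3 obtained by counting machines.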

Which approach to parallelizing scientific codes – That is the question
Jean-Yves Berthou   Laurent Colombet  
Parallel Computing, p. 165-180, 1997

Abstract

We present in this paper the strong points and limitations of semi-automatic parallelization, data-parallel programming and message-passing programming. We apply these to two numerical algorithms, namely a two-dimensional Fourier transform and a conjugate gradient, each implemented with each of the three methods on a Cray T3D. The results of these experiments support our claim that combining the three methods makes it possible to achieve efficiency, portability and ease of parallel programming.
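
For reference, the conjugate gradient kernel can be sketched sequentially. In a parallel version, each iteration is dominated by the matrix-vector product and by the dot products, which become global reductions. This is a plain textbook CG on a dense matrix stored as a list of rows, assumed here for simplicity; it is not the implementation studied in the paper.

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for a symmetric positive-definite A given as a list of rows."""
    def matvec(v):
        return [sum(a * vj for a, vj in zip(row, v)) for row in A]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = [0.0] * len(b)
    r = list(b)   # residual b - A x for the initial guess x = 0
    p = list(r)   # first search direction
    rr = dot(r, r)
    for _ in range(max_iter or 10 * len(b)):
        if rr ** 0.5 < tol:
            break
        Ap = matvec(p)            # needs communication in a distributed version
        alpha = rr / dot(p, Ap)   # dot products become global reductions
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


print(conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))  # approximately [1/11, 7/11]
```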

Methods to Overlap Communications in Parallel Numerical Algorithms
Christophe Calvin   Laurent Colombet   Philippe Michallon  
International Journal of Foundations of Computer Science, p. 211-235, 1997

Speedup and efficiency of large size applications on heterogeneous networks
L. Colombet   L. Desbat  
Euro-Par'96 Parallel Processing, Springer Berlin Heidelberg, p. 651-664, 1996

Abstract

Programming environments are now commonly used for parallelism on networks of workstations. That is why there is a need for simple and consistent tools to measure algorithm performance on heterogeneous networks. In this work we propose a generalization to heterogeneous networks of the classical efficiency formula E(N) = S(N)/N, where S(N) is the speedup on N processors.

Parallel matrix-vector product on rings with a minimum of communications
L. Colombet   Ph. Michallon   D. Trystram  
Parallel Computing, p. 289-310, 1996

Abstract

We propose in this paper a new parallel algorithm for computing the matrix-vector product on a ring of p processors. This solution overlaps as much communication as possible with computation. Simulations and experiments on a Paragon are given to confirm the value of this algorithm.
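
The ring setting can be illustrated with the classic block-rotation scheme, simulated sequentially: each of the p processors owns a block of rows of A and one block of x, and the x blocks circulate one hop per step until every processor has used all of them. This is a generic textbook scheme, not the communication-minimizing algorithm proposed in the paper; the comment marks where a real implementation would post the send of the current block so that it overlaps with the local computation.

```python
def ring_matvec(A, x, p):
    """Sequential simulation of y = A x on a ring of p processors.

    Processor k owns rows k*m .. (k+1)*m - 1 of A and, initially, block k of x
    (m = n / p). After each step every processor forwards its x block to its
    left neighbour, so after p steps each one has used all p blocks.
    """
    n = len(x)
    m = n // p
    if m * p != n:
        raise ValueError("this sketch assumes n is a multiple of p")
    held = [x[k * m:(k + 1) * m] for k in range(p)]  # x block held by processor k
    which = list(range(p))                           # index of that block
    y = [[0.0] * m for _ in range(p)]
    for _ in range(p):
        # A real code would post the send of held[k] here so that it travels
        # while the local product below is being computed.
        for k in range(p):
            b = which[k]
            for i in range(m):
                row = A[k * m + i]
                y[k][i] += sum(row[b * m + j] * held[k][j] for j in range(m))
        # ring shift: processor k now holds what processor k + 1 held
        held = held[1:] + held[:1]
        which = which[1:] + which[:1]
    return [v for block in y for v in block]


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
x = [1, 0, 2, 1]
print(ring_matvec(A, x, 2))  # [11.0, 27.0, 43.0, 59.0]
```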

Performance evaluation and modeling of collective communications on Cray T3D
C. Calvin   L. Colombet  
Parallel Computing, p. 1413-1427, 1996

Abstract

We present in this paper the results of various communication benchmarks on a Cray T3D MPP system. They cover the communication schemes most commonly used in parallel applications and numerical kernels, and were implemented with the PVM message-passing library on the Cray T3D. For each benchmark, we propose a model depending on the size of the message communicated and the number of processors involved. We verify that the error between the proposed model and the measurements is very small (0.8% on average for point-to-point communications and 3% on average for collective communications).
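
A common starting point for such models is the linear latency/bandwidth form T(m) = α + βm for a point-to-point message of m bytes, fitted by least squares, with collective operations expressed as combinations of point-to-point steps. The sketch assumes this form and, for the broadcast, a binomial tree of ⌈log₂ p⌉ rounds; the models actually proposed in the paper may differ, and the timings in the example are synthetic.

```python
import math


def fit_latency_bandwidth(sizes, times):
    """Least-squares fit of T(m) = alpha + beta * m to point-to-point timings."""
    n = len(sizes)
    mean_m = sum(sizes) / n
    mean_t = sum(times) / n
    cov = sum((m - mean_m) * (t - mean_t) for m, t in zip(sizes, times))
    var = sum((m - mean_m) ** 2 for m in sizes)
    beta = cov / var                # time per byte (inverse bandwidth)
    alpha = mean_t - beta * mean_m  # start-up latency
    return alpha, beta


def broadcast_time(alpha, beta, m, p):
    """Predicted broadcast of m bytes to p processors, ASSUMING a binomial
    tree: ceil(log2 p) rounds, each one point-to-point message."""
    if p <= 1:
        return 0.0
    return math.ceil(math.log2(p)) * (alpha + beta * m)


# Synthetic timings in microseconds (hypothetical: alpha = 10, beta = 0.002).
sizes = [0, 1000, 2000, 4000]
times = [10.0 + 0.002 * m for m in sizes]
alpha, beta = fit_latency_bandwidth(sizes, times)
print(round(alpha, 6), round(beta, 6))                 # 10.0 0.002
print(round(broadcast_time(alpha, beta, 1000, 8), 6))  # 36.0
```

Comparing such predictions with measured times, message size by message size, gives the kind of average relative error quoted in the abstract.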

Overlapping techniques of communications
C. Calvin   L. Colombet   P. Michallon  
High-Performance Computing and Networking, Springer Berlin Heidelberg, p. 600-605, 1995

Abstract

We present in this paper general techniques for overlapping communications in parallel numerical kernels. We first describe some dependency schemes found in most parallel numerical algorithms, and apply to these schemes methods based on changing the granularity of the computational tasks. The choice of granularity that yields a good overlap depends on the main parameters of the target machine. We apply these overlapping techniques to classical numerical kernels, namely the matrix-vector product and the two-dimensional FFT, implemented on a T3D and a Paragon. The results of these experiments confirm the validity of this approach.
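
Why the granularity depends on machine parameters can be seen with a simple two-stage pipeline cost model, assumed here and not taken from the paper: m words are received in k chunks, and each chunk is processed while the next one is in flight. With latency α per message, transfer time β per word and computation time γ per word, more chunks mean more overlap but also more latencies. The parameter values below are hypothetical.

```python
def pipelined_time(m, k, alpha, beta, gamma):
    """Time to receive m words in k chunks and process them, chunk i being
    processed while chunk i + 1 is in flight (two-stage pipeline).
    alpha: latency per message, beta: transfer time per word,
    gamma: computation time per word. k = 1 means no overlap."""
    comm = alpha + beta * m / k
    comp = gamma * m / k
    return comm + (k - 1) * max(comm, comp) + comp


def best_chunk_count(m, alpha, beta, gamma, k_max=1024):
    """Granularity minimising the modelled time, by exhaustive search."""
    return min(range(1, k_max + 1),
               key=lambda k: pipelined_time(m, k, alpha, beta, gamma))


# Hypothetical machine parameters.
m, alpha, beta, gamma = 1000, 10.0, 1.0, 1.0
print(pipelined_time(m, 1, alpha, beta, gamma))     # 2010.0 (no overlap)
k = best_chunk_count(m, alpha, beta, gamma)
print(k, pipelined_time(m, k, alpha, beta, gamma))  # 10 1200.0
```

Here the model reduces to 10k + 1000 + 1000/k: the total latency cost grows with k while the non-overlapped computation of the last chunk shrinks, so the optimum is k = 10. A machine with a smaller latency α would favour a finer granularity.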

Towards mixed computation/communication in parallel scientific libraries
C. Calvin   L. Colombet   F. Desprez   B. Jargot   P. Michallon   B. Tourancheau   D. Trystram  
Parallel Processing: CONPAR 94 – VAPP VI, Springer Berlin Heidelberg, p. 605-615, 1994

Abstract

This paper presents a technique for overlapping communications with computations based on pipelined communications. It improves the execution time of most parallel numerical algorithms. Two simple examples, the matrix-vector product and the two-dimensional Fast Fourier Transform, are developed to illustrate the efficiency of this technique. Moreover, we propose a unified formalism to easily express the pipelined versions of these algorithms. Finally, we report some experiments on various parallel machines.

Star modeling on IBM RS6000 networks using PVM
L. Colombet   L. Desbat   F. Menard  
Proceedings The 2nd International Symposium on High Performance Distributed Computing, p. 121-128, 1993