Laurent Colombet is a research director who holds a PhD in Computer Science and a habilitation to supervise research (HDR). He works in an R&D team at CEA whose goal is to develop new parallelization models and techniques for HPC numerical simulation codes.
Laurent Colombet has (co-)supervised about twenty students, including four PhD theses and one postdoctoral fellowship.
Selected research topics
- Task-graph parallelization model, task scheduling and placement on a multi-core node with GPU accelerators.
- In situ analysis systems.
- Architecture of HPC physics codes for exascale computers.
- Parallelization and optimization of artificial intelligence (AI) methods in molecular dynamics for multi-core nodes with GPU accelerators.
An important point for me is to propose HPC techniques and models that are highly "applied", that is, directly implementable in codes used in production by physicists.
Ongoing co-supervised theses
- Estezr EL KHOURY, Exploration of asynchronous programming models based on modern C++ for porting scientific applications to GPUs, Université Paris-Saclay
Co-supervised theses
- E. CIEREN, Molecular dynamics for Exascale machines, PhD thesis, CEA/Univ. Bordeaux, 2015.
- J.-C. PAPIN, Scheduling and partitioning model for applications with regular meshes and computations on ManyCore-type accelerators, PhD thesis, CEA/ENS-Cachan, Paris Saclay, 2016.
- E. DIRAND, Development of a task-based in situ system for a classical molecular dynamics code suited to exascale machines, PhD thesis, CEA/Univ. Joseph-Fourier, Grenoble, 2018.
- R. PRAT, Dynamic load balancing on exascale supercomputers applied to molecular dynamics, PhD thesis, CEA/Univ. Bordeaux, 2019.
Co-supervised postdoctoral fellowships
- A. GIARD, Implementation of polymer handling in the ExaStamp molecular dynamics code, CEA, 2017.
INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, SPRINGER/PLENUM PUBLISHERS, p. 81-103, 2021
Abstract
Many applications of physics modeling use regular meshes on which computations of highly variable cost over time can occur. Distributing the underlying cells over manycore architectures is a critical load balancing step that should be performed as infrequently as possible. Graph partitioning tools are known to be very effective for such problems, but they exhibit scalability problems as the number of cores and the number of cells increase. We introduce a dynamic task scheduling and mesh partitioning approach inspired by physical particle interactions. Our method virtually moves cores over a 2D/3D mesh of tasks and uses a Voronoi domain decomposition to balance workload. Displacements of cores are the result of force computations using a carefully chosen pair potential. We evaluate our method against graph partitioning tools and existing task schedulers with a representative physical application, and demonstrate the relevance of our approach.
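As a rough illustration of the Voronoi step described in this abstract, the sketch below assigns each cell of a small 2D task mesh to its nearest virtual core and sums the per-cell cost per core. It is not the paper's implementation: the function name `voronoi_partition`, the core coordinates, and the cost values are hypothetical, and the force-based displacement of cores is left out.

```python
from math import hypot

def voronoi_partition(costs, cores):
    """Assign each mesh cell to its nearest core (a discrete Voronoi decomposition).

    costs: 2D list, costs[y][x] is the cost of the task at cell (x, y).
    cores: list of (x, y) virtual core positions (hypothetical values).
    Returns (owner, load): owner[y][x] is a core index, load[i] the summed cost.
    """
    owner = []
    load = [0.0] * len(cores)
    for y, row in enumerate(costs):
        owner_row = []
        for x, cost in enumerate(row):
            # Nearest core by Euclidean distance; ties go to the lowest index.
            best = min(range(len(cores)),
                       key=lambda i: hypot(x - cores[i][0], y - cores[i][1]))
            owner_row.append(best)
            load[best] += cost
        owner.append(owner_row)
    return owner, load

# 4x4 mesh whose right-hand half is four times more expensive (made-up costs).
costs = [[1, 1, 4, 4] for _ in range(4)]
_, load_centered = voronoi_partition(costs, [(0.5, 1.5), (2.5, 1.5)])
_, load_shifted = voronoi_partition(costs, [(1.2, 1.5), (3.2, 1.5)])
print(load_centered, load_shifted)
```

In this toy case, shifting both cores toward the expensive region shrinks the domain of the overloaded core, which is the kind of effect the core displacements in the abstract are meant to produce.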
COMPUTER PHYSICS COMMUNICATIONS, ELSEVIER, 2020
Abstract
Accurate simulations of metal under heavy shocks, leading to fragmentation and ejection of particles, cannot be achieved with simple hydrodynamic models and must be performed at the atomic scale using molecular dynamics methods. In order to cope with billions of particles exposed to short range interactions, such molecular dynamics methods need to be highly optimized for massively parallel supercomputers. In this paper, we propose to leverage Adaptive Mesh Refinement techniques to improve the efficiency of molecular dynamics code on highly heterogeneous particle configurations. We introduce a series of techniques that optimize the force computation loop using multi-threading and vectorization-friendly data structures. Our design is guided by the need for load balancing and adaptivity raised by highly dynamic particle sets. We analyze performance results on several simulation scenarios, such as the production of an ejecta cloud from shock-loaded metallic surfaces, using a large number of nodes equipped with Intel Xeon Phi Knights Landing processors. Our new Molecular Dynamics code achieves speedups greater than 1.38 over the state-of-the-art LAMMPS implementation. (C) 2020 Published by Elsevier B.V.
JOURNAL OF APPLIED PHYSICS, AMER INST PHYSICS, 2020
Abstract
We perform very large scale molecular dynamics (MD) simulations to investigate the ejection process from shock-loaded tin surfaces in regimes where the metal first undergoes solid to solid phase transitions and then melts on release. In these conditions, a classical two-wave structure propagates within the metal. When it interacts with the surface, our MD simulations reveal very different behaviors. If the surface geometry is perfectly flat or contains almost flat perturbations (sinusoidal type), a solid cap made of crystallites forms at the free surface, over a thickness of a few tens of nanometers. This surface cap melts more slowly than the bulk, and as a result, the ejection process is greatly slowed down. If the surface geometry contains V-shape geometrical perturbations, the oblique interaction of the incident shock wave with the planar interface of the defect leads to a sharp increase of temperature at the defect's bottom. At this place, the metal undergoes a solid to liquid phase change over the entire length of the groove, and this promotes the ejection of matter in the form of sheets of liquid metal. However, this phase change is not spatially uniform, and the sheets keep in memory this process by exhibiting a non-uniform leading edge and large ripples. These ripples grow over time, which ends up causing the fragmentation of the sheets as they develop. In this case, the fragmentation is non-uniform, and it differs from the rather uniform fragmentation process observed when the metal directly melts upon receiving the shock.
JOURNAL OF APPLIED PHYSICS, AMER INST PHYSICS, 2017
Abstract
We compare, at similar scales, the processes of microjetting and ejecta production from shocked roughened metal surfaces by using atomistic and continuous approaches. The atomistic approach is based on very large scale molecular dynamics (MD) simulations with systems containing up to 700 × 10^6 atoms. The continuous approach is based on Eulerian hydrodynamics simulations with adaptive mesh refinement; the simulations take into account the effects of viscosity and surface tension, and the equation of state is calculated from the MD simulations. The microjetting is generated by shock-loading above its fusion point a three-dimensional tin crystal with an initial sinusoidal free surface perturbation, the crystal being set in contact with a vacuum. Several samples with homothetic wavelengths and amplitudes of defect are simulated in order to investigate the influence of viscosity and surface tension of the metal. The simulations show that the hydrodynamic code reproduces with very good agreement the profiles, calculated from the MD simulations, of the ejected mass and velocity along the jet. Both codes also exhibit a similar fragmentation phenomenology of the metallic liquid sheets ejected, although the fragmentation seed is different. We show in particular, that it depends on the mesh size in the continuous approach. Published by AIP Publishing.
JOURNAL OF APPLIED PHYSICS, AMER INST PHYSICS, 2015
Abstract
The propagation of an incident shock and subsequent rarefaction and compression waves in a porous medium are analysed from a set of large scale molecular dynamics simulations. The porous material is modeled by a collection of spherical pores, empty or filled with dense gaseous argon, enclosed in a copper matrix. We observe that the pore collapse induces a strong local disorder in the matrix even for shock intensities below the melting point of shocked copper. Various mechanisms are considered and a detailed analysis of the numerical results shows that the melting around an isolated pore is mainly due to the plastic work induced by the collapse: a result that can be extended to more complicated pore shapes. The systematic study of the influence of the shock intensity, the pore size, and the presence of a filling gas shows that the melting is mainly inhibited by the presence of the gas. The final structure strongly depends on the interactions between the waves resulting from the various reflections of the initial shock at the sample boundaries, implying that the evaluation of the incident shock intensity based on post-mortem analyses requires a knowledge of the full history of the sample. (C) 2015 AIP Publishing LLC.
JOURNAL OF APPLIED PHYSICS, AMER INST PHYSICS, 2015
Abstract
We present a series of molecular dynamics simulations of the shock compression of copper matrices containing a single graphite inclusion: these model systems can be related to some specific carbon-rich rocks which, after a meteoritic impact, are found to contain small fractions of nanodiamonds embedded in graphite in the vicinity of high impedance minerals. We show that the graphite to diamond transformation occurs readily for nanometer-sized graphite inclusions, via a shock accumulation process, provided the pressure threshold of the bulk graphite/diamond transition is overcome, independently of the shape or size of the inclusion. Although high diamond yields (~80%) are found after a few picoseconds in all cases, the transition is non-isotropic and depends substantially on the relative orientation of the graphite stack with respect to the shock propagation, leading to distinct nucleation processes and size-distributions of the diamond grains. A substantial regraphitization process occurs upon release and only inclusions with favorable orientations likely lead to the preservation of a fraction of this diamond phase. These results agree qualitatively well with the recent experimental observations of meteoritic impact samples. (C) 2015 AIP Publishing LLC.
Theoretical Computer Science, p. 31-44, 1998
Abstract
Program environments are now commonly used for parallelism on networks of workstations. There is a need for simple and consistent tools to measure algorithm performance on heterogeneous networks. In this work we propose a generalization to heterogeneous networks of the classical efficiency formula E(N) = S(N)/N, where S(N) is the speedup on N processors.
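For reference, the classical (homogeneous) formula E(N) = S(N)/N can be evaluated as in the short sketch below. The function names and the timings are hypothetical, and the heterogeneous generalization proposed in the paper is not reproduced here because the abstract does not state it.

```python
def speedup(t_serial, t_parallel):
    """Classical speedup S(N): serial run time divided by parallel run time."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    """Classical efficiency E(N) = S(N) / N on N identical processors."""
    return speedup(t_serial, t_parallel) / n_procs

# Made-up timings: 120 s on one processor, 20 s on 8 processors.
s = speedup(120.0, 20.0)
e = efficiency(120.0, 20.0, 8)
print(s, e)
```

With these illustrative numbers the speedup is 6 on 8 processors, so a quarter of the aggregate processor time is lost to overheads such as communication or load imbalance.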
Parallel Computing, p. 165-180, 1997
Abstract
We present in this paper the strong points and limitations of semi-automatic parallelization, data parallel programming and message passing programming. We apply these to two numerical algorithms, namely a bi-dimensional Fourier transform algorithm and a conjugate gradient program. We implemented these programs with each of the different methods on a Cray T3D. The results of these experiments demonstrate the accuracy of our proposition that when the three methods are combined, efficiency, portability and ease of parallel programming may be achieved.
Euro-Par'96 Parallel Processing, Springer Berlin Heidelberg, p. 651-664, 1996
Abstract
Program environments are now commonly used for parallelism on networks of workstations. That is the reason why there is a need for simple and consistent tools to measure algorithm performance on heterogeneous networks. In this work we propose a generalization to heterogeneous networks of the classical efficiency formula E(N)=S(N)/N, where S(N) is the speedup on N processors.
Parallel Computing, p. 289-310, 1996
Abstract
We propose in this paper a new parallel algorithm for computing the matrix-vector product on a ring of p processors. This solution makes it possible to overlap as much communication as possible. Some simulations and experiments on a Paragon are given in order to confirm the interest of this algorithm.
Parallel Computing, p. 1413-1427, 1996
Abstract
We present in this paper the results of various communication benchmarks on a Cray T3D MPP system. They are composed of the most-used communication schemes in parallel applications and numerical kernels. They have been implemented using PVM message-passing libraries on the Cray T3D system. For each of these benchmarks, we propose a model depending on the size of the message communicated and the number of processors involved. We verify that the error between the proposed model and the measurements is very small (0.8% on average for point-to-point communications and 3% on average for collective communications).
High-Performance Computing and Networking, Springer Berlin Heidelberg, p. 600-605, 1995
Abstract
We present in this paper general techniques for overlapping communications in parallel numerical kernels. We first describe some dependency schemes which can be found in most parallel numerical algorithms, and we apply to these schemes methods based on changing the granularity of the computational tasks. The choice of the granularity needed to obtain a good overlap depends on the main parameters of the target machines. We apply these overlapping techniques to classical numerical kernels, namely the matrix-vector product and the bi-dimensional FFT, and implement them on a T3D and a Paragon. The results of these experiments demonstrate the accuracy of this approach.
Parallel Processing: CONPAR 94 --- VAPP VI, Springer Berlin Heidelberg, p. 605-615, 1994
Abstract
This paper presents a technique for overlapping communications with computations, based on pipelined communications. This makes it possible to improve the execution time of most parallel numerical algorithms. Some simple examples are developed to illustrate the efficiency of this technique: the matrix-vector product and the bi-dimensional Fast Fourier Transform. Moreover, we propose a unified formalism to easily express the pipelined versions of these algorithms. Finally, we report some experiments on various parallel machines.