Parallel programming with directives
Both the hybrid organization of memory (shared and/or distributed) and the heterogeneity of the computation units (CPU, GPU, ...) of parallel execution platforms make the expression of parallelism dependent on the features of each architecture.
This leads to high costs for both producing and maintaining parallel programs.
To overcome this drawback, HP2 designs and experiments with a parallel programming model common to various architectures.
Manually inserting directives into a sequential program considerably reduces the cost of developing parallel code.
Such a programming model has long been operational for shared-memory platforms.
As this technique is within the reach of non-experts in parallelism, the designers of an algorithm can insert the annotations themselves, and thus easily optimize the performance of the resulting parallel code.
Furthermore, preserving the structure of the legacy sequential code facilitates the maintenance of the parallel program.
To validate the efficiency of this programming model on various architectures, HP2 develops a tool, STEP (System of Transformation for Parallel Execution), which automatically produces, from an annotated source code, a parallel source program well suited to each type of architecture.
Adaptive scheduling for distributed resources
Some applications, such as searching for a given pattern in a text, video encoding, or joins in databases, can easily be processed in parallel simply by cutting the data stream into chunks and allocating the chunks to several processing units (PUs).
In this application model, the processing of a chunk is assumed to be independent of that of any other chunk.
The platform on which the chunk processing is scheduled is assumed to be a heterogeneous distributed-memory platform.
HP2 designs a scheduler, AS4DR (Adaptive Scheduler for Distributed Resources), that asymptotically maximizes the throughput of produced results by adapting the schedule both to the initial unawareness of the execution parameters (the computation speed of the PUs and the communication speed of the links between them) and to their variations over time.
Performance analysis for parallel applications
The complexity of the architecture of HPC facilities (which combine multicore processors with GPUs on NUMA architectures), as well as the diversity of programming models (MPI + OpenMP, MPI + CUDA, etc.), make the development of efficient parallel applications difficult.
Optimizing an application thus requires a huge analysis effort.
To ease the performance analysis of parallel applications, HP2 develops an extensible framework, called EZTrace, that generates execution trace files of parallel applications.
These trace files can then be analyzed to find possible bottlenecks in the execution.