In this tutorial we demonstrate how to perform load balancing to (re-)distribute elements between processors in parallel, distributed computations. Following a brief discussion of the underlying methodology, we illustrate the application in the adaptive driven cavity problem where the spatially non-uniform refinement of the mesh leads to a significant load imbalance.
Most of the driver code is identical to the codes discussed in the tutorials explaining the serial and distributed, parallel solution of the problem. Therefore we only discuss the changes required to perform load balancing on the problem.
The initial distribution of a problem via a call to
attempts to distribute the elements in the Problem's mesh over the available processors such that (i) each processor stores approximately the same number of elements and (ii) the anticipated volume of inter-processor communication required to synchronise the solution across processor boundaries is minimised. Typically, this procedure works very well in the sense that the assembly times for the Jacobian matrix in a distributed computation scale extremely well with the number of processors.
A re-distribution of elements may be required because
As with other methods within
oomph-lib, load balancing is implemented such that only minimal user intervention is required. The function
may be called at any point following the distribution of the problem. The only change required to an existing driver code is the provision (via overloading of a broken virtual function in the Problem base class) of the function
This function must
Typically this requires no more than a straightforward cut-and-paste of code from the problem constructor into the
Problem::build_mesh() function; see also the discussion What goes into the build_mesh() function? for more details.
The load balancing routines then perform the following steps:
Datavalues from the old to the new (global) meshes.
We note that for "structured" meshes (i.e. meshes whose refinement pattern is represented by "tree forests") only complete trees can be moved between processors. If the mesh was refined uniformly after being distributed, a more fine-grained tree-forest (which may allow better load balancing) can be built by calling
Problem::prune_halo_elements_and_nodes() as discussed in another tutorial.
In this section we outline the required changes to the parallel version of the adaptive driven cavity problem so that the problem can use the load balancing method described above.
The figure below demonstrates the advantages of load balancing in that problem: The left hand panel shows the distribution of the mesh across four processors (indicated by the colours) after three spatially adaptive solves. Note how the singularities in the bottom corners result in a strongly non-uniform spatial refinement which leads to a significant load imbalance because the "pink" and "grey" processors contain far more elements than then "green" and "cyan" ones. The right hand panel shows the distribution of the mesh across four processors when load balancing is performed in between the second and third mesh adaptation.
As discussed above, the function
Problem::build_mesh() is created most easily by moving the code that (i) creates the mesh, (ii) applies the relevant boundary conditions, and (iii) completes the build of all the elements in the problem from the problem constructor.
The problem constructor becomes very short since the bulk of the code has been moved into the
The driver code demonstrates some of the different options available for the
Problem::load_balance(...)enables the output of extended statistics.
–validatecommand line flag, we therefore call the
Problem::distribute(...)function with a pre-determined (and deterministic but non-optimal) distribution of the elements. The use of METIS during the load balancing operations can be bypassed by calling
The main rule regarding what should (and should not) be moved from the problem constructor into the
build_mesh() function is that following the return from the
build_mesh() function, all meshes should be re-generated (and refined to the same degree as when
Problem::distribute() was called), their constituent elements made fully functional, and all boundary conditions applied.
It is not necessary (and would, in fact, be undesirable) to re-create
GeomObjects that are used to define curvilinear mesh boundaries. We recommend generating such objects once (in the Problem constructor) and making them available to the
build_mesh() function by storing pointers to them in the problem's private member data.
It is not necessary to re-generate the error estimator, though the pointer to it (and any non-default target errors) need to be passed to all (newly re-generated) adaptive meshes.
Since load balancing re-generates the meshes – and thus their constituent elements and nodes – pointers to such objects must be re-assigned on return from
Problem::load_balance(). In our experience such pointers tend to be used predominantly in
Failure to re-assign any dangling pointers will cause segmentation faults – recompile
oomph-lib with debugging enabled and use ddd to see where the code crashes.
Problem::build_mesh()function (e.g. by renaming it
my_build_mesh(), say, so that it no longer overloads the function the
Problem::build_mesh()shown above (accidentally) illustrates a common problem with a mere cut-and-paste approach – it creates a memory leak! Where is it and how would you fix it?
demo_driversdirectory contains a few additional driver codes that employ load balancing. These codes exist mainly for self-test purposes and do not have separate tutorials. It may be instructive to compare the different versions of these codes to further clarify the modifications required to enable load balancing. We suggest you compare
A pdf version of this document is available.