StarPU Handbook - StarPU Extensions
MPI Fault Tolerance Support

Functions

int starpu_mpi_checkpoint_init (void)
 
int starpu_mpi_checkpoint_shutdown (void)
 
int starpu_mpi_checkpoint_template_register (starpu_mpi_checkpoint_template_t *cp_template, int cp_id, int cp_domain,...)
 
int starpu_mpi_checkpoint_template_create (starpu_mpi_checkpoint_template_t *cp_template, int cp_id, int cp_domain)
 
int starpu_mpi_checkpoint_template_add_entry (starpu_mpi_checkpoint_template_t *cp_template,...)
 
int starpu_mpi_checkpoint_template_freeze (starpu_mpi_checkpoint_template_t *cp_template)
 
int starpu_mpi_checkpoint_template_submit (starpu_mpi_checkpoint_template_t cp_template, int prio)
 
int starpu_mpi_checkpoint_template_print (starpu_mpi_checkpoint_template_t cp_template)
 

Detailed Description

Function Documentation

◆ starpu_mpi_checkpoint_init()

int starpu_mpi_checkpoint_init ( void  )

Initialise the checkpoint mechanism.

◆ starpu_mpi_checkpoint_shutdown()

int starpu_mpi_checkpoint_shutdown ( void  )

Shut down the checkpoint mechanism.

◆ starpu_mpi_checkpoint_template_register()

int starpu_mpi_checkpoint_template_register ( starpu_mpi_checkpoint_template_t *  cp_template,
int  cp_id,
int  cp_domain,
  ... 
)

Convenience wrapper that registers the checkpoint template cp_template with the given arguments. The template is then ready to be used with starpu_mpi_checkpoint_template_submit() during program execution. This function calls starpu_mpi_checkpoint_template_create(), adds the given checkpoint entries and freezes the template, which can therefore no longer be modified. The user must provide a unique checkpoint id cp_id, so that several templates can be created and each one can be matched with a corresponding ::starpu_mpi_init_from_checkpoint() (not implemented yet).

The arguments following cp_template, cp_id and cp_domain can be of the following types:

  • STARPU_R followed by a data handle and the backup rank;
  • STARPU_DATA_ARRAY followed by an array of data handles, its number of elements and a backup rank (non-functional);
  • STARPU_VALUE followed by a pointer to the unregistered value, its size in bytes, a unique tag (like the ones given when registering data handles) and a function of type int(backup_of)(int) that returns the backup rank of the rank given as argument.
  • The argument list must be terminated by the value 0.
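The zero-terminated argument convention described above can be illustrated with a small standalone model. This is only a sketch: it does not call StarPU and says nothing about how StarPU parses its arguments internally. The `DEMO_R` and `DEMO_VALUE` tags, the `demo_backup_of_t` type, the `void *` stand-in for a data handle, and the functions `demo_count_entries()` and `demo_next_rank()` are all hypothetical names introduced for illustration. The function walks an argument list of the same shape and counts the entries until it reaches the terminating 0.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical stand-ins for STARPU_R and STARPU_VALUE.
 * They are nonzero so that 0 can terminate the list. */
enum { DEMO_R = 1, DEMO_VALUE = 2 };

/* Hypothetical backup-rank function type, modelled on int(backup_of)(int). */
typedef int (*demo_backup_of_t)(int);

/* Walk a zero-terminated argument list and return the number of entries,
 * or -1 if an unknown entry type is found. */
int demo_count_entries(int first_type, ...)
{
    va_list ap;
    int count = 0;
    int type = first_type;

    va_start(ap, first_type);
    while (type != 0) {
        if (type == DEMO_R) {
            (void)va_arg(ap, void *);           /* data handle (stand-in) */
            (void)va_arg(ap, int);              /* backup rank */
        } else if (type == DEMO_VALUE) {
            (void)va_arg(ap, void *);           /* pointer to the value */
            (void)va_arg(ap, size_t);           /* size in bytes */
            (void)va_arg(ap, int);              /* unique tag */
            (void)va_arg(ap, demo_backup_of_t); /* backup-rank function */
        } else {
            va_end(ap);
            return -1;
        }
        count++;
        type = va_arg(ap, int);                 /* next entry type, or 0 */
    }
    va_end(ap);
    return count;
}

/* Hypothetical backup policy: the next rank in a ring of 4 ranks. */
int demo_next_rank(int rank)
{
    return (rank + 1) % 4;
}
```

In this model, forgetting the final 0 makes the loop read past the end of the arguments, which is undefined behaviour in C. That is why the terminator matters for any variadic interface of this kind.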

◆ starpu_mpi_checkpoint_template_create()

int starpu_mpi_checkpoint_template_create ( starpu_mpi_checkpoint_template_t *  cp_template,
int  cp_id,
int  cp_domain 
)

Create a new checkpoint template. The user must provide a unique checkpoint id cp_id, so that several templates can be created and each one can be matched with a corresponding ::starpu_mpi_init_from_checkpoint() (not implemented yet). Note that a template must be frozen with starpu_mpi_checkpoint_template_freeze() before it can be used with starpu_mpi_checkpoint_template_submit().

◆ starpu_mpi_checkpoint_template_add_entry()

int starpu_mpi_checkpoint_template_add_entry ( starpu_mpi_checkpoint_template_t *  cp_template,
  ... 
)

Add one or more entries to a checkpoint template previously created with starpu_mpi_checkpoint_template_create(). Several entries can be added either by passing them all as arguments to a single call, or by calling this function several times. Once all the entries have been added, the template must be frozen before using starpu_mpi_checkpoint_template_submit().

The arguments following the cp_template can be of the following types:

  • STARPU_R followed by a data handle and the backup rank;
  • STARPU_DATA_ARRAY followed by an array of data handles, its number of elements and a backup rank (non-functional);
  • STARPU_VALUE followed by a pointer to the unregistered value, its size in bytes, a unique tag (like the ones given when registering data handles) and a function of type int(backup_of)(int) that returns the backup rank of the rank given as argument.
  • The argument list must be terminated by the value 0.

◆ starpu_mpi_checkpoint_template_freeze()

int starpu_mpi_checkpoint_template_freeze ( starpu_mpi_checkpoint_template_t *  cp_template)

Freeze the given template. A frozen template can no longer be modified with starpu_mpi_checkpoint_template_add_entry(). A template must be frozen before using starpu_mpi_checkpoint_template_submit().
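The create, add, freeze, submit ordering described in this section can be summarised with a small standalone model. This is a sketch under stated assumptions, not StarPU code: `struct demo_template` and the `demo_template_*` functions are hypothetical, entries are reduced to plain integers, and the -1 return value for misuse is an assumption, since the documentation above does not specify which error codes StarPU returns.

```c
#include <stddef.h>

enum { DEMO_MAX_ENTRIES = 8 };

/* Hypothetical model of a checkpoint template: a list of entries
 * plus a flag recording whether the template has been frozen. */
struct demo_template {
    int cp_id;
    int cp_domain;
    int entries[DEMO_MAX_ENTRIES];
    size_t nb_entries;
    int frozen;
};

/* Start with an empty, unfrozen template. */
void demo_template_create(struct demo_template *t, int cp_id, int cp_domain)
{
    t->cp_id = cp_id;
    t->cp_domain = cp_domain;
    t->nb_entries = 0;
    t->frozen = 0;
}

/* Add one entry. Returns 0 on success, -1 (assumed error code)
 * if the template is already frozen or full. */
int demo_template_add_entry(struct demo_template *t, int entry)
{
    if (t->frozen || t->nb_entries == DEMO_MAX_ENTRIES)
        return -1;
    t->entries[t->nb_entries++] = entry;
    return 0;
}

/* Freeze the template: no further entries are accepted. */
int demo_template_freeze(struct demo_template *t)
{
    t->frozen = 1;
    return 0;
}

/* Submit the template. Returns the number of entries that would be
 * saved, or -1 (assumed error code) if the template is not frozen. */
int demo_template_submit(const struct demo_template *t)
{
    if (!t->frozen)
        return -1;
    return (int)t->nb_entries;
}
```

In this model, submitting before freezing fails, and adding an entry after freezing fails, which mirrors the two ordering rules stated in the text.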

◆ starpu_mpi_checkpoint_template_submit()

int starpu_mpi_checkpoint_template_submit ( starpu_mpi_checkpoint_template_t  cp_template,
int  prio 
)

Submit the checkpoint to StarPU. The submission can be seen as a cut in the task graph: StarPU will save the data described by the submitted template. Note that data external to StarPU (STARPU_VALUE) is saved with its value at submission time (when starpu_mpi_checkpoint_template_submit() is called). Data internal to StarPU (i.e. handles given with STARPU_R) is saved with its value at execution time (once the tasks submitted before starpu_mpi_checkpoint_template_submit() have been executed, and before the data is modified by the tasks submitted after starpu_mpi_checkpoint_template_submit()).
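The difference between submission-time and execution-time capture can be made concrete with a small standalone model. This is a sketch only and does not use StarPU: `struct demo_checkpoint`, `demo_submit()` and `demo_execute()` are hypothetical names, and a plain `int` stands in both for the external value and for the contents of a data handle.

```c
/* Hypothetical model of one checkpoint holding one external value
 * and one piece of runtime-managed data. */
struct demo_checkpoint {
    int saved_value;        /* copy of the external value, taken at submission */
    const int *handle_data; /* stand-in for data managed by the runtime */
    int saved_handle;       /* copy of the managed data, taken at execution */
};

/* Submission: the external value is copied immediately, while the
 * managed data is only referenced and will be read later. */
void demo_submit(struct demo_checkpoint *cp, const int *value,
                 const int *handle_data)
{
    cp->saved_value = *value;
    cp->handle_data = handle_data;
    cp->saved_handle = 0;
}

/* Execution: runs once the earlier tasks have finished, and copies
 * the managed data as it stands at that point. */
void demo_execute(struct demo_checkpoint *cp)
{
    cp->saved_handle = *cp->handle_data;
}
```

If the caller changes the external value after `demo_submit()`, the saved copy keeps the old value. If an earlier task updates the managed data between `demo_submit()` and `demo_execute()`, the saved copy reflects that update.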