A plugin implementing a wrapper around the User-Level Failure-Mitigation (ULFM) feature of the upcoming MPI 4 standard. This plugin and the accompanying example are tested with OpenMPI 5.0.2.
|
|
| UserLevelFailureMitigation () |
| | Default constructor; sets the error handler of MPI_COMM_WORLD (!) to MPI_ERRORS_RETURN. Although the standard allows setting the error handler for a specific communicator only, neither MPICH nor OpenMPI currently (March 2024) supports this.
|
| |
|
void | revoke () |
| | Revokes the current communicator.
|
| |
| uint32_t | ack_failed (uint32_t const num_to_ack) |
| | Acknowledges that the application intends to ignore the effect of currently known failures on wildcard receive completions and agreement return values.
|
| |
| uint32_t | num_ack_failed () |
| | Gets the number of acknowledged failures.
|
| |
| uint32_t | ack_all_failed () |
| | Acknowledges all failures.
|
| |
| Comm | shrink () |
| | Creates a new communicator from this communicator, excluding the failed processes.
|
| |
| int | agree (int flag) |
| | Agrees on a flag from all live processes and distributes the result back to all live processes, even after process failures.
|
| |
| bool | agree (bool flag) |
| | Agrees on a boolean flag from all live processes and distributes the result back to all live processes, even after process failures.
|
| |
| Group | get_failed () |
| | Obtains the group of currently failed processes.
|
| |
| bool | is_revoked () |
| | Checks if this communicator has been revoked.
|
| |
|
void | mpi_error_handler (int const ret, std::string const &callee) const |
| | Overrides the MPI error handler to throw appropriate exceptions when hardware faults have occurred.
|
| |
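The agreement described for `agree` can be illustrated without MPI. The sketch below is a single-process model, not the plugin's implementation. It assumes the plugin forwards to ULFM's `MPIX_Comm_agree`, which combines the flags contributed by the live processes with a bitwise AND. `Contribution` and `model_agree` are hypothetical names used only for illustration; they are not part of KaMPIng.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for one rank's contribution to the agreement.
// std::nullopt marks a rank that has failed and therefore contributes nothing.
using Contribution = std::optional<int>;

// Models the value every live rank would receive: the bitwise AND over the
// flags contributed by live ranks. Failed ranks are skipped.
int model_agree(std::vector<Contribution> const& contributions) {
    int result = ~0; // identity element of bitwise AND
    for (auto const& contribution : contributions) {
        if (contribution.has_value()) {
            result &= *contribution;
        }
    }
    return result;
}
```

In this model, the contributions `0b111`, a failed rank, and `0b101` yield `0b101`. With flags restricted to 0 and 1, the result is 1 only if every live rank contributed 1, which is presumably how the `bool` overload behaves.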
template<
typename Comm,
template< typename... >
typename DefaultContainerType>
class kamping::plugin::UserLevelFailureMitigation< Comm, DefaultContainerType >
A plugin implementing a wrapper around the User-Level Failure-Mitigation (ULFM) feature of the upcoming MPI 4 standard. This plugin and the accompanying example are tested with OpenMPI 5.0.2.
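The relationship between `revoke`, `is_revoked`, `get_failed` and `shrink` can be sketched with a small single-process model. This is an illustration under stated assumptions, not the plugin's code: `ModelComm` is a hypothetical type that tracks members in plain vectors instead of talking to MPI, and its member functions only mirror the names documented above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical single-process model of a communicator; not part of KaMPIng or MPI.
struct ModelComm {
    std::vector<int>  members; // identifiers of the member processes
    std::vector<bool> failed;  // failed[i] is true if members[i] has failed
    bool              revoked = false;

    ModelComm(std::vector<int> m, std::vector<bool> f)
        : members(std::move(m)), failed(std::move(f)) {
        assert(members.size() == failed.size());
    }

    // Models revoke(): marks the communicator as unusable for further communication.
    void revoke() { revoked = true; }

    // Models is_revoked().
    bool is_revoked() const { return revoked; }

    // Models get_failed(): the members currently known to have failed.
    std::vector<int> get_failed() const {
        std::vector<int> result;
        for (std::size_t i = 0; i < members.size(); ++i) {
            if (failed[i]) result.push_back(members[i]);
        }
        return result;
    }

    // Models shrink(): a new, non-revoked communicator containing only survivors.
    ModelComm shrink() const {
        std::vector<int> survivors;
        for (std::size_t i = 0; i < members.size(); ++i) {
            if (!failed[i]) survivors.push_back(members[i]);
        }
        std::vector<bool> none_failed(survivors.size(), false);
        return ModelComm(std::move(survivors), std::move(none_failed));
    }
};
```

A typical recovery sequence in this model is to call `revoke()` once a failure is noticed and then continue on the result of `shrink()`, which starts out non-revoked and without failed members.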