KaMPIng 0.1.1
Flexible and (near) zero-overhead C++ bindings for MPI
kamping::plugin::UserLevelFailureMitigation< Comm, DefaultContainerType > Class Template Reference

A plugin implementing a wrapper around the User-Level Failure-Mitigation (ULFM) feature of the upcoming MPI 4 standard. This plugin and the accompanying example have been tested with OpenMPI 5.0.2.

#include <ulfm.hpp>


Public Member Functions

 UserLevelFailureMitigation ()
 Default constructor; sets the error handler of MPI_COMM_WORLD (!) to MPI_ERRORS_RETURN. Although the standard allows setting the error handler for a specific communicator only, neither MPICH nor OpenMPI supported this as of March 2024.
 
void revoke ()
 Revokes the current communicator.
 
uint32_t ack_failed (uint32_t const num_to_ack)
 Acknowledges that the application intends to ignore the effect of currently known failures on wildcard receive completions and agreement return values.
 
uint32_t num_ack_failed ()
 Gets the number of acknowledged failures.
 
uint32_t ack_all_failed ()
 Acknowledges all failures.
 
Comm shrink ()
 Creates a new communicator from this communicator, excluding the failed processes.
 
int agree (int flag)
 Agrees on a flag from all live processes and distributes the result back to all live processes, even after process failures.
 
bool agree (bool flag)
 Agrees on a boolean flag from all live processes and distributes the result back to all live processes, even after process failures.
 
Group get_failed ()
 Obtains the group of currently failed processes.
 
bool is_revoked ()
 Checks if this communicator has been revoked.
 
void mpi_error_handler (int const ret, std::string const &callee) const
 Overrides the on-MPI-error handler to throw appropriate exceptions when hardware faults occur.
 

Detailed Description

template<typename Comm, template< typename... > typename DefaultContainerType>
class kamping::plugin::UserLevelFailureMitigation< Comm, DefaultContainerType >

A plugin implementing a wrapper around the User-Level Failure-Mitigation (ULFM) feature of the upcoming MPI 4 standard. This plugin and the accompanying example have been tested with OpenMPI 5.0.2.
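
A common way to combine these member functions after a failure has been detected can be sketched as follows. The helper name `recover` and the stand-in type `FakeComm` are hypothetical and exist only so that the sketch compiles without an MPI runtime; only `is_revoked()`, `revoke()`, and `shrink()` correspond to members documented on this page. This is an illustration under those assumptions, not the plugin's own code.

```cpp
#include <cassert>

// Hypothetical helper (not part of KaMPIng): make sure the communicator is
// revoked so that all surviving ranks notice the failure, then build a
// replacement communicator without the failed processes.
// Works with any type that offers is_revoked(), revoke(), and shrink().
template <typename FaultTolerantComm>
auto recover(FaultTolerantComm& comm) -> decltype(comm.shrink()) {
    if (!comm.is_revoked()) {
        comm.revoke();
    }
    return comm.shrink();
}

// Hypothetical stand-in for a communicator, used only to exercise the helper
// without MPI. It counts ranks instead of communicating.
struct FakeComm {
    int  alive;   // number of live ranks
    int  failed;  // number of failed ranks
    bool revoked; // whether revoke() has been called

    bool is_revoked() const { return revoked; }
    void revoke() { revoked = true; }
    // The shrunken communicator keeps the live ranks and is not revoked.
    FakeComm shrink() const { return FakeComm{alive, 0, false}; }
};
```

With a real communicator that uses this plugin, the same call sequence would typically run in the handler for whatever exception `mpi_error_handler` raises; the exception types are not listed on this page, so they are left out here.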

Member Function Documentation

◆ ack_all_failed()

template<typename Comm, template< typename... > typename DefaultContainerType>
uint32_t kamping::plugin::UserLevelFailureMitigation< Comm, DefaultContainerType >::ack_all_failed ( )
inline

Acknowledges all failures.

Returns
The overall number of failures acknowledged.

◆ ack_failed()

template<typename Comm, template< typename... > typename DefaultContainerType>
uint32_t kamping::plugin::UserLevelFailureMitigation< Comm, DefaultContainerType >::ack_failed ( uint32_t const num_to_ack)
inline

Acknowledges that the application intends to ignore the effect of currently known failures on wildcard receive completions and agreement return values.

Parameters
num_to_ack: The number of failures to acknowledge.
Returns
The overall number of failures acknowledged.
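
The relationship between `num_to_ack` and the returned overall count can be illustrated with a small model. It assumes that the count is cumulative, never decreases, and that a request of zero merely queries it; that is one reading of the underlying ULFM acknowledgement call, not something this page states. `AckModel` and its fields are hypothetical names and do not call KaMPIng or MPI.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Toy model (assumption, not KaMPIng code) of the acknowledgement counter.
struct AckModel {
    std::uint32_t known_failures; // failures this process currently knows about
    std::uint32_t acknowledged;   // failures acknowledged so far

    // Acknowledge up to num_to_ack of the known failures and return the
    // overall number acknowledged. The counter never decreases.
    std::uint32_t ack_failed(std::uint32_t num_to_ack) {
        acknowledged = std::max(acknowledged, std::min(num_to_ack, known_failures));
        return acknowledged;
    }
};
```

In this model, starting from three known failures, `ack_failed(2)` yields 2, a following `ack_failed(0)` still yields 2, and `ack_failed(10)` yields 3.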

◆ agree() [1/2]

template<typename Comm, template< typename... > typename DefaultContainerType>
bool kamping::plugin::UserLevelFailureMitigation< Comm, DefaultContainerType >::agree ( bool flag)
inline

Agrees on a boolean flag from all live processes and distributes the result back to all live processes, even after process failures.

Parameters
flag: The flag to agree on.
Returns
The bitwise AND over the contributed input values of flag.

◆ agree() [2/2]

template<typename Comm, template< typename... > typename DefaultContainerType>
int kamping::plugin::UserLevelFailureMitigation< Comm, DefaultContainerType >::agree ( int flag)
inline

Agrees on a flag from all live processes and distributes the result back to all live processes, even after process failures.

Parameters
flag: The flag to agree on.
Returns
The bitwise AND over the contributed input values of flag.
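
The return value can be made concrete with a small model that computes the bitwise AND over the flags of the live ranks only. `Contribution` and `model_agree` are hypothetical names for this sketch; it performs no communication and simplifies the real operation by treating every failed rank as contributing nothing.

```cpp
#include <cassert>
#include <vector>

// One entry per rank: whether it is still alive and which flag it contributes.
struct Contribution {
    bool alive;
    int  flag;
};

// Toy model (not KaMPIng code): bitwise AND over the flags of live ranks.
int model_agree(std::vector<Contribution> const& ranks) {
    int result = ~0; // all bits set: the identity element of bitwise AND
    for (Contribution const& r : ranks) {
        if (r.alive) {
            result &= r.flag;
        }
    }
    return result;
}
```

For example, live ranks contributing `0xE` and `0x7` agree on `0x6`, regardless of what a failed rank would have contributed. For the `bool` overload, the same reduction on the values 0 and 1 amounts to a logical AND.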

◆ get_failed()

template<typename Comm, template< typename... > typename DefaultContainerType>
Group kamping::plugin::UserLevelFailureMitigation< Comm, DefaultContainerType >::get_failed ( )
inline

Obtains the group of currently failed processes.

Returns
The group of currently failed processes.

◆ is_revoked()

template<typename Comm, template< typename... > typename DefaultContainerType>
bool kamping::plugin::UserLevelFailureMitigation< Comm, DefaultContainerType >::is_revoked ( )
inline

Checks if this communicator has been revoked.

Returns
True if the communicator has been revoked, false otherwise.

◆ num_ack_failed()

template<typename Comm, template< typename... > typename DefaultContainerType>
uint32_t kamping::plugin::UserLevelFailureMitigation< Comm, DefaultContainerType >::num_ack_failed ( )
inline

Gets the number of acknowledged failures.

Returns
The number of acknowledged failures.

◆ shrink()

template<typename Comm, template< typename... > typename DefaultContainerType>
Comm kamping::plugin::UserLevelFailureMitigation< Comm, DefaultContainerType >::shrink ( )
inline

Creates a new communicator from this communicator, excluding the failed processes.

Returns
The new communicator.
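
The effect of excluding failed processes can be pictured with a small model that maps each rank of the new communicator to the rank the same process had before. `model_shrink` is a hypothetical name, and the sketch assumes that survivors keep their relative order, which is one reading of the ULFM proposal and is not stated on this page.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy model (not KaMPIng code): returns, for each rank of the shrunken
// communicator, the rank that process had in the original communicator.
std::vector<int> model_shrink(int size, std::set<int> const& failed) {
    std::vector<int> survivors;
    for (int rank = 0; rank < size; ++rank) {
        if (failed.count(rank) == 0) {
            survivors.push_back(rank);
        }
    }
    return survivors;
}
```

With five ranks of which ranks 1 and 3 have failed, the model yields a communicator of size three whose ranks 0, 1, and 2 are the former ranks 0, 2, and 4.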

The documentation for this class was generated from the following file: