Facoltà di scienze informatiche

Solving atomic multicast when groups crash

Schiper, Nicolas ; Pedone, Fernando

In this paper, we study the atomic multicast problem, a fundamental abstraction for building faulttolerant systems. In the atomic multicast problem, the system is divided into non-empty and disjoint groups of processes. Multicast messages may be addressed to any subset of groups, each message possibly being multicast to a different subset. Several papers previously studied this problem either in... Plus

Ajouter à la liste personnelle
    Summary
    In this paper, we study the atomic multicast problem, a fundamental abstraction for building faulttolerant systems. In the atomic multicast problem, the system is divided into non-empty and disjoint groups of processes. Multicast messages may be addressed to any subset of groups, each message possibly being multicast to a different subset. Several papers previously studied this problem either in local area networks [3, 9, 20] or wide area networks [13, 21]. However, none of them considered atomic multicast when groups may crash. We present two atomic multicast algorithms that tolerate the crash of groups. The first algorithm tolerates an arbitrary number of failures, is genuine (i.e., to deliver a message m, only addressees of m are involved in the protocol), and uses the perfect failures detector P. We show that among realistic failure detectors, i.e., those that do not predict the future, P is necessary to solve genuine atomic multicast if we do not bound the number of processes that may fail. Thus, P is the weakest realistic failure detector for solving genuine atomic multicast when an arbitrary number of processes may crash. Our second algorithm is non-genuine and less resilient to process failures than the first algorithm but has several advantages: (i) it requires perfect failure detection within groups only, and not across the system, (ii) as we show in the paper it can be modified to rely on unreliable failure detection at the cost of a weaker liveness guarantee, and (iii) it is fast, messages addressed to multiple groups may be delivered within two inter-group message delays only.