Faculté informatique et communications IC, Section d'informatique, Institut d'informatique fondamentale IIF (Laboratoire de systèmes répartis LSR)

Atomic broadcast : a fault-tolerant token based algorithm and performance evaluations

Ekwall, Nils Richard ; Schiper, André (Dir.)

Thèse sciences Ecole polytechnique fédérale de Lausanne EPFL : 2007 ; no 3811.

Ajouter à la liste personnelle
    Within only a couple of generations, the so-called digital revolution has taken the world by storm: today, almost all human beings interact, directly or indirectly, at some point in their life, with a computer system. Computers are present on our desks, computer systems control the antilock braking system and the stability control in cars, they collect usage statistics in elevators in order to anticipate maintenance and repair operations. Computer systems also operate critical systems, such as nuclear power plants, airplane control systems or space rockets. Furthermore, computer systems are not only omnipresent, but also increasingly networked. As the use of computer systems has increased dramatically over the past decades, the needs and expectations associated with these systems have also increased. In particular, one of the critical points of a system is its availability (the fraction of the time during which the system provides a service to the users): the costs and negative publicity of a system outage (be it a commercial web site or a stock exchange for example) are often considerable. Fault tolerance is one of the approaches to designing a highly-available system: a fault tolerant system is designed in such a way that the failure of one of the components of the system does not compromise the functionality of the system as a whole. Replication is one of the common fault tolerance techniques. Instead of having a single machine (a replica) providing a service, the system is composed of several replicas running the service and connected through a network. If one of the replicas fails, the service is still provided by the remaining replicas. The replication technique is interesting as it can be achieved by using software running on commodity hardware, thus avoiding the high cost of special purpose hardware. Replication, although intuitive to understand, is complex to implement in practice, as the replicas have to interact in order to ensure the consistency of the system as a whole. Group communication simplifies the replication problem, by hiding issues such as the communication between the replicas, the crashes of one or several replicas and the synchronization of the replicas. In this thesis, we start by comparing two replication techniques – group communication and quorum systems – and identifying in which case either technique should be used. Atomic broadcast (a group communication primitive at the heart of this work) allows replicas to broadcast messages to each other and then deliver them in the same total order, even if replicas broadcast messages quasi simultaneously. Atomic broadcast is especially useful for replication: since all replicas deliver messages in the same order, their state is kept consistent. After the comparison between the replication techniques, we present an atomic broadcast algorithm designed to perform well when the system is heavily loaded and that allows to quickly detect crashed replicas (by minimizing the consequences of wrongly suspecting a non-crashed replica). The presentation of the algorithm includes simulation results comparing the performance of the new algorithm to previously proposed atomic broadcast algorithms. The second part of the thesis focuses on the experimental performance evaluation of the new algorithm in several settings. We start by comparing four atomic broadcast algorithms in a local area network. We then compare three of the four algorithms in a wide area network, with sites in Switzerland, Japan and France, and where the round trip time between the sites varies between 4 and 300 ms. Finally, we evaluate the impact of the size of the system (the number if replicas) on the performance of the algorithms.