Faculté des sciences

## SPLAY : a toolkit for the design and evaluation of large scale distributed systems

### Doctoral thesis: Université de Neuchâtel, 2014.

Abstract (translated from French)
This thesis presents SPLAY, an integrated system that facilitates the design, deployment, and experimental evaluation of large-scale distributed systems. SPLAY covers every step from development to evaluation.
It lets developers express algorithms simply and concisely in a language close to the pseudo-code found in scientific publications. The execution environment is lightweight and provides a set of libraries that meet the main needs of distributed system design.
SPLAY applications are run by a set of processes distributed across one or more testbeds. These processes execute the application inside a sandboxed environment, which makes it safe to use SPLAY even on non-dedicated platforms, in addition to classical environments such as PlanetLab or ModelNet.
We illustrate the value of SPLAY for distributed systems research with two representative examples.
First, we describe the design and evaluation of PULP, an efficient dissemination protocol that combines the best of the "push" and "pull" approaches. PULP exploits the efficiency of the push approach while limiting its redundancy through the use of the pull approach, whose frequency is driven by additional information attached to the data packets.
Finally, we present the design and evaluation of a search companion system, SYS, which collects user searches and the resulting accesses in order to build user and document profiles. Over time, the system creates ranked collections of documents that improve search quality by providing complementary results matching the user's areas of interest.
Summary
This thesis presents SPLAY, an integrated system that facilitates the design, deployment and testing of large-scale distributed applications.
SPLAY covers all aspects of the development and evaluation chain.
It allows developers to express algorithms in a concise, simple language that closely resembles the pseudo-code found in research papers. The execution environment has low overhead and a small footprint, and provides a comprehensive set of libraries for common distributed systems operations.
SPLAY applications are run by a set of daemons distributed on one or several testbeds. They execute in a sandboxed environment that shields the host system and enables SPLAY to also be used on non-dedicated platforms, in addition to classical testbeds like PlanetLab or ModelNet.
A controller manages applications, offering multi-criteria resource selection, deployment control, and churn management by reproducing the system's dynamics from traces or synthetic descriptions. SPLAY's features, usefulness, performance and scalability are evaluated by deploying representative experiments on PlanetLab and ModelNet clusters.
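To make the idea of reproducing churn from a trace concrete, here is a minimal Python sketch. It is only an illustration: the `(time, action, node)` event format and the `replay_churn` function are assumptions made for this example, not SPLAY's actual churn description language or controller API.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical trace format (an assumption, not SPLAY's own syntax):
# each event is (time_in_seconds, action, node_id), where action is
# either "join" or "leave".
Event = Tuple[float, str, int]


def replay_churn(trace: Iterable[Event]) -> List[Tuple[float, int]]:
    """Replay events in time order.

    Returns a list of (time, number of live nodes) recorded after each event.
    """
    alive: Set[int] = set()
    population: List[Tuple[float, int]] = []
    # sorted() is stable, so events with equal timestamps keep their input order.
    for time, action, node in sorted(trace, key=lambda event: event[0]):
        if action == "join":
            alive.add(node)
        elif action == "leave":
            alive.discard(node)
        else:
            raise ValueError(f"unknown action: {action!r}")
        population.append((time, len(alive)))
    return population


trace = [
    (0.0, "join", 1),
    (0.0, "join", 2),
    (5.0, "join", 3),
    (10.0, "leave", 1),
    (12.0, "leave", 3),
]
timeline = replay_churn(trace)
# timeline == [(0.0, 1), (0.0, 2), (5.0, 3), (10.0, 2), (12.0, 1)]
```

A real controller would start and stop application instances at these times rather than just counting them; the sketch only shows how a trace determines the population over time.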
We illustrate the value of SPLAY for distributed systems research with two representative examples. First, we present the design and evaluation of PULP, an efficient generic push-pull dissemination protocol that combines the best of pull-based and push-based approaches. PULP exploits the efficiency of push approaches, while limiting redundant messages and therefore imposing a low overhead, as pull protocols do. PULP leverages the dissemination of multiple messages from diverse sources: by exploiting the push phase of messages to transmit information about other disseminations, PULP enables an efficient pulling of other messages, which themselves help in turn with the dissemination of pending messages.
Finally, we present the design and evaluation of a collaborative search companion system, SYS, that collects user search queries and access feedback to build user- and document-centric profiling information. Over time, the system constructs ranked collections of elements that maintain the required information diversity and enhance the user search experience by presenting additional results tailored to the user interest space. This collaborative search companion requires a supporting architecture adapted to large user populations generating high request loads. To that end, it integrates mechanisms for ensuring scalability and load balancing of the service under varying loads and user interest distributions.
Abstract of PULP:
Gossip-based protocols provide a simple, scalable, and robust way to disseminate messages in large-scale systems. In such protocols, messages are spread in an epidemic manner. Gossiping may take place between nodes using push, pull, or a combination. Push-based systems achieve reasonable latency and high resilience to failures but may impose an unnecessarily large redundancy and overhead on the system. At the other extreme, pull-based protocols impose a lower overhead on the network at the price of increased latencies. A few hybrid approaches have been proposed (typically pushing control messages and pulling data) to avoid the redundancy of high-volume content and single-source streams. Yet, to the best of our knowledge, no other system intermingles push and pull in a multiple-senders scenario, in such a way that data messages of one help in carrying control messages of the other and in adaptively adjusting its rate of operation, further reducing overall cost and improving both delay and robustness.
In this paper, we propose an efficient generic push-pull dissemination protocol, PULP, which combines the best of both worlds. PULP exploits the efficiency of push approaches, while limiting redundant messages and therefore imposing a low overhead, as pull protocols do. PULP leverages the dissemination of multiple messages from diverse sources: by exploiting the push phase of messages to transmit information about other disseminations, PULP enables an efficient pulling of other messages, which themselves help in turn with the dissemination of pending messages. We deployed PULP on a cluster and on PlanetLab. Our results demonstrate that PULP achieves an appealing trade-off between coverage, message redundancy, and propagation delay.
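The interplay described above, where pushed messages carry information about other disseminations so that receivers can pull what they lack, can be sketched in Python as follows. This is a simplified, round-based illustration under stated assumptions, not the protocol as specified in the paper: the `Node` class, `publish`, `run_round`, the hop limit, the fixed fanout, and the use of the sender's whole store as piggybacked history are all choices made for this example.

```python
import random


class Node:
    """One gossip participant (simplified sketch, not PULP itself)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.store = {}       # message id -> payload
        self.to_push = []     # (message id, hops left) queued for the next push phase
        self.missing = set()  # ids seen in a piggybacked history but not yet held

    def receive_push(self, msg_id, payload, hops_left, history):
        """Handle a pushed message; return True if the push was redundant."""
        # The piggybacked history reveals which other messages exist.
        self.missing.update(m for m in history if m not in self.store)
        if msg_id in self.store:
            return True
        self.store[msg_id] = payload
        self.missing.discard(msg_id)
        if hops_left > 0:
            self.to_push.append((msg_id, hops_left))
        return False


def publish(node, msg_id, payload, hops):
    """Inject a new message at its source node."""
    node.store[msg_id] = payload
    node.to_push.append((msg_id, hops))


def run_round(nodes, rng, fanout):
    """Run one push phase followed by one pull phase; return redundant push count."""
    redundant = 0
    # Snapshot the queues so messages received this round are forwarded next round.
    batches = [(node, node.to_push) for node in nodes]
    for node in nodes:
        node.to_push = []
    for node, pending in batches:
        others = [n for n in nodes if n is not node]
        for msg_id, hops_left in pending:
            history = list(node.store)
            for peer in rng.sample(others, min(fanout, len(others))):
                if peer.receive_push(msg_id, node.store[msg_id], hops_left - 1, history):
                    redundant += 1
    # Pull phase: each node asks one random peer for the ids it knows it lacks.
    for node in nodes:
        others = [n for n in nodes if n is not node]
        if node.missing and others:
            peer = rng.choice(others)
            for msg_id in list(node.missing):
                if msg_id in peer.store:
                    node.store[msg_id] = peer.store[msg_id]
                    node.missing.discard(msg_id)
    return redundant


rng = random.Random(42)
nodes = [Node(i) for i in range(20)]
publish(nodes[0], "a", "payload-a", hops=3)
publish(nodes[7], "b", "payload-b", hops=3)
for _ in range(10):
    run_round(nodes, rng, fanout=3)
coverage = sum(len(n.store) for n in nodes) / (2 * len(nodes))
print(f"coverage after 10 rounds: {coverage:.0%}")
```

The hop limit bounds how much redundant pushing can occur, while the pull phase repairs gaps that the limited push leaves behind. The sketch omits features the abstract mentions, such as adapting the pull rate, so its numbers say nothing about PULP's measured trade-offs.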
Abstract of CoFeed:
Popular search engines essentially rely on information about the structure of the graph of linked elements to find the most relevant results for a given query. While this approach is satisfactory for popular interest domains or when the user expectations follow the main trend, it is very sensitive to the case of ambiguous queries, where queries can have answers over several different domains. Elements pertaining to an implicitly targeted interest domain with low popularity are usually ranked lower than expected by the user. This is a consequence of the poor usage of user-centric information in search engines. Leveraging semantic information can help avoid such situations by proposing complementary results that are carefully tailored to match user interests.
This paper proposes a collaborative search companion system, SYS, that collects user search queries and access feedback to build user- and document-centric profiling information. Over time, the system constructs ranked collections of elements that maintain the required information diversity and enhance the user search experience by presenting additional results tailored to the user interest space. This collaborative search companion requires a supporting architecture adapted to large user populations generating high request loads. To that end, it integrates mechanisms for ensuring scalability and load balancing of the service under varying loads and user interest distributions. Experiments with a deployed prototype highlight the efficiency of the system by analyzing improvement in search relevance, computational cost, scalability and load balance.
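As a rough illustration of how collected feedback could yield results tailored to a user's interests, consider the Python sketch below. It is not the system's actual ranking method: the `SearchCompanion` class, the per-access interest label, and the scoring rule (access counts weighted by the user's share of each interest) are all assumptions introduced for this example.

```python
from collections import Counter, defaultdict


class SearchCompanion:
    """Toy feedback store with profile-weighted ranking (hypothetical design)."""

    def __init__(self):
        # query -> document -> interest label -> number of recorded accesses
        self.feedback = defaultdict(lambda: defaultdict(Counter))
        # user -> interest label -> number of recorded accesses
        self.profiles = defaultdict(Counter)

    def record_access(self, user, query, document, interest):
        """Record that `user` opened `document` after issuing `query`."""
        self.feedback[query][document][interest] += 1
        self.profiles[user][interest] += 1

    def recommend(self, user, query, k=3):
        """Return up to k documents for `query`, ranked for this user."""
        profile = self.profiles.get(user, Counter())
        total = sum(profile.values())
        scores = {}
        for document, by_interest in self.feedback.get(query, {}).items():
            if total:
                # Weight each access by how much of the user's history shares its interest.
                score = sum(n * profile[i] / total for i, n in by_interest.items())
            else:
                # Unknown user: fall back to plain popularity.
                score = sum(by_interest.values())
            scores[document] = score
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [document for document, score in ranked[:k] if score > 0]


companion = SearchCompanion()
for _ in range(3):
    companion.record_access("bob", "jaguar", "car-site", "cars")
companion.record_access("carol", "jaguar", "zoo-page", "animals")
for _ in range(2):
    companion.record_access("alice", "big cats", "cat-wiki", "animals")

for_alice = companion.recommend("alice", "jaguar")  # ['zoo-page']
for_dave = companion.recommend("dave", "jaguar")    # ['car-site', 'zoo-page']
```

For the ambiguous query "jaguar", a user with an animal-oriented profile gets the less popular zoo page, whereas a user with no profile gets the popularity order. The scalability and load-balancing mechanisms that the abstract mentions are outside the scope of this sketch.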